2018-05-31 07:44:49

by Zhen Lei

Subject: [PATCH 0/7] add non-strict mode support for arm-smmu-v3

In general, an IOMMU unmap operation follows the steps below:
1. remove the mapping of the specified iova range from the page table
2. execute a tlbi command to invalidate the mapping cached in the TLB
3. wait for the above tlbi operation to finish
4. free the IOVA resource
5. free the physical memory resource

This may become a problem when unmap is performed very frequently, because
the combination of tlbi and wait operations consumes a lot of time. A
feasible method is to defer the tlbi and iova-free operations: once a
certain number of entries has accumulated, or a specified time has elapsed,
execute a single tlbi_all command to clean up the TLB, then free the
backed-up IOVAs. This is referred to as non-strict mode.

But it must be noted that, although the mapping has already been removed
from the page table, it may still exist in the TLB. And the freed physical
memory may also be reused for other purposes. So an attacker could keep
accessing memory through the just-freed IOVA, to obtain sensitive data or
corrupt memory. Therefore VFIO should always choose strict mode.

Some may suggest deferring the physical memory free as well, which would
preserve strict semantics. But for the map_sg cases, the memory allocation
is not controlled by the IOMMU APIs, so this is not enforceable.

Fortunately, the Intel and AMD drivers have already implemented the
non-strict mode, and put the queue_iova() operation into the common file
dma-iommu.c; my work is based on that. The difference is that the
arm-smmu-v3 driver calls the common IOMMU APIs to unmap, while the Intel
and AMD IOMMU drivers do not.

Below is the performance data of strict vs non-strict for an NVMe device:
Random read IOPS: 146K (strict) vs 573K (non-strict)
Random write IOPS: 143K (strict) vs 513K (non-strict)


Zhen Lei (7):
iommu/dma: fix trival coding style mistake
iommu/arm-smmu-v3: fix the implementation of flush_iotlb_all hook
iommu: prepare for the non-strict mode support
iommu/amd: make sure TLB to be flushed before IOVA freed
iommu/dma: add support for non-strict mode
iommu/io-pgtable-arm: add support for non-strict mode
iommu/arm-smmu-v3: add support for non-strict mode

drivers/iommu/amd_iommu.c | 2 +-
drivers/iommu/arm-smmu-v3.c | 16 ++++++++++++---
drivers/iommu/arm-smmu.c | 2 +-
drivers/iommu/dma-iommu.c | 41 ++++++++++++++++++++++++++++++--------
drivers/iommu/io-pgtable-arm-v7s.c | 6 +++---
drivers/iommu/io-pgtable-arm.c | 28 ++++++++++++++------------
drivers/iommu/io-pgtable.h | 2 +-
drivers/iommu/ipmmu-vmsa.c | 2 +-
drivers/iommu/msm_iommu.c | 2 +-
drivers/iommu/mtk_iommu.c | 2 +-
drivers/iommu/qcom_iommu.c | 2 +-
include/linux/iommu.h | 5 +++++
12 files changed, 76 insertions(+), 34 deletions(-)

--
1.8.3




2018-05-31 07:44:52

by Zhen Lei

Subject: [PATCH 4/7] iommu/amd: make sure TLB to be flushed before IOVA freed

Although the mapping has already been removed from the page table, it may
still exist in the TLB. If the freed IOVA is reused by another user before
the flush operation has completed, the new user will not be able to access
its memory correctly.

Signed-off-by: Zhen Lei <[email protected]>
---
drivers/iommu/amd_iommu.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 8fb8c73..93aa389 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -2402,9 +2402,9 @@ static void __unmap_single(struct dma_ops_domain *dma_dom,
}

if (amd_iommu_unmap_flush) {
- dma_ops_free_iova(dma_dom, dma_addr, pages);
domain_flush_tlb(&dma_dom->domain);
domain_flush_complete(&dma_dom->domain);
+ dma_ops_free_iova(dma_dom, dma_addr, pages);
} else {
pages = __roundup_pow_of_two(pages);
queue_iova(&dma_dom->iovad, dma_addr >> PAGE_SHIFT, pages, 0);
--
1.8.3



2018-05-31 07:44:56

by Zhen Lei

Subject: [PATCH 7/7] iommu/arm-smmu-v3: add support for non-strict mode

1. Add the IOMMU_CAP_NON_STRICT capability.
2. Dynamically choose strict or non-strict mode based on the iommu domain type.

Signed-off-by: Zhen Lei <[email protected]>
---
drivers/iommu/arm-smmu-v3.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 59b3387..25bccbd 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -1440,6 +1440,8 @@ static bool arm_smmu_capable(enum iommu_cap cap)
return true;
case IOMMU_CAP_NOEXEC:
return true;
+ case IOMMU_CAP_NON_STRICT:
+ return true;
default:
return false;
}
@@ -1767,7 +1769,7 @@ static int arm_smmu_map(struct iommu_domain *domain, unsigned long iova,
if (!ops)
return 0;

- return ops->unmap(ops, iova, size, IOMMU_STRICT);
+ return ops->unmap(ops, iova, size, IOMMU_DOMAIN_IS_STRICT(domain));
}

static void arm_smmu_flush_iotlb_all(struct iommu_domain *domain)
@@ -1782,7 +1784,7 @@ static void arm_smmu_iotlb_sync(struct iommu_domain *domain)
{
struct arm_smmu_device *smmu = to_smmu_domain(domain)->smmu;

- if (smmu)
+ if (smmu && IOMMU_DOMAIN_IS_STRICT(domain))
__arm_smmu_tlb_sync(smmu);
}

--
1.8.3



2018-05-31 07:45:10

by Zhen Lei

Subject: [PATCH 6/7] iommu/io-pgtable-arm: add support for non-strict mode

To support non-strict mode, only perform the tlbi and sync operations when
in strict mode. For the non-leaf case, however, always follow strict mode.

Signed-off-by: Zhen Lei <[email protected]>
---
drivers/iommu/io-pgtable-arm.c | 22 ++++++++++++----------
1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index e0f52db..1a65b7b 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -287,7 +287,7 @@ static void __arm_lpae_set_pte(arm_lpae_iopte *ptep, arm_lpae_iopte pte,

static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
unsigned long iova, size_t size, int lvl,
- arm_lpae_iopte *ptep);
+ arm_lpae_iopte *ptep, int strict);

static void __arm_lpae_init_pte(struct arm_lpae_io_pgtable *data,
phys_addr_t paddr, arm_lpae_iopte prot,
@@ -329,7 +329,7 @@ static int arm_lpae_init_pte(struct arm_lpae_io_pgtable *data,
size_t sz = ARM_LPAE_BLOCK_SIZE(lvl, data);

tblp = ptep - ARM_LPAE_LVL_IDX(iova, lvl, data);
- if (WARN_ON(__arm_lpae_unmap(data, iova, sz, lvl, tblp) != sz))
+ if (WARN_ON(__arm_lpae_unmap(data, iova, sz, lvl, tblp, IOMMU_STRICT) != sz))
return -EINVAL;
}

@@ -526,7 +526,7 @@ static void arm_lpae_free_pgtable(struct io_pgtable *iop)
static size_t arm_lpae_split_blk_unmap(struct arm_lpae_io_pgtable *data,
unsigned long iova, size_t size,
arm_lpae_iopte blk_pte, int lvl,
- arm_lpae_iopte *ptep)
+ arm_lpae_iopte *ptep, int strict)
{
struct io_pgtable_cfg *cfg = &data->iop.cfg;
arm_lpae_iopte pte, *tablep;
@@ -571,15 +571,17 @@ static size_t arm_lpae_split_blk_unmap(struct arm_lpae_io_pgtable *data,
}

if (unmap_idx < 0)
- return __arm_lpae_unmap(data, iova, size, lvl, tablep);
+ return __arm_lpae_unmap(data, iova, size, lvl, tablep, strict);
+
+ if (strict)
+ io_pgtable_tlb_add_flush(&data->iop, iova, size, size, true);

- io_pgtable_tlb_add_flush(&data->iop, iova, size, size, true);
return size;
}

static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
unsigned long iova, size_t size, int lvl,
- arm_lpae_iopte *ptep)
+ arm_lpae_iopte *ptep, int strict)
{
arm_lpae_iopte pte;
struct io_pgtable *iop = &data->iop;
@@ -604,7 +606,7 @@ static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
io_pgtable_tlb_sync(iop);
ptep = iopte_deref(pte, data);
__arm_lpae_free_pgtable(data, lvl + 1, ptep);
- } else {
+ } else if (strict) {
io_pgtable_tlb_add_flush(iop, iova, size, size, true);
}

@@ -615,12 +617,12 @@ static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
* minus the part we want to unmap
*/
return arm_lpae_split_blk_unmap(data, iova, size, pte,
- lvl + 1, ptep);
+ lvl + 1, ptep, strict);
}

/* Keep on walkin' */
ptep = iopte_deref(pte, data);
- return __arm_lpae_unmap(data, iova, size, lvl + 1, ptep);
+ return __arm_lpae_unmap(data, iova, size, lvl + 1, ptep, strict);
}

static size_t arm_lpae_unmap(struct io_pgtable_ops *ops, unsigned long iova,
@@ -633,7 +635,7 @@ static size_t arm_lpae_unmap(struct io_pgtable_ops *ops, unsigned long iova,
if (WARN_ON(iova >= (1ULL << data->iop.cfg.ias)))
return 0;

- return __arm_lpae_unmap(data, iova, size, lvl, ptep);
+ return __arm_lpae_unmap(data, iova, size, lvl, ptep, strict);
}

static phys_addr_t arm_lpae_iova_to_phys(struct io_pgtable_ops *ops,
--
1.8.3



2018-05-31 07:45:45

by Zhen Lei

Subject: [PATCH 1/7] iommu/dma: fix trival coding style mistake

The static function iova_reserve_iommu_regions is only called by
iommu_dma_init_domain, and the 'if (!dev)' check in iommu_dma_init_domain
affects it alone, so we can safely move the check into it. I think that
looks more natural.

In addition, the local variable 'ret' is only assigned in the
'if (region->type == IOMMU_RESV_MSI)' branch, so the 'if (ret)' check
should also only take effect in that branch; add braces to enclose it.

No functional changes.

Signed-off-by: Zhen Lei <[email protected]>
---
drivers/iommu/dma-iommu.c | 12 +++++++-----
1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index ddcbbdb..4e885f7 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -231,6 +231,9 @@ static int iova_reserve_iommu_regions(struct device *dev,
LIST_HEAD(resv_regions);
int ret = 0;

+ if (!dev)
+ return 0;
+
if (dev_is_pci(dev))
iova_reserve_pci_windows(to_pci_dev(dev), iovad);

@@ -246,11 +249,12 @@ static int iova_reserve_iommu_regions(struct device *dev,
hi = iova_pfn(iovad, region->start + region->length - 1);
reserve_iova(iovad, lo, hi);

- if (region->type == IOMMU_RESV_MSI)
+ if (region->type == IOMMU_RESV_MSI) {
ret = cookie_init_hw_msi_region(cookie, region->start,
region->start + region->length);
- if (ret)
- break;
+ if (ret)
+ break;
+ }
}
iommu_put_resv_regions(dev, &resv_regions);

@@ -308,8 +312,6 @@ int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base,
}

init_iova_domain(iovad, 1UL << order, base_pfn);
- if (!dev)
- return 0;

return iova_reserve_iommu_regions(dev, domain);
}
--
1.8.3



2018-05-31 07:46:00

by Zhen Lei

Subject: [PATCH 5/7] iommu/dma: add support for non-strict mode

1. Save the related domain pointer in struct iommu_dma_cookie, so that the
   iovad is capable of calling domain->ops->flush_iotlb_all to flush the TLB.
2. Define a new iommu capability: IOMMU_CAP_NON_STRICT, which is used to
   indicate that the iommu domain supports non-strict mode.
3. During the iommu domain initialization phase, call capable() to check
   whether it supports non-strict mode. If so, call init_iova_flush_queue
   to register the iovad->flush_cb callback.
4. All unmap (including iova-free) APIs finally invoke __iommu_dma_unmap
   -->iommu_dma_free_iova. Use iovad->flush_cb to check whether the related
   iommu supports non-strict mode or not, and call IOMMU_DOMAIN_IS_STRICT to
   make sure IOMMU_DOMAIN_UNMANAGED domains always follow strict mode.

Signed-off-by: Zhen Lei <[email protected]>
---
drivers/iommu/dma-iommu.c | 29 ++++++++++++++++++++++++++---
include/linux/iommu.h | 3 +++
2 files changed, 29 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 4e885f7..2e116d9 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -55,6 +55,7 @@ struct iommu_dma_cookie {
};
struct list_head msi_page_list;
spinlock_t msi_lock;
+ struct iommu_domain *domain;
};

static inline size_t cookie_msi_granule(struct iommu_dma_cookie *cookie)
@@ -64,7 +65,8 @@ static inline size_t cookie_msi_granule(struct iommu_dma_cookie *cookie)
return PAGE_SIZE;
}

-static struct iommu_dma_cookie *cookie_alloc(enum iommu_dma_cookie_type type)
+static struct iommu_dma_cookie *cookie_alloc(struct iommu_domain *domain,
+ enum iommu_dma_cookie_type type)
{
struct iommu_dma_cookie *cookie;

@@ -73,6 +75,7 @@ static struct iommu_dma_cookie *cookie_alloc(enum iommu_dma_cookie_type type)
spin_lock_init(&cookie->msi_lock);
INIT_LIST_HEAD(&cookie->msi_page_list);
cookie->type = type;
+ cookie->domain = domain;
}
return cookie;
}
@@ -94,7 +97,7 @@ int iommu_get_dma_cookie(struct iommu_domain *domain)
if (domain->iova_cookie)
return -EEXIST;

- domain->iova_cookie = cookie_alloc(IOMMU_DMA_IOVA_COOKIE);
+ domain->iova_cookie = cookie_alloc(domain, IOMMU_DMA_IOVA_COOKIE);
if (!domain->iova_cookie)
return -ENOMEM;

@@ -124,7 +127,7 @@ int iommu_get_msi_cookie(struct iommu_domain *domain, dma_addr_t base)
if (domain->iova_cookie)
return -EEXIST;

- cookie = cookie_alloc(IOMMU_DMA_MSI_COOKIE);
+ cookie = cookie_alloc(domain, IOMMU_DMA_MSI_COOKIE);
if (!cookie)
return -ENOMEM;

@@ -261,6 +264,17 @@ static int iova_reserve_iommu_regions(struct device *dev,
return ret;
}

+static void iova_flush_iotlb_all(struct iova_domain *iovad)
+{
+ struct iommu_dma_cookie *cookie;
+ struct iommu_domain *domain;
+
+ cookie = container_of(iovad, struct iommu_dma_cookie, iovad);
+ domain = cookie->domain;
+
+ domain->ops->flush_iotlb_all(domain);
+}
+
/**
* iommu_dma_init_domain - Initialise a DMA mapping domain
* @domain: IOMMU domain previously prepared by iommu_get_dma_cookie()
@@ -276,6 +290,7 @@ static int iova_reserve_iommu_regions(struct device *dev,
int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base,
u64 size, struct device *dev)
{
+ const struct iommu_ops *ops = domain->ops;
struct iommu_dma_cookie *cookie = domain->iova_cookie;
struct iova_domain *iovad = &cookie->iovad;
unsigned long order, base_pfn, end_pfn;
@@ -313,6 +328,11 @@ int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base,

init_iova_domain(iovad, 1UL << order, base_pfn);

+ if (ops->capable && ops->capable(IOMMU_CAP_NON_STRICT)) {
+ BUG_ON(!ops->flush_iotlb_all);
+ init_iova_flush_queue(iovad, iova_flush_iotlb_all, NULL);
+ }
+
return iova_reserve_iommu_regions(dev, domain);
}
EXPORT_SYMBOL(iommu_dma_init_domain);
@@ -392,6 +412,9 @@ static void iommu_dma_free_iova(struct iommu_dma_cookie *cookie,
/* The MSI case is only ever cleaning up its most recent allocation */
if (cookie->type == IOMMU_DMA_MSI_COOKIE)
cookie->msi_iova -= size;
+ else if (!IOMMU_DOMAIN_IS_STRICT(cookie->domain) && iovad->flush_cb)
+ queue_iova(iovad, iova_pfn(iovad, iova),
+ size >> iova_shift(iovad), 0);
else
free_iova_fast(iovad, iova_pfn(iovad, iova),
size >> iova_shift(iovad));
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 39b3150..01ff569 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -87,6 +87,8 @@ struct iommu_domain_geometry {
__IOMMU_DOMAIN_DMA_API)

#define IOMMU_STRICT 1
+#define IOMMU_DOMAIN_IS_STRICT(domain) \
+ (domain->type == IOMMU_DOMAIN_UNMANAGED)

struct iommu_domain {
unsigned type;
@@ -103,6 +105,7 @@ enum iommu_cap {
transactions */
IOMMU_CAP_INTR_REMAP, /* IOMMU supports interrupt isolation */
IOMMU_CAP_NOEXEC, /* IOMMU_NOEXEC flag */
+ IOMMU_CAP_NON_STRICT, /* IOMMU supports non-strict mode */
};

/*
--
1.8.3



2018-05-31 07:47:14

by Zhen Lei

Subject: [PATCH 3/7] iommu: prepare for the non-strict mode support

In general, an IOMMU unmap operation follows the steps below:
1. remove the mapping of the specified iova range from the page table
2. execute a tlbi command to invalidate the mapping cached in the TLB
3. wait for the above tlbi operation to finish
4. free the IOVA resource
5. free the physical memory resource

This may become a problem when unmap is performed very frequently, because
the combination of tlbi and wait operations consumes a lot of time. A
feasible method is to defer the tlbi and iova-free operations: once a
certain number of entries has accumulated, or a specified time has elapsed,
execute a single tlbi_all command to clean up the TLB, then free the
backed-up IOVAs. This is referred to as non-strict mode.

But it must be noted that, although the mapping has already been removed
from the page table, it may still exist in the TLB. And the freed physical
memory may also be reused for other purposes. So an attacker could keep
accessing memory through the just-freed IOVA, to obtain sensitive data or
corrupt memory. Therefore VFIO should always choose strict mode.

This patch just adds a new parameter to the unmap operation, to let the
upper-layer functions choose which mode to apply.

No functional changes.

Signed-off-by: Zhen Lei <[email protected]>
---
drivers/iommu/arm-smmu-v3.c | 2 +-
drivers/iommu/arm-smmu.c | 2 +-
drivers/iommu/io-pgtable-arm-v7s.c | 6 +++---
drivers/iommu/io-pgtable-arm.c | 6 +++---
drivers/iommu/io-pgtable.h | 2 +-
drivers/iommu/ipmmu-vmsa.c | 2 +-
drivers/iommu/msm_iommu.c | 2 +-
drivers/iommu/mtk_iommu.c | 2 +-
drivers/iommu/qcom_iommu.c | 2 +-
include/linux/iommu.h | 2 ++
10 files changed, 15 insertions(+), 13 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 4402187..59b3387 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -1767,7 +1767,7 @@ static int arm_smmu_map(struct iommu_domain *domain, unsigned long iova,
if (!ops)
return 0;

- return ops->unmap(ops, iova, size);
+ return ops->unmap(ops, iova, size, IOMMU_STRICT);
}

static void arm_smmu_flush_iotlb_all(struct iommu_domain *domain)
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 69e7c60..253e807 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -1249,7 +1249,7 @@ static size_t arm_smmu_unmap(struct iommu_domain *domain, unsigned long iova,
if (!ops)
return 0;

- return ops->unmap(ops, iova, size);
+ return ops->unmap(ops, iova, size, IOMMU_STRICT);
}

static void arm_smmu_iotlb_sync(struct iommu_domain *domain)
diff --git a/drivers/iommu/io-pgtable-arm-v7s.c b/drivers/iommu/io-pgtable-arm-v7s.c
index 10e4a3d..799eced 100644
--- a/drivers/iommu/io-pgtable-arm-v7s.c
+++ b/drivers/iommu/io-pgtable-arm-v7s.c
@@ -658,7 +658,7 @@ static size_t __arm_v7s_unmap(struct arm_v7s_io_pgtable *data,
}

static size_t arm_v7s_unmap(struct io_pgtable_ops *ops, unsigned long iova,
- size_t size)
+ size_t size, int strict)
{
struct arm_v7s_io_pgtable *data = io_pgtable_ops_to_data(ops);

@@ -883,7 +883,7 @@ static int __init arm_v7s_do_selftests(void)
size = 1UL << __ffs(cfg.pgsize_bitmap);
while (i < loopnr) {
iova_start = i * SZ_16M;
- if (ops->unmap(ops, iova_start + size, size) != size)
+ if (ops->unmap(ops, iova_start + size, size, IOMMU_STRICT) != size)
return __FAIL(ops);

/* Remap of partial unmap */
@@ -902,7 +902,7 @@ static int __init arm_v7s_do_selftests(void)
while (i != BITS_PER_LONG) {
size = 1UL << i;

- if (ops->unmap(ops, iova, size) != size)
+ if (ops->unmap(ops, iova, size, IOMMU_STRICT) != size)
return __FAIL(ops);

if (ops->iova_to_phys(ops, iova + 42))
diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index 39c2a05..e0f52db 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -624,7 +624,7 @@ static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
}

static size_t arm_lpae_unmap(struct io_pgtable_ops *ops, unsigned long iova,
- size_t size)
+ size_t size, int strict)
{
struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);
arm_lpae_iopte *ptep = data->pgd;
@@ -1108,7 +1108,7 @@ static int __init arm_lpae_run_tests(struct io_pgtable_cfg *cfg)

/* Partial unmap */
size = 1UL << __ffs(cfg->pgsize_bitmap);
- if (ops->unmap(ops, SZ_1G + size, size) != size)
+ if (ops->unmap(ops, SZ_1G + size, size, IOMMU_STRICT) != size)
return __FAIL(ops, i);

/* Remap of partial unmap */
@@ -1124,7 +1124,7 @@ static int __init arm_lpae_run_tests(struct io_pgtable_cfg *cfg)
while (j != BITS_PER_LONG) {
size = 1UL << j;

- if (ops->unmap(ops, iova, size) != size)
+ if (ops->unmap(ops, iova, size, IOMMU_STRICT) != size)
return __FAIL(ops, i);

if (ops->iova_to_phys(ops, iova + 42))
diff --git a/drivers/iommu/io-pgtable.h b/drivers/iommu/io-pgtable.h
index 2df7909..2908806 100644
--- a/drivers/iommu/io-pgtable.h
+++ b/drivers/iommu/io-pgtable.h
@@ -120,7 +120,7 @@ struct io_pgtable_ops {
int (*map)(struct io_pgtable_ops *ops, unsigned long iova,
phys_addr_t paddr, size_t size, int prot);
size_t (*unmap)(struct io_pgtable_ops *ops, unsigned long iova,
- size_t size);
+ size_t size, int strict);
phys_addr_t (*iova_to_phys)(struct io_pgtable_ops *ops,
unsigned long iova);
};
diff --git a/drivers/iommu/ipmmu-vmsa.c b/drivers/iommu/ipmmu-vmsa.c
index 40ae6e8..e6d9e11 100644
--- a/drivers/iommu/ipmmu-vmsa.c
+++ b/drivers/iommu/ipmmu-vmsa.c
@@ -716,7 +716,7 @@ static size_t ipmmu_unmap(struct iommu_domain *io_domain, unsigned long iova,
{
struct ipmmu_vmsa_domain *domain = to_vmsa_domain(io_domain);

- return domain->iop->unmap(domain->iop, iova, size);
+ return domain->iop->unmap(domain->iop, iova, size, IOMMU_STRICT);
}

static void ipmmu_iotlb_sync(struct iommu_domain *io_domain)
diff --git a/drivers/iommu/msm_iommu.c b/drivers/iommu/msm_iommu.c
index 0d33504..180fa3d 100644
--- a/drivers/iommu/msm_iommu.c
+++ b/drivers/iommu/msm_iommu.c
@@ -532,7 +532,7 @@ static size_t msm_iommu_unmap(struct iommu_domain *domain, unsigned long iova,
unsigned long flags;

spin_lock_irqsave(&priv->pgtlock, flags);
- len = priv->iop->unmap(priv->iop, iova, len);
+ len = priv->iop->unmap(priv->iop, iova, len, IOMMU_STRICT);
spin_unlock_irqrestore(&priv->pgtlock, flags);

return len;
diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index f2832a1..54661ed 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -386,7 +386,7 @@ static size_t mtk_iommu_unmap(struct iommu_domain *domain,
size_t unmapsz;

spin_lock_irqsave(&dom->pgtlock, flags);
- unmapsz = dom->iop->unmap(dom->iop, iova, size);
+ unmapsz = dom->iop->unmap(dom->iop, iova, size, IOMMU_STRICT);
spin_unlock_irqrestore(&dom->pgtlock, flags);

return unmapsz;
diff --git a/drivers/iommu/qcom_iommu.c b/drivers/iommu/qcom_iommu.c
index 65b9c99..90abde1 100644
--- a/drivers/iommu/qcom_iommu.c
+++ b/drivers/iommu/qcom_iommu.c
@@ -444,7 +444,7 @@ static size_t qcom_iommu_unmap(struct iommu_domain *domain, unsigned long iova,
*/
pm_runtime_get_sync(qcom_domain->iommu->dev);
spin_lock_irqsave(&qcom_domain->pgtbl_lock, flags);
- ret = ops->unmap(ops, iova, size);
+ ret = ops->unmap(ops, iova, size, IOMMU_STRICT);
spin_unlock_irqrestore(&qcom_domain->pgtbl_lock, flags);
pm_runtime_put_sync(qcom_domain->iommu->dev);

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 19938ee..39b3150 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -86,6 +86,8 @@ struct iommu_domain_geometry {
#define IOMMU_DOMAIN_DMA (__IOMMU_DOMAIN_PAGING | \
__IOMMU_DOMAIN_DMA_API)

+#define IOMMU_STRICT 1
+
struct iommu_domain {
unsigned type;
const struct iommu_ops *ops;
--
1.8.3



2018-05-31 07:48:02

by Zhen Lei

Subject: [PATCH 2/7] iommu/arm-smmu-v3: fix the implementation of flush_iotlb_all hook

.flush_iotlb_all cannot just wait for previous tlbi operations to complete;
it should also invalidate all TLB entries of the related domain.

Signed-off-by: Zhen Lei <[email protected]>
---
drivers/iommu/arm-smmu-v3.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 1d64710..4402187 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -1770,6 +1770,14 @@ static int arm_smmu_map(struct iommu_domain *domain, unsigned long iova,
return ops->unmap(ops, iova, size);
}

+static void arm_smmu_flush_iotlb_all(struct iommu_domain *domain)
+{
+ struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+
+ if (smmu_domain->smmu)
+ arm_smmu_tlb_inv_context(smmu_domain);
+}
+
static void arm_smmu_iotlb_sync(struct iommu_domain *domain)
{
struct arm_smmu_device *smmu = to_smmu_domain(domain)->smmu;
@@ -1998,7 +2006,7 @@ static void arm_smmu_put_resv_regions(struct device *dev,
.map = arm_smmu_map,
.unmap = arm_smmu_unmap,
.map_sg = default_iommu_map_sg,
- .flush_iotlb_all = arm_smmu_iotlb_sync,
+ .flush_iotlb_all = arm_smmu_flush_iotlb_all,
.iotlb_sync = arm_smmu_iotlb_sync,
.iova_to_phys = arm_smmu_iova_to_phys,
.add_device = arm_smmu_add_device,
--
1.8.3



2018-05-31 11:26:41

by Robin Murphy

Subject: Re: [PATCH 0/7] add non-strict mode support for arm-smmu-v3

On 31/05/18 08:42, Zhen Lei wrote:
> In common, a IOMMU unmap operation follow the below steps:
> 1. remove the mapping in page table of the specified iova range
> 2. execute tlbi command to invalid the mapping which is cached in TLB
> 3. wait for the above tlbi operation to be finished
> 4. free the IOVA resource
> 5. free the physical memory resource
>
> This maybe a problem when unmap is very frequently, the combination of tlbi
> and wait operation will consume a lot of time. A feasible method is put off
> tlbi and iova-free operation, when accumulating to a certain number or
> reaching a specified time, execute only one tlbi_all command to clean up
> TLB, then free the backup IOVAs. Mark as non-strict mode.
>
> But it must be noted that, although the mapping has already been removed in
> the page table, it maybe still exist in TLB. And the freed physical memory
> may also be reused for others. So a attacker can persistent access to memory
> based on the just freed IOVA, to obtain sensible data or corrupt memory. So
> the VFIO should always choose the strict mode.
>
> Some may consider put off physical memory free also, that will still follow
> strict mode. But for the map_sg cases, the memory allocation is not controlled
> by IOMMU APIs, so it is not enforceable.
>
> Fortunately, Intel and AMD have already applied the non-strict mode, and put
> queue_iova() operation into the common file dma-iommu.c., and my work is based
> on it. The difference is that arm-smmu-v3 driver will call IOMMU common APIs to
> unmap, but Intel and AMD IOMMU drivers are not.
>
> Below is the performance data of strict vs non-strict for NVMe device:
> Randomly Read IOPS: 146K(strict) vs 573K(non-strict)
> Randomly Write IOPS: 143K(strict) vs 513K(non-strict)

What hardware is this on? If it's SMMUv3 without MSIs (e.g. D05), then
you'll still be using the rubbish globally-blocking sync implementation.
If that is the case, I'd be very interested to see how much there is to
gain from just improving that - I've had a patch kicking around for a
while[1] (also on a rebased branch at [2]), but don't have the means for
serious performance testing.

Robin.

[1]
https://www.mail-archive.com/[email protected]/msg20576.html
[2] git://linux-arm.org/linux-rm iommu/smmu

>
>
> Zhen Lei (7):
> iommu/dma: fix trival coding style mistake
> iommu/arm-smmu-v3: fix the implementation of flush_iotlb_all hook
> iommu: prepare for the non-strict mode support
> iommu/amd: make sure TLB to be flushed before IOVA freed
> iommu/dma: add support for non-strict mode
> iommu/io-pgtable-arm: add support for non-strict mode
> iommu/arm-smmu-v3: add support for non-strict mode
>
> drivers/iommu/amd_iommu.c | 2 +-
> drivers/iommu/arm-smmu-v3.c | 16 ++++++++++++---
> drivers/iommu/arm-smmu.c | 2 +-
> drivers/iommu/dma-iommu.c | 41 ++++++++++++++++++++++++++++++--------
> drivers/iommu/io-pgtable-arm-v7s.c | 6 +++---
> drivers/iommu/io-pgtable-arm.c | 28 ++++++++++++++------------
> drivers/iommu/io-pgtable.h | 2 +-
> drivers/iommu/ipmmu-vmsa.c | 2 +-
> drivers/iommu/msm_iommu.c | 2 +-
> drivers/iommu/mtk_iommu.c | 2 +-
> drivers/iommu/qcom_iommu.c | 2 +-
> include/linux/iommu.h | 5 +++++
> 12 files changed, 76 insertions(+), 34 deletions(-)
>

2018-05-31 13:04:49

by Robin Murphy

Subject: Re: [PATCH 1/7] iommu/dma: fix trival coding style mistake

On 31/05/18 08:42, Zhen Lei wrote:
> The static function iova_reserve_iommu_regions is only called by function
> iommu_dma_init_domain, and the 'if (!dev)' check in iommu_dma_init_domain
> affect it only, so we can safely move the check into it. I think it looks
> more natural.

As before, I disagree - the logic of iommu_dma_init_domain() is "we
expect to have a valid device, but stop here if we don't", and moving
the check just needlessly obfuscates that. It is not a coincidence that
the arguments of both functions are in effective order of importance.

> In addition, the local variable 'ret' is only assigned in the branch of
> 'if (region->type == IOMMU_RESV_MSI)', so the 'if (ret)' should also only
> take effect in the branch, add a brace to enclose it.

'ret' is clearly also assigned at its declaration, to cover the (very
likely) case where we don't enter the loop at all. Thus testing it in
the loop is harmless, and cluttering that up with extra tabs and braces
is just noise.

Robin.

> No functional changes.
>
> Signed-off-by: Zhen Lei <[email protected]>
> ---
> drivers/iommu/dma-iommu.c | 12 +++++++-----
> 1 file changed, 7 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index ddcbbdb..4e885f7 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -231,6 +231,9 @@ static int iova_reserve_iommu_regions(struct device *dev,
> LIST_HEAD(resv_regions);
> int ret = 0;
>
> + if (!dev)
> + return 0;
> +
> if (dev_is_pci(dev))
> iova_reserve_pci_windows(to_pci_dev(dev), iovad);
>
> @@ -246,11 +249,12 @@ static int iova_reserve_iommu_regions(struct device *dev,
> hi = iova_pfn(iovad, region->start + region->length - 1);
> reserve_iova(iovad, lo, hi);
>
> - if (region->type == IOMMU_RESV_MSI)
> + if (region->type == IOMMU_RESV_MSI) {
> ret = cookie_init_hw_msi_region(cookie, region->start,
> region->start + region->length);
> - if (ret)
> - break;
> + if (ret)
> + break;
> + }
> }
> iommu_put_resv_regions(dev, &resv_regions);
>
> @@ -308,8 +312,6 @@ int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base,
> }
>
> init_iova_domain(iovad, 1UL << order, base_pfn);
> - if (!dev)
> - return 0;
>
> return iova_reserve_iommu_regions(dev, domain);
> }
>

2018-05-31 13:05:53

by Robin Murphy

Subject: Re: [PATCH 3/7] iommu: prepare for the non-strict mode support

On 31/05/18 08:42, Zhen Lei wrote:
> In common, a IOMMU unmap operation follow the below steps:
> 1. remove the mapping in page table of the specified iova range
> 2. execute tlbi command to invalid the mapping which is cached in TLB
> 3. wait for the above tlbi operation to be finished
> 4. free the IOVA resource
> 5. free the physical memory resource
>
> This maybe a problem when unmap is very frequently, the combination of tlbi
> and wait operation will consume a lot of time. A feasible method is put off
> tlbi and iova-free operation, when accumulating to a certain number or
> reaching a specified time, execute only one tlbi_all command to clean up
> TLB, then free the backup IOVAs. Mark as non-strict mode.
>
> But it must be noted that, although the mapping has already been removed in
> the page table, it maybe still exist in TLB. And the freed physical memory
> may also be reused for others. So a attacker can persistent access to memory
> based on the just freed IOVA, to obtain sensible data or corrupt memory. So
> the VFIO should always choose the strict mode.
>
> This patch just add a new parameter for the unmap operation, to help the
> upper functions capable choose which mode to be applied.

This seems like it might be better handled by a flag in
io_pgtable_cfg->quirks. This interface change on its own looks rather
invasive, and the fact that it ends up only being used to pass through a
constant property of the domain (which is already known by the point
io_pgtable_alloc() is called) implies that it is indeed the wrong level
of abstraction.

> No functional changes.
>
> Signed-off-by: Zhen Lei <[email protected]>
> ---
> drivers/iommu/arm-smmu-v3.c | 2 +-
> drivers/iommu/arm-smmu.c | 2 +-
> drivers/iommu/io-pgtable-arm-v7s.c | 6 +++---
> drivers/iommu/io-pgtable-arm.c | 6 +++---
> drivers/iommu/io-pgtable.h | 2 +-
> drivers/iommu/ipmmu-vmsa.c | 2 +-
> drivers/iommu/msm_iommu.c | 2 +-
> drivers/iommu/mtk_iommu.c | 2 +-
> drivers/iommu/qcom_iommu.c | 2 +-
> include/linux/iommu.h | 2 ++

Plus things specific to io-pgtable shouldn't really be spilling into the
core API header either.

Robin.

> 10 files changed, 15 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
> index 4402187..59b3387 100644
> --- a/drivers/iommu/arm-smmu-v3.c
> +++ b/drivers/iommu/arm-smmu-v3.c
> @@ -1767,7 +1767,7 @@ static int arm_smmu_map(struct iommu_domain *domain, unsigned long iova,
> if (!ops)
> return 0;
>
> - return ops->unmap(ops, iova, size);
> + return ops->unmap(ops, iova, size, IOMMU_STRICT);
> }
>
> static void arm_smmu_flush_iotlb_all(struct iommu_domain *domain)
> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> index 69e7c60..253e807 100644
> --- a/drivers/iommu/arm-smmu.c
> +++ b/drivers/iommu/arm-smmu.c
> @@ -1249,7 +1249,7 @@ static size_t arm_smmu_unmap(struct iommu_domain *domain, unsigned long iova,
> if (!ops)
> return 0;
>
> - return ops->unmap(ops, iova, size);
> + return ops->unmap(ops, iova, size, IOMMU_STRICT);
> }
>
> static void arm_smmu_iotlb_sync(struct iommu_domain *domain)
> diff --git a/drivers/iommu/io-pgtable-arm-v7s.c b/drivers/iommu/io-pgtable-arm-v7s.c
> index 10e4a3d..799eced 100644
> --- a/drivers/iommu/io-pgtable-arm-v7s.c
> +++ b/drivers/iommu/io-pgtable-arm-v7s.c
> @@ -658,7 +658,7 @@ static size_t __arm_v7s_unmap(struct arm_v7s_io_pgtable *data,
> }
>
> static size_t arm_v7s_unmap(struct io_pgtable_ops *ops, unsigned long iova,
> - size_t size)
> + size_t size, int strict)
> {
> struct arm_v7s_io_pgtable *data = io_pgtable_ops_to_data(ops);
>
> @@ -883,7 +883,7 @@ static int __init arm_v7s_do_selftests(void)
> size = 1UL << __ffs(cfg.pgsize_bitmap);
> while (i < loopnr) {
> iova_start = i * SZ_16M;
> - if (ops->unmap(ops, iova_start + size, size) != size)
> + if (ops->unmap(ops, iova_start + size, size, IOMMU_STRICT) != size)
> return __FAIL(ops);
>
> /* Remap of partial unmap */
> @@ -902,7 +902,7 @@ static int __init arm_v7s_do_selftests(void)
> while (i != BITS_PER_LONG) {
> size = 1UL << i;
>
> - if (ops->unmap(ops, iova, size) != size)
> + if (ops->unmap(ops, iova, size, IOMMU_STRICT) != size)
> return __FAIL(ops);
>
> if (ops->iova_to_phys(ops, iova + 42))
> diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
> index 39c2a05..e0f52db 100644
> --- a/drivers/iommu/io-pgtable-arm.c
> +++ b/drivers/iommu/io-pgtable-arm.c
> @@ -624,7 +624,7 @@ static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
> }
>
> static size_t arm_lpae_unmap(struct io_pgtable_ops *ops, unsigned long iova,
> - size_t size)
> + size_t size, int strict)
> {
> struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);
> arm_lpae_iopte *ptep = data->pgd;
> @@ -1108,7 +1108,7 @@ static int __init arm_lpae_run_tests(struct io_pgtable_cfg *cfg)
>
> /* Partial unmap */
> size = 1UL << __ffs(cfg->pgsize_bitmap);
> - if (ops->unmap(ops, SZ_1G + size, size) != size)
> + if (ops->unmap(ops, SZ_1G + size, size, IOMMU_STRICT) != size)
> return __FAIL(ops, i);
>
> /* Remap of partial unmap */
> @@ -1124,7 +1124,7 @@ static int __init arm_lpae_run_tests(struct io_pgtable_cfg *cfg)
> while (j != BITS_PER_LONG) {
> size = 1UL << j;
>
> - if (ops->unmap(ops, iova, size) != size)
> + if (ops->unmap(ops, iova, size, IOMMU_STRICT) != size)
> return __FAIL(ops, i);
>
> if (ops->iova_to_phys(ops, iova + 42))
> diff --git a/drivers/iommu/io-pgtable.h b/drivers/iommu/io-pgtable.h
> index 2df7909..2908806 100644
> --- a/drivers/iommu/io-pgtable.h
> +++ b/drivers/iommu/io-pgtable.h
> @@ -120,7 +120,7 @@ struct io_pgtable_ops {
> int (*map)(struct io_pgtable_ops *ops, unsigned long iova,
> phys_addr_t paddr, size_t size, int prot);
> size_t (*unmap)(struct io_pgtable_ops *ops, unsigned long iova,
> - size_t size);
> + size_t size, int strict);
> phys_addr_t (*iova_to_phys)(struct io_pgtable_ops *ops,
> unsigned long iova);
> };
> diff --git a/drivers/iommu/ipmmu-vmsa.c b/drivers/iommu/ipmmu-vmsa.c
> index 40ae6e8..e6d9e11 100644
> --- a/drivers/iommu/ipmmu-vmsa.c
> +++ b/drivers/iommu/ipmmu-vmsa.c
> @@ -716,7 +716,7 @@ static size_t ipmmu_unmap(struct iommu_domain *io_domain, unsigned long iova,
> {
> struct ipmmu_vmsa_domain *domain = to_vmsa_domain(io_domain);
>
> - return domain->iop->unmap(domain->iop, iova, size);
> + return domain->iop->unmap(domain->iop, iova, size, IOMMU_STRICT);
> }
>
> static void ipmmu_iotlb_sync(struct iommu_domain *io_domain)
> diff --git a/drivers/iommu/msm_iommu.c b/drivers/iommu/msm_iommu.c
> index 0d33504..180fa3d 100644
> --- a/drivers/iommu/msm_iommu.c
> +++ b/drivers/iommu/msm_iommu.c
> @@ -532,7 +532,7 @@ static size_t msm_iommu_unmap(struct iommu_domain *domain, unsigned long iova,
> unsigned long flags;
>
> spin_lock_irqsave(&priv->pgtlock, flags);
> - len = priv->iop->unmap(priv->iop, iova, len);
> + len = priv->iop->unmap(priv->iop, iova, len, IOMMU_STRICT);
> spin_unlock_irqrestore(&priv->pgtlock, flags);
>
> return len;
> diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
> index f2832a1..54661ed 100644
> --- a/drivers/iommu/mtk_iommu.c
> +++ b/drivers/iommu/mtk_iommu.c
> @@ -386,7 +386,7 @@ static size_t mtk_iommu_unmap(struct iommu_domain *domain,
> size_t unmapsz;
>
> spin_lock_irqsave(&dom->pgtlock, flags);
> - unmapsz = dom->iop->unmap(dom->iop, iova, size);
> + unmapsz = dom->iop->unmap(dom->iop, iova, size, IOMMU_STRICT);
> spin_unlock_irqrestore(&dom->pgtlock, flags);
>
> return unmapsz;
> diff --git a/drivers/iommu/qcom_iommu.c b/drivers/iommu/qcom_iommu.c
> index 65b9c99..90abde1 100644
> --- a/drivers/iommu/qcom_iommu.c
> +++ b/drivers/iommu/qcom_iommu.c
> @@ -444,7 +444,7 @@ static size_t qcom_iommu_unmap(struct iommu_domain *domain, unsigned long iova,
> */
> pm_runtime_get_sync(qcom_domain->iommu->dev);
> spin_lock_irqsave(&qcom_domain->pgtbl_lock, flags);
> - ret = ops->unmap(ops, iova, size);
> + ret = ops->unmap(ops, iova, size, IOMMU_STRICT);
> spin_unlock_irqrestore(&qcom_domain->pgtbl_lock, flags);
> pm_runtime_put_sync(qcom_domain->iommu->dev);
>
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 19938ee..39b3150 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -86,6 +86,8 @@ struct iommu_domain_geometry {
> #define IOMMU_DOMAIN_DMA (__IOMMU_DOMAIN_PAGING | \
> __IOMMU_DOMAIN_DMA_API)
>
> +#define IOMMU_STRICT 1
> +
> struct iommu_domain {
> unsigned type;
> const struct iommu_ops *ops;
>

2018-05-31 13:06:10

by Robin Murphy

[permalink] [raw]
Subject: Re: [PATCH 5/7] iommu/dma: add support for non-strict mode

On 31/05/18 08:42, Zhen Lei wrote:
> 1. Save the related domain pointer in struct iommu_dma_cookie, so that the
> iovad code is able to call domain->ops->flush_iotlb_all to flush the TLB.
> 2. Define a new iommu capability: IOMMU_CAP_NON_STRICT, which is used to
> indicate that the iommu domain supports non-strict mode.
> 3. During the iommu domain initialization phase, call capable() to check
> whether it supports non-strict mode. If so, call init_iova_flush_queue
> to register the iovad->flush_cb callback.
> 4. All unmap (including iova-free) APIs will finally invoke __iommu_dma_unmap
> -->iommu_dma_free_iova. Use iovad->flush_cb to check whether the related
> iommu supports non-strict mode or not, and use IOMMU_DOMAIN_IS_STRICT to
> make sure that IOMMU_DOMAIN_UNMANAGED domains always follow strict mode.

Once again, this is a whole load of complexity for a property which
could just be statically encoded at allocation, e.g. in the cookie type.

> Signed-off-by: Zhen Lei <[email protected]>
> ---
> drivers/iommu/dma-iommu.c | 29 ++++++++++++++++++++++++++---
> include/linux/iommu.h | 3 +++
> 2 files changed, 29 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index 4e885f7..2e116d9 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -55,6 +55,7 @@ struct iommu_dma_cookie {
> };
> struct list_head msi_page_list;
> spinlock_t msi_lock;
> + struct iommu_domain *domain;
> };
>
> static inline size_t cookie_msi_granule(struct iommu_dma_cookie *cookie)
> @@ -64,7 +65,8 @@ static inline size_t cookie_msi_granule(struct iommu_dma_cookie *cookie)
> return PAGE_SIZE;
> }
>
> -static struct iommu_dma_cookie *cookie_alloc(enum iommu_dma_cookie_type type)
> +static struct iommu_dma_cookie *cookie_alloc(struct iommu_domain *domain,
> + enum iommu_dma_cookie_type type)
> {
> struct iommu_dma_cookie *cookie;
>
> @@ -73,6 +75,7 @@ static struct iommu_dma_cookie *cookie_alloc(enum iommu_dma_cookie_type type)
> spin_lock_init(&cookie->msi_lock);
> INIT_LIST_HEAD(&cookie->msi_page_list);
> cookie->type = type;
> + cookie->domain = domain;
> }
> return cookie;
> }
> @@ -94,7 +97,7 @@ int iommu_get_dma_cookie(struct iommu_domain *domain)
> if (domain->iova_cookie)
> return -EEXIST;
>
> - domain->iova_cookie = cookie_alloc(IOMMU_DMA_IOVA_COOKIE);
> + domain->iova_cookie = cookie_alloc(domain, IOMMU_DMA_IOVA_COOKIE);
> if (!domain->iova_cookie)
> return -ENOMEM;
>
> @@ -124,7 +127,7 @@ int iommu_get_msi_cookie(struct iommu_domain *domain, dma_addr_t base)
> if (domain->iova_cookie)
> return -EEXIST;
>
> - cookie = cookie_alloc(IOMMU_DMA_MSI_COOKIE);
> + cookie = cookie_alloc(domain, IOMMU_DMA_MSI_COOKIE);
> if (!cookie)
> return -ENOMEM;
>
> @@ -261,6 +264,17 @@ static int iova_reserve_iommu_regions(struct device *dev,
> return ret;
> }
>
> +static void iova_flush_iotlb_all(struct iova_domain *iovad)

iommu_dma_flush...

> +{
> + struct iommu_dma_cookie *cookie;
> + struct iommu_domain *domain;
> +
> + cookie = container_of(iovad, struct iommu_dma_cookie, iovad);
> + domain = cookie->domain;
> +
> + domain->ops->flush_iotlb_all(domain);
> +}
> +
> /**
> * iommu_dma_init_domain - Initialise a DMA mapping domain
> * @domain: IOMMU domain previously prepared by iommu_get_dma_cookie()
> @@ -276,6 +290,7 @@ static int iova_reserve_iommu_regions(struct device *dev,
> int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base,
> u64 size, struct device *dev)
> {
> + const struct iommu_ops *ops = domain->ops;
> struct iommu_dma_cookie *cookie = domain->iova_cookie;
> struct iova_domain *iovad = &cookie->iovad;
> unsigned long order, base_pfn, end_pfn;
> @@ -313,6 +328,11 @@ int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base,
>
> init_iova_domain(iovad, 1UL << order, base_pfn);
>
> + if (ops->capable && ops->capable(IOMMU_CAP_NON_STRICT)) {
> + BUG_ON(!ops->flush_iotlb_all);
> + init_iova_flush_queue(iovad, iova_flush_iotlb_all, NULL);
> + }
> +
> return iova_reserve_iommu_regions(dev, domain);
> }
> EXPORT_SYMBOL(iommu_dma_init_domain);
> @@ -392,6 +412,9 @@ static void iommu_dma_free_iova(struct iommu_dma_cookie *cookie,
> /* The MSI case is only ever cleaning up its most recent allocation */
> if (cookie->type == IOMMU_DMA_MSI_COOKIE)
> cookie->msi_iova -= size;
> + else if (!IOMMU_DOMAIN_IS_STRICT(cookie->domain) && iovad->flush_cb)
> + queue_iova(iovad, iova_pfn(iovad, iova),
> + size >> iova_shift(iovad), 0);
> else
> free_iova_fast(iovad, iova_pfn(iovad, iova),
> size >> iova_shift(iovad));
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 39b3150..01ff569 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -87,6 +87,8 @@ struct iommu_domain_geometry {
> __IOMMU_DOMAIN_DMA_API)
>
> #define IOMMU_STRICT 1
> +#define IOMMU_DOMAIN_IS_STRICT(domain) \
> + (domain->type == IOMMU_DOMAIN_UNMANAGED)
>
> struct iommu_domain {
> unsigned type;
> @@ -103,6 +105,7 @@ enum iommu_cap {
> transactions */
> IOMMU_CAP_INTR_REMAP, /* IOMMU supports interrupt isolation */
> IOMMU_CAP_NOEXEC, /* IOMMU_NOEXEC flag */
> + IOMMU_CAP_NON_STRICT, /* IOMMU supports non-strict mode */

This isn't a property of the IOMMU, it depends purely on the driver
implementation. I think it also doesn't matter anyway - if a caller asks
for lazy unmapping on their domain but the IOMMU driver just does strict
unmaps anyway because that's all it supports, there's no actual harm done.

Robin.

> };
>
> /*
>

2018-05-31 13:06:38

by Robin Murphy

[permalink] [raw]
Subject: Re: [PATCH 4/7] iommu/amd: make sure TLB to be flushed before IOVA freed

On 31/05/18 08:42, Zhen Lei wrote:
> Although the mapping has already been removed from the page table, it may
> still exist in the TLB. If the freed IOVA is reused by others before the
> flush operation has completed, the new user will not be able to correctly
> access its memory.

This change seems reasonable in isolation, but why is it right in the
middle of a series which has nothing to do with x86?

Robin.

> Signed-off-by: Zhen Lei <[email protected]>
> ---
> drivers/iommu/amd_iommu.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
> index 8fb8c73..93aa389 100644
> --- a/drivers/iommu/amd_iommu.c
> +++ b/drivers/iommu/amd_iommu.c
> @@ -2402,9 +2402,9 @@ static void __unmap_single(struct dma_ops_domain *dma_dom,
> }
>
> if (amd_iommu_unmap_flush) {
> - dma_ops_free_iova(dma_dom, dma_addr, pages);
> domain_flush_tlb(&dma_dom->domain);
> domain_flush_complete(&dma_dom->domain);
> + dma_ops_free_iova(dma_dom, dma_addr, pages);
> } else {
> pages = __roundup_pow_of_two(pages);
> queue_iova(&dma_dom->iovad, dma_addr >> PAGE_SHIFT, pages, 0);
>

2018-05-31 13:50:44

by Hanjun Guo

[permalink] [raw]
Subject: Re: [PATCH 0/7] add non-strict mode support for arm-smmu-v3

Hi Robin,

On 2018/5/31 19:24, Robin Murphy wrote:
> On 31/05/18 08:42, Zhen Lei wrote:
>> In general, an IOMMU unmap operation follows the steps below:
>> 1. remove the mapping in the page table for the specified iova range
>> 2. execute a tlbi command to invalidate the mapping cached in the TLB
>> 3. wait for the above tlbi operation to finish
>> 4. free the IOVA resource
>> 5. free the physical memory resource
>>
>> This may be a problem when unmaps are very frequent, since the combination
>> of tlbi and wait operations consumes a lot of time. A feasible method is to
>> defer the tlbi and iova-free operations: once they accumulate to a certain
>> number or a specified time is reached, execute a single tlbi_all command to
>> clean up the TLB, then free the backed-up IOVAs. This is marked as
>> non-strict mode.
>>
>> But it must be noted that, although the mapping has already been removed
>> from the page table, it may still exist in the TLB. And the freed physical
>> memory may also be reused by others. So an attacker can persistently access
>> memory through the just-freed IOVA, to obtain sensitive data or corrupt
>> memory. Therefore VFIO should always choose strict mode.
>>
>> Some may suggest deferring the physical memory free as well, so that strict
>> mode is still followed. But for the map_sg cases, the memory allocation is
>> not controlled by the IOMMU APIs, so it is not enforceable.
>>
>> Fortunately, Intel and AMD have already implemented non-strict mode, and
>> put the queue_iova() operation into the common file dma-iommu.c, and my
>> work is based on that. The difference is that the arm-smmu-v3 driver calls
>> the common IOMMU APIs to unmap, while the Intel and AMD IOMMU drivers do not.
>>
>> Below is the performance data of strict vs non-strict for an NVMe device:
>> Random read IOPS: 146K (strict) vs 573K (non-strict)
>> Random write IOPS: 143K (strict) vs 513K (non-strict)
>
> What hardware is this on? If it's SMMUv3 without MSIs (e.g. D05), then you'll still be using the rubbish globally-blocking sync implementation. If that is the case, I'd be very interested to see how much there is to gain from just improving that - I've had a patch kicking around for a while[1] (also on a rebased branch at [2]), but don't have the means for serious performance testing.

The hardware is the new D06, whose SMMU supports MSIs;
it's not D05 :)

Thanks
Hanjun


2018-05-31 14:26:12

by Robin Murphy

[permalink] [raw]
Subject: Re: [PATCH 0/7] add non-strict mode support for arm-smmu-v3

On 31/05/18 14:49, Hanjun Guo wrote:
> Hi Robin,
>
> On 2018/5/31 19:24, Robin Murphy wrote:
>> On 31/05/18 08:42, Zhen Lei wrote:
>>> In general, an IOMMU unmap operation follows the steps below:
>>> 1. remove the mapping in the page table for the specified iova range
>>> 2. execute a tlbi command to invalidate the mapping cached in the TLB
>>> 3. wait for the above tlbi operation to finish
>>> 4. free the IOVA resource
>>> 5. free the physical memory resource
>>>
>>> This may be a problem when unmaps are very frequent, since the combination
>>> of tlbi and wait operations consumes a lot of time. A feasible method is to
>>> defer the tlbi and iova-free operations: once they accumulate to a certain
>>> number or a specified time is reached, execute a single tlbi_all command to
>>> clean up the TLB, then free the backed-up IOVAs. This is marked as
>>> non-strict mode.
>>>
>>> But it must be noted that, although the mapping has already been removed
>>> from the page table, it may still exist in the TLB. And the freed physical
>>> memory may also be reused by others. So an attacker can persistently access
>>> memory through the just-freed IOVA, to obtain sensitive data or corrupt
>>> memory. Therefore VFIO should always choose strict mode.
>>>
>>> Some may suggest deferring the physical memory free as well, so that strict
>>> mode is still followed. But for the map_sg cases, the memory allocation is
>>> not controlled by the IOMMU APIs, so it is not enforceable.
>>>
>>> Fortunately, Intel and AMD have already implemented non-strict mode, and
>>> put the queue_iova() operation into the common file dma-iommu.c, and my
>>> work is based on that. The difference is that the arm-smmu-v3 driver calls
>>> the common IOMMU APIs to unmap, while the Intel and AMD IOMMU drivers do not.
>>>
>>> Below is the performance data of strict vs non-strict for an NVMe device:
>>> Random read IOPS: 146K (strict) vs 573K (non-strict)
>>> Random write IOPS: 143K (strict) vs 513K (non-strict)
>>
>> What hardware is this on? If it's SMMUv3 without MSIs (e.g. D05), then you'll still be using the rubbish globally-blocking sync implementation. If that is the case, I'd be very interested to see how much there is to gain from just improving that - I've had a patch kicking around for a while[1] (also on a rebased branch at [2]), but don't have the means for serious performance testing.
>
> The hardware is the new D06, whose SMMU supports MSIs,

Cool! Now that profiling is fairly useful since we got rid of most of
the locks, are you able to get an idea of how the overhead in the normal
case is distributed between arm_smmu_cmdq_insert_cmd() and
__arm_smmu_sync_poll_msi()? We're always trying to improve our
understanding of where command-queue-related overheads turn out to be in
practice, and there's still potentially room to do nicer things than
TLBI_NH_ALL ;)

Robin.

> it's not D05 :)
>
> Thanks
> Hanjun
>

2018-06-01 06:52:10

by Zhen Lei

[permalink] [raw]
Subject: Re: [PATCH 0/7] add non-strict mode support for arm-smmu-v3



On 2018/5/31 22:25, Robin Murphy wrote:
> On 31/05/18 14:49, Hanjun Guo wrote:
>> Hi Robin,
>>
>> On 2018/5/31 19:24, Robin Murphy wrote:
>>> On 31/05/18 08:42, Zhen Lei wrote:
>>>> In general, an IOMMU unmap operation follows the steps below:
>>>> 1. remove the mapping in the page table for the specified iova range
>>>> 2. execute a tlbi command to invalidate the mapping cached in the TLB
>>>> 3. wait for the above tlbi operation to finish
>>>> 4. free the IOVA resource
>>>> 5. free the physical memory resource
>>>>
>>>> This may be a problem when unmaps are very frequent, since the combination
>>>> of tlbi and wait operations consumes a lot of time. A feasible method is to
>>>> defer the tlbi and iova-free operations: once they accumulate to a certain
>>>> number or a specified time is reached, execute a single tlbi_all command to
>>>> clean up the TLB, then free the backed-up IOVAs. This is marked as
>>>> non-strict mode.
>>>>
>>>> But it must be noted that, although the mapping has already been removed
>>>> from the page table, it may still exist in the TLB. And the freed physical
>>>> memory may also be reused by others. So an attacker can persistently access
>>>> memory through the just-freed IOVA, to obtain sensitive data or corrupt
>>>> memory. Therefore VFIO should always choose strict mode.
>>>>
>>>> Some may suggest deferring the physical memory free as well, so that strict
>>>> mode is still followed. But for the map_sg cases, the memory allocation is
>>>> not controlled by the IOMMU APIs, so it is not enforceable.
>>>>
>>>> Fortunately, Intel and AMD have already implemented non-strict mode, and
>>>> put the queue_iova() operation into the common file dma-iommu.c, and my
>>>> work is based on that. The difference is that the arm-smmu-v3 driver calls
>>>> the common IOMMU APIs to unmap, while the Intel and AMD IOMMU drivers do not.
>>>>
>>>> Below is the performance data of strict vs non-strict for an NVMe device:
>>>> Random read IOPS: 146K (strict) vs 573K (non-strict)
>>>> Random write IOPS: 143K (strict) vs 513K (non-strict)
>>>
>>> What hardware is this on? If it's SMMUv3 without MSIs (e.g. D05), then you'll still be using the rubbish globally-blocking sync implementation. If that is the case, I'd be very interested to see how much there is to gain from just improving that - I've had a patch kicking around for a while[1] (also on a rebased branch at [2]), but don't have the means for serious performance testing.
I will try your patch to see how much it improves things. I think the best
way to resolve the globally-blocking sync is for the hardware to provide a
64-bit CONS register, so that it can never wrap, and the spinlock can also
be removed.

>>
>> The hardware is the new D06, whose SMMU supports MSIs,
>
> Cool! Now that profiling is fairly useful since we got rid of most of the locks, are you able to get an idea of how the overhead in the normal case is distributed between arm_smmu_cmdq_insert_cmd() and __arm_smmu_sync_poll_msi()? We're always trying to improve our understanding of where command-queue-related overheads turn out to be in practice, and there's still potentially room to do nicer things than TLBI_NH_ALL ;)
Even if the software has no overhead, there may still be a problem, because
the SMMU needs to execute the commands in sequence, especially before the
globally-blocking sync has been removed. Based on the actual execution time
of a single tlbi and sync, we can derive the theoretical upper limit.

BTW, I will reply the reset of mail next week. I'm busy with other things now.

>
> Robin.
>
>> it's not D05 :)
>>
>> Thanks
>> Hanjun
>>
>
> .
>

--
Thanks!
Best Regards


2018-06-01 17:53:22

by kernel test robot

[permalink] [raw]
Subject: Re: [PATCH 5/7] iommu/dma: add support for non-strict mode

Hi Zhen,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on linus/master]
[also build test WARNING on v4.17-rc7 next-20180601]
[cannot apply to iommu/next]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url: https://github.com/0day-ci/linux/commits/Zhen-Lei/add-non-strict-mode-support-for-arm-smmu-v3/20180602-000418
config: x86_64-randconfig-x008-201821 (attached as .config)
compiler: gcc-7 (Debian 7.3.0-16) 7.3.0
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64

All warnings (new ones prefixed by >>):

drivers//iommu/amd_iommu.c: In function 'amd_iommu_capable':
>> drivers//iommu/amd_iommu.c:3053:2: warning: enumeration value 'IOMMU_CAP_NON_STRICT' not handled in switch [-Wswitch]
switch (cap) {
^~~~~~

vim +/IOMMU_CAP_NON_STRICT +3053 drivers//iommu/amd_iommu.c

645c4c8d arch/x86/kernel/amd_iommu.c Joerg Roedel 2008-12-02 3050
ab636481 drivers/iommu/amd_iommu.c Joerg Roedel 2014-09-05 3051 static bool amd_iommu_capable(enum iommu_cap cap)
dbb9fd86 arch/x86/kernel/amd_iommu.c Sheng Yang 2009-03-18 3052 {
80a506b8 arch/x86/kernel/amd_iommu.c Joerg Roedel 2010-07-27 @3053 switch (cap) {
80a506b8 arch/x86/kernel/amd_iommu.c Joerg Roedel 2010-07-27 3054 case IOMMU_CAP_CACHE_COHERENCY:
ab636481 drivers/iommu/amd_iommu.c Joerg Roedel 2014-09-05 3055 return true;
bdddadcb drivers/iommu/amd_iommu.c Joerg Roedel 2012-07-02 3056 case IOMMU_CAP_INTR_REMAP:
ab636481 drivers/iommu/amd_iommu.c Joerg Roedel 2014-09-05 3057 return (irq_remapping_enabled == 1);
cfdeec22 drivers/iommu/amd_iommu.c Will Deacon 2014-10-27 3058 case IOMMU_CAP_NOEXEC:
cfdeec22 drivers/iommu/amd_iommu.c Will Deacon 2014-10-27 3059 return false;
80a506b8 arch/x86/kernel/amd_iommu.c Joerg Roedel 2010-07-27 3060 }
80a506b8 arch/x86/kernel/amd_iommu.c Joerg Roedel 2010-07-27 3061
ab636481 drivers/iommu/amd_iommu.c Joerg Roedel 2014-09-05 3062 return false;
dbb9fd86 arch/x86/kernel/amd_iommu.c Sheng Yang 2009-03-18 3063 }
dbb9fd86 arch/x86/kernel/amd_iommu.c Sheng Yang 2009-03-18 3064

:::::: The code at line 3053 was first introduced by commit
:::::: 80a506b8fdcfa868bb53eb740f928217d0966fc1 x86/amd-iommu: Export cache-coherency capability

:::::: TO: Joerg Roedel <[email protected]>
:::::: CC: Joerg Roedel <[email protected]>

---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation



2018-06-04 11:06:55

by Zhen Lei

[permalink] [raw]
Subject: Re: [PATCH 1/7] iommu/dma: fix trival coding style mistake



On 2018/5/31 21:03, Robin Murphy wrote:
> On 31/05/18 08:42, Zhen Lei wrote:
>> The static function iova_reserve_iommu_regions is only called by function
>> iommu_dma_init_domain, and the 'if (!dev)' check in iommu_dma_init_domain
>> affects it only, so we can safely move the check into it. I think it looks
>> more natural.
>
> As before, I disagree - the logic of iommu_dma_init_domain() is "we expect to have a valid device, but stop here if we don't", and moving the check just needlessly obfuscates that. It is not a coincidence that the arguments of both functions are in effective order of importance.
OK

>
>> In addition, the local variable 'ret' is only assigned in the branch of
>> 'if (region->type == IOMMU_RESV_MSI)', so the 'if (ret)' should also only
>> take effect in that branch; add braces to enclose it.
>
> 'ret' is clearly also assigned at its declaration, to cover the (very likely) case where we don't enter the loop at all. Thus testing it in the loop is harmless, and cluttering that up with extra tabs and braces is just noise.
OK, I will drop this patch in v2

>
> Robin.
>
>> No functional changes.
>>
>> Signed-off-by: Zhen Lei <[email protected]>
>> ---
>> drivers/iommu/dma-iommu.c | 12 +++++++-----
>> 1 file changed, 7 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
>> index ddcbbdb..4e885f7 100644
>> --- a/drivers/iommu/dma-iommu.c
>> +++ b/drivers/iommu/dma-iommu.c
>> @@ -231,6 +231,9 @@ static int iova_reserve_iommu_regions(struct device *dev,
>> LIST_HEAD(resv_regions);
>> int ret = 0;
>> + if (!dev)
>> + return 0;
>> +
>> if (dev_is_pci(dev))
>> iova_reserve_pci_windows(to_pci_dev(dev), iovad);
>> @@ -246,11 +249,12 @@ static int iova_reserve_iommu_regions(struct device *dev,
>> hi = iova_pfn(iovad, region->start + region->length - 1);
>> reserve_iova(iovad, lo, hi);
>> - if (region->type == IOMMU_RESV_MSI)
>> + if (region->type == IOMMU_RESV_MSI) {
>> ret = cookie_init_hw_msi_region(cookie, region->start,
>> region->start + region->length);
>> - if (ret)
>> - break;
>> + if (ret)
>> + break;
>> + }
>> }
>> iommu_put_resv_regions(dev, &resv_regions);
>> @@ -308,8 +312,6 @@ int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base,
>> }
>> init_iova_domain(iovad, 1UL << order, base_pfn);
>> - if (!dev)
>> - return 0;
>> return iova_reserve_iommu_regions(dev, domain);
>> }
>>
>
> .
>

--
Thanks!
Best Regards


2018-06-04 11:30:10

by Zhen Lei

[permalink] [raw]
Subject: Re: [PATCH 3/7] iommu: prepare for the non-strict mode support



On 2018/5/31 21:04, Robin Murphy wrote:
> On 31/05/18 08:42, Zhen Lei wrote:
>> In general, an IOMMU unmap operation follows the steps below:
>> 1. remove the mapping in the page table for the specified iova range
>> 2. execute a tlbi command to invalidate the mapping cached in the TLB
>> 3. wait for the above tlbi operation to finish
>> 4. free the IOVA resource
>> 5. free the physical memory resource
>>
>> This may be a problem when unmaps are very frequent, since the combination
>> of tlbi and wait operations consumes a lot of time. A feasible method is to
>> defer the tlbi and iova-free operations: once they accumulate to a certain
>> number or a specified time is reached, execute a single tlbi_all command to
>> clean up the TLB, then free the backed-up IOVAs. This is marked as
>> non-strict mode.
>>
>> But it must be noted that, although the mapping has already been removed
>> from the page table, it may still exist in the TLB. And the freed physical
>> memory may also be reused by others. So an attacker can persistently access
>> memory through the just-freed IOVA, to obtain sensitive data or corrupt
>> memory. Therefore VFIO should always choose strict mode.
>>
>> This patch just adds a new parameter to the unmap operation, to let the
>> upper-level functions choose which mode to apply.
>
> This seems like it might be better handled by a flag in io_pgtable_cfg->quirks. This interface change on its own looks rather invasive, and the fact that it ends up only being used to pass through a constant property of the domain (which is already known by the point io_pgtable_alloc() is called) implies that it is indeed the wrong level of abstraction.
>
Sounds good. Thanks for your suggestion; I will try it in v2.

>> No functional changes.
>>
>> Signed-off-by: Zhen Lei <[email protected]>
>> ---
>> drivers/iommu/arm-smmu-v3.c | 2 +-
>> drivers/iommu/arm-smmu.c | 2 +-
>> drivers/iommu/io-pgtable-arm-v7s.c | 6 +++---
>> drivers/iommu/io-pgtable-arm.c | 6 +++---
>> drivers/iommu/io-pgtable.h | 2 +-
>> drivers/iommu/ipmmu-vmsa.c | 2 +-
>> drivers/iommu/msm_iommu.c | 2 +-
>> drivers/iommu/mtk_iommu.c | 2 +-
>> drivers/iommu/qcom_iommu.c | 2 +-
>> include/linux/iommu.h | 2 ++
>
> Plus things specific to io-pgtable shouldn't really be spilling into the core API header either.
>
> Robin.
>
>> 10 files changed, 15 insertions(+), 13 deletions(-)
>>
>> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
>> index 4402187..59b3387 100644
>> --- a/drivers/iommu/arm-smmu-v3.c
>> +++ b/drivers/iommu/arm-smmu-v3.c
>> @@ -1767,7 +1767,7 @@ static int arm_smmu_map(struct iommu_domain *domain, unsigned long iova,
>> if (!ops)
>> return 0;
>> - return ops->unmap(ops, iova, size);
>> + return ops->unmap(ops, iova, size, IOMMU_STRICT);
>> }
>> static void arm_smmu_flush_iotlb_all(struct iommu_domain *domain)
>> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
>> index 69e7c60..253e807 100644
>> --- a/drivers/iommu/arm-smmu.c
>> +++ b/drivers/iommu/arm-smmu.c
>> @@ -1249,7 +1249,7 @@ static size_t arm_smmu_unmap(struct iommu_domain *domain, unsigned long iova,
>> if (!ops)
>> return 0;
>> - return ops->unmap(ops, iova, size);
>> + return ops->unmap(ops, iova, size, IOMMU_STRICT);
>> }
>> static void arm_smmu_iotlb_sync(struct iommu_domain *domain)
>> diff --git a/drivers/iommu/io-pgtable-arm-v7s.c b/drivers/iommu/io-pgtable-arm-v7s.c
>> index 10e4a3d..799eced 100644
>> --- a/drivers/iommu/io-pgtable-arm-v7s.c
>> +++ b/drivers/iommu/io-pgtable-arm-v7s.c
>> @@ -658,7 +658,7 @@ static size_t __arm_v7s_unmap(struct arm_v7s_io_pgtable *data,
>> }
>> static size_t arm_v7s_unmap(struct io_pgtable_ops *ops, unsigned long iova,
>> - size_t size)
>> + size_t size, int strict)
>> {
>> struct arm_v7s_io_pgtable *data = io_pgtable_ops_to_data(ops);
>> @@ -883,7 +883,7 @@ static int __init arm_v7s_do_selftests(void)
>> size = 1UL << __ffs(cfg.pgsize_bitmap);
>> while (i < loopnr) {
>> iova_start = i * SZ_16M;
>> - if (ops->unmap(ops, iova_start + size, size) != size)
>> + if (ops->unmap(ops, iova_start + size, size, IOMMU_STRICT) != size)
>> return __FAIL(ops);
>> /* Remap of partial unmap */
>> @@ -902,7 +902,7 @@ static int __init arm_v7s_do_selftests(void)
>> while (i != BITS_PER_LONG) {
>> size = 1UL << i;
>> - if (ops->unmap(ops, iova, size) != size)
>> + if (ops->unmap(ops, iova, size, IOMMU_STRICT) != size)
>> return __FAIL(ops);
>> if (ops->iova_to_phys(ops, iova + 42))
>> diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
>> index 39c2a05..e0f52db 100644
>> --- a/drivers/iommu/io-pgtable-arm.c
>> +++ b/drivers/iommu/io-pgtable-arm.c
>> @@ -624,7 +624,7 @@ static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
>> }
>> static size_t arm_lpae_unmap(struct io_pgtable_ops *ops, unsigned long iova,
>> - size_t size)
>> + size_t size, int strict)
>> {
>> struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);
>> arm_lpae_iopte *ptep = data->pgd;
>> @@ -1108,7 +1108,7 @@ static int __init arm_lpae_run_tests(struct io_pgtable_cfg *cfg)
>> /* Partial unmap */
>> size = 1UL << __ffs(cfg->pgsize_bitmap);
>> - if (ops->unmap(ops, SZ_1G + size, size) != size)
>> + if (ops->unmap(ops, SZ_1G + size, size, IOMMU_STRICT) != size)
>> return __FAIL(ops, i);
>> /* Remap of partial unmap */
>> @@ -1124,7 +1124,7 @@ static int __init arm_lpae_run_tests(struct io_pgtable_cfg *cfg)
>> while (j != BITS_PER_LONG) {
>> size = 1UL << j;
>> - if (ops->unmap(ops, iova, size) != size)
>> + if (ops->unmap(ops, iova, size, IOMMU_STRICT) != size)
>> return __FAIL(ops, i);
>> if (ops->iova_to_phys(ops, iova + 42))
>> diff --git a/drivers/iommu/io-pgtable.h b/drivers/iommu/io-pgtable.h
>> index 2df7909..2908806 100644
>> --- a/drivers/iommu/io-pgtable.h
>> +++ b/drivers/iommu/io-pgtable.h
>> @@ -120,7 +120,7 @@ struct io_pgtable_ops {
>> int (*map)(struct io_pgtable_ops *ops, unsigned long iova,
>> phys_addr_t paddr, size_t size, int prot);
>> size_t (*unmap)(struct io_pgtable_ops *ops, unsigned long iova,
>> - size_t size);
>> + size_t size, int strict);
>> phys_addr_t (*iova_to_phys)(struct io_pgtable_ops *ops,
>> unsigned long iova);
>> };
>> diff --git a/drivers/iommu/ipmmu-vmsa.c b/drivers/iommu/ipmmu-vmsa.c
>> index 40ae6e8..e6d9e11 100644
>> --- a/drivers/iommu/ipmmu-vmsa.c
>> +++ b/drivers/iommu/ipmmu-vmsa.c
>> @@ -716,7 +716,7 @@ static size_t ipmmu_unmap(struct iommu_domain *io_domain, unsigned long iova,
>> {
>> struct ipmmu_vmsa_domain *domain = to_vmsa_domain(io_domain);
>> - return domain->iop->unmap(domain->iop, iova, size);
>> + return domain->iop->unmap(domain->iop, iova, size, IOMMU_STRICT);
>> }
>> static void ipmmu_iotlb_sync(struct iommu_domain *io_domain)
>> diff --git a/drivers/iommu/msm_iommu.c b/drivers/iommu/msm_iommu.c
>> index 0d33504..180fa3d 100644
>> --- a/drivers/iommu/msm_iommu.c
>> +++ b/drivers/iommu/msm_iommu.c
>> @@ -532,7 +532,7 @@ static size_t msm_iommu_unmap(struct iommu_domain *domain, unsigned long iova,
>> unsigned long flags;
>> spin_lock_irqsave(&priv->pgtlock, flags);
>> - len = priv->iop->unmap(priv->iop, iova, len);
>> + len = priv->iop->unmap(priv->iop, iova, len, IOMMU_STRICT);
>> spin_unlock_irqrestore(&priv->pgtlock, flags);
>> return len;
>> diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
>> index f2832a1..54661ed 100644
>> --- a/drivers/iommu/mtk_iommu.c
>> +++ b/drivers/iommu/mtk_iommu.c
>> @@ -386,7 +386,7 @@ static size_t mtk_iommu_unmap(struct iommu_domain *domain,
>> size_t unmapsz;
>> spin_lock_irqsave(&dom->pgtlock, flags);
>> - unmapsz = dom->iop->unmap(dom->iop, iova, size);
>> + unmapsz = dom->iop->unmap(dom->iop, iova, size, IOMMU_STRICT);
>> spin_unlock_irqrestore(&dom->pgtlock, flags);
>> return unmapsz;
>> diff --git a/drivers/iommu/qcom_iommu.c b/drivers/iommu/qcom_iommu.c
>> index 65b9c99..90abde1 100644
>> --- a/drivers/iommu/qcom_iommu.c
>> +++ b/drivers/iommu/qcom_iommu.c
>> @@ -444,7 +444,7 @@ static size_t qcom_iommu_unmap(struct iommu_domain *domain, unsigned long iova,
>> */
>> pm_runtime_get_sync(qcom_domain->iommu->dev);
>> spin_lock_irqsave(&qcom_domain->pgtbl_lock, flags);
>> - ret = ops->unmap(ops, iova, size);
>> + ret = ops->unmap(ops, iova, size, IOMMU_STRICT);
>> spin_unlock_irqrestore(&qcom_domain->pgtbl_lock, flags);
>> pm_runtime_put_sync(qcom_domain->iommu->dev);
>> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
>> index 19938ee..39b3150 100644
>> --- a/include/linux/iommu.h
>> +++ b/include/linux/iommu.h
>> @@ -86,6 +86,8 @@ struct iommu_domain_geometry {
>> #define IOMMU_DOMAIN_DMA (__IOMMU_DOMAIN_PAGING | \
>> __IOMMU_DOMAIN_DMA_API)
>> +#define IOMMU_STRICT 1
>> +
>> struct iommu_domain {
>> unsigned type;
>> const struct iommu_ops *ops;
>>
>
> .
>

--
Thanks!
Best regards


2018-06-04 11:42:52

by Zhen Lei

[permalink] [raw]
Subject: Re: [PATCH 4/7] iommu/amd: make sure TLB to be flushed before IOVA freed



On 2018/5/31 21:04, Robin Murphy wrote:
> On 31/05/18 08:42, Zhen Lei wrote:
>> Although the mapping has already been removed from the page table, it may
>> still exist in the TLB. Suppose the freed IOVA is reused by others before
>> the flush operation has completed; the new user can not correctly access
>> its memory.
>
> This change seems reasonable in isolation, but why is it right in the middle of a series which has nothing to do with x86?
Because the previous patch describes the background in more detail, which may help this patch to be understood.

You're right, I will repost this patch separately.
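To make the ordering issue concrete, here is a minimal user-space C model (the struct-free state and function names are illustrative placeholders, not the actual amd_iommu code). It shows the window that exists when the IOVA is returned to the allocator before the TLB flush completes, and that the window disappears with the patched ordering:

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified model: one IOVA slot and a one-entry "TLB". */
static bool iova_in_use;     /* models the IOVA allocator state */
static bool tlb_entry_valid; /* models a stale TLB entry for that IOVA */

static void map_one(void)   { iova_in_use = true; tlb_entry_valid = true; }
static void tlb_flush(void) { tlb_entry_valid = false; }
static void iova_free(void) { iova_in_use = false; }

/* Original order: the IOVA is returned to the allocator while the TLB
 * may still hold a live translation for it. Returns true if an unsafe
 * window existed. */
static bool unmap_free_then_flush(void)
{
    map_one();
    iova_free();
    /* window: another caller could re-allocate this IOVA here */
    bool window_unsafe = !iova_in_use && tlb_entry_valid;
    tlb_flush();
    return window_unsafe;
}

/* Patched order: flush and wait first, free the IOVA last. */
static bool unmap_flush_then_free(void)
{
    map_one();
    tlb_flush();   /* domain_flush_tlb() + domain_flush_complete() */
    bool window_unsafe = !iova_in_use && tlb_entry_valid;
    iova_free();
    return window_unsafe;
}
```

With the patched ordering, no moment exists where a free IOVA still has a valid TLB entry.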

>
> Robin.
>
>> Signed-off-by: Zhen Lei <[email protected]>
>> ---
>> drivers/iommu/amd_iommu.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
>> index 8fb8c73..93aa389 100644
>> --- a/drivers/iommu/amd_iommu.c
>> +++ b/drivers/iommu/amd_iommu.c
>> @@ -2402,9 +2402,9 @@ static void __unmap_single(struct dma_ops_domain *dma_dom,
>> }
>> if (amd_iommu_unmap_flush) {
>> - dma_ops_free_iova(dma_dom, dma_addr, pages);
>> domain_flush_tlb(&dma_dom->domain);
>> domain_flush_complete(&dma_dom->domain);
>> + dma_ops_free_iova(dma_dom, dma_addr, pages);
>> } else {
>> pages = __roundup_pow_of_two(pages);
>> queue_iova(&dma_dom->iovad, dma_addr >> PAGE_SHIFT, pages, 0);
>>
>
> .
>

--
Thanks!
Best regards


2018-06-04 12:06:18

by Zhen Lei

[permalink] [raw]
Subject: Re: [PATCH 5/7] iommu/dma: add support for non-strict mode



On 2018/5/31 21:04, Robin Murphy wrote:
> On 31/05/18 08:42, Zhen Lei wrote:
>> 1. Save the related domain pointer in struct iommu_dma_cookie, so that the
>> iovad can call domain->ops->flush_iotlb_all to flush the TLB.
>> 2. Define a new iommu capability, IOMMU_CAP_NON_STRICT, which is used to
>> indicate that the iommu domain supports non-strict mode.
>> 3. During the iommu domain initialization phase, call capable() to check
>> whether it supports non-strict mode. If so, call init_iova_flush_queue
>> to register the iovad->flush_cb callback.
>> 4. All unmap APIs (including iova-free) finally invoke __iommu_dma_unmap
>> -->iommu_dma_free_iova. Use iovad->flush_cb to check whether the related
>> iommu supports non-strict mode or not, and use IOMMU_DOMAIN_IS_STRICT to
>> make sure the IOMMU_DOMAIN_UNMANAGED domain always follows strict mode.
>
> Once again, this is a whole load of complexity for a property which could just be statically encoded at allocation, e.g. in the cookie type.
That's right. Passing the domain to the static function iommu_dma_free_iova will be better.
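Encoding the strictness statically at cookie allocation time, as suggested, could look roughly like the following user-space sketch (the enum values, struct layout, and names are placeholders for illustration, not the dma-iommu.c code):

```c
#include <assert.h>

/* Sketch: record strictness in the cookie type when the cookie is
 * allocated, instead of checking domain->type and flush_cb on every
 * free. All names here are illustrative only. */
enum cookie_type {
    COOKIE_IOVA_STRICT,     /* unmanaged domains: flush synchronously */
    COOKIE_IOVA_NON_STRICT, /* DMA-API domains on capable IOMMUs */
    COOKIE_MSI,
};

struct dma_cookie {
    enum cookie_type type;
};

enum free_action { FREE_IMMEDIATE, FREE_QUEUED, FREE_MSI_ROLLBACK };

/* The free path then becomes a simple dispatch on the static type. */
static enum free_action cookie_free_iova(const struct dma_cookie *c)
{
    switch (c->type) {
    case COOKIE_MSI:
        return FREE_MSI_ROLLBACK;  /* cookie->msi_iova -= size */
    case COOKIE_IOVA_NON_STRICT:
        return FREE_QUEUED;        /* queue_iova(): flushed lazily */
    default:
        return FREE_IMMEDIATE;     /* free_iova_fast() after sync flush */
    }
}
```

This avoids any per-free runtime capability check: the decision is made once, when the domain's cookie is created.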

>
>> Signed-off-by: Zhen Lei <[email protected]>
>> ---
>> drivers/iommu/dma-iommu.c | 29 ++++++++++++++++++++++++++---
>> include/linux/iommu.h | 3 +++
>> 2 files changed, 29 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
>> index 4e885f7..2e116d9 100644
>> --- a/drivers/iommu/dma-iommu.c
>> +++ b/drivers/iommu/dma-iommu.c
>> @@ -55,6 +55,7 @@ struct iommu_dma_cookie {
>> };
>> struct list_head msi_page_list;
>> spinlock_t msi_lock;
>> + struct iommu_domain *domain;
>> };
>> static inline size_t cookie_msi_granule(struct iommu_dma_cookie *cookie)
>> @@ -64,7 +65,8 @@ static inline size_t cookie_msi_granule(struct iommu_dma_cookie *cookie)
>> return PAGE_SIZE;
>> }
>> -static struct iommu_dma_cookie *cookie_alloc(enum iommu_dma_cookie_type type)
>> +static struct iommu_dma_cookie *cookie_alloc(struct iommu_domain *domain,
>> + enum iommu_dma_cookie_type type)
>> {
>> struct iommu_dma_cookie *cookie;
>> @@ -73,6 +75,7 @@ static struct iommu_dma_cookie *cookie_alloc(enum iommu_dma_cookie_type type)
>> spin_lock_init(&cookie->msi_lock);
>> INIT_LIST_HEAD(&cookie->msi_page_list);
>> cookie->type = type;
>> + cookie->domain = domain;
>> }
>> return cookie;
>> }
>> @@ -94,7 +97,7 @@ int iommu_get_dma_cookie(struct iommu_domain *domain)
>> if (domain->iova_cookie)
>> return -EEXIST;
>> - domain->iova_cookie = cookie_alloc(IOMMU_DMA_IOVA_COOKIE);
>> + domain->iova_cookie = cookie_alloc(domain, IOMMU_DMA_IOVA_COOKIE);
>> if (!domain->iova_cookie)
>> return -ENOMEM;
>> @@ -124,7 +127,7 @@ int iommu_get_msi_cookie(struct iommu_domain *domain, dma_addr_t base)
>> if (domain->iova_cookie)
>> return -EEXIST;
>> - cookie = cookie_alloc(IOMMU_DMA_MSI_COOKIE);
>> + cookie = cookie_alloc(domain, IOMMU_DMA_MSI_COOKIE);
>> if (!cookie)
>> return -ENOMEM;
>> @@ -261,6 +264,17 @@ static int iova_reserve_iommu_regions(struct device *dev,
>> return ret;
>> }
>> +static void iova_flush_iotlb_all(struct iova_domain *iovad)
>
> iommu_dma_flush...
OK

>
>> +{
>> + struct iommu_dma_cookie *cookie;
>> + struct iommu_domain *domain;
>> +
>> + cookie = container_of(iovad, struct iommu_dma_cookie, iovad);
>> + domain = cookie->domain;
>> +
>> + domain->ops->flush_iotlb_all(domain);
>> +}
>> +
>> /**
>> * iommu_dma_init_domain - Initialise a DMA mapping domain
>> * @domain: IOMMU domain previously prepared by iommu_get_dma_cookie()
>> @@ -276,6 +290,7 @@ static int iova_reserve_iommu_regions(struct device *dev,
>> int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base,
>> u64 size, struct device *dev)
>> {
>> + const struct iommu_ops *ops = domain->ops;
>> struct iommu_dma_cookie *cookie = domain->iova_cookie;
>> struct iova_domain *iovad = &cookie->iovad;
>> unsigned long order, base_pfn, end_pfn;
>> @@ -313,6 +328,11 @@ int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base,
>> init_iova_domain(iovad, 1UL << order, base_pfn);
>> + if (ops->capable && ops->capable(IOMMU_CAP_NON_STRICT)) {
>> + BUG_ON(!ops->flush_iotlb_all);
>> + init_iova_flush_queue(iovad, iova_flush_iotlb_all, NULL);
>> + }
>> +
>> return iova_reserve_iommu_regions(dev, domain);
>> }
>> EXPORT_SYMBOL(iommu_dma_init_domain);
>> @@ -392,6 +412,9 @@ static void iommu_dma_free_iova(struct iommu_dma_cookie *cookie,
>> /* The MSI case is only ever cleaning up its most recent allocation */
>> if (cookie->type == IOMMU_DMA_MSI_COOKIE)
>> cookie->msi_iova -= size;
>> + else if (!IOMMU_DOMAIN_IS_STRICT(cookie->domain) && iovad->flush_cb)
>> + queue_iova(iovad, iova_pfn(iovad, iova),
>> + size >> iova_shift(iovad), 0);
>> else
>> free_iova_fast(iovad, iova_pfn(iovad, iova),
>> size >> iova_shift(iovad));
>> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
>> index 39b3150..01ff569 100644
>> --- a/include/linux/iommu.h
>> +++ b/include/linux/iommu.h
>> @@ -87,6 +87,8 @@ struct iommu_domain_geometry {
>> __IOMMU_DOMAIN_DMA_API)
>> #define IOMMU_STRICT 1
>> +#define IOMMU_DOMAIN_IS_STRICT(domain) \
>> + (domain->type == IOMMU_DOMAIN_UNMANAGED)
>> struct iommu_domain {
>> unsigned type;
>> @@ -103,6 +105,7 @@ enum iommu_cap {
>> transactions */
>> IOMMU_CAP_INTR_REMAP, /* IOMMU supports interrupt isolation */
>> IOMMU_CAP_NOEXEC, /* IOMMU_NOEXEC flag */
>> + IOMMU_CAP_NON_STRICT, /* IOMMU supports non-strict mode */
>
> This isn't a property of the IOMMU, it depends purely on the driver implementation. I think it also doesn't matter anyway - if a caller asks for lazy unmapping on their domain but the IOMMU driver just does strict unmaps anyway because that's all it supports, there's no actual harm done.
>
> Robin.
>
>> };
>> /*
>>
>
> .
>

--
Thanks!
Best regards


2018-06-04 12:20:56

by Zhen Lei

[permalink] [raw]
Subject: Re: [PATCH 5/7] iommu/dma: add support for non-strict mode



On 2018/6/2 1:51, kbuild test robot wrote:
> Hi Zhen,
>
> Thank you for the patch! Perhaps something to improve:
>
> [auto build test WARNING on linus/master]
> [also build test WARNING on v4.17-rc7 next-20180601]
> [cannot apply to iommu/next]
> [if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
>
> url: https://github.com/0day-ci/linux/commits/Zhen-Lei/add-non-strict-mode-support-for-arm-smmu-v3/20180602-000418
> config: x86_64-randconfig-x008-201821 (attached as .config)
> compiler: gcc-7 (Debian 7.3.0-16) 7.3.0
> reproduce:
> # save the attached .config to linux build tree
> make ARCH=x86_64
>
> All warnings (new ones prefixed by >>):
>
> drivers//iommu/amd_iommu.c: In function 'amd_iommu_capable':
>>> drivers//iommu/amd_iommu.c:3053:2: warning: enumeration value 'IOMMU_CAP_NON_STRICT' not handled in switch [-Wswitch]
> switch (cap) {
> ^~~~~~
>
> vim +/IOMMU_CAP_NON_STRICT +3053 drivers//iommu/amd_iommu.c
>
> 645c4c8d arch/x86/kernel/amd_iommu.c Joerg Roedel 2008-12-02 3050
> ab636481 drivers/iommu/amd_iommu.c Joerg Roedel 2014-09-05 3051 static bool amd_iommu_capable(enum iommu_cap cap)
> dbb9fd86 arch/x86/kernel/amd_iommu.c Sheng Yang 2009-03-18 3052 {
> 80a506b8 arch/x86/kernel/amd_iommu.c Joerg Roedel 2010-07-27 @3053 switch (cap) {
> 80a506b8 arch/x86/kernel/amd_iommu.c Joerg Roedel 2010-07-27 3054 case IOMMU_CAP_CACHE_COHERENCY:
> ab636481 drivers/iommu/amd_iommu.c Joerg Roedel 2014-09-05 3055 return true;
> bdddadcb drivers/iommu/amd_iommu.c Joerg Roedel 2012-07-02 3056 case IOMMU_CAP_INTR_REMAP:
> ab636481 drivers/iommu/amd_iommu.c Joerg Roedel 2014-09-05 3057 return (irq_remapping_enabled == 1);
> cfdeec22 drivers/iommu/amd_iommu.c Will Deacon 2014-10-27 3058 case IOMMU_CAP_NOEXEC:
It seems that it would be better to change this case to 'default'.
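The suggested change would look roughly like the sketch below (a standalone model whose enum values mirror include/linux/iommu.h, not the kernel source itself): unhandled capabilities, including any future ones such as IOMMU_CAP_NON_STRICT, fall into a default case, which also silences the -Wswitch warning:

```c
#include <assert.h>
#include <stdbool.h>

/* Standalone model of the amd_iommu_capable() switch with a default
 * case; the enum is a local copy for illustration. */
enum iommu_cap {
    IOMMU_CAP_CACHE_COHERENCY,
    IOMMU_CAP_INTR_REMAP,
    IOMMU_CAP_NOEXEC,
    IOMMU_CAP_NON_STRICT,
};

static int irq_remapping_enabled = 1;

static bool amd_iommu_capable(enum iommu_cap cap)
{
    switch (cap) {
    case IOMMU_CAP_CACHE_COHERENCY:
        return true;
    case IOMMU_CAP_INTR_REMAP:
        return irq_remapping_enabled == 1;
    default:
        /* IOMMU_CAP_NOEXEC, IOMMU_CAP_NON_STRICT, and any future
         * capability are not supported; no -Wswitch warning either. */
        return false;
    }
}
```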

> cfdeec22 drivers/iommu/amd_iommu.c Will Deacon 2014-10-27 3059 return false;
> 80a506b8 arch/x86/kernel/amd_iommu.c Joerg Roedel 2010-07-27 3060 }
> 80a506b8 arch/x86/kernel/amd_iommu.c Joerg Roedel 2010-07-27 3061
> ab636481 drivers/iommu/amd_iommu.c Joerg Roedel 2014-09-05 3062 return false;
> dbb9fd86 arch/x86/kernel/amd_iommu.c Sheng Yang 2009-03-18 3063 }
> dbb9fd86 arch/x86/kernel/amd_iommu.c Sheng Yang 2009-03-18 3064
>
> :::::: The code at line 3053 was first introduced by commit
> :::::: 80a506b8fdcfa868bb53eb740f928217d0966fc1 x86/amd-iommu: Export cache-coherency capability
>
> :::::: TO: Joerg Roedel <[email protected]>
> :::::: CC: Joerg Roedel <[email protected]>
>
> ---
> 0-DAY kernel test infrastructure Open Source Technology Center
> https://lists.01.org/pipermail/kbuild-all Intel Corporation
>

--
Thanks!
Best regards


2018-06-10 11:15:29

by Zhen Lei

[permalink] [raw]
Subject: Re: [PATCH 0/7] add non-strict mode support for arm-smmu-v3



On 2018/6/1 14:50, Leizhen (ThunderTown) wrote:
>
>
> On 2018/5/31 22:25, Robin Murphy wrote:
>> On 31/05/18 14:49, Hanjun Guo wrote:
>>> Hi Robin,
>>>
>>> On 2018/5/31 19:24, Robin Murphy wrote:
>>>> On 31/05/18 08:42, Zhen Lei wrote:
>>>>> In common, a IOMMU unmap operation follow the below steps:
>>>>> 1. remove the mapping in page table of the specified iova range
>>>>> 2. execute tlbi command to invalid the mapping which is cached in TLB
>>>>> 3. wait for the above tlbi operation to be finished
>>>>> 4. free the IOVA resource
>>>>> 5. free the physical memory resource
>>>>>
>>>>> This maybe a problem when unmap is very frequently, the combination of tlbi
>>>>> and wait operation will consume a lot of time. A feasible method is put off
>>>>> tlbi and iova-free operation, when accumulating to a certain number or
>>>>> reaching a specified time, execute only one tlbi_all command to clean up
>>>>> TLB, then free the backup IOVAs. Mark as non-strict mode.
>>>>>
>>>>> But it must be noted that, although the mapping has already been removed in
>>>>> the page table, it maybe still exist in TLB. And the freed physical memory
>>>>> may also be reused for others. So a attacker can persistent access to memory
>>>>> based on the just freed IOVA, to obtain sensible data or corrupt memory. So
>>>>> the VFIO should always choose the strict mode.
>>>>>
>>>>> Some may consider put off physical memory free also, that will still follow
>>>>> strict mode. But for the map_sg cases, the memory allocation is not controlled
>>>>> by IOMMU APIs, so it is not enforceable.
>>>>>
>>>>> Fortunately, Intel and AMD have already applied the non-strict mode, and put
>>>>> queue_iova() operation into the common file dma-iommu.c., and my work is based
>>>>> on it. The difference is that arm-smmu-v3 driver will call IOMMU common APIs to
>>>>> unmap, but Intel and AMD IOMMU drivers are not.
>>>>>
>>>>> Below is the performance data of strict vs non-strict for NVMe device:
>>>>> Randomly Read IOPS: 146K(strict) vs 573K(non-strict)
>>>>> Randomly Write IOPS: 143K(strict) vs 513K(non-strict)
>>>>
>>>> What hardware is this on? If it's SMMUv3 without MSIs (e.g. D05), then you'll still be using the rubbish globally-blocking sync implementation. If that is the case, I'd be very interested to see how much there is to gain from just improving that - I've had a patch kicking around for a while[1] (also on a rebased branch at [2]), but don't have the means for serious performance testing.
> I will try your patch to see how much improvement it brings. I think the best way
Hi Robin,

I applied your patch and got the improvement below.

Randomly Read IOPS: 146K --> 214K
Randomly Write IOPS: 143K --> 212K

> to resolve the globally-blocking sync is for the hardware to provide a 64-bit
> CONS register, so that it can never wrap, and the spinlock can also
> be removed.
>
>>>
>>> The hardware is the new D06, whose SMMU supports MSIs,
>>
>> Cool! Now that profiling is fairly useful since we got rid of most of the locks, are you able to get an idea of how the overhead in the normal case is distributed between arm_smmu_cmdq_insert_cmd() and __arm_smmu_sync_poll_msi()? We're always trying to improve our understanding of where command-queue-related overheads turn out to be in practice, and there's still potentially room to do nicer things than TLBI_NH_ALL ;)
> Even if the software had no overhead, there may still be a problem, because
> the smmu needs to execute the commands in sequence, especially before the
> globally-blocking sync has been removed. Based on the actual execution time
> of a single tlbi and sync, we can derive the theoretical upper limit.
>
> BTW, I will reply to the rest of the mail next week. I'm busy with other things now.
>
>>
>> Robin.
>>
>>> it's not D05 :)
>>>
>>> Thanks
>>> Hanjun
>>>
>>
>> .
>>
>

--
Thanks!
Best regards