2021-01-07 12:31:34

by Yong Wu (吴勇)

Subject: [PATCH v4 0/7] MediaTek IOMMU: Improve tlb flush performance in map/unmap

This patchset improves tlb flushing performance in iommu_map/unmap
for the MediaTek IOMMU.

For iommu_map, the MediaTek IOMMU currently uses IO_PGTABLE_QUIRK_TLBI_ON_MAP
to do a tlb_flush for each memory chunk. This is unnecessary; we can
improve it by flushing the tlb once at the end of iommu_map.

For iommu_unmap, we have already improved this performance with the
gather mechanism. But the generic gather has to take care of its granule
size: if the granule size differs, it does a tlb flush and starts
gathering again. Our HW doesn't care about granule size, thus we gather
the whole range in our driver.
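
For reference, the generic helper behind this behaviour looks roughly
like this (a lightly commented copy of iommu_iotlb_gather_add_page from
include/linux/iommu.h around v5.11-rc1, i.e. before patch [4/7] of this
series touches it):

static inline void iommu_iotlb_gather_add_page(struct iommu_domain *domain,
					       struct iommu_iotlb_gather *gather,
					       unsigned long iova, size_t size)
{
	unsigned long start = iova, end = start + size;

	/*
	 * If the new page is disjoint from the current range or is mapped at
	 * a different granularity, sync the TLB so that the gather structure
	 * can be rewritten -- this is the intermediate flush that a driver
	 * which ignores the granule size does not need.
	 */
	if (gather->pgsize != size ||
	    end < gather->start || start > gather->end) {
		if (gather->pgsize)
			iommu_iotlb_sync(domain, gather);
		gather->pgsize = size;
	}

	/* Otherwise just extend the gathered range. */
	if (gather->end < end)
		gather->end = end;
	if (gather->start > start)
		gather->start = start;
}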

After this patchset, the tlb is flushed only once in each of iommu_map
and iommu_unmap.

Leaving sg aside and taking a single segment, I did a simple test:

size = 20 * SZ_1M;
/* the worst case: everything is mapped with 4k pages. */
ret = iommu_map(domain, 0x5bb02000, 0x123f1000, size, IOMMU_READ);
iommu_unmap(domain, 0x5bb02000, size);

This is a comparison of the time taken (unit: us):

            original-time   after-improve
map-20M             59943            2347
unmap-20M             264              36

This patchset also flushes the tlb only once in the iommu_map_sg case.

Patches [1/7][2/7][3/7] are for map, while the others are for unmap.

Change note:
v4: a. Based on v5.11-rc1.
    b. Add a little helper, _iommu_map.
    c. Fix a build failure in tegra-gart.c; I hadn't noticed there is another
       place that calls gart_iommu_sync_map.
    d. Switch gather->end to the real end address ("start + size - 1").

v3: https://lore.kernel.org/linux-iommu/[email protected]/#r
Refactor the unmap flow as suggested by Robin.

v2: https://lore.kernel.org/linux-iommu/[email protected]/
Refactor all the code.
Based on v5.10-rc1.

Yong Wu (7):
iommu: Move iotlb_sync_map out from __iommu_map
iommu: Add iova and size as parameters in iotlb_sync_map
iommu/mediatek: Add iotlb_sync_map to sync the whole iova range
iommu: Switch gather->end to the inclusive end
iommu/io-pgtable: Allow io_pgtable_tlb ops to be optional
iommu/mediatek: Gather iova in iommu_unmap to achieve tlb sync once
iommu/mediatek: Remove the tlb-ops for v7s

drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 2 +-
drivers/iommu/iommu.c | 23 +++++++---
drivers/iommu/mtk_iommu.c | 47 +++++++++------------
drivers/iommu/tegra-gart.c | 7 ++-
include/linux/io-pgtable.h | 8 ++--
include/linux/iommu.h | 7 +--
6 files changed, 52 insertions(+), 42 deletions(-)

--
2.18.0



2021-01-07 12:32:20

by Yong Wu (吴勇)

Subject: [PATCH v4 1/7] iommu: Move iotlb_sync_map out from __iommu_map

At the end of __iommu_map, it always calls iotlb_sync_map.

This patch moves iotlb_sync_map out of __iommu_map, since it is
unnecessary to call it for each sg segment, especially as iotlb_sync_map
currently flushes the whole tlb. Add a little helper, _iommu_map, for this.

Signed-off-by: Yong Wu <[email protected]>
Reviewed-by: Robin Murphy <[email protected]>
---
drivers/iommu/iommu.c | 23 ++++++++++++++++++-----
1 file changed, 18 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index ffeebda8d6de..c304a6a30d42 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2426,9 +2426,6 @@ static int __iommu_map(struct iommu_domain *domain, unsigned long iova,
size -= pgsize;
}

- if (ops->iotlb_sync_map)
- ops->iotlb_sync_map(domain);
-
/* unroll mapping in case something went wrong */
if (ret)
iommu_unmap(domain, orig_iova, orig_size - size);
@@ -2438,18 +2435,31 @@ static int __iommu_map(struct iommu_domain *domain, unsigned long iova,
return ret;
}

+static int _iommu_map(struct iommu_domain *domain, unsigned long iova,
+ phys_addr_t paddr, size_t size, int prot, gfp_t gfp)
+{
+ const struct iommu_ops *ops = domain->ops;
+ int ret;
+
+ ret = __iommu_map(domain, iova, paddr, size, prot, GFP_KERNEL);
+ if (ret == 0 && ops->iotlb_sync_map)
+ ops->iotlb_sync_map(domain);
+
+ return ret;
+}
+
int iommu_map(struct iommu_domain *domain, unsigned long iova,
phys_addr_t paddr, size_t size, int prot)
{
might_sleep();
- return __iommu_map(domain, iova, paddr, size, prot, GFP_KERNEL);
+ return _iommu_map(domain, iova, paddr, size, prot, GFP_KERNEL);
}
EXPORT_SYMBOL_GPL(iommu_map);

int iommu_map_atomic(struct iommu_domain *domain, unsigned long iova,
phys_addr_t paddr, size_t size, int prot)
{
- return __iommu_map(domain, iova, paddr, size, prot, GFP_ATOMIC);
+ return _iommu_map(domain, iova, paddr, size, prot, GFP_ATOMIC);
}
EXPORT_SYMBOL_GPL(iommu_map_atomic);

@@ -2533,6 +2543,7 @@ static size_t __iommu_map_sg(struct iommu_domain *domain, unsigned long iova,
struct scatterlist *sg, unsigned int nents, int prot,
gfp_t gfp)
{
+ const struct iommu_ops *ops = domain->ops;
size_t len = 0, mapped = 0;
phys_addr_t start;
unsigned int i = 0;
@@ -2563,6 +2574,8 @@ static size_t __iommu_map_sg(struct iommu_domain *domain, unsigned long iova,
sg = sg_next(sg);
}

+ if (ops->iotlb_sync_map)
+ ops->iotlb_sync_map(domain);
return mapped;

out_err:
--
2.18.0

2021-01-07 12:32:21

by Yong Wu (吴勇)

Subject: [PATCH v4 5/7] iommu/io-pgtable: Allow io_pgtable_tlb ops to be optional

This patch allows the io_pgtable_tlb ops to be null, since IOMMU drivers
may use the tlb ops from the iommu framework instead.

Signed-off-by: Yong Wu <[email protected]>
---
include/linux/io-pgtable.h | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h
index ea727eb1a1a9..2a5686ca2ba3 100644
--- a/include/linux/io-pgtable.h
+++ b/include/linux/io-pgtable.h
@@ -214,14 +214,16 @@ struct io_pgtable_domain_attr {

static inline void io_pgtable_tlb_flush_all(struct io_pgtable *iop)
{
- iop->cfg.tlb->tlb_flush_all(iop->cookie);
+ if (iop->cfg.tlb && iop->cfg.tlb->tlb_flush_all)
+ iop->cfg.tlb->tlb_flush_all(iop->cookie);
}

static inline void
io_pgtable_tlb_flush_walk(struct io_pgtable *iop, unsigned long iova,
size_t size, size_t granule)
{
- iop->cfg.tlb->tlb_flush_walk(iova, size, granule, iop->cookie);
+ if (iop->cfg.tlb && iop->cfg.tlb->tlb_flush_walk)
+ iop->cfg.tlb->tlb_flush_walk(iova, size, granule, iop->cookie);
}

static inline void
@@ -229,7 +231,7 @@ io_pgtable_tlb_add_page(struct io_pgtable *iop,
struct iommu_iotlb_gather * gather, unsigned long iova,
size_t granule)
{
- if (iop->cfg.tlb->tlb_add_page)
+ if (iop->cfg.tlb && iop->cfg.tlb->tlb_add_page)
iop->cfg.tlb->tlb_add_page(gather, iova, granule, iop->cookie);
}

--
2.18.0

2021-01-07 12:32:43

by Yong Wu (吴勇)

Subject: [PATCH v4 2/7] iommu: Add iova and size as parameters in iotlb_sync_map

iotlb_sync_map allows IOMMU drivers to sync the tlb after completing the
whole mapping. This patch adds iova and size as parameters to it; the
IOMMU driver can then flush the tlb over the whole range once after the
iova mapping, to improve performance.

Signed-off-by: Yong Wu <[email protected]>
Reviewed-by: Robin Murphy <[email protected]>
---
drivers/iommu/iommu.c | 4 ++--
drivers/iommu/tegra-gart.c | 7 +++++--
include/linux/iommu.h | 3 ++-
3 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index c304a6a30d42..3d099a31ddca 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2443,7 +2443,7 @@ static int _iommu_map(struct iommu_domain *domain, unsigned long iova,

ret = __iommu_map(domain, iova, paddr, size, prot, GFP_KERNEL);
if (ret == 0 && ops->iotlb_sync_map)
- ops->iotlb_sync_map(domain);
+ ops->iotlb_sync_map(domain, iova, size);

return ret;
}
@@ -2575,7 +2575,7 @@ static size_t __iommu_map_sg(struct iommu_domain *domain, unsigned long iova,
}

if (ops->iotlb_sync_map)
- ops->iotlb_sync_map(domain);
+ ops->iotlb_sync_map(domain, iova, mapped);
return mapped;

out_err:
diff --git a/drivers/iommu/tegra-gart.c b/drivers/iommu/tegra-gart.c
index fac720273889..05e8e19b8269 100644
--- a/drivers/iommu/tegra-gart.c
+++ b/drivers/iommu/tegra-gart.c
@@ -261,7 +261,8 @@ static int gart_iommu_of_xlate(struct device *dev,
return 0;
}

-static void gart_iommu_sync_map(struct iommu_domain *domain)
+static void gart_iommu_sync_map(struct iommu_domain *domain, unsigned long iova,
+ size_t size)
{
FLUSH_GART_REGS(gart_handle);
}
@@ -269,7 +270,9 @@ static void gart_iommu_sync_map(struct iommu_domain *domain)
static void gart_iommu_sync(struct iommu_domain *domain,
struct iommu_iotlb_gather *gather)
{
- gart_iommu_sync_map(domain);
+ size_t length = gather->end - gather->start;
+
+ gart_iommu_sync_map(domain, gather->start, length);
}

static const struct iommu_ops gart_iommu_ops = {
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index b3f0e2018c62..9ce0aa9e236b 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -246,7 +246,8 @@ struct iommu_ops {
size_t (*unmap)(struct iommu_domain *domain, unsigned long iova,
size_t size, struct iommu_iotlb_gather *iotlb_gather);
void (*flush_iotlb_all)(struct iommu_domain *domain);
- void (*iotlb_sync_map)(struct iommu_domain *domain);
+ void (*iotlb_sync_map)(struct iommu_domain *domain, unsigned long iova,
+ size_t size);
void (*iotlb_sync)(struct iommu_domain *domain,
struct iommu_iotlb_gather *iotlb_gather);
phys_addr_t (*iova_to_phys)(struct iommu_domain *domain, dma_addr_t iova);
--
2.18.0

2021-01-07 12:32:47

by Yong Wu (吴勇)

Subject: [PATCH v4 3/7] iommu/mediatek: Add iotlb_sync_map to sync the whole iova range

Remove IO_PGTABLE_QUIRK_TLBI_ON_MAP to avoid a tlb sync for each small
memory chunk. Use the new iotlb_sync_map to sync the tlb once for the
whole iova range of iommu_map.

Signed-off-by: Yong Wu <[email protected]>
Reviewed-by: Robin Murphy <[email protected]>
---
drivers/iommu/mtk_iommu.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index 8e56cec532e7..f579fc21971e 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -322,7 +322,6 @@ static int mtk_iommu_domain_finalise(struct mtk_iommu_domain *dom)
dom->cfg = (struct io_pgtable_cfg) {
.quirks = IO_PGTABLE_QUIRK_ARM_NS |
IO_PGTABLE_QUIRK_NO_PERMS |
- IO_PGTABLE_QUIRK_TLBI_ON_MAP |
IO_PGTABLE_QUIRK_ARM_MTK_EXT,
.pgsize_bitmap = mtk_iommu_ops.pgsize_bitmap,
.ias = 32,
@@ -453,6 +452,14 @@ static void mtk_iommu_iotlb_sync(struct iommu_domain *domain,
data);
}

+static void mtk_iommu_sync_map(struct iommu_domain *domain, unsigned long iova,
+ size_t size)
+{
+ struct mtk_iommu_data *data = mtk_iommu_get_m4u_data();
+
+ mtk_iommu_tlb_flush_range_sync(iova, size, size, data);
+}
+
static phys_addr_t mtk_iommu_iova_to_phys(struct iommu_domain *domain,
dma_addr_t iova)
{
@@ -539,6 +546,7 @@ static const struct iommu_ops mtk_iommu_ops = {
.unmap = mtk_iommu_unmap,
.flush_iotlb_all = mtk_iommu_flush_iotlb_all,
.iotlb_sync = mtk_iommu_iotlb_sync,
+ .iotlb_sync_map = mtk_iommu_sync_map,
.iova_to_phys = mtk_iommu_iova_to_phys,
.probe_device = mtk_iommu_probe_device,
.release_device = mtk_iommu_release_device,
--
2.18.0

2021-01-07 12:33:38

by Yong Wu (吴勇)

Subject: [PATCH v4 7/7] iommu/mediatek: Remove the tlb-ops for v7s

By now we already use the tlb operations from the iommu framework, so
the tlb operations for v7s can be removed.

Correspondingly, switch the parameter "cookie" to the internal structure.

Signed-off-by: Yong Wu <[email protected]>
---
drivers/iommu/mtk_iommu.c | 27 ++++-----------------------
1 file changed, 4 insertions(+), 23 deletions(-)

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index d3b8a1649093..86ab577c9520 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -182,10 +182,8 @@ static struct mtk_iommu_domain *to_mtk_domain(struct iommu_domain *dom)
return container_of(dom, struct mtk_iommu_domain, domain);
}

-static void mtk_iommu_tlb_flush_all(void *cookie)
+static void mtk_iommu_tlb_flush_all(struct mtk_iommu_data *data)
{
- struct mtk_iommu_data *data = cookie;
-
for_each_m4u(data) {
writel_relaxed(F_INVLD_EN1 | F_INVLD_EN0,
data->base + data->plat_data->inv_sel_reg);
@@ -195,9 +193,9 @@ static void mtk_iommu_tlb_flush_all(void *cookie)
}

static void mtk_iommu_tlb_flush_range_sync(unsigned long iova, size_t size,
- size_t granule, void *cookie)
+ size_t granule,
+ struct mtk_iommu_data *data)
{
- struct mtk_iommu_data *data = cookie;
unsigned long flags;
int ret;
u32 tmp;
@@ -219,7 +217,7 @@ static void mtk_iommu_tlb_flush_range_sync(unsigned long iova, size_t size,
if (ret) {
dev_warn(data->dev,
"Partial TLB flush timed out, falling back to full flush\n");
- mtk_iommu_tlb_flush_all(cookie);
+ mtk_iommu_tlb_flush_all(data);
}
/* Clear the CPE status */
writel_relaxed(0, data->base + REG_MMU_CPE_DONE);
@@ -227,22 +225,6 @@ static void mtk_iommu_tlb_flush_range_sync(unsigned long iova, size_t size,
}
}

-static void mtk_iommu_tlb_flush_page_nosync(struct iommu_iotlb_gather *gather,
- unsigned long iova, size_t granule,
- void *cookie)
-{
- struct mtk_iommu_data *data = cookie;
- struct iommu_domain *domain = &data->m4u_dom->domain;
-
- iommu_iotlb_gather_add_page(domain, gather, iova, granule);
-}
-
-static const struct iommu_flush_ops mtk_iommu_flush_ops = {
- .tlb_flush_all = mtk_iommu_tlb_flush_all,
- .tlb_flush_walk = mtk_iommu_tlb_flush_range_sync,
- .tlb_add_page = mtk_iommu_tlb_flush_page_nosync,
-};
-
static irqreturn_t mtk_iommu_isr(int irq, void *dev_id)
{
struct mtk_iommu_data *data = dev_id;
@@ -326,7 +308,6 @@ static int mtk_iommu_domain_finalise(struct mtk_iommu_domain *dom)
.pgsize_bitmap = mtk_iommu_ops.pgsize_bitmap,
.ias = 32,
.oas = 34,
- .tlb = &mtk_iommu_flush_ops,
.iommu_dev = data->dev,
};

--
2.18.0

2021-01-07 12:33:52

by Yong Wu (吴勇)

Subject: [PATCH v4 4/7] iommu: Switch gather->end to the inclusive end

Currently gather->end is an "unsigned long", which may overflow on a
32-bit arch in the corner case 0xfff00000 + 0x100000 (iova + size).
Although this doesn't affect the size (end - start), it affects the
check "gather->end < end".

This patch changes this "end" to the real end address
(end = start + size - 1). Correspondingly, update the length to
"end - start + 1".

Fixes: a7d20dc19d9e ("iommu: Introduce struct iommu_iotlb_gather for batching TLB flushes")
Signed-off-by: Yong Wu <[email protected]>
---
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 2 +-
drivers/iommu/mtk_iommu.c | 2 +-
drivers/iommu/tegra-gart.c | 2 +-
include/linux/iommu.h | 4 ++--
4 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 8ca7415d785d..c70d6e79f534 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2280,7 +2280,7 @@ static void arm_smmu_iotlb_sync(struct iommu_domain *domain,
{
struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);

- arm_smmu_tlb_inv_range(gather->start, gather->end - gather->start,
+ arm_smmu_tlb_inv_range(gather->start, gather->end - gather->start + 1,
gather->pgsize, true, smmu_domain);
}

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index f579fc21971e..66a00a2cb445 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -443,7 +443,7 @@ static void mtk_iommu_iotlb_sync(struct iommu_domain *domain,
struct iommu_iotlb_gather *gather)
{
struct mtk_iommu_data *data = mtk_iommu_get_m4u_data();
- size_t length = gather->end - gather->start;
+ size_t length = gather->end - gather->start + 1;

if (gather->start == ULONG_MAX)
return;
diff --git a/drivers/iommu/tegra-gart.c b/drivers/iommu/tegra-gart.c
index 05e8e19b8269..6f130e51f072 100644
--- a/drivers/iommu/tegra-gart.c
+++ b/drivers/iommu/tegra-gart.c
@@ -270,7 +270,7 @@ static void gart_iommu_sync_map(struct iommu_domain *domain, unsigned long iova,
static void gart_iommu_sync(struct iommu_domain *domain,
struct iommu_iotlb_gather *gather)
{
- size_t length = gather->end - gather->start;
+ size_t length = gather->end - gather->start + 1;

gart_iommu_sync_map(domain, gather->start, length);
}
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 9ce0aa9e236b..ae8eddd4621b 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -170,7 +170,7 @@ enum iommu_dev_features {
* struct iommu_iotlb_gather - Range information for a pending IOTLB flush
*
* @start: IOVA representing the start of the range to be flushed
- * @end: IOVA representing the end of the range to be flushed (exclusive)
+ * @end: IOVA representing the end of the range to be flushed (inclusive)
* @pgsize: The interval at which to perform the flush
*
* This structure is intended to be updated by multiple calls to the
@@ -539,7 +539,7 @@ static inline void iommu_iotlb_gather_add_page(struct iommu_domain *domain,
struct iommu_iotlb_gather *gather,
unsigned long iova, size_t size)
{
- unsigned long start = iova, end = start + size;
+ unsigned long start = iova, end = start + size - 1;

/*
* If the new page is disjoint from the current range or is mapped at
--
2.18.0

2021-01-07 12:34:31

by Yong Wu (吴勇)

Subject: [PATCH v4 6/7] iommu/mediatek: Gather iova in iommu_unmap to achieve tlb sync once

In the current iommu_unmap, the code is:

iommu_iotlb_gather_init(&iotlb_gather);
ret = __iommu_unmap(domain, iova, size, &iotlb_gather);
iommu_iotlb_sync(domain, &iotlb_gather);

We can gather the whole iova range in __iommu_unmap and then do the tlb
synchronization in iommu_iotlb_sync.

This patch implements that: gather the range in mtk_iommu_unmap, then let
iommu_iotlb_sync do the tlb synchronization for the gathered iova range.
We don't call iommu_iotlb_gather_add_page since our tlb synchronization
can ignore the granule size.

In this way, gather->start can no longer be ULONG_MAX, so remove that
check.

This patch aims to do the tlb synchronization *once* per iommu_unmap.

Signed-off-by: Yong Wu <[email protected]>
---
drivers/iommu/mtk_iommu.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index 66a00a2cb445..d3b8a1649093 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -430,7 +430,12 @@ static size_t mtk_iommu_unmap(struct iommu_domain *domain,
struct iommu_iotlb_gather *gather)
{
struct mtk_iommu_domain *dom = to_mtk_domain(domain);
+ unsigned long end = iova + size - 1;

+ if (gather->start > iova)
+ gather->start = iova;
+ if (gather->end < end)
+ gather->end = end;
return dom->iop->unmap(dom->iop, iova, size, gather);
}

@@ -445,9 +450,6 @@ static void mtk_iommu_iotlb_sync(struct iommu_domain *domain,
struct mtk_iommu_data *data = mtk_iommu_get_m4u_data();
size_t length = gather->end - gather->start + 1;

- if (gather->start == ULONG_MAX)
- return;
-
mtk_iommu_tlb_flush_range_sync(gather->start, length, gather->pgsize,
data);
}
--
2.18.0

2021-01-18 18:54:56

by Robin Murphy

Subject: Re: [PATCH v4 4/7] iommu: Switch gather->end to the inclusive end

On 2021-01-07 12:29, Yong Wu wrote:
> Currently gather->end is an "unsigned long", which may overflow on a
> 32-bit arch in the corner case 0xfff00000 + 0x100000 (iova + size).
> Although this doesn't affect the size (end - start), it affects the
> check "gather->end < end".
>
> This patch changes this "end" to the real end address
> (end = start + size - 1). Correspondingly, update the length to
> "end - start + 1".
>
> Fixes: a7d20dc19d9e ("iommu: Introduce struct iommu_iotlb_gather for batching TLB flushes")
> Signed-off-by: Yong Wu <[email protected]>
> ---
> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 2 +-
> drivers/iommu/mtk_iommu.c | 2 +-
> drivers/iommu/tegra-gart.c | 2 +-
> include/linux/iommu.h | 4 ++--
> 4 files changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index 8ca7415d785d..c70d6e79f534 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -2280,7 +2280,7 @@ static void arm_smmu_iotlb_sync(struct iommu_domain *domain,
> {
> struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
>
> - arm_smmu_tlb_inv_range(gather->start, gather->end - gather->start,
> + arm_smmu_tlb_inv_range(gather->start, gather->end - gather->start + 1,
> gather->pgsize, true, smmu_domain);
> }
>
> diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
> index f579fc21971e..66a00a2cb445 100644
> --- a/drivers/iommu/mtk_iommu.c
> +++ b/drivers/iommu/mtk_iommu.c
> @@ -443,7 +443,7 @@ static void mtk_iommu_iotlb_sync(struct iommu_domain *domain,
> struct iommu_iotlb_gather *gather)
> {
> struct mtk_iommu_data *data = mtk_iommu_get_m4u_data();
> - size_t length = gather->end - gather->start;
> + size_t length = gather->end - gather->start + 1;
>
> if (gather->start == ULONG_MAX)
> return;
> diff --git a/drivers/iommu/tegra-gart.c b/drivers/iommu/tegra-gart.c
> index 05e8e19b8269..6f130e51f072 100644
> --- a/drivers/iommu/tegra-gart.c
> +++ b/drivers/iommu/tegra-gart.c
> @@ -270,7 +270,7 @@ static void gart_iommu_sync_map(struct iommu_domain *domain, unsigned long iova,
> static void gart_iommu_sync(struct iommu_domain *domain,
> struct iommu_iotlb_gather *gather)
> {
> - size_t length = gather->end - gather->start;
> + size_t length = gather->end - gather->start + 1;

TBH I don't think there's any need to bother doing precise calculations
on effectively-uninitialised data (this driver doesn't even do
address-based invalidation, let alone use the gather mechanism). In fact
it might make sense to flip things around and define gart_iommu_sync_map
in terms of gart_iommu_sync now just so there's one less unused argument
to make up. However we can always do cleanup on top, and right now I'm
more interested in getting these changes landed, so either way,

Reviewed-by: Robin Murphy <[email protected]>
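
For illustration, a minimal sketch of the flip described above (assuming
the GART flush needs no range information at all; untested, not a patch):

static void gart_iommu_sync(struct iommu_domain *domain,
			    struct iommu_iotlb_gather *gather)
{
	FLUSH_GART_REGS(gart_handle);
}

static void gart_iommu_sync_map(struct iommu_domain *domain,
				unsigned long iova, size_t size)
{
	/* The GART flush is global, so the range arguments are unused. */
	gart_iommu_sync(domain, NULL);
}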

> gart_iommu_sync_map(domain, gather->start, length);
> }
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 9ce0aa9e236b..ae8eddd4621b 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -170,7 +170,7 @@ enum iommu_dev_features {
> * struct iommu_iotlb_gather - Range information for a pending IOTLB flush
> *
> * @start: IOVA representing the start of the range to be flushed
> - * @end: IOVA representing the end of the range to be flushed (exclusive)
> + * @end: IOVA representing the end of the range to be flushed (inclusive)
> * @pgsize: The interval at which to perform the flush
> *
> * This structure is intended to be updated by multiple calls to the
> @@ -539,7 +539,7 @@ static inline void iommu_iotlb_gather_add_page(struct iommu_domain *domain,
> struct iommu_iotlb_gather *gather,
> unsigned long iova, size_t size)
> {
> - unsigned long start = iova, end = start + size;
> + unsigned long start = iova, end = start + size - 1;
>
> /*
> * If the new page is disjoint from the current range or is mapped at
>

2021-01-19 05:01:03

by Robin Murphy

Subject: Re: [PATCH v4 5/7] iommu/io-pgtable: Allow io_pgtable_tlb ops to be optional

On 2021-01-07 12:29, Yong Wu wrote:
> This patch allows the io_pgtable_tlb ops to be null, since IOMMU drivers
> may use the tlb ops from the iommu framework instead.

For the reasons I gave on v3,

Reviewed-by: Robin Murphy <[email protected]>

> Signed-off-by: Yong Wu <[email protected]>
> ---
> include/linux/io-pgtable.h | 8 +++++---
> 1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h
> index ea727eb1a1a9..2a5686ca2ba3 100644
> --- a/include/linux/io-pgtable.h
> +++ b/include/linux/io-pgtable.h
> @@ -214,14 +214,16 @@ struct io_pgtable_domain_attr {
>
> static inline void io_pgtable_tlb_flush_all(struct io_pgtable *iop)
> {
> - iop->cfg.tlb->tlb_flush_all(iop->cookie);
> + if (iop->cfg.tlb && iop->cfg.tlb->tlb_flush_all)
> + iop->cfg.tlb->tlb_flush_all(iop->cookie);
> }
>
> static inline void
> io_pgtable_tlb_flush_walk(struct io_pgtable *iop, unsigned long iova,
> size_t size, size_t granule)
> {
> - iop->cfg.tlb->tlb_flush_walk(iova, size, granule, iop->cookie);
> + if (iop->cfg.tlb && iop->cfg.tlb->tlb_flush_walk)
> + iop->cfg.tlb->tlb_flush_walk(iova, size, granule, iop->cookie);
> }
>
> static inline void
> @@ -229,7 +231,7 @@ io_pgtable_tlb_add_page(struct io_pgtable *iop,
> struct iommu_iotlb_gather * gather, unsigned long iova,
> size_t granule)
> {
> - if (iop->cfg.tlb->tlb_add_page)
> + if (iop->cfg.tlb && iop->cfg.tlb->tlb_add_page)
> iop->cfg.tlb->tlb_add_page(gather, iova, granule, iop->cookie);
> }
>
>

2021-01-19 05:01:14

by Robin Murphy

Subject: Re: [PATCH v4 7/7] iommu/mediatek: Remove the tlb-ops for v7s

On 2021-01-07 12:29, Yong Wu wrote:
> By now we already use the tlb operations from the iommu framework, so
> the tlb operations for v7s can be removed.
>
> Correspondingly, switch the parameter "cookie" to the internal structure.

Reviewed-by: Robin Murphy <[email protected]>

> Signed-off-by: Yong Wu <[email protected]>
> ---
> drivers/iommu/mtk_iommu.c | 27 ++++-----------------------
> 1 file changed, 4 insertions(+), 23 deletions(-)
>
> diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
> index d3b8a1649093..86ab577c9520 100644
> --- a/drivers/iommu/mtk_iommu.c
> +++ b/drivers/iommu/mtk_iommu.c
> @@ -182,10 +182,8 @@ static struct mtk_iommu_domain *to_mtk_domain(struct iommu_domain *dom)
> return container_of(dom, struct mtk_iommu_domain, domain);
> }
>
> -static void mtk_iommu_tlb_flush_all(void *cookie)
> +static void mtk_iommu_tlb_flush_all(struct mtk_iommu_data *data)
> {
> - struct mtk_iommu_data *data = cookie;
> -
> for_each_m4u(data) {
> writel_relaxed(F_INVLD_EN1 | F_INVLD_EN0,
> data->base + data->plat_data->inv_sel_reg);
> @@ -195,9 +193,9 @@ static void mtk_iommu_tlb_flush_all(void *cookie)
> }
>
> static void mtk_iommu_tlb_flush_range_sync(unsigned long iova, size_t size,
> - size_t granule, void *cookie)
> + size_t granule,
> + struct mtk_iommu_data *data)
> {
> - struct mtk_iommu_data *data = cookie;
> unsigned long flags;
> int ret;
> u32 tmp;
> @@ -219,7 +217,7 @@ static void mtk_iommu_tlb_flush_range_sync(unsigned long iova, size_t size,
> if (ret) {
> dev_warn(data->dev,
> "Partial TLB flush timed out, falling back to full flush\n");
> - mtk_iommu_tlb_flush_all(cookie);
> + mtk_iommu_tlb_flush_all(data);
> }
> /* Clear the CPE status */
> writel_relaxed(0, data->base + REG_MMU_CPE_DONE);
> @@ -227,22 +225,6 @@ static void mtk_iommu_tlb_flush_range_sync(unsigned long iova, size_t size,
> }
> }
>
> -static void mtk_iommu_tlb_flush_page_nosync(struct iommu_iotlb_gather *gather,
> - unsigned long iova, size_t granule,
> - void *cookie)
> -{
> - struct mtk_iommu_data *data = cookie;
> - struct iommu_domain *domain = &data->m4u_dom->domain;
> -
> - iommu_iotlb_gather_add_page(domain, gather, iova, granule);
> -}
> -
> -static const struct iommu_flush_ops mtk_iommu_flush_ops = {
> - .tlb_flush_all = mtk_iommu_tlb_flush_all,
> - .tlb_flush_walk = mtk_iommu_tlb_flush_range_sync,
> - .tlb_add_page = mtk_iommu_tlb_flush_page_nosync,
> -};
> -
> static irqreturn_t mtk_iommu_isr(int irq, void *dev_id)
> {
> struct mtk_iommu_data *data = dev_id;
> @@ -326,7 +308,6 @@ static int mtk_iommu_domain_finalise(struct mtk_iommu_domain *dom)
> .pgsize_bitmap = mtk_iommu_ops.pgsize_bitmap,
> .ias = 32,
> .oas = 34,
> - .tlb = &mtk_iommu_flush_ops,
> .iommu_dev = data->dev,
> };
>
>

2021-01-19 05:04:10

by Robin Murphy

Subject: Re: [PATCH v4 6/7] iommu/mediatek: Gather iova in iommu_unmap to achieve tlb sync once

On 2021-01-07 12:29, Yong Wu wrote:
> In the current iommu_unmap, the code is:
>
> iommu_iotlb_gather_init(&iotlb_gather);
> ret = __iommu_unmap(domain, iova, size, &iotlb_gather);
> iommu_iotlb_sync(domain, &iotlb_gather);
>
> We can gather the whole iova range in __iommu_unmap and then do the tlb
> synchronization in iommu_iotlb_sync.
>
> This patch implements that: gather the range in mtk_iommu_unmap, then let
> iommu_iotlb_sync do the tlb synchronization for the gathered iova range.
> We don't call iommu_iotlb_gather_add_page since our tlb synchronization
> can ignore the granule size.
>
> In this way, gather->start can no longer be ULONG_MAX, so remove that
> check.
>
> This patch aims to do the tlb synchronization *once* per iommu_unmap.

Reviewed-by: Robin Murphy <[email protected]>

> Signed-off-by: Yong Wu <[email protected]>
> ---
> drivers/iommu/mtk_iommu.c | 8 +++++---
> 1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
> index 66a00a2cb445..d3b8a1649093 100644
> --- a/drivers/iommu/mtk_iommu.c
> +++ b/drivers/iommu/mtk_iommu.c
> @@ -430,7 +430,12 @@ static size_t mtk_iommu_unmap(struct iommu_domain *domain,
> struct iommu_iotlb_gather *gather)
> {
> struct mtk_iommu_domain *dom = to_mtk_domain(domain);
> + unsigned long end = iova + size - 1;
>
> + if (gather->start > iova)
> + gather->start = iova;
> + if (gather->end < end)
> + gather->end = end;
> return dom->iop->unmap(dom->iop, iova, size, gather);
> }
>
> @@ -445,9 +450,6 @@ static void mtk_iommu_iotlb_sync(struct iommu_domain *domain,
> struct mtk_iommu_data *data = mtk_iommu_get_m4u_data();
> size_t length = gather->end - gather->start + 1;
>
> - if (gather->start == ULONG_MAX)
> - return;
> -
> mtk_iommu_tlb_flush_range_sync(gather->start, length, gather->pgsize,
> data);
> }
>

2021-01-22 20:22:45

by Will Deacon

Subject: Re: [PATCH v4 0/7] MediaTek IOMMU: Improve tlb flush performance in map/unmap

On Thu, Jan 07, 2021 at 08:29:02PM +0800, Yong Wu wrote:
> This patchset improves tlb flushing performance in iommu_map/unmap
> for the MediaTek IOMMU.
>
> For iommu_map, the MediaTek IOMMU currently uses IO_PGTABLE_QUIRK_TLBI_ON_MAP
> to do a tlb_flush for each memory chunk. This is unnecessary; we can
> improve it by flushing the tlb once at the end of iommu_map.
>
> For iommu_unmap, we have already improved this performance with the
> gather mechanism. But the generic gather has to take care of its granule
> size: if the granule size differs, it does a tlb flush and starts
> gathering again. Our HW doesn't care about granule size, thus we gather
> the whole range in our driver.
>
> After this patchset, the tlb is flushed only once in each of iommu_map
> and iommu_unmap.
>
> Leaving sg aside and taking a single segment, I did a simple test:
>
> size = 20 * SZ_1M;
> /* the worst case: everything is mapped with 4k pages. */
> ret = iommu_map(domain, 0x5bb02000, 0x123f1000, size, IOMMU_READ);
> iommu_unmap(domain, 0x5bb02000, size);
>
> This is a comparison of the time taken (unit: us):
>
>             original-time   after-improve
> map-20M             59943            2347
> unmap-20M             264              36
>
> This patchset also flushes the tlb only once in the iommu_map_sg case.
>
> Patches [1/7][2/7][3/7] are for map, while the others are for unmap.
>
> Change note:
> v4: a. Based on v5.11-rc1.
>     b. Add a little helper, _iommu_map.
>     c. Fix a build failure in tegra-gart.c; I hadn't noticed there is another
>        place that calls gart_iommu_sync_map.
>     d. Switch gather->end to the real end address ("start + size - 1").
>
> v3: https://lore.kernel.org/linux-iommu/[email protected]/#r
> Refactor the unmap flow suggested by Robin.
>
> v2: https://lore.kernel.org/linux-iommu/[email protected]/
> Refactor all the code.
> base on v5.10-rc1.
>
> Yong Wu (7):
> iommu: Move iotlb_sync_map out from __iommu_map
> iommu: Add iova and size as parameters in iotlb_sync_map
> iommu/mediatek: Add iotlb_sync_map to sync the whole iova range
> iommu: Switch gather->end to the inclusive end
> iommu/io-pgtable: Allow io_pgtable_tlb ops to be optional
> iommu/mediatek: Gather iova in iommu_unmap to achieve tlb sync once
> iommu/mediatek: Remove the tlb-ops for v7s

For the series:

Acked-by: Will Deacon <[email protected]>

Joerg -- how would you like to handle merging this? I suppose either you
could host a separate branch that I could merge if needed, or I could
include this in my pull to you, or something else.

Please let me know what you prefer,

Cheers,

Will

2021-01-27 23:59:04

by Will Deacon

Subject: Re: [PATCH v4 0/7] MediaTek IOMMU: Improve tlb flush performance in map/unmap

On Thu, 7 Jan 2021 20:29:02 +0800, Yong Wu wrote:
> This patchset improves tlb flushing performance in iommu_map/unmap
> for the MediaTek IOMMU.
>
> For iommu_map, the MediaTek IOMMU currently uses IO_PGTABLE_QUIRK_TLBI_ON_MAP
> to do a tlb_flush for each memory chunk. This is unnecessary; we can
> improve it by flushing the tlb once at the end of iommu_map.
>
> [...]

After discussion with Joerg, I'll queue this (and hopefully the next posting
of your IOMMU driver) along with the Arm SMMU patches, and then send that
all together.

Applied to will (for-joerg/mtk), thanks!

[1/7] iommu: Move iotlb_sync_map out from __iommu_map
https://git.kernel.org/arm64/c/d8c1df02ac7f
[2/7] iommu: Add iova and size as parameters in iotlb_sync_map
https://git.kernel.org/arm64/c/2ebbd25873ce
[3/7] iommu/mediatek: Add iotlb_sync_map to sync the whole iova range
https://git.kernel.org/arm64/c/20143451eff0
[4/7] iommu: Switch gather->end to the inclusive end
https://git.kernel.org/arm64/c/862c3715de8f
[5/7] iommu/io-pgtable: Allow io_pgtable_tlb ops to be optional
https://git.kernel.org/arm64/c/77e0992aee4e
[6/7] iommu/mediatek: Gather iova in iommu_unmap to achieve tlb sync once
https://git.kernel.org/arm64/c/f21ae3b10084
[7/7] iommu/mediatek: Remove the tlb-ops for v7s
https://git.kernel.org/arm64/c/0954d61a59e3

Cheers,
--
Will

https://fixes.arm64.dev
https://next.arm64.dev
https://will.arm64.dev

2021-02-02 01:12:38

by Doug Anderson

Subject: Re: [PATCH v4 1/7] iommu: Move iotlb_sync_map out from __iommu_map

Hi,

On Thu, Jan 7, 2021 at 4:31 AM Yong Wu <[email protected]> wrote:
>
> @@ -2438,18 +2435,31 @@ static int __iommu_map(struct iommu_domain *domain, unsigned long iova,
> return ret;
> }
>
> +static int _iommu_map(struct iommu_domain *domain, unsigned long iova,
> + phys_addr_t paddr, size_t size, int prot, gfp_t gfp)
> +{
> + const struct iommu_ops *ops = domain->ops;
> + int ret;
> +
> + ret = __iommu_map(domain, iova, paddr, size, prot, GFP_KERNEL);

The above is broken. Instead of GFP_KERNEL it should be passing "gfp".
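
Presumably the essential one-line change (inferred from the report here;
the actual patch is at the link below) is just:

- ret = __iommu_map(domain, iova, paddr, size, prot, GFP_KERNEL);
+ ret = __iommu_map(domain, iova, paddr, size, prot, gfp);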


> + if (ret == 0 && ops->iotlb_sync_map)
> + ops->iotlb_sync_map(domain);
> +
> + return ret;
> +}
> +
> int iommu_map(struct iommu_domain *domain, unsigned long iova,
> phys_addr_t paddr, size_t size, int prot)
> {
> might_sleep();
> - return __iommu_map(domain, iova, paddr, size, prot, GFP_KERNEL);
> + return _iommu_map(domain, iova, paddr, size, prot, GFP_KERNEL);
> }
> EXPORT_SYMBOL_GPL(iommu_map);
>
> int iommu_map_atomic(struct iommu_domain *domain, unsigned long iova,
> phys_addr_t paddr, size_t size, int prot)
> {
> - return __iommu_map(domain, iova, paddr, size, prot, GFP_ATOMIC);
> + return _iommu_map(domain, iova, paddr, size, prot, GFP_ATOMIC);

Specifically the above bug means we drop the "GFP_ATOMIC" here.

It means we trigger a warning, like this (on a downstream kernel with
the patch backported):

BUG: sleeping function called from invalid context at mm/page_alloc.c:4726
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 9, name: ksoftirqd/0
CPU: 0 PID: 9 Comm: ksoftirqd/0 Not tainted 5.4.93-12508-gc10c93e28e39 #1
Call trace:
dump_backtrace+0x0/0x154
show_stack+0x20/0x2c
dump_stack+0xa0/0xfc
___might_sleep+0x11c/0x12c
__might_sleep+0x50/0x84
__alloc_pages_nodemask+0xf8/0x2bc
__arm_lpae_alloc_pages+0x48/0x1b4
__arm_lpae_map+0x124/0x274
__arm_lpae_map+0x1cc/0x274
arm_lpae_map+0x140/0x170
arm_smmu_map+0x78/0xbc
__iommu_map+0xd4/0x210
_iommu_map+0x4c/0x84
iommu_map_atomic+0x44/0x58
__iommu_dma_map+0x8c/0xc4
iommu_dma_map_page+0xac/0xf0

---

A quick (but not heavily tested) fix is at:

https://lore.kernel.org/r/20210201170611.1.I64a7b62579287d668d7c89e105dcedf45d641063@changeid/


-Doug