2019-04-21 01:32:40

by Lu Baolu

Subject: [PATCH v3 00/10] iommu: Bounce page for untrusted devices

The Thunderbolt vulnerabilities are now public and go by the
name Thunderclap [1] [3]. This patch series aims to mitigate
those concerns.

An external PCI device is a PCI peripheral device connected
to the system through an external bus, such as Thunderbolt.
What makes it different is that it can't be trusted to the
same degree as the devices built into the system. Generally,
a trusted PCIe device will DMA into the designated buffers
and not overrun or otherwise write outside the specified
bounds. An external device offers no such guarantee.

The minimum IOMMU mapping granularity is one page (4k), so
for DMA transfers smaller than that a malicious PCIe device
can access the whole page of memory even if it does not
belong to the driver in question. This opens the possibility
of a DMA attack. For more information about DMA attacks
mounted by an untrusted PCI/PCIe device, please refer to [2].

This series implements bounce buffers for untrusted external
devices. Transfers are confined to isolated pages so that the
IOMMU window does not cover memory outside of what the driver
expects. Full pages within a buffer can be mapped directly in
the IOMMU page table, but partial pages go through a bounce
page instead.
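
As a hypothetical worked example: a 9 KiB buffer starting at physical
address 0x12345800, with a 4 KiB minimal IOMMU page size, splits into

    head  : 0x12345800 - 0x12345fff (2 KiB)  -> bounced through a swiotlb page
    middle: 0x12346000 - 0x12346fff (4 KiB)  -> mapped directly
    tail  : 0x12347000 - 0x12347bff (3 KiB)  -> bounced through a swiotlb page

so the device only ever sees the driver's own bytes in the partial pages.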

The bounce buffers for untrusted devices introduce a small
performance overhead, but we did not observe any user-visible
problems. Users who trust their devices can remove the
overhead with the kernel parameter defined in the IOMMU
driver.

The first part of this patch series (PATCH 1/10 ~ 4/10) extends
the swiotlb APIs to support page-granular bounce buffers. The
second part (PATCH 5/10) introduces the bounce page APIs:

* iommu_bounce_map(dev, addr, paddr, size, dir, attrs)
- Map a buffer starting at DMA address @addr using bounce
pages. For buffer parts that don't cover a whole minimal
IOMMU page, the bounce page policy is applied: a bounce
page mapped by swiotlb is used as the DMA target in the
IOMMU page table. Otherwise, the physical address @paddr
is mapped directly.

* iommu_bounce_unmap(dev, addr, size, dir, attrs)
- Unmap the buffer mapped with iommu_bounce_map(). The bounce
page is torn down after the bounced data has been synced.

* iommu_bounce_sync_single(dev, addr, size, dir, target)
- Sync the bounced data in case the bounce-mapped buffer is
reused. A usage sketch combining these APIs follows below.
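
For illustration, a minimal sketch (not part of this series) of how a
driver-facing DMA path could combine these APIs for an untrusted
device. example_alloc_iova() is a hypothetical placeholder for the
caller's IOVA allocator:

	static dma_addr_t example_map(struct device *dev, phys_addr_t paddr,
				      size_t size, enum dma_data_direction dir,
				      unsigned long attrs)
	{
		/* Hypothetical helper: reserve an IOVA range covering the buffer. */
		dma_addr_t iova = example_alloc_iova(dev, paddr, size);
		dma_addr_t addr = iova + offset_in_page(paddr);

		/* Partial pages get bounce pages; full pages map @paddr directly. */
		if (iommu_bounce_map(dev, addr, paddr, size, dir, attrs))
			return DMA_MAPPING_ERROR;
		return addr;
	}

	static void example_unmap(struct device *dev, dma_addr_t addr, size_t size,
				  enum dma_data_direction dir, unsigned long attrs)
	{
		/* Syncs any bounced data back, then tears the bounce page down. */
		iommu_bounce_unmap(dev, addr, size, dir, attrs);
	}

If the mapping is kept across several transfers, iommu_bounce_sync_single()
with SYNC_FOR_CPU/SYNC_FOR_DEVICE brackets each reuse, mirroring the usual
dma_sync_*() semantics.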

The third part of this patch series (PATCH 6/10 ~ 10/10) uses
the bounce page APIs to map/unmap/sync DMA buffers for
untrusted devices in the Intel IOMMU driver.

The third part of this patch series depends on a patch set
posted for discussion here [4], which delegates Intel IOMMU
DMA domains to the generic IOMMU layer.

The bounce page idea:
Based-on-idea-by: Mika Westerberg <[email protected]>
Based-on-idea-by: Ashok Raj <[email protected]>
Based-on-idea-by: Alan Cox <[email protected]>
Based-on-idea-by: Kevin Tian <[email protected]>

The patch series has been tested by:
Tested-by: Xu Pengfei <[email protected]>
Tested-by: Mika Westerberg <[email protected]>

Reference:
[1] https://thunderclap.io/
[2] https://thunderclap.io/thunderclap-paper-ndss2019.pdf
[3] https://christian.kellner.me/2019/02/27/thunderclap-and-linux/
[4] https://lkml.org/lkml/2019/3/4/644

Best regards,
Lu Baolu

Change log:
v2->v3:
- The previous v2 was posted here:
https://lkml.org/lkml/2019/3/27/157
- Reuse the existing swiotlb APIs for the bounce buffer by
extending them to support bounce pages.
- Move the bounce page APIs into the generic IOMMU layer.
- This patch series is based on 5.1-rc1.

v1->v2:
- The previous v1 was posted here:
https://lkml.org/lkml/2019/3/12/66
- Refactor the code to remove struct bounce_param.
- During the v1 review cycle, we discussed the possibility
of reusing swiotlb code to avoid code duplication, but
we found the swiotlb implementation is not ready for the
use of a bounce page pool.
https://lkml.org/lkml/2019/3/19/259
- This patch series has been rebased to v5.1-rc2.

Lu Baolu (10):
iommu: Add helper to get minimal page size of domain
swiotlb: Factor out slot allocation and free
swiotlb: Limit tlb address range inside slot pool
swiotlb: Extend swiotlb to support page bounce
iommu: Add bounce page APIs
iommu/vt-d: Add trace events for domain map/unmap
iommu/vt-d: Keep swiotlb on if bounce page is necessary
iommu/vt-d: Check whether device requires bounce buffer
iommu/vt-d: Add dma sync ops for untrusted devices
iommu/vt-d: Use bounce buffer for untrusted devices

.../admin-guide/kernel-parameters.txt | 5 +
drivers/iommu/Kconfig | 15 +
drivers/iommu/Makefile | 1 +
drivers/iommu/intel-iommu.c | 276 ++++++++++++++----
drivers/iommu/intel-trace.c | 14 +
drivers/iommu/iommu.c | 275 +++++++++++++++++
include/linux/dma-mapping.h | 6 +
include/linux/iommu.h | 50 ++++
include/trace/events/intel_iommu.h | 132 +++++++++
kernel/dma/swiotlb.c | 117 ++++++--
10 files changed, 808 insertions(+), 83 deletions(-)
create mode 100644 drivers/iommu/intel-trace.c
create mode 100644 include/trace/events/intel_iommu.h

--
2.17.1


2019-04-21 01:32:40

by Lu Baolu

Subject: [PATCH v3 08/10] iommu/vt-d: Check whether device requires bounce buffer

This adds a helper to check whether a device needs to use a
bounce buffer. It also provides a boot-time option to disable
the bounce buffer. Users can use this to prevent the IOMMU
driver from using bounce buffers, trading the extra protection
for performance.
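
With this patch applied, and given that intel_iommu= options are parsed
as a comma-separated list (see intel_iommu_setup() below), a user who
trusts all external devices could, for example, boot with:

    intel_iommu=on,nobounce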

Cc: Ashok Raj <[email protected]>
Cc: Jacob Pan <[email protected]>
Cc: Kevin Tian <[email protected]>
Signed-off-by: Lu Baolu <[email protected]>
Tested-by: Xu Pengfei <[email protected]>
Tested-by: Mika Westerberg <[email protected]>
---
Documentation/admin-guide/kernel-parameters.txt | 5 +++++
drivers/iommu/intel-iommu.c | 17 +++++++++++++++++
2 files changed, 22 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 2b8ee90bb644..86880eb3fc73 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1726,6 +1726,11 @@
Note that using this option lowers the security
provided by tboot because it makes the system
vulnerable to DMA attacks.
+ nobounce [Default off]
+ Do not use bounce buffers for untrusted devices such as
+ Thunderbolt devices. This treats untrusted devices as
+ trusted ones and hence might expose the system to DMA
+ attacks.

intel_idle.max_cstate= [KNL,HW,ACPI,X86]
0 disables intel_idle and fall back on acpi_idle.
diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index e4a164324bdd..0d80f26b8a72 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -380,6 +380,7 @@ static int intel_iommu_strict;
static int intel_iommu_superpage = 1;
static int intel_iommu_sm;
static int iommu_identity_mapping;
+static int intel_no_bounce;

#define IDENTMAP_ALL 1
#define IDENTMAP_GFX 2
@@ -396,6 +397,19 @@ EXPORT_SYMBOL_GPL(intel_iommu_gfx_mapped);
static DEFINE_SPINLOCK(device_domain_lock);
static LIST_HEAD(device_domain_list);

+static inline bool device_needs_bounce(struct device *dev)
+{
+ struct pci_dev *pdev = NULL;
+
+ if (intel_no_bounce)
+ return false;
+
+ if (dev_is_pci(dev))
+ pdev = to_pci_dev(dev);
+
+ return pdev ? pdev->untrusted : false;
+}
+
/*
* Iterate over elements in device_domain_list and call the specified
* callback @fn against each element.
@@ -478,6 +492,9 @@ static int __init intel_iommu_setup(char *str)
printk(KERN_INFO
"Intel-IOMMU: not forcing on after tboot. This could expose security risk for tboot\n");
intel_iommu_tboot_noforce = 1;
+ } else if (!strncmp(str, "nobounce", 8)) {
+ pr_info("Intel-IOMMU: No bounce buffer. This could expose security risks of DMA attacks\n");
+ intel_no_bounce = 1;
}

str += strcspn(str, ",");
--
2.17.1

2019-04-21 01:32:40

by Lu Baolu

Subject: [PATCH v3 04/10] swiotlb: Extend swiotlb to support page bounce

This extends below swiotlb APIs to support page bounce.

- swiotlb_tbl_map_single()
- swiotlb_tbl_unmap_single()

In page bounce mode, swiotlb allocates a whole page from the
slot pool, syncs the data, and returns the start of the
bounced buffer within the page. The caller is responsible for
syncing the data after the DMA transfer with
swiotlb_tbl_sync_single().

In order to distinguish page bounce from other types of
bounces, this introduces a new DMA attribute bit
(DMA_ATTR_BOUNCE_PAGE) which is set in the @attrs passed to
these APIs.
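
As a hypothetical worked example with a 4 KiB IOMMU page size: for
orig_addr = 0x12345a00 and size = 0x200, the offset within the IOMMU
page is 0xa00, so swiotlb allocates a whole 4 KiB slot for the page at
0x12345000, copies the data when the direction requires it, and returns
the slot address plus 0xa00. A request with offset + size > 4 KiB (one
that would cross the page boundary) fails with DMA_MAPPING_ERROR.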

Cc: Konrad Rzeszutek Wilk <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Marek Szyprowski <[email protected]>
Cc: Robin Murphy <[email protected]>
Signed-off-by: Lu Baolu <[email protected]>
---
include/linux/dma-mapping.h | 6 +++++
kernel/dma/swiotlb.c | 53 ++++++++++++++++++++++++++++++++-----
2 files changed, 53 insertions(+), 6 deletions(-)

diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 75e60be91e5f..26e506e5b04c 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -70,6 +70,12 @@
*/
#define DMA_ATTR_PRIVILEGED (1UL << 9)

+/*
+ * DMA_ATTR_BOUNCE_PAGE: used by the IOMMU sub-system to indicate that
+ * the buffer is used as a bounce page in the DMA remapping page table.
+ */
+#define DMA_ATTR_BOUNCE_PAGE (1UL << 10)
+
/*
* A dma_addr_t can hold any valid DMA or bus address for the platform.
* It can be given to a device to use as a DMA source or target. A CPU cannot
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index dbb937ce79c8..96b87a11dee1 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -34,6 +34,7 @@
#include <linux/scatterlist.h>
#include <linux/mem_encrypt.h>
#include <linux/set_memory.h>
+#include <linux/iommu.h>
#ifdef CONFIG_DEBUG_FS
#include <linux/debugfs.h>
#endif
@@ -596,6 +597,14 @@ swiotlb_tbl_free_tlb(struct device *hwdev, phys_addr_t tlb_addr, size_t size)
spin_unlock_irqrestore(&io_tlb_lock, flags);
}

+static unsigned long
+get_iommu_pgsize(struct device *dev, phys_addr_t phys, size_t size)
+{
+ struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
+
+ return domain_minimal_pgsize(domain);
+}
+
phys_addr_t swiotlb_tbl_map_single(struct device *hwdev,
dma_addr_t tbl_dma_addr,
phys_addr_t orig_addr, size_t size,
@@ -603,17 +612,37 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev,
unsigned long attrs)
{
phys_addr_t tlb_addr;
+ unsigned long offset = 0;
+
+ if (attrs & DMA_ATTR_BOUNCE_PAGE) {
+ unsigned long pgsize = get_iommu_pgsize(hwdev, orig_addr, size);
+
+ offset = orig_addr & (pgsize - 1);
+
+ /* Don't allow the buffer to cross page boundary. */
+ if (offset + size > pgsize)
+ return DMA_MAPPING_ERROR;
+
+ tlb_addr = swiotlb_tbl_alloc_tlb(hwdev,
+ __phys_to_dma(hwdev, io_tlb_start),
+ ALIGN_DOWN(orig_addr, pgsize), pgsize);
+ } else {
+ tlb_addr = swiotlb_tbl_alloc_tlb(hwdev,
+ tbl_dma_addr, orig_addr, size);
+ }

- tlb_addr = swiotlb_tbl_alloc_tlb(hwdev, tbl_dma_addr, orig_addr, size);
if (tlb_addr == DMA_MAPPING_ERROR) {
if (!(attrs & DMA_ATTR_NO_WARN) && printk_ratelimit())
dev_warn(hwdev, "swiotlb buffer is full (sz: %zd bytes)\n",
size);
- } else if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
- (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL)) {
- swiotlb_bounce(orig_addr, tlb_addr, size, DMA_TO_DEVICE);
+ return DMA_MAPPING_ERROR;
}

+ tlb_addr += offset;
+ if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
+ (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL))
+ swiotlb_bounce(orig_addr, tlb_addr, size, DMA_TO_DEVICE);
+
return tlb_addr;
}

@@ -626,6 +655,10 @@ void swiotlb_tbl_unmap_single(struct device *hwdev, phys_addr_t tlb_addr,
{
int index = (tlb_addr - io_tlb_start) >> IO_TLB_SHIFT;
phys_addr_t orig_addr = io_tlb_orig_addr[index];
+ unsigned long offset = 0;
+
+ if (attrs & DMA_ATTR_BOUNCE_PAGE)
+ offset = tlb_addr & ((1 << IO_TLB_SHIFT) - 1);

/*
* First, sync the memory before unmapping the entry
@@ -633,9 +666,17 @@ void swiotlb_tbl_unmap_single(struct device *hwdev, phys_addr_t tlb_addr,
if (orig_addr != INVALID_PHYS_ADDR &&
!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
((dir == DMA_FROM_DEVICE) || (dir == DMA_BIDIRECTIONAL)))
- swiotlb_bounce(orig_addr, tlb_addr, size, DMA_FROM_DEVICE);
+ swiotlb_bounce(orig_addr + offset,
+ tlb_addr, size, DMA_FROM_DEVICE);
+
+ if (attrs & DMA_ATTR_BOUNCE_PAGE) {
+ unsigned long pgsize = get_iommu_pgsize(hwdev, tlb_addr, size);

- swiotlb_tbl_free_tlb(hwdev, tlb_addr, size);
+ swiotlb_tbl_free_tlb(hwdev,
+ ALIGN_DOWN(tlb_addr, pgsize), pgsize);
+ } else {
+ swiotlb_tbl_free_tlb(hwdev, tlb_addr, size);
+ }
}

void swiotlb_tbl_sync_single(struct device *hwdev, phys_addr_t tlb_addr,
--
2.17.1

2019-04-21 01:32:40

by Lu Baolu

Subject: [PATCH v3 02/10] swiotlb: Factor out slot allocation and free

This moves the slot allocation and free code into two common
functions in order to avoid code duplication. There is no
functional change.

Cc: Konrad Rzeszutek Wilk <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Marek Szyprowski <[email protected]>
Cc: Robin Murphy <[email protected]>
Signed-off-by: Lu Baolu <[email protected]>
---
kernel/dma/swiotlb.c | 72 +++++++++++++++++++++++++++++---------------
1 file changed, 47 insertions(+), 25 deletions(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 53012db1e53c..173122d16b7f 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -439,11 +439,9 @@ static void swiotlb_bounce(phys_addr_t orig_addr, phys_addr_t tlb_addr,
}
}

-phys_addr_t swiotlb_tbl_map_single(struct device *hwdev,
- dma_addr_t tbl_dma_addr,
- phys_addr_t orig_addr, size_t size,
- enum dma_data_direction dir,
- unsigned long attrs)
+static phys_addr_t
+swiotlb_tbl_alloc_tlb(struct device *hwdev, dma_addr_t tbl_dma_addr,
+ phys_addr_t orig_addr, size_t size)
{
unsigned long flags;
phys_addr_t tlb_addr;
@@ -539,8 +537,6 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev,

not_found:
spin_unlock_irqrestore(&io_tlb_lock, flags);
- if (!(attrs & DMA_ATTR_NO_WARN) && printk_ratelimit())
- dev_warn(hwdev, "swiotlb buffer is full (sz: %zd bytes)\n", size);
return DMA_MAPPING_ERROR;
found:
io_tlb_used += nslots;
@@ -553,32 +549,16 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev,
*/
for (i = 0; i < nslots; i++)
io_tlb_orig_addr[index+i] = orig_addr + (i << IO_TLB_SHIFT);
- if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
- (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL))
- swiotlb_bounce(orig_addr, tlb_addr, size, DMA_TO_DEVICE);

return tlb_addr;
}

-/*
- * tlb_addr is the physical address of the bounce buffer to unmap.
- */
-void swiotlb_tbl_unmap_single(struct device *hwdev, phys_addr_t tlb_addr,
- size_t size, enum dma_data_direction dir,
- unsigned long attrs)
+static void
+swiotlb_tbl_free_tlb(struct device *hwdev, phys_addr_t tlb_addr, size_t size)
{
unsigned long flags;
int i, count, nslots = ALIGN(size, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT;
int index = (tlb_addr - io_tlb_start) >> IO_TLB_SHIFT;
- phys_addr_t orig_addr = io_tlb_orig_addr[index];
-
- /*
- * First, sync the memory before unmapping the entry
- */
- if (orig_addr != INVALID_PHYS_ADDR &&
- !(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
- ((dir == DMA_FROM_DEVICE) || (dir == DMA_BIDIRECTIONAL)))
- swiotlb_bounce(orig_addr, tlb_addr, size, DMA_FROM_DEVICE);

/*
* Return the buffer to the free list by setting the corresponding
@@ -610,6 +590,48 @@ void swiotlb_tbl_unmap_single(struct device *hwdev, phys_addr_t tlb_addr,
spin_unlock_irqrestore(&io_tlb_lock, flags);
}

+phys_addr_t swiotlb_tbl_map_single(struct device *hwdev,
+ dma_addr_t tbl_dma_addr,
+ phys_addr_t orig_addr, size_t size,
+ enum dma_data_direction dir,
+ unsigned long attrs)
+{
+ phys_addr_t tlb_addr;
+
+ tlb_addr = swiotlb_tbl_alloc_tlb(hwdev, tbl_dma_addr, orig_addr, size);
+ if (tlb_addr == DMA_MAPPING_ERROR) {
+ if (!(attrs & DMA_ATTR_NO_WARN) && printk_ratelimit())
+ dev_warn(hwdev, "swiotlb buffer is full (sz: %zd bytes)\n",
+ size);
+ } else if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
+ (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL)) {
+ swiotlb_bounce(orig_addr, tlb_addr, size, DMA_TO_DEVICE);
+ }
+
+ return tlb_addr;
+}
+
+/*
+ * tlb_addr is the physical address of the bounce buffer to unmap.
+ */
+void swiotlb_tbl_unmap_single(struct device *hwdev, phys_addr_t tlb_addr,
+ size_t size, enum dma_data_direction dir,
+ unsigned long attrs)
+{
+ int index = (tlb_addr - io_tlb_start) >> IO_TLB_SHIFT;
+ phys_addr_t orig_addr = io_tlb_orig_addr[index];
+
+ /*
+ * First, sync the memory before unmapping the entry
+ */
+ if (orig_addr != INVALID_PHYS_ADDR &&
+ !(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
+ ((dir == DMA_FROM_DEVICE) || (dir == DMA_BIDIRECTIONAL)))
+ swiotlb_bounce(orig_addr, tlb_addr, size, DMA_FROM_DEVICE);
+
+ swiotlb_tbl_free_tlb(hwdev, tlb_addr, size);
+}
+
void swiotlb_tbl_sync_single(struct device *hwdev, phys_addr_t tlb_addr,
size_t size, enum dma_data_direction dir,
enum dma_sync_target target)
--
2.17.1

2019-04-21 01:32:40

by Lu Baolu

Subject: [PATCH v3 06/10] iommu/vt-d: Add trace events for domain map/unmap

This adds trace support for the Intel IOMMU driver. It also
declares events which can be used to trace when an IOVA is
mapped or unmapped in a domain.
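
Once merged, these tracepoints can be enabled through tracefs in the
usual way, for example (the mount point may also be
/sys/kernel/debug/tracing on older setups):

    echo 1 > /sys/kernel/tracing/events/intel_iommu/enable
    cat /sys/kernel/tracing/trace_pipe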

Cc: Ashok Raj <[email protected]>
Cc: Jacob Pan <[email protected]>
Cc: Kevin Tian <[email protected]>
Signed-off-by: Mika Westerberg <[email protected]>
Signed-off-by: Lu Baolu <[email protected]>
---
drivers/iommu/Makefile | 1 +
drivers/iommu/intel-trace.c | 14 +++
include/trace/events/intel_iommu.h | 132 +++++++++++++++++++++++++++++
3 files changed, 147 insertions(+)
create mode 100644 drivers/iommu/intel-trace.c
create mode 100644 include/trace/events/intel_iommu.h

diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 8c71a15e986b..8b5fb8051281 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -17,6 +17,7 @@ obj-$(CONFIG_ARM_SMMU) += arm-smmu.o
obj-$(CONFIG_ARM_SMMU_V3) += arm-smmu-v3.o
obj-$(CONFIG_DMAR_TABLE) += dmar.o
obj-$(CONFIG_INTEL_IOMMU) += intel-iommu.o intel-pasid.o
+obj-$(CONFIG_INTEL_IOMMU) += intel-trace.o
obj-$(CONFIG_INTEL_IOMMU_DEBUGFS) += intel-iommu-debugfs.o
obj-$(CONFIG_INTEL_IOMMU_SVM) += intel-svm.o
obj-$(CONFIG_IPMMU_VMSA) += ipmmu-vmsa.o
diff --git a/drivers/iommu/intel-trace.c b/drivers/iommu/intel-trace.c
new file mode 100644
index 000000000000..bfb6a6e37a88
--- /dev/null
+++ b/drivers/iommu/intel-trace.c
@@ -0,0 +1,14 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Intel IOMMU trace support
+ *
+ * Copyright (C) 2019 Intel Corporation
+ *
+ * Author: Lu Baolu <[email protected]>
+ */
+
+#include <linux/string.h>
+#include <linux/types.h>
+
+#define CREATE_TRACE_POINTS
+#include <trace/events/intel_iommu.h>
diff --git a/include/trace/events/intel_iommu.h b/include/trace/events/intel_iommu.h
new file mode 100644
index 000000000000..49ca57a90079
--- /dev/null
+++ b/include/trace/events/intel_iommu.h
@@ -0,0 +1,132 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Intel IOMMU trace support
+ *
+ * Copyright (C) 2019 Intel Corporation
+ *
+ * Author: Lu Baolu <[email protected]>
+ */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM intel_iommu
+
+#if !defined(_TRACE_INTEL_IOMMU_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_INTEL_IOMMU_H
+
+#include <linux/tracepoint.h>
+#include <linux/intel-iommu.h>
+
+TRACE_EVENT(bounce_map_single,
+ TP_PROTO(struct device *dev, dma_addr_t dev_addr, phys_addr_t phys_addr,
+ size_t size),
+
+ TP_ARGS(dev, dev_addr, phys_addr, size),
+
+ TP_STRUCT__entry(
+ __string(dev_name, dev_name(dev))
+ __field(dma_addr_t, dev_addr)
+ __field(phys_addr_t, phys_addr)
+ __field(size_t, size)
+ ),
+
+ TP_fast_assign(
+ __assign_str(dev_name, dev_name(dev));
+ __entry->dev_addr = dev_addr;
+ __entry->phys_addr = phys_addr;
+ __entry->size = size;
+ ),
+
+ TP_printk("dev=%s dev_addr=0x%llx phys_addr=0x%llx size=%zu",
+ __get_str(dev_name),
+ (unsigned long long)__entry->dev_addr,
+ (unsigned long long)__entry->phys_addr,
+ __entry->size)
+);
+
+TRACE_EVENT(bounce_unmap_single,
+ TP_PROTO(struct device *dev, dma_addr_t dev_addr, size_t size),
+
+ TP_ARGS(dev, dev_addr, size),
+
+ TP_STRUCT__entry(
+ __string(dev_name, dev_name(dev))
+ __field(dma_addr_t, dev_addr)
+ __field(size_t, size)
+ ),
+
+ TP_fast_assign(
+ __assign_str(dev_name, dev_name(dev));
+ __entry->dev_addr = dev_addr;
+ __entry->size = size;
+ ),
+
+ TP_printk("dev=%s dev_addr=0x%llx size=%zu",
+ __get_str(dev_name),
+ (unsigned long long)__entry->dev_addr,
+ __entry->size)
+);
+
+TRACE_EVENT(bounce_map_sg,
+ TP_PROTO(struct device *dev, unsigned int i, unsigned int nelems,
+ dma_addr_t dev_addr, phys_addr_t phys_addr, size_t size),
+
+ TP_ARGS(dev, i, nelems, dev_addr, phys_addr, size),
+
+ TP_STRUCT__entry(
+ __string(dev_name, dev_name(dev))
+ __field(unsigned int, i)
+ __field(unsigned int, last)
+ __field(dma_addr_t, dev_addr)
+ __field(phys_addr_t, phys_addr)
+ __field(size_t, size)
+ ),
+
+ TP_fast_assign(
+ __assign_str(dev_name, dev_name(dev));
+ __entry->i = i;
+ __entry->last = nelems - 1;
+ __entry->dev_addr = dev_addr;
+ __entry->phys_addr = phys_addr;
+ __entry->size = size;
+ ),
+
+ TP_printk("dev=%s elem=%u/%u dev_addr=0x%llx phys_addr=0x%llx size=%zu",
+ __get_str(dev_name), __entry->i, __entry->last,
+ (unsigned long long)__entry->dev_addr,
+ (unsigned long long)__entry->phys_addr,
+ __entry->size)
+);
+
+TRACE_EVENT(bounce_unmap_sg,
+ TP_PROTO(struct device *dev, unsigned int i, unsigned int nelems,
+ dma_addr_t dev_addr, phys_addr_t phys_addr, size_t size),
+
+ TP_ARGS(dev, i, nelems, dev_addr, phys_addr, size),
+
+ TP_STRUCT__entry(
+ __string(dev_name, dev_name(dev))
+ __field(unsigned int, i)
+ __field(unsigned int, last)
+ __field(dma_addr_t, dev_addr)
+ __field(phys_addr_t, phys_addr)
+ __field(size_t, size)
+ ),
+
+ TP_fast_assign(
+ __assign_str(dev_name, dev_name(dev));
+ __entry->i = i;
+ __entry->last = nelems - 1;
+ __entry->dev_addr = dev_addr;
+ __entry->phys_addr = phys_addr;
+ __entry->size = size;
+ ),
+
+ TP_printk("dev=%s elem=%u/%u dev_addr=0x%llx phys_addr=0x%llx size=%zu",
+ __get_str(dev_name), __entry->i, __entry->last,
+ (unsigned long long)__entry->dev_addr,
+ (unsigned long long)__entry->phys_addr,
+ __entry->size)
+);
+#endif /* _TRACE_INTEL_IOMMU_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
--
2.17.1

2019-04-21 01:33:37

by Lu Baolu

Subject: [PATCH v3 01/10] iommu: Add helper to get minimal page size of domain

This makes it possible for other modules to query the minimal
page size supported by a domain without knowledge of the
structure details.
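
A minimal usage sketch (not part of this patch): a caller that wants to
know whether a DMA buffer has a partial head or tail page could do
something like the following, where dev, paddr and size are whatever it
is about to map:

	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
	unsigned long pgsize = domain_minimal_pgsize(domain);

	/* Non-zero bits below pgsize mean a partial first or last page. */
	if (pgsize && ((paddr | size) & (pgsize - 1)))
		use_bounce_pages = true;	/* hypothetical flag */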

Signed-off-by: Lu Baolu <[email protected]>
---
include/linux/iommu.h | 13 +++++++++++++
1 file changed, 13 insertions(+)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index a5007d035218..46679ef19b7e 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -377,6 +377,14 @@ static inline void iommu_tlb_sync(struct iommu_domain *domain)
domain->ops->iotlb_sync(domain);
}

+static inline unsigned long domain_minimal_pgsize(struct iommu_domain *domain)
+{
+ if (domain && domain->pgsize_bitmap)
+ return 1 << __ffs(domain->pgsize_bitmap);
+
+ return 0;
+}
+
/* PCI device grouping function */
extern struct iommu_group *pci_device_group(struct device *dev);
/* Generic device grouping function */
@@ -704,6 +712,11 @@ const struct iommu_ops *iommu_ops_from_fwnode(struct fwnode_handle *fwnode)
return NULL;
}

+static inline unsigned long domain_minimal_pgsize(struct iommu_domain *domain)
+{
+ return 0;
+}
+
#endif /* CONFIG_IOMMU_API */

#ifdef CONFIG_IOMMU_DEBUGFS
--
2.17.1

2019-04-21 01:33:38

by Lu Baolu

Subject: [PATCH v3 07/10] iommu/vt-d: Keep swiotlb on if bounce page is necessary

The bounce page is necessary whenever there are untrusted
devices in the system. The bounce page policy depends on
swiotlb, so don't turn swiotlb off if untrusted devices
exist.

Cc: Ashok Raj <[email protected]>
Cc: Jacob Pan <[email protected]>
Cc: Kevin Tian <[email protected]>
Cc: Mika Westerberg <[email protected]>
Signed-off-by: Lu Baolu <[email protected]>
---
drivers/iommu/intel-iommu.c | 25 ++++++++++++++-----------
1 file changed, 14 insertions(+), 11 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index af040602bd44..e4a164324bdd 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -4528,22 +4528,27 @@ const struct attribute_group *intel_iommu_groups[] = {
NULL,
};

-static int __init platform_optin_force_iommu(void)
+static inline bool platform_has_untrusted_device(void)
{
+ bool has_untrusted_device = false;
struct pci_dev *pdev = NULL;
- bool has_untrusted_dev = false;
-
- if (!dmar_platform_optin() || no_platform_optin)
- return 0;

for_each_pci_dev(pdev) {
if (pdev->untrusted) {
- has_untrusted_dev = true;
+ has_untrusted_device = true;
break;
}
}

- if (!has_untrusted_dev)
+ return has_untrusted_device;
+}
+
+static int __init platform_optin_force_iommu(void)
+{
+ if (!dmar_platform_optin() || no_platform_optin)
+ return 0;
+
+ if (!platform_has_untrusted_device())
return 0;

if (no_iommu || dmar_disabled)
@@ -4557,9 +4562,6 @@ static int __init platform_optin_force_iommu(void)
iommu_identity_mapping |= IDENTMAP_ALL;

dmar_disabled = 0;
-#if defined(CONFIG_X86) && defined(CONFIG_SWIOTLB)
- swiotlb = 0;
-#endif
no_iommu = 0;

return 1;
@@ -4653,7 +4655,8 @@ int __init intel_iommu_init(void)
up_write(&dmar_global_lock);

#if defined(CONFIG_X86) && defined(CONFIG_SWIOTLB)
- swiotlb = 0;
+ if (!platform_has_untrusted_device())
+ swiotlb = 0;
#endif
dma_ops = &intel_dma_ops;

--
2.17.1

2019-04-21 01:34:23

by Lu Baolu

Subject: [PATCH v3 09/10] iommu/vt-d: Add dma sync ops for untrusted devices

This adds the DMA sync ops for DMA buffers used by untrusted
devices. Such buffers need explicit syncing because they might
have been mapped with bounce pages.
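
These ops are reached through the regular DMA API. For example (standard
DMA API calls, shown only for context), a driver that keeps a streaming
mapping around would bracket each CPU access with:

	dma_sync_single_for_cpu(dev, dma_handle, size, DMA_FROM_DEVICE);
	/* ... CPU reads the received data ... */
	dma_sync_single_for_device(dev, dma_handle, size, DMA_FROM_DEVICE);

which, for devices behind the Intel IOMMU, now ends up in
intel_sync_single_for_cpu()/intel_sync_single_for_device() below.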

Cc: Ashok Raj <[email protected]>
Cc: Jacob Pan <[email protected]>
Cc: Kevin Tian <[email protected]>
Signed-off-by: Lu Baolu <[email protected]>
Tested-by: Xu Pengfei <[email protected]>
Tested-by: Mika Westerberg <[email protected]>
---
drivers/iommu/Kconfig | 1 +
drivers/iommu/intel-iommu.c | 96 +++++++++++++++++++++++++++++++++----
2 files changed, 88 insertions(+), 9 deletions(-)

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index b918c22ca25b..f3191ec29e45 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -194,6 +194,7 @@ config INTEL_IOMMU
select IOMMU_IOVA
select NEED_DMA_MAP_STATE
select DMAR_TABLE
+ select IOMMU_BOUNCE_PAGE
help
DMA remapping (DMAR) devices support enables independent address
translations for Direct Memory Access (DMA) from devices.
diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 0d80f26b8a72..ed941ec9b9d5 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -3683,16 +3683,94 @@ static int intel_map_sg(struct device *dev, struct scatterlist *sglist, int nele
return nelems;
}

+static void
+intel_sync_single_for_cpu(struct device *dev, dma_addr_t addr,
+ size_t size, enum dma_data_direction dir)
+{
+ if (WARN_ON(dir == DMA_NONE))
+ return;
+
+ if (!device_needs_bounce(dev))
+ return;
+
+ if (iommu_no_mapping(dev))
+ return;
+
+ iommu_bounce_sync_single(dev, addr, size, dir, SYNC_FOR_CPU);
+}
+
+static void
+intel_sync_single_for_device(struct device *dev, dma_addr_t addr,
+ size_t size, enum dma_data_direction dir)
+{
+ if (WARN_ON(dir == DMA_NONE))
+ return;
+
+ if (!device_needs_bounce(dev))
+ return;
+
+ if (iommu_no_mapping(dev))
+ return;
+
+ iommu_bounce_sync_single(dev, addr, size, dir, SYNC_FOR_DEVICE);
+}
+
+static void
+intel_sync_sg_for_cpu(struct device *dev, struct scatterlist *sglist,
+ int nelems, enum dma_data_direction dir)
+{
+ struct scatterlist *sg;
+ int i;
+
+ if (WARN_ON(dir == DMA_NONE))
+ return;
+
+ if (!device_needs_bounce(dev))
+ return;
+
+ if (iommu_no_mapping(dev))
+ return;
+
+ for_each_sg(sglist, sg, nelems, i)
+ iommu_bounce_sync_single(dev, sg_dma_address(sg),
+ sg_dma_len(sg), dir, SYNC_FOR_CPU);
+}
+
+static void
+intel_sync_sg_for_device(struct device *dev, struct scatterlist *sglist,
+ int nelems, enum dma_data_direction dir)
+{
+ struct scatterlist *sg;
+ int i;
+
+ if (WARN_ON(dir == DMA_NONE))
+ return;
+
+ if (!device_needs_bounce(dev))
+ return;
+
+ if (iommu_no_mapping(dev))
+ return;
+
+ for_each_sg(sglist, sg, nelems, i)
+ iommu_bounce_sync_single(dev, sg_dma_address(sg),
+ sg_dma_len(sg), dir, SYNC_FOR_DEVICE);
+}
+
static const struct dma_map_ops intel_dma_ops = {
- .alloc = intel_alloc_coherent,
- .free = intel_free_coherent,
- .map_sg = intel_map_sg,
- .unmap_sg = intel_unmap_sg,
- .map_page = intel_map_page,
- .unmap_page = intel_unmap_page,
- .map_resource = intel_map_resource,
- .unmap_resource = intel_unmap_page,
- .dma_supported = dma_direct_supported,
+ .alloc = intel_alloc_coherent,
+ .free = intel_free_coherent,
+ .map_sg = intel_map_sg,
+ .unmap_sg = intel_unmap_sg,
+ .map_page = intel_map_page,
+ .unmap_page = intel_unmap_page,
+ .sync_single_for_cpu = intel_sync_single_for_cpu,
+ .sync_single_for_device = intel_sync_single_for_device,
+ .sync_sg_for_cpu = intel_sync_sg_for_cpu,
+ .sync_sg_for_device = intel_sync_sg_for_device,
+ .map_resource = intel_map_resource,
+ .unmap_resource = intel_unmap_page,
+ .dma_supported = dma_direct_supported,
};

static inline int iommu_domain_cache_init(void)
--
2.17.1

2019-04-21 01:34:30

by Lu Baolu

Subject: [PATCH v3 03/10] swiotlb: Limit tlb address range inside slot pool

In swiotlb_tbl_free_tlb(), when the tlb range is outside the
scope of the tlb slot pool, return directly with a warning
message. Otherwise, unrelated kernel data might be blindly
overwritten.

Cc: Konrad Rzeszutek Wilk <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Marek Szyprowski <[email protected]>
Cc: Robin Murphy <[email protected]>
Signed-off-by: Lu Baolu <[email protected]>
---
kernel/dma/swiotlb.c | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 173122d16b7f..dbb937ce79c8 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -560,6 +560,12 @@ swiotlb_tbl_free_tlb(struct device *hwdev, phys_addr_t tlb_addr, size_t size)
int i, count, nslots = ALIGN(size, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT;
int index = (tlb_addr - io_tlb_start) >> IO_TLB_SHIFT;

+ /* Return directly if the tlb address is out of slot pool. */
+ if (tlb_addr < io_tlb_start || tlb_addr + size > io_tlb_end) {
+ dev_warn(hwdev, "invalid tlb address\n");
+ return;
+ }
+
/*
* Return the buffer to the free list by setting the corresponding
* entries to indicate the number of contiguous entries available.
--
2.17.1

2019-04-21 01:34:31

by Lu Baolu

Subject: [PATCH v3 05/10] iommu: Add bounce page APIs

IOMMU hardware always uses paging for DMA remapping. The
minimum mapped window is one page. Device drivers may map
buffers that do not fill a whole IOMMU window, which allows
the device to access possibly unrelated memory, and a
malicious device can exploit this to perform a DMA attack.

This introduces a bounce buffer mechanism for DMA buffers
which don't fill a minimal IOMMU page. It can be used by
vendor-specific IOMMU drivers as long as the DMA domain is
managed by the generic IOMMU layer. The following APIs are
added:

* iommu_bounce_map(dev, addr, paddr, size, dir, attrs)
- Map a buffer starting at DMA address @addr using bounce
pages. For buffer parts that don't cover a whole minimal
IOMMU page, the bounce page policy is applied: a bounce
page mapped by swiotlb is used as the DMA target in the
IOMMU page table. Otherwise, the physical address @paddr
is mapped directly.

* iommu_bounce_unmap(dev, addr, size, dir, attrs)
- Unmap the buffer mapped with iommu_bounce_map(). The bounce
page is torn down after the bounced data has been synced.

* iommu_bounce_sync_single(dev, addr, size, dir, target)
- Sync the bounced data in case the bounce-mapped buffer is
reused.

The whole set of APIs is guarded by a kernel option,
IOMMU_BOUNCE_PAGE, so it can be compiled out where bounce
pages are not needed, for example in embedded configurations.
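
A vendor IOMMU driver opts in by selecting the new symbol from its own
Kconfig entry, as the Intel driver does later in this series
(PATCH 9/10), roughly:

	config INTEL_IOMMU
		...
		select IOMMU_BOUNCE_PAGE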

Cc: Ashok Raj <[email protected]>
Cc: Jacob Pan <[email protected]>
Cc: Kevin Tian <[email protected]>
Cc: Alan Cox <[email protected]>
Cc: Mika Westerberg <[email protected]>
Signed-off-by: Lu Baolu <[email protected]>
---
drivers/iommu/Kconfig | 14 +++
drivers/iommu/iommu.c | 275 ++++++++++++++++++++++++++++++++++++++++++
include/linux/iommu.h | 37 ++++++
3 files changed, 326 insertions(+)

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 6f07f3b21816..b918c22ca25b 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -85,6 +85,20 @@ config IOMMU_DEFAULT_PASSTHROUGH

If unsure, say N here.

+config IOMMU_BOUNCE_PAGE
+ bool "Use bounce page for untrusted devices"
+ depends on IOMMU_API
+ select SWIOTLB
+ help
+ IOMMU hardware always uses paging for DMA remapping. The minimum
+ mapped window is a page. Device drivers may map buffers that do
+ not fill a whole IOMMU window. This allows the device to access
+ possibly unrelated memory and a malicious device can exploit this
+ to perform a DMA attack. Select this to use a bounce page for a
+ buffer which doesn't fill a whole IOMMU page.
+
+ If unsure, say N here.
+
config OF_IOMMU
def_bool y
depends on OF && IOMMU_API
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 516e4d8995c2..f199d0addbf1 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -34,6 +34,7 @@
#include <linux/bitops.h>
#include <linux/property.h>
#include <linux/fsl/mc.h>
+#include <linux/dma-direct.h>
#include <trace/events/iommu.h>

static struct kset *iommu_group_kset;
@@ -2043,3 +2044,277 @@ int iommu_fwspec_add_ids(struct device *dev, u32 *ids, int num_ids)
return 0;
}
EXPORT_SYMBOL_GPL(iommu_fwspec_add_ids);
+
+#ifdef CONFIG_IOMMU_BOUNCE_PAGE
+
+/*
+ * Bounce buffer support for external devices:
+ *
+ * IOMMU hardware always uses paging for DMA remapping. The minimum mapped
+ * window is a page. Device drivers may map buffers that do not fill a
+ * whole IOMMU window. This allows the device to access possibly unrelated
+ * memory and a malicious device can exploit this to perform a DMA attack.
+ * Use a bounce page for a buffer which doesn't fill a whole IOMMU page.
+ */
+
+struct addr_walk {
+ int (*low)(struct device *dev, struct iommu_domain *domain,
+ dma_addr_t addr, phys_addr_t paddr,
+ size_t size, enum dma_data_direction dir,
+ unsigned long attrs, void *data);
+ int (*middle)(struct device *dev, struct iommu_domain *domain,
+ dma_addr_t addr, phys_addr_t paddr,
+ size_t size, enum dma_data_direction dir,
+ unsigned long attrs, void *data);
+ int (*high)(struct device *dev, struct iommu_domain *domain,
+ dma_addr_t addr, phys_addr_t paddr,
+ size_t size, enum dma_data_direction dir,
+ unsigned long attrs, void *data);
+};
+
+/* Calculate how many pages the range [addr, addr + size) crosses. */
+static inline unsigned long
+range_nrpages(dma_addr_t addr, size_t size, unsigned long page_size)
+{
+ unsigned long offset = page_size - 1;
+
+ return ALIGN((addr & offset) + size, page_size) >> __ffs(page_size);
+}
+
+static int nobounce_map(struct device *dev, struct iommu_domain *domain,
+ dma_addr_t addr, phys_addr_t paddr, size_t size,
+ enum dma_data_direction dir, unsigned long attrs,
+ void *data)
+{
+ const struct iommu_ops *ops = domain->ops;
+ int prot = 0;
+
+ if (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL)
+ prot |= IOMMU_READ;
+
+ if (dir == DMA_FROM_DEVICE || dir == DMA_BIDIRECTIONAL)
+ prot |= IOMMU_WRITE;
+
+ if (unlikely(!ops->map || domain->pgsize_bitmap == 0UL))
+ return -ENODEV;
+
+ return ops->map(domain, addr, paddr, size, prot);
+}
+
+static int nobounce_unmap(struct device *dev, struct iommu_domain *domain,
+ dma_addr_t addr, phys_addr_t paddr, size_t size,
+ enum dma_data_direction dir, unsigned long attrs,
+ void *data)
+{
+ const struct iommu_ops *ops = domain->ops;
+
+ if (unlikely(!ops->unmap))
+ return -ENODEV;
+ ops->unmap(domain, addr, size);
+
+ return 0;
+}
+
+static phys_addr_t
+iova_to_tlb_addr(struct iommu_domain *domain, dma_addr_t addr)
+{
+ unsigned long page_size = domain_minimal_pgsize(domain);
+ const struct iommu_ops *ops = domain->ops;
+ phys_addr_t tlb_addr;
+
+ if (unlikely(!ops->iova_to_phys))
+ return 0;
+
+ tlb_addr = ops->iova_to_phys(domain, addr);
+ if (!tlb_addr)
+ return 0;
+
+ return tlb_addr + (addr & (page_size - 1));
+}
+
+static int
+bounce_sync_single(struct device *dev, struct iommu_domain *domain,
+ dma_addr_t addr, phys_addr_t paddr, size_t size,
+ enum dma_data_direction dir, unsigned long attrs,
+ void *data)
+{
+ enum dma_sync_target *target = data;
+ phys_addr_t tlb_addr;
+
+ tlb_addr = iova_to_tlb_addr(domain, addr);
+ if (tlb_addr)
+ swiotlb_tbl_sync_single(dev, tlb_addr,
+ size, dir, *target);
+
+ return 0;
+}
+
+static int bounce_map(struct device *dev, struct iommu_domain *domain,
+ dma_addr_t addr, phys_addr_t paddr, size_t size,
+ enum dma_data_direction dir, unsigned long attrs,
+ void *data)
+{
+ const struct iommu_ops *ops = domain->ops;
+ phys_addr_t tlb_addr;
+ int prot = 0;
+ int ret;
+
+ if (unlikely(!ops->map || domain->pgsize_bitmap == 0UL))
+ return -ENODEV;
+
+ if (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL)
+ prot |= IOMMU_READ;
+
+ if (dir == DMA_FROM_DEVICE || dir == DMA_BIDIRECTIONAL)
+ prot |= IOMMU_WRITE;
+
+ tlb_addr = phys_to_dma(dev, paddr);
+ if (!swiotlb_map(dev, &paddr, &tlb_addr, size,
+ dir, attrs | DMA_ATTR_BOUNCE_PAGE))
+ return -ENOMEM;
+
+ ret = ops->map(domain, addr, tlb_addr, size, prot);
+ if (ret)
+ swiotlb_tbl_unmap_single(dev, tlb_addr, size,
+ dir, attrs | DMA_ATTR_BOUNCE_PAGE);
+
+ return ret;
+}
+
+static const struct addr_walk walk_bounce_map = {
+ .low = bounce_map,
+ .middle = nobounce_map,
+ .high = bounce_map,
+};
+
+static int bounce_unmap(struct device *dev, struct iommu_domain *domain,
+ dma_addr_t addr, phys_addr_t paddr, size_t size,
+ enum dma_data_direction dir, unsigned long attrs,
+ void *data)
+{
+ unsigned long page_size = domain_minimal_pgsize(domain);
+ phys_addr_t tlb_addr = iova_to_tlb_addr(domain, addr);
+ const struct iommu_ops *ops = domain->ops;
+
+ if (unlikely(!ops->unmap))
+ return -ENODEV;
+ ops->unmap(domain, ALIGN_DOWN(addr, page_size), page_size);
+
+ if (tlb_addr)
+ swiotlb_tbl_unmap_single(dev, tlb_addr, size,
+ dir, attrs | DMA_ATTR_BOUNCE_PAGE);
+
+ return 0;
+}
+
+static const struct addr_walk walk_bounce_unmap = {
+ .low = bounce_unmap,
+ .middle = nobounce_unmap,
+ .high = bounce_unmap,
+};
+
+static const struct addr_walk walk_bounce_sync_single = {
+ .low = bounce_sync_single,
+ .high = bounce_sync_single,
+};
+
+static int
+domain_walk_addr_range(const struct addr_walk *walk, struct device *dev,
+ struct iommu_domain *domain, dma_addr_t addr,
+ phys_addr_t paddr, size_t size,
+ enum dma_data_direction dir,
+ unsigned long attrs,
+ void *data)
+{
+ u64 page_size = domain_minimal_pgsize(domain);
+ u64 page_offset = page_size - 1;
+ u64 page_mask = ~page_offset;
+ u64 length = 0;
+ int ret;
+
+ /*
+ * The first IOMMU page is partial. Use a bounce buffer for it
+ * if necessary for security.
+ */
+ if (addr & page_offset) {
+ length = ALIGN(addr, page_size) - addr;
+ if (length > size)
+ length = size;
+ if (walk->low) {
+ ret = walk->low(dev, domain, addr, paddr,
+ length, dir, attrs, data);
+ if (ret)
+ return ret;
+ }
+
+ /* The buffer only covers one page. */
+ if (range_nrpages(addr, size, page_size) <= 1)
+ return 0;
+
+ size -= length;
+ addr = ALIGN(addr, page_size);
+ paddr = ALIGN(paddr, page_size);
+ }
+
+ /*
+ * There might be several pages in the middle which can be fully
+ * accessed by the device. It's unnecessary to use a bounce
+ * buffer for these pages.
+ */
+ if (size & page_mask) {
+ length = size & page_mask;
+ if (walk->middle) {
+ ret = walk->middle(dev, domain, addr, paddr,
+ length, dir, attrs, data);
+ if (ret)
+ return ret;
+ }
+
+ addr += size & page_mask;
+ paddr += size & page_mask;
+ size &= page_offset;
+ }
+
+ /*
+ * Okay, last page might be partial. Use bounce buffer if necessary.
+ */
+ if (size && walk->high)
+ return walk->high(dev, domain, addr, paddr,
+ size, dir, attrs, data);
+
+ return 0;
+}
+
+int iommu_bounce_map(struct device *dev, dma_addr_t addr, phys_addr_t paddr,
+ size_t size, enum dma_data_direction dir,
+ unsigned long attrs)
+{
+ struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
+
+ return domain_walk_addr_range(&walk_bounce_map, dev, domain,
+ addr, paddr, size, dir, attrs, NULL);
+}
+EXPORT_SYMBOL_GPL(iommu_bounce_map);
+
+int iommu_bounce_unmap(struct device *dev, dma_addr_t addr,
+ size_t size, enum dma_data_direction dir,
+ unsigned long attrs)
+{
+ struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
+
+ return domain_walk_addr_range(&walk_bounce_unmap, dev, domain,
+ addr, 0, size, dir, attrs, NULL);
+}
+EXPORT_SYMBOL_GPL(iommu_bounce_unmap);
+
+int iommu_bounce_sync_single(struct device *dev, dma_addr_t addr,
+ size_t size, enum dma_data_direction dir,
+ enum dma_sync_target target)
+{
+ struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
+
+ return domain_walk_addr_range(&walk_bounce_sync_single, dev,
+ domain, addr, 0, size, dir, 0, &target);
+}
+EXPORT_SYMBOL_GPL(iommu_bounce_sync_single);
+#endif /* CONFIG_IOMMU_BOUNCE_PAGE */
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 46679ef19b7e..93a4837dfe43 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -25,6 +25,7 @@
#include <linux/errno.h>
#include <linux/err.h>
#include <linux/of.h>
+#include <linux/swiotlb.h>

#define IOMMU_READ (1 << 0)
#define IOMMU_WRITE (1 << 1)
@@ -428,6 +429,42 @@ static inline void dev_iommu_fwspec_set(struct device *dev,
int iommu_probe_device(struct device *dev);
void iommu_release_device(struct device *dev);

+#ifdef CONFIG_IOMMU_BOUNCE_PAGE
+int iommu_bounce_map(struct device *dev, dma_addr_t addr, phys_addr_t paddr,
+ size_t size, enum dma_data_direction dir,
+ unsigned long attrs);
+int iommu_bounce_unmap(struct device *dev, dma_addr_t addr,
+ size_t size, enum dma_data_direction dir,
+ unsigned long attrs);
+int iommu_bounce_sync_single(struct device *dev, dma_addr_t addr,
+ size_t size, enum dma_data_direction dir,
+ enum dma_sync_target target);
+#else
+static inline int
+iommu_bounce_map(struct device *dev, dma_addr_t addr, phys_addr_t paddr,
+ size_t size, enum dma_data_direction dir,
+ unsigned long attrs)
+{
+ return -ENODEV;
+}
+
+static inline int
+iommu_bounce_unmap(struct device *dev, dma_addr_t addr,
+ size_t size, enum dma_data_direction dir,
+ unsigned long attrs)
+{
+ return -ENODEV;
+}
+
+static inline int
+iommu_bounce_sync_single(struct device *dev, dma_addr_t addr,
+ size_t size, enum dma_data_direction dir,
+ enum dma_sync_target target)
+{
+ return -ENODEV;
+}
+#endif /* CONFIG_IOMMU_BOUNCE_PAGE */
+
#else /* CONFIG_IOMMU_API */

struct iommu_ops {};
--
2.17.1

2019-04-21 01:34:45

by Lu Baolu

Subject: [PATCH v3 10/10] iommu/vt-d: Use bounce buffer for untrusted devices

The Intel VT-d hardware uses paging for DMA remapping. The
minimum mapped window is one page. Device drivers may map
buffers that do not fill the whole IOMMU window. This allows
the device to access possibly unrelated memory and a
malicious device could exploit this to perform DMA attacks.
To address this, the Intel IOMMU driver uses bounce pages for
those buffers which don't fill a whole IOMMU page.

Cc: Ashok Raj <[email protected]>
Cc: Jacob Pan <[email protected]>
Cc: Kevin Tian <[email protected]>
Signed-off-by: Lu Baolu <[email protected]>
Tested-by: Xu Pengfei <[email protected]>
Tested-by: Mika Westerberg <[email protected]>
---
drivers/iommu/intel-iommu.c | 138 ++++++++++++++++++++++++++----------
1 file changed, 99 insertions(+), 39 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index ed941ec9b9d5..52ccbd3f1425 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -52,6 +52,7 @@
#include <asm/irq_remapping.h>
#include <asm/cacheflush.h>
#include <asm/iommu.h>
+#include <trace/events/intel_iommu.h>

#include "irq_remapping.h"
#include "intel-pasid.h"
@@ -3410,15 +3411,17 @@ static int iommu_no_mapping(struct device *dev)
}

static dma_addr_t __intel_map_single(struct device *dev, phys_addr_t paddr,
- size_t size, int dir, u64 dma_mask)
+ size_t size, enum dma_data_direction dir,
+ unsigned long attrs, u64 dma_mask)
{
struct dmar_domain *domain;
- phys_addr_t start_paddr;
+ dma_addr_t start_dma;
unsigned long iova_pfn;
int prot = 0;
int ret;
struct intel_iommu *iommu;
unsigned long paddr_pfn = paddr >> PAGE_SHIFT;
+ unsigned long nrpages;

BUG_ON(dir == DMA_NONE);

@@ -3430,9 +3433,10 @@ static dma_addr_t __intel_map_single(struct device *dev, phys_addr_t paddr,
return DMA_MAPPING_ERROR;

iommu = domain_get_iommu(domain);
- size = aligned_nrpages(paddr, size);
+ nrpages = aligned_nrpages(paddr, size);

- iova_pfn = intel_alloc_iova(dev, domain, dma_to_mm_pfn(size), dma_mask);
+ iova_pfn = intel_alloc_iova(dev, domain,
+ dma_to_mm_pfn(nrpages), dma_mask);
if (!iova_pfn)
goto error;

@@ -3445,24 +3449,33 @@ static dma_addr_t __intel_map_single(struct device *dev, phys_addr_t paddr,
prot |= DMA_PTE_READ;
if (dir == DMA_FROM_DEVICE || dir == DMA_BIDIRECTIONAL)
prot |= DMA_PTE_WRITE;
+
+ start_dma = (dma_addr_t)iova_pfn << PAGE_SHIFT;
+ start_dma += offset_in_page(paddr);
+
/*
* paddr - (paddr + size) might be partial page, we should map the whole
* page. Note: if two part of one page are separately mapped, we
* might have two guest_addr mapping to the same host paddr, but this
* is not a big problem
*/
- ret = domain_pfn_mapping(domain, mm_to_dma_pfn(iova_pfn),
- mm_to_dma_pfn(paddr_pfn), size, prot);
+ if (device_needs_bounce(dev)) {
+ ret = iommu_bounce_map(dev, start_dma, paddr, size, dir, attrs);
+ if (!ret)
+ trace_bounce_map_single(dev, start_dma, paddr, size);
+ } else {
+ ret = domain_pfn_mapping(domain, mm_to_dma_pfn(iova_pfn),
+ mm_to_dma_pfn(paddr_pfn),
+ nrpages, prot);
+ }
if (ret)
goto error;

- start_paddr = (phys_addr_t)iova_pfn << PAGE_SHIFT;
- start_paddr += paddr & ~PAGE_MASK;
- return start_paddr;
-
+ return start_dma;
error:
if (iova_pfn)
- free_iova_fast(&domain->iovad, iova_pfn, dma_to_mm_pfn(size));
+ free_iova_fast(&domain->iovad, iova_pfn,
+ dma_to_mm_pfn(nrpages));
dev_err(dev, "Device request: %zx@%llx dir %d --- failed\n",
size, (unsigned long long)paddr, dir);
return DMA_MAPPING_ERROR;
@@ -3474,44 +3487,79 @@ static dma_addr_t intel_map_page(struct device *dev, struct page *page,
unsigned long attrs)
{
return __intel_map_single(dev, page_to_phys(page) + offset, size,
- dir, *dev->dma_mask);
+ dir, attrs, *dev->dma_mask);
}

static dma_addr_t intel_map_resource(struct device *dev, phys_addr_t phys_addr,
size_t size, enum dma_data_direction dir,
unsigned long attrs)
{
- return __intel_map_single(dev, phys_addr, size, dir, *dev->dma_mask);
+ return __intel_map_single(dev, phys_addr, size,
+ dir, attrs, *dev->dma_mask);
}

-static void intel_unmap(struct device *dev, dma_addr_t dev_addr, size_t size)
+static void
+intel_unmap(struct device *dev, dma_addr_t dev_addr, size_t size,
+ struct scatterlist *sglist, int nelems,
+ enum dma_data_direction dir, unsigned long attrs)
{
struct dmar_domain *domain;
unsigned long start_pfn, last_pfn;
- unsigned long nrpages;
+ unsigned long nrpages = 0;
unsigned long iova_pfn;
struct intel_iommu *iommu;
- struct page *freelist;
+ struct page *freelist = NULL;
+ struct pci_dev *pdev = NULL;

if (iommu_no_mapping(dev))
return;

+ if (dev_is_pci(dev))
+ pdev = to_pci_dev(dev);
+
domain = find_domain(dev);
BUG_ON(!domain);

iommu = domain_get_iommu(domain);

- iova_pfn = IOVA_PFN(dev_addr);
-
- nrpages = aligned_nrpages(dev_addr, size);
- start_pfn = mm_to_dma_pfn(iova_pfn);
- last_pfn = start_pfn + nrpages - 1;
-
- dev_dbg(dev, "Device unmapping: pfn %lx-%lx\n", start_pfn, last_pfn);
+ if (sglist) {
+ struct scatterlist *sg;
+ int i;

- freelist = domain_unmap(domain, start_pfn, last_pfn);
+ dev_addr = sg_dma_address(sglist) & PAGE_MASK;
+ iova_pfn = IOVA_PFN(dev_addr);
+ for_each_sg(sglist, sg, nelems, i) {
+ nrpages += aligned_nrpages(sg_dma_address(sg),
+ sg_dma_len(sg));
+ }
+ start_pfn = mm_to_dma_pfn(iova_pfn);
+ last_pfn = start_pfn + nrpages - 1;
+
+ if (device_needs_bounce(dev))
+ for_each_sg(sglist, sg, nelems, i) {
+ iommu_bounce_unmap(dev, sg_dma_address(sg),
+ sg->length, dir, attrs);
+ trace_bounce_unmap_sg(dev, i, nelems,
+ sg_dma_address(sg),
+ sg_phys(sg), sg->length);
+ }
+ else
+ freelist = domain_unmap(domain, start_pfn, last_pfn);
+ } else {
+ iova_pfn = IOVA_PFN(dev_addr);
+ nrpages = aligned_nrpages(dev_addr, size);
+ start_pfn = mm_to_dma_pfn(iova_pfn);
+ last_pfn = start_pfn + nrpages - 1;
+
+ if (device_needs_bounce(dev)) {
+ iommu_bounce_unmap(dev, dev_addr, size, dir, attrs);
+ trace_bounce_unmap_single(dev, dev_addr, size);
+ } else {
+ freelist = domain_unmap(domain, start_pfn, last_pfn);
+ }
+ }

- if (intel_iommu_strict) {
+ if (intel_iommu_strict || (pdev && pdev->untrusted)) {
iommu_flush_iotlb_psi(iommu, domain, start_pfn,
nrpages, !freelist, 0);
/* free iova */
@@ -3531,7 +3579,7 @@ static void intel_unmap_page(struct device *dev, dma_addr_t dev_addr,
size_t size, enum dma_data_direction dir,
unsigned long attrs)
{
- intel_unmap(dev, dev_addr, size);
+ intel_unmap(dev, dev_addr, size, NULL, 0, dir, attrs);
}

static void *intel_alloc_coherent(struct device *dev, size_t size,
@@ -3572,7 +3620,7 @@ static void *intel_alloc_coherent(struct device *dev, size_t size,
memset(page_address(page), 0, size);

*dma_handle = __intel_map_single(dev, page_to_phys(page), size,
- DMA_BIDIRECTIONAL,
+ DMA_BIDIRECTIONAL, attrs,
dev->coherent_dma_mask);
if (*dma_handle != DMA_MAPPING_ERROR)
return page_address(page);
@@ -3591,7 +3639,7 @@ static void intel_free_coherent(struct device *dev, size_t size, void *vaddr,
size = PAGE_ALIGN(size);
order = get_order(size);

- intel_unmap(dev, dma_handle, size);
+ intel_unmap(dev, dma_handle, size, NULL, 0, 0, attrs);
if (!dma_release_from_contiguous(dev, page, size >> PAGE_SHIFT))
__free_pages(page, order);
}
@@ -3600,16 +3648,7 @@ static void intel_unmap_sg(struct device *dev, struct scatterlist *sglist,
int nelems, enum dma_data_direction dir,
unsigned long attrs)
{
- dma_addr_t startaddr = sg_dma_address(sglist) & PAGE_MASK;
- unsigned long nrpages = 0;
- struct scatterlist *sg;
- int i;
-
- for_each_sg(sglist, sg, nelems, i) {
- nrpages += aligned_nrpages(sg_dma_address(sg), sg_dma_len(sg));
- }
-
- intel_unmap(dev, startaddr, nrpages << VTD_PAGE_SHIFT);
+ intel_unmap(dev, 0, 0, sglist, nelems, dir, attrs);
}

static int intel_nontranslate_map_sg(struct device *hddev,
@@ -3671,7 +3710,28 @@ static int intel_map_sg(struct device *dev, struct scatterlist *sglist, int nele

start_vpfn = mm_to_dma_pfn(iova_pfn);

- ret = domain_sg_mapping(domain, start_vpfn, sglist, size, prot);
+ if (device_needs_bounce(dev)) {
+ for_each_sg(sglist, sg, nelems, i) {
+ unsigned int pgoff = offset_in_page(sg->offset);
+ dma_addr_t addr;
+
+ addr = ((dma_addr_t)iova_pfn << PAGE_SHIFT) + pgoff;
+ ret = iommu_bounce_map(dev, addr, sg_phys(sg),
+ sg->length, dir, attrs);
+ if (ret)
+ break;
+
+ trace_bounce_map_sg(dev, i, nelems, addr,
+ sg_phys(sg), sg->length);
+
+ sg->dma_address = addr;
+ sg->dma_length = sg->length;
+ iova_pfn += aligned_nrpages(sg->offset, sg->length);
+ }
+ } else {
+ ret = domain_sg_mapping(domain, start_vpfn, sglist, size, prot);
+ }
+
if (unlikely(ret)) {
dma_pte_free_pagetable(domain, start_vpfn,
start_vpfn + size - 1,
--
2.17.1

2019-04-22 16:50:22

by Christoph Hellwig

Subject: Re: [PATCH v3 02/10] swiotlb: Factor out slot allocation and free

I looked over your swiotlb modifications and I don't think we really need
them. The only thing we really need is to split the size parameter to
swiotlb_tbl_map_single and swiotlb_tbl_unmap_single into an alloc_size
and a mapping_size parameter, where the latter one is rounded up to the
iommu page size. Below is an untested patch on top of your series to
show what I mean. That being said - both the current series and the one
with my patch will still leak the content of the swiotlb buffer
allocated but not used to the untrusted external device. Is that
acceptable? If not we need to clear that part, at which point you don't
need swiotlb changes. Another implication is that for untrusted devices
the size of the dma coherent allocations needs to be rounded up to the
iommu page size (if that can ever be larger than the host page size).
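
As a hypothetical worked example of that split, judging from the patch
below: for a 1200-byte streaming mapping on a domain with 4 KiB IOMMU
pages, mapping_size stays 1200 (only those bytes are bounced and
synced) while alloc_size is rounded up to 4096, so swiotlb still
reserves a whole aligned page for the IOMMU mapping.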

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 8c4a078fb041..eb5c32ad4443 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2151,10 +2151,13 @@ static int bounce_map(struct device *dev, struct iommu_domain *domain,
void *data)
{
const struct iommu_ops *ops = domain->ops;
+ unsigned long page_size = domain_minimal_pgsize(domain);
phys_addr_t tlb_addr;
int prot = 0;
int ret;

+ if (WARN_ON_ONCE(size > page_size))
+ return -EINVAL;
if (unlikely(!ops->map || domain->pgsize_bitmap == 0UL))
return -ENODEV;

@@ -2164,16 +2167,16 @@ static int bounce_map(struct device *dev, struct iommu_domain *domain,
if (dir == DMA_FROM_DEVICE || dir == DMA_BIDIRECTIONAL)
prot |= IOMMU_WRITE;

- tlb_addr = phys_to_dma(dev, paddr);
- if (!swiotlb_map(dev, &paddr, &tlb_addr, size,
- dir, attrs | DMA_ATTR_BOUNCE_PAGE))
+ tlb_addr = swiotlb_tbl_map_single(dev, __phys_to_dma(dev, io_tlb_start),
+ paddr, size, page_size, dir, attrs);
+ if (tlb_addr == DMA_MAPPING_ERROR)
return -ENOMEM;

ret = ops->map(domain, addr, tlb_addr, size, prot);
- if (ret)
- swiotlb_tbl_unmap_single(dev, tlb_addr, size,
- dir, attrs | DMA_ATTR_BOUNCE_PAGE);
-
+ if (ret) {
+ swiotlb_tbl_unmap_single(dev, tlb_addr, size, page_size,
+ dir, attrs);
+ }
return ret;
}

@@ -2194,11 +2197,12 @@ static int bounce_unmap(struct device *dev, struct iommu_domain *domain,

if (unlikely(!ops->unmap))
return -ENODEV;
- ops->unmap(domain, ALIGN_DOWN(addr, page_size), page_size);
+ ops->unmap(domain, addr, page_size);

- if (tlb_addr)
- swiotlb_tbl_unmap_single(dev, tlb_addr, size,
- dir, attrs | DMA_ATTR_BOUNCE_PAGE);
+ if (tlb_addr) {
+ swiotlb_tbl_unmap_single(dev, tlb_addr, size, page_size,
+ dir, attrs);
+ }

return 0;
}
diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index 877baf2a94f4..3b6ce643bffa 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -404,7 +404,7 @@ static dma_addr_t xen_swiotlb_map_page(struct device *dev, struct page *page,
*/
trace_swiotlb_bounced(dev, dev_addr, size, swiotlb_force);

- map = swiotlb_tbl_map_single(dev, start_dma_addr, phys, size, dir,
+ map = swiotlb_tbl_map_single(dev, start_dma_addr, phys, size, size, dir,
attrs);
if (map == DMA_MAPPING_ERROR)
return DMA_MAPPING_ERROR;
@@ -420,7 +420,7 @@ static dma_addr_t xen_swiotlb_map_page(struct device *dev, struct page *page,
return dev_addr;

attrs |= DMA_ATTR_SKIP_CPU_SYNC;
- swiotlb_tbl_unmap_single(dev, map, size, dir, attrs);
+ swiotlb_tbl_unmap_single(dev, map, size, size, dir, attrs);

return DMA_MAPPING_ERROR;
}
@@ -445,7 +445,7 @@ static void xen_unmap_single(struct device *hwdev, dma_addr_t dev_addr,

/* NOTE: We use dev_addr here, not paddr! */
if (is_xen_swiotlb_buffer(dev_addr))
- swiotlb_tbl_unmap_single(hwdev, paddr, size, dir, attrs);
+ swiotlb_tbl_unmap_single(hwdev, paddr, size, size, dir, attrs);
}

static void xen_swiotlb_unmap_page(struct device *hwdev, dma_addr_t dev_addr,
@@ -556,6 +556,7 @@ xen_swiotlb_map_sg_attrs(struct device *hwdev, struct scatterlist *sgl,
start_dma_addr,
sg_phys(sg),
sg->length,
+ sg->length,
dir, attrs);
if (map == DMA_MAPPING_ERROR) {
dev_warn(hwdev, "swiotlb buffer is full\n");
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 26e506e5b04c..75e60be91e5f 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -70,12 +70,6 @@
*/
#define DMA_ATTR_PRIVILEGED (1UL << 9)

-/*
- * DMA_ATTR_BOUNCE_PAGE: used by the IOMMU sub-system to indicate that
- * the buffer is used as a bounce page in the DMA remapping page table.
- */
-#define DMA_ATTR_BOUNCE_PAGE (1UL << 10)
-
/*
* A dma_addr_t can hold any valid DMA or bus address for the platform.
* It can be given to a device to use as a DMA source or target. A CPU cannot
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 361f62bb4a8e..e1c738eb5baf 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -44,16 +44,13 @@ enum dma_sync_target {
SYNC_FOR_DEVICE = 1,
};

-extern phys_addr_t swiotlb_tbl_map_single(struct device *hwdev,
- dma_addr_t tbl_dma_addr,
- phys_addr_t phys, size_t size,
- enum dma_data_direction dir,
- unsigned long attrs);
-
-extern void swiotlb_tbl_unmap_single(struct device *hwdev,
- phys_addr_t tlb_addr,
- size_t size, enum dma_data_direction dir,
- unsigned long attrs);
+phys_addr_t swiotlb_tbl_map_single(struct device *hwdev,
+ dma_addr_t tbl_dma_addr, phys_addr_t phys, size_t mapping_size,
+ size_t alloc_size, enum dma_data_direction dir,
+ unsigned long attrs);
+void swiotlb_tbl_unmap_single(struct device *hwdev, phys_addr_t tlb_addr,
+ size_t mapping_size, size_t alloc_size,
+ enum dma_data_direction dir, unsigned long attrs);

extern void swiotlb_tbl_sync_single(struct device *hwdev,
phys_addr_t tlb_addr,
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index fcdb23e8d2fc..0edc0bf80207 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -290,7 +290,7 @@ void dma_direct_unmap_page(struct device *dev, dma_addr_t addr,
dma_direct_sync_single_for_cpu(dev, addr, size, dir);

if (unlikely(is_swiotlb_buffer(phys)))
- swiotlb_tbl_unmap_single(dev, phys, size, dir, attrs);
+ swiotlb_tbl_unmap_single(dev, phys, size, size, dir, attrs);
}
EXPORT_SYMBOL(dma_direct_unmap_page);

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 96b87a11dee1..feec87196cc8 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -34,7 +34,6 @@
#include <linux/scatterlist.h>
#include <linux/mem_encrypt.h>
#include <linux/set_memory.h>
-#include <linux/iommu.h>
#ifdef CONFIG_DEBUG_FS
#include <linux/debugfs.h>
#endif
@@ -440,9 +439,10 @@ static void swiotlb_bounce(phys_addr_t orig_addr, phys_addr_t tlb_addr,
}
}

-static phys_addr_t
-swiotlb_tbl_alloc_tlb(struct device *hwdev, dma_addr_t tbl_dma_addr,
- phys_addr_t orig_addr, size_t size)
+phys_addr_t swiotlb_tbl_map_single(struct device *hwdev,
+ dma_addr_t tbl_dma_addr, phys_addr_t orig_addr,
+ size_t mapping_size, size_t alloc_size,
+ enum dma_data_direction dir, unsigned long attrs)
{
unsigned long flags;
phys_addr_t tlb_addr;
@@ -476,8 +476,8 @@ swiotlb_tbl_alloc_tlb(struct device *hwdev, dma_addr_t tbl_dma_addr,
* For mappings greater than or equal to a page, we limit the stride
* (and hence alignment) to a page size.
*/
- nslots = ALIGN(size, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT;
- if (size >= PAGE_SIZE)
+ nslots = ALIGN(alloc_size, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT;
+ if (alloc_size >= PAGE_SIZE)
stride = (1 << (PAGE_SHIFT - IO_TLB_SHIFT));
else
stride = 1;
@@ -538,6 +538,9 @@ swiotlb_tbl_alloc_tlb(struct device *hwdev, dma_addr_t tbl_dma_addr,

not_found:
spin_unlock_irqrestore(&io_tlb_lock, flags);
+ if (!(attrs & DMA_ATTR_NO_WARN) && printk_ratelimit())
+ dev_warn(hwdev, "swiotlb buffer is full (sz: %zd bytes)\n",
+ alloc_size);
return DMA_MAPPING_ERROR;
found:
io_tlb_used += nslots;
@@ -550,21 +553,34 @@ swiotlb_tbl_alloc_tlb(struct device *hwdev, dma_addr_t tbl_dma_addr,
*/
for (i = 0; i < nslots; i++)
io_tlb_orig_addr[index+i] = orig_addr + (i << IO_TLB_SHIFT);
+ if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
+ (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL))
+ swiotlb_bounce(orig_addr, tlb_addr, mapping_size,
+ DMA_TO_DEVICE);

return tlb_addr;
}

-static void
-swiotlb_tbl_free_tlb(struct device *hwdev, phys_addr_t tlb_addr, size_t size)
+/*
+ * tlb_addr is the physical address of the bounce buffer to unmap.
+ */
+void swiotlb_tbl_unmap_single(struct device *hwdev, phys_addr_t tlb_addr,
+ size_t mapping_size, size_t alloc_size,
+ enum dma_data_direction dir, unsigned long attrs)
{
unsigned long flags;
- int i, count, nslots = ALIGN(size, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT;
+ int i, count, nslots;
int index = (tlb_addr - io_tlb_start) >> IO_TLB_SHIFT;
+ phys_addr_t orig_addr = io_tlb_orig_addr[index];

- /* Return directly if the tlb address is out of slot pool. */
- if (tlb_addr < io_tlb_start || tlb_addr + size > io_tlb_end) {
- dev_warn(hwdev, "invalid tlb address\n");
- return;
+ /*
+ * First, sync the memory before unmapping the entry
+ */
+ if (orig_addr != INVALID_PHYS_ADDR &&
+ !(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
+ (dir == DMA_FROM_DEVICE || dir == DMA_BIDIRECTIONAL)) {
+ swiotlb_bounce(orig_addr, tlb_addr, mapping_size,
+ DMA_FROM_DEVICE);
}

/*
@@ -573,6 +589,7 @@ swiotlb_tbl_free_tlb(struct device *hwdev, phys_addr_t tlb_addr, size_t size)
* While returning the entries to the free list, we merge the entries
* with slots below and above the pool being returned.
*/
+ nslots = ALIGN(alloc_size, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT;
spin_lock_irqsave(&io_tlb_lock, flags);
{
count = ((index + nslots) < ALIGN(index + 1, IO_TLB_SEGSIZE) ?
@@ -597,88 +614,6 @@ swiotlb_tbl_free_tlb(struct device *hwdev, phys_addr_t tlb_addr, size_t size)
spin_unlock_irqrestore(&io_tlb_lock, flags);
}

-static unsigned long
-get_iommu_pgsize(struct device *dev, phys_addr_t phys, size_t size)
-{
- struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
-
- return domain_minimal_pgsize(domain);
-}
-
-phys_addr_t swiotlb_tbl_map_single(struct device *hwdev,
- dma_addr_t tbl_dma_addr,
- phys_addr_t orig_addr, size_t size,
- enum dma_data_direction dir,
- unsigned long attrs)
-{
- phys_addr_t tlb_addr;
- unsigned long offset = 0;
-
- if (attrs & DMA_ATTR_BOUNCE_PAGE) {
- unsigned long pgsize = get_iommu_pgsize(hwdev, orig_addr, size);
-
- offset = orig_addr & (pgsize - 1);
-
- /* Don't allow the buffer to cross page boundary. */
- if (offset + size > pgsize)
- return DMA_MAPPING_ERROR;
-
- tlb_addr = swiotlb_tbl_alloc_tlb(hwdev,
- __phys_to_dma(hwdev, io_tlb_start),
- ALIGN_DOWN(orig_addr, pgsize), pgsize);
- } else {
- tlb_addr = swiotlb_tbl_alloc_tlb(hwdev,
- tbl_dma_addr, orig_addr, size);
- }
-
- if (tlb_addr == DMA_MAPPING_ERROR) {
- if (!(attrs & DMA_ATTR_NO_WARN) && printk_ratelimit())
- dev_warn(hwdev, "swiotlb buffer is full (sz: %zd bytes)\n",
- size);
- return DMA_MAPPING_ERROR;
- }
-
- tlb_addr += offset;
- if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
- (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL))
- swiotlb_bounce(orig_addr, tlb_addr, size, DMA_TO_DEVICE);
-
- return tlb_addr;
-}
-
-/*
- * tlb_addr is the physical address of the bounce buffer to unmap.
- */
-void swiotlb_tbl_unmap_single(struct device *hwdev, phys_addr_t tlb_addr,
- size_t size, enum dma_data_direction dir,
- unsigned long attrs)
-{
- int index = (tlb_addr - io_tlb_start) >> IO_TLB_SHIFT;
- phys_addr_t orig_addr = io_tlb_orig_addr[index];
- unsigned long offset = 0;
-
- if (attrs & DMA_ATTR_BOUNCE_PAGE)
- offset = tlb_addr & ((1 << IO_TLB_SHIFT) - 1);
-
- /*
- * First, sync the memory before unmapping the entry
- */
- if (orig_addr != INVALID_PHYS_ADDR &&
- !(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
- ((dir == DMA_FROM_DEVICE) || (dir == DMA_BIDIRECTIONAL)))
- swiotlb_bounce(orig_addr + offset,
- tlb_addr, size, DMA_FROM_DEVICE);
-
- if (attrs & DMA_ATTR_BOUNCE_PAGE) {
- unsigned long pgsize = get_iommu_pgsize(hwdev, tlb_addr, size);
-
- swiotlb_tbl_free_tlb(hwdev,
- ALIGN_DOWN(tlb_addr, pgsize), pgsize);
- } else {
- swiotlb_tbl_free_tlb(hwdev, tlb_addr, size);
- }
-}
-
void swiotlb_tbl_sync_single(struct device *hwdev, phys_addr_t tlb_addr,
size_t size, enum dma_data_direction dir,
enum dma_sync_target target)
@@ -727,14 +662,14 @@ bool swiotlb_map(struct device *dev, phys_addr_t *phys, dma_addr_t *dma_addr,

/* Oh well, have to allocate and map a bounce buffer. */
*phys = swiotlb_tbl_map_single(dev, __phys_to_dma(dev, io_tlb_start),
- *phys, size, dir, attrs);
+ *phys, size, size, dir, attrs);
if (*phys == DMA_MAPPING_ERROR)
return false;

/* Ensure that the address returned is DMA'ble */
*dma_addr = __phys_to_dma(dev, *phys);
if (unlikely(!dma_capable(dev, *dma_addr, size))) {
- swiotlb_tbl_unmap_single(dev, *phys, size, dir,
+ swiotlb_tbl_unmap_single(dev, *phys, size, size, dir,
attrs | DMA_ATTR_SKIP_CPU_SYNC);
return false;
}

2019-04-22 16:52:08

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH v3 07/10] iommu/vt-d: Keep swiotlb on if bounce page is necessary

On Sun, Apr 21, 2019 at 09:17:16AM +0800, Lu Baolu wrote:
> +static inline bool platform_has_untrusted_device(void)
> {
> + bool has_untrusted_device = false;
> struct pci_dev *pdev = NULL;
>
> for_each_pci_dev(pdev) {
> if (pdev->untrusted) {
> + has_untrusted_device = true;
> break;
> }
> }
>
> + return has_untrusted_device;

This shouldn't really be in the intel-iommu driver, should it?
This probably should be something like pci_has_untrusted_devices
and be moved to the PCI code.
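
A minimal sketch of the suggested helper might look like the following
(the name, its home under drivers/pci/, and the reference handling are
assumptions for illustration, not code from this thread):

bool pci_has_untrusted_devices(void)
{
        struct pci_dev *pdev = NULL;

        for_each_pci_dev(pdev) {
                if (pdev->untrusted) {
                        /* Drop the reference held by the iterator. */
                        pci_dev_put(pdev);
                        return true;
                }
        }

        return false;
}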

2019-04-22 16:52:09

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH v3 08/10] iommu/vt-d: Check whether device requires bounce buffer

On Sun, Apr 21, 2019 at 09:17:17AM +0800, Lu Baolu wrote:
> +static inline bool device_needs_bounce(struct device *dev)
> +{
> + struct pci_dev *pdev = NULL;
> +
> + if (intel_no_bounce)
> + return false;
> +
> + if (dev_is_pci(dev))
> + pdev = to_pci_dev(dev);
> +
> + return pdev ? pdev->untrusted : false;
> +}

Again, this and the option should not be in a specific iommu driver.
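
For illustration, a driver-agnostic check could be as small as the sketch
below (the helper name and its placement outside intel-iommu are
assumptions, not something posted in this series):

static inline bool dev_is_untrusted(struct device *dev)
{
        return dev_is_pci(dev) && to_pci_dev(dev)->untrusted;
}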

2019-04-23 04:12:27

by Lu Baolu

[permalink] [raw]
Subject: Re: [PATCH v3 07/10] iommu/vt-d: Keep swiotlb on if bounce page is necessary

Hi,

On 4/23/19 12:47 AM, Christoph Hellwig wrote:
> On Sun, Apr 21, 2019 at 09:17:16AM +0800, Lu Baolu wrote:
>> +static inline bool platform_has_untrusted_device(void)
>> {
>> + bool has_untrusted_device = false;
>> struct pci_dev *pdev = NULL;
>>
>> for_each_pci_dev(pdev) {
>> if (pdev->untrusted) {
>> + has_untrusted_device = true;
>> break;
>> }
>> }
>>
>> + return has_untrusted_device;
>
> This shouldn't really be in the intel-iommu driver, should it?
> This probably should be something like pci_has_untrusted_devices
> and be moved to the PCI code.
>

Fair enough.

Best regards,
Lu Baolu

2019-04-23 04:13:02

by Lu Baolu

[permalink] [raw]
Subject: Re: [PATCH v3 02/10] swiotlb: Factor out slot allocation and free

Hi Christoph,

Thanks for reviewing my patches.

On 4/23/19 12:45 AM, Christoph Hellwig wrote:
> I looked over your swiotlb modification and I don't think we really need
> them. The only thing we really need is to split the size parameter to
> swiotlb_tbl_map_single and swiotlb_tbl_unmap_single into an alloc_size
> and a mapping_size paramter, where the latter one is rounded up to the
> iommu page size. Below is an untested patch on top of your series to
> show what I mean.

Good suggestion. The only problem as far as I can see is:

442 phys_addr_t swiotlb_tbl_map_single(struct device *hwdev,
443 dma_addr_t tbl_dma_addr, phys_addr_t orig_addr,
444 size_t mapping_size, size_t alloc_size,
445 enum dma_data_direction dir, unsigned long attrs)
446 {
447 unsigned long flags;
448 phys_addr_t tlb_addr;
449 unsigned int nslots, stride, index, wrap;
450 int i;
451 unsigned long mask;
452 unsigned long offset_slots;
453 unsigned long max_slots;

[--->o<----]

545 found:
546 io_tlb_used += nslots;
547 spin_unlock_irqrestore(&io_tlb_lock, flags);
548
549 /*
550 * Save away the mapping from the original address to the DMA address.
551 * This is needed when we sync the memory. Then we sync the buffer if
552 * needed.
553 */
554 for (i = 0; i < nslots; i++)
555 io_tlb_orig_addr[index+i] = orig_addr + (i << IO_TLB_SHIFT);

Could the tlb orig address be set to PAGE_ALIGN_DOWN(orig_addr)? We
can't assume the bounce buffer just starts from the beginning of the
slot. Or is there anything I missed?


556 if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
557 (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL))
558 swiotlb_bounce(orig_addr, tlb_addr, mapping_size,
559 DMA_TO_DEVICE);

Same here. We should sync from the place where the bounce buffer starts.


560
561 return tlb_addr;
562 }


> That being said - both the current series and the one
> with my patch will still leak the content of the swiotlb buffer
> allocated but not used to the untrusted external device. Is that
> acceptable? If not we need to clear that part, at which point you don't
> need swiotlb changes.

Good catch. I think the allocated buffer should be cleared; otherwise,
the untrusted device still has a chance to access data belonging to
other swiotlb consumers.
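
A rough sketch of such clearing, placed right after the slots are
allocated in swiotlb_tbl_map_single() above (mapping_size/alloc_size as
in the patch; illustrative only, not tested):

        /* Zero the slack between the real mapping and the allocated
         * slots so an untrusted device cannot read stale data left
         * behind by a previous swiotlb user. */
        if (alloc_size > mapping_size)
                memset(phys_to_virt(tlb_addr) + mapping_size, 0,
                       alloc_size - mapping_size);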

> Another implication is that for untrusted devices
> the size of the dma coherent allocations needs to be rounded up to the
> iommu page size (if that can ever be larger than the host page size).

Agreed.

The Intel IOMMU driver already aligns the DMA coherent allocation size
to PAGE_SIZE. And alloc_coherent is just alloc plus map, so it
eventually goes through bounce_map().
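
A sketch of that "alloc plus map" shape (illustrative only; the function
names below are placeholders, not the actual intel-iommu code):

static void *bounce_alloc_coherent(struct device *dev, size_t size,
                                   dma_addr_t *dma_handle, gfp_t gfp)
{
        size_t aligned = PAGE_ALIGN(size);
        struct page *page = alloc_pages(gfp, get_order(aligned));

        if (!page)
                return NULL;

        /* The map step is where bounce_map() applies for untrusted devices. */
        *dma_handle = bounce_map_single(dev, page_to_phys(page), aligned,
                                        DMA_BIDIRECTIONAL);
        if (*dma_handle == DMA_MAPPING_ERROR) {
                __free_pages(page, get_order(aligned));
                return NULL;
        }

        return page_address(page);
}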

Best regards,
Lu Baolu

>
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 8c4a078fb041..eb5c32ad4443 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -2151,10 +2151,13 @@ static int bounce_map(struct device *dev, struct iommu_domain *domain,
> void *data)
> {
> const struct iommu_ops *ops = domain->ops;
> + unsigned long page_size = domain_minimal_pgsize(domain);
> phys_addr_t tlb_addr;
> int prot = 0;
> int ret;
>
> + if (WARN_ON_ONCE(size > page_size))
> + return -EINVAL;
> if (unlikely(!ops->map || domain->pgsize_bitmap == 0UL))
> return -ENODEV;
>
> @@ -2164,16 +2167,16 @@ static int bounce_map(struct device *dev, struct iommu_domain *domain,
> if (dir == DMA_FROM_DEVICE || dir == DMA_BIDIRECTIONAL)
> prot |= IOMMU_WRITE;
>
> - tlb_addr = phys_to_dma(dev, paddr);
> - if (!swiotlb_map(dev, &paddr, &tlb_addr, size,
> - dir, attrs | DMA_ATTR_BOUNCE_PAGE))
> + tlb_addr = swiotlb_tbl_map_single(dev, __phys_to_dma(dev, io_tlb_start),
> + paddr, size, page_size, dir, attrs);
> + if (tlb_addr == DMA_MAPPING_ERROR)
> return -ENOMEM;
>
> ret = ops->map(domain, addr, tlb_addr, size, prot);
> - if (ret)
> - swiotlb_tbl_unmap_single(dev, tlb_addr, size,
> - dir, attrs | DMA_ATTR_BOUNCE_PAGE);
> -
> + if (ret) {
> + swiotlb_tbl_unmap_single(dev, tlb_addr, size, page_size,
> + dir, attrs);
> + }
> return ret;
> }
>
> @@ -2194,11 +2197,12 @@ static int bounce_unmap(struct device *dev, struct iommu_domain *domain,
>
> if (unlikely(!ops->unmap))
> return -ENODEV;
> - ops->unmap(domain, ALIGN_DOWN(addr, page_size), page_size);
> + ops->unmap(domain, addr, page_size);
>
> - if (tlb_addr)
> - swiotlb_tbl_unmap_single(dev, tlb_addr, size,
> - dir, attrs | DMA_ATTR_BOUNCE_PAGE);
> + if (tlb_addr) {
> + swiotlb_tbl_unmap_single(dev, tlb_addr, size, page_size,
> + dir, attrs);
> + }
>
> return 0;
> }
> diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
> index 877baf2a94f4..3b6ce643bffa 100644
> --- a/drivers/xen/swiotlb-xen.c
> +++ b/drivers/xen/swiotlb-xen.c
> @@ -404,7 +404,7 @@ static dma_addr_t xen_swiotlb_map_page(struct device *dev, struct page *page,
> */
> trace_swiotlb_bounced(dev, dev_addr, size, swiotlb_force);
>
> - map = swiotlb_tbl_map_single(dev, start_dma_addr, phys, size, dir,
> + map = swiotlb_tbl_map_single(dev, start_dma_addr, phys, size, size, dir,
> attrs);
> if (map == DMA_MAPPING_ERROR)
> return DMA_MAPPING_ERROR;
> @@ -420,7 +420,7 @@ static dma_addr_t xen_swiotlb_map_page(struct device *dev, struct page *page,
> return dev_addr;
>
> attrs |= DMA_ATTR_SKIP_CPU_SYNC;
> - swiotlb_tbl_unmap_single(dev, map, size, dir, attrs);
> + swiotlb_tbl_unmap_single(dev, map, size, size, dir, attrs);
>
> return DMA_MAPPING_ERROR;
> }
> @@ -445,7 +445,7 @@ static void xen_unmap_single(struct device *hwdev, dma_addr_t dev_addr,
>
> /* NOTE: We use dev_addr here, not paddr! */
> if (is_xen_swiotlb_buffer(dev_addr))
> - swiotlb_tbl_unmap_single(hwdev, paddr, size, dir, attrs);
> + swiotlb_tbl_unmap_single(hwdev, paddr, size, size, dir, attrs);
> }
>
> static void xen_swiotlb_unmap_page(struct device *hwdev, dma_addr_t dev_addr,
> @@ -556,6 +556,7 @@ xen_swiotlb_map_sg_attrs(struct device *hwdev, struct scatterlist *sgl,
> start_dma_addr,
> sg_phys(sg),
> sg->length,
> + sg->length,
> dir, attrs);
> if (map == DMA_MAPPING_ERROR) {
> dev_warn(hwdev, "swiotlb buffer is full\n");
> diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
> index 26e506e5b04c..75e60be91e5f 100644
> --- a/include/linux/dma-mapping.h
> +++ b/include/linux/dma-mapping.h
> @@ -70,12 +70,6 @@
> */
> #define DMA_ATTR_PRIVILEGED (1UL << 9)
>
> -/*
> - * DMA_ATTR_BOUNCE_PAGE: used by the IOMMU sub-system to indicate that
> - * the buffer is used as a bounce page in the DMA remapping page table.
> - */
> -#define DMA_ATTR_BOUNCE_PAGE (1UL << 10)
> -
> /*
> * A dma_addr_t can hold any valid DMA or bus address for the platform.
> * It can be given to a device to use as a DMA source or target. A CPU cannot
> diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
> index 361f62bb4a8e..e1c738eb5baf 100644
> --- a/include/linux/swiotlb.h
> +++ b/include/linux/swiotlb.h
> @@ -44,16 +44,13 @@ enum dma_sync_target {
> SYNC_FOR_DEVICE = 1,
> };
>
> -extern phys_addr_t swiotlb_tbl_map_single(struct device *hwdev,
> - dma_addr_t tbl_dma_addr,
> - phys_addr_t phys, size_t size,
> - enum dma_data_direction dir,
> - unsigned long attrs);
> -
> -extern void swiotlb_tbl_unmap_single(struct device *hwdev,
> - phys_addr_t tlb_addr,
> - size_t size, enum dma_data_direction dir,
> - unsigned long attrs);
> +phys_addr_t swiotlb_tbl_map_single(struct device *hwdev,
> + dma_addr_t tbl_dma_addr, phys_addr_t phys, size_t mapping_size,
> + size_t alloc_size, enum dma_data_direction dir,
> + unsigned long attrs);
> +void swiotlb_tbl_unmap_single(struct device *hwdev, phys_addr_t tlb_addr,
> + size_t mapping_size, size_t alloc_size,
> + enum dma_data_direction dir, unsigned long attrs);
>
> extern void swiotlb_tbl_sync_single(struct device *hwdev,
> phys_addr_t tlb_addr,
> diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
> index fcdb23e8d2fc..0edc0bf80207 100644
> --- a/kernel/dma/direct.c
> +++ b/kernel/dma/direct.c
> @@ -290,7 +290,7 @@ void dma_direct_unmap_page(struct device *dev, dma_addr_t addr,
> dma_direct_sync_single_for_cpu(dev, addr, size, dir);
>
> if (unlikely(is_swiotlb_buffer(phys)))
> - swiotlb_tbl_unmap_single(dev, phys, size, dir, attrs);
> + swiotlb_tbl_unmap_single(dev, phys, size, size, dir, attrs);
> }
> EXPORT_SYMBOL(dma_direct_unmap_page);
>
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index 96b87a11dee1..feec87196cc8 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -34,7 +34,6 @@
> #include <linux/scatterlist.h>
> #include <linux/mem_encrypt.h>
> #include <linux/set_memory.h>
> -#include <linux/iommu.h>
> #ifdef CONFIG_DEBUG_FS
> #include <linux/debugfs.h>
> #endif
> @@ -440,9 +439,10 @@ static void swiotlb_bounce(phys_addr_t orig_addr, phys_addr_t tlb_addr,
> }
> }
>
> -static phys_addr_t
> -swiotlb_tbl_alloc_tlb(struct device *hwdev, dma_addr_t tbl_dma_addr,
> - phys_addr_t orig_addr, size_t size)
> +phys_addr_t swiotlb_tbl_map_single(struct device *hwdev,
> + dma_addr_t tbl_dma_addr, phys_addr_t orig_addr,
> + size_t mapping_size, size_t alloc_size,
> + enum dma_data_direction dir, unsigned long attrs)
> {
> unsigned long flags;
> phys_addr_t tlb_addr;
> @@ -476,8 +476,8 @@ swiotlb_tbl_alloc_tlb(struct device *hwdev, dma_addr_t tbl_dma_addr,
> * For mappings greater than or equal to a page, we limit the stride
> * (and hence alignment) to a page size.
> */
> - nslots = ALIGN(size, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT;
> - if (size >= PAGE_SIZE)
> + nslots = ALIGN(alloc_size, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT;
> + if (alloc_size >= PAGE_SIZE)
> stride = (1 << (PAGE_SHIFT - IO_TLB_SHIFT));
> else
> stride = 1;
> @@ -538,6 +538,9 @@ swiotlb_tbl_alloc_tlb(struct device *hwdev, dma_addr_t tbl_dma_addr,
>
> not_found:
> spin_unlock_irqrestore(&io_tlb_lock, flags);
> + if (!(attrs & DMA_ATTR_NO_WARN) && printk_ratelimit())
> + dev_warn(hwdev, "swiotlb buffer is full (sz: %zd bytes)\n",
> + alloc_size);
> return DMA_MAPPING_ERROR;
> found:
> io_tlb_used += nslots;
> @@ -550,21 +553,34 @@ swiotlb_tbl_alloc_tlb(struct device *hwdev, dma_addr_t tbl_dma_addr,
> */
> for (i = 0; i < nslots; i++)
> io_tlb_orig_addr[index+i] = orig_addr + (i << IO_TLB_SHIFT);
> + if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
> + (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL))
> + swiotlb_bounce(orig_addr, tlb_addr, mapping_size,
> + DMA_TO_DEVICE);
>
> return tlb_addr;
> }
>
> -static void
> -swiotlb_tbl_free_tlb(struct device *hwdev, phys_addr_t tlb_addr, size_t size)
> +/*
> + * tlb_addr is the physical address of the bounce buffer to unmap.
> + */
> +void swiotlb_tbl_unmap_single(struct device *hwdev, phys_addr_t tlb_addr,
> + size_t mapping_size, size_t alloc_size,
> + enum dma_data_direction dir, unsigned long attrs)
> {
> unsigned long flags;
> - int i, count, nslots = ALIGN(size, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT;
> + int i, count, nslots;
> int index = (tlb_addr - io_tlb_start) >> IO_TLB_SHIFT;
> + phys_addr_t orig_addr = io_tlb_orig_addr[index];
>
> - /* Return directly if the tlb address is out of slot pool. */
> - if (tlb_addr < io_tlb_start || tlb_addr + size > io_tlb_end) {
> - dev_warn(hwdev, "invalid tlb address\n");
> - return;
> + /*
> + * First, sync the memory before unmapping the entry
> + */
> + if (orig_addr != INVALID_PHYS_ADDR &&
> + !(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
> + (dir == DMA_FROM_DEVICE || dir == DMA_BIDIRECTIONAL)) {
> + swiotlb_bounce(orig_addr, tlb_addr, mapping_size,
> + DMA_FROM_DEVICE);
> }
>
> /*
> @@ -573,6 +589,7 @@ swiotlb_tbl_free_tlb(struct device *hwdev, phys_addr_t tlb_addr, size_t size)
> * While returning the entries to the free list, we merge the entries
> * with slots below and above the pool being returned.
> */
> + nslots = ALIGN(alloc_size, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT;
> spin_lock_irqsave(&io_tlb_lock, flags);
> {
> count = ((index + nslots) < ALIGN(index + 1, IO_TLB_SEGSIZE) ?
> @@ -597,88 +614,6 @@ swiotlb_tbl_free_tlb(struct device *hwdev, phys_addr_t tlb_addr, size_t size)
> spin_unlock_irqrestore(&io_tlb_lock, flags);
> }
>
> -static unsigned long
> -get_iommu_pgsize(struct device *dev, phys_addr_t phys, size_t size)
> -{
> - struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
> -
> - return domain_minimal_pgsize(domain);
> -}
> -
> -phys_addr_t swiotlb_tbl_map_single(struct device *hwdev,
> - dma_addr_t tbl_dma_addr,
> - phys_addr_t orig_addr, size_t size,
> - enum dma_data_direction dir,
> - unsigned long attrs)
> -{
> - phys_addr_t tlb_addr;
> - unsigned long offset = 0;
> -
> - if (attrs & DMA_ATTR_BOUNCE_PAGE) {
> - unsigned long pgsize = get_iommu_pgsize(hwdev, orig_addr, size);
> -
> - offset = orig_addr & (pgsize - 1);
> -
> - /* Don't allow the buffer to cross page boundary. */
> - if (offset + size > pgsize)
> - return DMA_MAPPING_ERROR;
> -
> - tlb_addr = swiotlb_tbl_alloc_tlb(hwdev,
> - __phys_to_dma(hwdev, io_tlb_start),
> - ALIGN_DOWN(orig_addr, pgsize), pgsize);
> - } else {
> - tlb_addr = swiotlb_tbl_alloc_tlb(hwdev,
> - tbl_dma_addr, orig_addr, size);
> - }
> -
> - if (tlb_addr == DMA_MAPPING_ERROR) {
> - if (!(attrs & DMA_ATTR_NO_WARN) && printk_ratelimit())
> - dev_warn(hwdev, "swiotlb buffer is full (sz: %zd bytes)\n",
> - size);
> - return DMA_MAPPING_ERROR;
> - }
> -
> - tlb_addr += offset;
> - if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
> - (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL))
> - swiotlb_bounce(orig_addr, tlb_addr, size, DMA_TO_DEVICE);
> -
> - return tlb_addr;
> -}
> -
> -/*
> - * tlb_addr is the physical address of the bounce buffer to unmap.
> - */
> -void swiotlb_tbl_unmap_single(struct device *hwdev, phys_addr_t tlb_addr,
> - size_t size, enum dma_data_direction dir,
> - unsigned long attrs)
> -{
> - int index = (tlb_addr - io_tlb_start) >> IO_TLB_SHIFT;
> - phys_addr_t orig_addr = io_tlb_orig_addr[index];
> - unsigned long offset = 0;
> -
> - if (attrs & DMA_ATTR_BOUNCE_PAGE)
> - offset = tlb_addr & ((1 << IO_TLB_SHIFT) - 1);
> -
> - /*
> - * First, sync the memory before unmapping the entry
> - */
> - if (orig_addr != INVALID_PHYS_ADDR &&
> - !(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
> - ((dir == DMA_FROM_DEVICE) || (dir == DMA_BIDIRECTIONAL)))
> - swiotlb_bounce(orig_addr + offset,
> - tlb_addr, size, DMA_FROM_DEVICE);
> -
> - if (attrs & DMA_ATTR_BOUNCE_PAGE) {
> - unsigned long pgsize = get_iommu_pgsize(hwdev, tlb_addr, size);
> -
> - swiotlb_tbl_free_tlb(hwdev,
> - ALIGN_DOWN(tlb_addr, pgsize), pgsize);
> - } else {
> - swiotlb_tbl_free_tlb(hwdev, tlb_addr, size);
> - }
> -}
> -
> void swiotlb_tbl_sync_single(struct device *hwdev, phys_addr_t tlb_addr,
> size_t size, enum dma_data_direction dir,
> enum dma_sync_target target)
> @@ -727,14 +662,14 @@ bool swiotlb_map(struct device *dev, phys_addr_t *phys, dma_addr_t *dma_addr,
>
> /* Oh well, have to allocate and map a bounce buffer. */
> *phys = swiotlb_tbl_map_single(dev, __phys_to_dma(dev, io_tlb_start),
> - *phys, size, dir, attrs);
> + *phys, size, size, dir, attrs);
> if (*phys == DMA_MAPPING_ERROR)
> return false;
>
> /* Ensure that the address returned is DMA'ble */
> *dma_addr = __phys_to_dma(dev, *phys);
> if (unlikely(!dma_capable(dev, *dma_addr, size))) {
> - swiotlb_tbl_unmap_single(dev, *phys, size, dir,
> + swiotlb_tbl_unmap_single(dev, *phys, size, size, dir,
> attrs | DMA_ATTR_SKIP_CPU_SYNC);
> return false;
> }
>

2019-04-23 04:14:01

by Lu Baolu

[permalink] [raw]
Subject: Re: [PATCH v3 08/10] iommu/vt-d: Check whether device requires bounce buffer

Hi,

On 4/23/19 12:47 AM, Christoph Hellwig wrote:
> On Sun, Apr 21, 2019 at 09:17:17AM +0800, Lu Baolu wrote:
>> +static inline bool device_needs_bounce(struct device *dev)
>> +{
>> + struct pci_dev *pdev = NULL;
>> +
>> + if (intel_no_bounce)
>> + return false;
>> +
>> + if (dev_is_pci(dev))
>> + pdev = to_pci_dev(dev);
>> +
>> + return pdev ? pdev->untrusted : false;
>> +}
>
> Again, this and the option should not be in a specific iommu driver.
>

The option to disable bounce buffering should be in the specific IOMMU
driver. Or do you have any other thoughts?

Best regards,
Lu Baolu

2019-04-23 06:10:21

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH v3 08/10] iommu/vt-d: Check whether device requires bounce buffer

>> Again, this and the option should not be in a specific iommu driver.
>>
>
> The option of whether bounce is ignored should be in the specific iommu
> driver.

Why? As a user I could not care less which IOMMU driver my particular
system uses.

2019-04-23 06:13:59

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH v3 02/10] swiotlb: Factor out slot allocation and free

On Tue, Apr 23, 2019 at 09:58:19AM +0800, Lu Baolu wrote:
> 554 for (i = 0; i < nslots; i++)
> 555 io_tlb_orig_addr[index+i] = orig_addr + (i << IO_TLB_SHIFT);
>
> Could the tlb orig address set to PAGE_ALIGN_DOWN(orig_addr)? We
> couldn't assume the bounce buffer just starts from the beginning of the
> slot. Or anything I missed?

I don't see why we need to align the orig_addr. We only use
io_tlb_orig_addr to find the address(es) for the swiotlb_bounce calls,
and I don't see a good reason why we'd need to align those.
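
For context, the consumer side mirrors the unmap path in the patch
above; a simplified excerpt (nothing here assumes any particular
alignment of the recorded address):

        int index = (tlb_addr - io_tlb_start) >> IO_TLB_SHIFT;
        phys_addr_t orig_addr = io_tlb_orig_addr[index];

        /* orig_addr is only ever the memcpy partner of the bounce slot. */
        if (orig_addr != INVALID_PHYS_ADDR)
                swiotlb_bounce(orig_addr, tlb_addr, mapping_size,
                               DMA_FROM_DEVICE);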

2019-04-23 07:40:13

by Lu Baolu

[permalink] [raw]
Subject: Re: [PATCH v3 02/10] swiotlb: Factor out slot allocation and free

Hi Christoph,

On 4/23/19 2:12 PM, Christoph Hellwig wrote:
> On Tue, Apr 23, 2019 at 09:58:19AM +0800, Lu Baolu wrote:
>> 554 for (i = 0; i < nslots; i++)
>> 555 io_tlb_orig_addr[index+i] = orig_addr + (i << IO_TLB_SHIFT);
>>
>> Could the tlb orig address set to PAGE_ALIGN_DOWN(orig_addr)? We
>> couldn't assume the bounce buffer just starts from the beginning of the
>> slot. Or anything I missed?
>
> I don't see why we need to align the orig_addr. We only use
> io_tlb_orig_addr to find the address(es) for the swiotlb_bounce calls,
> and I don't see a good reason why we'd need to align those.
>

Let me show you an example. Normally, when the IOMMU is on, the device
DMAs with an IOVA, and the IOMMU is responsible for translating the IOVA
to a physical address in paging mode.

                         Physical
    IOVA                  Buffer
 .---------.            .---------.
 |  IOMMU  |            |  IOMMU  |
 |  PAGE   |            |  PAGE   |
 .---------. ---------> .---------.
 | Buffer  |            | Buffer  |
 '---------'            '---------'
 |         |            |         |
 |         |            |         |
 '---------'            '---------'
        .-------.
        | IOMMU |
        '-------'

When we add a bounce buffer between the IOVA and the physical buffer,
the bounce buffer must start at the same offset within a page;
otherwise, the IOMMU can't work here.


                     Bounce               Physical
    IOVA             Buffer                Buffer
 .---------.       .---------.           .---------.
 |         |  .--->| Buffer  |<----.     |         |
 |         |  |    '---------'     |     |         |
 .---------.  |    |         |     |     .---------.
 | Buffer  |--' NO |         | YES '-----| Buffer  |
 '---------'       |         |           '---------'
 |         |       |         |           |         |
 |         |       |         |           |         |
 '---------'       '---------'           '---------'
  .-------.         .---------.
  | IOMMU |         | swiotlb |
  '-------'         '---------'

A workable buffer location is shown below:


                     Bounce               Physical
    IOVA             Buffer                Buffer
 .---------.       .---------.           .---------.
 |         |       |         |           |         |
 |         |       |         |           |         |
 .---------.------>.---------.<----------.---------.
 | Buffer  |  YES  | Buffer  |    YES    | Buffer  |
 '---------'       '---------'           '---------'
 |         |       |         |           |         |
 |         |       |         |           |         |
 '---------'       '---------'           '---------'
  .-------.         .---------.
  | IOMMU |         | swiotlb |
  '-------'         '---------'

Best regards,
Lu Baolu

2019-04-23 07:43:52

by Lu Baolu

[permalink] [raw]
Subject: Re: [PATCH v3 08/10] iommu/vt-d: Check whether device requires bounce buffer

Hi,

On 4/23/19 2:08 PM, Christoph Hellwig wrote:
>>> Again, this and the option should not be in a specific iommu driver.
>>>
>>
>> The option of whether bounce is ignored should be in the specific iommu
>> driver.
>
> Why? As a user I could not care less which IOMMU driver my particular
> system uses.
>

Looks reasonable to me. Let's listen to more opinions.

Best regards,
Lu Baolu

2019-04-24 20:35:16

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH v3 02/10] swiotlb: Factor out slot allocation and free

On Tue, Apr 23, 2019 at 03:32:16PM +0800, Lu Baolu wrote:
> When we add the bounce buffer between IOVA and physical buffer, the
> bounced buffer must starts from the same offset in a page, otherwise,
> IOMMU can't work here.

Why? Even with the odd hardware descriptors typical in Intel land that
only allow offsets in the first page, not the following ones, having
a zero offset where we previously had one should be fine.

2019-04-25 05:46:44

by Konrad Rzeszutek Wilk

[permalink] [raw]
Subject: Re: [PATCH v3 08/10] iommu/vt-d: Check whether device requires bounce buffer

On Tue, Apr 23, 2019 at 03:35:51PM +0800, Lu Baolu wrote:
> Hi,
>
> On 4/23/19 2:08 PM, Christoph Hellwig wrote:
> > > > Again, this and the option should not be in a specific iommu driver.
> > > >
> > >
> > > The option of whether bounce is ignored should be in the specific iommu
> > > driver.
> >
> > Why? As a user I could not care less which IOMMU driver my particular
> > system uses.
> >
>
> Looks reasonable to me. Let's listen to more opinions.

I agree with Christoph here.
>
> Best regards,
> Lu Baolu

2019-04-25 08:01:32

by Lu Baolu

[permalink] [raw]
Subject: Re: [PATCH v3 02/10] swiotlb: Factor out slot allocation and free

Hi,

On 4/24/19 10:45 PM, Christoph Hellwig wrote:
> On Tue, Apr 23, 2019 at 03:32:16PM +0800, Lu Baolu wrote:
>> When we add the bounce buffer between IOVA and physical buffer, the
>> bounced buffer must starts from the same offset in a page, otherwise,
>> IOMMU can't work here.
>
> Why? Even with the odd hardware descriptors typical in Intel land that
> only allow offsets in the first page, not the following ones, having
> a zero offset where we previously had one should be fine.
>

This is not VT-d specific. It's just how a generic IOMMU works.

Normally, the IOMMU works in paging mode. So if a driver issues DMA with
IOVA 0xAAAA0123, the IOMMU can remap it to physical address 0xBBBB0123.
But we should never expect the IOMMU to remap 0xAAAA0123 to a physical
address of 0xBBBB0000. That's the reason why I said the IOMMU will not
work there.

                              swiotlb                 System
                IOVA        bounce page               Memory
             .---------.    .---------.             .---------.
             |         |    |         |             |         |
             |         |    |         |             |         |
buffer_start .---------.    .---------.buffer_start .---------.
             |         |--->|         |             |         |
             |         |    |         |************>|         |
             |         |    |         |   swiotlb   |         |
IOMMU Page   '---------'    '---------'   mapping   '---------'
 Boundary    |         |                             |         |
             |         |                             |         |
             |         |                             |         |
             |         |--------------------------->|         |
             |         |        IOMMU mapping       |         |
             |         |                             |         |
IOMMU Page   .---------.                             .---------.
 Boundary    |         |                             |         |
             |         |                             |         |
             |         |--------------------------->|         |
             |         |        IOMMU mapping       |         |
             |         |                             |         |
             |         |                             |         |
IOMMU Page   .---------.    .---------.             .---------.
 Boundary    |         |    |         |             |         |
             |         |    |         |************>|         |
             |         |--->|         |   swiotlb   |         |
buffer_end   '---------'    '---------'   mapping   '---------'
             |         |    |         |             |         |
             |         |    |         |             |         |
             '---------'    '---------'             '---------'


This is the whole picture of the IOMMU bounce page. I expect the
buffer_start returned by swiotlb_tbl_map_single() to start at the same
page offset as the buffer_start in the IOVA.
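
That expectation is essentially what the DMA_ATTR_BOUNCE_PAGE branch
removed by the patch above was doing; condensed here for illustration:

        unsigned long pgsize = get_iommu_pgsize(hwdev, orig_addr, size);
        unsigned long offset = orig_addr & (pgsize - 1);

        /* Refuse buffers that would straddle an IOMMU page. */
        if (offset + size > pgsize)
                return DMA_MAPPING_ERROR;

        /* Back the whole IOMMU page with swiotlb slots... */
        tlb_addr = swiotlb_tbl_alloc_tlb(hwdev,
                        __phys_to_dma(hwdev, io_tlb_start),
                        ALIGN_DOWN(orig_addr, pgsize), pgsize);

        /* ...and hand out a bounce address with the original in-page
         * offset, so the IOVA->bounce-page mapping stays page-to-page
         * on both sides. */
        return tlb_addr + offset;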

Best regards,
Lu Baolu

2019-04-26 15:07:22

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH v3 02/10] swiotlb: Factor out slot allocation and free

On Thu, Apr 25, 2019 at 10:07:19AM +0800, Lu Baolu wrote:
> This is not VT-d specific. It's just how generic IOMMU works.
>
> Normally, IOMMU works in paging mode. So if a driver issues DMA with
> IOVA 0xAAAA0123, IOMMU can remap it with a physical address 0xBBBB0123.
> But we should never expect IOMMU to remap 0xAAAA0123 with physical
> address of 0xBBBB0000. That's the reason why I said that IOMMU will not
> work there.

Well, with the iommu it doesn't happen. With swiotlb it obviously
can happen, so drivers are fine with it. Why would that suddenly
become an issue when swiotlb is called from the iommu code?

2019-04-29 05:17:43

by Lu Baolu

[permalink] [raw]
Subject: Re: [PATCH v3 02/10] swiotlb: Factor out slot allocation and free

Hi Christoph,

On 4/26/19 11:04 PM, Christoph Hellwig wrote:
> On Thu, Apr 25, 2019 at 10:07:19AM +0800, Lu Baolu wrote:
>> This is not VT-d specific. It's just how generic IOMMU works.
>>
>> Normally, IOMMU works in paging mode. So if a driver issues DMA with
>> IOVA 0xAAAA0123, IOMMU can remap it with a physical address 0xBBBB0123.
>> But we should never expect IOMMU to remap 0xAAAA0123 with physical
>> address of 0xBBBB0000. That's the reason why I said that IOMMU will not
>> work there.
>
> Well, with the iommu it doesn't happen. With swiotlb it obviosuly
> can happen, so drivers are fine with it. Why would that suddenly
> become an issue when swiotlb is called from the iommu code?
>

I would say IOMMU is DMA remapping, not DMA engine. :-)

Best regards,
Lu Baolu

2019-04-29 10:58:40

by Robin Murphy

[permalink] [raw]
Subject: Re: [PATCH v3 01/10] iommu: Add helper to get minimal page size of domain

On 21/04/2019 02:17, Lu Baolu wrote:
> This makes it possible for other modules to know the minimal
> page size supported by a domain without the knowledge of the
> structure details.
>
> Signed-off-by: Lu Baolu <[email protected]>
> ---
> include/linux/iommu.h | 13 +++++++++++++
> 1 file changed, 13 insertions(+)
>
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index a5007d035218..46679ef19b7e 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -377,6 +377,14 @@ static inline void iommu_tlb_sync(struct iommu_domain *domain)
> domain->ops->iotlb_sync(domain);
> }
>
> +static inline unsigned long domain_minimal_pgsize(struct iommu_domain *domain)
> +{
> + if (domain && domain->pgsize_bitmap)
> + return 1 << __ffs(domain->pgsize_bitmap);

Nit: this would probably be more efficient on most architectures as:

if (domain)
return domain->pgsize_bitmap & -domain->pgsize_bitmap;


I'd also suggest s/minimal/min/ in the name, just to stop it getting too
long. Otherwise, though, I like the idea, and there's at least one other
place (in iommu-dma) that can make use of it straight away.
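
Putting both suggestions together, the helper would look something like
this (a sketch, not the posted patch):

static inline unsigned long domain_min_pgsize(struct iommu_domain *domain)
{
        /* The lowest set bit of pgsize_bitmap is the smallest supported
         * page size; x & -x isolates it without the shift and __ffs(). */
        if (domain)
                return domain->pgsize_bitmap & -domain->pgsize_bitmap;

        return 0;
}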

Robin.

> +
> + return 0;
> +}
> +
> /* PCI device grouping function */
> extern struct iommu_group *pci_device_group(struct device *dev);
> /* Generic device grouping function */
> @@ -704,6 +712,11 @@ const struct iommu_ops *iommu_ops_from_fwnode(struct fwnode_handle *fwnode)
> return NULL;
> }
>
> +static inline unsigned long domain_minimal_pgsize(struct iommu_domain *domain)
> +{
> + return 0;
> +}
> +
> #endif /* CONFIG_IOMMU_API */
>
> #ifdef CONFIG_IOMMU_DEBUGFS
>

2019-04-29 11:09:30

by Robin Murphy

[permalink] [raw]
Subject: Re: [PATCH v3 02/10] swiotlb: Factor out slot allocation and free

On 29/04/2019 06:10, Lu Baolu wrote:
> Hi Christoph,
>
> On 4/26/19 11:04 PM, Christoph Hellwig wrote:
>> On Thu, Apr 25, 2019 at 10:07:19AM +0800, Lu Baolu wrote:
>>> This is not VT-d specific. It's just how generic IOMMU works.
>>>
>>> Normally, IOMMU works in paging mode. So if a driver issues DMA with
>>> IOVA  0xAAAA0123, IOMMU can remap it with a physical address 0xBBBB0123.
>>> But we should never expect IOMMU to remap 0xAAAA0123 with physical
>>> address of 0xBBBB0000. That's the reason why I said that IOMMU will not
>>> work there.
>>
>> Well, with the iommu it doesn't happen.  With swiotlb it obviosuly
>> can happen, so drivers are fine with it.  Why would that suddenly
>> become an issue when swiotlb is called from the iommu code?
>>
>
> I would say IOMMU is DMA remapping, not DMA engine. :-)

I'm not sure I really follow the issue here - if we're copying the
buffer to the bounce page(s) there's no conceptual difference from
copying it to SWIOTLB slot(s), so there should be no need to worry about
the original in-page offset.

From the reply up-thread I guess you're trying to include an
optimisation to only copy the head and tail of the buffer if it spans
multiple pages, and directly map the ones in the middle, but AFAICS
that's going to tie you to also using strict mode for TLB maintenance,
which may not be a win overall depending on the balance between
invalidation bandwidth vs. memcpy bandwidth. At least if we use standard
SWIOTLB logic to always copy the whole thing, we should be able to
release the bounce pages via the flush queue to allow 'safe' lazy unmaps.

Either way I think it would be worth just implementing the
straightforward version first, then coming back to consider
optimisations later.

Robin.

2019-04-29 11:45:29

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH v3 02/10] swiotlb: Factor out slot allocation and free

On Mon, Apr 29, 2019 at 12:06:52PM +0100, Robin Murphy wrote:
>
> From the reply up-thread I guess you're trying to include an optimisation
> to only copy the head and tail of the buffer if it spans multiple pages,
> and directly map the ones in the middle, but AFAICS that's going to tie you
> to also using strict mode for TLB maintenance, which may not be a win
> overall depending on the balance between invalidation bandwidth vs. memcpy
> bandwidth. At least if we use standard SWIOTLB logic to always copy the
> whole thing, we should be able to release the bounce pages via the flush
> queue to allow 'safe' lazy unmaps.

Oh. The head and tail optimization is what I missed. Yes, for that
we'd need the offset.

> Either way I think it would be worth just implementing the straightforward
> version first, then coming back to consider optimisations later.

Agreed, let's start simple. Especially as large DMA mappings or
allocations should usually be properly aligned anyway, and if not we
should fix that for multiple reasons.

2019-04-30 00:48:08

by Lu Baolu

[permalink] [raw]
Subject: Re: [PATCH v3 01/10] iommu: Add helper to get minimal page size of domain

Hi Robin,

On 4/29/19 6:55 PM, Robin Murphy wrote:
> On 21/04/2019 02:17, Lu Baolu wrote:
>> This makes it possible for other modules to know the minimal
>> page size supported by a domain without the knowledge of the
>> structure details.
>>
>> Signed-off-by: Lu Baolu <[email protected]>
>> ---
>>   include/linux/iommu.h | 13 +++++++++++++
>>   1 file changed, 13 insertions(+)
>>
>> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
>> index a5007d035218..46679ef19b7e 100644
>> --- a/include/linux/iommu.h
>> +++ b/include/linux/iommu.h
>> @@ -377,6 +377,14 @@ static inline void iommu_tlb_sync(struct
>> iommu_domain *domain)
>>           domain->ops->iotlb_sync(domain);
>>   }
>> +static inline unsigned long domain_minimal_pgsize(struct iommu_domain
>> *domain)
>> +{
>> +    if (domain && domain->pgsize_bitmap)
>> +        return 1 << __ffs(domain->pgsize_bitmap);
>
> Nit: this would probably be more efficient on most architectures as:
>
>     if (domain)
>         return domain->pgsize_bitmap & -domain->pgsize_bitmap;
>

It looks reasonable to me.

>
> I'd also suggest s/minimal/min/ in the name, just to stop it getting too
> long. Otherwise, though, I like the idea, and there's at least one other
> place (in iommu-dma) that can make use of it straight away.

Okay, I will change the name to domain_min_pgsize().

>
> Robin.
>

Best regards,
Lu Baolu

>> +
>> +    return 0;
>> +}
>> +
>>   /* PCI device grouping function */
>>   extern struct iommu_group *pci_device_group(struct device *dev);
>>   /* Generic device grouping function */
>> @@ -704,6 +712,11 @@ const struct iommu_ops
>> *iommu_ops_from_fwnode(struct fwnode_handle *fwnode)
>>       return NULL;
>>   }
>> +static inline unsigned long domain_minimal_pgsize(struct iommu_domain
>> *domain)
>> +{
>> +    return 0;
>> +}
>> +
>>   #endif /* CONFIG_IOMMU_API */
>>   #ifdef CONFIG_IOMMU_DEBUGFS
>>
>

2019-04-30 02:09:38

by Lu Baolu

[permalink] [raw]
Subject: Re: [PATCH v3 02/10] swiotlb: Factor out slot allocation and free

Hi Robin,

On 4/29/19 7:06 PM, Robin Murphy wrote:
> On 29/04/2019 06:10, Lu Baolu wrote:
>> Hi Christoph,
>>
>> On 4/26/19 11:04 PM, Christoph Hellwig wrote:
>>> On Thu, Apr 25, 2019 at 10:07:19AM +0800, Lu Baolu wrote:
>>>> This is not VT-d specific. It's just how generic IOMMU works.
>>>>
>>>> Normally, IOMMU works in paging mode. So if a driver issues DMA with
>>>> IOVA  0xAAAA0123, IOMMU can remap it with a physical address
>>>> 0xBBBB0123.
>>>> But we should never expect IOMMU to remap 0xAAAA0123 with physical
>>>> address of 0xBBBB0000. That's the reason why I said that IOMMU will not
>>>> work there.
>>>
>>> Well, with the iommu it doesn't happen.  With swiotlb it obviosuly
>>> can happen, so drivers are fine with it.  Why would that suddenly
>>> become an issue when swiotlb is called from the iommu code?
>>>
>>
>> I would say IOMMU is DMA remapping, not DMA engine. :-)
>
> I'm not sure I really follow the issue here - if we're copying the
> buffer to the bounce page(s) there's no conceptual difference from
> copying it to SWIOTLB slot(s), so there should be no need to worry about
> the original in-page offset.
>
> From the reply up-thread I guess you're trying to include an
> optimisation to only copy the head and tail of the buffer if it spans
> multiple pages, and directly map the ones in the middle, but AFAICS
> that's going to tie you to also using strict mode for TLB maintenance,
> which may not be a win overall depending on the balance between
> invalidation bandwidth vs. memcpy bandwidth. At least if we use standard
> SWIOTLB logic to always copy the whole thing, we should be able to
> release the bounce pages via the flush queue to allow 'safe' lazy unmaps.
>

With respect, even if we use the standard SWIOTLB logic, we still need
to use strict mode for TLB maintenance.

Say some swiotlb slots are used by an untrusted device for bounce page
purposes. When the device driver unmaps the IOVA, the slots are freed,
but the mapping is still cached in the IOTLB, hence the untrusted device
is still able to access the slots. Then the slots are allocated to other
devices. This makes it possible for the untrusted device to access
the data buffers of other devices.
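
As a sketch of the ordering this implies in the unmap path (reusing the
bounce_unmap() shape from the patches above; the explicit sync placement
is an assumption, not code from this series):

        /* Tear down the IOVA->bounce-page mapping first... */
        ops->unmap(domain, addr, page_size);
        /* ...flush the IOTLB before the slots can be reused (strict)... */
        iommu_tlb_sync(domain);
        /* ...and only then return the bounce slots to the swiotlb pool. */
        swiotlb_tbl_unmap_single(dev, tlb_addr, size, page_size, dir, attrs);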

Best regards,
Lu Baolu

> Either way I think it would be worth just implementing the
> straightforward version first, then coming back to consider
> optimisations later.



2019-04-30 09:55:07

by Robin Murphy

[permalink] [raw]
Subject: Re: [PATCH v3 02/10] swiotlb: Factor out slot allocation and free

On 30/04/2019 03:02, Lu Baolu wrote:
> Hi Robin,
>
> On 4/29/19 7:06 PM, Robin Murphy wrote:
>> On 29/04/2019 06:10, Lu Baolu wrote:
>>> Hi Christoph,
>>>
>>> On 4/26/19 11:04 PM, Christoph Hellwig wrote:
>>>> On Thu, Apr 25, 2019 at 10:07:19AM +0800, Lu Baolu wrote:
>>>>> This is not VT-d specific. It's just how generic IOMMU works.
>>>>>
>>>>> Normally, IOMMU works in paging mode. So if a driver issues DMA with
>>>>> IOVA  0xAAAA0123, IOMMU can remap it with a physical address
>>>>> 0xBBBB0123.
>>>>> But we should never expect IOMMU to remap 0xAAAA0123 with physical
>>>>> address of 0xBBBB0000. That's the reason why I said that IOMMU will
>>>>> not
>>>>> work there.
>>>>
>>>> Well, with the iommu it doesn't happen.  With swiotlb it obviosuly
>>>> can happen, so drivers are fine with it.  Why would that suddenly
>>>> become an issue when swiotlb is called from the iommu code?
>>>>
>>>
>>> I would say IOMMU is DMA remapping, not DMA engine. :-)
>>
>> I'm not sure I really follow the issue here - if we're copying the
>> buffer to the bounce page(s) there's no conceptual difference from
>> copying it to SWIOTLB slot(s), so there should be no need to worry
>> about the original in-page offset.
>>
>>  From the reply up-thread I guess you're trying to include an
>> optimisation to only copy the head and tail of the buffer if it spans
>> multiple pages, and directly map the ones in the middle, but AFAICS
>> that's going to tie you to also using strict mode for TLB maintenance,
>> which may not be a win overall depending on the balance between
>> invalidation bandwidth vs. memcpy bandwidth. At least if we use
>> standard SWIOTLB logic to always copy the whole thing, we should be
>> able to release the bounce pages via the flush queue to allow 'safe'
>> lazy unmaps.
>>
>
> With respect, even we use the standard SWIOTLB logic, we need to use
> the strict mode for TLB maintenance.
>
> Say, some swiotbl slots are used by untrusted device for bounce page
> purpose. When the device driver unmaps the IOVA, the slots are freed but
> the mapping is still cached in IOTLB, hence the untrusted device is
> still able to access the slots. Then the slots are allocated to other
> devices. This makes it possible for the untrusted device to access
> the data buffer of other devices.

Sure, that's indeed how it would work right now - however since the
bounce pages will be freed and reused by the DMA API layer itself (at
the same level as the IOVAs) I see no technical reason why we couldn't
investigate deferred freeing as a future optimisation.

Robin.

2019-05-02 01:57:38

by Lu Baolu

[permalink] [raw]
Subject: Re: [PATCH v3 02/10] swiotlb: Factor out slot allocation and free

Hi Robin,

On 4/30/19 5:53 PM, Robin Murphy wrote:
> On 30/04/2019 03:02, Lu Baolu wrote:
>> Hi Robin,
>>
>> On 4/29/19 7:06 PM, Robin Murphy wrote:
>>> On 29/04/2019 06:10, Lu Baolu wrote:
>>>> Hi Christoph,
>>>>
>>>> On 4/26/19 11:04 PM, Christoph Hellwig wrote:
>>>>> On Thu, Apr 25, 2019 at 10:07:19AM +0800, Lu Baolu wrote:
>>>>>> This is not VT-d specific. It's just how generic IOMMU works.
>>>>>>
>>>>>> Normally, IOMMU works in paging mode. So if a driver issues DMA with
>>>>>> IOVA  0xAAAA0123, IOMMU can remap it with a physical address
>>>>>> 0xBBBB0123.
>>>>>> But we should never expect IOMMU to remap 0xAAAA0123 with physical
>>>>>> address of 0xBBBB0000. That's the reason why I said that IOMMU
>>>>>> will not
>>>>>> work there.
>>>>>
>>>>> Well, with the iommu it doesn't happen.  With swiotlb it obviosuly
>>>>> can happen, so drivers are fine with it.  Why would that suddenly
>>>>> become an issue when swiotlb is called from the iommu code?
>>>>>
>>>>
>>>> I would say IOMMU is DMA remapping, not DMA engine. :-)
>>>
>>> I'm not sure I really follow the issue here - if we're copying the
>>> buffer to the bounce page(s) there's no conceptual difference from
>>> copying it to SWIOTLB slot(s), so there should be no need to worry
>>> about the original in-page offset.
>>>
>>>  From the reply up-thread I guess you're trying to include an
>>> optimisation to only copy the head and tail of the buffer if it spans
>>> multiple pages, and directly map the ones in the middle, but AFAICS
>>> that's going to tie you to also using strict mode for TLB
>>> maintenance, which may not be a win overall depending on the balance
>>> between invalidation bandwidth vs. memcpy bandwidth. At least if we
>>> use standard SWIOTLB logic to always copy the whole thing, we should
>>> be able to release the bounce pages via the flush queue to allow
>>> 'safe' lazy unmaps.
>>>
>>
>> With respect, even we use the standard SWIOTLB logic, we need to use
>> the strict mode for TLB maintenance.
>>
>> Say, some swiotbl slots are used by untrusted device for bounce page
>> purpose. When the device driver unmaps the IOVA, the slots are freed but
>> the mapping is still cached in IOTLB, hence the untrusted device is
>> still able to access the slots. Then the slots are allocated to other
>> devices. This makes it possible for the untrusted device to access
>> the data buffer of other devices.
>
> Sure, that's indeed how it would work right now - however since the
> bounce pages will be freed and reused by the DMA API layer itself (at
> the same level as the IOVAs) I see no technical reason why we couldn't
> investigate deferred freeing as a future optimisation.

Yes, agreed.

Best regards,
Lu Baolu

2019-05-06 02:02:12

by Lu Baolu

[permalink] [raw]
Subject: Re: [PATCH v3 02/10] swiotlb: Factor out slot allocation and free

Hi Christoph,

On 4/29/19 7:44 PM, Christoph Hellwig wrote:
> On Mon, Apr 29, 2019 at 12:06:52PM +0100, Robin Murphy wrote:
>>
>> From the reply up-thread I guess you're trying to include an optimisation
>> to only copy the head and tail of the buffer if it spans multiple pages,
>> and directly map the ones in the middle, but AFAICS that's going to tie you
>> to also using strict mode for TLB maintenance, which may not be a win
>> overall depending on the balance between invalidation bandwidth vs. memcpy
>> bandwidth. At least if we use standard SWIOTLB logic to always copy the
>> whole thing, we should be able to release the bounce pages via the flush
>> queue to allow 'safe' lazy unmaps.
>
> Oh. The head and tail optimization is what I missed. Yes, for that
> we'd need the offset.

Yes.

>
>> Either way I think it would be worth just implementing the straightforward
>> version first, then coming back to consider optimisations later.
>
> Agreed, let's start simple. Especially as large DMA mappings or
> allocations should usually be properly aligned anyway, and if not we
> should fix that for multiple reasons.
>

Agreed. I will prepare the next version without the optimization, so the
offset is not required.

For your changes in swiotlb, will you formalize them as patches, or do
you want me to do this?

Best regards,
Lu Baolu

2019-05-13 07:08:00

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH v3 02/10] swiotlb: Factor out slot allocation and free

On Mon, May 06, 2019 at 09:54:30AM +0800, Lu Baolu wrote:
> Agreed. I will prepare the next version simply without the optimization, so
> the offset is not required.
>
> For your changes in swiotlb, will you formalize them in patches or want
> me to do this?

Please do it yourself given that you still need the offset and thus a
rework of the patches anyway.

2019-05-16 02:01:44

by Lu Baolu

[permalink] [raw]
Subject: Re: [PATCH v3 02/10] swiotlb: Factor out slot allocation and free

Hi,

On 5/13/19 3:05 PM, Christoph Hellwig wrote:
> On Mon, May 06, 2019 at 09:54:30AM +0800, Lu Baolu wrote:
>> Agreed. I will prepare the next version simply without the optimization, so
>> the offset is not required.
>>
>> For your changes in swiotlb, will you formalize them in patches or want
>> me to do this?
>
> Please do it yourself given that you still need the offset and thus a
> rework of the patches anyway.
>

Okay.

Best regards,
Lu Baolu