2021-01-06 03:43:31

by Claire Chang

Subject: [RFC PATCH v3 0/6] Restricted DMA

This series implements mitigations for lack of DMA access control on
systems without an IOMMU, which could result in the DMA accessing the
system memory at unexpected times and/or unexpected addresses, possibly
leading to data leakage or corruption.

For example, we plan to use the PCI-e bus for Wi-Fi, and that PCI-e bus is
not behind an IOMMU. Because PCI-e, by design, gives the device full access
to system memory, a vulnerability in the Wi-Fi firmware could easily
escalate to a full system exploit (remote Wi-Fi exploits: [1a] and [1b],
which show a full chain of exploits; [2]; [3]).

To mitigate the security concerns, we introduce restricted DMA. Restricted
DMA utilizes the existing swiotlb to bounce streaming DMA in and out of a
specially allocated region and does memory allocation from the same region.
The feature on its own provides a basic level of protection against the DMA
overwriting buffer contents at unexpected times. However, to protect
against general data leakage and system memory corruption, the system needs
to provide a way to restrict the DMA to a predefined memory region (this is
usually done at firmware level, e.g. in ATF on some ARM platforms).
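
The bouncing described above can be pictured with a small user-space
sketch. This is not the swiotlb code: struct bounce_pool, bounce_map() and
bounce_unmap() are hypothetical names, the pool is a plain array with a
bump allocator, and there is no real device. It only illustrates that the
"device" sees a copy inside the pool, and that the original buffer changes
only when the data is copied back.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the reserved region the device may access. */
#define POOL_SIZE 4096

struct bounce_pool {
	unsigned char mem[POOL_SIZE];
	size_t used;	/* bump allocator, for illustration only */
};

/*
 * "Map" a buffer for device access: copy it into the pool and return the
 * pool slot. Returns NULL when the pool has no room left.
 */
static unsigned char *bounce_map(struct bounce_pool *pool, const void *buf,
				 size_t len)
{
	unsigned char *slot;

	assert(pool != NULL && buf != NULL);
	if (len > POOL_SIZE - pool->used)
		return NULL;

	slot = pool->mem + pool->used;
	pool->used += len;
	memcpy(slot, buf, len);
	return slot;
}

/* "Unmap": copy whatever the device wrote back to the original buffer. */
static void bounce_unmap(void *buf, const unsigned char *slot, size_t len)
{
	assert(buf != NULL && slot != NULL);
	memcpy(buf, slot, len);
}
```

A write to the returned slot between bounce_map() and bounce_unmap() does
not touch the caller's buffer, which is the "basic level of protection"
mentioned above. Keeping the device from reaching memory outside the pool
still depends on the platform restricting its DMA window.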

[1a] https://googleprojectzero.blogspot.com/2017/04/over-air-exploiting-broadcoms-wi-fi_4.html
[1b] https://googleprojectzero.blogspot.com/2017/04/over-air-exploiting-broadcoms-wi-fi_11.html
[2] https://blade.tencent.com/en/advisories/qualpwn/
[3] https://www.bleepingcomputer.com/news/security/vulnerabilities-found-in-highly-popular-firmware-for-wifi-chips/

Claire Chang (6):
swiotlb: Add io_tlb_mem struct
swiotlb: Add restricted DMA pool
swiotlb: Use restricted DMA pool if available
swiotlb: Add restricted DMA alloc/free support.
dt-bindings: of: Add restricted DMA pool
of: Add plumbing for restricted DMA pool

.../reserved-memory/reserved-memory.txt | 24 +
arch/powerpc/platforms/pseries/svm.c | 4 +-
drivers/iommu/dma-iommu.c | 12 +-
drivers/of/address.c | 21 +
drivers/of/device.c | 4 +
drivers/of/of_private.h | 5 +
drivers/xen/swiotlb-xen.c | 4 +-
include/linux/device.h | 4 +
include/linux/swiotlb.h | 61 +-
kernel/dma/Kconfig | 1 +
kernel/dma/direct.c | 20 +-
kernel/dma/direct.h | 10 +-
kernel/dma/swiotlb.c | 576 +++++++++++-------
13 files changed, 514 insertions(+), 232 deletions(-)

--
2.29.2.729.g45daf8777d-goog

v3:
Using only one reserved memory region for both streaming DMA and memory
allocation.

v2:
Building on top of swiotlb.
https://lore.kernel.org/patchwork/cover/1280705/

v1:
Using dma_map_ops.
https://lore.kernel.org/patchwork/cover/1271660/


2021-01-06 03:44:38

by Claire Chang

Subject: [RFC PATCH v3 5/6] dt-bindings: of: Add restricted DMA pool

Introduce a new compatible string, restricted-dma-pool, for restricted
DMA. The address and length of the restricted DMA memory region are
specified by a reserved-memory node with this compatible string in the
device tree.

Signed-off-by: Claire Chang <[email protected]>
---
.../reserved-memory/reserved-memory.txt | 24 +++++++++++++++++++
1 file changed, 24 insertions(+)

diff --git a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt b/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
index e8d3096d922c..44975e2a1fd2 100644
--- a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
+++ b/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
@@ -51,6 +51,20 @@ compatible (optional) - standard definition
used as a shared pool of DMA buffers for a set of devices. It can
be used by an operating system to instantiate the necessary pool
management subsystem if necessary.
+ - restricted-dma-pool: This indicates a region of memory meant to be
+ used as a pool of restricted DMA buffers for a set of devices. The
+ memory region would be the only region accessible to those devices.
+ When using this, the no-map and reusable properties must not be set,
+ so the operating system can create a virtual mapping that will be used
+ for synchronization. The main purpose for restricted DMA is to
+ mitigate the lack of DMA access control on systems without an IOMMU,
+ which could result in the DMA accessing the system memory at
+ unexpected times and/or unexpected addresses, possibly leading to data
+ leakage or corruption. The feature on its own provides a basic level
+ of protection against the DMA overwriting buffer contents at
+ unexpected times. However, to protect against general data leakage and
+ system memory corruption, the system needs to provide a way to restrict
+ the DMA to a predefined memory region.
- vendor specific string in the form <vendor>,[<device>-]<usage>
no-map (optional) - empty property
- Indicates the operating system must not create a virtual mapping
@@ -120,6 +134,11 @@ one for multimedia processing (named multimedia-memory@77000000, 64MiB).
compatible = "acme,multimedia-memory";
reg = <0x77000000 0x4000000>;
};
+
+ restricted_dma_mem_reserved: restricted_dma_mem_reserved {
+ compatible = "restricted-dma-pool";
+ reg = <0x50000000 0x400000>;
+ };
};

/* ... */
@@ -138,4 +157,9 @@ one for multimedia processing (named multimedia-memory@77000000, 64MiB).
memory-region = <&multimedia_reserved>;
/* ... */
};
+
+ pcie_device: pcie_device@0,0 {
+ memory-region = <&restricted_dma_mem_reserved>;
+ /* ... */
+ };
};
--
2.29.2.729.g45daf8777d-goog

2021-01-06 03:44:40

by Claire Chang

Subject: [RFC PATCH v3 1/6] swiotlb: Add io_tlb_mem struct

Add a new struct, io_tlb_mem, as the IO TLB memory pool descriptor and
move the relevant global variables into that struct.
This will be useful later to allow for a restricted DMA pool.

Signed-off-by: Claire Chang <[email protected]>
---
arch/powerpc/platforms/pseries/svm.c | 4 +-
drivers/xen/swiotlb-xen.c | 4 +-
include/linux/swiotlb.h | 39 +++-
kernel/dma/swiotlb.c | 292 +++++++++++++--------------
4 files changed, 178 insertions(+), 161 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/svm.c b/arch/powerpc/platforms/pseries/svm.c
index 7b739cc7a8a9..2b767f1ca5fd 100644
--- a/arch/powerpc/platforms/pseries/svm.c
+++ b/arch/powerpc/platforms/pseries/svm.c
@@ -55,8 +55,8 @@ void __init svm_swiotlb_init(void)
if (vstart && !swiotlb_init_with_tbl(vstart, io_tlb_nslabs, false))
return;

- if (io_tlb_start)
- memblock_free_early(io_tlb_start,
+ if (io_tlb_default_mem.start)
+ memblock_free_early(io_tlb_default_mem.start,
PAGE_ALIGN(io_tlb_nslabs << IO_TLB_SHIFT));
panic("SVM: Cannot allocate SWIOTLB buffer");
}
diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index 2b385c1b4a99..4d17dff7ffd2 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -192,8 +192,8 @@ int __ref xen_swiotlb_init(int verbose, bool early)
/*
* IO TLB memory already allocated. Just use it.
*/
- if (io_tlb_start != 0) {
- xen_io_tlb_start = phys_to_virt(io_tlb_start);
+ if (io_tlb_default_mem.start != 0) {
+ xen_io_tlb_start = phys_to_virt(io_tlb_default_mem.start);
goto end;
}

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index d9c9fc9ca5d2..dd8eb57cbb8f 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -70,11 +70,46 @@ dma_addr_t swiotlb_map(struct device *dev, phys_addr_t phys,

#ifdef CONFIG_SWIOTLB
extern enum swiotlb_force swiotlb_force;
-extern phys_addr_t io_tlb_start, io_tlb_end;
+
+/**
+ * struct io_tlb_mem - IO TLB Memory Pool Descriptor
+ *
+ * @start: The start address of the swiotlb memory pool. Used to do a quick
+ * range check to see if the memory was in fact allocated by this
+ * API.
+ * @end: The end address of the swiotlb memory pool. Used to do a quick
+ * range check to see if the memory was in fact allocated by this
+ * API.
+ * @nslabs: The number of IO TLB blocks (in groups of 64) between @start and
+ * @end. This is command line adjustable via setup_io_tlb_npages.
+ * @used: The number of used IO TLB block.
+ * @list: The free list describing the number of free entries available
+ * from each index.
+ * @index: The index to start searching in the next round.
+ * @orig_addr: The original address corresponding to a mapped entry for the
+ * sync operations.
+ * @lock: The lock to protect the above data structures in the map and
+ * unmap calls.
+ * @debugfs: The dentry to debugfs.
+ */
+struct io_tlb_mem {
+ phys_addr_t start;
+ phys_addr_t end;
+ unsigned long nslabs;
+ unsigned long used;
+ unsigned int *list;
+ unsigned int index;
+ phys_addr_t *orig_addr;
+ spinlock_t lock;
+ struct dentry *debugfs;
+};
+extern struct io_tlb_mem io_tlb_default_mem;

static inline bool is_swiotlb_buffer(phys_addr_t paddr)
{
- return paddr >= io_tlb_start && paddr < io_tlb_end;
+ struct io_tlb_mem *mem = &io_tlb_default_mem;
+
+ return paddr >= mem->start && paddr < mem->end;
}

void __init swiotlb_exit(void);
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 7c42df6e6100..e4368159f88a 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -61,33 +61,11 @@
* allocate a contiguous 1MB, we're probably in trouble anyway.
*/
#define IO_TLB_MIN_SLABS ((1<<20) >> IO_TLB_SHIFT)
+#define INVALID_PHYS_ADDR (~(phys_addr_t)0)

enum swiotlb_force swiotlb_force;

-/*
- * Used to do a quick range check in swiotlb_tbl_unmap_single and
- * swiotlb_tbl_sync_single_*, to see if the memory was in fact allocated by this
- * API.
- */
-phys_addr_t io_tlb_start, io_tlb_end;
-
-/*
- * The number of IO TLB blocks (in groups of 64) between io_tlb_start and
- * io_tlb_end. This is command line adjustable via setup_io_tlb_npages.
- */
-static unsigned long io_tlb_nslabs;
-
-/*
- * The number of used IO TLB block
- */
-static unsigned long io_tlb_used;
-
-/*
- * This is a free list describing the number of free entries available from
- * each index
- */
-static unsigned int *io_tlb_list;
-static unsigned int io_tlb_index;
+struct io_tlb_mem io_tlb_default_mem;

/*
* Max segment that we can provide which (if pages are contingous) will
@@ -95,27 +73,17 @@ static unsigned int io_tlb_index;
*/
static unsigned int max_segment;

-/*
- * We need to save away the original address corresponding to a mapped entry
- * for the sync operations.
- */
-#define INVALID_PHYS_ADDR (~(phys_addr_t)0)
-static phys_addr_t *io_tlb_orig_addr;
-
-/*
- * Protect the above data structures in the map and unmap calls
- */
-static DEFINE_SPINLOCK(io_tlb_lock);
-
static int late_alloc;

static int __init
setup_io_tlb_npages(char *str)
{
+ struct io_tlb_mem *mem = &io_tlb_default_mem;
+
if (isdigit(*str)) {
- io_tlb_nslabs = simple_strtoul(str, &str, 0);
+ mem->nslabs = simple_strtoul(str, &str, 0);
/* avoid tail segment of size < IO_TLB_SEGSIZE */
- io_tlb_nslabs = ALIGN(io_tlb_nslabs, IO_TLB_SEGSIZE);
+ mem->nslabs = ALIGN(mem->nslabs, IO_TLB_SEGSIZE);
}
if (*str == ',')
++str;
@@ -123,7 +91,7 @@ setup_io_tlb_npages(char *str)
swiotlb_force = SWIOTLB_FORCE;
} else if (!strcmp(str, "noforce")) {
swiotlb_force = SWIOTLB_NO_FORCE;
- io_tlb_nslabs = 1;
+ mem->nslabs = 1;
}

return 0;
@@ -134,7 +102,7 @@ static bool no_iotlb_memory;

unsigned long swiotlb_nr_tbl(void)
{
- return unlikely(no_iotlb_memory) ? 0 : io_tlb_nslabs;
+ return unlikely(no_iotlb_memory) ? 0 : io_tlb_default_mem.nslabs;
}
EXPORT_SYMBOL_GPL(swiotlb_nr_tbl);

@@ -156,13 +124,14 @@ unsigned long swiotlb_size_or_default(void)
{
unsigned long size;

- size = io_tlb_nslabs << IO_TLB_SHIFT;
+ size = io_tlb_default_mem.nslabs << IO_TLB_SHIFT;

return size ? size : (IO_TLB_DEFAULT_SIZE);
}

void __init swiotlb_adjust_size(unsigned long new_size)
{
+ struct io_tlb_mem *mem = &io_tlb_default_mem;
unsigned long size;

/*
@@ -170,10 +139,10 @@ void __init swiotlb_adjust_size(unsigned long new_size)
* architectures such as those supporting memory encryption to
* adjust/expand SWIOTLB size for their use.
*/
- if (!io_tlb_nslabs) {
+ if (!mem->nslabs) {
size = ALIGN(new_size, 1 << IO_TLB_SHIFT);
- io_tlb_nslabs = size >> IO_TLB_SHIFT;
- io_tlb_nslabs = ALIGN(io_tlb_nslabs, IO_TLB_SEGSIZE);
+ mem->nslabs = size >> IO_TLB_SHIFT;
+ mem->nslabs = ALIGN(mem->nslabs, IO_TLB_SEGSIZE);

pr_info("SWIOTLB bounce buffer size adjusted to %luMB", size >> 20);
}
@@ -181,14 +150,15 @@ void __init swiotlb_adjust_size(unsigned long new_size)

void swiotlb_print_info(void)
{
- unsigned long bytes = io_tlb_nslabs << IO_TLB_SHIFT;
+ struct io_tlb_mem *mem = &io_tlb_default_mem;
+ unsigned long bytes = mem->nslabs << IO_TLB_SHIFT;

if (no_iotlb_memory) {
pr_warn("No low mem\n");
return;
}

- pr_info("mapped [mem %pa-%pa] (%luMB)\n", &io_tlb_start, &io_tlb_end,
+ pr_info("mapped [mem %pa-%pa] (%luMB)\n", &mem->start, &mem->end,
bytes >> 20);
}

@@ -200,57 +170,59 @@ void swiotlb_print_info(void)
*/
void __init swiotlb_update_mem_attributes(void)
{
+ struct io_tlb_mem *mem = &io_tlb_default_mem;
void *vaddr;
unsigned long bytes;

if (no_iotlb_memory || late_alloc)
return;

- vaddr = phys_to_virt(io_tlb_start);
- bytes = PAGE_ALIGN(io_tlb_nslabs << IO_TLB_SHIFT);
+ vaddr = phys_to_virt(mem->start);
+ bytes = PAGE_ALIGN(mem->nslabs << IO_TLB_SHIFT);
set_memory_decrypted((unsigned long)vaddr, bytes >> PAGE_SHIFT);
memset(vaddr, 0, bytes);
}

int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
{
+ struct io_tlb_mem *mem = &io_tlb_default_mem;
unsigned long i, bytes;
size_t alloc_size;

bytes = nslabs << IO_TLB_SHIFT;

- io_tlb_nslabs = nslabs;
- io_tlb_start = __pa(tlb);
- io_tlb_end = io_tlb_start + bytes;
+ mem->nslabs = nslabs;
+ mem->start = __pa(tlb);
+ mem->end = mem->start + bytes;

/*
* Allocate and initialize the free list array. This array is used
* to find contiguous free memory regions of size up to IO_TLB_SEGSIZE
- * between io_tlb_start and io_tlb_end.
+ * between mem->start and mem->end.
*/
- alloc_size = PAGE_ALIGN(io_tlb_nslabs * sizeof(int));
- io_tlb_list = memblock_alloc(alloc_size, PAGE_SIZE);
- if (!io_tlb_list)
+ alloc_size = PAGE_ALIGN(mem->nslabs * sizeof(int));
+ mem->list = memblock_alloc(alloc_size, PAGE_SIZE);
+ if (!mem->list)
panic("%s: Failed to allocate %zu bytes align=0x%lx\n",
__func__, alloc_size, PAGE_SIZE);

- alloc_size = PAGE_ALIGN(io_tlb_nslabs * sizeof(phys_addr_t));
- io_tlb_orig_addr = memblock_alloc(alloc_size, PAGE_SIZE);
- if (!io_tlb_orig_addr)
+ alloc_size = PAGE_ALIGN(mem->nslabs * sizeof(phys_addr_t));
+ mem->orig_addr = memblock_alloc(alloc_size, PAGE_SIZE);
+ if (!mem->orig_addr)
panic("%s: Failed to allocate %zu bytes align=0x%lx\n",
__func__, alloc_size, PAGE_SIZE);

- for (i = 0; i < io_tlb_nslabs; i++) {
- io_tlb_list[i] = IO_TLB_SEGSIZE - OFFSET(i, IO_TLB_SEGSIZE);
- io_tlb_orig_addr[i] = INVALID_PHYS_ADDR;
+ for (i = 0; i < mem->nslabs; i++) {
+ mem->list[i] = IO_TLB_SEGSIZE - OFFSET(i, IO_TLB_SEGSIZE);
+ mem->orig_addr[i] = INVALID_PHYS_ADDR;
}
- io_tlb_index = 0;
+ mem->index = 0;
no_iotlb_memory = false;

if (verbose)
swiotlb_print_info();

- swiotlb_set_max_segment(io_tlb_nslabs << IO_TLB_SHIFT);
+ swiotlb_set_max_segment(mem->nslabs << IO_TLB_SHIFT);
return 0;
}

@@ -261,26 +233,27 @@ int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
void __init
swiotlb_init(int verbose)
{
+ struct io_tlb_mem *mem = &io_tlb_default_mem;
size_t default_size = IO_TLB_DEFAULT_SIZE;
unsigned char *vstart;
unsigned long bytes;

- if (!io_tlb_nslabs) {
- io_tlb_nslabs = (default_size >> IO_TLB_SHIFT);
- io_tlb_nslabs = ALIGN(io_tlb_nslabs, IO_TLB_SEGSIZE);
+ if (!mem->nslabs) {
+ mem->nslabs = (default_size >> IO_TLB_SHIFT);
+ mem->nslabs = ALIGN(mem->nslabs, IO_TLB_SEGSIZE);
}

- bytes = io_tlb_nslabs << IO_TLB_SHIFT;
+ bytes = mem->nslabs << IO_TLB_SHIFT;

/* Get IO TLB memory from the low pages */
vstart = memblock_alloc_low(PAGE_ALIGN(bytes), PAGE_SIZE);
- if (vstart && !swiotlb_init_with_tbl(vstart, io_tlb_nslabs, verbose))
+ if (vstart && !swiotlb_init_with_tbl(vstart, mem->nslabs, verbose))
return;

- if (io_tlb_start) {
- memblock_free_early(io_tlb_start,
- PAGE_ALIGN(io_tlb_nslabs << IO_TLB_SHIFT));
- io_tlb_start = 0;
+ if (mem->start) {
+ memblock_free_early(mem->start,
+ PAGE_ALIGN(mem->nslabs << IO_TLB_SHIFT));
+ mem->start = 0;
}
pr_warn("Cannot allocate buffer");
no_iotlb_memory = true;
@@ -294,22 +267,23 @@ swiotlb_init(int verbose)
int
swiotlb_late_init_with_default_size(size_t default_size)
{
- unsigned long bytes, req_nslabs = io_tlb_nslabs;
+ struct io_tlb_mem *mem = &io_tlb_default_mem;
+ unsigned long bytes, req_nslabs = mem->nslabs;
unsigned char *vstart = NULL;
unsigned int order;
int rc = 0;

- if (!io_tlb_nslabs) {
- io_tlb_nslabs = (default_size >> IO_TLB_SHIFT);
- io_tlb_nslabs = ALIGN(io_tlb_nslabs, IO_TLB_SEGSIZE);
+ if (!mem->nslabs) {
+ mem->nslabs = (default_size >> IO_TLB_SHIFT);
+ mem->nslabs = ALIGN(mem->nslabs, IO_TLB_SEGSIZE);
}

/*
* Get IO TLB memory from the low pages
*/
- order = get_order(io_tlb_nslabs << IO_TLB_SHIFT);
- io_tlb_nslabs = SLABS_PER_PAGE << order;
- bytes = io_tlb_nslabs << IO_TLB_SHIFT;
+ order = get_order(mem->nslabs << IO_TLB_SHIFT);
+ mem->nslabs = SLABS_PER_PAGE << order;
+ bytes = mem->nslabs << IO_TLB_SHIFT;

while ((SLABS_PER_PAGE << order) > IO_TLB_MIN_SLABS) {
vstart = (void *)__get_free_pages(GFP_DMA | __GFP_NOWARN,
@@ -320,15 +294,15 @@ swiotlb_late_init_with_default_size(size_t default_size)
}

if (!vstart) {
- io_tlb_nslabs = req_nslabs;
+ mem->nslabs = req_nslabs;
return -ENOMEM;
}
if (order != get_order(bytes)) {
pr_warn("only able to allocate %ld MB\n",
(PAGE_SIZE << order) >> 20);
- io_tlb_nslabs = SLABS_PER_PAGE << order;
+ mem->nslabs = SLABS_PER_PAGE << order;
}
- rc = swiotlb_late_init_with_tbl(vstart, io_tlb_nslabs);
+ rc = swiotlb_late_init_with_tbl(vstart, mem->nslabs);
if (rc)
free_pages((unsigned long)vstart, order);

@@ -337,22 +311,25 @@ swiotlb_late_init_with_default_size(size_t default_size)

static void swiotlb_cleanup(void)
{
- io_tlb_end = 0;
- io_tlb_start = 0;
- io_tlb_nslabs = 0;
+ struct io_tlb_mem *mem = &io_tlb_default_mem;
+
+ mem->end = 0;
+ mem->start = 0;
+ mem->nslabs = 0;
max_segment = 0;
}

int
swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs)
{
+ struct io_tlb_mem *mem = &io_tlb_default_mem;
unsigned long i, bytes;

bytes = nslabs << IO_TLB_SHIFT;

- io_tlb_nslabs = nslabs;
- io_tlb_start = virt_to_phys(tlb);
- io_tlb_end = io_tlb_start + bytes;
+ mem->nslabs = nslabs;
+ mem->start = virt_to_phys(tlb);
+ mem->end = mem->start + bytes;

set_memory_decrypted((unsigned long)tlb, bytes >> PAGE_SHIFT);
memset(tlb, 0, bytes);
@@ -360,39 +337,39 @@ swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs)
/*
* Allocate and initialize the free list array. This array is used
* to find contiguous free memory regions of size up to IO_TLB_SEGSIZE
- * between io_tlb_start and io_tlb_end.
+ * between mem->start and mem->end.
*/
- io_tlb_list = (unsigned int *)__get_free_pages(GFP_KERNEL,
- get_order(io_tlb_nslabs * sizeof(int)));
- if (!io_tlb_list)
+ mem->list = (unsigned int *)__get_free_pages(GFP_KERNEL,
+ get_order(mem->nslabs * sizeof(int)));
+ if (!mem->list)
goto cleanup3;

- io_tlb_orig_addr = (phys_addr_t *)
+ mem->orig_addr = (phys_addr_t *)
__get_free_pages(GFP_KERNEL,
- get_order(io_tlb_nslabs *
+ get_order(mem->nslabs *
sizeof(phys_addr_t)));
- if (!io_tlb_orig_addr)
+ if (!mem->orig_addr)
goto cleanup4;

- for (i = 0; i < io_tlb_nslabs; i++) {
- io_tlb_list[i] = IO_TLB_SEGSIZE - OFFSET(i, IO_TLB_SEGSIZE);
- io_tlb_orig_addr[i] = INVALID_PHYS_ADDR;
+ for (i = 0; i < mem->nslabs; i++) {
+ mem->list[i] = IO_TLB_SEGSIZE - OFFSET(i, IO_TLB_SEGSIZE);
+ mem->orig_addr[i] = INVALID_PHYS_ADDR;
}
- io_tlb_index = 0;
+ mem->index = 0;
no_iotlb_memory = false;

swiotlb_print_info();

late_alloc = 1;

- swiotlb_set_max_segment(io_tlb_nslabs << IO_TLB_SHIFT);
+ swiotlb_set_max_segment(mem->nslabs << IO_TLB_SHIFT);

return 0;

cleanup4:
- free_pages((unsigned long)io_tlb_list, get_order(io_tlb_nslabs *
- sizeof(int)));
- io_tlb_list = NULL;
+ free_pages((unsigned long)mem->list,
+ get_order(mem->nslabs * sizeof(int)));
+ mem->list = NULL;
cleanup3:
swiotlb_cleanup();
return -ENOMEM;
@@ -400,23 +377,25 @@ swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs)

void __init swiotlb_exit(void)
{
- if (!io_tlb_orig_addr)
+ struct io_tlb_mem *mem = &io_tlb_default_mem;
+
+ if (!mem->orig_addr)
return;

if (late_alloc) {
- free_pages((unsigned long)io_tlb_orig_addr,
- get_order(io_tlb_nslabs * sizeof(phys_addr_t)));
- free_pages((unsigned long)io_tlb_list, get_order(io_tlb_nslabs *
- sizeof(int)));
- free_pages((unsigned long)phys_to_virt(io_tlb_start),
- get_order(io_tlb_nslabs << IO_TLB_SHIFT));
+ free_pages((unsigned long)mem->orig_addr,
+ get_order(mem->nslabs * sizeof(phys_addr_t)));
+ free_pages((unsigned long)mem->list,
+ get_order(mem->nslabs * sizeof(int)));
+ free_pages((unsigned long)phys_to_virt(mem->start),
+ get_order(mem->nslabs << IO_TLB_SHIFT));
} else {
- memblock_free_late(__pa(io_tlb_orig_addr),
- PAGE_ALIGN(io_tlb_nslabs * sizeof(phys_addr_t)));
- memblock_free_late(__pa(io_tlb_list),
- PAGE_ALIGN(io_tlb_nslabs * sizeof(int)));
- memblock_free_late(io_tlb_start,
- PAGE_ALIGN(io_tlb_nslabs << IO_TLB_SHIFT));
+ memblock_free_late(__pa(mem->orig_addr),
+ PAGE_ALIGN(mem->nslabs * sizeof(phys_addr_t)));
+ memblock_free_late(__pa(mem->list),
+ PAGE_ALIGN(mem->nslabs * sizeof(int)));
+ memblock_free_late(mem->start,
+ PAGE_ALIGN(mem->nslabs << IO_TLB_SHIFT));
}
swiotlb_cleanup();
}
@@ -465,7 +444,8 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, phys_addr_t orig_addr,
size_t mapping_size, size_t alloc_size,
enum dma_data_direction dir, unsigned long attrs)
{
- dma_addr_t tbl_dma_addr = phys_to_dma_unencrypted(hwdev, io_tlb_start);
+ struct io_tlb_mem *mem = &io_tlb_default_mem;
+ dma_addr_t tbl_dma_addr = phys_to_dma_unencrypted(hwdev, mem->start);
unsigned long flags;
phys_addr_t tlb_addr;
unsigned int nslots, stride, index, wrap;
@@ -516,13 +496,13 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, phys_addr_t orig_addr,
* Find suitable number of IO TLB entries size that will fit this
* request and allocate a buffer from that IO TLB pool.
*/
- spin_lock_irqsave(&io_tlb_lock, flags);
+ spin_lock_irqsave(&mem->lock, flags);

- if (unlikely(nslots > io_tlb_nslabs - io_tlb_used))
+ if (unlikely(nslots > mem->nslabs - mem->used))
goto not_found;

- index = ALIGN(io_tlb_index, stride);
- if (index >= io_tlb_nslabs)
+ index = ALIGN(mem->index, stride);
+ if (index >= mem->nslabs)
index = 0;
wrap = index;

@@ -530,7 +510,7 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, phys_addr_t orig_addr,
while (iommu_is_span_boundary(index, nslots, offset_slots,
max_slots)) {
index += stride;
- if (index >= io_tlb_nslabs)
+ if (index >= mem->nslabs)
index = 0;
if (index == wrap)
goto not_found;
@@ -541,40 +521,40 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, phys_addr_t orig_addr,
* contiguous buffers, we allocate the buffers from that slot
* and mark the entries as '0' indicating unavailable.
*/
- if (io_tlb_list[index] >= nslots) {
+ if (mem->list[index] >= nslots) {
int count = 0;

for (i = index; i < (int) (index + nslots); i++)
- io_tlb_list[i] = 0;
- for (i = index - 1; (OFFSET(i, IO_TLB_SEGSIZE) != IO_TLB_SEGSIZE - 1) && io_tlb_list[i]; i--)
- io_tlb_list[i] = ++count;
- tlb_addr = io_tlb_start + (index << IO_TLB_SHIFT);
+ mem->list[i] = 0;
+ for (i = index - 1; (OFFSET(i, IO_TLB_SEGSIZE) != IO_TLB_SEGSIZE - 1) && mem->list[i]; i--)
+ mem->list[i] = ++count;
+ tlb_addr = mem->start + (index << IO_TLB_SHIFT);

/*
* Update the indices to avoid searching in the next
* round.
*/
- io_tlb_index = ((index + nslots) < io_tlb_nslabs
- ? (index + nslots) : 0);
+ mem->index = ((index + nslots) < mem->nslabs
+ ? (index + nslots) : 0);

goto found;
}
index += stride;
- if (index >= io_tlb_nslabs)
+ if (index >= mem->nslabs)
index = 0;
} while (index != wrap);

not_found:
- tmp_io_tlb_used = io_tlb_used;
+ tmp_io_tlb_used = mem->used;

- spin_unlock_irqrestore(&io_tlb_lock, flags);
+ spin_unlock_irqrestore(&mem->lock, flags);
if (!(attrs & DMA_ATTR_NO_WARN) && printk_ratelimit())
dev_warn(hwdev, "swiotlb buffer is full (sz: %zd bytes), total %lu (slots), used %lu (slots)\n",
- alloc_size, io_tlb_nslabs, tmp_io_tlb_used);
+ alloc_size, mem->nslabs, tmp_io_tlb_used);
return (phys_addr_t)DMA_MAPPING_ERROR;
found:
- io_tlb_used += nslots;
- spin_unlock_irqrestore(&io_tlb_lock, flags);
+ mem->used += nslots;
+ spin_unlock_irqrestore(&mem->lock, flags);

/*
* Save away the mapping from the original address to the DMA address.
@@ -582,7 +562,7 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, phys_addr_t orig_addr,
* needed.
*/
for (i = 0; i < nslots; i++)
- io_tlb_orig_addr[index+i] = orig_addr + (i << IO_TLB_SHIFT);
+ mem->orig_addr[index+i] = orig_addr + (i << IO_TLB_SHIFT);
if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
(dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL))
swiotlb_bounce(orig_addr, tlb_addr, mapping_size, DMA_TO_DEVICE);
@@ -597,10 +577,11 @@ void swiotlb_tbl_unmap_single(struct device *hwdev, phys_addr_t tlb_addr,
size_t mapping_size, size_t alloc_size,
enum dma_data_direction dir, unsigned long attrs)
{
+ struct io_tlb_mem *mem = &io_tlb_default_mem;
unsigned long flags;
int i, count, nslots = ALIGN(alloc_size, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT;
- int index = (tlb_addr - io_tlb_start) >> IO_TLB_SHIFT;
- phys_addr_t orig_addr = io_tlb_orig_addr[index];
+ int index = (tlb_addr - mem->start) >> IO_TLB_SHIFT;
+ phys_addr_t orig_addr = mem->orig_addr[index];

/*
* First, sync the memory before unmapping the entry
@@ -616,36 +597,37 @@ void swiotlb_tbl_unmap_single(struct device *hwdev, phys_addr_t tlb_addr,
* While returning the entries to the free list, we merge the entries
* with slots below and above the pool being returned.
*/
- spin_lock_irqsave(&io_tlb_lock, flags);
+ spin_lock_irqsave(&mem->lock, flags);
{
count = ((index + nslots) < ALIGN(index + 1, IO_TLB_SEGSIZE) ?
- io_tlb_list[index + nslots] : 0);
+ mem->list[index + nslots] : 0);
/*
* Step 1: return the slots to the free list, merging the
* slots with superceeding slots
*/
for (i = index + nslots - 1; i >= index; i--) {
- io_tlb_list[i] = ++count;
- io_tlb_orig_addr[i] = INVALID_PHYS_ADDR;
+ mem->list[i] = ++count;
+ mem->orig_addr[i] = INVALID_PHYS_ADDR;
}
/*
* Step 2: merge the returned slots with the preceding slots,
* if available (non zero)
*/
- for (i = index - 1; (OFFSET(i, IO_TLB_SEGSIZE) != IO_TLB_SEGSIZE -1) && io_tlb_list[i]; i--)
- io_tlb_list[i] = ++count;
+ for (i = index - 1; (OFFSET(i, IO_TLB_SEGSIZE) != IO_TLB_SEGSIZE -1) && mem->list[i]; i--)
+ mem->list[i] = ++count;

- io_tlb_used -= nslots;
+ mem->used -= nslots;
}
- spin_unlock_irqrestore(&io_tlb_lock, flags);
+ spin_unlock_irqrestore(&mem->lock, flags);
}

void swiotlb_tbl_sync_single(struct device *hwdev, phys_addr_t tlb_addr,
size_t size, enum dma_data_direction dir,
enum dma_sync_target target)
{
- int index = (tlb_addr - io_tlb_start) >> IO_TLB_SHIFT;
- phys_addr_t orig_addr = io_tlb_orig_addr[index];
+ struct io_tlb_mem *mem = &io_tlb_default_mem;
+ int index = (tlb_addr - mem->start) >> IO_TLB_SHIFT;
+ phys_addr_t orig_addr = mem->orig_addr[index];

if (orig_addr == INVALID_PHYS_ADDR)
return;
@@ -713,21 +695,21 @@ size_t swiotlb_max_mapping_size(struct device *dev)
bool is_swiotlb_active(void)
{
/*
- * When SWIOTLB is initialized, even if io_tlb_start points to physical
- * address zero, io_tlb_end surely doesn't.
+ * When SWIOTLB is initialized, even if mem->start points to physical
+ * address zero, mem->end surely doesn't.
*/
- return io_tlb_end != 0;
+ return io_tlb_default_mem.end != 0;
}

#ifdef CONFIG_DEBUG_FS

static int __init swiotlb_create_debugfs(void)
{
- struct dentry *root;
+ struct io_tlb_mem *mem = &io_tlb_default_mem;

- root = debugfs_create_dir("swiotlb", NULL);
- debugfs_create_ulong("io_tlb_nslabs", 0400, root, &io_tlb_nslabs);
- debugfs_create_ulong("io_tlb_used", 0400, root, &io_tlb_used);
+ mem->debugfs = debugfs_create_dir("swiotlb", NULL);
+ debugfs_create_ulong("io_tlb_nslabs", 0400, mem->debugfs, &mem->nslabs);
+ debugfs_create_ulong("io_tlb_used", 0400, mem->debugfs, &mem->used);
return 0;
}

--
2.29.2.729.g45daf8777d-goog

To make this change as mechanical as possible, I didn't fix any
checkpatch.pl ERROR/WARNING.
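
For readers following the move of io_tlb_list into mem->list, the free-list
initialization loop in the patch above can be tried in user space. This is
a minimal sketch: SEGSIZE mirrors IO_TLB_SEGSIZE and is assumed to be 128,
the OFFSET() macro is assumed to match the one in kernel/dma/swiotlb.c, and
init_free_list() is a hypothetical helper name, not a kernel function.

```c
#include <assert.h>
#include <stddef.h>

#define SEGSIZE 128	/* assumed; mirrors IO_TLB_SEGSIZE */
#define OFFSET(val, align) ((unsigned long)(val) & ((align) - 1))

/*
 * Fill the free list so that list[i] holds the number of contiguous free
 * slots available starting at index i, counted to the end of the segment
 * that i belongs to.
 */
static void init_free_list(unsigned int *list, size_t nslabs)
{
	size_t i;

	assert(list != NULL);
	for (i = 0; i < nslabs; i++)
		list[i] = SEGSIZE - OFFSET(i, SEGSIZE);
}
```

Under these assumptions the first slot of each segment starts at 128 and
the last slot of each segment at 1, so an allocation never spans a segment
boundary.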

2021-01-06 03:44:49

by Claire Chang

Subject: [RFC PATCH v3 2/6] swiotlb: Add restricted DMA pool

Add the initialization function to create restricted DMA pools from
matching reserved-memory nodes in the device tree.
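
As a rough illustration of how a region size turns into a slab count, the
rounding done by swiotlb_init_io_tlb_mem() in the diff below can be
sketched in user space. SLAB_SHIFT and SEGSIZE mirror IO_TLB_SHIFT and
IO_TLB_SEGSIZE and are assumed here to be 11 and 128; ALIGN_UP() and
pool_nslabs() are hypothetical names used only for this sketch.

```c
#include <assert.h>
#include <stddef.h>

#define SLAB_SHIFT 11	/* assumed; mirrors IO_TLB_SHIFT (2 KiB slabs) */
#define SEGSIZE 128	/* assumed; mirrors IO_TLB_SEGSIZE */
#define ALIGN_UP(x, a) (((x) + ((size_t)(a) - 1)) & ~((size_t)(a) - 1))

/*
 * Round the region size up to a whole number of slabs, then round the
 * slab count up to a whole number of segments.
 */
static size_t pool_nslabs(size_t size)
{
	size_t nslabs;

	assert(size > 0);
	size = ALIGN_UP(size, (size_t)1 << SLAB_SHIFT);
	nslabs = size >> SLAB_SHIFT;
	return ALIGN_UP(nslabs, SEGSIZE);
}
```

With these values, the 0x400000-byte region from the binding example in
patch 5/6 comes out as 2048 slabs.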

Signed-off-by: Claire Chang <[email protected]>
---
include/linux/device.h | 4 ++
include/linux/swiotlb.h | 7 +-
kernel/dma/Kconfig | 1 +
kernel/dma/swiotlb.c | 144 ++++++++++++++++++++++++++++++++++------
4 files changed, 131 insertions(+), 25 deletions(-)

diff --git a/include/linux/device.h b/include/linux/device.h
index 89bb8b84173e..ca6f71ec8871 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -413,6 +413,7 @@ struct dev_links_info {
* @dma_pools: Dma pools (if dma'ble device).
* @dma_mem: Internal for coherent mem override.
* @cma_area: Contiguous memory area for dma allocations
+ * @dma_io_tlb_mem: Internal for swiotlb io_tlb_mem override.
* @archdata: For arch-specific additions.
* @of_node: Associated device tree node.
* @fwnode: Associated device node supplied by platform firmware.
@@ -515,6 +516,9 @@ struct device {
#ifdef CONFIG_DMA_CMA
struct cma *cma_area; /* contiguous memory area for dma
allocations */
+#endif
+#ifdef CONFIG_SWIOTLB
+ struct io_tlb_mem *dma_io_tlb_mem;
#endif
/* arch specific additions */
struct dev_archdata archdata;
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index dd8eb57cbb8f..a1bbd7788885 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -76,12 +76,13 @@ extern enum swiotlb_force swiotlb_force;
*
* @start: The start address of the swiotlb memory pool. Used to do a quick
* range check to see if the memory was in fact allocated by this
- * API.
+ * API. For restricted DMA pool, this is device tree adjustable.
* @end: The end address of the swiotlb memory pool. Used to do a quick
* range check to see if the memory was in fact allocated by this
- * API.
+ * API. For restricted DMA pool, this is device tree adjustable.
* @nslabs: The number of IO TLB blocks (in groups of 64) between @start and
- * @end. This is command line adjustable via setup_io_tlb_npages.
+ * @end. For default swiotlb, this is command line adjustable via
+ * setup_io_tlb_npages.
* @used: The number of used IO TLB block.
* @list: The free list describing the number of free entries available
* from each index.
diff --git a/kernel/dma/Kconfig b/kernel/dma/Kconfig
index 479fc145acfc..131a0a66781b 100644
--- a/kernel/dma/Kconfig
+++ b/kernel/dma/Kconfig
@@ -82,6 +82,7 @@ config ARCH_HAS_FORCE_DMA_UNENCRYPTED
config SWIOTLB
bool
select NEED_DMA_MAP_STATE
+ select OF_EARLY_FLATTREE

#
# Should be selected if we can mmap non-coherent mappings to userspace.
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index e4368159f88a..7fb2ac087d23 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -36,6 +36,11 @@
#include <linux/scatterlist.h>
#include <linux/mem_encrypt.h>
#include <linux/set_memory.h>
+#include <linux/io.h>
+#include <linux/of.h>
+#include <linux/of_fdt.h>
+#include <linux/of_reserved_mem.h>
+#include <linux/slab.h>
#ifdef CONFIG_DEBUG_FS
#include <linux/debugfs.h>
#endif
@@ -319,20 +324,21 @@ static void swiotlb_cleanup(void)
max_segment = 0;
}

-int
-swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs)
+static int swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start,
+ size_t size)
{
- struct io_tlb_mem *mem = &io_tlb_default_mem;
- unsigned long i, bytes;
+ unsigned long i;
+ void *vaddr = phys_to_virt(start);

- bytes = nslabs << IO_TLB_SHIFT;
+ size = ALIGN(size, 1 << IO_TLB_SHIFT);
+ mem->nslabs = size >> IO_TLB_SHIFT;
+ mem->nslabs = ALIGN(mem->nslabs, IO_TLB_SEGSIZE);

- mem->nslabs = nslabs;
- mem->start = virt_to_phys(tlb);
- mem->end = mem->start + bytes;
+ mem->start = start;
+ mem->end = mem->start + size;

- set_memory_decrypted((unsigned long)tlb, bytes >> PAGE_SHIFT);
- memset(tlb, 0, bytes);
+ set_memory_decrypted((unsigned long)vaddr, size >> PAGE_SHIFT);
+ memset(vaddr, 0, size);

/*
* Allocate and initialize the free list array. This array is used
@@ -356,13 +362,6 @@ swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs)
mem->orig_addr[i] = INVALID_PHYS_ADDR;
}
mem->index = 0;
- no_iotlb_memory = false;
-
- swiotlb_print_info();
-
- late_alloc = 1;
-
- swiotlb_set_max_segment(mem->nslabs << IO_TLB_SHIFT);

return 0;

@@ -375,6 +374,27 @@ swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs)
return -ENOMEM;
}

+int swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs)
+{
+ struct io_tlb_mem *mem = &io_tlb_default_mem;
+ unsigned long bytes = nslabs << IO_TLB_SHIFT;
+ int ret;
+
+ ret = swiotlb_init_io_tlb_mem(mem, virt_to_phys(tlb), bytes);
+ if (ret)
+ return ret;
+
+ no_iotlb_memory = false;
+
+ swiotlb_print_info();
+
+ late_alloc = 1;
+
+ swiotlb_set_max_segment(bytes);
+
+ return 0;
+}
+
void __init swiotlb_exit(void)
{
struct io_tlb_mem *mem = &io_tlb_default_mem;
@@ -703,16 +723,96 @@ bool is_swiotlb_active(void)

#ifdef CONFIG_DEBUG_FS

-static int __init swiotlb_create_debugfs(void)
+static void swiotlb_create_debugfs(struct io_tlb_mem *mem, const char *name,
+ struct dentry *node)
{
- struct io_tlb_mem *mem = &io_tlb_default_mem;
-
- mem->debugfs = debugfs_create_dir("swiotlb", NULL);
+ mem->debugfs = debugfs_create_dir(name, node);
debugfs_create_ulong("io_tlb_nslabs", 0400, mem->debugfs, &mem->nslabs);
debugfs_create_ulong("io_tlb_used", 0400, mem->debugfs, &mem->used);
+}
+
+static int __init swiotlb_create_default_debugfs(void)
+{
+ swiotlb_create_debugfs(&io_tlb_default_mem, "swiotlb", NULL);
+
return 0;
}

-late_initcall(swiotlb_create_debugfs);
+late_initcall(swiotlb_create_default_debugfs);

#endif
+
+static int rmem_swiotlb_device_init(struct reserved_mem *rmem,
+ struct device *dev)
+{
+ struct io_tlb_mem *mem = rmem->priv;
+ int ret;
+
+ if (dev->dma_io_tlb_mem)
+ return -EBUSY;
+
+ if (!mem) {
+ mem = kzalloc(sizeof(*mem), GFP_KERNEL);
+ if (!mem)
+ return -ENOMEM;
+
+ if (!memremap(rmem->base, rmem->size, MEMREMAP_WB)) {
+ ret = -EINVAL;
+ goto cleanup;
+ }
+
+ ret = swiotlb_init_io_tlb_mem(mem, rmem->base, rmem->size);
+ if (ret)
+ goto cleanup;
+
+ rmem->priv = mem;
+ }
+
+#ifdef CONFIG_DEBUG_FS
+ swiotlb_create_debugfs(mem, dev_name(dev), io_tlb_default_mem.debugfs);
+#endif
+
+ dev->dma_io_tlb_mem = mem;
+
+ return 0;
+
+cleanup:
+ kfree(mem);
+
+ return ret;
+}
+
+static void rmem_swiotlb_device_release(struct reserved_mem *rmem,
+ struct device *dev)
+{
+ if (!dev)
+ return;
+
+#ifdef CONFIG_DEBUG_FS
+ debugfs_remove_recursive(dev->dma_io_tlb_mem->debugfs);
+#endif
+ dev->dma_io_tlb_mem = NULL;
+}
+
+static const struct reserved_mem_ops rmem_swiotlb_ops = {
+ .device_init = rmem_swiotlb_device_init,
+ .device_release = rmem_swiotlb_device_release,
+};
+
+static int __init rmem_swiotlb_setup(struct reserved_mem *rmem)
+{
+ unsigned long node = rmem->fdt_node;
+
+ if (of_get_flat_dt_prop(node, "reusable", NULL) ||
+ of_get_flat_dt_prop(node, "linux,cma-default", NULL) ||
+ of_get_flat_dt_prop(node, "linux,dma-default", NULL) ||
+ of_get_flat_dt_prop(node, "no-map", NULL))
+ return -EINVAL;
+
+ rmem->ops = &rmem_swiotlb_ops;
+ pr_info("Reserved memory: created device swiotlb memory pool at %pa, size %ld MiB\n",
+ &rmem->base, (unsigned long)rmem->size / SZ_1M);
+ return 0;
+}
+
+RESERVEDMEM_OF_DECLARE(dma, "restricted-dma-pool", rmem_swiotlb_setup);
--
2.29.2.729.g45daf8777d-goog
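
The size handling in swiotlb_init_io_tlb_mem() above can be checked with a small user-space sketch. This is not kernel code: the constants IO_TLB_SHIFT = 11 and IO_TLB_SEGSIZE = 128 are assumed from include/linux/swiotlb.h of that era, and ALIGN_UP, struct pool_geometry, pool_geometry() and slabs_exceed_pool() are hypothetical names introduced only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values; verify against include/linux/swiotlb.h in your tree. */
#define IO_TLB_SHIFT   11   /* one slab covers 1 << 11 = 2048 bytes */
#define IO_TLB_SEGSIZE 128  /* slabs per contiguous segment */

/* Round x up to a multiple of a; a must be a power of two. */
#define ALIGN_UP(x, a) (((x) + ((size_t)(a) - 1)) & ~((size_t)(a) - 1))

/* Hypothetical helper type: the two values the patch derives from "size". */
struct pool_geometry {
	size_t bytes;   /* size rounded up to whole slabs (mem->end - mem->start) */
	size_t nslabs;  /* slab count rounded up to whole segments (mem->nslabs) */
};

static struct pool_geometry pool_geometry(size_t size)
{
	struct pool_geometry g;

	assert(size > 0);
	g.bytes = ALIGN_UP(size, (size_t)1 << IO_TLB_SHIFT);
	g.nslabs = ALIGN_UP(g.bytes >> IO_TLB_SHIFT, IO_TLB_SEGSIZE);
	return g;
}

/* Nonzero when the rounded-up slab count covers more bytes than the pool. */
static int slabs_exceed_pool(size_t size)
{
	struct pool_geometry g = pool_geometry(size);

	return (g.nslabs << IO_TLB_SHIFT) > g.bytes;
}
```

Under these assumptions a 1 MiB pool gives 512 slabs with nothing left over, while a 300 KiB pool gives 150 slabs rounded up to 256, so the slab count would describe more memory than the region holds. Whether that matters in practice depends on how the free list is walked, which this sketch does not model.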

2021-01-06 03:44:57

by Claire Chang

Subject: [RFC PATCH v3 4/6] swiotlb: Add restricted DMA alloc/free support.

Add the functions swiotlb_alloc() and swiotlb_free() to support memory
allocation from the restricted DMA pool.

Signed-off-by: Claire Chang <[email protected]>
---
include/linux/swiotlb.h | 6 ++
kernel/dma/direct.c | 12 +++
kernel/dma/swiotlb.c | 171 +++++++++++++++++++++++++++++-----------
3 files changed, 144 insertions(+), 45 deletions(-)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 5135e5636042..84fe96e40685 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -68,6 +68,12 @@ extern void swiotlb_tbl_sync_single(struct device *hwdev,
dma_addr_t swiotlb_map(struct device *dev, phys_addr_t phys,
size_t size, enum dma_data_direction dir, unsigned long attrs);

+void *swiotlb_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle,
+ unsigned long attrs);
+
+void swiotlb_free(struct device *dev, size_t size, void *vaddr,
+ dma_addr_t dma_addr, unsigned long attrs);
+
#ifdef CONFIG_SWIOTLB
extern enum swiotlb_force swiotlb_force;

diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 30ccbc08e229..126e9b3354d6 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -137,6 +137,11 @@ void *dma_direct_alloc(struct device *dev, size_t size,
void *ret;
int err;

+#ifdef CONFIG_SWIOTLB
+ if (unlikely(dev->dma_io_tlb_mem))
+ return swiotlb_alloc(dev, size, dma_handle, attrs);
+#endif
+
size = PAGE_ALIGN(size);
if (attrs & DMA_ATTR_NO_WARN)
gfp |= __GFP_NOWARN;
@@ -246,6 +251,13 @@ void dma_direct_free(struct device *dev, size_t size,
{
unsigned int page_order = get_order(size);

+#ifdef CONFIG_SWIOTLB
+ if (unlikely(dev->dma_io_tlb_mem)) {
+ swiotlb_free(dev, size, cpu_addr, dma_addr, attrs);
+ return;
+ }
+#endif
+
if ((attrs & DMA_ATTR_NO_KERNEL_MAPPING) &&
!force_dma_unencrypted(dev)) {
/* cpu_addr is a struct page cookie, not a kernel address */
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 1f05af09e61a..ca88ef59435d 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -459,14 +459,13 @@ static void swiotlb_bounce(phys_addr_t orig_addr, phys_addr_t tlb_addr,
}
}

-phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, phys_addr_t orig_addr,
- size_t mapping_size, size_t alloc_size,
- enum dma_data_direction dir, unsigned long attrs)
+static int swiotlb_tbl_find_free_region(struct device *hwdev,
+ dma_addr_t tbl_dma_addr,
+ size_t alloc_size,
+ unsigned long attrs)
{
struct io_tlb_mem *mem = get_io_tlb_mem(hwdev);
- dma_addr_t tbl_dma_addr = phys_to_dma_unencrypted(hwdev, mem->start);
unsigned long flags;
- phys_addr_t tlb_addr;
unsigned int nslots, stride, index, wrap;
int i;
unsigned long mask;
@@ -477,15 +476,6 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, phys_addr_t orig_addr,
if (no_iotlb_memory && !hwdev->dma_io_tlb_mem)
panic("Can not allocate SWIOTLB buffer earlier and can't now provide you with the DMA bounce buffer");

- if (mem_encrypt_active())
- pr_warn_once("Memory encryption is active and system is using DMA bounce buffers\n");
-
- if (mapping_size > alloc_size) {
- dev_warn_once(hwdev, "Invalid sizes (mapping: %zd bytes, alloc: %zd bytes)",
- mapping_size, alloc_size);
- return (phys_addr_t)DMA_MAPPING_ERROR;
- }
-
mask = dma_get_seg_boundary(hwdev);

tbl_dma_addr &= mask;
@@ -547,7 +537,6 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, phys_addr_t orig_addr,
mem->list[i] = 0;
for (i = index - 1; (OFFSET(i, IO_TLB_SEGSIZE) != IO_TLB_SEGSIZE - 1) && mem->list[i]; i--)
mem->list[i] = ++count;
- tlb_addr = mem->start + (index << IO_TLB_SHIFT);

/*
* Update the indices to avoid searching in the next
@@ -570,45 +559,21 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, phys_addr_t orig_addr,
if (!(attrs & DMA_ATTR_NO_WARN) && printk_ratelimit())
dev_warn(hwdev, "swiotlb buffer is full (sz: %zd bytes), total %lu (slots), used %lu (slots)\n",
alloc_size, mem->nslabs, tmp_io_tlb_used);
- return (phys_addr_t)DMA_MAPPING_ERROR;
+ return -ENOMEM;
+
found:
mem->used += nslots;
spin_unlock_irqrestore(&mem->lock, flags);

- /*
- * Save away the mapping from the original address to the DMA address.
- * This is needed when we sync the memory. Then we sync the buffer if
- * needed.
- */
- for (i = 0; i < nslots; i++)
- mem->orig_addr[index+i] = orig_addr + (i << IO_TLB_SHIFT);
- if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
- (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL))
- swiotlb_bounce(orig_addr, tlb_addr, mapping_size, DMA_TO_DEVICE);
-
- return tlb_addr;
+ return index;
}

-/*
- * tlb_addr is the physical address of the bounce buffer to unmap.
- */
-void swiotlb_tbl_unmap_single(struct device *hwdev, phys_addr_t tlb_addr,
- size_t mapping_size, size_t alloc_size,
- enum dma_data_direction dir, unsigned long attrs)
+static void swiotlb_tbl_release_region(struct device *hwdev, int index,
+ size_t size)
{
struct io_tlb_mem *mem = get_io_tlb_mem(hwdev);
unsigned long flags;
- int i, count, nslots = ALIGN(alloc_size, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT;
- int index = (tlb_addr - mem->start) >> IO_TLB_SHIFT;
- phys_addr_t orig_addr = mem->orig_addr[index];
-
- /*
- * First, sync the memory before unmapping the entry
- */
- if (orig_addr != INVALID_PHYS_ADDR &&
- !(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
- ((dir == DMA_FROM_DEVICE) || (dir == DMA_BIDIRECTIONAL)))
- swiotlb_bounce(orig_addr, tlb_addr, mapping_size, DMA_FROM_DEVICE);
+ int i, count, nslots = ALIGN(size, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT;

/*
* Return the buffer to the free list by setting the corresponding
@@ -640,6 +605,69 @@ void swiotlb_tbl_unmap_single(struct device *hwdev, phys_addr_t tlb_addr,
spin_unlock_irqrestore(&mem->lock, flags);
}

+phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, phys_addr_t orig_addr,
+ size_t mapping_size, size_t alloc_size,
+ enum dma_data_direction dir, unsigned long attrs)
+{
+ struct io_tlb_mem *mem = get_io_tlb_mem(hwdev);
+ dma_addr_t tbl_dma_addr = phys_to_dma_unencrypted(hwdev, mem->start);
+ phys_addr_t tlb_addr;
+ unsigned int nslots;
+ int i, index;
+
+ if (mem_encrypt_active())
+ pr_warn_once("Memory encryption is active and system is using DMA bounce buffers\n");
+
+ if (mapping_size > alloc_size) {
+ dev_warn_once(hwdev, "Invalid sizes (mapping: %zd bytes, alloc: %zd bytes)",
+ mapping_size, alloc_size);
+ return (phys_addr_t)DMA_MAPPING_ERROR;
+ }
+
+ index = swiotlb_tbl_find_free_region(hwdev, tbl_dma_addr, alloc_size,
+ attrs);
+ if (index < 0)
+ return (phys_addr_t)DMA_MAPPING_ERROR;
+
+ tlb_addr = mem->start + (index << IO_TLB_SHIFT);
+
+ /*
+ * Save away the mapping from the original address to the DMA address.
+ * This is needed when we sync the memory. Then we sync the buffer if
+ * needed.
+ */
+ nslots = ALIGN(alloc_size, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT;
+ for (i = 0; i < nslots; i++)
+ mem->orig_addr[index+i] = orig_addr + (i << IO_TLB_SHIFT);
+ if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
+ (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL))
+ swiotlb_bounce(orig_addr, tlb_addr, mapping_size, DMA_TO_DEVICE);
+
+ return tlb_addr;
+}
+
+/*
+ * tlb_addr is the physical address of the bounce buffer to unmap.
+ */
+void swiotlb_tbl_unmap_single(struct device *hwdev, phys_addr_t tlb_addr,
+ size_t mapping_size, size_t alloc_size,
+ enum dma_data_direction dir, unsigned long attrs)
+{
+ struct io_tlb_mem *mem = get_io_tlb_mem(hwdev);
+ int index = (tlb_addr - mem->start) >> IO_TLB_SHIFT;
+ phys_addr_t orig_addr = mem->orig_addr[index];
+
+ /*
+ * First, sync the memory before unmapping the entry
+ */
+ if (orig_addr != INVALID_PHYS_ADDR &&
+ !(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
+ ((dir == DMA_FROM_DEVICE) || (dir == DMA_BIDIRECTIONAL)))
+ swiotlb_bounce(orig_addr, tlb_addr, mapping_size, DMA_FROM_DEVICE);
+
+ swiotlb_tbl_release_region(hwdev, index, alloc_size);
+}
+
void swiotlb_tbl_sync_single(struct device *hwdev, phys_addr_t tlb_addr,
size_t size, enum dma_data_direction dir,
enum dma_sync_target target)
@@ -706,6 +734,59 @@ dma_addr_t swiotlb_map(struct device *dev, phys_addr_t paddr, size_t size,
return dma_addr;
}

+void *swiotlb_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle,
+ unsigned long attrs)
+{
+ struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
+ int index;
+ void *vaddr;
+ phys_addr_t tlb_addr;
+
+ size = PAGE_ALIGN(size);
+ index = swiotlb_tbl_find_free_region(dev, mem->start, size, attrs);
+ if (index < 0)
+ return NULL;
+
+ tlb_addr = mem->start + (index << IO_TLB_SHIFT);
+ *dma_handle = phys_to_dma_unencrypted(dev, tlb_addr);
+
+ if (!dev_is_dma_coherent(dev)) {
+ unsigned long pfn = PFN_DOWN(tlb_addr);
+
+ /* remove any dirty cache lines on the kernel alias */
+ arch_dma_prep_coherent(pfn_to_page(pfn), size);
+
+ /* create a coherent mapping */
+ vaddr = dma_common_contiguous_remap(
+ pfn_to_page(pfn), size,
+ dma_pgprot(dev, PAGE_KERNEL, attrs),
+ __builtin_return_address(0));
+ if (!vaddr) {
+ swiotlb_tbl_release_region(dev, index, size);
+ return NULL;
+ }
+ } else {
+ vaddr = phys_to_virt(tlb_addr);
+ }
+
+ memset(vaddr, 0, size);
+
+ return vaddr;
+}
+
+void swiotlb_free(struct device *dev, size_t size, void *vaddr,
+ dma_addr_t dma_addr, unsigned long attrs)
+{
+ struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
+ unsigned int index;
+
+ if (!dev_is_dma_coherent(dev))
+ vunmap(vaddr);
+
+ index = (dma_addr - mem->start) >> IO_TLB_SHIFT;
+ swiotlb_tbl_release_region(dev, index, PAGE_ALIGN(size));
+}
+
size_t swiotlb_max_mapping_size(struct device *dev)
{
return ((size_t)1 << IO_TLB_SHIFT) * IO_TLB_SEGSIZE;
--
2.29.2.729.g45daf8777d-goog
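
The patch above splits slot management into a "find free region" helper that returns either a slot index or a negative errno, and a "release region" helper that takes the index back. A minimal user-space sketch of that calling convention follows. It is only a model: the real swiotlb free list tracks run lengths per segment and uses a stride, which this first-fit loop does not attempt to reproduce, and NSLOTS, struct slot_pool, pool_find_free_region() and pool_release_region() are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define NSLOTS 8  /* hypothetical pool size for the sketch */

struct slot_pool {
	unsigned char used[NSLOTS];  /* 1 = slot is allocated */
};

/*
 * Find "count" contiguous free slots and mark them used.
 * Returns the first slot index (>= 0) or -ENOMEM. Because errors are
 * negative, callers must store the result in a signed int for an
 * "index < 0" check to be meaningful.
 */
static int pool_find_free_region(struct slot_pool *p, unsigned int count)
{
	unsigned int start, i;

	assert(count > 0);
	for (start = 0; start + count <= NSLOTS; start++) {
		for (i = 0; i < count && !p->used[start + i]; i++)
			;
		if (i == count) {
			for (i = 0; i < count; i++)
				p->used[start + i] = 1;
			return (int)start;
		}
	}
	return -ENOMEM;
}

/* Return "count" slots starting at "index" to the pool. */
static void pool_release_region(struct slot_pool *p, int index,
				unsigned int count)
{
	unsigned int i;

	assert(index >= 0 && (unsigned int)index + count <= NSLOTS);
	for (i = 0; i < count; i++)
		p->used[(unsigned int)index + i] = 0;
}
```

With this shape, a streaming-map caller and a coherent-alloc caller can share the same two helpers and differ only in what they do with the slots in between, which appears to be the intent of the refactoring.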

2021-01-06 03:45:07

by Claire Chang

Subject: [RFC PATCH v3 3/6] swiotlb: Use restricted DMA pool if available

Regardless of swiotlb setting, the restricted DMA pool is preferred if
available.

The restricted DMA pools provide a basic level of protection against
the DMA overwriting buffer contents at unexpected times. However, to
protect against general data leakage and system memory corruption, the
system needs to provide a way to restrict the DMA to a predefined memory
region.

Signed-off-by: Claire Chang <[email protected]>
---
drivers/iommu/dma-iommu.c | 12 ++++++------
include/linux/swiotlb.h | 17 +++++++++++------
kernel/dma/direct.c | 8 ++++----
kernel/dma/direct.h | 10 ++++++----
kernel/dma/swiotlb.c | 13 ++++++-------
5 files changed, 33 insertions(+), 27 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index f0305e6aac1b..1343cc2ef27a 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -516,7 +516,7 @@ static void __iommu_dma_unmap_swiotlb(struct device *dev, dma_addr_t dma_addr,

__iommu_dma_unmap(dev, dma_addr, size);

- if (unlikely(is_swiotlb_buffer(phys)))
+ if (unlikely(is_swiotlb_buffer(dev, phys)))
swiotlb_tbl_unmap_single(dev, phys, size,
iova_align(iovad, size), dir, attrs);
}
@@ -592,7 +592,7 @@ static dma_addr_t __iommu_dma_map_swiotlb(struct device *dev, phys_addr_t phys,
}

iova = __iommu_dma_map(dev, phys, aligned_size, prot, dma_mask);
- if ((iova == DMA_MAPPING_ERROR) && is_swiotlb_buffer(phys))
+ if ((iova == DMA_MAPPING_ERROR) && is_swiotlb_buffer(dev, phys))
swiotlb_tbl_unmap_single(dev, phys, org_size,
aligned_size, dir, attrs);

@@ -764,7 +764,7 @@ static void iommu_dma_sync_single_for_cpu(struct device *dev,
if (!dev_is_dma_coherent(dev))
arch_sync_dma_for_cpu(phys, size, dir);

- if (is_swiotlb_buffer(phys))
+ if (is_swiotlb_buffer(dev, phys))
swiotlb_tbl_sync_single(dev, phys, size, dir, SYNC_FOR_CPU);
}

@@ -777,7 +777,7 @@ static void iommu_dma_sync_single_for_device(struct device *dev,
return;

phys = iommu_iova_to_phys(iommu_get_dma_domain(dev), dma_handle);
- if (is_swiotlb_buffer(phys))
+ if (is_swiotlb_buffer(dev, phys))
swiotlb_tbl_sync_single(dev, phys, size, dir, SYNC_FOR_DEVICE);

if (!dev_is_dma_coherent(dev))
@@ -798,7 +798,7 @@ static void iommu_dma_sync_sg_for_cpu(struct device *dev,
if (!dev_is_dma_coherent(dev))
arch_sync_dma_for_cpu(sg_phys(sg), sg->length, dir);

- if (is_swiotlb_buffer(sg_phys(sg)))
+ if (is_swiotlb_buffer(dev, sg_phys(sg)))
swiotlb_tbl_sync_single(dev, sg_phys(sg), sg->length,
dir, SYNC_FOR_CPU);
}
@@ -815,7 +815,7 @@ static void iommu_dma_sync_sg_for_device(struct device *dev,
return;

for_each_sg(sgl, sg, nelems, i) {
- if (is_swiotlb_buffer(sg_phys(sg)))
+ if (is_swiotlb_buffer(dev, sg_phys(sg)))
swiotlb_tbl_sync_single(dev, sg_phys(sg), sg->length,
dir, SYNC_FOR_DEVICE);

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index a1bbd7788885..5135e5636042 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -2,12 +2,12 @@
#ifndef __LINUX_SWIOTLB_H
#define __LINUX_SWIOTLB_H

+#include <linux/device.h>
#include <linux/dma-direction.h>
#include <linux/init.h>
#include <linux/types.h>
#include <linux/limits.h>

-struct device;
struct page;
struct scatterlist;

@@ -106,9 +106,14 @@ struct io_tlb_mem {
};
extern struct io_tlb_mem io_tlb_default_mem;

-static inline bool is_swiotlb_buffer(phys_addr_t paddr)
+static inline struct io_tlb_mem *get_io_tlb_mem(struct device *dev)
{
- struct io_tlb_mem *mem = &io_tlb_default_mem;
+ return dev->dma_io_tlb_mem ? dev->dma_io_tlb_mem : &io_tlb_default_mem;
+}
+
+static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
+{
+ struct io_tlb_mem *mem = get_io_tlb_mem(dev);

return paddr >= mem->start && paddr < mem->end;
}
@@ -116,11 +121,11 @@ static inline bool is_swiotlb_buffer(phys_addr_t paddr)
void __init swiotlb_exit(void);
unsigned int swiotlb_max_segment(void);
size_t swiotlb_max_mapping_size(struct device *dev);
-bool is_swiotlb_active(void);
+bool is_swiotlb_active(struct device *dev);
void __init swiotlb_adjust_size(unsigned long new_size);
#else
#define swiotlb_force SWIOTLB_NO_FORCE
-static inline bool is_swiotlb_buffer(phys_addr_t paddr)
+static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
{
return false;
}
@@ -136,7 +141,7 @@ static inline size_t swiotlb_max_mapping_size(struct device *dev)
return SIZE_MAX;
}

-static inline bool is_swiotlb_active(void)
+static inline bool is_swiotlb_active(struct device *dev)
{
return false;
}
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 002268262c9a..30ccbc08e229 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -343,7 +343,7 @@ void dma_direct_sync_sg_for_device(struct device *dev,
for_each_sg(sgl, sg, nents, i) {
phys_addr_t paddr = dma_to_phys(dev, sg_dma_address(sg));

- if (unlikely(is_swiotlb_buffer(paddr)))
+ if (unlikely(is_swiotlb_buffer(dev, paddr)))
swiotlb_tbl_sync_single(dev, paddr, sg->length,
dir, SYNC_FOR_DEVICE);

@@ -369,7 +369,7 @@ void dma_direct_sync_sg_for_cpu(struct device *dev,
if (!dev_is_dma_coherent(dev))
arch_sync_dma_for_cpu(paddr, sg->length, dir);

- if (unlikely(is_swiotlb_buffer(paddr)))
+ if (unlikely(is_swiotlb_buffer(dev, paddr)))
swiotlb_tbl_sync_single(dev, paddr, sg->length, dir,
SYNC_FOR_CPU);

@@ -495,7 +495,7 @@ int dma_direct_supported(struct device *dev, u64 mask)
size_t dma_direct_max_mapping_size(struct device *dev)
{
/* If SWIOTLB is active, use its maximum mapping size */
- if (is_swiotlb_active() &&
+ if (is_swiotlb_active(dev) &&
(dma_addressing_limited(dev) || swiotlb_force == SWIOTLB_FORCE))
return swiotlb_max_mapping_size(dev);
return SIZE_MAX;
@@ -504,7 +504,7 @@ size_t dma_direct_max_mapping_size(struct device *dev)
bool dma_direct_need_sync(struct device *dev, dma_addr_t dma_addr)
{
return !dev_is_dma_coherent(dev) ||
- is_swiotlb_buffer(dma_to_phys(dev, dma_addr));
+ is_swiotlb_buffer(dev, dma_to_phys(dev, dma_addr));
}

/**
diff --git a/kernel/dma/direct.h b/kernel/dma/direct.h
index b98615578737..7188834cc4c7 100644
--- a/kernel/dma/direct.h
+++ b/kernel/dma/direct.h
@@ -56,7 +56,7 @@ static inline void dma_direct_sync_single_for_device(struct device *dev,
{
phys_addr_t paddr = dma_to_phys(dev, addr);

- if (unlikely(is_swiotlb_buffer(paddr)))
+ if (unlikely(is_swiotlb_buffer(dev, paddr)))
swiotlb_tbl_sync_single(dev, paddr, size, dir, SYNC_FOR_DEVICE);

if (!dev_is_dma_coherent(dev))
@@ -73,7 +73,7 @@ static inline void dma_direct_sync_single_for_cpu(struct device *dev,
arch_sync_dma_for_cpu_all();
}

- if (unlikely(is_swiotlb_buffer(paddr)))
+ if (unlikely(is_swiotlb_buffer(dev, paddr)))
swiotlb_tbl_sync_single(dev, paddr, size, dir, SYNC_FOR_CPU);

if (dir == DMA_FROM_DEVICE)
@@ -87,8 +87,10 @@ static inline dma_addr_t dma_direct_map_page(struct device *dev,
phys_addr_t phys = page_to_phys(page) + offset;
dma_addr_t dma_addr = phys_to_dma(dev, phys);

- if (unlikely(swiotlb_force == SWIOTLB_FORCE))
+#ifdef CONFIG_SWIOTLB
+ if (unlikely(swiotlb_force == SWIOTLB_FORCE) || dev->dma_io_tlb_mem)
return swiotlb_map(dev, phys, size, dir, attrs);
+#endif

if (unlikely(!dma_capable(dev, dma_addr, size, true))) {
if (swiotlb_force != SWIOTLB_NO_FORCE)
@@ -113,7 +115,7 @@ static inline void dma_direct_unmap_page(struct device *dev, dma_addr_t addr,
if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
dma_direct_sync_single_for_cpu(dev, addr, size, dir);

- if (unlikely(is_swiotlb_buffer(phys)))
+ if (unlikely(is_swiotlb_buffer(dev, phys)))
swiotlb_tbl_unmap_single(dev, phys, size, size, dir, attrs);
}
#endif /* _KERNEL_DMA_DIRECT_H */
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 7fb2ac087d23..1f05af09e61a 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -222,7 +222,6 @@ int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
mem->orig_addr[i] = INVALID_PHYS_ADDR;
}
mem->index = 0;
- no_iotlb_memory = false;

if (verbose)
swiotlb_print_info();
@@ -464,7 +463,7 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, phys_addr_t orig_addr,
size_t mapping_size, size_t alloc_size,
enum dma_data_direction dir, unsigned long attrs)
{
- struct io_tlb_mem *mem = &io_tlb_default_mem;
+ struct io_tlb_mem *mem = get_io_tlb_mem(hwdev);
dma_addr_t tbl_dma_addr = phys_to_dma_unencrypted(hwdev, mem->start);
unsigned long flags;
phys_addr_t tlb_addr;
@@ -475,7 +474,7 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, phys_addr_t orig_addr,
unsigned long max_slots;
unsigned long tmp_io_tlb_used;

- if (no_iotlb_memory)
+ if (no_iotlb_memory && !hwdev->dma_io_tlb_mem)
panic("Can not allocate SWIOTLB buffer earlier and can't now provide you with the DMA bounce buffer");

if (mem_encrypt_active())
@@ -597,7 +596,7 @@ void swiotlb_tbl_unmap_single(struct device *hwdev, phys_addr_t tlb_addr,
size_t mapping_size, size_t alloc_size,
enum dma_data_direction dir, unsigned long attrs)
{
- struct io_tlb_mem *mem = &io_tlb_default_mem;
+ struct io_tlb_mem *mem = get_io_tlb_mem(hwdev);
unsigned long flags;
int i, count, nslots = ALIGN(alloc_size, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT;
int index = (tlb_addr - mem->start) >> IO_TLB_SHIFT;
@@ -645,7 +644,7 @@ void swiotlb_tbl_sync_single(struct device *hwdev, phys_addr_t tlb_addr,
size_t size, enum dma_data_direction dir,
enum dma_sync_target target)
{
- struct io_tlb_mem *mem = &io_tlb_default_mem;
+ struct io_tlb_mem *mem = get_io_tlb_mem(hwdev);
int index = (tlb_addr - mem->start) >> IO_TLB_SHIFT;
phys_addr_t orig_addr = mem->orig_addr[index];

@@ -712,13 +711,13 @@ size_t swiotlb_max_mapping_size(struct device *dev)
return ((size_t)1 << IO_TLB_SHIFT) * IO_TLB_SEGSIZE;
}

-bool is_swiotlb_active(void)
+bool is_swiotlb_active(struct device *dev)
{
/*
* When SWIOTLB is initialized, even if mem->start points to physical
* address zero, mem->end surely doesn't.
*/
- return io_tlb_default_mem.end != 0;
+ return io_tlb_default_mem.end != 0 || dev->dma_io_tlb_mem;
}

#ifdef CONFIG_DEBUG_FS
--
2.29.2.729.g45daf8777d-goog
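
The per-device lookup introduced by this patch can be modelled outside the kernel in a few lines. The sketch below mirrors the logic of get_io_tlb_mem() and is_swiotlb_buffer() from the diff, but the struct layouts are reduced to the fields needed here and the addresses in io_tlb_default_mem are made-up values for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t phys_addr_t;  /* stand-in for the kernel type */

/* Reduced model: only the range fields of the real struct. */
struct io_tlb_mem {
	phys_addr_t start;
	phys_addr_t end;   /* exclusive */
};

/* Reduced model: only the per-device pool pointer. */
struct device {
	struct io_tlb_mem *dma_io_tlb_mem;  /* NULL means "use the default pool" */
};

/* Hypothetical address range for the default pool. */
static struct io_tlb_mem io_tlb_default_mem = { 0x80000000u, 0x84000000u };

/* A device-specific restricted pool wins over the global default. */
static struct io_tlb_mem *get_io_tlb_mem(const struct device *dev)
{
	return dev->dma_io_tlb_mem ? dev->dma_io_tlb_mem : &io_tlb_default_mem;
}

/* Range check against whichever pool the device resolves to. */
static bool is_swiotlb_buffer(const struct device *dev, phys_addr_t paddr)
{
	const struct io_tlb_mem *mem = get_io_tlb_mem(dev);

	return paddr >= mem->start && paddr < mem->end;
}
```

One consequence visible in the model: for a device with a restricted pool, an address inside the default pool is no longer reported as a swiotlb buffer, because only one pool is consulted per device.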

2021-01-06 03:45:21

by Claire Chang

Subject: [RFC PATCH v3 6/6] of: Add plumbing for restricted DMA pool

If a device is not behind an IOMMU, we look up the device node and set
up restricted DMA when a restricted-dma-pool is present.

Signed-off-by: Claire Chang <[email protected]>
---
drivers/of/address.c | 21 +++++++++++++++++++++
drivers/of/device.c | 4 ++++
drivers/of/of_private.h | 5 +++++
3 files changed, 30 insertions(+)

diff --git a/drivers/of/address.c b/drivers/of/address.c
index 73ddf2540f3f..94eca8249854 100644
--- a/drivers/of/address.c
+++ b/drivers/of/address.c
@@ -8,6 +8,7 @@
#include <linux/logic_pio.h>
#include <linux/module.h>
#include <linux/of_address.h>
+#include <linux/of_reserved_mem.h>
#include <linux/pci.h>
#include <linux/pci_regs.h>
#include <linux/sizes.h>
@@ -1094,3 +1095,23 @@ bool of_dma_is_coherent(struct device_node *np)
return false;
}
EXPORT_SYMBOL_GPL(of_dma_is_coherent);
+
+int of_dma_set_restricted_buffer(struct device *dev)
+{
+ struct device_node *node;
+ int count, i;
+
+ if (!dev->of_node)
+ return 0;
+
+ count = of_property_count_elems_of_size(dev->of_node, "memory-region",
+ sizeof(phandle));
+ for (i = 0; i < count; i++) {
+ node = of_parse_phandle(dev->of_node, "memory-region", i);
+ if (of_device_is_compatible(node, "restricted-dma-pool"))
+ return of_reserved_mem_device_init_by_idx(
+ dev, dev->of_node, i);
+ }
+
+ return 0;
+}
diff --git a/drivers/of/device.c b/drivers/of/device.c
index aedfaaafd3e7..e2c7409956ab 100644
--- a/drivers/of/device.c
+++ b/drivers/of/device.c
@@ -182,6 +182,10 @@ int of_dma_configure_id(struct device *dev, struct device_node *np,
arch_setup_dma_ops(dev, dma_start, size, iommu, coherent);

dev->dma_range_map = map;
+
+ if (!iommu)
+ return of_dma_set_restricted_buffer(dev);
+
return 0;
}
EXPORT_SYMBOL_GPL(of_dma_configure_id);
diff --git a/drivers/of/of_private.h b/drivers/of/of_private.h
index d9e6a324de0a..28a2dfa197ba 100644
--- a/drivers/of/of_private.h
+++ b/drivers/of/of_private.h
@@ -161,12 +161,17 @@ struct bus_dma_region;
#if defined(CONFIG_OF_ADDRESS) && defined(CONFIG_HAS_DMA)
int of_dma_get_range(struct device_node *np,
const struct bus_dma_region **map);
+int of_dma_set_restricted_buffer(struct device *dev);
#else
static inline int of_dma_get_range(struct device_node *np,
const struct bus_dma_region **map)
{
return -ENODEV;
}
+static inline int of_dma_set_restricted_buffer(struct device *dev)
+{
+ return -ENODEV;
+}
#endif

#endif /* _LINUX_OF_PRIVATE_H */
--
2.29.2.729.g45daf8777d-goog
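
The scan in of_dma_set_restricted_buffer() above walks the device's "memory-region" phandles and initializes the first one that is compatible with "restricted-dma-pool". A user-space model of just that selection step is sketched below. struct region_node and find_restricted_pool() are hypothetical stand-ins; the real code works on struct device_node and the OF helper functions shown in the diff.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a reserved-memory node. */
struct region_node {
	const char *compatible;
};

/*
 * Return the index of the first region whose compatible string is
 * "restricted-dma-pool", or -1 if there is none. A NULL entry models
 * a phandle that failed to resolve. A negative count (modelling an
 * error code from the property lookup) simply skips the loop.
 */
static int find_restricted_pool(const struct region_node *const *regions,
				int count)
{
	int i;

	for (i = 0; i < count; i++) {
		if (regions[i] && regions[i]->compatible &&
		    strcmp(regions[i]->compatible, "restricted-dma-pool") == 0)
			return i;
	}
	return -1;
}
```

Returning on the first match means at most one restricted pool is attached per device, which matches the -EBUSY check in rmem_swiotlb_device_init() from the earlier patch.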

2021-01-06 07:52:45

by Greg Kroah-Hartman

Subject: Re: [RFC PATCH v3 2/6] swiotlb: Add restricted DMA pool

On Wed, Jan 06, 2021 at 11:41:20AM +0800, Claire Chang wrote:
> Add the initialization function to create restricted DMA pools from
> matching reserved-memory nodes in the device tree.
>
> Signed-off-by: Claire Chang <[email protected]>
> ---
> include/linux/device.h | 4 ++
> include/linux/swiotlb.h | 7 +-
> kernel/dma/Kconfig | 1 +
> kernel/dma/swiotlb.c | 144 ++++++++++++++++++++++++++++++++++------
> 4 files changed, 131 insertions(+), 25 deletions(-)
>
> diff --git a/include/linux/device.h b/include/linux/device.h
> index 89bb8b84173e..ca6f71ec8871 100644
> --- a/include/linux/device.h
> +++ b/include/linux/device.h
> @@ -413,6 +413,7 @@ struct dev_links_info {
> * @dma_pools: Dma pools (if dma'ble device).
> * @dma_mem: Internal for coherent mem override.
> * @cma_area: Contiguous memory area for dma allocations
> + * @dma_io_tlb_mem: Internal for swiotlb io_tlb_mem override.

Why does this have to be added here? Shouldn't the platform-specific
code handle it instead?

thanks,

greg k-h

2021-01-06 18:52:23

by Florian Fainelli

Subject: Re: [RFC PATCH v3 0/6] Restricted DMA

Hi,

First of all, let me say that I am glad that someone is working on an
upstream solution for this issue, would appreciate if you could CC and
Jim Quinlan on subsequent submissions.

On 1/5/21 7:41 PM, Claire Chang wrote:
> This series implements mitigations for lack of DMA access control on
> systems without an IOMMU, which could result in the DMA accessing the
> system memory at unexpected times and/or unexpected addresses, possibly
> leading to data leakage or corruption.
>
> For example, we plan to use the PCI-e bus for Wi-Fi and that PCI-e bus is
> not behind an IOMMU. As PCI-e, by design, gives the device full access to
> system memory, a vulnerability in the Wi-Fi firmware could easily escalate
> to a full system exploit (remote wifi exploits: [1a], [1b] that shows a
> full chain of exploits; [2], [3]).
>
> To mitigate the security concerns, we introduce restricted DMA. Restricted
> DMA utilizes the existing swiotlb to bounce streaming DMA in and out of a
> specially allocated region and does memory allocation from the same region.
> The feature on its own provides a basic level of protection against the DMA
> overwriting buffer contents at unexpected times. However, to protect
> against general data leakage and system memory corruption, the system needs
> to provide a way to restrict the DMA to a predefined memory region (this is
> usually done at firmware level, e.g. in ATF on some ARM platforms).

Can you explain how ATF gets involved and to what extent it does help,
besides enforcing a secure region from the ARM CPU's perspective? Does
the PCIe root complex not have an IOMMU but can somehow be denied access
to a region that is marked NS=0 in the ARM CPU's MMU? If so, that is
still some sort of basic protection that the HW enforces, right?

On Broadcom STB SoCs we have had something similar for a while. Although
we don't have an IOMMU for the PCIe bridge, we do have a basic
protection mechanism whereby we can configure a region in DRAM to
be PCIe read/write and CPU read/write which then gets used as the PCIe
inbound region for the PCIe EP. By default the PCIe bridge is not
allowed access to DRAM so we must call into a security agent to allow
the PCIe bridge to access the designated DRAM region.

We have done this using a private CMA area region assigned via Device
Tree, assigned with a and requiring the PCIe EP driver to use
dma_alloc_from_contiguous() in order to allocate from this device
private CMA area. The only drawback with that approach is that it
requires knowing how much memory you need up front for buffers and DMA
descriptors that the PCIe EP will need to process. The problem is that
it requires driver modifications and that does not scale over the number
of PCIe EP drivers, some we absolutely do not control, but there is no
need to bounce buffer. Your approach scales better across PCIe EP
drivers; however, it does require bounce buffering, which could be a
performance hit.
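
To make the bounce-buffering cost concrete, here is a minimal user-space sketch of what "bouncing" means: data is copied into the restricted pool before the device reads it, and copied back out after the device writes it. This is a simplified model, not the swiotlb implementation. POOL_SIZE, struct bounce_pool, bounce_map() and bounce_unmap() are hypothetical, the local enum does not use the kernel's numeric values, and the sketch uses a single buffer at offset 0 instead of a slot allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define POOL_SIZE 4096  /* hypothetical restricted pool size */

/* Local enum for the sketch; not the kernel's enum dma_data_direction. */
enum dma_dir { DMA_TO_DEVICE, DMA_FROM_DEVICE, DMA_BIDIRECTIONAL };

/* The only memory the device is allowed to touch in this model. */
struct bounce_pool {
	unsigned char mem[POOL_SIZE];
};

/* "Map": copy driver data into the pool when the device will read it. */
static unsigned char *bounce_map(struct bounce_pool *pool, const void *orig,
				 size_t len, enum dma_dir dir)
{
	assert(len <= POOL_SIZE);
	if (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL)
		memcpy(pool->mem, orig, len);
	return pool->mem;
}

/* "Unmap": copy device-written data back to the driver's buffer. */
static void bounce_unmap(struct bounce_pool *pool, void *orig, size_t len,
			 enum dma_dir dir)
{
	assert(len <= POOL_SIZE);
	if (dir == DMA_FROM_DEVICE || dir == DMA_BIDIRECTIONAL)
		memcpy(orig, pool->mem, len);
}
```

Each streaming transfer pays for one memcpy() per direction in this model, which is the overhead being weighed against the driver changes that a private CMA area would need.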

Thanks!
--
Florian

2021-01-06 18:58:37

by Konrad Rzeszutek Wilk

Subject: Re: [RFC PATCH v3 2/6] swiotlb: Add restricted DMA pool

Hello!

In this file:

> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index e4368159f88a..7fb2ac087d23 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
..

> +static const struct reserved_mem_ops rmem_swiotlb_ops = {
> + .device_init = rmem_swiotlb_device_init,
> + .device_release = rmem_swiotlb_device_release,
> +};
> +
> +static int __init rmem_swiotlb_setup(struct reserved_mem *rmem)
> +{
> + unsigned long node = rmem->fdt_node;
> +
> + if (of_get_flat_dt_prop(node, "reusable", NULL) ||
> + of_get_flat_dt_prop(node, "linux,cma-default", NULL) ||
> + of_get_flat_dt_prop(node, "linux,dma-default", NULL) ||
> + of_get_flat_dt_prop(node, "no-map", NULL))
> + return -EINVAL;
> +
> + rmem->ops = &rmem_swiotlb_ops;
> + pr_info("Reserved memory: created device swiotlb memory pool at %pa, size %ld MiB\n",
> + &rmem->base, (unsigned long)rmem->size / SZ_1M);
> + return 0;
> +}
> +
> +RESERVEDMEM_OF_DECLARE(dma, "restricted-dma-pool", rmem_swiotlb_setup);

The code should be as arch-agnostic as possible. That is why there
are multiple -swiotlb files scattered in arch directories that own the
architecture-specific code.

Would it be possible to move the code there and perhaps have an
ARM-specific front-end for this DMA restricted pool there? See for example
the xen-swiotlb code.

Cheers!

Konrad

2021-01-06 19:01:30

by Konrad Rzeszutek Wilk

Subject: Re: [RFC PATCH v3 5/6] dt-bindings: of: Add restricted DMA pool

On Wed, Jan 06, 2021 at 11:41:23AM +0800, Claire Chang wrote:
> Introduce the new compatible string, restricted-dma-pool, for restricted
> DMA. One can specify the address and length of the restricted DMA memory
> region by restricted-dma-pool in the device tree.
>
> Signed-off-by: Claire Chang <[email protected]>
> ---
> .../reserved-memory/reserved-memory.txt | 24 +++++++++++++++++++
> 1 file changed, 24 insertions(+)
>
> diff --git a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt b/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
> index e8d3096d922c..44975e2a1fd2 100644
> --- a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
> +++ b/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
> @@ -51,6 +51,20 @@ compatible (optional) - standard definition
> used as a shared pool of DMA buffers for a set of devices. It can
> be used by an operating system to instantiate the necessary pool
> management subsystem if necessary.
> + - restricted-dma-pool: This indicates a region of memory meant to be
> + used as a pool of restricted DMA buffers for a set of devices. The
> + memory region would be the only region accessible to those devices.
> + When using this, the no-map and reusable properties must not be set,
> + so the operating system can create a virtual mapping that will be used
> + for synchronization. The main purpose for restricted DMA is to
> + mitigate the lack of DMA access control on systems without an IOMMU,
> + which could result in the DMA accessing the system memory at
> + unexpected times and/or unexpected addresses, possibly leading to data
> + leakage or corruption. The feature on its own provides a basic level
> + of protection against the DMA overwriting buffer contents at
> + unexpected times. However, to protect against general data leakage and
> + system memory corruption, the system needs to provide way to restrict
> + the DMA to a predefined memory region.

Heya!

I think I am missing something obvious here so please bear with my
questions:

- This code adds the means of having the SWIOTLB pool tied to a specific
memory region, correct?

- Nothing stops the physical device from bypassing the SWIOTLB buffer.
That is, if an errant device screwed up the length or DMA address, the
SWIOTLB would gladly do what the device told it to do?

- This has to be combined with SWIOTLB-force-ish to always use the
bounce buffer, otherwise you could still do DMA without using
SWIOTLB (by not hitting the criteria for needing to use SWIOTLB)?

2021-01-07 17:48:36

by Claire Chang

Subject: Re: [RFC PATCH v3 2/6] swiotlb: Add restricted DMA pool

Hi Greg and Konrad,

This change is intended to be non-arch specific. Any arch that lacks DMA access
control and has devices not behind an IOMMU can make use of it. Could you share
why you think this should be arch specific?

Thanks!

2021-01-07 17:50:07

by Claire Chang

Subject: Re: [RFC PATCH v3 5/6] dt-bindings: of: Add restricted DMA pool

On Thu, Jan 7, 2021 at 2:58 AM Konrad Rzeszutek Wilk
<[email protected]> wrote:
>
> On Wed, Jan 06, 2021 at 11:41:23AM +0800, Claire Chang wrote:
> > Introduce the new compatible string, restricted-dma-pool, for restricted
> > DMA. One can specify the address and length of the restricted DMA memory
> > region by restricted-dma-pool in the device tree.
> >
> > Signed-off-by: Claire Chang <[email protected]>
> > ---
> > .../reserved-memory/reserved-memory.txt | 24 +++++++++++++++++++
> > 1 file changed, 24 insertions(+)
> >
> > diff --git a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt b/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
> > index e8d3096d922c..44975e2a1fd2 100644
> > --- a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
> > +++ b/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
> > @@ -51,6 +51,20 @@ compatible (optional) - standard definition
> > used as a shared pool of DMA buffers for a set of devices. It can
> > be used by an operating system to instantiate the necessary pool
> > management subsystem if necessary.
> > + - restricted-dma-pool: This indicates a region of memory meant to be
> > + used as a pool of restricted DMA buffers for a set of devices. The
> > + memory region would be the only region accessible to those devices.
> > + When using this, the no-map and reusable properties must not be set,
> > + so the operating system can create a virtual mapping that will be used
> > + for synchronization. The main purpose for restricted DMA is to
> > + mitigate the lack of DMA access control on systems without an IOMMU,
> > + which could result in the DMA accessing the system memory at
> > + unexpected times and/or unexpected addresses, possibly leading to data
> > + leakage or corruption. The feature on its own provides a basic level
> > + of protection against the DMA overwriting buffer contents at
> > + unexpected times. However, to protect against general data leakage and
> > + system memory corruption, the system needs to provide way to restrict
> > + the DMA to a predefined memory region.
>
> Heya!
>
> I think I am missing something obvious here so please bear with my
> questions:
>
> - This code adds the means of having the SWIOTLB pool tied to a specific
> memory correct?

It doesn't affect the existing SWIOTLB. It just utilizes the existing SWIOTLB
code to create another DMA pool tied to a specific memory region for a given set
of devices. It bounces the streaming DMA (map/unmap) in and out of that region
and does the memory allocation (dma_direct_alloc) from the same region.
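
To make the bounce step concrete, here is a minimal userspace sketch in C. It
is not code from this series: toy_pool, toy_map and toy_unmap are hypothetical
names, and the fixed slot layout is an assumption made for brevity. It only
illustrates the idea that map copies the CPU buffer into the restricted region
and unmap copies the data back.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model of a restricted pool: one fixed region split into equal slots. */
#define POOL_SLOTS 4
#define SLOT_SIZE  64

struct toy_pool {
    unsigned char mem[POOL_SLOTS][SLOT_SIZE]; /* the only memory the device may touch */
    const void *owner[POOL_SLOTS];            /* NULL means the slot is free */
};

/* "map": copy the CPU buffer into a free slot; the device is given the slot. */
int toy_map(struct toy_pool *p, void *cpu_buf, size_t len)
{
    if (len > SLOT_SIZE)
        return -1;
    for (int i = 0; i < POOL_SLOTS; i++) {
        if (p->owner[i] == NULL) {
            p->owner[i] = cpu_buf;
            memcpy(p->mem[i], cpu_buf, len);
            return i;
        }
    }
    return -1; /* pool exhausted */
}

/* "unmap": copy whatever the device wrote back to the CPU buffer, free the slot. */
void toy_unmap(struct toy_pool *p, int slot, void *cpu_buf, size_t len)
{
    assert(slot >= 0 && slot < POOL_SLOTS && p->owner[slot] == cpu_buf);
    memcpy(cpu_buf, p->mem[slot], len);
    p->owner[slot] = NULL;
}
```

In this model the device never receives the address of cpu_buf, only the slot
inside the pool, which is what lets a hardware protection unit confine it.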

>
>
> - Nothing stops the physical device from bypassing the SWIOTLB buffer.
> That is if an errant device screwed up the length or DMA address, the
> SWIOTLB would gladly do what the device told it to do?

So the system needs to provide a way to lock down the memory access, e.g. MPU.

>
> - This has to be combined with SWIOTLB-force-ish to always use the
> bounce buffer, otherwise you could still do DMA without using
> SWIOTLB (by not hitting the criteria for needing to use SWIOTLB)?

Since restricted DMA is for devices that are not behind an IOMMU, I changed
the criterion
`if (unlikely(swiotlb_force == SWIOTLB_FORCE))`
to
`if (unlikely(swiotlb_force == SWIOTLB_FORCE) || dev->dma_io_tlb_mem)`
in dma_direct_map_page().

Also, even if SWIOTLB=force, the restricted DMA pool is preferred if available
(get_io_tlb_mem in https://lore.kernel.org/patchwork/patch/1360995/).
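
As a rough illustration of the two rules above, here is a userspace model in
C. The toy_ types and functions are hypothetical stand-ins, not the kernel's
definitions; only the shape of the condition is taken from the snippet above,
and the pool-selection helper is an assumption about what "preferred if
available" means.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's swiotlb_force setting and types. */
enum toy_swiotlb_force { TOY_SWIOTLB_NORMAL, TOY_SWIOTLB_FORCE, TOY_SWIOTLB_NO_FORCE };

struct toy_io_tlb_mem { int unused; };

struct toy_device {
    struct toy_io_tlb_mem *dma_io_tlb_mem; /* non-NULL if a restricted pool is attached */
};

/* Bounce if bouncing is forced globally OR the device has a restricted pool. */
bool toy_must_bounce(enum toy_swiotlb_force force, const struct toy_device *dev)
{
    return force == TOY_SWIOTLB_FORCE || dev->dma_io_tlb_mem != NULL;
}

/* Prefer the per-device restricted pool; fall back to the default pool. */
struct toy_io_tlb_mem *toy_get_io_tlb_mem(const struct toy_device *dev,
                                          struct toy_io_tlb_mem *default_pool)
{
    return dev->dma_io_tlb_mem ? dev->dma_io_tlb_mem : default_pool;
}
```

A device with a pool attached therefore always bounces, regardless of the
global setting, and always bounces through its own pool.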

Thanks!

2021-01-07 17:53:09

by Claire Chang

Subject: Re: [RFC PATCH v3 0/6] Restricted DMA

On Thu, Jan 7, 2021 at 2:48 AM Florian Fainelli <[email protected]> wrote:
>
> Hi,
>
> First of all let me say that I am glad that someone is working on a
> upstream solution for this issue, would appreciate if you could CC and
> Jim Quinlan on subsequent submissions.

Sure!

>
> On 1/5/21 7:41 PM, Claire Chang wrote:
> > This series implements mitigations for lack of DMA access control on
> > systems without an IOMMU, which could result in the DMA accessing the
> > system memory at unexpected times and/or unexpected addresses, possibly
> > leading to data leakage or corruption.
> >
> > For example, we plan to use the PCI-e bus for Wi-Fi and that PCI-e bus is
> > not behind an IOMMU. As PCI-e, by design, gives the device full access to
> > system memory, a vulnerability in the Wi-Fi firmware could easily escalate
> > to a full system exploit (remote wifi exploits: [1a], [1b] that shows a
> > full chain of exploits; [2], [3]).
> >
> > To mitigate the security concerns, we introduce restricted DMA. Restricted
> > DMA utilizes the existing swiotlb to bounce streaming DMA in and out of a
> > specially allocated region and does memory allocation from the same region.
> > The feature on its own provides a basic level of protection against the DMA
> > overwriting buffer contents at unexpected times. However, to protect
> > against general data leakage and system memory corruption, the system needs
> > to provide a way to restrict the DMA to a predefined memory region (this is
> > usually done at firmware level, e.g. in ATF on some ARM platforms).
>
> Can you explain how ATF gets involved and to what extent it does help,
> besides enforcing a secure region from the ARM CPU's perspective? Does
> the PCIe root complex not have an IOMMU but can somehow be denied access
> to a region that is marked NS=0 in the ARM CPU's MMU? If so, that is
> still some sort of basic protection that the HW enforces, right?

We need ATF support for the memory protection unit (MPU).
Restricted DMA (with reserved-memory in dts) makes sure the predefined memory
region is for PCIe DMA only, but we still need the MPU to lock down PCIe access
to that specific region.
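
For illustration only, the kind of check such a protection unit enforces can
be modeled in a few lines of C. This is a userspace sketch under assumed
names (toy_mpu_window, toy_mpu_allows); it does not describe any particular
SoC's MPU or the ATF interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: the single DRAM window a bus master may access. */
struct toy_mpu_window {
    uint64_t base;
    uint64_t size;
};

/* True only if [addr, addr + len) lies entirely inside the window. */
bool toy_mpu_allows(const struct toy_mpu_window *w, uint64_t addr, uint64_t len)
{
    assert(w->size > 0);
    if (len == 0 || addr < w->base)
        return false;
    uint64_t off = addr - w->base;
    /* Written this way to avoid overflow in addr + len. */
    return off < w->size && len <= w->size - off;
}
```

Without a check of this kind in hardware, the reserved region only tells
Linux where to place buffers; it does not stop the device from addressing
memory elsewhere.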

>
> On Broadcom STB SoCs we have had something similar for a while however
> and while we don't have an IOMMU for the PCIe bridge, we do have a
> basic protection mechanism whereby we can configure a region in DRAM to
> be PCIe read/write and CPU read/write which then gets used as the PCIe
> inbound region for the PCIe EP. By default the PCIe bridge is not
> allowed access to DRAM so we must call into a security agent to allow
> the PCIe bridge to access the designated DRAM region.
>
> We have done this using a private CMA area region assigned via Device
> Tree, assigned with a and requiring the PCIe EP driver to use
> dma_alloc_from_contiguous() in order to allocate from this device
> private CMA area. The only drawback with that approach is that it
> requires knowing how much memory you need up front for buffers and DMA
> descriptors that the PCIe EP will need to process. The problem is that
> it requires driver modifications and that does not scale over the number
> of PCIe EP drivers, some we absolutely do not control, but there is no
> need to bounce buffer. Your approach scales better across PCIe EP
> drivers however it does require bounce buffering which could be a
> performance hit.

Only the streaming DMA (map/unmap) needs bounce buffering.
I also added alloc/free support in this series
(https://lore.kernel.org/patchwork/patch/1360995/), so dma_direct_alloc() will
try to allocate memory from the predefined memory region.

As for the performance hit, it should be similar to the default swiotlb.
Here are my experiment results. Both SoCs lack IOMMU for PCIe.

PCIe wifi vht80 throughput -

MTK SoC              tcp_tx   tcp_rx   udp_tx   udp_rx
w/o Restricted DMA   244.1    134.66   312.56   350.79
w/ Restricted DMA    246.95   136.59   363.21   351.99

Rockchip SoC         tcp_tx   tcp_rx   udp_tx   udp_rx
w/o Restricted DMA   237.87   133.86   288.28   361.88
w/ Restricted DMA    256.01   130.95   292.28   353.19
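
To compare the rows, the relative change per column can be computed with a
small helper. This is only a convenience sketch in C; pct_change is a
hypothetical name and the inputs are the figures from the table above.

```c
#include <assert.h>

/* Relative change of `with` over `without`, in percent. */
double pct_change(double without, double with)
{
    assert(without > 0.0);
    return (with - without) / without * 100.0;
}
```

For example, pct_change(312.56, 363.21) for udp_tx on the MTK SoC is roughly
+16.2, while pct_change(133.86, 130.95) for tcp_rx on the Rockchip SoC is
roughly -2.2, so the differences go in both directions.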

The CPU usage doesn't increase too much either.
Although I didn't measure the CPU usage very precisely, it's ~3% with a single
big core (Cortex-A72) and ~5% with a single small core (Cortex-A53).

Thanks!

>
> Thanks!
> --
> Florian

2021-01-07 18:02:00

by Konrad Rzeszutek Wilk

Subject: Re: [RFC PATCH v3 2/6] swiotlb: Add restricted DMA pool

On Fri, Jan 08, 2021 at 01:39:18AM +0800, Claire Chang wrote:
> Hi Greg and Konrad,
>
> This change is intended to be non-arch specific. Any arch that lacks DMA access
> control and has devices not behind an IOMMU can make use of it. Could you share
> why you think this should be arch specific?

The idea behind non-arch specific code is for it to be generic. The devicetree
is specific to PowerPC, Sparc, and ARM, and not to x86 - hence it should
be in arch specific code.

>
> Thanks!

2021-01-07 18:04:37

by Florian Fainelli

Subject: Re: [RFC PATCH v3 0/6] Restricted DMA

On 1/7/21 9:42 AM, Claire Chang wrote:

>> Can you explain how ATF gets involved and to what extent it does help,
>> besides enforcing a secure region from the ARM CPU's perspective? Does
>> the PCIe root complex not have an IOMMU but can somehow be denied access
>> to a region that is marked NS=0 in the ARM CPU's MMU? If so, that is
>> still some sort of basic protection that the HW enforces, right?
>
> We need the ATF support for memory MPU (memory protection unit).
> Restricted DMA (with reserved-memory in dts) makes sure the predefined memory
> region is for PCIe DMA only, but we still need the MPU to lock down PCIe access
> to that specific region.

OK so you do have a protection unit of some sort to enforce which region
in DRAM the PCIE bridge is allowed to access, that makes sense,
otherwise the restricted DMA region would only be a hint but nothing you
can really enforce. This is almost entirely analogous to our systems then.

There may be some value in standardizing on an ARM SMCCC call then since
you already support two different SoC vendors.

>
>>
>> On Broadcom STB SoCs we have had something similar for a while however
>> and while we don't have an IOMMU for the PCIe bridge, we do have a
>> basic protection mechanism whereby we can configure a region in DRAM to
>> be PCIe read/write and CPU read/write which then gets used as the PCIe
>> inbound region for the PCIe EP. By default the PCIe bridge is not
>> allowed access to DRAM so we must call into a security agent to allow
>> the PCIe bridge to access the designated DRAM region.
>>
>> We have done this using a private CMA area region assigned via Device
>> Tree, assigned with a and requiring the PCIe EP driver to use
>> dma_alloc_from_contiguous() in order to allocate from this device
>> private CMA area. The only drawback with that approach is that it
>> requires knowing how much memory you need up front for buffers and DMA
>> descriptors that the PCIe EP will need to process. The problem is that
>> it requires driver modifications and that does not scale over the number
>> of PCIe EP drivers, some we absolutely do not control, but there is no
>> need to bounce buffer. Your approach scales better across PCIe EP
>> drivers however it does require bounce buffering which could be a
>> performance hit.
>
> Only the streaming DMA (map/unmap) needs bounce buffering.

True, and typically only on transmit since you don't really control
where the sk_buff are allocated from, right? On RX since you need to
hand buffer addresses to the WLAN chip prior to DMA, you can allocate
them from a pool that already falls within the restricted DMA region, right?

> I also added alloc/free support in this series
> (https://lore.kernel.org/patchwork/patch/1360995/), so dma_direct_alloc() will
> try to allocate memory from the predefined memory region.
>
> As for the performance hit, it should be similar to the default swiotlb.
> Here are my experiment results. Both SoCs lack IOMMU for PCIe.
>
> PCIe wifi vht80 throughput -
>
> MTK SoC              tcp_tx   tcp_rx   udp_tx   udp_rx
> w/o Restricted DMA   244.1    134.66   312.56   350.79
> w/ Restricted DMA    246.95   136.59   363.21   351.99
>
> Rockchip SoC         tcp_tx   tcp_rx   udp_tx   udp_rx
> w/o Restricted DMA   237.87   133.86   288.28   361.88
> w/ Restricted DMA    256.01   130.95   292.28   353.19

How come you get better throughput with restricted DMA? Is it because
doing DMA to/from a contiguous region allows for better grouping of
transactions from the DRAM controller's perspective somehow?

>
> The CPU usage doesn't increase too much either.
> Although I didn't measure the CPU usage very precisely, it's ~3% with a single
> big core (Cortex-A72) and ~5% with a single small core (Cortex-A53).
>
> Thanks!
>
>>
>> Thanks!
>> --
>> Florian


--
Florian

2021-01-07 18:07:35

by Konrad Rzeszutek Wilk

Subject: Re: [RFC PATCH v3 5/6] dt-bindings: of: Add restricted DMA pool

On Fri, Jan 08, 2021 at 01:39:43AM +0800, Claire Chang wrote:
> On Thu, Jan 7, 2021 at 2:58 AM Konrad Rzeszutek Wilk
> <[email protected]> wrote:
> >
> > On Wed, Jan 06, 2021 at 11:41:23AM +0800, Claire Chang wrote:
> > > Introduce the new compatible string, restricted-dma-pool, for restricted
> > > DMA. One can specify the address and length of the restricted DMA memory
> > > region by restricted-dma-pool in the device tree.
> > >
> > > Signed-off-by: Claire Chang <[email protected]>
> > > ---
> > > .../reserved-memory/reserved-memory.txt | 24 +++++++++++++++++++
> > > 1 file changed, 24 insertions(+)
> > >
> > > diff --git a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt b/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
> > > index e8d3096d922c..44975e2a1fd2 100644
> > > --- a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
> > > +++ b/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
> > > @@ -51,6 +51,20 @@ compatible (optional) - standard definition
> > > used as a shared pool of DMA buffers for a set of devices. It can
> > > be used by an operating system to instantiate the necessary pool
> > > management subsystem if necessary.
> > > + - restricted-dma-pool: This indicates a region of memory meant to be
> > > + used as a pool of restricted DMA buffers for a set of devices. The
> > > + memory region would be the only region accessible to those devices.
> > > + When using this, the no-map and reusable properties must not be set,
> > > + so the operating system can create a virtual mapping that will be used
> > > + for synchronization. The main purpose for restricted DMA is to
> > > + mitigate the lack of DMA access control on systems without an IOMMU,
> > > + which could result in the DMA accessing the system memory at
> > > + unexpected times and/or unexpected addresses, possibly leading to data
> > > + leakage or corruption. The feature on its own provides a basic level
> > > + of protection against the DMA overwriting buffer contents at
> > > + unexpected times. However, to protect against general data leakage and
> > > + system memory corruption, the system needs to provide way to restrict
> > > + the DMA to a predefined memory region.
> >
> > Heya!
> >
> > I think I am missing something obvious here so please bear with my
> > questions:
> >
> > - This code adds the means of having the SWIOTLB pool tied to a specific
> > memory correct?
>
> It doesn't affect the existing SWIOTLB. It just utilizes the existing SWIOTLB
> code to create another DMA pool tied to a specific memory region for a given set
> of devices. It bounces the streaming DMA (map/unmap) in and out of that region
> and does the memory allocation (dma_direct_alloc) from the same region.

Right, so why can't it follow the same mechanism that Xen SWIOTLB does - which
had exactly the same problem (needed special handling on the pool) - and use
similar code?

>
> >
> >
> > - Nothing stops the physical device from bypassing the SWIOTLB buffer.
> > That is if an errant device screwed up the length or DMA address, the
> > > SWIOTLB would gladly do what the device told it to do?
>
> So the system needs to provide a way to lock down the memory access, e.g. MPU.

OK! Would it be prudent to have this in the description above perhaps?
>
> >
> > - This has to be combined with SWIOTLB-force-ish to always use the
> > bounce buffer, otherwise you could still do DMA without using
> > SWIOTLB (by not hitting the criteria for needing to use SWIOTLB)?
>
> Since restricted DMA is for the devices that are not behind an IOMMU, I change
> the criteria
> `if (unlikely(swiotlb_force == SWIOTLB_FORCE))`
> to
> `if (unlikely(swiotlb_force == SWIOTLB_FORCE) || dev->dma_io_tlb_mem)`
> in dma_direct_map_page().
>
> Also, even if SWIOTLB=force, the restricted DMA pool is preferred if available
> (get_io_tlb_mem in https://lore.kernel.org/patchwork/patch/1360995/).
>
> Thanks!

2021-01-07 18:11:36

by Florian Fainelli

Subject: Re: [RFC PATCH v3 2/6] swiotlb: Add restricted DMA pool

On 1/7/21 9:57 AM, Konrad Rzeszutek Wilk wrote:
> On Fri, Jan 08, 2021 at 01:39:18AM +0800, Claire Chang wrote:
>> Hi Greg and Konrad,
>>
>> This change is intended to be non-arch specific. Any arch that lacks DMA access
>> control and has devices not behind an IOMMU can make use of it. Could you share
>> why you think this should be arch specific?
>
> The idea behind non-arch specific code is for it to be generic. The devicetree
> is specific to PowerPC, Sparc, and ARM, and not to x86 - hence it should
> be in arch specific code.

In principle the same code could be used with an ACPI enabled system with
an appropriate service to identify the restricted DMA regions and unlock
them.

More than 1 architecture requiring this function (ARM and ARM64 are the
two I can think of needing this immediately) sort of calls for making
the code architecture agnostic since past 2, you need something that scales.

There is already code today under kernel/dma/contiguous.c that is only
activated on a CONFIG_OF=y && CONFIG_OF_RESERVED_MEM=y system, this is
no different.
--
Florian

2021-01-07 18:18:19

by Florian Fainelli

Subject: Re: [RFC PATCH v3 5/6] dt-bindings: of: Add restricted DMA pool

On 1/7/21 10:00 AM, Konrad Rzeszutek Wilk wrote:
>>>
>>>
>>> - Nothing stops the physical device from bypassing the SWIOTLB buffer.
>>> That is if an errant device screwed up the length or DMA address, the
>>> SWIOTLB would gladly do what the device told it to do?
>>
>> So the system needs to provide a way to lock down the memory access, e.g. MPU.
>
> OK! Would it be prudent to have this in the description above perhaps?

Yes this is something that must be documented as a requirement for the
restricted DMA pool users, otherwise attempting to do restricted DMA
pool is no different than say, using a device private CMA region.
Without the enforcement, this is just a best effort.
--
Florian

2021-01-07 21:27:53

by Konrad Rzeszutek Wilk

Subject: Re: [RFC PATCH v3 2/6] swiotlb: Add restricted DMA pool

On Thu, Jan 07, 2021 at 10:09:14AM -0800, Florian Fainelli wrote:
> On 1/7/21 9:57 AM, Konrad Rzeszutek Wilk wrote:
> > On Fri, Jan 08, 2021 at 01:39:18AM +0800, Claire Chang wrote:
> >> Hi Greg and Konrad,
> >>
> >> This change is intended to be non-arch specific. Any arch that lacks DMA access
> >> control and has devices not behind an IOMMU can make use of it. Could you share
> >> why you think this should be arch specific?
> >
> > The idea behind non-arch specific code is for it to be generic. The devicetree
> > is specific to PowerPC, Sparc, and ARM, and not to x86 - hence it should
> > be in arch specific code.
>
> In principle the same code could be used with an ACPI enabled system with
> an appropriate service to identify the restricted DMA regions and unlock
> them.

Which this patchset is not.

>
> More than 1 architecture requiring this function (ARM and ARM64 are the
> two I can think of needing this immediately) sort of calls for making
> the code architecture agnostic since past 2, you need something that scales.

I believe the use-case is for ARM64 at this moment.

>
> There is already code today under kernel/dma/contiguous.c that is only
> activated on a CONFIG_OF=y && CONFIG_OF_RESERVED_MEM=y system, this is
> no different.
> --
> Florian

2021-01-12 11:40:16

by Claire Chang

Subject: Re: [RFC PATCH v3 0/6] Restricted DMA

On Fri, Jan 8, 2021 at 1:59 AM Florian Fainelli <[email protected]> wrote:
>
> On 1/7/21 9:42 AM, Claire Chang wrote:
>
> >> Can you explain how ATF gets involved and to what extent it does help,
> >> besides enforcing a secure region from the ARM CPU's perspective? Does
> >> the PCIe root complex not have an IOMMU but can somehow be denied access
> >> to a region that is marked NS=0 in the ARM CPU's MMU? If so, that is
> >> still some sort of basic protection that the HW enforces, right?
> >
> > We need the ATF support for memory MPU (memory protection unit).
> > Restricted DMA (with reserved-memory in dts) makes sure the predefined memory
> > region is for PCIe DMA only, but we still need the MPU to lock down PCIe access
> > to that specific region.
>
> OK so you do have a protection unit of some sort to enforce which region
> in DRAM the PCIE bridge is allowed to access, that makes sense,
> otherwise the restricted DMA region would only be a hint but nothing you
> can really enforce. This is almost entirely analogous to our systems then.

Here is an example of setting the MPU:
https://github.com/ARM-software/arm-trusted-firmware/blob/master/plat/mediatek/mt8183/drivers/emi_mpu/emi_mpu.c#L132

>
> There may be some value in standardizing on an ARM SMCCC call then since
> you already support two different SoC vendors.
>
> >
> >>
> >> On Broadcom STB SoCs we have had something similar for a while however
> >> and while we don't have an IOMMU for the PCIe bridge, we do have a
> >> basic protection mechanism whereby we can configure a region in DRAM to
> >> be PCIe read/write and CPU read/write which then gets used as the PCIe
> >> inbound region for the PCIe EP. By default the PCIe bridge is not
> >> allowed access to DRAM so we must call into a security agent to allow
> >> the PCIe bridge to access the designated DRAM region.
> >>
> >> We have done this using a private CMA area region assigned via Device
> >> Tree, assigned with a and requiring the PCIe EP driver to use
> >> dma_alloc_from_contiguous() in order to allocate from this device
> >> private CMA area. The only drawback with that approach is that it
> >> requires knowing how much memory you need up front for buffers and DMA
> >> descriptors that the PCIe EP will need to process. The problem is that
> >> it requires driver modifications and that does not scale over the number
> >> of PCIe EP drivers, some we absolutely do not control, but there is no
> >> need to bounce buffer. Your approach scales better across PCIe EP
> >> drivers however it does require bounce buffering which could be a
> >> performance hit.
> >
> > Only the streaming DMA (map/unmap) needs bounce buffering.
>
> True, and typically only on transmit since you don't really control
> where the sk_buff are allocated from, right? On RX since you need to
> hand buffer addresses to the WLAN chip prior to DMA, you can allocate
> them from a pool that already falls within the restricted DMA region, right?
>

Right, but applying bounce buffering to RX will make it more secure.
The device won't be able to modify the content after unmap. Just like what
iommu_unmap does.
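
The difference can be shown with a small userspace model in C. All names
here (direct_rx, bounced_rx, late_device_write) are hypothetical, and the
"device" is just a function that writes through the pointer it was given;
the sketch only illustrates why a copy at unmap time removes the device's
ability to change data the CPU is about to parse.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define FRAME_LEN 16

/* Direct RX: the device keeps DMA access to the very buffer the CPU parses. */
struct direct_rx {
    unsigned char frame[FRAME_LEN];
};

/* Bounced RX: the device only ever sees `slot`; the CPU parses `frame`. */
struct bounced_rx {
    unsigned char slot[FRAME_LEN];  /* lives in the restricted pool */
    unsigned char frame[FRAME_LEN]; /* lives in ordinary kernel memory */
};

/* Unmap for the bounced case: take a snapshot of what the device wrote. */
void bounced_rx_unmap(struct bounced_rx *rx)
{
    assert(rx != NULL);
    memcpy(rx->frame, rx->slot, FRAME_LEN);
}

/* Models a compromised device writing after the driver believes DMA is done. */
void late_device_write(unsigned char *dma_target)
{
    dma_target[0] = 0xFF;
}
```

In the direct case a late write lands in the frame the stack is processing;
in the bounced case it only lands in the pool slot, which the CPU no longer
reads for that frame.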

> > I also added alloc/free support in this series
> > (https://lore.kernel.org/patchwork/patch/1360995/), so dma_direct_alloc() will
> > try to allocate memory from the predefined memory region.
> >
> > As for the performance hit, it should be similar to the default swiotlb.
> > Here are my experiment results. Both SoCs lack IOMMU for PCIe.
> >
> > PCIe wifi vht80 throughput -
> >
> > MTK SoC              tcp_tx   tcp_rx   udp_tx   udp_rx
> > w/o Restricted DMA   244.1    134.66   312.56   350.79
> > w/ Restricted DMA    246.95   136.59   363.21   351.99
> >
> > Rockchip SoC         tcp_tx   tcp_rx   udp_tx   udp_rx
> > w/o Restricted DMA   237.87   133.86   288.28   361.88
> > w/ Restricted DMA    256.01   130.95   292.28   353.19
>
> How come you get better throughput with restricted DMA? Is it because
> doing DMA to/from a contiguous region allows for better grouping of
> transactions from the DRAM controller's perspective somehow?

I'm not sure, but actually, enabling the default swiotlb for wifi also helps the
throughput a little bit for me.

>
> >
> > The CPU usage doesn't increase too much either.
> > Although I didn't measure the CPU usage very precisely, it's ~3% with a single
> > big core (Cortex-A72) and ~5% with a single small core (Cortex-A53).
> >
> > Thanks!
> >
> >>
> >> Thanks!
> >> --
> >> Florian
>
>
> --
> Florian

2021-01-12 11:41:20

by Claire Chang

Subject: Re: [RFC PATCH v3 5/6] dt-bindings: of: Add restricted DMA pool

On Fri, Jan 8, 2021 at 2:15 AM Florian Fainelli <[email protected]> wrote:
>
> On 1/7/21 10:00 AM, Konrad Rzeszutek Wilk wrote:
> >>>
> >>>
> >>> - Nothing stops the physical device from bypassing the SWIOTLB buffer.
> >>> That is if an errant device screwed up the length or DMA address, the
> >>> SWIOTLB would gladly do what the device told it to do?
> >>
> >> So the system needs to provide a way to lock down the memory access, e.g. MPU.
> >
> > OK! Would it be prudent to have this in the description above perhaps?
>
> Yes this is something that must be documented as a requirement for the
> restricted DMA pool users, otherwise attempting to do restricted DMA
> pool is no different than say, using a device private CMA region.
> Without the enforcement, this is just a best effort.

Will add in the next version.

> --
> Florian

2021-01-13 02:25:05

by Florian Fainelli

Subject: Re: [RFC PATCH v3 0/6] Restricted DMA

On 1/11/21 11:48 PM, Claire Chang wrote:
> On Fri, Jan 8, 2021 at 1:59 AM Florian Fainelli <[email protected]> wrote:
>>
>> On 1/7/21 9:42 AM, Claire Chang wrote:
>>
>>>> Can you explain how ATF gets involved and to what extent it does help,
>>>> besides enforcing a secure region from the ARM CPU's perspective? Does
>>>> the PCIe root complex not have an IOMMU but can somehow be denied access
>>>> to a region that is marked NS=0 in the ARM CPU's MMU? If so, that is
>>>> still some sort of basic protection that the HW enforces, right?
>>>
>>> We need the ATF support for memory MPU (memory protection unit).
>>> Restricted DMA (with reserved-memory in dts) makes sure the predefined memory
>>> region is for PCIe DMA only, but we still need the MPU to lock down PCIe access
>>> to that specific region.
>>
>> OK so you do have a protection unit of some sort to enforce which region
>> in DRAM the PCIE bridge is allowed to access, that makes sense,
>> otherwise the restricted DMA region would only be a hint but nothing you
>> can really enforce. This is almost entirely analogous to our systems then.
>
> Here is the example of setting the MPU:
> https://github.com/ARM-software/arm-trusted-firmware/blob/master/plat/mediatek/mt8183/drivers/emi_mpu/emi_mpu.c#L132
>
>>
>> There may be some value in standardizing on an ARM SMCCC call then since
>> you already support two different SoC vendors.
>>
>>>
>>>>
>>>> On Broadcom STB SoCs we have had something similar for a while however
>>>> and while we don't have an IOMMU for the PCIe bridge, we do have a
>>>> basic protection mechanism whereby we can configure a region in DRAM to
>>>> be PCIe read/write and CPU read/write which then gets used as the PCIe
>>>> inbound region for the PCIe EP. By default the PCIe bridge is not
>>>> allowed access to DRAM so we must call into a security agent to allow
>>>> the PCIe bridge to access the designated DRAM region.
>>>>
>>>> We have done this using a private CMA area region assigned via Device
>>>> Tree, assigned with a and requiring the PCIe EP driver to use
>>>> dma_alloc_from_contiguous() in order to allocate from this device
>>>> private CMA area. The only drawback with that approach is that it
>>>> requires knowing how much memory you need up front for buffers and DMA
>>>> descriptors that the PCIe EP will need to process. The problem is that
>>>> it requires driver modifications and that does not scale over the number
>>>> of PCIe EP drivers, some we absolutely do not control, but there is no
>>>> need to bounce buffer. Your approach scales better across PCIe EP
>>>> drivers however it does require bounce buffering which could be a
>>>> performance hit.
>>>
>>> Only the streaming DMA (map/unmap) needs bounce buffering.
>>
>> True, and typically only on transmit since you don't really control
>> where the sk_buff are allocated from, right? On RX since you need to
>> hand buffer addresses to the WLAN chip prior to DMA, you can allocate
>> them from a pool that already falls within the restricted DMA region, right?
>>
>
> Right, but applying bounce buffering to RX will make it more secure.
> The device won't be able to modify the content after unmap. Just like what
> iommu_unmap does.

Sure, however the goals of using bounce buffering apply equally to RX
and TX in that this is the only layer sitting between a stack (block,
networking, USB, etc.) and the underlying device driver that scales well
in order to massage a dma_addr_t to be within a particular physical range.

There is however room for improvement if the drivers are willing to
change their buffer allocation strategy. When you receive Wi-Fi frames
you need to allocate buffers for the Wi-Fi device to DMA into, and that
happens ahead of the DMA transfers by the Wi-Fi device. At buffer
allocation time you could very well allocate these frames from the
restricted DMA region without having to bounce buffer them since the
host CPU is in control over where and when to DMA into.

The issue is that each network driver may implement its own buffer
allocation strategy, some may simply call netdev_alloc_skb() which gives
zero control over where the buffer comes from unless you play tricks
with NUMA node allocations and somehow declare that your restricted DMA
region is a different NUMA node. If the driver allocates pages and then
attaches a SKB to that page using build_skb(), then you have much more
control over where that page comes from, and this is where using a
device private CMA area helps, because you can just do
dma_alloc_from_contiguous() and that will ensure that the pages are
coming from your specific CMA area.

A few questions on the implementation:

- is there any warning or error being printed if the restricted DMA
region is outside of a device's DMA addressable range?

- are there any helpful statistics that could be shown to indicate
that the restricted DMA region was sized too small, e.g.: that
allocation of a DMA buffer failed because we ran out of space in the
swiotlb pool?
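
A minimal sketch of what those two checks could look like, written as a
userspace model rather than kernel code. Everything here is an
assumption for illustration: struct pool_sketch, pool_within_dma_mask()
and pool_alloc() are hypothetical names and are not part of the series
or of swiotlb.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a restricted pool; not the kernel's io_tlb_mem. */
struct pool_sketch {
	uint64_t base;        /* physical start of the pool */
	uint64_t size;        /* pool size in bytes */
	uint64_t used;        /* bytes currently handed out */
	unsigned long failed; /* allocations refused for lack of space */
};

/* True if every byte of the pool is reachable with the device's DMA mask. */
static bool pool_within_dma_mask(const struct pool_sketch *p, uint64_t dma_mask)
{
	uint64_t last;

	if (p->size == 0)
		return false;
	last = p->base + p->size - 1;
	if (last < p->base)	/* address range wrapped around */
		return false;
	return last <= dma_mask;
}

/* Bump allocation; counts refusals so an undersized pool becomes visible. */
static bool pool_alloc(struct pool_sketch *p, uint64_t len, uint64_t *addr)
{
	assert(p->used <= p->size);
	if (len == 0 || len > p->size - p->used) {
		p->failed++;
		return false;
	}
	*addr = p->base + p->used;
	p->used += len;
	return true;
}
```

A setup path could warn when pool_within_dma_mask() is false, and the
failed counter is the kind of value that could be exposed as a
statistic.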

>
>>> I also added alloc/free support in this series
>>> (https://lore.kernel.org/patchwork/patch/1360995/), so dma_direct_alloc() will
>>> try to allocate memory from the predefined memory region.
>>>
>>> As for the performance hit, it should be similar to the default swiotlb.
>>> Here are my experiment results. Both SoCs lack IOMMU for PCIe.
>>>
>>> PCIe wifi vht80 throughput -
>>>
>>> MTK SoC tcp_tx tcp_rx udp_tx udp_rx
>>> w/o Restricted DMA 244.1 134.66 312.56 350.79
>>> w/ Restricted DMA 246.95 136.59 363.21 351.99
>>>
>>> Rockchip SoC tcp_tx tcp_rx udp_tx udp_rx
>>> w/o Restricted DMA 237.87 133.86 288.28 361.88
>>> w/ Restricted DMA 256.01 130.95 292.28 353.19
>>
>> How come you get better throughput with restricted DMA? Is it because
>> doing DMA to/from a contiguous region allows for better grouping of
>> transactions from the DRAM controller's perspective somehow?
>
> I'm not sure, but actually, enabling the default swiotlb for wifi also helps the
> throughput a little bit for me.

OK, it would be interesting if you could get to the bottom of why
performance does increase with swiotlb.
--
Florian

2021-01-13 03:29:46

by Florian Fainelli

Subject: Re: [RFC PATCH v3 4/6] swiotlb: Add restricted DMA alloc/free support.

On 1/5/21 7:41 PM, Claire Chang wrote:
> Add the functions, swiotlb_alloc and swiotlb_free to support the
> memory allocation from restricted DMA pool.
>
> Signed-off-by: Claire Chang <[email protected]>
> ---

[snip]

> diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
> index 30ccbc08e229..126e9b3354d6 100644
> --- a/kernel/dma/direct.c
> +++ b/kernel/dma/direct.c
> @@ -137,6 +137,11 @@ void *dma_direct_alloc(struct device *dev, size_t size,
> void *ret;
> int err;
>
> +#ifdef CONFIG_SWIOTLB
> + if (unlikely(dev->dma_io_tlb_mem))
> + return swiotlb_alloc(dev, size, dma_handle, attrs);
> +#endif

While this is potentially a hot path, I am not sure the unlikely() is
warranted; maybe it is best left as a plain conditional.
--
Florian

2021-01-13 03:29:51

by Florian Fainelli

Subject: Re: [RFC PATCH v3 6/6] of: Add plumbing for restricted DMA pool

On 1/5/21 7:41 PM, Claire Chang wrote:
> If a device is not behind an IOMMU, we look up the device node and set
> up the restricted DMA when the restricted-dma-pool is presented.
>
> Signed-off-by: Claire Chang <[email protected]>
> ---

[snip]

> +int of_dma_set_restricted_buffer(struct device *dev)
> +{
> + struct device_node *node;
> + int count, i;
> +
> + if (!dev->of_node)
> + return 0;
> +
> + count = of_property_count_elems_of_size(dev->of_node, "memory-region",
> + sizeof(phandle));

You could have an early check for count < 0, along with an error
message, if that is deemed useful.

> + for (i = 0; i < count; i++) {
> + node = of_parse_phandle(dev->of_node, "memory-region", i);
> + if (of_device_is_compatible(node, "restricted-dma-pool"))

And you may want to add an of_device_is_available(node) check here. A
platform that provides the Device Tree firmware and tries to support
multiple different SoCs may try to determine if an IOMMU is present, and
if it is, it could mark the restricted-dma-pool region with a
'status = "disabled"' property, or any variant of that scheme.

> + return of_reserved_mem_device_init_by_idx(
> + dev, dev->of_node, i);

This does not seem to support more than one memory region. Did you not
want something like this instead:

ret = of_reserved_mem_device_init_by_idx(...);
if (ret)
return ret;
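
Putting the three review comments on this function together (early
count check, availability check, keep iterating after a successful
init), the loop could be sketched as below. This is a userspace model
under stated assumptions, not the patch: struct region_sketch,
init_region_sketch() and set_restricted_buffers_sketch() are
hypothetical stand-ins for the device node, the reserved-memory init
call and the function under review.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical model of one "memory-region" entry; not struct device_node. */
struct region_sketch {
	const char *compatible;
	bool available;		/* models status != "disabled" */
	int init_result;	/* what the mocked init call returns */
	bool initialized;
};

/* Mocked init: records success, passes the error code through otherwise. */
static int init_region_sketch(struct region_sketch *r)
{
	if (r->init_result == 0)
		r->initialized = true;
	return r->init_result;
}

static int set_restricted_buffers_sketch(struct region_sketch *regions, int count)
{
	int i, ret;

	/*
	 * Assumption: a negative count (e.g. the property is absent) is
	 * reported but not treated as fatal.
	 */
	if (count < 0) {
		fprintf(stderr, "memory-region lookup returned %d\n", count);
		return 0;
	}
	assert(regions != NULL || count == 0);

	for (i = 0; i < count; i++) {
		struct region_sketch *r = &regions[i];

		if (strcmp(r->compatible, "restricted-dma-pool") != 0)
			continue;
		if (!r->available)	/* region marked disabled by firmware */
			continue;

		ret = init_region_sketch(r);
		if (ret)
			return ret;	/* stop on error, otherwise keep going */
	}

	return 0;
}
```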

> + }
> +
> + return 0;
> +}
> diff --git a/drivers/of/device.c b/drivers/of/device.c
> index aedfaaafd3e7..e2c7409956ab 100644
> --- a/drivers/of/device.c
> +++ b/drivers/of/device.c
> @@ -182,6 +182,10 @@ int of_dma_configure_id(struct device *dev, struct device_node *np,
> arch_setup_dma_ops(dev, dma_start, size, iommu, coherent);
>
> dev->dma_range_map = map;
> +
> + if (!iommu)
> + return of_dma_set_restricted_buffer(dev);
> +
> return 0;
> }
> EXPORT_SYMBOL_GPL(of_dma_configure_id);
> diff --git a/drivers/of/of_private.h b/drivers/of/of_private.h
> index d9e6a324de0a..28a2dfa197ba 100644
> --- a/drivers/of/of_private.h
> +++ b/drivers/of/of_private.h
> @@ -161,12 +161,17 @@ struct bus_dma_region;
> #if defined(CONFIG_OF_ADDRESS) && defined(CONFIG_HAS_DMA)
> int of_dma_get_range(struct device_node *np,
> const struct bus_dma_region **map);
> +int of_dma_set_restricted_buffer(struct device *dev);
> #else
> static inline int of_dma_get_range(struct device_node *np,
> const struct bus_dma_region **map)
> {
> return -ENODEV;
> }
> +static inline int of_dma_get_restricted_buffer(struct device *dev)
> +{
> + return -ENODEV;
> +}
> #endif
>
> #endif /* _LINUX_OF_PRIVATE_H */
>


--
Florian

2021-01-13 03:30:41

by Florian Fainelli

Subject: Re: [RFC PATCH v3 2/6] swiotlb: Add restricted DMA pool

On 1/7/21 1:19 PM, Konrad Rzeszutek Wilk wrote:
> On Thu, Jan 07, 2021 at 10:09:14AM -0800, Florian Fainelli wrote:
>> On 1/7/21 9:57 AM, Konrad Rzeszutek Wilk wrote:
>>> On Fri, Jan 08, 2021 at 01:39:18AM +0800, Claire Chang wrote:
>>>> Hi Greg and Konrad,
>>>>
>>>> This change is intended to be non-arch specific. Any arch that lacks DMA access
>>>> control and has devices not behind an IOMMU can make use of it. Could you share
>>>> why you think this should be arch specific?
>>>
>>> The idea behind non-arch specific code is it to be generic. The devicetree
>>> is specific to PowerPC, Sparc, and ARM, and not to x86 - hence it should
>>> be in arch specific code.
>>
>> In premise the same code could be used with an ACPI enabled system with
>> an appropriate service to identify the restricted DMA regions and unlock
>> them.
>
> Which this patchset is not.

ACPI is not included, but the comment about Device Tree being specific
to PowerPC, SPARC and ARM, and not x86, is not quite correct. There is an
architecture specific part to obtaining where the Device Tree lives in
memory, but the implementation itself is architecture agnostic (with
some early SPARC/OpenFirmware shenanigans), and x86 does, or rather did
support Device Tree to a very small extent with the CE4100 platform.

Would you prefer that an swiotlb_of.c file be created instead or
something along those lines to better encapsulate where the OF specific
code lives?

>
>>
>> More than 1 architecture requiring this function (ARM and ARM64 are the
>> two I can think of needing this immediately) sort of calls for making
>> the code architecture agnostic since past 2, you need something that scales.
>
> I believe the use-case is for ARM64 at this moment.

For the platforms that Claire uses, certainly for the ones we use, ARM
and ARM64 are in scope.
--
Florian

2021-01-13 03:31:14

by Florian Fainelli

Subject: Re: [RFC PATCH v3 3/6] swiotlb: Use restricted DMA pool if available

On 1/5/21 7:41 PM, Claire Chang wrote:
> Regardless of swiotlb setting, the restricted DMA pool is preferred if
> available.
>
> The restricted DMA pools provide a basic level of protection against
> the DMA overwriting buffer contents at unexpected times. However, to
> protect against general data leakage and system memory corruption, the
> system needs to provide a way to restrict the DMA to a predefined memory
> region.
>
> Signed-off-by: Claire Chang <[email protected]>

You could probably split this patch into two:

- one that introduces the get_io_tlb_mem() getter, updates all callers
of is_swiotlb_buffer() to gain a 'struct device' argument
- another one that does add support for a non-default swiotlb pool and
adds dev->dma_io_tlb_mem
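
For illustration, the shape of that split could be modelled in
userspace as below. This is only a sketch under assumptions: the
*_sketch types and functions are hypothetical stand-ins, and the
fallback to a default pool when the device has no override is inferred
from the commit message ("the restricted DMA pool is preferred if
available"), not copied from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pool descriptor covering the range [start, end). */
struct tlb_mem_sketch {
	uint64_t start;
	uint64_t end;
};

/* Hypothetical device carrying an optional per-device pool override. */
struct dev_sketch {
	struct tlb_mem_sketch *dma_io_tlb_mem;
};

static struct tlb_mem_sketch default_mem_sketch = { 0x1000, 0x2000 };

/* Getter: prefer the device's own pool, else fall back to the default one. */
static struct tlb_mem_sketch *get_io_tlb_mem_sketch(const struct dev_sketch *dev)
{
	if (dev && dev->dma_io_tlb_mem)
		return dev->dma_io_tlb_mem;
	return &default_mem_sketch;
}

/* Range check that takes the device so it consults the right pool. */
static bool is_swiotlb_buffer_sketch(const struct dev_sketch *dev, uint64_t paddr)
{
	const struct tlb_mem_sketch *mem = get_io_tlb_mem_sketch(dev);

	assert(mem != NULL);
	return paddr >= mem->start && paddr < mem->end;
}
```

The first patch would then only introduce the getter and the extra
argument (always resolving to the default pool), and the second would
add the per-device override.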

Other than that, LGTM!
--
Florian

2021-01-13 03:32:12

by Florian Fainelli

Subject: Re: [RFC PATCH v3 2/6] swiotlb: Add restricted DMA pool

On 1/5/21 7:41 PM, Claire Chang wrote:
> Add the initialization function to create restricted DMA pools from
> matching reserved-memory nodes in the device tree.
>
> Signed-off-by: Claire Chang <[email protected]>
> ---
> include/linux/device.h | 4 ++
> include/linux/swiotlb.h | 7 +-
> kernel/dma/Kconfig | 1 +
> kernel/dma/swiotlb.c | 144 ++++++++++++++++++++++++++++++++++------
> 4 files changed, 131 insertions(+), 25 deletions(-)
>
> diff --git a/include/linux/device.h b/include/linux/device.h
> index 89bb8b84173e..ca6f71ec8871 100644
> --- a/include/linux/device.h
> +++ b/include/linux/device.h
> @@ -413,6 +413,7 @@ struct dev_links_info {
> * @dma_pools: Dma pools (if dma'ble device).
> * @dma_mem: Internal for coherent mem override.
> * @cma_area: Contiguous memory area for dma allocations
> + * @dma_io_tlb_mem: Internal for swiotlb io_tlb_mem override.
> * @archdata: For arch-specific additions.
> * @of_node: Associated device tree node.
> * @fwnode: Associated device node supplied by platform firmware.
> @@ -515,6 +516,9 @@ struct device {
> #ifdef CONFIG_DMA_CMA
> struct cma *cma_area; /* contiguous memory area for dma
> allocations */
> +#endif
> +#ifdef CONFIG_SWIOTLB
> + struct io_tlb_mem *dma_io_tlb_mem;
> #endif
> /* arch specific additions */
> struct dev_archdata archdata;
> diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
> index dd8eb57cbb8f..a1bbd7788885 100644
> --- a/include/linux/swiotlb.h
> +++ b/include/linux/swiotlb.h
> @@ -76,12 +76,13 @@ extern enum swiotlb_force swiotlb_force;
> *
> * @start: The start address of the swiotlb memory pool. Used to do a quick
> * range check to see if the memory was in fact allocated by this
> - * API.
> + * API. For restricted DMA pool, this is device tree adjustable.

Maybe write it as this is "firmware adjustable" such that when/if ACPI
needs something like this, the description does not need updating.

[snip]

> +static int rmem_swiotlb_device_init(struct reserved_mem *rmem,
> + struct device *dev)
> +{
> + struct io_tlb_mem *mem = rmem->priv;
> + int ret;
> +
> + if (dev->dma_io_tlb_mem)
> + return -EBUSY;
> +
> + if (!mem) {
> + mem = kzalloc(sizeof(*mem), GFP_KERNEL);
> + if (!mem)
> + return -ENOMEM;
> +
> + if (!memremap(rmem->base, rmem->size, MEMREMAP_WB)) {

MEMREMAP_WB sounds appropriate as a default.
Documentation/devicetree/bindings/reserved-memory/ramoops.txt does
define an "unbuffered" property which in principle could be applied to
the generic reserved memory binding as well, and which we may have to
honor here if we were to make it more generic. Oh well, this does
not need to be addressed right now I guess.
--
Florian

2021-01-13 03:38:54

by Robin Murphy

Subject: Re: [RFC PATCH v3 2/6] swiotlb: Add restricted DMA pool

On 2021-01-07 17:57, Konrad Rzeszutek Wilk wrote:
> On Fri, Jan 08, 2021 at 01:39:18AM +0800, Claire Chang wrote:
>> Hi Greg and Konrad,
>>
>> This change is intended to be non-arch specific. Any arch that lacks DMA access
>> control and has devices not behind an IOMMU can make use of it. Could you share
>> why you think this should be arch specific?
>
> The idea behind non-arch specific code is it to be generic. The devicetree
> is specific to PowerPC, Sparc, and ARM, and not to x86 - hence it should
> be in arch specific code.

Sorry, but that's an absurd argument. By the same token you'd equally
have to claim that bits of, say, the Broadcom WiFi driver (not to
mention dozens of others) should be split out into arch code, since not
all platforms use the devicetree parts, nor the ACPI parts, nor the PCI
parts...

There is nothing architecture-specific about using devicetree as a
system description - AFAIK there *are* a handful of x86 platforms that
use it, besides even more architectures than you've listed above. It has
long been the policy that devicetree-related code for a particular
subsystem should just live within that subsystem. Sometimes if there's
enough of it it gets collected together into its own file - e.g.
drivers/pci/of.c - otherwise it tends to just get #ifdef'ed - e.g.
of_spi_parse_dt(), or the other DMA reserved-memory consumers that
already exist as Florian points out.

Besides, there are far more platforms that enable CONFIG_OF than enable
CONFIG_SWIOTLB, so by that metric the whole of the SWIOTLB code itself
is even less "generic" than any DT parsing :P

Robin.

2021-01-13 03:43:00

by Tomasz Figa

Subject: Re: [RFC PATCH v3 0/6] Restricted DMA

Hi Florian,

On Wed, Jan 13, 2021 at 3:01 AM Florian Fainelli <[email protected]> wrote:
>
> On 1/11/21 11:48 PM, Claire Chang wrote:
> > On Fri, Jan 8, 2021 at 1:59 AM Florian Fainelli <[email protected]> wrote:
> >>
> >> On 1/7/21 9:42 AM, Claire Chang wrote:
> >>
> >>>> Can you explain how ATF gets involved and to what extent it does help,
> >>>> besides enforcing a secure region from the ARM CPU's perpsective? Does
> >>>> the PCIe root complex not have an IOMMU but can somehow be denied access
> >>>> to a region that is marked NS=0 in the ARM CPU's MMU? If so, that is
> >>>> still some sort of basic protection that the HW enforces, right?
> >>>
> >>> We need the ATF support for memory MPU (memory protection unit).
> >>> Restricted DMA (with reserved-memory in dts) makes sure the predefined memory
> >>> region is for PCIe DMA only, but we still need MPU to locks down PCIe access to
> >>> that specific regions.
> >>
> >> OK so you do have a protection unit of some sort to enforce which region
> >> in DRAM the PCIE bridge is allowed to access, that makes sense,
> >> otherwise the restricted DMA region would only be a hint but nothing you
> >> can really enforce. This is almost entirely analogous to our systems then.
> >
> > Here is the example of setting the MPU:
> > https://github.com/ARM-software/arm-trusted-firmware/blob/master/plat/mediatek/mt8183/drivers/emi_mpu/emi_mpu.c#L132
> >
> >>
> >> There may be some value in standardizing on an ARM SMCCC call then since
> >> you already support two different SoC vendors.
> >>
> >>>
> >>>>
> >>>> On Broadcom STB SoCs we have had something similar for a while however
> >>>> and while we don't have an IOMMU for the PCIe bridge, we do have a a
> >>>> basic protection mechanism whereby we can configure a region in DRAM to
> >>>> be PCIe read/write and CPU read/write which then gets used as the PCIe
> >>>> inbound region for the PCIe EP. By default the PCIe bridge is not
> >>>> allowed access to DRAM so we must call into a security agent to allow
> >>>> the PCIe bridge to access the designated DRAM region.
> >>>>
> >>>> We have done this using a private CMA area region assigned via Device
> >>>> Tree, assigned with a and requiring the PCIe EP driver to use
> >>>> dma_alloc_from_contiguous() in order to allocate from this device
> >>>> private CMA area. The only drawback with that approach is that it
> >>>> requires knowing how much memory you need up front for buffers and DMA
> >>>> descriptors that the PCIe EP will need to process. The problem is that
> >>>> it requires driver modifications and that does not scale over the number
> >>>> of PCIe EP drivers, some we absolutely do not control, but there is no
> >>>> need to bounce buffer. Your approach scales better across PCIe EP
> >>>> drivers however it does require bounce buffering which could be a
> >>>> performance hit.
> >>>
> >>> Only the streaming DMA (map/unmap) needs bounce buffering.
> >>
> >> True, and typically only on transmit since you don't really control
> >> where the sk_buff are allocated from, right? On RX since you need to
> >> hand buffer addresses to the WLAN chip prior to DMA, you can allocate
> >> them from a pool that already falls within the restricted DMA region, right?
> >>
> >
> > Right, but applying bounce buffering to RX will make it more secure.
> > The device won't be able to modify the content after unmap. Just like what
> > iommu_unmap does.
>
> Sure, however the goals of using bounce buffering equally applies to RX
> and TX in that this is the only layer sitting between a stack (block,
> networking, USB, etc.) and the underlying device driver that scales well
> in order to massage a dma_addr_t to be within a particular physical range.
>
> There is however room for improvement if the drivers are willing to
> change their buffer allocation strategy. When you receive Wi-Fi frames
> you need to allocate buffers for the Wi-Fi device to DMA into, and that
> happens ahead of the DMA transfers by the Wi-Fi device. At buffer
> allocation time you could very well allocate these frames from the
> restricted DMA region without having to bounce buffer them since the
> host CPU is in control over where and when to DMA into.
>

That is, however, still a trade-off between saving that one copy and
protection from the DMA tampering with the packet contents when the
kernel is reading them. Notice how the copy effectively makes a
snapshot of the contents, guaranteeing that the kernel has a
consistent view of the packet, which is not true if the DMA could
modify the buffer contents in the middle of CPU accesses.

Best regards,
Tomasz

> The issue is that each network driver may implement its own buffer
> allocation strategy, some may simply call netdev_alloc_skb() which gives
> zero control over where the buffer comes from unless you play tricks
> with NUMA node allocations and somehow declare that your restricted DMA
> region is a different NUMA node. If the driver allocates pages and then
> attaches a SKB to that page using build_skb(), then you have much more
> control over where that page comes from, and this is where using a
> device private CMA are helps, because you can just do
> dma_alloc_from_contiguous() and that will ensure that the pages are
> coming from your specific CMA area.
>
> Few questions on the implementation:
>
> - is there any warning or error being printed if the restricted DMA
> region is outside of a device's DMA addressable range?
>
> - are there are any helpful statistics that could be shown to indicate
> that the restricted DMA region was sized too small, e.g.: that
> allocation of a DMA buffer failed because we ran out of space in the
> swiotlb pool?
>
> >
> >>> I also added alloc/free support in this series
> >>> (https://lore.kernel.org/patchwork/patch/1360995/), so dma_direct_alloc() will
> >>> try to allocate memory from the predefined memory region.
> >>>
> >>> As for the performance hit, it should be similar to the default swiotlb.
> >>> Here are my experiment results. Both SoCs lack IOMMU for PCIe.
> >>>
> >>> PCIe wifi vht80 throughput -
> >>>
> >>> MTK SoC tcp_tx tcp_rx udp_tx udp_rx
> >>> w/o Restricted DMA 244.1 134.66 312.56 350.79
> >>> w/ Restricted DMA 246.95 136.59 363.21 351.99
> >>>
> >>> Rockchip SoC tcp_tx tcp_rx udp_tx udp_rx
> >>> w/o Restricted DMA 237.87 133.86 288.28 361.88
> >>> w/ Restricted DMA 256.01 130.95 292.28 353.19
> >>
> >> How come you get better throughput with restricted DMA? Is it because
> >> doing DMA to/from a contiguous region allows for better grouping of
> >> transactions from the DRAM controller's perspective somehow?
> >
> > I'm not sure, but actually, enabling the default swiotlb for wifi also helps the
> > throughput a little bit for me.
>
> OK, it would be interesting if you could get to the bottom of why
> performance does increase with swiotlb.
> --
> Florian

2021-01-13 04:10:15

by Florian Fainelli

Subject: Re: [RFC PATCH v3 0/6] Restricted DMA



On 1/12/2021 6:29 PM, Tomasz Figa wrote:
> Hi Florian,
>
> On Wed, Jan 13, 2021 at 3:01 AM Florian Fainelli <[email protected]> wrote:
>>
>> On 1/11/21 11:48 PM, Claire Chang wrote:
>>> On Fri, Jan 8, 2021 at 1:59 AM Florian Fainelli <[email protected]> wrote:
>>>>
>>>> On 1/7/21 9:42 AM, Claire Chang wrote:
>>>>
>>>>>> Can you explain how ATF gets involved and to what extent it does help,
>>>>>> besides enforcing a secure region from the ARM CPU's perpsective? Does
>>>>>> the PCIe root complex not have an IOMMU but can somehow be denied access
>>>>>> to a region that is marked NS=0 in the ARM CPU's MMU? If so, that is
>>>>>> still some sort of basic protection that the HW enforces, right?
>>>>>
>>>>> We need the ATF support for memory MPU (memory protection unit).
>>>>> Restricted DMA (with reserved-memory in dts) makes sure the predefined memory
>>>>> region is for PCIe DMA only, but we still need MPU to locks down PCIe access to
>>>>> that specific regions.
>>>>
>>>> OK so you do have a protection unit of some sort to enforce which region
>>>> in DRAM the PCIE bridge is allowed to access, that makes sense,
>>>> otherwise the restricted DMA region would only be a hint but nothing you
>>>> can really enforce. This is almost entirely analogous to our systems then.
>>>
>>> Here is the example of setting the MPU:
>>> https://github.com/ARM-software/arm-trusted-firmware/blob/master/plat/mediatek/mt8183/drivers/emi_mpu/emi_mpu.c#L132
>>>
>>>>
>>>> There may be some value in standardizing on an ARM SMCCC call then since
>>>> you already support two different SoC vendors.
>>>>
>>>>>
>>>>>>
>>>>>> On Broadcom STB SoCs we have had something similar for a while however
>>>>>> and while we don't have an IOMMU for the PCIe bridge, we do have a a
>>>>>> basic protection mechanism whereby we can configure a region in DRAM to
>>>>>> be PCIe read/write and CPU read/write which then gets used as the PCIe
>>>>>> inbound region for the PCIe EP. By default the PCIe bridge is not
>>>>>> allowed access to DRAM so we must call into a security agent to allow
>>>>>> the PCIe bridge to access the designated DRAM region.
>>>>>>
>>>>>> We have done this using a private CMA area region assigned via Device
>>>>>> Tree, assigned with a and requiring the PCIe EP driver to use
>>>>>> dma_alloc_from_contiguous() in order to allocate from this device
>>>>>> private CMA area. The only drawback with that approach is that it
>>>>>> requires knowing how much memory you need up front for buffers and DMA
>>>>>> descriptors that the PCIe EP will need to process. The problem is that
>>>>>> it requires driver modifications and that does not scale over the number
>>>>>> of PCIe EP drivers, some we absolutely do not control, but there is no
>>>>>> need to bounce buffer. Your approach scales better across PCIe EP
>>>>>> drivers however it does require bounce buffering which could be a
>>>>>> performance hit.
>>>>>
>>>>> Only the streaming DMA (map/unmap) needs bounce buffering.
>>>>
>>>> True, and typically only on transmit since you don't really control
>>>> where the sk_buff are allocated from, right? On RX since you need to
>>>> hand buffer addresses to the WLAN chip prior to DMA, you can allocate
>>>> them from a pool that already falls within the restricted DMA region, right?
>>>>
>>>
>>> Right, but applying bounce buffering to RX will make it more secure.
>>> The device won't be able to modify the content after unmap. Just like what
>>> iommu_unmap does.
>>
>> Sure, however the goals of using bounce buffering equally applies to RX
>> and TX in that this is the only layer sitting between a stack (block,
>> networking, USB, etc.) and the underlying device driver that scales well
>> in order to massage a dma_addr_t to be within a particular physical range.
>>
>> There is however room for improvement if the drivers are willing to
>> change their buffer allocation strategy. When you receive Wi-Fi frames
>> you need to allocate buffers for the Wi-Fi device to DMA into, and that
>> happens ahead of the DMA transfers by the Wi-Fi device. At buffer
>> allocation time you could very well allocate these frames from the
>> restricted DMA region without having to bounce buffer them since the
>> host CPU is in control over where and when to DMA into.
>>
>
> That is, however, still a trade-off between saving that one copy and
> protection from the DMA tampering with the packet contents when the
> kernel is reading them. Notice how the copy effectively makes a
> snapshot of the contents, guaranteeing that the kernel has a
> consistent view of the packet, which is not true if the DMA could
> modify the buffer contents in the middle of CPU accesses.

I would say that the window just became so much narrower for the PCIe
end-point to overwrite contents with the copy because it would have to
happen within the dma_unmap_{page,single} time and before the copy is
finished to the bounce buffer.
--
Florian

2021-01-13 04:27:47

by Tomasz Figa

Subject: Re: [RFC PATCH v3 0/6] Restricted DMA

On Wed, Jan 13, 2021 at 12:56 PM Florian Fainelli <[email protected]> wrote:
>
>
>
> On 1/12/2021 6:29 PM, Tomasz Figa wrote:
> > Hi Florian,
> >
> > On Wed, Jan 13, 2021 at 3:01 AM Florian Fainelli <[email protected]> wrote:
> >>
> >> On 1/11/21 11:48 PM, Claire Chang wrote:
> >>> On Fri, Jan 8, 2021 at 1:59 AM Florian Fainelli <[email protected]> wrote:
> >>>>
> >>>> On 1/7/21 9:42 AM, Claire Chang wrote:
> >>>>
> >>>>>> Can you explain how ATF gets involved and to what extent it does help,
> >>>>>> besides enforcing a secure region from the ARM CPU's perpsective? Does
> >>>>>> the PCIe root complex not have an IOMMU but can somehow be denied access
> >>>>>> to a region that is marked NS=0 in the ARM CPU's MMU? If so, that is
> >>>>>> still some sort of basic protection that the HW enforces, right?
> >>>>>
> >>>>> We need the ATF support for memory MPU (memory protection unit).
> >>>>> Restricted DMA (with reserved-memory in dts) makes sure the predefined memory
> >>>>> region is for PCIe DMA only, but we still need MPU to locks down PCIe access to
> >>>>> that specific regions.
> >>>>
> >>>> OK so you do have a protection unit of some sort to enforce which region
> >>>> in DRAM the PCIE bridge is allowed to access, that makes sense,
> >>>> otherwise the restricted DMA region would only be a hint but nothing you
> >>>> can really enforce. This is almost entirely analogous to our systems then.
> >>>
> >>> Here is the example of setting the MPU:
> >>> https://github.com/ARM-software/arm-trusted-firmware/blob/master/plat/mediatek/mt8183/drivers/emi_mpu/emi_mpu.c#L132
> >>>
> >>>>
> >>>> There may be some value in standardizing on an ARM SMCCC call then since
> >>>> you already support two different SoC vendors.
> >>>>
> >>>>>
> >>>>>>
> >>>>>> On Broadcom STB SoCs we have had something similar for a while however
> >>>>>> and while we don't have an IOMMU for the PCIe bridge, we do have a a
> >>>>>> basic protection mechanism whereby we can configure a region in DRAM to
> >>>>>> be PCIe read/write and CPU read/write which then gets used as the PCIe
> >>>>>> inbound region for the PCIe EP. By default the PCIe bridge is not
> >>>>>> allowed access to DRAM so we must call into a security agent to allow
> >>>>>> the PCIe bridge to access the designated DRAM region.
> >>>>>>
> >>>>>> We have done this using a private CMA area region assigned via Device
> >>>>>> Tree, assigned with a and requiring the PCIe EP driver to use
> >>>>>> dma_alloc_from_contiguous() in order to allocate from this device
> >>>>>> private CMA area. The only drawback with that approach is that it
> >>>>>> requires knowing how much memory you need up front for buffers and DMA
> >>>>>> descriptors that the PCIe EP will need to process. The problem is that
> >>>>>> it requires driver modifications and that does not scale over the number
> >>>>>> of PCIe EP drivers, some we absolutely do not control, but there is no
> >>>>>> need to bounce buffer. Your approach scales better across PCIe EP
> >>>>>> drivers however it does require bounce buffering which could be a
> >>>>>> performance hit.
> >>>>>
> >>>>> Only the streaming DMA (map/unmap) needs bounce buffering.
> >>>>
> >>>> True, and typically only on transmit since you don't really control
> >>>> where the sk_buff are allocated from, right? On RX since you need to
> >>>> hand buffer addresses to the WLAN chip prior to DMA, you can allocate
> >>>> them from a pool that already falls within the restricted DMA region, right?
> >>>>
> >>>
> >>> Right, but applying bounce buffering to RX will make it more secure.
> >>> The device won't be able to modify the content after unmap. Just like what
> >>> iommu_unmap does.
> >>
> >> Sure, however the goals of using bounce buffering equally applies to RX
> >> and TX in that this is the only layer sitting between a stack (block,
> >> networking, USB, etc.) and the underlying device driver that scales well
> >> in order to massage a dma_addr_t to be within a particular physical range.
> >>
> >> There is however room for improvement if the drivers are willing to
> >> change their buffer allocation strategy. When you receive Wi-Fi frames
> >> you need to allocate buffers for the Wi-Fi device to DMA into, and that
> >> happens ahead of the DMA transfers by the Wi-Fi device. At buffer
> >> allocation time you could very well allocate these frames from the
> >> restricted DMA region without having to bounce buffer them since the
> >> host CPU is in control over where and when to DMA into.
> >>
> >
> > That is, however, still a trade-off between saving that one copy and
> > protection from the DMA tampering with the packet contents when the
> > kernel is reading them. Notice how the copy effectively makes a
> > snapshot of the contents, guaranteeing that the kernel has a
> > consistent view of the packet, which is not true if the DMA could
> > modify the buffer contents in the middle of CPU accesses.
>
> I would say that the window just became so much narrower for the PCIe
> end-point to overwrite contents with the copy because it would have to
> happen within the dma_unmap_{page,single} time and before the copy is
> finished to the bounce buffer.

Not only. Imagine this:

a) Without bouncing:

- RX interrupt
- Pass the packet to the network stack
- Network stack validates the packet
- DMA overwrites the packet
- Network stack goes boom, because the packet changed after validation

b) With bouncing:

- RX interrupt
- Copy the packet to a DMA-inaccessible buffer
- Network stack validates the packet
- Network stack is happy, because the packet is guaranteed to stay the
same after validation
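
The two sequences above can be modelled in a few lines of userspace C.
This is a sketch under assumptions, not driver code: the packet layout,
pkt_valid(), device_tamper() and the two rx_*() helpers are all
hypothetical, and the "device" write is simulated by a plain function
call placed between validation and use.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PKT_MAX 64

/* Hypothetical packet layout: one length byte followed by the payload. */
struct pkt_sketch {
	uint8_t len;
	uint8_t payload[PKT_MAX];
};

/* Validation step: the claimed length must fit the payload area. */
static int pkt_valid(const struct pkt_sketch *p)
{
	return p->len <= PKT_MAX;
}

/* Models a misbehaving device rewriting DMA-visible memory after the IRQ. */
static void device_tamper(struct pkt_sketch *dma_buf)
{
	dma_buf->len = 200;
}

/* a) Without bouncing: validate and consume the DMA-visible buffer in place. */
static int rx_in_place(struct pkt_sketch *dma_buf)
{
	if (!pkt_valid(dma_buf))
		return -1;
	device_tamper(dma_buf);	/* device writes between check and use */
	return dma_buf->len;	/* length used by the consumer, now bogus */
}

/* b) With bouncing: snapshot first, then validate and consume the copy. */
static int rx_bounced(struct pkt_sketch *dma_buf)
{
	struct pkt_sketch copy;

	memcpy(&copy, dma_buf, sizeof(copy));
	if (!pkt_valid(&copy))
		return -1;
	device_tamper(dma_buf);		/* too late: the copy is unaffected */
	assert(copy.len <= PKT_MAX);	/* validated state still holds */
	return copy.len;
}
```

Starting from a packet with len = 10, rx_in_place() ends up using the
tampered length of 200 while rx_bounced() still uses 10.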

Best regards,
Tomasz

2021-01-13 04:44:00

by Florian Fainelli

Subject: Re: [RFC PATCH v3 0/6] Restricted DMA



On 1/12/2021 8:25 PM, Tomasz Figa wrote:
> On Wed, Jan 13, 2021 at 12:56 PM Florian Fainelli <[email protected]> wrote:
>>
>>
>>
>> On 1/12/2021 6:29 PM, Tomasz Figa wrote:
>>> Hi Florian,
>>>
>>> On Wed, Jan 13, 2021 at 3:01 AM Florian Fainelli <[email protected]> wrote:
>>>>
>>>> On 1/11/21 11:48 PM, Claire Chang wrote:
>>>>> On Fri, Jan 8, 2021 at 1:59 AM Florian Fainelli <[email protected]> wrote:
>>>>>>
>>>>>> On 1/7/21 9:42 AM, Claire Chang wrote:
>>>>>>
>>>>>>>> Can you explain how ATF gets involved and to what extent it does help,
>>>>>>>> besides enforcing a secure region from the ARM CPU's perpsective? Does
>>>>>>>> the PCIe root complex not have an IOMMU but can somehow be denied access
>>>>>>>> to a region that is marked NS=0 in the ARM CPU's MMU? If so, that is
>>>>>>>> still some sort of basic protection that the HW enforces, right?
>>>>>>>
>>>>>>> We need the ATF support for memory MPU (memory protection unit).
>>>>>>> Restricted DMA (with reserved-memory in dts) makes sure the predefined memory
>>>>>>> region is for PCIe DMA only, but we still need the MPU to lock down PCIe access to
>>>>>>> that specific region.
>>>>>>
>>>>>> OK so you do have a protection unit of some sort to enforce which region
>>>>>> in DRAM the PCIE bridge is allowed to access, that makes sense,
>>>>>> otherwise the restricted DMA region would only be a hint but nothing you
>>>>>> can really enforce. This is almost entirely analogous to our systems then.
>>>>>
>>>>> Here is the example of setting the MPU:
>>>>> https://github.com/ARM-software/arm-trusted-firmware/blob/master/plat/mediatek/mt8183/drivers/emi_mpu/emi_mpu.c#L132
>>>>>
>>>>>>
>>>>>> There may be some value in standardizing on an ARM SMCCC call then since
>>>>>> you already support two different SoC vendors.
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> On Broadcom STB SoCs we have had something similar for a while however
>>>>>>>> and while we don't have an IOMMU for the PCIe bridge, we do have a
>>>>>>>> basic protection mechanism whereby we can configure a region in DRAM to
>>>>>>>> be PCIe read/write and CPU read/write which then gets used as the PCIe
>>>>>>>> inbound region for the PCIe EP. By default the PCIe bridge is not
>>>>>>>> allowed access to DRAM so we must call into a security agent to allow
>>>>>>>> the PCIe bridge to access the designated DRAM region.
>>>>>>>>
>>>>>>>> We have done this using a private CMA area region assigned via Device
>>>>>>>> Tree, assigned with a and requiring the PCIe EP driver to use
>>>>>>>> dma_alloc_from_contiguous() in order to allocate from this device
>>>>>>>> private CMA area. The only drawback with that approach is that it
>>>>>>>> requires knowing how much memory you need up front for buffers and DMA
>>>>>>>> descriptors that the PCIe EP will need to process. The problem is that
>>>>>>>> it requires driver modifications and that does not scale over the number
>>>>>>>> of PCIe EP drivers, some we absolutely do not control, but there is no
>>>>>>>> need to bounce buffer. Your approach scales better across PCIe EP
>>>>>>>> drivers however it does require bounce buffering which could be a
>>>>>>>> performance hit.
>>>>>>>
>>>>>>> Only the streaming DMA (map/unmap) needs bounce buffering.
>>>>>>
>>>>>> True, and typically only on transmit since you don't really control
>>>>>> where the sk_buff are allocated from, right? On RX since you need to
>>>>>> hand buffer addresses to the WLAN chip prior to DMA, you can allocate
>>>>>> them from a pool that already falls within the restricted DMA region, right?
>>>>>>
>>>>>
>>>>> Right, but applying bounce buffering to RX will make it more secure.
>>>>> The device won't be able to modify the content after unmap. Just like what
>>>>> iommu_unmap does.
>>>>
>>>> Sure, however the goals of using bounce buffering equally apply to RX
>>>> and TX in that this is the only layer sitting between a stack (block,
>>>> networking, USB, etc.) and the underlying device driver that scales well
>>>> in order to massage a dma_addr_t to be within a particular physical range.
>>>>
>>>> There is however room for improvement if the drivers are willing to
>>>> change their buffer allocation strategy. When you receive Wi-Fi frames
>>>> you need to allocate buffers for the Wi-Fi device to DMA into, and that
>>>> happens ahead of the DMA transfers by the Wi-Fi device. At buffer
>>>> allocation time you could very well allocate these frames from the
>>>> restricted DMA region without having to bounce buffer them since the
>>>> host CPU is in control over where and when to DMA into.
>>>>
>>>
>>> That is, however, still a trade-off between saving that one copy and
>>> protection from the DMA tampering with the packet contents when the
>>> kernel is reading them. Notice how the copy effectively makes a
>>> snapshot of the contents, guaranteeing that the kernel has a
>>> consistent view of the packet, which is not true if the DMA could
>>> modify the buffer contents in the middle of CPU accesses.
>>
>> I would say that the window just became so much narrower for the PCIe
>> end-point to overwrite contents with the copy because it would have to
>> happen within the dma_unmap_{page,single} time and before the copy is
>> finished to the bounce buffer.
>
> Not only that. Imagine this:
>
> a) Without bouncing:
>
> - RX interrupt
> - Pass the packet to the network stack
> - Network stack validates the packet
> - DMA overwrites the packet
> - Network stack goes boom, because the packet changed after validation
>
> b) With bouncing:
>
> - RX interrupt
> - Copy the packet to a DMA-inaccessible buffer
> - Network stack validates the packet
> - Network stack is happy, because the packet is guaranteed to stay the
> same after validation

Yes that's a much safer set of operations, thanks for walking through a
practical example.
--
Florian

2021-01-13 11:53:24

by Christoph Hellwig

Subject: Re: [RFC PATCH v3 1/6] swiotlb: Add io_tlb_mem struct

On Wed, Jan 06, 2021 at 11:41:19AM +0800, Claire Chang wrote:
> Added a new struct, io_tlb_mem, as the IO TLB memory pool descriptor and
> moved relevant global variables into that struct.
> This will be useful later to allow for restricted DMA pool.

I like where this is going, but a few comments.

Mostly I'd love to be able to entirely hide io_tlb_default_mem
and struct io_tlb_mem inside of swiotlb.c.

> --- a/arch/powerpc/platforms/pseries/svm.c
> +++ b/arch/powerpc/platforms/pseries/svm.c
> @@ -55,8 +55,8 @@ void __init svm_swiotlb_init(void)
> if (vstart && !swiotlb_init_with_tbl(vstart, io_tlb_nslabs, false))
> return;
>
> - if (io_tlb_start)
> - memblock_free_early(io_tlb_start,
> + if (io_tlb_default_mem.start)
> + memblock_free_early(io_tlb_default_mem.start,
> PAGE_ALIGN(io_tlb_nslabs << IO_TLB_SHIFT));

I think this should switch to use the local vstart variable in
a prep patch.

> panic("SVM: Cannot allocate SWIOTLB buffer");
> }
> diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
> index 2b385c1b4a99..4d17dff7ffd2 100644
> --- a/drivers/xen/swiotlb-xen.c
> +++ b/drivers/xen/swiotlb-xen.c
> @@ -192,8 +192,8 @@ int __ref xen_swiotlb_init(int verbose, bool early)
> /*
> * IO TLB memory already allocated. Just use it.
> */
> - if (io_tlb_start != 0) {
> - xen_io_tlb_start = phys_to_virt(io_tlb_start);
> + if (io_tlb_default_mem.start != 0) {
> + xen_io_tlb_start = phys_to_virt(io_tlb_default_mem.start);
> goto end;

xen_io_tlb_start is interesting. It is used only in two functions:

1) is_xen_swiotlb_buffer, where I think we should be able to just use
is_swiotlb_buffer instead of open coding it with the extra
phys_to_virt/virt_to_phys cycle.
2) xen_swiotlb_init, where except for the assignment it is only used
locally for the case not touched above and could thus be replaced
with a local variable.

Konrad, does this make sense to you?

> static inline bool is_swiotlb_buffer(phys_addr_t paddr)
> {
> - return paddr >= io_tlb_start && paddr < io_tlb_end;
> + struct io_tlb_mem *mem = &io_tlb_default_mem;
> +
> + return paddr >= mem->start && paddr < mem->end;

We'd then have to move this out of line as well.
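
As a rough illustration of what hiding the descriptor and moving the range check out of line could look like, here is a single-file userspace sketch. It assumes the header would expose only a prototype while the struct and the default pool stay private to the .c file; demo_set_pool() is a hypothetical helper that exists only so the example can be exercised, and phys_addr_t is typedef'd locally.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t phys_addr_t;

/* --- what a public header would expose: a prototype only --- */
bool is_swiotlb_buffer(phys_addr_t paddr);

/* --- what would stay private to swiotlb.c --- */
struct io_tlb_mem {
	phys_addr_t start;
	phys_addr_t end;	/* exclusive */
};

static struct io_tlb_mem io_tlb_default_mem;

/* Hypothetical setter for this demo; not a kernel function. */
static void demo_set_pool(phys_addr_t start, phys_addr_t end)
{
	assert(start <= end);
	io_tlb_default_mem.start = start;
	io_tlb_default_mem.end = end;
}

/* Out-of-line range check: callers never see the struct layout. */
bool is_swiotlb_buffer(phys_addr_t paddr)
{
	const struct io_tlb_mem *mem = &io_tlb_default_mem;

	return paddr >= mem->start && paddr < mem->end;
}
```

The cost is a function call where there used to be an inline comparison; the benefit is that arch and Xen code can no longer reach into the pool descriptor directly.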

2021-01-13 11:53:45

by Christoph Hellwig

Subject: Re: [RFC PATCH v3 2/6] swiotlb: Add restricted DMA pool

On Wed, Jan 06, 2021 at 08:50:03AM +0100, Greg KH wrote:
> > --- a/include/linux/device.h
> > +++ b/include/linux/device.h
> > @@ -413,6 +413,7 @@ struct dev_links_info {
> > * @dma_pools: Dma pools (if dma'ble device).
> > * @dma_mem: Internal for coherent mem override.
> > * @cma_area: Contiguous memory area for dma allocations
> > + * @dma_io_tlb_mem: Internal for swiotlb io_tlb_mem override.
>
> Why does this have to be added here? Shouldn't the platform-specific
> code handle it instead?

The whole code added here is pretty generic. What we need to eventually
do, though, is to add a separate dma_device instead of adding more and more
bloat to struct device.

2021-01-13 12:32:31

by Greg Kroah-Hartman

Subject: Re: [RFC PATCH v3 2/6] swiotlb: Add restricted DMA pool

On Wed, Jan 13, 2021 at 12:51:26PM +0100, Christoph Hellwig wrote:
> On Wed, Jan 06, 2021 at 08:50:03AM +0100, Greg KH wrote:
> > > --- a/include/linux/device.h
> > > +++ b/include/linux/device.h
> > > @@ -413,6 +413,7 @@ struct dev_links_info {
> > > * @dma_pools: Dma pools (if dma'ble device).
> > > * @dma_mem: Internal for coherent mem override.
> > > * @cma_area: Contiguous memory area for dma allocations
> > > + * @dma_io_tlb_mem: Internal for swiotlb io_tlb_mem override.
> >
> > Why does this have to be added here? Shouldn't the platform-specific
> > code handle it instead?
>
> The whole code added here is pretty generic. What we need to eventually
> do, though is to add a separate dma_device instead of adding more and more
> bloat to struct device.

I have no objection to that happening!

2021-01-13 12:39:58

by Christoph Hellwig

Subject: Re: [RFC PATCH v3 2/6] swiotlb: Add restricted DMA pool

On Wed, Jan 13, 2021 at 01:29:05PM +0100, Greg KH wrote:
> > > Why does this have to be added here? Shouldn't the platform-specific
> > > code handle it instead?
> >
> > The whole code added here is pretty generic. What we need to eventually
> > do, though is to add a separate dma_device instead of adding more and more
> > bloat to struct device.
>
> I have no objection to that happening!

I'm pretty sure you agreed to it before in fact. Now someone just needs
to find the time to do this heavy lifting, where "someone" probably means
me.

2021-01-13 12:45:01

by Christoph Hellwig

Subject: Re: [RFC PATCH v3 2/6] swiotlb: Add restricted DMA pool

> +#ifdef CONFIG_SWIOTLB
> + struct io_tlb_mem *dma_io_tlb_mem;
> #endif

Please add a new config option for this code instead of always building
it when swiotlb is enabled.

> +static int swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start,
> + size_t size)

Can you split the refactoring in swiotlb.c into one or more prep
patches?

> +static int rmem_swiotlb_device_init(struct reserved_mem *rmem,
> + struct device *dev)
> +{
> + struct io_tlb_mem *mem = rmem->priv;
> + int ret;
> +
> + if (dev->dma_io_tlb_mem)
> + return -EBUSY;
> +
> + if (!mem) {
> + mem = kzalloc(sizeof(*mem), GFP_KERNEL);
> + if (!mem)
> + return -ENOMEM;

What is the calling convention here that allows for a NULL and non-NULL
private data?

2021-01-13 12:46:25

by Christoph Hellwig

Subject: Re: [RFC PATCH v3 3/6] swiotlb: Use restricted DMA pool if available

> +#ifdef CONFIG_SWIOTLB
> + if (unlikely(swiotlb_force == SWIOTLB_FORCE) || dev->dma_io_tlb_mem)
> return swiotlb_map(dev, phys, size, dir, attrs);
> +#endif

Please provide a wrapper for the dev->dma_io_tlb_mem check that
always returns false if the per-device swiotlb support is not enabled.
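
A minimal sketch of such a wrapper follows, written as standalone C so it can be compiled outside the kernel. Everything named here is an assumption made for illustration: CONFIG_DMA_RESTRICTED_POOL is a hypothetical option name, dev_has_restricted_pool() and should_bounce() are hypothetical helpers, and struct device is a stripped-down stand-in for the real one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct io_tlb_mem;	/* opaque here */

/* Stand-in for struct device; only the relevant field is modelled. */
struct device {
#ifdef CONFIG_DMA_RESTRICTED_POOL	/* hypothetical option name */
	struct io_tlb_mem *dma_io_tlb_mem;
#endif
	int id;
};

#ifdef CONFIG_DMA_RESTRICTED_POOL
static inline bool dev_has_restricted_pool(const struct device *dev)
{
	return dev->dma_io_tlb_mem != NULL;
}
#else
/* Support compiled out: constant false, so callers need no #ifdef. */
static inline bool dev_has_restricted_pool(const struct device *dev)
{
	(void)dev;
	return false;
}
#endif

/*
 * Caller side without any #ifdef. swiotlb_forced stands in for the
 * swiotlb_force == SWIOTLB_FORCE test in the quoted hunk.
 */
static inline bool should_bounce(const struct device *dev, bool swiotlb_forced)
{
	assert(dev != NULL);
	return swiotlb_forced || dev_has_restricted_pool(dev);
}
```

With the option disabled the compiler can fold the helper to a constant, so the mapping path pays nothing for the extra check.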

> index 7fb2ac087d23..1f05af09e61a 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -222,7 +222,6 @@ int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
> mem->orig_addr[i] = INVALID_PHYS_ADDR;
> }
> mem->index = 0;
> - no_iotlb_memory = false;

How does this fit in here?

2021-01-13 12:50:50

by Christoph Hellwig

Subject: Re: [RFC PATCH v3 4/6] swiotlb: Add restricted DMA alloc/free support.

> +#ifdef CONFIG_SWIOTLB
> + if (unlikely(dev->dma_io_tlb_mem))
> + return swiotlb_alloc(dev, size, dma_handle, attrs);
> +#endif

Another place where a wrapper for the dma_io_tlb_mem check would be useful
to avoid the ifdef.

> -phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, phys_addr_t orig_addr,
> - size_t mapping_size, size_t alloc_size,
> - enum dma_data_direction dir, unsigned long attrs)
> +static int swiotlb_tbl_find_free_region(struct device *hwdev,
> + dma_addr_t tbl_dma_addr,
> + size_t alloc_size,
> + unsigned long attrs)

> +static void swiotlb_tbl_release_region(struct device *hwdev, int index,
> + size_t size)

This refactoring should be another prep patch.
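
To make the shape of that split concrete, here is a deliberately simplified userspace model of a find/release pair over fixed-size slots. It is a sketch, not the swiotlb algorithm: the real code tracks free runs in io_tlb_list and honours segment boundaries, while this version does a plain first-fit scan over a bool array. struct demo_pool and the demo_* functions are hypothetical names; IO_TLB_SHIFT is 11 (2 KiB slots) as in swiotlb.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define IO_TLB_SHIFT 11			/* 2 KiB slots, as in swiotlb.h */
#define DEMO_NSLABS  16			/* demo pool size, arbitrary */

struct demo_pool {
	uint64_t start;			/* physical base of the pool */
	bool used[DEMO_NSLABS];		/* simplified per-slot state */
};

/* Number of slots needed to cover size bytes, rounded up. */
static size_t nr_slots(size_t size)
{
	return (size + (1u << IO_TLB_SHIFT) - 1) >> IO_TLB_SHIFT;
}

/* First-fit search; returns the first slot index, or -1 if nothing fits. */
static int demo_find_free_region(struct demo_pool *pool, size_t size)
{
	size_t need = nr_slots(size);
	size_t i, j, run = 0;

	assert(size > 0);
	for (i = 0; i < DEMO_NSLABS; i++) {
		run = pool->used[i] ? 0 : run + 1;
		if (run == need) {
			size_t first = i + 1 - need;

			for (j = first; j <= i; j++)
				pool->used[j] = true;
			return (int)first;
		}
	}
	return -1;
}

/* Give back the slots that a previous find of the same size claimed. */
static void demo_release_region(struct demo_pool *pool, int index, size_t size)
{
	size_t j, need = nr_slots(size);

	for (j = 0; j < need; j++)
		pool->used[(size_t)index + j] = false;
}

/* Same arithmetic as tlb_addr = mem->start + (index << IO_TLB_SHIFT). */
static uint64_t demo_slot_addr(const struct demo_pool *pool, int index)
{
	return pool->start + ((uint64_t)index << IO_TLB_SHIFT);
}
```

Once the search and the release are separate helpers, both the streaming map path and a per-device alloc/free path can share them, which is presumably why the split reads better as its own prep patch.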


> +void *swiotlb_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle,
> + unsigned long attrs)

I'd rather have the names convey that these are for the per-device bounce
buffer in some form.

> + struct io_tlb_mem *mem = dev->dma_io_tlb_mem;

While we're at it, I wonder if the io_tlb naming is something we could
change. Maybe replace io_tlb_mem with struct swiotlb
and rename the field in struct device to dev_swiotlb?

> + int index;
> + void *vaddr;
> + phys_addr_t tlb_addr;
> +
> + size = PAGE_ALIGN(size);
> + index = swiotlb_tbl_find_free_region(dev, mem->start, size, attrs);
> + if (index < 0)
> + return NULL;
> +
> + tlb_addr = mem->start + (index << IO_TLB_SHIFT);
> + *dma_handle = phys_to_dma_unencrypted(dev, tlb_addr);
> +
> + if (!dev_is_dma_coherent(dev)) {
> + unsigned long pfn = PFN_DOWN(tlb_addr);
> +
> + /* remove any dirty cache lines on the kernel alias */
> + arch_dma_prep_coherent(pfn_to_page(pfn), size);

Can we hook in at a somewhat lower level in the dma-direct code so that all
the remapping in dma-direct can be reused instead of duplicated? That
also becomes important if we want to use non-remapping uncached support,
e.g. on mips or x86, or the direct changing of the attributes that Will
planned to look into for arm64.

2021-01-13 14:02:30

by Nicolas Saenz Julienne

Subject: Re: [RFC PATCH v3 2/6] swiotlb: Add restricted DMA pool

Hi All,

On Tue, 2021-01-12 at 16:03 -0800, Florian Fainelli wrote:
> On 1/5/21 7:41 PM, Claire Chang wrote:
> > Add the initialization function to create restricted DMA pools from
> > matching reserved-memory nodes in the device tree.
> >
> > Signed-off-by: Claire Chang <[email protected]>
> > ---
> >  include/linux/device.h | 4 ++
> >  include/linux/swiotlb.h | 7 +-
> >  kernel/dma/Kconfig | 1 +
> >  kernel/dma/swiotlb.c | 144 ++++++++++++++++++++++++++++++++++------
> >  4 files changed, 131 insertions(+), 25 deletions(-)
> >
> > diff --git a/include/linux/device.h b/include/linux/device.h
> > index 89bb8b84173e..ca6f71ec8871 100644
> > --- a/include/linux/device.h
> > +++ b/include/linux/device.h
> > @@ -413,6 +413,7 @@ struct dev_links_info {
> >   * @dma_pools: Dma pools (if dma'ble device).
> >   * @dma_mem: Internal for coherent mem override.
> >   * @cma_area: Contiguous memory area for dma allocations
> > + * @dma_io_tlb_mem: Internal for swiotlb io_tlb_mem override.
> >   * @archdata: For arch-specific additions.
> >   * @of_node: Associated device tree node.
> >   * @fwnode: Associated device node supplied by platform firmware.
> > @@ -515,6 +516,9 @@ struct device {
> >  #ifdef CONFIG_DMA_CMA
> >   struct cma *cma_area; /* contiguous memory area for dma
> >   allocations */
> > +#endif
> > +#ifdef CONFIG_SWIOTLB
> > + struct io_tlb_mem *dma_io_tlb_mem;
> >  #endif
> >   /* arch specific additions */
> >   struct dev_archdata archdata;
> > diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
> > index dd8eb57cbb8f..a1bbd7788885 100644
> > --- a/include/linux/swiotlb.h
> > +++ b/include/linux/swiotlb.h
> > @@ -76,12 +76,13 @@ extern enum swiotlb_force swiotlb_force;
> >   *
> >   * @start: The start address of the swiotlb memory pool. Used to do a quick
> >   * range check to see if the memory was in fact allocated by this
> > - * API.
> > + * API. For restricted DMA pool, this is device tree adjustable.
>
> Maybe write it as this is "firmware adjustable" such that when/if ACPI
> needs something like this, the description does not need updating.
>
> [snip]
>
> > +static int rmem_swiotlb_device_init(struct reserved_mem *rmem,
> > + struct device *dev)
> > +{
> > + struct io_tlb_mem *mem = rmem->priv;
> > + int ret;
> > +
> > + if (dev->dma_io_tlb_mem)
> > + return -EBUSY;
> > +
> > + if (!mem) {
> > + mem = kzalloc(sizeof(*mem), GFP_KERNEL);
> > + if (!mem)
> > + return -ENOMEM;
> > +
> > + if (!memremap(rmem->base, rmem->size, MEMREMAP_WB)) {
>
> MEMREMAP_WB sounds appropriate as a default.

As per the binding 'no-map' has to be disabled here. So AFAIU, this memory will
be part of the linear mapping. Is this really needed then?

> Documentation/devicetree/bindings/reserved-memory/ramoops.txt does
> define an "unbuffered" property which in principle could be applied to the
> generic reserved memory binding as well and that we may have to be
> honoring here, if we were to make it more generic. Oh well, this does
> not need to be addressed right now I guess.





2021-01-13 15:31:38

by Robin Murphy

Subject: Re: [RFC PATCH v3 2/6] swiotlb: Add restricted DMA pool

On 2021-01-13 13:59, Nicolas Saenz Julienne wrote:
> Hi All,
>
> On Tue, 2021-01-12 at 16:03 -0800, Florian Fainelli wrote:
>> On 1/5/21 7:41 PM, Claire Chang wrote:
>>> Add the initialization function to create restricted DMA pools from
>>> matching reserved-memory nodes in the device tree.
>>>
>>> Signed-off-by: Claire Chang <[email protected]>
>>> ---
>>>  include/linux/device.h | 4 ++
>>>  include/linux/swiotlb.h | 7 +-
>>>  kernel/dma/Kconfig | 1 +
>>>  kernel/dma/swiotlb.c | 144 ++++++++++++++++++++++++++++++++++------
>>>  4 files changed, 131 insertions(+), 25 deletions(-)
>>>
>>> diff --git a/include/linux/device.h b/include/linux/device.h
>>> index 89bb8b84173e..ca6f71ec8871 100644
>>> --- a/include/linux/device.h
>>> +++ b/include/linux/device.h
>>> @@ -413,6 +413,7 @@ struct dev_links_info {
>>>   * @dma_pools: Dma pools (if dma'ble device).
>>>   * @dma_mem: Internal for coherent mem override.
>>>   * @cma_area: Contiguous memory area for dma allocations
>>> + * @dma_io_tlb_mem: Internal for swiotlb io_tlb_mem override.
>>>   * @archdata: For arch-specific additions.
>>>   * @of_node: Associated device tree node.
>>>   * @fwnode: Associated device node supplied by platform firmware.
>>> @@ -515,6 +516,9 @@ struct device {
>>>  #ifdef CONFIG_DMA_CMA
>>>   struct cma *cma_area; /* contiguous memory area for dma
>>>   allocations */
>>> +#endif
>>> +#ifdef CONFIG_SWIOTLB
>>> + struct io_tlb_mem *dma_io_tlb_mem;
>>>  #endif
>>>   /* arch specific additions */
>>>   struct dev_archdata archdata;
>>> diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
>>> index dd8eb57cbb8f..a1bbd7788885 100644
>>> --- a/include/linux/swiotlb.h
>>> +++ b/include/linux/swiotlb.h
>>> @@ -76,12 +76,13 @@ extern enum swiotlb_force swiotlb_force;
>>>   *
>>>   * @start: The start address of the swiotlb memory pool. Used to do a quick
>>>   * range check to see if the memory was in fact allocated by this
>>> - * API.
>>> + * API. For restricted DMA pool, this is device tree adjustable.
>>
>> Maybe write it as this is "firmware adjustable" such that when/if ACPI
>> needs something like this, the description does not need updating.

TBH I really don't think this needs calling out at all. Even in the
regular case, the details of exactly how and where the pool is allocated
are beyond the scope of this code - architectures already have several
ways to control that and make their own decisions.

>>
>> [snip]
>>
>>> +static int rmem_swiotlb_device_init(struct reserved_mem *rmem,
>>> + struct device *dev)
>>> +{
>>> + struct io_tlb_mem *mem = rmem->priv;
>>> + int ret;
>>> +
>>> + if (dev->dma_io_tlb_mem)
>>> + return -EBUSY;
>>> +
>>> + if (!mem) {
>>> + mem = kzalloc(sizeof(*mem), GFP_KERNEL);
>>> + if (!mem)
>>> + return -ENOMEM;
>>> +
>>> + if (!memremap(rmem->base, rmem->size, MEMREMAP_WB)) {
>>
>> MEMREMAP_WB sounds appropriate as a default.
>
> As per the binding 'no-map' has to be disabled here. So AFAIU, this memory will
> be part of the linear mapping. Is this really needed then?

More than that, I'd assume that we *have* to use the linear/direct map
address rather than anything that has any possibility of being a vmalloc
remap, otherwise we can no longer safely rely on
phys_to_dma/dma_to_phys, no?

That said, given that we're not actually using the returned address, I'm
not entirely sure what the point of this call is anyway. If we can
assume it's always going to return the linear map address via
try_ram_remap() then we can equally just go ahead and use the linear map
address straight away. I don't really see how we could ever hit the
"is_ram == REGION_MIXED" case in memremap() that would return NULL, if
we passed the memblock check earlier in __reserved_mem_alloc_size() such
that this rmem node ever got to be initialised at all.

Robin.

>> Documentation/devicetree/bindings/reserved-memory/ramoops.txt does
>> define an "unbuffered" property which in principle could be applied to the
>> generic reserved memory binding as well and that we may have to be
>> honoring here, if we were to make it more generic. Oh well, this does
>> not need to be addressed right now I guess.
>
>
>

2021-01-13 17:46:23

by Florian Fainelli

Subject: Re: [RFC PATCH v3 2/6] swiotlb: Add restricted DMA pool

On 1/13/21 7:27 AM, Robin Murphy wrote:
> On 2021-01-13 13:59, Nicolas Saenz Julienne wrote:
>> Hi All,
>>
>> On Tue, 2021-01-12 at 16:03 -0800, Florian Fainelli wrote:
>>> On 1/5/21 7:41 PM, Claire Chang wrote:
>>>> Add the initialization function to create restricted DMA pools from
>>>> matching reserved-memory nodes in the device tree.
>>>>
>>>> Signed-off-by: Claire Chang <[email protected]>
>>>> ---
>>>>   include/linux/device.h  |   4 ++
>>>>   include/linux/swiotlb.h |   7 +-
>>>>   kernel/dma/Kconfig      |   1 +
>>>>   kernel/dma/swiotlb.c    | 144 ++++++++++++++++++++++++++++++++++------
>>>>   4 files changed, 131 insertions(+), 25 deletions(-)
>>>>
>>>> diff --git a/include/linux/device.h b/include/linux/device.h
>>>> index 89bb8b84173e..ca6f71ec8871 100644
>>>> --- a/include/linux/device.h
>>>> +++ b/include/linux/device.h
>>>> @@ -413,6 +413,7 @@ struct dev_links_info {
>>>>    * @dma_pools:    Dma pools (if dma'ble device).
>>>>    * @dma_mem:    Internal for coherent mem override.
>>>>    * @cma_area:    Contiguous memory area for dma allocations
>>>> + * @dma_io_tlb_mem: Internal for swiotlb io_tlb_mem override.
>>>>    * @archdata:    For arch-specific additions.
>>>>    * @of_node:    Associated device tree node.
>>>>    * @fwnode:    Associated device node supplied by platform firmware.
>>>> @@ -515,6 +516,9 @@ struct device {
>>>>   #ifdef CONFIG_DMA_CMA
>>>>       struct cma *cma_area;        /* contiguous memory area for dma
>>>>                          allocations */
>>>> +#endif
>>>> +#ifdef CONFIG_SWIOTLB
>>>> +    struct io_tlb_mem    *dma_io_tlb_mem;
>>>>   #endif
>>>>       /* arch specific additions */
>>>>       struct dev_archdata    archdata;
>>>> diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
>>>> index dd8eb57cbb8f..a1bbd7788885 100644
>>>> --- a/include/linux/swiotlb.h
>>>> +++ b/include/linux/swiotlb.h
>>>> @@ -76,12 +76,13 @@ extern enum swiotlb_force swiotlb_force;
>>>>    *
>>>>    * @start:    The start address of the swiotlb memory pool. Used to do a quick
>>>>    *        range check to see if the memory was in fact allocated by this
>>>> - *        API.
>>>> + *        API. For restricted DMA pool, this is device tree adjustable.
>>>
>>> Maybe write it as this is "firmware adjustable" such that when/if ACPI
>>> needs something like this, the description does not need updating.
>
> TBH I really don't think this needs calling out at all. Even in the
> regular case, the details of exactly how and where the pool is allocated
> are beyond the scope of this code - architectures already have several
> ways to control that and make their own decisions.
>
>>>
>>> [snip]
>>>
>>>> +static int rmem_swiotlb_device_init(struct reserved_mem *rmem,
>>>> +                    struct device *dev)
>>>> +{
>>>> +    struct io_tlb_mem *mem = rmem->priv;
>>>> +    int ret;
>>>> +
>>>> +    if (dev->dma_io_tlb_mem)
>>>> +        return -EBUSY;
>>>> +
>>>> +    if (!mem) {
>>>> +        mem = kzalloc(sizeof(*mem), GFP_KERNEL);
>>>> +        if (!mem)
>>>> +            return -ENOMEM;
>>>> +
>>>> +        if (!memremap(rmem->base, rmem->size, MEMREMAP_WB)) {
>>>
>>> MEMREMAP_WB sounds appropriate as a default.
>>
>> As per the binding 'no-map' has to be disabled here. So AFAIU, this memory will
>> be part of the linear mapping. Is this really needed then?
>
> More than that, I'd assume that we *have* to use the linear/direct map
> address rather than anything that has any possibility of being a vmalloc
> remap, otherwise we can no longer safely rely on
> phys_to_dma/dma_to_phys, no?

I believe you are right, which means that if we want to make use of the
restricted DMA pool on a 32-bit architecture (and we do, at least, I do)
we should probably add some error checking/warning to ensure the
restricted DMA pool falls within the linear map.
--
Florian

2021-01-13 18:05:49

by Robin Murphy

Subject: Re: [RFC PATCH v3 2/6] swiotlb: Add restricted DMA pool

On 2021-01-13 17:43, Florian Fainelli wrote:
> On 1/13/21 7:27 AM, Robin Murphy wrote:
>> On 2021-01-13 13:59, Nicolas Saenz Julienne wrote:
>>> Hi All,
>>>
>>> On Tue, 2021-01-12 at 16:03 -0800, Florian Fainelli wrote:
>>>> On 1/5/21 7:41 PM, Claire Chang wrote:
>>>>> Add the initialization function to create restricted DMA pools from
>>>>> matching reserved-memory nodes in the device tree.
>>>>>
>>>>> Signed-off-by: Claire Chang <[email protected]>
>>>>> ---
>>>>>   include/linux/device.h  |   4 ++
>>>>>   include/linux/swiotlb.h |   7 +-
>>>>>   kernel/dma/Kconfig      |   1 +
>>>>>   kernel/dma/swiotlb.c    | 144 ++++++++++++++++++++++++++++++++++------
>>>>>   4 files changed, 131 insertions(+), 25 deletions(-)
>>>>>
>>>>> diff --git a/include/linux/device.h b/include/linux/device.h
>>>>> index 89bb8b84173e..ca6f71ec8871 100644
>>>>> --- a/include/linux/device.h
>>>>> +++ b/include/linux/device.h
>>>>> @@ -413,6 +413,7 @@ struct dev_links_info {
>>>>>    * @dma_pools:    Dma pools (if dma'ble device).
>>>>>    * @dma_mem:    Internal for coherent mem override.
>>>>>    * @cma_area:    Contiguous memory area for dma allocations
>>>>> + * @dma_io_tlb_mem: Internal for swiotlb io_tlb_mem override.
>>>>>    * @archdata:    For arch-specific additions.
>>>>>    * @of_node:    Associated device tree node.
>>>>>    * @fwnode:    Associated device node supplied by platform firmware.
>>>>> @@ -515,6 +516,9 @@ struct device {
>>>>>   #ifdef CONFIG_DMA_CMA
>>>>>       struct cma *cma_area;        /* contiguous memory area for dma
>>>>>                          allocations */
>>>>> +#endif
>>>>> +#ifdef CONFIG_SWIOTLB
>>>>> +    struct io_tlb_mem    *dma_io_tlb_mem;
>>>>>   #endif
>>>>>       /* arch specific additions */
>>>>>       struct dev_archdata    archdata;
>>>>> diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
>>>>> index dd8eb57cbb8f..a1bbd7788885 100644
>>>>> --- a/include/linux/swiotlb.h
>>>>> +++ b/include/linux/swiotlb.h
>>>>> @@ -76,12 +76,13 @@ extern enum swiotlb_force swiotlb_force;
>>>>>    *
>>>>>    * @start:    The start address of the swiotlb memory pool. Used to do a quick
>>>>>    *        range check to see if the memory was in fact allocated by this
>>>>> - *        API.
>>>>> + *        API. For restricted DMA pool, this is device tree adjustable.
>>>>
>>>> Maybe write it as this is "firmware adjustable" such that when/if ACPI
>>>> needs something like this, the description does not need updating.
>>
>> TBH I really don't think this needs calling out at all. Even in the
>> regular case, the details of exactly how and where the pool is allocated
>> are beyond the scope of this code - architectures already have several
>> ways to control that and make their own decisions.
>>
>>>>
>>>> [snip]
>>>>
>>>>> +static int rmem_swiotlb_device_init(struct reserved_mem *rmem,
>>>>> +                    struct device *dev)
>>>>> +{
>>>>> +    struct io_tlb_mem *mem = rmem->priv;
>>>>> +    int ret;
>>>>> +
>>>>> +    if (dev->dma_io_tlb_mem)
>>>>> +        return -EBUSY;
>>>>> +
>>>>> +    if (!mem) {
>>>>> +        mem = kzalloc(sizeof(*mem), GFP_KERNEL);
>>>>> +        if (!mem)
>>>>> +            return -ENOMEM;
>>>>> +
>>>>> +        if (!memremap(rmem->base, rmem->size, MEMREMAP_WB)) {
>>>>
>>>> MEMREMAP_WB sounds appropriate as a default.
>>>
>>> As per the binding 'no-map' has to be disabled here. So AFAIU, this memory will
>>> be part of the linear mapping. Is this really needed then?
>>
>> More than that, I'd assume that we *have* to use the linear/direct map
>> address rather than anything that has any possibility of being a vmalloc
>> remap, otherwise we can no longer safely rely on
>> phys_to_dma/dma_to_phys, no?
>
> I believe you are right, which means that if we want to make use of the
> restricted DMA pool on a 32-bit architecture (and we do, at least, I do)
> we should probably add some error checking/warning to ensure the
> restricted DMA pool falls within the linear map.

Oh, good point - I'm so used to 64-bit that I instinctively just blanked
out the !PageHighMem() condition in try_ram_remap(). So maybe the
original intent here *was* to effectively just implement that check, but
if so it could still do with being a lot more explicit.

Cheers,
Robin.
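
One way to picture a more explicit check is the userspace model below. It is an assumption-heavy sketch: struct demo_linear_map and demo_check_pool_in_linear_map() are hypothetical, and the linear map is reduced to a single physical range. In the kernel the equivalent test would more likely be phrased in terms of PageHighMem() on the pool's pages, as in the try_ram_remap() condition mentioned above.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

typedef uint64_t phys_addr_t;

/* Hypothetical description of the directly mapped ("lowmem") range. */
struct demo_linear_map {
	phys_addr_t start;
	phys_addr_t end;	/* exclusive */
};

/*
 * Return 0 if [base, base + size) lies entirely inside the linear map,
 * -EINVAL otherwise, so the caller can refuse to set up the pool instead
 * of relying on an unused memremap() return value.
 */
static int demo_check_pool_in_linear_map(const struct demo_linear_map *lm,
					 phys_addr_t base, phys_addr_t size)
{
	assert(lm->start <= lm->end);

	if (size == 0 || base + size < base)	/* empty or wrapping range */
		return -EINVAL;
	if (base < lm->start || base + size > lm->end)
		return -EINVAL;
	return 0;
}
```

On a 32-bit system with highmem, a pool placed above the lowmem limit would fail this check at init time with a clear error, which is the behaviour Florian asks for.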

2021-01-13 18:30:41

by Robin Murphy

Subject: Re: [RFC PATCH v3 4/6] swiotlb: Add restricted DMA alloc/free support.

On 2021-01-13 12:48, Christoph Hellwig wrote:
>> +#ifdef CONFIG_SWIOTLB
>> + if (unlikely(dev->dma_io_tlb_mem))
>> + return swiotlb_alloc(dev, size, dma_handle, attrs);
>> +#endif
>
> Another place where a wrapper for the dma_io_tlb_mem check would be useful
> to avoid the ifdef.
>
>> -phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, phys_addr_t orig_addr,
>> - size_t mapping_size, size_t alloc_size,
>> - enum dma_data_direction dir, unsigned long attrs)
>> +static int swiotlb_tbl_find_free_region(struct device *hwdev,
>> + dma_addr_t tbl_dma_addr,
>> + size_t alloc_size,
>> + unsigned long attrs)
>
>> +static void swiotlb_tbl_release_region(struct device *hwdev, int index,
>> + size_t size)
>
> This refactoring should be another prep patch.
>
>
>> +void *swiotlb_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle,
>> + unsigned long attrs)
>
> I'd rather have the names convey that these are for the per-device bounce
> buffer in some form.
>
>> + struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
>
> While we're at it, I wonder if the io_tlb naming is something we could
> change. Maybe replace io_tlb_mem with struct swiotlb
> and rename the field in struct device to dev_swiotlb?
>
>> + int index;
>> + void *vaddr;
>> + phys_addr_t tlb_addr;
>> +
>> + size = PAGE_ALIGN(size);
>> + index = swiotlb_tbl_find_free_region(dev, mem->start, size, attrs);
>> + if (index < 0)
>> + return NULL;
>> +
>> + tlb_addr = mem->start + (index << IO_TLB_SHIFT);
>> + *dma_handle = phys_to_dma_unencrypted(dev, tlb_addr);
>> +
>> + if (!dev_is_dma_coherent(dev)) {
>> + unsigned long pfn = PFN_DOWN(tlb_addr);
>> +
>> + /* remove any dirty cache lines on the kernel alias */
>> + arch_dma_prep_coherent(pfn_to_page(pfn), size);
>
> Can we hook in at a somewhat lower level in the dma-direct code so that all
> the remapping in dma-direct can be reused instead of duplicated? That
> also becomes important if we want to use non-remapping uncached support,
> e.g. on mips or x86, or the direct changing of the attributes that Will
> planned to look into for arm64.

Indeed, AFAICS this ought to boil down to a direct equivalent of
__dma_direct_alloc_pages() - other than the address there should be no
conceptual difference between pages from the restricted pool and those
from the regular page allocator, so this probably deserves to be plumbed
in as an alternative to that.

Robin.

2021-01-13 18:34:12

by Christoph Hellwig

Subject: Re: [RFC PATCH v3 4/6] swiotlb: Add restricted DMA alloc/free support.

On Wed, Jan 13, 2021 at 06:27:08PM +0000, Robin Murphy wrote:
>> Can we hook in somewhat lower level in the dma-direct code so that all
>> the remapping in dma-direct can be reused instead of duplicated? That
>> also becomes important if we want to use non-remapping uncached support,
>> e.g. on mips or x86, or the direct changing of the attributes that Will
>> planned to look into for arm64.
>
> Indeed, AFAICS this ought to boil down to a direct equivalent of
> __dma_direct_alloc_pages() - other than the address there should be no
> conceptual difference between pages from the restricted pool and those from
> the regular page allocator, so this probably deserves to be plumbed in as
> an alternative to that.

Yes, that's what I mean. You managed to word it much better, though.

2021-01-14 09:10:41

by Claire Chang

Subject: Re: [RFC PATCH v3 6/6] of: Add plumbing for restricted DMA pool

On Wed, Jan 13, 2021 at 7:48 AM Florian Fainelli <[email protected]> wrote:
>
> On 1/5/21 7:41 PM, Claire Chang wrote:
> > If a device is not behind an IOMMU, we look up the device node and set
> > up the restricted DMA when the restricted-dma-pool is presented.
> >
> > Signed-off-by: Claire Chang <[email protected]>
> > ---
>
> [snip]
>
> > +int of_dma_set_restricted_buffer(struct device *dev)
> > +{
> > + struct device_node *node;
> > + int count, i;
> > +
> > + if (!dev->of_node)
> > + return 0;
> > +
> > + count = of_property_count_elems_of_size(dev->of_node, "memory-region",
> > + sizeof(phandle));
>
> You could have an early check for count < 0, along with an error
> message, if that is deemed useful.
>
> > + for (i = 0; i < count; i++) {
> > + node = of_parse_phandle(dev->of_node, "memory-region", i);
> > + if (of_device_is_compatible(node, "restricted-dma-pool"))
>
> And you may want to add an of_device_is_available(node) check here. A
> platform that provides the Device Tree firmware and tries to support
> multiple different SoCs may try to determine if an IOMMU is present, and
> if it is, it could mark the restricted-dma-pool region with a 'status =
> "disabled"' property, or any variant of that scheme.

This function is called only when there is no IOMMU present (check in
drivers/of/device.c). I can still add of_device_is_available(node)
here if you think it's helpful.

>
> > + return of_reserved_mem_device_init_by_idx(
> > + dev, dev->of_node, i);
>
> This does not seem to support more than one memory region. Didn't you
> want something like this instead:
>
> ret = of_reserved_mem_device_init_by_idx(...);
> if (ret)
> return ret;
>

Yes. This implementation only supports one restricted-dma-pool memory
region, on the assumption that only one memory region in the dts has the
compatible string restricted-dma-pool. IIUC, it's similar to
shared-dma-pool.


> > + }
> > +
> > + return 0;
> > +}
> > diff --git a/drivers/of/device.c b/drivers/of/device.c
> > index aedfaaafd3e7..e2c7409956ab 100644
> > --- a/drivers/of/device.c
> > +++ b/drivers/of/device.c
> > @@ -182,6 +182,10 @@ int of_dma_configure_id(struct device *dev, struct device_node *np,
> > arch_setup_dma_ops(dev, dma_start, size, iommu, coherent);
> >
> > dev->dma_range_map = map;
> > +
> > + if (!iommu)
> > + return of_dma_set_restricted_buffer(dev);
> > +
> > return 0;
> > }
> > EXPORT_SYMBOL_GPL(of_dma_configure_id);
> > diff --git a/drivers/of/of_private.h b/drivers/of/of_private.h
> > index d9e6a324de0a..28a2dfa197ba 100644
> > --- a/drivers/of/of_private.h
> > +++ b/drivers/of/of_private.h
> > @@ -161,12 +161,17 @@ struct bus_dma_region;
> > #if defined(CONFIG_OF_ADDRESS) && defined(CONFIG_HAS_DMA)
> > int of_dma_get_range(struct device_node *np,
> > const struct bus_dma_region **map);
> > +int of_dma_set_restricted_buffer(struct device *dev);
> > #else
> > static inline int of_dma_get_range(struct device_node *np,
> > const struct bus_dma_region **map)
> > {
> > return -ENODEV;
> > }
> > +static inline int of_dma_get_restricted_buffer(struct device *dev)
> > +{
> > + return -ENODEV;
> > +}
> > #endif
> >
> > #endif /* _LINUX_OF_PRIVATE_H */
> >
>
>
> --
> Florian

2021-01-14 09:14:54

by Claire Chang

Subject: Re: [RFC PATCH v3 2/6] swiotlb: Add restricted DMA pool

On Wed, Jan 13, 2021 at 8:42 PM Christoph Hellwig <[email protected]> wrote:
>
> > +#ifdef CONFIG_SWIOTLB
> > + struct io_tlb_mem *dma_io_tlb_mem;
> > #endif
>
> Please add a new config option for this code instead of always building
> it when swiotlb is enabled.
>
> > +static int swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start,
> > + size_t size)
>
> Can you split the refactoring in swiotlb.c into one or more prep
> patches?
>
> > +static int rmem_swiotlb_device_init(struct reserved_mem *rmem,
> > + struct device *dev)
> > +{
> > + struct io_tlb_mem *mem = rmem->priv;
> > + int ret;
> > +
> > + if (dev->dma_io_tlb_mem)
> > + return -EBUSY;
> > +
> > + if (!mem) {
> > + mem = kzalloc(sizeof(*mem), GFP_KERNEL);
> > + if (!mem)
> > + return -ENOMEM;
>
> What is the calling convention here that allows for a NULL and non-NULL
> private data?

Since multiple devices can share the same pool, the private data (the
io_tlb_mem struct) is initialized by the first device attached to it.
This is similar to rmem_dma_device_init() in kernel/dma/coherent.c.
I'll add a comment about it in the next version.
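
As a rough illustration of that convention, here is a minimal userspace
sketch (not the patch itself). The struct layouts are simplified stand-ins,
calloc() replaces kzalloc(), and rmem_swiotlb_device_init_model() is a
hypothetical name; only the NULL versus non-NULL handling of rmem->priv is
meant to mirror what is described above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified stand-ins for the kernel structures named in the thread. */
struct io_tlb_mem {
	unsigned long start;	/* base of the reserved region */
	size_t size;		/* length of the reserved region */
};

struct reserved_mem {
	unsigned long base;
	size_t size;
	void *priv;		/* NULL until the first device attaches */
};

struct device {
	struct io_tlb_mem *dma_io_tlb_mem;
};

/*
 * Attach dev to the pool described by rmem. The first caller allocates and
 * fills in the shared io_tlb_mem; later callers find rmem->priv already set
 * and simply reuse it.
 */
static int rmem_swiotlb_device_init_model(struct reserved_mem *rmem,
					  struct device *dev)
{
	struct io_tlb_mem *mem = rmem->priv;

	if (dev->dma_io_tlb_mem)
		return -EBUSY;		/* device already has a pool */

	if (!mem) {
		mem = calloc(1, sizeof(*mem));
		if (!mem)
			return -ENOMEM;
		mem->start = rmem->base;
		mem->size = rmem->size;
		rmem->priv = mem;	/* publish for later devices */
	}

	dev->dma_io_tlb_mem = mem;
	return 0;
}
```

Two devices pointing at the same reserved-memory node end up sharing one
io_tlb_mem, while attaching the same device twice is rejected.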

2021-01-14 18:55:31

by Florian Fainelli

Subject: Re: [RFC PATCH v3 6/6] of: Add plumbing for restricted DMA pool

On 1/14/21 1:08 AM, Claire Chang wrote:
> On Wed, Jan 13, 2021 at 7:48 AM Florian Fainelli <[email protected]> wrote:
>>
>> On 1/5/21 7:41 PM, Claire Chang wrote:
>>> If a device is not behind an IOMMU, we look up the device node and set
>>> up the restricted DMA when the restricted-dma-pool is presented.
>>>
>>> Signed-off-by: Claire Chang <[email protected]>
>>> ---
>>
>> [snip]
>>
>>> +int of_dma_set_restricted_buffer(struct device *dev)
>>> +{
>>> + struct device_node *node;
>>> + int count, i;
>>> +
>>> + if (!dev->of_node)
>>> + return 0;
>>> +
>>> + count = of_property_count_elems_of_size(dev->of_node, "memory-region",
>>> + sizeof(phandle));
>>
>> You could have an early check for count < 0, along with an error
>> message, if that is deemed useful.
>>
>>> + for (i = 0; i < count; i++) {
>>> + node = of_parse_phandle(dev->of_node, "memory-region", i);
>>> + if (of_device_is_compatible(node, "restricted-dma-pool"))
>>
>> And you may want to add an of_device_is_available(node) check here. A
>> platform that provides the Device Tree firmware and tries to support
>> multiple different SoCs may try to determine if an IOMMU is present, and
>> if it is, it could mark the restricted-dma-pool region with a 'status =
>> "disabled"' property, or any variant of that scheme.
>
> This function is called only when there is no IOMMU present (check in
> drivers/of/device.c). I can still add of_device_is_available(node)
> here if you think it's helpful.

I believe it is, since the boot loader can have a shared Device Tree blob
skeleton and do various adaptations based on the chip (that's what we
do), and adding a status property is much simpler than inserting new
nodes at run time.

>
>>
>>> + return of_reserved_mem_device_init_by_idx(
>>> + dev, dev->of_node, i);
>>
>> This does not seem to support more than one memory region. Didn't you
>> want something like this instead:
>>
>> ret = of_reserved_mem_device_init_by_idx(...);
>> if (ret)
>> return ret;
>>
>
> Yes. This implementation only supports one restricted-dma-pool memory
> region, on the assumption that only one memory region in the dts has the
> compatible string restricted-dma-pool. IIUC, it's similar to
> shared-dma-pool.

Then if there is such a known limitation, it should be both documented and
enforced here. You should not be iterating over all of the phandles that
you find; stop at the first one and issue a warning if count > 1?
--
Florian
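
The review points raised in this exchange (check for a negative count, skip
disabled nodes, use a single pool and warn about extras) could be combined
roughly as follows. This is a userspace model under stated assumptions, not
the of_* API: region_node and find_restricted_pool() are hypothetical, the
array stands in for the "memory-region" phandle list, and returning -ENODEV
when nothing matches is a choice made for the sketch (the patch returns 0 in
that case).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical model of one "memory-region" phandle target. */
struct region_node {
	const char *compatible;
	bool available;		/* false models status = "disabled" */
};

/*
 * Return the index of the first usable restricted-dma-pool entry, -ENODEV
 * if there is none, or the negative count itself if the lookup failed.
 */
static int find_restricted_pool(const struct region_node *regions, int count)
{
	int found = -ENODEV;
	int i;

	if (count < 0) {
		fprintf(stderr, "memory-region lookup failed: %d\n", count);
		return count;
	}

	for (i = 0; i < count; i++) {
		if (strcmp(regions[i].compatible, "restricted-dma-pool") != 0)
			continue;	/* some other kind of region */
		if (!regions[i].available)
			continue;	/* disabled by firmware */
		if (found >= 0) {
			/* Enforce the one-pool limitation instead of hiding it. */
			fprintf(stderr,
				"ignoring extra restricted-dma-pool at index %d\n", i);
			continue;
		}
		found = i;
	}

	return found;
}
```

Other region types in the list are skipped rather than treated as errors, so
a device can still reference unrelated reserved-memory nodes.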

2021-01-15 05:47:34

by Claire Chang

Subject: Re: [RFC PATCH v3 6/6] of: Add plumbing for restricted DMA pool

On Fri, Jan 15, 2021 at 2:52 AM Florian Fainelli <[email protected]> wrote:
>
> On 1/14/21 1:08 AM, Claire Chang wrote:
> > On Wed, Jan 13, 2021 at 7:48 AM Florian Fainelli <[email protected]> wrote:
> >>
> >> On 1/5/21 7:41 PM, Claire Chang wrote:
> >>> If a device is not behind an IOMMU, we look up the device node and set
> >>> up the restricted DMA when the restricted-dma-pool is presented.
> >>>
> >>> Signed-off-by: Claire Chang <[email protected]>
> >>> ---
> >>
> >> [snip]
> >>
> >>> +int of_dma_set_restricted_buffer(struct device *dev)
> >>> +{
> >>> + struct device_node *node;
> >>> + int count, i;
> >>> +
> >>> + if (!dev->of_node)
> >>> + return 0;
> >>> +
> >>> + count = of_property_count_elems_of_size(dev->of_node, "memory-region",
> >>> + sizeof(phandle));
> >>
> >> You could have an early check for count < 0, along with an error
> >> message, if that is deemed useful.
> >>
> >>> + for (i = 0; i < count; i++) {
> >>> + node = of_parse_phandle(dev->of_node, "memory-region", i);
> >>> + if (of_device_is_compatible(node, "restricted-dma-pool"))
> >>
> >> And you may want to add an of_device_is_available(node) check here. A
> >> platform that provides the Device Tree firmware and tries to support
> >> multiple different SoCs may try to determine if an IOMMU is present, and
> >> if it is, it could mark the restricted-dma-pool region with a 'status =
> >> "disabled"' property, or any variant of that scheme.
> >
> > This function is called only when there is no IOMMU present (check in
> > drivers/of/device.c). I can still add of_device_is_available(node)
> > here if you think it's helpful.
>
> I believe it is, since the boot loader can have a shared Device Tree blob
> skeleton and do various adaptations based on the chip (that's what we
> do), and adding a status property is much simpler than inserting new
> nodes at run time.
>
> >
> >>
> >>> + return of_reserved_mem_device_init_by_idx(
> >>> + dev, dev->of_node, i);
> >>
> >> This does not seem to support more than one memory region. Didn't you
> >> want something like this instead:
> >>
> >> ret = of_reserved_mem_device_init_by_idx(...);
> >> if (ret)
> >> return ret;
> >>
> >
> > Yes. This implementation only supports one restricted-dma-pool memory
> > region, on the assumption that only one memory region in the dts has the
> > compatible string restricted-dma-pool. IIUC, it's similar to
> > shared-dma-pool.
>
> Then if there is such a known limitation, it should be both documented and
> enforced here. You should not be iterating over all of the phandles that
> you find; stop at the first one and issue a warning if count > 1?

What I have in mind is that there might be multiple memory regions, but
only one is for restricted-dma-pool.
Say, if you want a separate region for coherent DMA and only do
streaming DMA in this restricted-dma-pool region, you can add another
reserved-memory node with shared-dma-pool in the dts, and the current
implementation will try to allocate the memory via
dma_alloc_from_dev_coherent() first (see dma_alloc_attrs() in
kernel/dma/mapping.c).
Or if you have a vendor-specific memory region, you can still set up
restricted-dma-pool by adding another reserved-memory node in the dts.
Does this make sense to you? I'll document this for sure.

> --
> Florian

2021-01-20 16:58:57

by Rob Herring (Arm)

Subject: Re: [RFC PATCH v3 5/6] dt-bindings: of: Add restricted DMA pool

On Wed, Jan 06, 2021 at 11:41:23AM +0800, Claire Chang wrote:
> Introduce the new compatible string, restricted-dma-pool, for restricted
> DMA. One can specify the address and length of the restricted DMA memory
> region by restricted-dma-pool in the device tree.

If this goes into DT, I think we should be able to use dma-ranges for
this purpose instead. Normally, 'dma-ranges' is for physical bus
restrictions, but there's no reason it can't be used for policy or to
express restrictions the firmware has enabled.

> Signed-off-by: Claire Chang <[email protected]>
> ---
> .../reserved-memory/reserved-memory.txt | 24 +++++++++++++++++++
> 1 file changed, 24 insertions(+)
>
> diff --git a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt b/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
> index e8d3096d922c..44975e2a1fd2 100644
> --- a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
> +++ b/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
> @@ -51,6 +51,20 @@ compatible (optional) - standard definition
> used as a shared pool of DMA buffers for a set of devices. It can
> be used by an operating system to instantiate the necessary pool
> management subsystem if necessary.
> + - restricted-dma-pool: This indicates a region of memory meant to be
> + used as a pool of restricted DMA buffers for a set of devices. The
> + memory region would be the only region accessible to those devices.
> + When using this, the no-map and reusable properties must not be set,
> + so the operating system can create a virtual mapping that will be used
> + for synchronization. The main purpose for restricted DMA is to
> + mitigate the lack of DMA access control on systems without an IOMMU,
> + which could result in the DMA accessing the system memory at
> + unexpected times and/or unexpected addresses, possibly leading to data
> + leakage or corruption. The feature on its own provides a basic level
> + of protection against the DMA overwriting buffer contents at
> + unexpected times. However, to protect against general data leakage and
> + system memory corruption, the system needs to provide a way to restrict
> + the DMA to a predefined memory region.
> - vendor specific string in the form <vendor>,[<device>-]<usage>
> no-map (optional) - empty property
> - Indicates the operating system must not create a virtual mapping
> @@ -120,6 +134,11 @@ one for multimedia processing (named multimedia-memory@77000000, 64MiB).
> compatible = "acme,multimedia-memory";
> reg = <0x77000000 0x4000000>;
> };
> +
> + restricted_dma_mem_reserved: restricted_dma_mem_reserved {
> + compatible = "restricted-dma-pool";
> + reg = <0x50000000 0x400000>;
> + };
> };
>
> /* ... */
> @@ -138,4 +157,9 @@ one for multimedia processing (named multimedia-memory@77000000, 64MiB).
> memory-region = <&multimedia_reserved>;
> /* ... */
> };
> +
> + pcie_device: pcie_device@0,0 {
> + memory-region = <&restricted_dma_mem_reserved>;

PCI hosts often have inbound window configurations that limit the
address range and translate PCI to bus addresses. Those windows happen
to be configured by dma-ranges. In any case, wouldn't you want to put
the configuration in the PCI host node? Is there a usecase of
restricting one PCIe device and not another?

Rob

2021-01-20 19:59:29

by Robin Murphy

Subject: Re: [RFC PATCH v3 5/6] dt-bindings: of: Add restricted DMA pool

On 2021-01-20 16:53, Rob Herring wrote:
> On Wed, Jan 06, 2021 at 11:41:23AM +0800, Claire Chang wrote:
>> Introduce the new compatible string, restricted-dma-pool, for restricted
>> DMA. One can specify the address and length of the restricted DMA memory
>> region by restricted-dma-pool in the device tree.
>
> If this goes into DT, I think we should be able to use dma-ranges for
> this purpose instead. Normally, 'dma-ranges' is for physical bus
> restrictions, but there's no reason it can't be used for policy or to
> express restrictions the firmware has enabled.

There would still need to be some way to tell SWIOTLB to pick up the
corresponding chunk of memory and to prevent the kernel from using it
for anything else, though.

>> Signed-off-by: Claire Chang <[email protected]>
>> ---
>> .../reserved-memory/reserved-memory.txt | 24 +++++++++++++++++++
>> 1 file changed, 24 insertions(+)
>>
>> diff --git a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt b/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
>> index e8d3096d922c..44975e2a1fd2 100644
>> --- a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
>> +++ b/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
>> @@ -51,6 +51,20 @@ compatible (optional) - standard definition
>> used as a shared pool of DMA buffers for a set of devices. It can
>> be used by an operating system to instantiate the necessary pool
>> management subsystem if necessary.
>> + - restricted-dma-pool: This indicates a region of memory meant to be
>> + used as a pool of restricted DMA buffers for a set of devices. The
>> + memory region would be the only region accessible to those devices.
>> + When using this, the no-map and reusable properties must not be set,
>> + so the operating system can create a virtual mapping that will be used
>> + for synchronization. The main purpose for restricted DMA is to
>> + mitigate the lack of DMA access control on systems without an IOMMU,
>> + which could result in the DMA accessing the system memory at
>> + unexpected times and/or unexpected addresses, possibly leading to data
>> + leakage or corruption. The feature on its own provides a basic level
>> + of protection against the DMA overwriting buffer contents at
>> + unexpected times. However, to protect against general data leakage and
>> + system memory corruption, the system needs to provide a way to restrict
>> + the DMA to a predefined memory region.
>> - vendor specific string in the form <vendor>,[<device>-]<usage>
>> no-map (optional) - empty property
>> - Indicates the operating system must not create a virtual mapping
>> @@ -120,6 +134,11 @@ one for multimedia processing (named multimedia-memory@77000000, 64MiB).
>> compatible = "acme,multimedia-memory";
>> reg = <0x77000000 0x4000000>;
>> };
>> +
>> + restricted_dma_mem_reserved: restricted_dma_mem_reserved {
>> + compatible = "restricted-dma-pool";
>> + reg = <0x50000000 0x400000>;
>> + };
>> };
>>
>> /* ... */
>> @@ -138,4 +157,9 @@ one for multimedia processing (named multimedia-memory@77000000, 64MiB).
>> memory-region = <&multimedia_reserved>;
>> /* ... */
>> };
>> +
>> + pcie_device: pcie_device@0,0 {
>> + memory-region = <&restricted_dma_mem_reserved>;
>
> PCI hosts often have inbound window configurations that limit the
> address range and translate PCI to bus addresses. Those windows happen
> to be configured by dma-ranges. In any case, wouldn't you want to put
> the configuration in the PCI host node? Is there a usecase of
> restricting one PCIe device and not another?

The general design seems to accommodate devices having their own pools
such that they can't even snoop on each others' transient DMA data. If
the interconnect had a way of wiring up, say, PCI RIDs to AMBA NSAIDs,
then in principle you could certainly apply that to PCI endpoints too
(presumably you'd also disallow them from peer-to-peer transactions at
the PCI level).

Robin.

2021-01-21 03:21:13

by Robin Murphy

Subject: Re: [RFC PATCH v3 5/6] dt-bindings: of: Add restricted DMA pool

On 2021-01-20 21:31, Rob Herring wrote:
> On Wed, Jan 20, 2021 at 11:30 AM Robin Murphy <[email protected]> wrote:
>>
>> On 2021-01-20 16:53, Rob Herring wrote:
>>> On Wed, Jan 06, 2021 at 11:41:23AM +0800, Claire Chang wrote:
>>>> Introduce the new compatible string, restricted-dma-pool, for restricted
>>>> DMA. One can specify the address and length of the restricted DMA memory
>>>> region by restricted-dma-pool in the device tree.
>>>
>>> If this goes into DT, I think we should be able to use dma-ranges for
>>> this purpose instead. Normally, 'dma-ranges' is for physical bus
>>> restrictions, but there's no reason it can't be used for policy or to
>>> express restrictions the firmware has enabled.
>>
>> There would still need to be some way to tell SWIOTLB to pick up the
>> corresponding chunk of memory and to prevent the kernel from using it
>> for anything else, though.
>
> Don't we already have that problem if dma-ranges had a very small
> range? We just get lucky because the restriction is generally much
> more RAM than needed.

Not really - if a device has a naturally tiny addressing capability that
doesn't even cover ZONE_DMA32 where the regular SWIOTLB buffer will be
allocated then it's unlikely to work well, but that's just crap system
design. Yes, memory pressure in ZONE_DMA{32} is particularly problematic
for such limited devices, but it's irrelevant to the issue at hand here.

What we have here is a device that's not allowed to see *kernel* memory
at all. It's been artificially constrained to a particular region by a
TZASC or similar, and the only data which should ever be placed in that
region is data intended for that device to see. That way if it tries to
go rogue it physically can't start slurping data intended for other
devices or not mapped for DMA at all. The bouncing is an important part
of this - I forget the title off-hand but there was an interesting paper
a few years ago which demonstrated that even with an IOMMU, streaming
DMA of in-place buffers could reveal enough adjacent data from the same
page to mount an attack on the system. Memory pressure should be
immaterial since the size of each bounce pool carveout will presumably
be tuned for the needs of the given device.

> In any case, wouldn't finding all the dma-ranges do this? We're
> already walking the tree to find the max DMA address now.

If all you can see are two "dma-ranges" properties, how do you propose
to tell that one means "this is the extent of what I can address, please
set my masks and dma-range-map accordingly and try to allocate things
where I can reach them" while the other means "take this output range
away from the page allocator and hook it up as my dedicated bounce pool,
because it is Serious Security Time"? Especially since getting that
choice wrong either way would be a Bad Thing.

Robin.
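
For readers less familiar with bouncing, the following userspace sketch
models why it matters here: the device is only ever handed a slot inside the
dedicated pool, and data is copied in or out around each transfer. It is an
illustration under simplifying assumptions, not the swiotlb implementation;
bounce_pool, bounce_map(), bounce_unmap(), SLOT_SIZE and NR_SLOTS are all
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SLOT_SIZE 2048
#define NR_SLOTS 4

/* Hypothetical model of a per-device bounce pool carved out of memory. */
struct bounce_pool {
	unsigned char mem[NR_SLOTS][SLOT_SIZE];	/* the only memory the device may touch */
	void *orig[NR_SLOTS];			/* kernel buffer behind each slot, or NULL */
	size_t len[NR_SLOTS];
};

enum dir { TO_DEVICE, FROM_DEVICE };

/*
 * Claim a free slot for buf and return its index, or -1 on failure.
 * The device is given the slot, never the original buffer.
 */
static int bounce_map(struct bounce_pool *pool, void *buf, size_t len,
		      enum dir dir)
{
	int i;

	if (len > SLOT_SIZE)
		return -1;

	for (i = 0; i < NR_SLOTS; i++) {
		if (pool->orig[i])
			continue;	/* slot in use */
		pool->orig[i] = buf;
		pool->len[i] = len;
		if (dir == TO_DEVICE)
			memcpy(pool->mem[i], buf, len);
		else
			memset(pool->mem[i], 0, len);	/* no stale slot contents */
		return i;
	}
	return -1;	/* pool exhausted */
}

/* Copy device-written data back (if any) and release the slot. */
static void bounce_unmap(struct bounce_pool *pool, int slot, enum dir dir)
{
	if (dir == FROM_DEVICE)
		memcpy(pool->orig[slot], pool->mem[slot], pool->len[slot]);
	pool->orig[slot] = NULL;
}
```

Because only the mapped bytes are copied into the slot, neighbouring data
that happens to share a page with the original buffer never becomes visible
to the device.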


2021-01-21 03:32:25

by Rob Herring (Arm)

Subject: Re: [RFC PATCH v3 5/6] dt-bindings: of: Add restricted DMA pool

On Wed, Jan 20, 2021 at 11:30 AM Robin Murphy <[email protected]> wrote:
>
> On 2021-01-20 16:53, Rob Herring wrote:
> > On Wed, Jan 06, 2021 at 11:41:23AM +0800, Claire Chang wrote:
> >> Introduce the new compatible string, restricted-dma-pool, for restricted
> >> DMA. One can specify the address and length of the restricted DMA memory
> >> region by restricted-dma-pool in the device tree.
> >
> > If this goes into DT, I think we should be able to use dma-ranges for
> > this purpose instead. Normally, 'dma-ranges' is for physical bus
> > restrictions, but there's no reason it can't be used for policy or to
> > express restrictions the firmware has enabled.
>
> There would still need to be some way to tell SWIOTLB to pick up the
> corresponding chunk of memory and to prevent the kernel from using it
> for anything else, though.

Don't we already have that problem if dma-ranges had a very small
range? We just get lucky because the restriction is generally much
more RAM than needed.

In any case, wouldn't finding all the dma-ranges do this? We're
already walking the tree to find the max DMA address now.

> >> Signed-off-by: Claire Chang <[email protected]>
> >> ---
> >> .../reserved-memory/reserved-memory.txt | 24 +++++++++++++++++++
> >> 1 file changed, 24 insertions(+)
> >>
> >> diff --git a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt b/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
> >> index e8d3096d922c..44975e2a1fd2 100644
> >> --- a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
> >> +++ b/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
> >> @@ -51,6 +51,20 @@ compatible (optional) - standard definition
> >> used as a shared pool of DMA buffers for a set of devices. It can
> >> be used by an operating system to instantiate the necessary pool
> >> management subsystem if necessary.
> >> + - restricted-dma-pool: This indicates a region of memory meant to be
> >> + used as a pool of restricted DMA buffers for a set of devices. The
> >> + memory region would be the only region accessible to those devices.
> >> + When using this, the no-map and reusable properties must not be set,
> >> + so the operating system can create a virtual mapping that will be used
> >> + for synchronization. The main purpose for restricted DMA is to
> >> + mitigate the lack of DMA access control on systems without an IOMMU,
> >> + which could result in the DMA accessing the system memory at
> >> + unexpected times and/or unexpected addresses, possibly leading to data
> >> + leakage or corruption. The feature on its own provides a basic level
> >> + of protection against the DMA overwriting buffer contents at
> >> + unexpected times. However, to protect against general data leakage and
> >> + system memory corruption, the system needs to provide a way to restrict
> >> + the DMA to a predefined memory region.
> >> - vendor specific string in the form <vendor>,[<device>-]<usage>
> >> no-map (optional) - empty property
> >> - Indicates the operating system must not create a virtual mapping
> >> @@ -120,6 +134,11 @@ one for multimedia processing (named multimedia-memory@77000000, 64MiB).
> >> compatible = "acme,multimedia-memory";
> >> reg = <0x77000000 0x4000000>;
> >> };
> >> +
> >> + restricted_dma_mem_reserved: restricted_dma_mem_reserved {
> >> + compatible = "restricted-dma-pool";
> >> + reg = <0x50000000 0x400000>;
> >> + };
> >> };
> >>
> >> /* ... */
> >> @@ -138,4 +157,9 @@ one for multimedia processing (named multimedia-memory@77000000, 64MiB).
> >> memory-region = <&multimedia_reserved>;
> >> /* ... */
> >> };
> >> +
> >> + pcie_device: pcie_device@0,0 {
> >> + memory-region = <&restricted_dma_mem_reserved>;
> >
> > PCI hosts often have inbound window configurations that limit the
> > address range and translate PCI to bus addresses. Those windows happen
> > to be configured by dma-ranges. In any case, wouldn't you want to put
> > the configuration in the PCI host node? Is there a usecase of
> > restricting one PCIe device and not another?
>
> The general design seems to accommodate devices having their own pools
> such that they can't even snoop on each others' transient DMA data. If
> the interconnect had a way of wiring up, say, PCI RIDs to AMBA NSAIDs,
> then in principle you could certainly apply that to PCI endpoints too
> (presumably you'd also disallow them from peer-to-peer transactions at
> the PCI level too).

At least for PCI, I think we can handle this. We have the BDF in the
3rd address cell in dma-ranges. The Open Firmware spec says those are 0
in the case of ranges. It doesn't talk about dma-ranges though. But I
think we could extend it to allow for BDF. Though typically with PCIe
every device is behind its own bridge and each bridge node can have its
own dma-ranges.

Rob

2021-01-21 15:56:03

by Rob Herring (Arm)

Subject: Re: [RFC PATCH v3 5/6] dt-bindings: of: Add restricted DMA pool

On Wed, Jan 20, 2021 at 7:10 PM Robin Murphy <[email protected]> wrote:
>
> On 2021-01-20 21:31, Rob Herring wrote:
> > On Wed, Jan 20, 2021 at 11:30 AM Robin Murphy <[email protected]> wrote:
> >>
> >> On 2021-01-20 16:53, Rob Herring wrote:
> >>> On Wed, Jan 06, 2021 at 11:41:23AM +0800, Claire Chang wrote:
> >>>> Introduce the new compatible string, restricted-dma-pool, for restricted
> >>>> DMA. One can specify the address and length of the restricted DMA memory
> >>>> region by restricted-dma-pool in the device tree.
> >>>
> >>> If this goes into DT, I think we should be able to use dma-ranges for
> >>> this purpose instead. Normally, 'dma-ranges' is for physical bus
> >>> restrictions, but there's no reason it can't be used for policy or to
> >>> express restrictions the firmware has enabled.
> >>
> >> There would still need to be some way to tell SWIOTLB to pick up the
> >> corresponding chunk of memory and to prevent the kernel from using it
> >> for anything else, though.
> >
> > Don't we already have that problem if dma-ranges had a very small
> > range? We just get lucky because the restriction is generally much
> > more RAM than needed.
>
> Not really - if a device has a naturally tiny addressing capability that
> doesn't even cover ZONE_DMA32 where the regular SWIOTLB buffer will be
> allocated then it's unlikely to work well, but that's just crap system
> design. Yes, memory pressure in ZONE_DMA{32} is particularly problematic
> for such limited devices, but it's irrelevant to the issue at hand here.

Yesterday's crap system design is today's security feature. Couldn't
this feature make crap system design work better?

> What we have here is a device that's not allowed to see *kernel* memory
> at all. It's been artificially constrained to a particular region by a
> TZASC or similar, and the only data which should ever be placed in that

May have been constrained, but that's entirely optional.

In the optional case where the setup is entirely up to the OS, I don't
think this belongs in the DT at all. Perhaps that should be solved
first.

> region is data intended for that device to see. That way if it tries to
> go rogue it physically can't start slurping data intended for other
> devices or not mapped for DMA at all. The bouncing is an important part
> of this - I forget the title off-hand but there was an interesting paper
> a few years ago which demonstrated that even with an IOMMU, streaming
> DMA of in-place buffers could reveal enough adjacent data from the same
> page to mount an attack on the system. Memory pressure should be
> immaterial since the size of each bounce pool carveout will presumably
> be tuned for the needs of the given device.
>
> > In any case, wouldn't finding all the dma-ranges do this? We're
> > already walking the tree to find the max DMA address now.
>
> If all you can see are two "dma-ranges" properties, how do you propose
> to tell that one means "this is the extent of what I can address, please
> set my masks and dma-range-map accordingly and try to allocate things
> where I can reach them" while the other means "take this output range
> away from the page allocator and hook it up as my dedicated bounce pool,
> because it is Serious Security Time"? Especially since getting that
> choice wrong either way would be a Bad Thing.

Either we have some heuristic based on the size or we add some hint.
The point is let's build on what we already have for defining DMA
accessible memory in DT rather than some parallel mechanism.

Rob

2021-01-21 17:35:46

by Robin Murphy

Subject: Re: [RFC PATCH v3 5/6] dt-bindings: of: Add restricted DMA pool

On 2021-01-21 15:48, Rob Herring wrote:
> On Wed, Jan 20, 2021 at 7:10 PM Robin Murphy wrote:
>>
>> On 2021-01-20 21:31, Rob Herring wrote:
>>> On Wed, Jan 20, 2021 at 11:30 AM Robin Murphy wrote:
>>>>
>>>> On 2021-01-20 16:53, Rob Herring wrote:
>>>>> On Wed, Jan 06, 2021 at 11:41:23AM +0800, Claire Chang
>>>>> wrote:
>>>>>> Introduce the new compatible string, restricted-dma-pool,
>>>>>> for restricted DMA. One can specify the address and length
>>>>>> of the restricted DMA memory region by restricted-dma-pool
>>>>>> in the device tree.
>>>>>
>>>>> If this goes into DT, I think we should be able to use
>>>>> dma-ranges for this purpose instead. Normally, 'dma-ranges'
>>>>> is for physical bus restrictions, but there's no reason it
>>>>> can't be used for policy or to express restrictions the
>>>>> firmware has enabled.
>>>>
>>>> There would still need to be some way to tell SWIOTLB to pick
>>>> up the corresponding chunk of memory and to prevent the kernel
>>>> from using it for anything else, though.
>>>
>>> Don't we already have that problem if dma-ranges had a very
>>> small range? We just get lucky because the restriction is
>>> generally much more RAM than needed.
>>
>> Not really - if a device has a naturally tiny addressing capability
>> that doesn't even cover ZONE_DMA32 where the regular SWIOTLB buffer
>> will be allocated then it's unlikely to work well, but that's just
>> crap system design. Yes, memory pressure in ZONE_DMA{32} is
>> particularly problematic for such limited devices, but it's
>> irrelevant to the issue at hand here.
>
> Yesterday's crap system design is today's security feature. Couldn't
> this feature make crap system design work better?

Indeed! Say you bring out your shiny new "Strawberry Flan 4" machine
with all the latest connectivity, but tragically its PCIe can only
address 25% of the RAM. So you decide to support deploying it in two
configurations: one where it runs normally for best performance, and
another "secure" one where it dedicates that quarter of RAM as a
restricted DMA pool for any PCIe devices - that way, even if that hotel
projector you plug in turns out to be a rogue Thunderbolt endpoint, it
can never snarf your private keys off your eMMC out of the page cache.

(Yes, it is the thinnest of strawmen, but it sets the scene for the
point you raised...)

...which is that in both cases the dma-ranges will still be identical.
So how is the kernel going to know whether to steal that whole area from
memblock before anything else can allocate from it, or not?
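
To make that ambiguity concrete, here is a toy model in C. It is a sketch only: the structures, the `pool_marker` field and the policy names are all hypothetical and stand in for "whatever extra information the firmware description provides"; none of them are kernel code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model, not kernel code; every name here is hypothetical. */
struct dma_range {
	uint64_t base;
	uint64_t size;
};

struct dev_desc {
	struct dma_range range;	/* all that "dma-ranges" alone can express */
	bool pool_marker;	/* separate statement of the memory's purpose */
};

enum dma_policy {
	DMA_POLICY_NORMAL,	/* set masks, allocate within reach */
	DMA_POLICY_BOUNCE_POOL,	/* reserve the range and bounce through it */
};

/* True if two descriptions carry identical addressing information. */
static bool same_range(const struct dev_desc *a, const struct dev_desc *b)
{
	return a->range.base == b->range.base &&
	       a->range.size == b->range.size;
}

/* The range never enters the decision; only the marker does. */
static enum dma_policy pick_policy(const struct dev_desc *d)
{
	return d->pool_marker ? DMA_POLICY_BOUNCE_POOL : DMA_POLICY_NORMAL;
}
```

In this model the "fast" and "secure" configurations have equal ranges, so any function of the range alone has to return the same policy for both. Only the separate marker tells them apart.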

I don't disagree that even in Claire's original intended case it would
be semantically correct to describe the hardware-firewalled region with
dma-ranges. It just turns out not to be necessary, and you're already
arguing for not adding anything in DT that doesn't need to be.

>> What we have here is a device that's not allowed to see *kernel*
>> memory at all. It's been artificially constrained to a particular
>> region by a TZASC or similar, and the only data which should ever
>> be placed in that
>
> May have been constrained, but that's entirely optional.
>
> In the optional case where the setup is entirely up to the OS, I
> don't think this belongs in the DT at all. Perhaps that should be
> solved first.

Yes! Let's definitely consider that case! Say you don't have any
security or physical limitations but want to use a bounce pool for some
device anyway because reasons (perhaps copying streaming DMA data to a
better guaranteed alignment gives an overall performance win). Now the
*only* relevant thing to communicate to the kernel is to, ahem, reserve
a large chunk of memory, and use it for this special purpose. Isn't that
literally what reserved-memory bindings are for?
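
As a rough illustration of what "use it for this special purpose" means, here is a userspace toy of bouncing through a dedicated pool. It is a sketch under loud assumptions: the pool is a static array rather than reserved memory, the slot allocator is a linear scan, data is copied in both directions regardless of transfer direction, and none of the names correspond to the swiotlb API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy bounce pool; sizes and names are hypothetical, not swiotlb's. */
#define POOL_SIZE 4096
#define SLOT_SIZE 256
#define NUM_SLOTS (POOL_SIZE / SLOT_SIZE)

static uint8_t pool[POOL_SIZE];		/* stands in for the reserved region */
static void *slot_owner[NUM_SLOTS];	/* original buffer, NULL if slot free */

/* True if p points inside the pool. */
static int in_pool(const void *p)
{
	uintptr_t a = (uintptr_t)p, lo = (uintptr_t)pool;

	return a >= lo && a < lo + POOL_SIZE;
}

/* Copy buf into a free slot; the "device" is only ever given the slot. */
static void *bounce_map(void *buf, size_t len)
{
	if (!buf || len == 0 || len > SLOT_SIZE)
		return NULL;
	for (size_t i = 0; i < NUM_SLOTS; i++) {
		if (!slot_owner[i]) {
			slot_owner[i] = buf;
			memcpy(&pool[i * SLOT_SIZE], buf, len);
			return &pool[i * SLOT_SIZE];
		}
	}
	return NULL;	/* pool exhausted */
}

/* Copy the slot back to the original buffer, scrub it and free it. */
static void bounce_unmap(void *slot, size_t len)
{
	assert(in_pool(slot) && len <= SLOT_SIZE);
	size_t i = (size_t)((uint8_t *)slot - pool) / SLOT_SIZE;

	assert(slot_owner[i]);
	memcpy(slot_owner[i], slot, len);
	memset(&pool[i * SLOT_SIZE], 0, SLOT_SIZE);
	slot_owner[i] = NULL;
}
```

The property that matters is that the address handed out by `bounce_map()` always lies inside the pool, so a device confined to that region never sees the caller's original buffer or anything adjacent to it.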

>> region is data intended for that device to see. That way if it
>> tries to go rogue it physically can't start slurping data intended
>> for other devices or not mapped for DMA at all. The bouncing is an
>> important part of this - I forget the title off-hand but there was
>> an interesting paper a few years ago which demonstrated that even
>> with an IOMMU, streaming DMA of in-place buffers could reveal
>> enough adjacent data from the same page to mount an attack on the
>> system. Memory pressure should be immaterial since the size of each
>> bounce pool carveout will presumably be tuned for the needs of the
>> given device.
>>
>>> In any case, wouldn't finding all the dma-ranges do this? We're
>>> already walking the tree to find the max DMA address now.
>>
>> If all you can see are two "dma-ranges" properties, how do you
>> propose to tell that one means "this is the extent of what I can
>> address, please set my masks and dma-range-map accordingly and try
>> to allocate things where I can reach them" while the other means
>> "take this output range away from the page allocator and hook it up
>> as my dedicated bounce pool, because it is Serious Security Time"?
>> Especially since getting that choice wrong either way would be a
>> Bad Thing.
>
> Either we have some heuristic based on the size or we add some hint.
> The point is let's build on what we already have for defining DMA
> accessible memory in DT rather than some parallel mechanism.

The point I'm trying to bang home is that it's really not about the DMA
accessibility, it's about the purpose of the memory itself. Even when
DMA accessibility *is* relevant it's already implied by that purpose,
from the point of view of the implementation. The only difference it
might make is to the end user if they want to ascertain whether the
presence of such a pool represents protection against an untrusted
device or just some DMA optimisation tweak.

Robin.

2021-01-25 05:29:47

by Jon Masters

Subject: Re: [RFC PATCH v3 2/6] swiotlb: Add restricted DMA pool

On 1/7/21 1:09 PM, Florian Fainelli wrote:
> On 1/7/21 9:57 AM, Konrad Rzeszutek Wilk wrote:
>> On Fri, Jan 08, 2021 at 01:39:18AM +0800, Claire Chang wrote:
>>> Hi Greg and Konrad,
>>>
>>> This change is intended to be non-arch specific. Any arch that lacks DMA access
>>> control and has devices not behind an IOMMU can make use of it. Could you share
>>> why you think this should be arch specific?
>>
>> The idea behind non-arch specific code is for it to be generic. The devicetree
>> is specific to PowerPC, Sparc, and ARM, and not to x86 - hence it should
>> be in arch specific code.
>
> In principle the same code could be used with an ACPI enabled system with
> an appropriate service to identify the restricted DMA regions and unlock
> them.
>
> More than 1 architecture requiring this function (ARM and ARM64 are the
> two I can think of needing this immediately) sort of calls for making
> the code architecture agnostic since past 2, you need something that scales.
>
> There is already code today under kernel/dma/contiguous.c that is only
> activated on a CONFIG_OF=y && CONFIG_OF_RESERVED_MEM=y system, this is
> no different.

<unrelated to these patches, which are useful for the case cited>

Just a note for history/archives that this approach would not be
appropriate on general purpose Arm systems, such as SystemReady-ES
edge/non-server platforms seeking to run general purpose distros. I want
to have that in the record before someone at Arm (or NVidia, or a bunch
of others that come to mind who have memory firewalls) gets an idea.

If you're working at an Arm vendor and come looking at this later
thinking "wow, what a great idea!", please fix your hardware to have a
real IOMMU/SMMU and real PCIe. You'll be pointed at this reply.

Jon.

--
Computer Architect