This series implements mitigations for lack of DMA access control on
systems without an IOMMU, which could result in the DMA accessing the
system memory at unexpected times and/or unexpected addresses, possibly
leading to data leakage or corruption.
For example, we plan to use the PCI-e bus for Wi-Fi and that PCI-e bus is
not behind an IOMMU. As PCI-e, by design, gives the device full access to
system memory, a vulnerability in the Wi-Fi firmware could easily escalate
to a full system exploit (remote wifi exploits: [1a], [1b] that shows a
full chain of exploits; [2], [3]).
To mitigate the security concerns, we introduce restricted DMA. Restricted
DMA utilizes the existing swiotlb to bounce streaming DMA in and out of a
specially allocated region and does memory allocation from the same region.
The feature on its own provides a basic level of protection against the DMA
overwriting buffer contents at unexpected times. However, to protect
against general data leakage and system memory corruption, the system needs
to provide a way to restrict the DMA to a predefined memory region (this is
usually done at firmware level, e.g. MPU in ATF on some ARM platforms [4]).
[1a] https://googleprojectzero.blogspot.com/2017/04/over-air-exploiting-broadcoms-wi-fi_4.html
[1b] https://googleprojectzero.blogspot.com/2017/04/over-air-exploiting-broadcoms-wi-fi_11.html
[2] https://blade.tencent.com/en/advisories/qualpwn/
[3] https://www.bleepingcomputer.com/news/security/vulnerabilities-found-in-highly-popular-firmware-for-wifi-chips/
[4] https://github.com/ARM-software/arm-trusted-firmware/blob/master/plat/mediatek/mt8183/drivers/emi_mpu/emi_mpu.c#L132
v15:
- Apply Will's diff (https://lore.kernel.org/patchwork/patch/1448957/#1647521)
to fix the crash reported by Qian.
- Add Stefano's Acked-by tag for patch 01/12 from v14
v14:
- Move set_memory_decrypted before swiotlb_init_io_tlb_mem (patch 01/12, 10,12)
- Add Stefano's Acked-by tag from v13
https://lore.kernel.org/patchwork/cover/1448954/
v13:
- Fix xen-swiotlb issues
- memset in patch 01/12
- is_swiotlb_force_bounce in patch 06/12
- Fix the dts example typo in reserved-memory.txt
- Add Stefano and Will's Tested-by tag from v12
https://lore.kernel.org/patchwork/cover/1448001/
v12:
Split is_dev_swiotlb_force into is_swiotlb_force_bounce (patch 06/12) and
is_swiotlb_for_alloc (patch 09/12)
https://lore.kernel.org/patchwork/cover/1447254/
v11:
- Rebase against swiotlb devel/for-linus-5.14
- s/mempry/memory/g
- exchange the order of patch 09/12 and 10/12
https://lore.kernel.org/patchwork/cover/1447216/
v10:
Address the comments in v9 to
- fix the dev->dma_io_tlb_mem assignment
- propagate swiotlb_force setting into io_tlb_default_mem->force
- move set_memory_decrypted out of swiotlb_init_io_tlb_mem
- move debugfs_dir declaration into the main CONFIG_DEBUG_FS block
- add swiotlb_ prefix to find_slots and release_slots
- merge the 3 alloc/free related patches
- move the CONFIG_DMA_RESTRICTED_POOL later
https://lore.kernel.org/patchwork/cover/1446882/
v9:
Address the comments in v7 to
- set swiotlb active pool to dev->dma_io_tlb_mem
- get rid of get_io_tlb_mem
- dig out the device struct for is_swiotlb_active
- move debugfs_create_dir out of swiotlb_create_debugfs
- do set_memory_decrypted conditionally in swiotlb_init_io_tlb_mem
- use IS_ENABLED in kernel/dma/direct.c
- fix redefinition of 'of_dma_set_restricted_buffer'
https://lore.kernel.org/patchwork/cover/1445081/
v8:
- Fix reserved-memory.txt and add the reg property in example.
- Fix sizeof for of_property_count_elems_of_size in
drivers/of/address.c#of_dma_set_restricted_buffer.
- Apply Will's suggestion to try the OF node having DMA configuration in
drivers/of/address.c#of_dma_set_restricted_buffer.
- Fix typo in the comment of drivers/of/address.c#of_dma_set_restricted_buffer.
- Add error message for PageHighMem in
kernel/dma/swiotlb.c#rmem_swiotlb_device_init and move it to
rmem_swiotlb_setup.
- Fix the message string in rmem_swiotlb_setup.
https://lore.kernel.org/patchwork/cover/1437112/
v7:
Fix debugfs, PageHighMem and comment style in rmem_swiotlb_device_init
https://lore.kernel.org/patchwork/cover/1431031/
v6:
Address the comments in v5
https://lore.kernel.org/patchwork/cover/1423201/
v5:
Rebase on latest linux-next
https://lore.kernel.org/patchwork/cover/1416899/
v4:
- Fix spinlock bad magic
- Use rmem->name for debugfs entry
- Address the comments in v3
https://lore.kernel.org/patchwork/cover/1378113/
v3:
Using only one reserved memory region for both streaming DMA and memory
allocation.
https://lore.kernel.org/patchwork/cover/1360992/
v2:
Building on top of swiotlb.
https://lore.kernel.org/patchwork/cover/1280705/
v1:
Using dma_map_ops.
https://lore.kernel.org/patchwork/cover/1271660/
Claire Chang (12):
swiotlb: Refactor swiotlb init functions
swiotlb: Refactor swiotlb_create_debugfs
swiotlb: Set dev->dma_io_tlb_mem to the swiotlb pool used
swiotlb: Update is_swiotlb_buffer to add a struct device argument
swiotlb: Update is_swiotlb_active to add a struct device argument
swiotlb: Use is_swiotlb_force_bounce for swiotlb data bouncing
swiotlb: Move alloc_size to swiotlb_find_slots
swiotlb: Refactor swiotlb_tbl_unmap_single
swiotlb: Add restricted DMA alloc/free support
swiotlb: Add restricted DMA pool initialization
dt-bindings: of: Add restricted DMA pool
of: Add plumbing for restricted DMA pool
.../reserved-memory/reserved-memory.txt | 36 ++-
drivers/base/core.c | 4 +
drivers/gpu/drm/i915/gem/i915_gem_internal.c | 2 +-
drivers/gpu/drm/nouveau/nouveau_ttm.c | 2 +-
drivers/iommu/dma-iommu.c | 12 +-
drivers/of/address.c | 33 +++
drivers/of/device.c | 3 +
drivers/of/of_private.h | 6 +
drivers/pci/xen-pcifront.c | 2 +-
drivers/xen/swiotlb-xen.c | 4 +-
include/linux/device.h | 4 +
include/linux/swiotlb.h | 53 +++-
kernel/dma/Kconfig | 14 +
kernel/dma/direct.c | 59 ++--
kernel/dma/direct.h | 8 +-
kernel/dma/swiotlb.c | 251 +++++++++++++-----
16 files changed, 390 insertions(+), 103 deletions(-)
--
2.32.0.288.g62a8d224e6-goog
Add a new function, swiotlb_init_io_tlb_mem, for the io_tlb_mem struct
initialization to make the code reusable.
Signed-off-by: Claire Chang <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Tested-by: Stefano Stabellini <[email protected]>
Tested-by: Will Deacon <[email protected]>
Acked-by: Stefano Stabellini <[email protected]>
---
kernel/dma/swiotlb.c | 50 ++++++++++++++++++++++----------------------
1 file changed, 25 insertions(+), 25 deletions(-)
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 52e2ac526757..1f9b2b9e7490 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -168,9 +168,28 @@ void __init swiotlb_update_mem_attributes(void)
memset(vaddr, 0, bytes);
}
-int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
+static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start,
+ unsigned long nslabs, bool late_alloc)
{
+ void *vaddr = phys_to_virt(start);
unsigned long bytes = nslabs << IO_TLB_SHIFT, i;
+
+ mem->nslabs = nslabs;
+ mem->start = start;
+ mem->end = mem->start + bytes;
+ mem->index = 0;
+ mem->late_alloc = late_alloc;
+ spin_lock_init(&mem->lock);
+ for (i = 0; i < mem->nslabs; i++) {
+ mem->slots[i].list = IO_TLB_SEGSIZE - io_tlb_offset(i);
+ mem->slots[i].orig_addr = INVALID_PHYS_ADDR;
+ mem->slots[i].alloc_size = 0;
+ }
+ memset(vaddr, 0, bytes);
+}
+
+int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
+{
struct io_tlb_mem *mem;
size_t alloc_size;
@@ -186,16 +205,8 @@ int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
if (!mem)
panic("%s: Failed to allocate %zu bytes align=0x%lx\n",
__func__, alloc_size, PAGE_SIZE);
- mem->nslabs = nslabs;
- mem->start = __pa(tlb);
- mem->end = mem->start + bytes;
- mem->index = 0;
- spin_lock_init(&mem->lock);
- for (i = 0; i < mem->nslabs; i++) {
- mem->slots[i].list = IO_TLB_SEGSIZE - io_tlb_offset(i);
- mem->slots[i].orig_addr = INVALID_PHYS_ADDR;
- mem->slots[i].alloc_size = 0;
- }
+
+ swiotlb_init_io_tlb_mem(mem, __pa(tlb), nslabs, false);
io_tlb_default_mem = mem;
if (verbose)
@@ -282,8 +293,8 @@ swiotlb_late_init_with_default_size(size_t default_size)
int
swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs)
{
- unsigned long bytes = nslabs << IO_TLB_SHIFT, i;
struct io_tlb_mem *mem;
+ unsigned long bytes = nslabs << IO_TLB_SHIFT;
if (swiotlb_force == SWIOTLB_NO_FORCE)
return 0;
@@ -297,20 +308,9 @@ swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs)
if (!mem)
return -ENOMEM;
- mem->nslabs = nslabs;
- mem->start = virt_to_phys(tlb);
- mem->end = mem->start + bytes;
- mem->index = 0;
- mem->late_alloc = 1;
- spin_lock_init(&mem->lock);
- for (i = 0; i < mem->nslabs; i++) {
- mem->slots[i].list = IO_TLB_SEGSIZE - io_tlb_offset(i);
- mem->slots[i].orig_addr = INVALID_PHYS_ADDR;
- mem->slots[i].alloc_size = 0;
- }
-
+ memset(mem, 0, sizeof(*mem));
set_memory_decrypted((unsigned long)tlb, bytes >> PAGE_SHIFT);
- memset(tlb, 0, bytes);
+ swiotlb_init_io_tlb_mem(mem, virt_to_phys(tlb), nslabs, true);
io_tlb_default_mem = mem;
swiotlb_print_info();
--
2.32.0.288.g62a8d224e6-goog
Split the debugfs creation to make the code reusable for supporting
different bounce buffer pools.
Signed-off-by: Claire Chang <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Tested-by: Stefano Stabellini <[email protected]>
Tested-by: Will Deacon <[email protected]>
---
kernel/dma/swiotlb.c | 21 ++++++++++++++-------
1 file changed, 14 insertions(+), 7 deletions(-)
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 1f9b2b9e7490..ede66df6835b 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -671,19 +671,26 @@ bool is_swiotlb_active(void)
EXPORT_SYMBOL_GPL(is_swiotlb_active);
#ifdef CONFIG_DEBUG_FS
+static struct dentry *debugfs_dir;
-static int __init swiotlb_create_debugfs(void)
+static void swiotlb_create_debugfs_files(struct io_tlb_mem *mem)
{
- struct io_tlb_mem *mem = io_tlb_default_mem;
-
- if (!mem)
- return 0;
- mem->debugfs = debugfs_create_dir("swiotlb", NULL);
debugfs_create_ulong("io_tlb_nslabs", 0400, mem->debugfs, &mem->nslabs);
debugfs_create_ulong("io_tlb_used", 0400, mem->debugfs, &mem->used);
+}
+
+static int __init swiotlb_create_default_debugfs(void)
+{
+ struct io_tlb_mem *mem = io_tlb_default_mem;
+
+ debugfs_dir = debugfs_create_dir("swiotlb", NULL);
+ if (mem) {
+ mem->debugfs = debugfs_dir;
+ swiotlb_create_debugfs_files(mem);
+ }
return 0;
}
-late_initcall(swiotlb_create_debugfs);
+late_initcall(swiotlb_create_default_debugfs);
#endif
--
2.32.0.288.g62a8d224e6-goog
Rename find_slots to swiotlb_find_slots and move the maintenance of
alloc_size to it for better code reusability later.
Signed-off-by: Claire Chang <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Tested-by: Stefano Stabellini <[email protected]>
Tested-by: Will Deacon <[email protected]>
---
kernel/dma/swiotlb.c | 17 +++++++++--------
1 file changed, 9 insertions(+), 8 deletions(-)
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 0d294bbf274c..b41d16e92cf6 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -432,8 +432,8 @@ static unsigned int wrap_index(struct io_tlb_mem *mem, unsigned int index)
* Find a suitable number of IO TLB entries size that will fit this request and
* allocate a buffer from that IO TLB pool.
*/
-static int find_slots(struct device *dev, phys_addr_t orig_addr,
- size_t alloc_size)
+static int swiotlb_find_slots(struct device *dev, phys_addr_t orig_addr,
+ size_t alloc_size)
{
struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
unsigned long boundary_mask = dma_get_seg_boundary(dev);
@@ -444,6 +444,7 @@ static int find_slots(struct device *dev, phys_addr_t orig_addr,
dma_get_min_align_mask(dev) & ~(IO_TLB_SIZE - 1);
unsigned int nslots = nr_slots(alloc_size), stride;
unsigned int index, wrap, count = 0, i;
+ unsigned int offset = swiotlb_align_offset(dev, orig_addr);
unsigned long flags;
BUG_ON(!nslots);
@@ -488,8 +489,11 @@ static int find_slots(struct device *dev, phys_addr_t orig_addr,
return -1;
found:
- for (i = index; i < index + nslots; i++)
+ for (i = index; i < index + nslots; i++) {
mem->slots[i].list = 0;
+ mem->slots[i].alloc_size =
+ alloc_size - (offset + ((i - index) << IO_TLB_SHIFT));
+ }
for (i = index - 1;
io_tlb_offset(i) != IO_TLB_SEGSIZE - 1 &&
mem->slots[i].list; i--)
@@ -530,7 +534,7 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
return (phys_addr_t)DMA_MAPPING_ERROR;
}
- index = find_slots(dev, orig_addr, alloc_size + offset);
+ index = swiotlb_find_slots(dev, orig_addr, alloc_size + offset);
if (index == -1) {
if (!(attrs & DMA_ATTR_NO_WARN))
dev_warn_ratelimited(dev,
@@ -544,11 +548,8 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
* This is needed when we sync the memory. Then we sync the buffer if
* needed.
*/
- for (i = 0; i < nr_slots(alloc_size + offset); i++) {
+ for (i = 0; i < nr_slots(alloc_size + offset); i++)
mem->slots[index + i].orig_addr = slot_addr(orig_addr, i);
- mem->slots[index + i].alloc_size =
- alloc_size - (i << IO_TLB_SHIFT);
- }
tlb_addr = slot_addr(mem->start, index) + offset;
if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
(dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL))
--
2.32.0.288.g62a8d224e6-goog
Add a new function, swiotlb_release_slots, to make the code reusable for
supporting different bounce buffer pools.
Signed-off-by: Claire Chang <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Tested-by: Stefano Stabellini <[email protected]>
Tested-by: Will Deacon <[email protected]>
---
kernel/dma/swiotlb.c | 35 ++++++++++++++++++++---------------
1 file changed, 20 insertions(+), 15 deletions(-)
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index b41d16e92cf6..93752e752e76 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -557,27 +557,15 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
return tlb_addr;
}
-/*
- * tlb_addr is the physical address of the bounce buffer to unmap.
- */
-void swiotlb_tbl_unmap_single(struct device *hwdev, phys_addr_t tlb_addr,
- size_t mapping_size, enum dma_data_direction dir,
- unsigned long attrs)
+static void swiotlb_release_slots(struct device *dev, phys_addr_t tlb_addr)
{
- struct io_tlb_mem *mem = hwdev->dma_io_tlb_mem;
+ struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
unsigned long flags;
- unsigned int offset = swiotlb_align_offset(hwdev, tlb_addr);
+ unsigned int offset = swiotlb_align_offset(dev, tlb_addr);
int index = (tlb_addr - offset - mem->start) >> IO_TLB_SHIFT;
int nslots = nr_slots(mem->slots[index].alloc_size + offset);
int count, i;
- /*
- * First, sync the memory before unmapping the entry
- */
- if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
- (dir == DMA_FROM_DEVICE || dir == DMA_BIDIRECTIONAL))
- swiotlb_bounce(hwdev, tlb_addr, mapping_size, DMA_FROM_DEVICE);
-
/*
* Return the buffer to the free list by setting the corresponding
* entries to indicate the number of contiguous entries available.
@@ -612,6 +600,23 @@ void swiotlb_tbl_unmap_single(struct device *hwdev, phys_addr_t tlb_addr,
spin_unlock_irqrestore(&mem->lock, flags);
}
+/*
+ * tlb_addr is the physical address of the bounce buffer to unmap.
+ */
+void swiotlb_tbl_unmap_single(struct device *dev, phys_addr_t tlb_addr,
+ size_t mapping_size, enum dma_data_direction dir,
+ unsigned long attrs)
+{
+ /*
+ * First, sync the memory before unmapping the entry
+ */
+ if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
+ (dir == DMA_FROM_DEVICE || dir == DMA_BIDIRECTIONAL))
+ swiotlb_bounce(dev, tlb_addr, mapping_size, DMA_FROM_DEVICE);
+
+ swiotlb_release_slots(dev, tlb_addr);
+}
+
void swiotlb_sync_single_for_device(struct device *dev, phys_addr_t tlb_addr,
size_t size, enum dma_data_direction dir)
{
--
2.32.0.288.g62a8d224e6-goog
Update is_swiotlb_buffer to add a struct device argument. This will be
useful later to allow for different pools.
Signed-off-by: Claire Chang <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Tested-by: Stefano Stabellini <[email protected]>
Tested-by: Will Deacon <[email protected]>
Acked-by: Stefano Stabellini <[email protected]>
---
drivers/iommu/dma-iommu.c | 12 ++++++------
drivers/xen/swiotlb-xen.c | 2 +-
include/linux/swiotlb.h | 7 ++++---
kernel/dma/direct.c | 6 +++---
kernel/dma/direct.h | 6 +++---
5 files changed, 17 insertions(+), 16 deletions(-)
diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 3087d9fa6065..10997ef541f8 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -507,7 +507,7 @@ static void __iommu_dma_unmap_swiotlb(struct device *dev, dma_addr_t dma_addr,
__iommu_dma_unmap(dev, dma_addr, size);
- if (unlikely(is_swiotlb_buffer(phys)))
+ if (unlikely(is_swiotlb_buffer(dev, phys)))
swiotlb_tbl_unmap_single(dev, phys, size, dir, attrs);
}
@@ -578,7 +578,7 @@ static dma_addr_t __iommu_dma_map_swiotlb(struct device *dev, phys_addr_t phys,
}
iova = __iommu_dma_map(dev, phys, aligned_size, prot, dma_mask);
- if (iova == DMA_MAPPING_ERROR && is_swiotlb_buffer(phys))
+ if (iova == DMA_MAPPING_ERROR && is_swiotlb_buffer(dev, phys))
swiotlb_tbl_unmap_single(dev, phys, org_size, dir, attrs);
return iova;
}
@@ -749,7 +749,7 @@ static void iommu_dma_sync_single_for_cpu(struct device *dev,
if (!dev_is_dma_coherent(dev))
arch_sync_dma_for_cpu(phys, size, dir);
- if (is_swiotlb_buffer(phys))
+ if (is_swiotlb_buffer(dev, phys))
swiotlb_sync_single_for_cpu(dev, phys, size, dir);
}
@@ -762,7 +762,7 @@ static void iommu_dma_sync_single_for_device(struct device *dev,
return;
phys = iommu_iova_to_phys(iommu_get_dma_domain(dev), dma_handle);
- if (is_swiotlb_buffer(phys))
+ if (is_swiotlb_buffer(dev, phys))
swiotlb_sync_single_for_device(dev, phys, size, dir);
if (!dev_is_dma_coherent(dev))
@@ -783,7 +783,7 @@ static void iommu_dma_sync_sg_for_cpu(struct device *dev,
if (!dev_is_dma_coherent(dev))
arch_sync_dma_for_cpu(sg_phys(sg), sg->length, dir);
- if (is_swiotlb_buffer(sg_phys(sg)))
+ if (is_swiotlb_buffer(dev, sg_phys(sg)))
swiotlb_sync_single_for_cpu(dev, sg_phys(sg),
sg->length, dir);
}
@@ -800,7 +800,7 @@ static void iommu_dma_sync_sg_for_device(struct device *dev,
return;
for_each_sg(sgl, sg, nelems, i) {
- if (is_swiotlb_buffer(sg_phys(sg)))
+ if (is_swiotlb_buffer(dev, sg_phys(sg)))
swiotlb_sync_single_for_device(dev, sg_phys(sg),
sg->length, dir);
diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index 4c89afc0df62..0c6ed09f8513 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -100,7 +100,7 @@ static int is_xen_swiotlb_buffer(struct device *dev, dma_addr_t dma_addr)
* in our domain. Therefore _only_ check address within our domain.
*/
if (pfn_valid(PFN_DOWN(paddr)))
- return is_swiotlb_buffer(paddr);
+ return is_swiotlb_buffer(dev, paddr);
return 0;
}
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 216854a5e513..d1f3d95881cd 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -2,6 +2,7 @@
#ifndef __LINUX_SWIOTLB_H
#define __LINUX_SWIOTLB_H
+#include <linux/device.h>
#include <linux/dma-direction.h>
#include <linux/init.h>
#include <linux/types.h>
@@ -101,9 +102,9 @@ struct io_tlb_mem {
};
extern struct io_tlb_mem *io_tlb_default_mem;
-static inline bool is_swiotlb_buffer(phys_addr_t paddr)
+static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
{
- struct io_tlb_mem *mem = io_tlb_default_mem;
+ struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
return mem && paddr >= mem->start && paddr < mem->end;
}
@@ -115,7 +116,7 @@ bool is_swiotlb_active(void);
void __init swiotlb_adjust_size(unsigned long size);
#else
#define swiotlb_force SWIOTLB_NO_FORCE
-static inline bool is_swiotlb_buffer(phys_addr_t paddr)
+static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
{
return false;
}
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index f737e3347059..84c9feb5474a 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -343,7 +343,7 @@ void dma_direct_sync_sg_for_device(struct device *dev,
for_each_sg(sgl, sg, nents, i) {
phys_addr_t paddr = dma_to_phys(dev, sg_dma_address(sg));
- if (unlikely(is_swiotlb_buffer(paddr)))
+ if (unlikely(is_swiotlb_buffer(dev, paddr)))
swiotlb_sync_single_for_device(dev, paddr, sg->length,
dir);
@@ -369,7 +369,7 @@ void dma_direct_sync_sg_for_cpu(struct device *dev,
if (!dev_is_dma_coherent(dev))
arch_sync_dma_for_cpu(paddr, sg->length, dir);
- if (unlikely(is_swiotlb_buffer(paddr)))
+ if (unlikely(is_swiotlb_buffer(dev, paddr)))
swiotlb_sync_single_for_cpu(dev, paddr, sg->length,
dir);
@@ -504,7 +504,7 @@ size_t dma_direct_max_mapping_size(struct device *dev)
bool dma_direct_need_sync(struct device *dev, dma_addr_t dma_addr)
{
return !dev_is_dma_coherent(dev) ||
- is_swiotlb_buffer(dma_to_phys(dev, dma_addr));
+ is_swiotlb_buffer(dev, dma_to_phys(dev, dma_addr));
}
/**
diff --git a/kernel/dma/direct.h b/kernel/dma/direct.h
index 50afc05b6f1d..13e9e7158d94 100644
--- a/kernel/dma/direct.h
+++ b/kernel/dma/direct.h
@@ -56,7 +56,7 @@ static inline void dma_direct_sync_single_for_device(struct device *dev,
{
phys_addr_t paddr = dma_to_phys(dev, addr);
- if (unlikely(is_swiotlb_buffer(paddr)))
+ if (unlikely(is_swiotlb_buffer(dev, paddr)))
swiotlb_sync_single_for_device(dev, paddr, size, dir);
if (!dev_is_dma_coherent(dev))
@@ -73,7 +73,7 @@ static inline void dma_direct_sync_single_for_cpu(struct device *dev,
arch_sync_dma_for_cpu_all();
}
- if (unlikely(is_swiotlb_buffer(paddr)))
+ if (unlikely(is_swiotlb_buffer(dev, paddr)))
swiotlb_sync_single_for_cpu(dev, paddr, size, dir);
if (dir == DMA_FROM_DEVICE)
@@ -113,7 +113,7 @@ static inline void dma_direct_unmap_page(struct device *dev, dma_addr_t addr,
if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
dma_direct_sync_single_for_cpu(dev, addr, size, dir);
- if (unlikely(is_swiotlb_buffer(phys)))
+ if (unlikely(is_swiotlb_buffer(dev, phys)))
swiotlb_tbl_unmap_single(dev, phys, size, dir, attrs);
}
#endif /* _KERNEL_DMA_DIRECT_H */
--
2.32.0.288.g62a8d224e6-goog
Add the initialization function to create restricted DMA pools from
matching reserved-memory nodes.
Regardless of swiotlb setting, the restricted DMA pool is preferred if
available.
The restricted DMA pools provide a basic level of protection against the
DMA overwriting buffer contents at unexpected times. However, to protect
against general data leakage and system memory corruption, the system
needs to provide a way to lock down the memory access, e.g., MPU.
Signed-off-by: Claire Chang <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Tested-by: Stefano Stabellini <[email protected]>
Tested-by: Will Deacon <[email protected]>
---
include/linux/swiotlb.h | 3 +-
kernel/dma/Kconfig | 14 ++++++++
kernel/dma/swiotlb.c | 76 +++++++++++++++++++++++++++++++++++++++++
3 files changed, 92 insertions(+), 1 deletion(-)
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 3b9454d1e498..39284ff2a6cd 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -73,7 +73,8 @@ extern enum swiotlb_force swiotlb_force;
* range check to see if the memory was in fact allocated by this
* API.
* @nslabs: The number of IO TLB blocks (in groups of 64) between @start and
- * @end. This is command line adjustable via setup_io_tlb_npages.
+ * @end. For default swiotlb, this is command line adjustable via
+ * setup_io_tlb_npages.
* @used: The number of used IO TLB block.
* @list: The free list describing the number of free entries available
* from each index.
diff --git a/kernel/dma/Kconfig b/kernel/dma/Kconfig
index 77b405508743..3e961dc39634 100644
--- a/kernel/dma/Kconfig
+++ b/kernel/dma/Kconfig
@@ -80,6 +80,20 @@ config SWIOTLB
bool
select NEED_DMA_MAP_STATE
+config DMA_RESTRICTED_POOL
+ bool "DMA Restricted Pool"
+ depends on OF && OF_RESERVED_MEM
+ select SWIOTLB
+ help
+ This enables support for restricted DMA pools which provide a level of
+ DMA memory protection on systems with limited hardware protection
+ capabilities, such as those lacking an IOMMU.
+
+ For more information see
+ <Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt>
+ and <kernel/dma/swiotlb.c>.
+ If unsure, say "n".
+
#
# Should be selected if we can mmap non-coherent mappings to userspace.
# The only thing that is really required is a way to set an uncached bit
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 6a7c6e30eb4b..3baf49c9b766 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -39,6 +39,13 @@
#ifdef CONFIG_DEBUG_FS
#include <linux/debugfs.h>
#endif
+#ifdef CONFIG_DMA_RESTRICTED_POOL
+#include <linux/io.h>
+#include <linux/of.h>
+#include <linux/of_fdt.h>
+#include <linux/of_reserved_mem.h>
+#include <linux/slab.h>
+#endif
#include <asm/io.h>
#include <asm/dma.h>
@@ -737,4 +744,73 @@ bool swiotlb_free(struct device *dev, struct page *page, size_t size)
return true;
}
+static int rmem_swiotlb_device_init(struct reserved_mem *rmem,
+ struct device *dev)
+{
+ struct io_tlb_mem *mem = rmem->priv;
+ unsigned long nslabs = rmem->size >> IO_TLB_SHIFT;
+
+ /*
+ * Since multiple devices can share the same pool, the private data,
+ * io_tlb_mem struct, will be initialized by the first device attached
+ * to it.
+ */
+ if (!mem) {
+ mem = kzalloc(struct_size(mem, slots, nslabs), GFP_KERNEL);
+ if (!mem)
+ return -ENOMEM;
+
+ set_memory_decrypted((unsigned long)phys_to_virt(rmem->base),
+ rmem->size >> PAGE_SHIFT);
+ swiotlb_init_io_tlb_mem(mem, rmem->base, nslabs, false);
+ mem->force_bounce = true;
+ mem->for_alloc = true;
+
+ rmem->priv = mem;
+
+ if (IS_ENABLED(CONFIG_DEBUG_FS)) {
+ mem->debugfs =
+ debugfs_create_dir(rmem->name, debugfs_dir);
+ swiotlb_create_debugfs_files(mem);
+ }
+ }
+
+ dev->dma_io_tlb_mem = mem;
+
+ return 0;
+}
+
+static void rmem_swiotlb_device_release(struct reserved_mem *rmem,
+ struct device *dev)
+{
+ dev->dma_io_tlb_mem = io_tlb_default_mem;
+}
+
+static const struct reserved_mem_ops rmem_swiotlb_ops = {
+ .device_init = rmem_swiotlb_device_init,
+ .device_release = rmem_swiotlb_device_release,
+};
+
+static int __init rmem_swiotlb_setup(struct reserved_mem *rmem)
+{
+ unsigned long node = rmem->fdt_node;
+
+ if (of_get_flat_dt_prop(node, "reusable", NULL) ||
+ of_get_flat_dt_prop(node, "linux,cma-default", NULL) ||
+ of_get_flat_dt_prop(node, "linux,dma-default", NULL) ||
+ of_get_flat_dt_prop(node, "no-map", NULL))
+ return -EINVAL;
+
+ if (PageHighMem(pfn_to_page(PHYS_PFN(rmem->base)))) {
+ pr_err("Restricted DMA pool must be accessible within the linear mapping.");
+ return -EINVAL;
+ }
+
+ rmem->ops = &rmem_swiotlb_ops;
+ pr_info("Reserved memory: created restricted DMA pool at %pa, size %ld MiB\n",
+ &rmem->base, (unsigned long)rmem->size / SZ_1M);
+ return 0;
+}
+
+RESERVEDMEM_OF_DECLARE(dma, "restricted-dma-pool", rmem_swiotlb_setup);
#endif /* CONFIG_DMA_RESTRICTED_POOL */
--
2.32.0.288.g62a8d224e6-goog
Always have the pointer to the swiotlb pool used in struct device. This
could help simplify the code for other pools.
Signed-off-by: Claire Chang <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Tested-by: Stefano Stabellini <[email protected]>
Tested-by: Will Deacon <[email protected]>
Acked-by: Stefano Stabellini <[email protected]>
---
drivers/base/core.c | 4 ++++
include/linux/device.h | 4 ++++
kernel/dma/swiotlb.c | 8 ++++----
3 files changed, 12 insertions(+), 4 deletions(-)
diff --git a/drivers/base/core.c b/drivers/base/core.c
index f29839382f81..cb3123e3954d 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -27,6 +27,7 @@
#include <linux/netdevice.h>
#include <linux/sched/signal.h>
#include <linux/sched/mm.h>
+#include <linux/swiotlb.h>
#include <linux/sysfs.h>
#include <linux/dma-map-ops.h> /* for dma_default_coherent */
@@ -2736,6 +2737,9 @@ void device_initialize(struct device *dev)
defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU_ALL)
dev->dma_coherent = dma_default_coherent;
#endif
+#ifdef CONFIG_SWIOTLB
+ dev->dma_io_tlb_mem = io_tlb_default_mem;
+#endif
}
EXPORT_SYMBOL_GPL(device_initialize);
diff --git a/include/linux/device.h b/include/linux/device.h
index ba660731bd25..240d652a0696 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -416,6 +416,7 @@ struct dev_links_info {
* @dma_pools: Dma pools (if dma'ble device).
* @dma_mem: Internal for coherent mem override.
* @cma_area: Contiguous memory area for dma allocations
+ * @dma_io_tlb_mem: Pointer to the swiotlb pool used. Not for driver use.
* @archdata: For arch-specific additions.
* @of_node: Associated device tree node.
* @fwnode: Associated device node supplied by platform firmware.
@@ -518,6 +519,9 @@ struct device {
#ifdef CONFIG_DMA_CMA
struct cma *cma_area; /* contiguous memory area for dma
allocations */
+#endif
+#ifdef CONFIG_SWIOTLB
+ struct io_tlb_mem *dma_io_tlb_mem;
#endif
/* arch specific additions */
struct dev_archdata archdata;
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index ede66df6835b..72a4289faed1 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -340,7 +340,7 @@ void __init swiotlb_exit(void)
static void swiotlb_bounce(struct device *dev, phys_addr_t tlb_addr, size_t size,
enum dma_data_direction dir)
{
- struct io_tlb_mem *mem = io_tlb_default_mem;
+ struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
int index = (tlb_addr - mem->start) >> IO_TLB_SHIFT;
unsigned int offset = (tlb_addr - mem->start) & (IO_TLB_SIZE - 1);
phys_addr_t orig_addr = mem->slots[index].orig_addr;
@@ -431,7 +431,7 @@ static unsigned int wrap_index(struct io_tlb_mem *mem, unsigned int index)
static int find_slots(struct device *dev, phys_addr_t orig_addr,
size_t alloc_size)
{
- struct io_tlb_mem *mem = io_tlb_default_mem;
+ struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
unsigned long boundary_mask = dma_get_seg_boundary(dev);
dma_addr_t tbl_dma_addr =
phys_to_dma_unencrypted(dev, mem->start) & boundary_mask;
@@ -508,7 +508,7 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
size_t mapping_size, size_t alloc_size,
enum dma_data_direction dir, unsigned long attrs)
{
- struct io_tlb_mem *mem = io_tlb_default_mem;
+ struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
unsigned int offset = swiotlb_align_offset(dev, orig_addr);
unsigned int i;
int index;
@@ -559,7 +559,7 @@ void swiotlb_tbl_unmap_single(struct device *hwdev, phys_addr_t tlb_addr,
size_t mapping_size, enum dma_data_direction dir,
unsigned long attrs)
{
- struct io_tlb_mem *mem = io_tlb_default_mem;
+ struct io_tlb_mem *mem = hwdev->dma_io_tlb_mem;
unsigned long flags;
unsigned int offset = swiotlb_align_offset(hwdev, tlb_addr);
int index = (tlb_addr - offset - mem->start) >> IO_TLB_SHIFT;
--
2.32.0.288.g62a8d224e6-goog
Introduce the new compatible string, restricted-dma-pool, for restricted
DMA. One can specify the address and length of the restricted DMA memory
region by restricted-dma-pool in the reserved-memory node.
Signed-off-by: Claire Chang <[email protected]>
Tested-by: Stefano Stabellini <[email protected]>
Tested-by: Will Deacon <[email protected]>
---
.../reserved-memory/reserved-memory.txt | 36 +++++++++++++++++--
1 file changed, 33 insertions(+), 3 deletions(-)
diff --git a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt b/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
index e8d3096d922c..39b5f4c5a511 100644
--- a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
+++ b/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
@@ -51,6 +51,23 @@ compatible (optional) - standard definition
used as a shared pool of DMA buffers for a set of devices. It can
be used by an operating system to instantiate the necessary pool
management subsystem if necessary.
+ - restricted-dma-pool: This indicates a region of memory meant to be
+ used as a pool of restricted DMA buffers for a set of devices. The
+ memory region would be the only region accessible to those devices.
+ When using this, the no-map and reusable properties must not be set,
+ so the operating system can create a virtual mapping that will be used
+ for synchronization. The main purpose for restricted DMA is to
+ mitigate the lack of DMA access control on systems without an IOMMU,
+ which could result in the DMA accessing the system memory at
+ unexpected times and/or unexpected addresses, possibly leading to data
+ leakage or corruption. The feature on its own provides a basic level
+ of protection against the DMA overwriting buffer contents at
+ unexpected times. However, to protect against general data leakage and
+ system memory corruption, the system needs to provide way to lock down
+ the memory access, e.g., MPU. Note that since coherent allocation
+ needs remapping, one must set up another device coherent pool by
+ shared-dma-pool and use dma_alloc_from_dev_coherent instead for atomic
+ coherent allocation.
- vendor specific string in the form <vendor>,[<device>-]<usage>
no-map (optional) - empty property
- Indicates the operating system must not create a virtual mapping
@@ -85,10 +102,11 @@ memory-region-names (optional) - a list of names, one for each corresponding
Example
-------
-This example defines 3 contiguous regions are defined for Linux kernel:
+This example defines 4 contiguous regions for Linux kernel:
one default of all device drivers (named linux,cma@72000000 and 64MiB in size),
-one dedicated to the framebuffer device (named framebuffer@78000000, 8MiB), and
-one for multimedia processing (named multimedia-memory@77000000, 64MiB).
+one dedicated to the framebuffer device (named framebuffer@78000000, 8MiB),
+one for multimedia processing (named multimedia-memory@77000000, 64MiB), and
+one for restricted dma pool (named restricted_dma_reserved@0x50000000, 64MiB).
/ {
#address-cells = <1>;
@@ -120,6 +138,11 @@ one for multimedia processing (named multimedia-memory@77000000, 64MiB).
compatible = "acme,multimedia-memory";
reg = <0x77000000 0x4000000>;
};
+
+ restricted_dma_reserved: restricted_dma_reserved {
+ compatible = "restricted-dma-pool";
+ reg = <0x50000000 0x4000000>;
+ };
};
/* ... */
@@ -138,4 +161,11 @@ one for multimedia processing (named multimedia-memory@77000000, 64MiB).
memory-region = <&multimedia_reserved>;
/* ... */
};
+
+ pcie_device: pcie_device@0,0 {
+ reg = <0x83010000 0x0 0x00000000 0x0 0x00100000
+ 0x83010000 0x0 0x00100000 0x0 0x00100000>;
+ memory-region = <&restricted_dma_reserved>;
+ /* ... */
+ };
};
--
2.32.0.288.g62a8d224e6-goog
Propagate the swiotlb_force into io_tlb_default_mem->force_bounce and
use it to determine whether to bounce the data or not. This will be
useful later to allow for different pools.
Signed-off-by: Claire Chang <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Tested-by: Stefano Stabellini <[email protected]>
Tested-by: Will Deacon <[email protected]>
Acked-by: Stefano Stabellini <[email protected]>
---
drivers/xen/swiotlb-xen.c | 2 +-
include/linux/swiotlb.h | 13 +++++++++++++
kernel/dma/direct.c | 2 +-
kernel/dma/direct.h | 2 +-
kernel/dma/swiotlb.c | 4 ++++
5 files changed, 20 insertions(+), 3 deletions(-)
diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index 0c6ed09f8513..4730a146fa35 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -369,7 +369,7 @@ static dma_addr_t xen_swiotlb_map_page(struct device *dev, struct page *page,
if (dma_capable(dev, dev_addr, size, true) &&
!range_straddles_page_boundary(phys, size) &&
!xen_arch_need_swiotlb(dev, phys, dev_addr) &&
- swiotlb_force != SWIOTLB_FORCE)
+ !is_swiotlb_force_bounce(dev))
goto done;
/*
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index dd1c30a83058..da348671b0d5 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -84,6 +84,7 @@ extern enum swiotlb_force swiotlb_force;
* unmap calls.
* @debugfs: The dentry to debugfs.
* @late_alloc: %true if allocated using the page allocator
+ * @force_bounce: %true if swiotlb bouncing is forced
*/
struct io_tlb_mem {
phys_addr_t start;
@@ -94,6 +95,7 @@ struct io_tlb_mem {
spinlock_t lock;
struct dentry *debugfs;
bool late_alloc;
+ bool force_bounce;
struct io_tlb_slot {
phys_addr_t orig_addr;
size_t alloc_size;
@@ -109,6 +111,13 @@ static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
return mem && paddr >= mem->start && paddr < mem->end;
}
+static inline bool is_swiotlb_force_bounce(struct device *dev)
+{
+ struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
+
+ return mem && mem->force_bounce;
+}
+
void __init swiotlb_exit(void);
unsigned int swiotlb_max_segment(void);
size_t swiotlb_max_mapping_size(struct device *dev);
@@ -120,6 +129,10 @@ static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
{
return false;
}
+static inline bool is_swiotlb_force_bounce(struct device *dev)
+{
+ return false;
+}
static inline void swiotlb_exit(void)
{
}
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 7a88c34d0867..a92465b4eb12 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -496,7 +496,7 @@ size_t dma_direct_max_mapping_size(struct device *dev)
{
/* If SWIOTLB is active, use its maximum mapping size */
if (is_swiotlb_active(dev) &&
- (dma_addressing_limited(dev) || swiotlb_force == SWIOTLB_FORCE))
+ (dma_addressing_limited(dev) || is_swiotlb_force_bounce(dev)))
return swiotlb_max_mapping_size(dev);
return SIZE_MAX;
}
diff --git a/kernel/dma/direct.h b/kernel/dma/direct.h
index 13e9e7158d94..4632b0f4f72e 100644
--- a/kernel/dma/direct.h
+++ b/kernel/dma/direct.h
@@ -87,7 +87,7 @@ static inline dma_addr_t dma_direct_map_page(struct device *dev,
phys_addr_t phys = page_to_phys(page) + offset;
dma_addr_t dma_addr = phys_to_dma(dev, phys);
- if (unlikely(swiotlb_force == SWIOTLB_FORCE))
+ if (is_swiotlb_force_bounce(dev))
return swiotlb_map(dev, phys, size, dir, attrs);
if (unlikely(!dma_capable(dev, dma_addr, size, true))) {
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 8a120f42340b..0d294bbf274c 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -179,6 +179,10 @@ static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start,
mem->end = mem->start + bytes;
mem->index = 0;
mem->late_alloc = late_alloc;
+
+ if (swiotlb_force == SWIOTLB_FORCE)
+ mem->force_bounce = true;
+
spin_lock_init(&mem->lock);
for (i = 0; i < mem->nslabs; i++) {
mem->slots[i].list = IO_TLB_SEGSIZE - io_tlb_offset(i);
--
2.32.0.288.g62a8d224e6-goog
Update is_swiotlb_active to add a struct device argument. This will be
useful later to allow for different pools.
Signed-off-by: Claire Chang <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Tested-by: Stefano Stabellini <[email protected]>
Tested-by: Will Deacon <[email protected]>
Acked-by: Stefano Stabellini <[email protected]>
---
drivers/gpu/drm/i915/gem/i915_gem_internal.c | 2 +-
drivers/gpu/drm/nouveau/nouveau_ttm.c | 2 +-
drivers/pci/xen-pcifront.c | 2 +-
include/linux/swiotlb.h | 4 ++--
kernel/dma/direct.c | 2 +-
kernel/dma/swiotlb.c | 4 ++--
6 files changed, 8 insertions(+), 8 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_internal.c b/drivers/gpu/drm/i915/gem/i915_gem_internal.c
index a9d65fc8aa0e..4b7afa0fc85d 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_internal.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_internal.c
@@ -42,7 +42,7 @@ static int i915_gem_object_get_pages_internal(struct drm_i915_gem_object *obj)
max_order = MAX_ORDER;
#ifdef CONFIG_SWIOTLB
- if (is_swiotlb_active()) {
+ if (is_swiotlb_active(obj->base.dev->dev)) {
unsigned int max_segment;
max_segment = swiotlb_max_segment();
diff --git a/drivers/gpu/drm/nouveau/nouveau_ttm.c b/drivers/gpu/drm/nouveau/nouveau_ttm.c
index 9662522aa066..be15bfd9e0ee 100644
--- a/drivers/gpu/drm/nouveau/nouveau_ttm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_ttm.c
@@ -321,7 +321,7 @@ nouveau_ttm_init(struct nouveau_drm *drm)
}
#if IS_ENABLED(CONFIG_SWIOTLB) && IS_ENABLED(CONFIG_X86)
- need_swiotlb = is_swiotlb_active();
+ need_swiotlb = is_swiotlb_active(dev->dev);
#endif
ret = ttm_bo_device_init(&drm->ttm.bdev, &nouveau_bo_driver,
diff --git a/drivers/pci/xen-pcifront.c b/drivers/pci/xen-pcifront.c
index b7a8f3a1921f..0d56985bfe81 100644
--- a/drivers/pci/xen-pcifront.c
+++ b/drivers/pci/xen-pcifront.c
@@ -693,7 +693,7 @@ static int pcifront_connect_and_init_dma(struct pcifront_device *pdev)
spin_unlock(&pcifront_dev_lock);
- if (!err && !is_swiotlb_active()) {
+ if (!err && !is_swiotlb_active(&pdev->xdev->dev)) {
err = pci_xen_swiotlb_init_late();
if (err)
dev_err(&pdev->xdev->dev, "Could not setup SWIOTLB!\n");
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index d1f3d95881cd..dd1c30a83058 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -112,7 +112,7 @@ static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
void __init swiotlb_exit(void);
unsigned int swiotlb_max_segment(void);
size_t swiotlb_max_mapping_size(struct device *dev);
-bool is_swiotlb_active(void);
+bool is_swiotlb_active(struct device *dev);
void __init swiotlb_adjust_size(unsigned long size);
#else
#define swiotlb_force SWIOTLB_NO_FORCE
@@ -132,7 +132,7 @@ static inline size_t swiotlb_max_mapping_size(struct device *dev)
return SIZE_MAX;
}
-static inline bool is_swiotlb_active(void)
+static inline bool is_swiotlb_active(struct device *dev)
{
return false;
}
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 84c9feb5474a..7a88c34d0867 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -495,7 +495,7 @@ int dma_direct_supported(struct device *dev, u64 mask)
size_t dma_direct_max_mapping_size(struct device *dev)
{
/* If SWIOTLB is active, use its maximum mapping size */
- if (is_swiotlb_active() &&
+ if (is_swiotlb_active(dev) &&
(dma_addressing_limited(dev) || swiotlb_force == SWIOTLB_FORCE))
return swiotlb_max_mapping_size(dev);
return SIZE_MAX;
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 72a4289faed1..8a120f42340b 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -664,9 +664,9 @@ size_t swiotlb_max_mapping_size(struct device *dev)
return ((size_t)IO_TLB_SIZE) * IO_TLB_SEGSIZE;
}
-bool is_swiotlb_active(void)
+bool is_swiotlb_active(struct device *dev)
{
- return io_tlb_default_mem != NULL;
+ return dev->dma_io_tlb_mem != NULL;
}
EXPORT_SYMBOL_GPL(is_swiotlb_active);
--
2.32.0.288.g62a8d224e6-goog
If a device is not behind an IOMMU, we look up the device node and set
up the restricted DMA when the restricted-dma-pool is presented.
Signed-off-by: Claire Chang <[email protected]>
Tested-by: Stefano Stabellini <[email protected]>
Tested-by: Will Deacon <[email protected]>
---
drivers/of/address.c | 33 +++++++++++++++++++++++++++++++++
drivers/of/device.c | 3 +++
drivers/of/of_private.h | 6 ++++++
3 files changed, 42 insertions(+)
diff --git a/drivers/of/address.c b/drivers/of/address.c
index 73ddf2540f3f..cdf700fba5c4 100644
--- a/drivers/of/address.c
+++ b/drivers/of/address.c
@@ -8,6 +8,7 @@
#include <linux/logic_pio.h>
#include <linux/module.h>
#include <linux/of_address.h>
+#include <linux/of_reserved_mem.h>
#include <linux/pci.h>
#include <linux/pci_regs.h>
#include <linux/sizes.h>
@@ -1022,6 +1023,38 @@ int of_dma_get_range(struct device_node *np, const struct bus_dma_region **map)
of_node_put(node);
return ret;
}
+
+int of_dma_set_restricted_buffer(struct device *dev, struct device_node *np)
+{
+ struct device_node *node, *of_node = dev->of_node;
+ int count, i;
+
+ count = of_property_count_elems_of_size(of_node, "memory-region",
+ sizeof(u32));
+ /*
+ * If dev->of_node doesn't exist or doesn't contain memory-region, try
+ * the OF node having DMA configuration.
+ */
+ if (count <= 0) {
+ of_node = np;
+ count = of_property_count_elems_of_size(
+ of_node, "memory-region", sizeof(u32));
+ }
+
+ for (i = 0; i < count; i++) {
+ node = of_parse_phandle(of_node, "memory-region", i);
+ /*
+ * There might be multiple memory regions, but only one
+ * restricted-dma-pool region is allowed.
+ */
+ if (of_device_is_compatible(node, "restricted-dma-pool") &&
+ of_device_is_available(node))
+ return of_reserved_mem_device_init_by_idx(dev, of_node,
+ i);
+ }
+
+ return 0;
+}
#endif /* CONFIG_HAS_DMA */
/**
diff --git a/drivers/of/device.c b/drivers/of/device.c
index 6cb86de404f1..e68316836a7a 100644
--- a/drivers/of/device.c
+++ b/drivers/of/device.c
@@ -165,6 +165,9 @@ int of_dma_configure_id(struct device *dev, struct device_node *np,
arch_setup_dma_ops(dev, dma_start, size, iommu, coherent);
+ if (!iommu)
+ return of_dma_set_restricted_buffer(dev, np);
+
return 0;
}
EXPORT_SYMBOL_GPL(of_dma_configure_id);
diff --git a/drivers/of/of_private.h b/drivers/of/of_private.h
index d9e6a324de0a..25cebbed5f02 100644
--- a/drivers/of/of_private.h
+++ b/drivers/of/of_private.h
@@ -161,12 +161,18 @@ struct bus_dma_region;
#if defined(CONFIG_OF_ADDRESS) && defined(CONFIG_HAS_DMA)
int of_dma_get_range(struct device_node *np,
const struct bus_dma_region **map);
+int of_dma_set_restricted_buffer(struct device *dev, struct device_node *np);
#else
static inline int of_dma_get_range(struct device_node *np,
const struct bus_dma_region **map)
{
return -ENODEV;
}
+static inline int of_dma_set_restricted_buffer(struct device *dev,
+ struct device_node *np)
+{
+ return -ENODEV;
+}
#endif
#endif /* _LINUX_OF_PRIVATE_H */
--
2.32.0.288.g62a8d224e6-goog
Add the functions, swiotlb_{alloc,free} and is_swiotlb_for_alloc to
support the memory allocation from restricted DMA pool.
The restricted DMA pool is preferred if available.
Note that since coherent allocation needs remapping, one must set up
another device coherent pool by shared-dma-pool and use
dma_alloc_from_dev_coherent instead for atomic coherent allocation.
Signed-off-by: Claire Chang <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Tested-by: Stefano Stabellini <[email protected]>
Tested-by: Will Deacon <[email protected]>
Acked-by: Stefano Stabellini <[email protected]>
---
include/linux/swiotlb.h | 26 ++++++++++++++++++++++
kernel/dma/direct.c | 49 +++++++++++++++++++++++++++++++----------
kernel/dma/swiotlb.c | 38 ++++++++++++++++++++++++++++++--
3 files changed, 99 insertions(+), 14 deletions(-)
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index da348671b0d5..3b9454d1e498 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -85,6 +85,7 @@ extern enum swiotlb_force swiotlb_force;
* @debugfs: The dentry to debugfs.
* @late_alloc: %true if allocated using the page allocator
* @force_bounce: %true if swiotlb bouncing is forced
+ * @for_alloc: %true if the pool is used for memory allocation
*/
struct io_tlb_mem {
phys_addr_t start;
@@ -96,6 +97,7 @@ struct io_tlb_mem {
struct dentry *debugfs;
bool late_alloc;
bool force_bounce;
+ bool for_alloc;
struct io_tlb_slot {
phys_addr_t orig_addr;
size_t alloc_size;
@@ -158,4 +160,28 @@ static inline void swiotlb_adjust_size(unsigned long size)
extern void swiotlb_print_info(void);
extern void swiotlb_set_max_segment(unsigned int);
+#ifdef CONFIG_DMA_RESTRICTED_POOL
+struct page *swiotlb_alloc(struct device *dev, size_t size);
+bool swiotlb_free(struct device *dev, struct page *page, size_t size);
+
+static inline bool is_swiotlb_for_alloc(struct device *dev)
+{
+ return dev->dma_io_tlb_mem->for_alloc;
+}
+#else
+static inline struct page *swiotlb_alloc(struct device *dev, size_t size)
+{
+ return NULL;
+}
+static inline bool swiotlb_free(struct device *dev, struct page *page,
+ size_t size)
+{
+ return false;
+}
+static inline bool is_swiotlb_for_alloc(struct device *dev)
+{
+ return false;
+}
+#endif /* CONFIG_DMA_RESTRICTED_POOL */
+
#endif /* __LINUX_SWIOTLB_H */
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index a92465b4eb12..2de33e5d302b 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -75,6 +75,15 @@ static bool dma_coherent_ok(struct device *dev, phys_addr_t phys, size_t size)
min_not_zero(dev->coherent_dma_mask, dev->bus_dma_limit);
}
+static void __dma_direct_free_pages(struct device *dev, struct page *page,
+ size_t size)
+{
+ if (IS_ENABLED(CONFIG_DMA_RESTRICTED_POOL) &&
+ swiotlb_free(dev, page, size))
+ return;
+ dma_free_contiguous(dev, page, size);
+}
+
static struct page *__dma_direct_alloc_pages(struct device *dev, size_t size,
gfp_t gfp)
{
@@ -86,6 +95,16 @@ static struct page *__dma_direct_alloc_pages(struct device *dev, size_t size,
gfp |= dma_direct_optimal_gfp_mask(dev, dev->coherent_dma_mask,
&phys_limit);
+ if (IS_ENABLED(CONFIG_DMA_RESTRICTED_POOL) &&
+ is_swiotlb_for_alloc(dev)) {
+ page = swiotlb_alloc(dev, size);
+ if (page && !dma_coherent_ok(dev, page_to_phys(page), size)) {
+ __dma_direct_free_pages(dev, page, size);
+ return NULL;
+ }
+ return page;
+ }
+
page = dma_alloc_contiguous(dev, size, gfp);
if (page && !dma_coherent_ok(dev, page_to_phys(page), size)) {
dma_free_contiguous(dev, page, size);
@@ -142,7 +161,7 @@ void *dma_direct_alloc(struct device *dev, size_t size,
gfp |= __GFP_NOWARN;
if ((attrs & DMA_ATTR_NO_KERNEL_MAPPING) &&
- !force_dma_unencrypted(dev)) {
+ !force_dma_unencrypted(dev) && !is_swiotlb_for_alloc(dev)) {
page = __dma_direct_alloc_pages(dev, size, gfp & ~__GFP_ZERO);
if (!page)
return NULL;
@@ -155,18 +174,23 @@ void *dma_direct_alloc(struct device *dev, size_t size,
}
if (!IS_ENABLED(CONFIG_ARCH_HAS_DMA_SET_UNCACHED) &&
- !IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) &&
- !dev_is_dma_coherent(dev))
+ !IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) && !dev_is_dma_coherent(dev) &&
+ !is_swiotlb_for_alloc(dev))
return arch_dma_alloc(dev, size, dma_handle, gfp, attrs);
/*
* Remapping or decrypting memory may block. If either is required and
* we can't block, allocate the memory from the atomic pools.
+ * If restricted DMA (i.e., is_swiotlb_for_alloc) is required, one must
+ * set up another device coherent pool by shared-dma-pool and use
+ * dma_alloc_from_dev_coherent instead.
*/
if (IS_ENABLED(CONFIG_DMA_COHERENT_POOL) &&
!gfpflags_allow_blocking(gfp) &&
(force_dma_unencrypted(dev) ||
- (IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) && !dev_is_dma_coherent(dev))))
+ (IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) &&
+ !dev_is_dma_coherent(dev))) &&
+ !is_swiotlb_for_alloc(dev))
return dma_direct_alloc_from_pool(dev, size, dma_handle, gfp);
/* we always manually zero the memory once we are done */
@@ -237,7 +261,7 @@ void *dma_direct_alloc(struct device *dev, size_t size,
return NULL;
}
out_free_pages:
- dma_free_contiguous(dev, page, size);
+ __dma_direct_free_pages(dev, page, size);
return NULL;
}
@@ -247,15 +271,15 @@ void dma_direct_free(struct device *dev, size_t size,
unsigned int page_order = get_order(size);
if ((attrs & DMA_ATTR_NO_KERNEL_MAPPING) &&
- !force_dma_unencrypted(dev)) {
+ !force_dma_unencrypted(dev) && !is_swiotlb_for_alloc(dev)) {
/* cpu_addr is a struct page cookie, not a kernel address */
dma_free_contiguous(dev, cpu_addr, size);
return;
}
if (!IS_ENABLED(CONFIG_ARCH_HAS_DMA_SET_UNCACHED) &&
- !IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) &&
- !dev_is_dma_coherent(dev)) {
+ !IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) && !dev_is_dma_coherent(dev) &&
+ !is_swiotlb_for_alloc(dev)) {
arch_dma_free(dev, size, cpu_addr, dma_addr, attrs);
return;
}
@@ -273,7 +297,7 @@ void dma_direct_free(struct device *dev, size_t size,
else if (IS_ENABLED(CONFIG_ARCH_HAS_DMA_CLEAR_UNCACHED))
arch_dma_clear_uncached(cpu_addr, size);
- dma_free_contiguous(dev, dma_direct_to_page(dev, dma_addr), size);
+ __dma_direct_free_pages(dev, dma_direct_to_page(dev, dma_addr), size);
}
struct page *dma_direct_alloc_pages(struct device *dev, size_t size,
@@ -283,7 +307,8 @@ struct page *dma_direct_alloc_pages(struct device *dev, size_t size,
void *ret;
if (IS_ENABLED(CONFIG_DMA_COHERENT_POOL) &&
- force_dma_unencrypted(dev) && !gfpflags_allow_blocking(gfp))
+ force_dma_unencrypted(dev) && !gfpflags_allow_blocking(gfp) &&
+ !is_swiotlb_for_alloc(dev))
return dma_direct_alloc_from_pool(dev, size, dma_handle, gfp);
page = __dma_direct_alloc_pages(dev, size, gfp);
@@ -310,7 +335,7 @@ struct page *dma_direct_alloc_pages(struct device *dev, size_t size,
*dma_handle = phys_to_dma_direct(dev, page_to_phys(page));
return page;
out_free_pages:
- dma_free_contiguous(dev, page, size);
+ __dma_direct_free_pages(dev, page, size);
return NULL;
}
@@ -329,7 +354,7 @@ void dma_direct_free_pages(struct device *dev, size_t size,
if (force_dma_unencrypted(dev))
set_memory_encrypted((unsigned long)vaddr, 1 << page_order);
- dma_free_contiguous(dev, page, size);
+ __dma_direct_free_pages(dev, page, size);
}
#if defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_DEVICE) || \
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 93752e752e76..6a7c6e30eb4b 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -464,8 +464,9 @@ static int swiotlb_find_slots(struct device *dev, phys_addr_t orig_addr,
index = wrap = wrap_index(mem, ALIGN(mem->index, stride));
do {
- if ((slot_addr(tbl_dma_addr, index) & iotlb_align_mask) !=
- (orig_addr & iotlb_align_mask)) {
+ if (orig_addr &&
+ (slot_addr(tbl_dma_addr, index) & iotlb_align_mask) !=
+ (orig_addr & iotlb_align_mask)) {
index = wrap_index(mem, index + 1);
continue;
}
@@ -704,3 +705,36 @@ static int __init swiotlb_create_default_debugfs(void)
late_initcall(swiotlb_create_default_debugfs);
#endif
+
+#ifdef CONFIG_DMA_RESTRICTED_POOL
+struct page *swiotlb_alloc(struct device *dev, size_t size)
+{
+ struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
+ phys_addr_t tlb_addr;
+ int index;
+
+ if (!mem)
+ return NULL;
+
+ index = swiotlb_find_slots(dev, 0, size);
+ if (index == -1)
+ return NULL;
+
+ tlb_addr = slot_addr(mem->start, index);
+
+ return pfn_to_page(PFN_DOWN(tlb_addr));
+}
+
+bool swiotlb_free(struct device *dev, struct page *page, size_t size)
+{
+ phys_addr_t tlb_addr = page_to_phys(page);
+
+ if (!is_swiotlb_buffer(dev, tlb_addr))
+ return false;
+
+ swiotlb_release_slots(dev, tlb_addr);
+
+ return true;
+}
+
+#endif /* CONFIG_DMA_RESTRICTED_POOL */
--
2.32.0.288.g62a8d224e6-goog
On Thu, Jun 24, 2021 at 11:55:14PM +0800, Claire Chang wrote:
> This series implements mitigations for lack of DMA access control on
> systems without an IOMMU, which could result in the DMA accessing the
> system memory at unexpected times and/or unexpected addresses, possibly
> leading to data leakage or corruption.
>
> For example, we plan to use the PCI-e bus for Wi-Fi and that PCI-e bus is
> not behind an IOMMU. As PCI-e, by design, gives the device full access to
> system memory, a vulnerability in the Wi-Fi firmware could easily escalate
> to a full system exploit (remote wifi exploits: [1a], [1b] that shows a
> full chain of exploits; [2], [3]).
>
> To mitigate the security concerns, we introduce restricted DMA. Restricted
> DMA utilizes the existing swiotlb to bounce streaming DMA in and out of a
> specially allocated region and does memory allocation from the same region.
> The feature on its own provides a basic level of protection against the DMA
> overwriting buffer contents at unexpected times. However, to protect
> against general data leakage and system memory corruption, the system needs
> to provide a way to restrict the DMA to a predefined memory region (this is
> usually done at firmware level, e.g. MPU in ATF on some ARM platforms [4]).
>
> [1a] https://googleprojectzero.blogspot.com/2017/04/over-air-exploiting-broadcoms-wi-fi_4.html
> [1b] https://googleprojectzero.blogspot.com/2017/04/over-air-exploiting-broadcoms-wi-fi_11.html
> [2] https://blade.tencent.com/en/advisories/qualpwn/
> [3] https://www.bleepingcomputer.com/news/security/vulnerabilities-found-in-highly-popular-firmware-for-wifi-chips/
> [4] https://github.com/ARM-software/arm-trusted-firmware/blob/master/plat/mediatek/mt8183/drivers/emi_mpu/emi_mpu.c#L132
>
> v15:
> - Apply Will's diff (https://lore.kernel.org/patchwork/patch/1448957/#1647521)
> to fix the crash reported by Qian.
> - Add Stefano's Acked-by tag for patch 01/12 from v14
That all should be now be on
https://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb.git/
devel/for-linus-5.14 (and linux-next)
On Fri, Jun 25, 2021 at 3:20 AM Konrad Rzeszutek Wilk
<[email protected]> wrote:
>
> On Thu, Jun 24, 2021 at 11:55:14PM +0800, Claire Chang wrote:
> > This series implements mitigations for lack of DMA access control on
> > systems without an IOMMU, which could result in the DMA accessing the
> > system memory at unexpected times and/or unexpected addresses, possibly
> > leading to data leakage or corruption.
> >
> > For example, we plan to use the PCI-e bus for Wi-Fi and that PCI-e bus is
> > not behind an IOMMU. As PCI-e, by design, gives the device full access to
> > system memory, a vulnerability in the Wi-Fi firmware could easily escalate
> > to a full system exploit (remote wifi exploits: [1a], [1b] that shows a
> > full chain of exploits; [2], [3]).
> >
> > To mitigate the security concerns, we introduce restricted DMA. Restricted
> > DMA utilizes the existing swiotlb to bounce streaming DMA in and out of a
> > specially allocated region and does memory allocation from the same region.
> > The feature on its own provides a basic level of protection against the DMA
> > overwriting buffer contents at unexpected times. However, to protect
> > against general data leakage and system memory corruption, the system needs
> > to provide a way to restrict the DMA to a predefined memory region (this is
> > usually done at firmware level, e.g. MPU in ATF on some ARM platforms [4]).
> >
> > [1a] https://googleprojectzero.blogspot.com/2017/04/over-air-exploiting-broadcoms-wi-fi_4.html
> > [1b] https://googleprojectzero.blogspot.com/2017/04/over-air-exploiting-broadcoms-wi-fi_11.html
> > [2] https://blade.tencent.com/en/advisories/qualpwn/
> > [3] https://www.bleepingcomputer.com/news/security/vulnerabilities-found-in-highly-popular-firmware-for-wifi-chips/
> > [4] https://github.com/ARM-software/arm-trusted-firmware/blob/master/plat/mediatek/mt8183/drivers/emi_mpu/emi_mpu.c#L132
> >
> > v15:
> > - Apply Will's diff (https://lore.kernel.org/patchwork/patch/1448957/#1647521)
> > to fix the crash reported by Qian.
> > - Add Stefano's Acked-by tag for patch 01/12 from v14
>
> That all should be now be on
>
> https://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb.git/
> devel/for-linus-5.14 (and linux-next)
>
devel/for-linus-5.14 looks good. Thanks!
On Thu, Jun 24, 2021 at 03:19:48PM -0400, Konrad Rzeszutek Wilk wrote:
> On Thu, Jun 24, 2021 at 11:55:14PM +0800, Claire Chang wrote:
> > This series implements mitigations for lack of DMA access control on
> > systems without an IOMMU, which could result in the DMA accessing the
> > system memory at unexpected times and/or unexpected addresses, possibly
> > leading to data leakage or corruption.
> >
> > For example, we plan to use the PCI-e bus for Wi-Fi and that PCI-e bus is
> > not behind an IOMMU. As PCI-e, by design, gives the device full access to
> > system memory, a vulnerability in the Wi-Fi firmware could easily escalate
> > to a full system exploit (remote wifi exploits: [1a], [1b] that shows a
> > full chain of exploits; [2], [3]).
> >
> > To mitigate the security concerns, we introduce restricted DMA. Restricted
> > DMA utilizes the existing swiotlb to bounce streaming DMA in and out of a
> > specially allocated region and does memory allocation from the same region.
> > The feature on its own provides a basic level of protection against the DMA
> > overwriting buffer contents at unexpected times. However, to protect
> > against general data leakage and system memory corruption, the system needs
> > to provide a way to restrict the DMA to a predefined memory region (this is
> > usually done at firmware level, e.g. MPU in ATF on some ARM platforms [4]).
> >
> > [1a] https://googleprojectzero.blogspot.com/2017/04/over-air-exploiting-broadcoms-wi-fi_4.html
> > [1b] https://googleprojectzero.blogspot.com/2017/04/over-air-exploiting-broadcoms-wi-fi_11.html
> > [2] https://blade.tencent.com/en/advisories/qualpwn/
> > [3] https://www.bleepingcomputer.com/news/security/vulnerabilities-found-in-highly-popular-firmware-for-wifi-chips/
> > [4] https://github.com/ARM-software/arm-trusted-firmware/blob/master/plat/mediatek/mt8183/drivers/emi_mpu/emi_mpu.c#L132
> >
> > v15:
> > - Apply Will's diff (https://lore.kernel.org/patchwork/patch/1448957/#1647521)
> > to fix the crash reported by Qian.
> > - Add Stefano's Acked-by tag for patch 01/12 from v14
>
> That all should be now be on
>
> https://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb.git/
> devel/for-linus-5.14 (and linux-next)
Thanks Konrad!
Will
On Thu, Jun 24, 2021 at 11:55:20PM +0800, Claire Chang wrote:
> Propagate the swiotlb_force into io_tlb_default_mem->force_bounce and
> use it to determine whether to bounce the data or not. This will be
> useful later to allow for different pools.
>
> Signed-off-by: Claire Chang <[email protected]>
> Reviewed-by: Christoph Hellwig <[email protected]>
> Tested-by: Stefano Stabellini <[email protected]>
> Tested-by: Will Deacon <[email protected]>
> Acked-by: Stefano Stabellini <[email protected]>
This patch as commit af452ec1b1a3 ("swiotlb: Use is_swiotlb_force_bounce
for swiotlb data bouncing") causes my Ryzen 3 4300G system to fail to
get to an X session consistently (although not every single time),
presumably due to a crash in the AMDGPU driver that I see in dmesg.
I have attached logs at af452ec1b1a3 and f127c9556a8e and I am happy
to provide any further information, debug, or test patches as necessary.
Cheers,
Nathan
On Wed, Jun 30, 2021 at 9:43 AM Nathan Chancellor <[email protected]> wrote:
>
> On Thu, Jun 24, 2021 at 11:55:20PM +0800, Claire Chang wrote:
> > Propagate the swiotlb_force into io_tlb_default_mem->force_bounce and
> > use it to determine whether to bounce the data or not. This will be
> > useful later to allow for different pools.
> >
> > Signed-off-by: Claire Chang <[email protected]>
> > Reviewed-by: Christoph Hellwig <[email protected]>
> > Tested-by: Stefano Stabellini <[email protected]>
> > Tested-by: Will Deacon <[email protected]>
> > Acked-by: Stefano Stabellini <[email protected]>
>
> This patch as commit af452ec1b1a3 ("swiotlb: Use is_swiotlb_force_bounce
> for swiotlb data bouncing") causes my Ryzen 3 4300G system to fail to
> get to an X session consistently (although not every single time),
> presumably due to a crash in the AMDGPU driver that I see in dmesg.
>
> I have attached logs at af452ec1b1a3 and f127c9556a8e and I am happy
> to provide any further information, debug, or test patches as necessary.
Are you using swiotlb=force? or the swiotlb_map is called because of
!dma_capable? (https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/kernel/dma/direct.h#n93)
`BUG: unable to handle page fault for address: 00000000003a8290` and
the fact it crashed at `_raw_spin_lock_irqsave` look like the memory
(maybe dev->dma_io_tlb_mem) was corrupted?
The dev->dma_io_tlb_mem should be set here
(https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/drivers/pci/probe.c#n2528)
through device_initialize.
I can't tell what happened from the logs, but maybe we could try KASAN
to see if it provides more clue.
Thanks,
Claire
>
> Cheers,
> Nathan
On Wed, Jun 30, 2021 at 05:17:27PM +0800, Claire Chang wrote:
> On Wed, Jun 30, 2021 at 9:43 AM Nathan Chancellor <[email protected]> wrote:
> >
> > On Thu, Jun 24, 2021 at 11:55:20PM +0800, Claire Chang wrote:
> > > Propagate the swiotlb_force into io_tlb_default_mem->force_bounce and
> > > use it to determine whether to bounce the data or not. This will be
> > > useful later to allow for different pools.
> > >
> > > Signed-off-by: Claire Chang <[email protected]>
> > > Reviewed-by: Christoph Hellwig <[email protected]>
> > > Tested-by: Stefano Stabellini <[email protected]>
> > > Tested-by: Will Deacon <[email protected]>
> > > Acked-by: Stefano Stabellini <[email protected]>
> >
> > This patch as commit af452ec1b1a3 ("swiotlb: Use is_swiotlb_force_bounce
> > for swiotlb data bouncing") causes my Ryzen 3 4300G system to fail to
> > get to an X session consistently (although not every single time),
> > presumably due to a crash in the AMDGPU driver that I see in dmesg.
> >
> > I have attached logs at af452ec1b1a3 and f127c9556a8e and I am happy
> > to provide any further information, debug, or test patches as necessary.
>
> Are you using swiotlb=force? or the swiotlb_map is called because of
> !dma_capable? (https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/kernel/dma/direct.h#n93)
The command line is in the dmesg:
| Kernel command line: initrd=\amd-ucode.img initrd=\initramfs-linux-next-llvm.img root=PARTUUID=8680aa0c-cf09-4a69-8cf3-970478040ee7 rw intel_pstate=no_hwp irqpoll
but I worry that this looks _very_ similar to the issue reported by Qian
Cai which we thought we had fixed. Nathan -- is the failure deterministic?
> `BUG: unable to handle page fault for address: 00000000003a8290` and
> the fact it crashed at `_raw_spin_lock_irqsave` look like the memory
> (maybe dev->dma_io_tlb_mem) was corrupted?
> The dev->dma_io_tlb_mem should be set here
> (https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/drivers/pci/probe.c#n2528)
> through device_initialize.
I'm less sure about this. 'dma_io_tlb_mem' should be pointing at
'io_tlb_default_mem', which is a page-aligned allocation from memblock.
The spinlock is at offset 0x24 in that structure, and looking at the
register dump from the crash:
Jun 29 18:28:42 hp-4300G kernel: RSP: 0018:ffffadb4013db9e8 EFLAGS: 00010006
Jun 29 18:28:42 hp-4300G kernel: RAX: 00000000003a8290 RBX: 0000000000000000 RCX: ffff8900572ad580
Jun 29 18:28:42 hp-4300G kernel: RDX: ffff89005653f024 RSI: 00000000000c0000 RDI: 0000000000001d17
Jun 29 18:28:42 hp-4300G kernel: RBP: 000000000a20d000 R08: 00000000000c0000 R09: 0000000000000000
Jun 29 18:28:42 hp-4300G kernel: R10: 000000000a20d000 R11: ffff89005653f000 R12: 0000000000000212
Jun 29 18:28:42 hp-4300G kernel: R13: 0000000000001000 R14: 0000000000000002 R15: 0000000000200000
Jun 29 18:28:42 hp-4300G kernel: FS: 00007f1f8898ea40(0000) GS:ffff890057280000(0000) knlGS:0000000000000000
Jun 29 18:28:42 hp-4300G kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 29 18:28:42 hp-4300G kernel: CR2: 00000000003a8290 CR3: 00000001020d0000 CR4: 0000000000350ee0
Jun 29 18:28:42 hp-4300G kernel: Call Trace:
Jun 29 18:28:42 hp-4300G kernel: _raw_spin_lock_irqsave+0x39/0x50
Jun 29 18:28:42 hp-4300G kernel: swiotlb_tbl_map_single+0x12b/0x4c0
Then that correlates with R11 holding the 'dma_io_tlb_mem' pointer and
RDX pointing at the spinlock. Yet RAX is holding junk :/
I agree that enabling KASAN would be a good idea, but I also think we
probably need to get some more information out of swiotlb_tbl_map_single()
to see see what exactly is going wrong in there.
Will
Hi Will and Claire,
On Wed, Jun 30, 2021 at 12:43:48PM +0100, Will Deacon wrote:
> On Wed, Jun 30, 2021 at 05:17:27PM +0800, Claire Chang wrote:
> > On Wed, Jun 30, 2021 at 9:43 AM Nathan Chancellor <[email protected]> wrote:
> > >
> > > On Thu, Jun 24, 2021 at 11:55:20PM +0800, Claire Chang wrote:
> > > > Propagate the swiotlb_force into io_tlb_default_mem->force_bounce and
> > > > use it to determine whether to bounce the data or not. This will be
> > > > useful later to allow for different pools.
> > > >
> > > > Signed-off-by: Claire Chang <[email protected]>
> > > > Reviewed-by: Christoph Hellwig <[email protected]>
> > > > Tested-by: Stefano Stabellini <[email protected]>
> > > > Tested-by: Will Deacon <[email protected]>
> > > > Acked-by: Stefano Stabellini <[email protected]>
> > >
> > > This patch as commit af452ec1b1a3 ("swiotlb: Use is_swiotlb_force_bounce
> > > for swiotlb data bouncing") causes my Ryzen 3 4300G system to fail to
> > > get to an X session consistently (although not every single time),
> > > presumably due to a crash in the AMDGPU driver that I see in dmesg.
> > >
> > > I have attached logs at af452ec1b1a3 and f127c9556a8e and I am happy
> > > to provide any further information, debug, or test patches as necessary.
> >
> > Are you using swiotlb=force? or the swiotlb_map is called because of
> > !dma_capable? (https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/kernel/dma/direct.h#n93)
>
> The command line is in the dmesg:
>
> | Kernel command line: initrd=\amd-ucode.img initrd=\initramfs-linux-next-llvm.img root=PARTUUID=8680aa0c-cf09-4a69-8cf3-970478040ee7 rw intel_pstate=no_hwp irqpoll
>
> but I worry that this looks _very_ similar to the issue reported by Qian
> Cai which we thought we had fixed. Nathan -- is the failure deterministic?
Yes, for the most part. It does not happen every single boot so when I
was bisecting, I did a series of seven boots and only considered the
revision good when all seven of them made it to LightDM's greeter. My
results that I notated show most bad revisions failed anywhere from four
to six times.
> > `BUG: unable to handle page fault for address: 00000000003a8290` and
> > the fact it crashed at `_raw_spin_lock_irqsave` look like the memory
> > (maybe dev->dma_io_tlb_mem) was corrupted?
> > The dev->dma_io_tlb_mem should be set here
> > (https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/drivers/pci/probe.c#n2528)
> > through device_initialize.
>
> I'm less sure about this. 'dma_io_tlb_mem' should be pointing at
> 'io_tlb_default_mem', which is a page-aligned allocation from memblock.
> The spinlock is at offset 0x24 in that structure, and looking at the
> register dump from the crash:
>
> Jun 29 18:28:42 hp-4300G kernel: RSP: 0018:ffffadb4013db9e8 EFLAGS: 00010006
> Jun 29 18:28:42 hp-4300G kernel: RAX: 00000000003a8290 RBX: 0000000000000000 RCX: ffff8900572ad580
> Jun 29 18:28:42 hp-4300G kernel: RDX: ffff89005653f024 RSI: 00000000000c0000 RDI: 0000000000001d17
> Jun 29 18:28:42 hp-4300G kernel: RBP: 000000000a20d000 R08: 00000000000c0000 R09: 0000000000000000
> Jun 29 18:28:42 hp-4300G kernel: R10: 000000000a20d000 R11: ffff89005653f000 R12: 0000000000000212
> Jun 29 18:28:42 hp-4300G kernel: R13: 0000000000001000 R14: 0000000000000002 R15: 0000000000200000
> Jun 29 18:28:42 hp-4300G kernel: FS: 00007f1f8898ea40(0000) GS:ffff890057280000(0000) knlGS:0000000000000000
> Jun 29 18:28:42 hp-4300G kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> Jun 29 18:28:42 hp-4300G kernel: CR2: 00000000003a8290 CR3: 00000001020d0000 CR4: 0000000000350ee0
> Jun 29 18:28:42 hp-4300G kernel: Call Trace:
> Jun 29 18:28:42 hp-4300G kernel: _raw_spin_lock_irqsave+0x39/0x50
> Jun 29 18:28:42 hp-4300G kernel: swiotlb_tbl_map_single+0x12b/0x4c0
>
> Then that correlates with R11 holding the 'dma_io_tlb_mem' pointer and
> RDX pointing at the spinlock. Yet RAX is holding junk :/
>
> I agree that enabling KASAN would be a good idea, but I also think we
> probably need to get some more information out of swiotlb_tbl_map_single()
> to see see what exactly is going wrong in there.
I can certainly enable KASAN and if there is any debug print I can add
or dump anything, let me know!
Cheers,
Nathan
On Wed, Jun 30, 2021 at 08:56:51AM -0700, Nathan Chancellor wrote:
> On Wed, Jun 30, 2021 at 12:43:48PM +0100, Will Deacon wrote:
> > On Wed, Jun 30, 2021 at 05:17:27PM +0800, Claire Chang wrote:
> > > `BUG: unable to handle page fault for address: 00000000003a8290` and
> > > the fact it crashed at `_raw_spin_lock_irqsave` look like the memory
> > > (maybe dev->dma_io_tlb_mem) was corrupted?
> > > The dev->dma_io_tlb_mem should be set here
> > > (https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/drivers/pci/probe.c#n2528)
> > > through device_initialize.
> >
> > I'm less sure about this. 'dma_io_tlb_mem' should be pointing at
> > 'io_tlb_default_mem', which is a page-aligned allocation from memblock.
> > The spinlock is at offset 0x24 in that structure, and looking at the
> > register dump from the crash:
> >
> > Jun 29 18:28:42 hp-4300G kernel: RSP: 0018:ffffadb4013db9e8 EFLAGS: 00010006
> > Jun 29 18:28:42 hp-4300G kernel: RAX: 00000000003a8290 RBX: 0000000000000000 RCX: ffff8900572ad580
> > Jun 29 18:28:42 hp-4300G kernel: RDX: ffff89005653f024 RSI: 00000000000c0000 RDI: 0000000000001d17
> > Jun 29 18:28:42 hp-4300G kernel: RBP: 000000000a20d000 R08: 00000000000c0000 R09: 0000000000000000
> > Jun 29 18:28:42 hp-4300G kernel: R10: 000000000a20d000 R11: ffff89005653f000 R12: 0000000000000212
> > Jun 29 18:28:42 hp-4300G kernel: R13: 0000000000001000 R14: 0000000000000002 R15: 0000000000200000
> > Jun 29 18:28:42 hp-4300G kernel: FS: 00007f1f8898ea40(0000) GS:ffff890057280000(0000) knlGS:0000000000000000
> > Jun 29 18:28:42 hp-4300G kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > Jun 29 18:28:42 hp-4300G kernel: CR2: 00000000003a8290 CR3: 00000001020d0000 CR4: 0000000000350ee0
> > Jun 29 18:28:42 hp-4300G kernel: Call Trace:
> > Jun 29 18:28:42 hp-4300G kernel: _raw_spin_lock_irqsave+0x39/0x50
> > Jun 29 18:28:42 hp-4300G kernel: swiotlb_tbl_map_single+0x12b/0x4c0
> >
> > Then that correlates with R11 holding the 'dma_io_tlb_mem' pointer and
> > RDX pointing at the spinlock. Yet RAX is holding junk :/
> >
> > I agree that enabling KASAN would be a good idea, but I also think we
> > probably need to get some more information out of swiotlb_tbl_map_single()
> > to see see what exactly is going wrong in there.
>
> I can certainly enable KASAN and if there is any debug print I can add
> or dump anything, let me know!
I bit the bullet and took v5.13 with swiotlb/for-linus-5.14 merged in, built
x86 defconfig and ran it on my laptop. However, it seems to work fine!
Please can you share your .config?
Will
On 7/1/2021 12:40 AM, Will Deacon wrote:
> On Wed, Jun 30, 2021 at 08:56:51AM -0700, Nathan Chancellor wrote:
>> On Wed, Jun 30, 2021 at 12:43:48PM +0100, Will Deacon wrote:
>>> On Wed, Jun 30, 2021 at 05:17:27PM +0800, Claire Chang wrote:
>>>> `BUG: unable to handle page fault for address: 00000000003a8290` and
>>>> the fact it crashed at `_raw_spin_lock_irqsave` look like the memory
>>>> (maybe dev->dma_io_tlb_mem) was corrupted?
>>>> The dev->dma_io_tlb_mem should be set here
>>>> (https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/drivers/pci/probe.c#n2528)
>>>> through device_initialize.
>>>
>>> I'm less sure about this. 'dma_io_tlb_mem' should be pointing at
>>> 'io_tlb_default_mem', which is a page-aligned allocation from memblock.
>>> The spinlock is at offset 0x24 in that structure, and looking at the
>>> register dump from the crash:
>>>
>>> Jun 29 18:28:42 hp-4300G kernel: RSP: 0018:ffffadb4013db9e8 EFLAGS: 00010006
>>> Jun 29 18:28:42 hp-4300G kernel: RAX: 00000000003a8290 RBX: 0000000000000000 RCX: ffff8900572ad580
>>> Jun 29 18:28:42 hp-4300G kernel: RDX: ffff89005653f024 RSI: 00000000000c0000 RDI: 0000000000001d17
>>> Jun 29 18:28:42 hp-4300G kernel: RBP: 000000000a20d000 R08: 00000000000c0000 R09: 0000000000000000
>>> Jun 29 18:28:42 hp-4300G kernel: R10: 000000000a20d000 R11: ffff89005653f000 R12: 0000000000000212
>>> Jun 29 18:28:42 hp-4300G kernel: R13: 0000000000001000 R14: 0000000000000002 R15: 0000000000200000
>>> Jun 29 18:28:42 hp-4300G kernel: FS: 00007f1f8898ea40(0000) GS:ffff890057280000(0000) knlGS:0000000000000000
>>> Jun 29 18:28:42 hp-4300G kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> Jun 29 18:28:42 hp-4300G kernel: CR2: 00000000003a8290 CR3: 00000001020d0000 CR4: 0000000000350ee0
>>> Jun 29 18:28:42 hp-4300G kernel: Call Trace:
>>> Jun 29 18:28:42 hp-4300G kernel: _raw_spin_lock_irqsave+0x39/0x50
>>> Jun 29 18:28:42 hp-4300G kernel: swiotlb_tbl_map_single+0x12b/0x4c0
>>>
>>> Then that correlates with R11 holding the 'dma_io_tlb_mem' pointer and
>>> RDX pointing at the spinlock. Yet RAX is holding junk :/
>>>
>>> I agree that enabling KASAN would be a good idea, but I also think we
>>> probably need to get some more information out of swiotlb_tbl_map_single()
>>> to see see what exactly is going wrong in there.
>>
>> I can certainly enable KASAN and if there is any debug print I can add
>> or dump anything, let me know!
>
> I bit the bullet and took v5.13 with swiotlb/for-linus-5.14 merged in, built
> x86 defconfig and ran it on my laptop. However, it seems to work fine!
>
> Please can you share your .config?
Sure thing, it is attached. It is just Arch Linux's config run through
olddefconfig. The original is below in case you need to diff it.
https://raw.githubusercontent.com/archlinux/svntogit-packages/9045405dc835527164f3034b3ceb9a67c7a53cd4/trunk/config
If there is anything more that I can provide, please let me know.
Cheers,
Nathan
Hi,
On Thu, Jun 24, 2021 at 11:55:26PM +0800, Claire Chang wrote:
> If a device is not behind an IOMMU, we look up the device node and set
> up the restricted DMA when the restricted-dma-pool is presented.
>
> Signed-off-by: Claire Chang <[email protected]>
> Tested-by: Stefano Stabellini <[email protected]>
> Tested-by: Will Deacon <[email protected]>
With this patch in place, all sparc and sparc64 qemu emulations
fail to boot. Symptom is that the root file system is not found.
Reverting this patch fixes the problem. Bisect log is attached.
Guenter
---
# bad: [fb0ca446157a86b75502c1636b0d81e642fe6bf1] Add linux-next specific files for 20210701
# good: [62fb9874f5da54fdb243003b386128037319b219] Linux 5.13
git bisect start 'HEAD' 'v5.13'
# bad: [f63c4fda987a19b1194cc45cb72fd5bf968d9d90] Merge remote-tracking branch 'rdma/for-next'
git bisect bad f63c4fda987a19b1194cc45cb72fd5bf968d9d90
# good: [46bb5dd1d2a63e906e374e97dfd4a5e33934b1c4] Merge remote-tracking branch 'ipsec/master'
git bisect good 46bb5dd1d2a63e906e374e97dfd4a5e33934b1c4
# good: [43ba6969cfb8185353a7a6fc79070f13b9e3d6d3] Merge remote-tracking branch 'clk/clk-next'
git bisect good 43ba6969cfb8185353a7a6fc79070f13b9e3d6d3
# good: [1ca5eddcf8dca1d6345471c6404e7364af0d7019] Merge remote-tracking branch 'fuse/for-next'
git bisect good 1ca5eddcf8dca1d6345471c6404e7364af0d7019
# good: [8f6d7b3248705920187263a4e7147b0752ec7dcf] Merge remote-tracking branch 'pci/next'
git bisect good 8f6d7b3248705920187263a4e7147b0752ec7dcf
# good: [df1885a755784da3ef285f36d9230c1d090ef186] RDMA/rtrs_clt: Alloc less memory with write path fast memory registration
git bisect good df1885a755784da3ef285f36d9230c1d090ef186
# good: [93d31efb58c8ad4a66bbedbc2d082df458c04e45] Merge remote-tracking branch 'cpufreq-arm/cpufreq/arm/linux-next'
git bisect good 93d31efb58c8ad4a66bbedbc2d082df458c04e45
# good: [46308965ae6fdc7c25deb2e8c048510ae51bbe66] RDMA/irdma: Check contents of user-space irdma_mem_reg_req object
git bisect good 46308965ae6fdc7c25deb2e8c048510ae51bbe66
# good: [6de7a1d006ea9db235492b288312838d6878385f] thermal/drivers/int340x/processor_thermal: Split enumeration and processing part
git bisect good 6de7a1d006ea9db235492b288312838d6878385f
# good: [081bec2577cda3d04f6559c60b6f4e2242853520] dt-bindings: of: Add restricted DMA pool
git bisect good 081bec2577cda3d04f6559c60b6f4e2242853520
# good: [bf95ac0bcd69979af146852f6a617a60285ebbc1] Merge remote-tracking branch 'thermal/thermal/linux-next'
git bisect good bf95ac0bcd69979af146852f6a617a60285ebbc1
# good: [3d8287544223a3d2f37981c1f9ffd94d0b5e9ffc] RDMA/core: Always release restrack object
git bisect good 3d8287544223a3d2f37981c1f9ffd94d0b5e9ffc
# bad: [cff1f23fad6e0bd7d671acce0d15285c709f259c] Merge remote-tracking branch 'swiotlb/linux-next'
git bisect bad cff1f23fad6e0bd7d671acce0d15285c709f259c
# bad: [b655006619b7bccd0dc1e055bd72de5d613e7b5c] of: Add plumbing for restricted DMA pool
git bisect bad b655006619b7bccd0dc1e055bd72de5d613e7b5c
# first bad commit: [b655006619b7bccd0dc1e055bd72de5d613e7b5c] of: Add plumbing for restricted DMA pool
On 2021-07-02 04:08, Guenter Roeck wrote:
> Hi,
>
> On Thu, Jun 24, 2021 at 11:55:26PM +0800, Claire Chang wrote:
>> If a device is not behind an IOMMU, we look up the device node and set
>> up the restricted DMA when the restricted-dma-pool is presented.
>>
>> Signed-off-by: Claire Chang <[email protected]>
>> Tested-by: Stefano Stabellini <[email protected]>
>> Tested-by: Will Deacon <[email protected]>
>
> With this patch in place, all sparc and sparc64 qemu emulations
> fail to boot. Symptom is that the root file system is not found.
> Reverting this patch fixes the problem. Bisect log is attached.
Ah, OF_ADDRESS depends on !SPARC, so of_dma_configure_id() is presumably
returning an unexpected -ENODEV from the of_dma_set_restricted_buffer()
stub. That should probably be returning 0 instead, since either way it's
not an error condition for it to simply do nothing.
Robin.
>
> Guenter
>
> ---
> # bad: [fb0ca446157a86b75502c1636b0d81e642fe6bf1] Add linux-next specific files for 20210701
> # good: [62fb9874f5da54fdb243003b386128037319b219] Linux 5.13
> git bisect start 'HEAD' 'v5.13'
> # bad: [f63c4fda987a19b1194cc45cb72fd5bf968d9d90] Merge remote-tracking branch 'rdma/for-next'
> git bisect bad f63c4fda987a19b1194cc45cb72fd5bf968d9d90
> # good: [46bb5dd1d2a63e906e374e97dfd4a5e33934b1c4] Merge remote-tracking branch 'ipsec/master'
> git bisect good 46bb5dd1d2a63e906e374e97dfd4a5e33934b1c4
> # good: [43ba6969cfb8185353a7a6fc79070f13b9e3d6d3] Merge remote-tracking branch 'clk/clk-next'
> git bisect good 43ba6969cfb8185353a7a6fc79070f13b9e3d6d3
> # good: [1ca5eddcf8dca1d6345471c6404e7364af0d7019] Merge remote-tracking branch 'fuse/for-next'
> git bisect good 1ca5eddcf8dca1d6345471c6404e7364af0d7019
> # good: [8f6d7b3248705920187263a4e7147b0752ec7dcf] Merge remote-tracking branch 'pci/next'
> git bisect good 8f6d7b3248705920187263a4e7147b0752ec7dcf
> # good: [df1885a755784da3ef285f36d9230c1d090ef186] RDMA/rtrs_clt: Alloc less memory with write path fast memory registration
> git bisect good df1885a755784da3ef285f36d9230c1d090ef186
> # good: [93d31efb58c8ad4a66bbedbc2d082df458c04e45] Merge remote-tracking branch 'cpufreq-arm/cpufreq/arm/linux-next'
> git bisect good 93d31efb58c8ad4a66bbedbc2d082df458c04e45
> # good: [46308965ae6fdc7c25deb2e8c048510ae51bbe66] RDMA/irdma: Check contents of user-space irdma_mem_reg_req object
> git bisect good 46308965ae6fdc7c25deb2e8c048510ae51bbe66
> # good: [6de7a1d006ea9db235492b288312838d6878385f] thermal/drivers/int340x/processor_thermal: Split enumeration and processing part
> git bisect good 6de7a1d006ea9db235492b288312838d6878385f
> # good: [081bec2577cda3d04f6559c60b6f4e2242853520] dt-bindings: of: Add restricted DMA pool
> git bisect good 081bec2577cda3d04f6559c60b6f4e2242853520
> # good: [bf95ac0bcd69979af146852f6a617a60285ebbc1] Merge remote-tracking branch 'thermal/thermal/linux-next'
> git bisect good bf95ac0bcd69979af146852f6a617a60285ebbc1
> # good: [3d8287544223a3d2f37981c1f9ffd94d0b5e9ffc] RDMA/core: Always release restrack object
> git bisect good 3d8287544223a3d2f37981c1f9ffd94d0b5e9ffc
> # bad: [cff1f23fad6e0bd7d671acce0d15285c709f259c] Merge remote-tracking branch 'swiotlb/linux-next'
> git bisect bad cff1f23fad6e0bd7d671acce0d15285c709f259c
> # bad: [b655006619b7bccd0dc1e055bd72de5d613e7b5c] of: Add plumbing for restricted DMA pool
> git bisect bad b655006619b7bccd0dc1e055bd72de5d613e7b5c
> # first bad commit: [b655006619b7bccd0dc1e055bd72de5d613e7b5c] of: Add plumbing for restricted DMA pool
>
On Fri, Jul 02, 2021 at 12:39:41PM +0100, Robin Murphy wrote:
> On 2021-07-02 04:08, Guenter Roeck wrote:
> > On Thu, Jun 24, 2021 at 11:55:26PM +0800, Claire Chang wrote:
> > > If a device is not behind an IOMMU, we look up the device node and set
> > > up the restricted DMA when the restricted-dma-pool is presented.
> > >
> > > Signed-off-by: Claire Chang <[email protected]>
> > > Tested-by: Stefano Stabellini <[email protected]>
> > > Tested-by: Will Deacon <[email protected]>
> >
> > With this patch in place, all sparc and sparc64 qemu emulations
> > fail to boot. Symptom is that the root file system is not found.
> > Reverting this patch fixes the problem. Bisect log is attached.
>
> Ah, OF_ADDRESS depends on !SPARC, so of_dma_configure_id() is presumably
> returning an unexpected -ENODEV from the of_dma_set_restricted_buffer()
> stub. That should probably be returning 0 instead, since either way it's not
> an error condition for it to simply do nothing.
Something like below?
Will
--->8
From 4d9dcb9210c1f37435b6088284e04b6b36ee8c4d Mon Sep 17 00:00:00 2001
From: Will Deacon <[email protected]>
Date: Fri, 2 Jul 2021 14:13:28 +0100
Subject: [PATCH] of: Return success from of_dma_set_restricted_buffer() when
!OF_ADDRESS
When CONFIG_OF_ADDRESS=n, of_dma_set_restricted_buffer() returns -ENODEV
and breaks the boot for sparc[64] machines. Return 0 instead, since the
function is essentially a glorified NOP in this configuration.
Cc: Claire Chang <[email protected]>
Cc: Konrad Rzeszutek Wilk <[email protected]>
Reported-by: Guenter Roeck <[email protected]>
Suggested-by: Robin Murphy <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Will Deacon <[email protected]>
---
drivers/of/of_private.h | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/of/of_private.h b/drivers/of/of_private.h
index 8fde97565d11..34dd548c5eac 100644
--- a/drivers/of/of_private.h
+++ b/drivers/of/of_private.h
@@ -173,7 +173,8 @@ static inline int of_dma_get_range(struct device_node *np,
static inline int of_dma_set_restricted_buffer(struct device *dev,
struct device_node *np)
{
- return -ENODEV;
+ /* Do nothing, successfully. */
+ return 0;
}
#endif
--
2.32.0.93.g670b81a890-goog
On 7/2/21 6:18 AM, Will Deacon wrote:
> On Fri, Jul 02, 2021 at 12:39:41PM +0100, Robin Murphy wrote:
>> On 2021-07-02 04:08, Guenter Roeck wrote:
>>> On Thu, Jun 24, 2021 at 11:55:26PM +0800, Claire Chang wrote:
>>>> If a device is not behind an IOMMU, we look up the device node and set
>>>> up the restricted DMA when the restricted-dma-pool is presented.
>>>>
>>>> Signed-off-by: Claire Chang <[email protected]>
>>>> Tested-by: Stefano Stabellini <[email protected]>
>>>> Tested-by: Will Deacon <[email protected]>
>>>
>>> With this patch in place, all sparc and sparc64 qemu emulations
>>> fail to boot. Symptom is that the root file system is not found.
>>> Reverting this patch fixes the problem. Bisect log is attached.
>>
>> Ah, OF_ADDRESS depends on !SPARC, so of_dma_configure_id() is presumably
>> returning an unexpected -ENODEV from the of_dma_set_restricted_buffer()
>> stub. That should probably be returning 0 instead, since either way it's not
>> an error condition for it to simply do nothing.
>
> Something like below?
>
Yes, that does the trick.
> Will
>
> --->8
>
>>From 4d9dcb9210c1f37435b6088284e04b6b36ee8c4d Mon Sep 17 00:00:00 2001
> From: Will Deacon <[email protected]>
> Date: Fri, 2 Jul 2021 14:13:28 +0100
> Subject: [PATCH] of: Return success from of_dma_set_restricted_buffer() when
> !OF_ADDRESS
>
> When CONFIG_OF_ADDRESS=n, of_dma_set_restricted_buffer() returns -ENODEV
> and breaks the boot for sparc[64] machines. Return 0 instead, since the
> function is essentially a glorified NOP in this configuration.
>
> Cc: Claire Chang <[email protected]>
> Cc: Konrad Rzeszutek Wilk <[email protected]>
> Reported-by: Guenter Roeck <[email protected]>
> Suggested-by: Robin Murphy <[email protected]>
> Link: https://lore.kernel.org/r/[email protected]
> Signed-off-by: Will Deacon <[email protected]>
Tested-by: Guenter Roeck <[email protected]>
> ---
> drivers/of/of_private.h | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/of/of_private.h b/drivers/of/of_private.h
> index 8fde97565d11..34dd548c5eac 100644
> --- a/drivers/of/of_private.h
> +++ b/drivers/of/of_private.h
> @@ -173,7 +173,8 @@ static inline int of_dma_get_range(struct device_node *np,
> static inline int of_dma_set_restricted_buffer(struct device *dev,
> struct device_node *np)
> {
> - return -ENODEV;
> + /* Do nothing, successfully. */
> + return 0;
> }
> #endif
>
>
Hi Nathan,
On Thu, Jul 01, 2021 at 12:52:20AM -0700, Nathan Chancellor wrote:
> On 7/1/2021 12:40 AM, Will Deacon wrote:
> > On Wed, Jun 30, 2021 at 08:56:51AM -0700, Nathan Chancellor wrote:
> > > On Wed, Jun 30, 2021 at 12:43:48PM +0100, Will Deacon wrote:
> > > > On Wed, Jun 30, 2021 at 05:17:27PM +0800, Claire Chang wrote:
> > > > > `BUG: unable to handle page fault for address: 00000000003a8290` and
> > > > > the fact it crashed at `_raw_spin_lock_irqsave` look like the memory
> > > > > (maybe dev->dma_io_tlb_mem) was corrupted?
> > > > > The dev->dma_io_tlb_mem should be set here
> > > > > (https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/drivers/pci/probe.c#n2528)
> > > > > through device_initialize.
> > > >
> > > > I'm less sure about this. 'dma_io_tlb_mem' should be pointing at
> > > > 'io_tlb_default_mem', which is a page-aligned allocation from memblock.
> > > > The spinlock is at offset 0x24 in that structure, and looking at the
> > > > register dump from the crash:
> > > >
> > > > Jun 29 18:28:42 hp-4300G kernel: RSP: 0018:ffffadb4013db9e8 EFLAGS: 00010006
> > > > Jun 29 18:28:42 hp-4300G kernel: RAX: 00000000003a8290 RBX: 0000000000000000 RCX: ffff8900572ad580
> > > > Jun 29 18:28:42 hp-4300G kernel: RDX: ffff89005653f024 RSI: 00000000000c0000 RDI: 0000000000001d17
> > > > Jun 29 18:28:42 hp-4300G kernel: RBP: 000000000a20d000 R08: 00000000000c0000 R09: 0000000000000000
> > > > Jun 29 18:28:42 hp-4300G kernel: R10: 000000000a20d000 R11: ffff89005653f000 R12: 0000000000000212
> > > > Jun 29 18:28:42 hp-4300G kernel: R13: 0000000000001000 R14: 0000000000000002 R15: 0000000000200000
> > > > Jun 29 18:28:42 hp-4300G kernel: FS: 00007f1f8898ea40(0000) GS:ffff890057280000(0000) knlGS:0000000000000000
> > > > Jun 29 18:28:42 hp-4300G kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > Jun 29 18:28:42 hp-4300G kernel: CR2: 00000000003a8290 CR3: 00000001020d0000 CR4: 0000000000350ee0
> > > > Jun 29 18:28:42 hp-4300G kernel: Call Trace:
> > > > Jun 29 18:28:42 hp-4300G kernel: _raw_spin_lock_irqsave+0x39/0x50
> > > > Jun 29 18:28:42 hp-4300G kernel: swiotlb_tbl_map_single+0x12b/0x4c0
> > > >
> > > > Then that correlates with R11 holding the 'dma_io_tlb_mem' pointer and
> > > > RDX pointing at the spinlock. Yet RAX is holding junk :/
> > > >
> > > > I agree that enabling KASAN would be a good idea, but I also think we
> > > > probably need to get some more information out of swiotlb_tbl_map_single()
> > > > to see see what exactly is going wrong in there.
> > >
> > > I can certainly enable KASAN and if there is any debug print I can add
> > > or dump anything, let me know!
> >
> > I bit the bullet and took v5.13 with swiotlb/for-linus-5.14 merged in, built
> > x86 defconfig and ran it on my laptop. However, it seems to work fine!
> >
> > Please can you share your .config?
>
> Sure thing, it is attached. It is just Arch Linux's config run through
> olddefconfig. The original is below in case you need to diff it.
>
> https://raw.githubusercontent.com/archlinux/svntogit-packages/9045405dc835527164f3034b3ceb9a67c7a53cd4/trunk/config
>
> If there is anything more that I can provide, please let me know.
I eventually got this booting (for some reason it was causing LD to SEGV
trying to link it for a while...) and sadly it works fine on my laptop. Hmm.
Did you manage to try again with KASAN?
It might also be worth taking the IOMMU out of the equation, since that
interfaces differently with SWIOTLB and I couldn't figure out the code path
from the log you provided. What happens if you boot with "amd_iommu=off
swiotlb=force"?
(although word of warning here: i915 dies horribly on my laptop if I pass
swiotlb=force, even with the distro 5.10 kernel)
Will
On 2021-07-02 14:58, Will Deacon wrote:
> Hi Nathan,
>
> On Thu, Jul 01, 2021 at 12:52:20AM -0700, Nathan Chancellor wrote:
>> On 7/1/2021 12:40 AM, Will Deacon wrote:
>>> On Wed, Jun 30, 2021 at 08:56:51AM -0700, Nathan Chancellor wrote:
>>>> On Wed, Jun 30, 2021 at 12:43:48PM +0100, Will Deacon wrote:
>>>>> On Wed, Jun 30, 2021 at 05:17:27PM +0800, Claire Chang wrote:
>>>>>> `BUG: unable to handle page fault for address: 00000000003a8290` and
>>>>>> the fact it crashed at `_raw_spin_lock_irqsave` look like the memory
>>>>>> (maybe dev->dma_io_tlb_mem) was corrupted?
>>>>>> The dev->dma_io_tlb_mem should be set here
>>>>>> (https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/drivers/pci/probe.c#n2528)
>>>>>> through device_initialize.
>>>>>
>>>>> I'm less sure about this. 'dma_io_tlb_mem' should be pointing at
>>>>> 'io_tlb_default_mem', which is a page-aligned allocation from memblock.
>>>>> The spinlock is at offset 0x24 in that structure, and looking at the
>>>>> register dump from the crash:
>>>>>
>>>>> Jun 29 18:28:42 hp-4300G kernel: RSP: 0018:ffffadb4013db9e8 EFLAGS: 00010006
>>>>> Jun 29 18:28:42 hp-4300G kernel: RAX: 00000000003a8290 RBX: 0000000000000000 RCX: ffff8900572ad580
>>>>> Jun 29 18:28:42 hp-4300G kernel: RDX: ffff89005653f024 RSI: 00000000000c0000 RDI: 0000000000001d17
>>>>> Jun 29 18:28:42 hp-4300G kernel: RBP: 000000000a20d000 R08: 00000000000c0000 R09: 0000000000000000
>>>>> Jun 29 18:28:42 hp-4300G kernel: R10: 000000000a20d000 R11: ffff89005653f000 R12: 0000000000000212
>>>>> Jun 29 18:28:42 hp-4300G kernel: R13: 0000000000001000 R14: 0000000000000002 R15: 0000000000200000
>>>>> Jun 29 18:28:42 hp-4300G kernel: FS: 00007f1f8898ea40(0000) GS:ffff890057280000(0000) knlGS:0000000000000000
>>>>> Jun 29 18:28:42 hp-4300G kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>> Jun 29 18:28:42 hp-4300G kernel: CR2: 00000000003a8290 CR3: 00000001020d0000 CR4: 0000000000350ee0
>>>>> Jun 29 18:28:42 hp-4300G kernel: Call Trace:
>>>>> Jun 29 18:28:42 hp-4300G kernel: _raw_spin_lock_irqsave+0x39/0x50
>>>>> Jun 29 18:28:42 hp-4300G kernel: swiotlb_tbl_map_single+0x12b/0x4c0
>>>>>
>>>>> Then that correlates with R11 holding the 'dma_io_tlb_mem' pointer and
>>>>> RDX pointing at the spinlock. Yet RAX is holding junk :/
>>>>>
>>>>> I agree that enabling KASAN would be a good idea, but I also think we
>>>>> probably need to get some more information out of swiotlb_tbl_map_single()
>>>>> to see see what exactly is going wrong in there.
>>>>
>>>> I can certainly enable KASAN and if there is any debug print I can add
>>>> or dump anything, let me know!
>>>
>>> I bit the bullet and took v5.13 with swiotlb/for-linus-5.14 merged in, built
>>> x86 defconfig and ran it on my laptop. However, it seems to work fine!
>>>
>>> Please can you share your .config?
>>
>> Sure thing, it is attached. It is just Arch Linux's config run through
>> olddefconfig. The original is below in case you need to diff it.
>>
>> https://raw.githubusercontent.com/archlinux/svntogit-packages/9045405dc835527164f3034b3ceb9a67c7a53cd4/trunk/config
>>
>> If there is anything more that I can provide, please let me know.
>
> I eventually got this booting (for some reason it was causing LD to SEGV
> trying to link it for a while...) and sadly it works fine on my laptop. Hmm.
>
> Did you manage to try again with KASAN?
>
> It might also be worth taking the IOMMU out of the equation, since that
> interfaces differently with SWIOTLB and I couldn't figure out the code path
> from the log you provided. What happens if you boot with "amd_iommu=off
> swiotlb=force"?
Oh, now there's a thing... the chat from the IOMMU API in the boot log
implies that the IOMMU *should* be in the picture - we see that default
domains are IOMMU_DOMAIN_DMA default and the GPU 0000:0c:00.0 was added
to a group. That means dev->dma_ops should be set and DMA API calls
should be going through iommu-dma, yet the callstack in the crash says
we've gone straight from dma_map_page_attrs() to swiotlb_map(), implying
the inline dma_direct_map_page() path.
If dev->dma_ops didn't look right in the first place, it's perhaps less
surprising that dev->dma_io_tlb_mem might be wild as well. It doesn't
seem plausible that we should have a race between initialising the
device and probing its driver, so maybe the whole dev pointer is getting
trampled earlier in the callchain (or is fundamentally wrong to begin
with, but from a quick skim of the amdgpu code it did look like
adev->dev and adev->pdev are appropriately set early on by
amdgpu_pci_probe()).
> (although word of warning here: i915 dies horribly on my laptop if I pass
> swiotlb=force, even with the distro 5.10 kernel)
FWIW I'd imagine you probably need to massively increase the SWIOTLB
buffer size to have hope of that working.
Robin.
Hi Will and Robin,
On Fri, Jul 02, 2021 at 04:13:50PM +0100, Robin Murphy wrote:
> On 2021-07-02 14:58, Will Deacon wrote:
> > Hi Nathan,
> >
> > On Thu, Jul 01, 2021 at 12:52:20AM -0700, Nathan Chancellor wrote:
> > > On 7/1/2021 12:40 AM, Will Deacon wrote:
> > > > On Wed, Jun 30, 2021 at 08:56:51AM -0700, Nathan Chancellor wrote:
> > > > > On Wed, Jun 30, 2021 at 12:43:48PM +0100, Will Deacon wrote:
> > > > > > On Wed, Jun 30, 2021 at 05:17:27PM +0800, Claire Chang wrote:
> > > > > > > `BUG: unable to handle page fault for address: 00000000003a8290` and
> > > > > > > the fact it crashed at `_raw_spin_lock_irqsave` look like the memory
> > > > > > > (maybe dev->dma_io_tlb_mem) was corrupted?
> > > > > > > The dev->dma_io_tlb_mem should be set here
> > > > > > > (https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/drivers/pci/probe.c#n2528)
> > > > > > > through device_initialize.
> > > > > >
> > > > > > I'm less sure about this. 'dma_io_tlb_mem' should be pointing at
> > > > > > 'io_tlb_default_mem', which is a page-aligned allocation from memblock.
> > > > > > The spinlock is at offset 0x24 in that structure, and looking at the
> > > > > > register dump from the crash:
> > > > > >
> > > > > > Jun 29 18:28:42 hp-4300G kernel: RSP: 0018:ffffadb4013db9e8 EFLAGS: 00010006
> > > > > > Jun 29 18:28:42 hp-4300G kernel: RAX: 00000000003a8290 RBX: 0000000000000000 RCX: ffff8900572ad580
> > > > > > Jun 29 18:28:42 hp-4300G kernel: RDX: ffff89005653f024 RSI: 00000000000c0000 RDI: 0000000000001d17
> > > > > > Jun 29 18:28:42 hp-4300G kernel: RBP: 000000000a20d000 R08: 00000000000c0000 R09: 0000000000000000
> > > > > > Jun 29 18:28:42 hp-4300G kernel: R10: 000000000a20d000 R11: ffff89005653f000 R12: 0000000000000212
> > > > > > Jun 29 18:28:42 hp-4300G kernel: R13: 0000000000001000 R14: 0000000000000002 R15: 0000000000200000
> > > > > > Jun 29 18:28:42 hp-4300G kernel: FS: 00007f1f8898ea40(0000) GS:ffff890057280000(0000) knlGS:0000000000000000
> > > > > > Jun 29 18:28:42 hp-4300G kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > > > Jun 29 18:28:42 hp-4300G kernel: CR2: 00000000003a8290 CR3: 00000001020d0000 CR4: 0000000000350ee0
> > > > > > Jun 29 18:28:42 hp-4300G kernel: Call Trace:
> > > > > > Jun 29 18:28:42 hp-4300G kernel: _raw_spin_lock_irqsave+0x39/0x50
> > > > > > Jun 29 18:28:42 hp-4300G kernel: swiotlb_tbl_map_single+0x12b/0x4c0
> > > > > >
> > > > > > Then that correlates with R11 holding the 'dma_io_tlb_mem' pointer and
> > > > > > RDX pointing at the spinlock. Yet RAX is holding junk :/
> > > > > >
> > > > > > I agree that enabling KASAN would be a good idea, but I also think we
> > > > > > probably need to get some more information out of swiotlb_tbl_map_single()
> > > > > > to see see what exactly is going wrong in there.
> > > > >
> > > > > I can certainly enable KASAN and if there is any debug print I can add
> > > > > or dump anything, let me know!
> > > >
> > > > I bit the bullet and took v5.13 with swiotlb/for-linus-5.14 merged in, built
> > > > x86 defconfig and ran it on my laptop. However, it seems to work fine!
> > > >
> > > > Please can you share your .config?
> > >
> > > Sure thing, it is attached. It is just Arch Linux's config run through
> > > olddefconfig. The original is below in case you need to diff it.
> > >
> > > https://raw.githubusercontent.com/archlinux/svntogit-packages/9045405dc835527164f3034b3ceb9a67c7a53cd4/trunk/config
> > >
> > > If there is anything more that I can provide, please let me know.
> >
> > I eventually got this booting (for some reason it was causing LD to SEGV
> > trying to link it for a while...) and sadly it works fine on my laptop. Hmm.
Seems like it might be something specific to the amdgpu module?
> > Did you manage to try again with KASAN?
Yes, it took a few times to reproduce the issue but I did manage to get
a dmesg, please find it attached. I build from commit 7d31f1c65cc9 ("swiotlb:
fix implicit debugfs declarations") in Konrad's tree.
> > It might also be worth taking the IOMMU out of the equation, since that
> > interfaces differently with SWIOTLB and I couldn't figure out the code path
> > from the log you provided. What happens if you boot with "amd_iommu=off
> > swiotlb=force"?
>
> Oh, now there's a thing... the chat from the IOMMU API in the boot log
> implies that the IOMMU *should* be in the picture - we see that default
> domains are IOMMU_DOMAIN_DMA default and the GPU 0000:0c:00.0 was added to a
> group. That means dev->dma_ops should be set and DMA API calls should be
> going through iommu-dma, yet the callstack in the crash says we've gone
> straight from dma_map_page_attrs() to swiotlb_map(), implying the inline
> dma_direct_map_page() path.
>
> If dev->dma_ops didn't look right in the first place, it's perhaps less
> surprising that dev->dma_io_tlb_mem might be wild as well. It doesn't seem
> plausible that we should have a race between initialising the device and
> probing its driver, so maybe the whole dev pointer is getting trampled
> earlier in the callchain (or is fundamentally wrong to begin with, but from
> a quick skim of the amdgpu code it did look like adev->dev and adev->pdev
> are appropriately set early on by amdgpu_pci_probe()).
>
> > (although word of warning here: i915 dies horribly on my laptop if I pass
> > swiotlb=force, even with the distro 5.10 kernel)
>
> FWIW I'd imagine you probably need to massively increase the SWIOTLB buffer
> size to have hope of that working.
Is it worth trying this still then?
Cheers,
Nathan
On Sat, Jul 3, 2021 at 1:55 PM Nathan Chancellor <[email protected]> wrote:
>
> Hi Will and Robin,
>
> On Fri, Jul 02, 2021 at 04:13:50PM +0100, Robin Murphy wrote:
> > On 2021-07-02 14:58, Will Deacon wrote:
> > > Hi Nathan,
> > >
> > > On Thu, Jul 01, 2021 at 12:52:20AM -0700, Nathan Chancellor wrote:
> > > > On 7/1/2021 12:40 AM, Will Deacon wrote:
> > > > > On Wed, Jun 30, 2021 at 08:56:51AM -0700, Nathan Chancellor wrote:
> > > > > > On Wed, Jun 30, 2021 at 12:43:48PM +0100, Will Deacon wrote:
> > > > > > > On Wed, Jun 30, 2021 at 05:17:27PM +0800, Claire Chang wrote:
> > > > > > > > `BUG: unable to handle page fault for address: 00000000003a8290` and
> > > > > > > > the fact it crashed at `_raw_spin_lock_irqsave` look like the memory
> > > > > > > > (maybe dev->dma_io_tlb_mem) was corrupted?
> > > > > > > > The dev->dma_io_tlb_mem should be set here
> > > > > > > > (https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/drivers/pci/probe.c#n2528)
> > > > > > > > through device_initialize.
> > > > > > >
> > > > > > > I'm less sure about this. 'dma_io_tlb_mem' should be pointing at
> > > > > > > 'io_tlb_default_mem', which is a page-aligned allocation from memblock.
> > > > > > > The spinlock is at offset 0x24 in that structure, and looking at the
> > > > > > > register dump from the crash:
> > > > > > >
> > > > > > > Jun 29 18:28:42 hp-4300G kernel: RSP: 0018:ffffadb4013db9e8 EFLAGS: 00010006
> > > > > > > Jun 29 18:28:42 hp-4300G kernel: RAX: 00000000003a8290 RBX: 0000000000000000 RCX: ffff8900572ad580
> > > > > > > Jun 29 18:28:42 hp-4300G kernel: RDX: ffff89005653f024 RSI: 00000000000c0000 RDI: 0000000000001d17
> > > > > > > Jun 29 18:28:42 hp-4300G kernel: RBP: 000000000a20d000 R08: 00000000000c0000 R09: 0000000000000000
> > > > > > > Jun 29 18:28:42 hp-4300G kernel: R10: 000000000a20d000 R11: ffff89005653f000 R12: 0000000000000212
> > > > > > > Jun 29 18:28:42 hp-4300G kernel: R13: 0000000000001000 R14: 0000000000000002 R15: 0000000000200000
> > > > > > > Jun 29 18:28:42 hp-4300G kernel: FS: 00007f1f8898ea40(0000) GS:ffff890057280000(0000) knlGS:0000000000000000
> > > > > > > Jun 29 18:28:42 hp-4300G kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > > > > Jun 29 18:28:42 hp-4300G kernel: CR2: 00000000003a8290 CR3: 00000001020d0000 CR4: 0000000000350ee0
> > > > > > > Jun 29 18:28:42 hp-4300G kernel: Call Trace:
> > > > > > > Jun 29 18:28:42 hp-4300G kernel: _raw_spin_lock_irqsave+0x39/0x50
> > > > > > > Jun 29 18:28:42 hp-4300G kernel: swiotlb_tbl_map_single+0x12b/0x4c0
> > > > > > >
> > > > > > > Then that correlates with R11 holding the 'dma_io_tlb_mem' pointer and
> > > > > > > RDX pointing at the spinlock. Yet RAX is holding junk :/
> > > > > > >
> > > > > > > I agree that enabling KASAN would be a good idea, but I also think we
> > > > > > > probably need to get some more information out of swiotlb_tbl_map_single()
> > > > > > > to see see what exactly is going wrong in there.
> > > > > >
> > > > > > I can certainly enable KASAN and if there is any debug print I can add
> > > > > > or dump anything, let me know!
> > > > >
> > > > > I bit the bullet and took v5.13 with swiotlb/for-linus-5.14 merged in, built
> > > > > x86 defconfig and ran it on my laptop. However, it seems to work fine!
> > > > >
> > > > > Please can you share your .config?
> > > >
> > > > Sure thing, it is attached. It is just Arch Linux's config run through
> > > > olddefconfig. The original is below in case you need to diff it.
> > > >
> > > > https://raw.githubusercontent.com/archlinux/svntogit-packages/9045405dc835527164f3034b3ceb9a67c7a53cd4/trunk/config
> > > >
> > > > If there is anything more that I can provide, please let me know.
> > >
> > > I eventually got this booting (for some reason it was causing LD to SEGV
> > > trying to link it for a while...) and sadly it works fine on my laptop. Hmm.
>
> Seems like it might be something specific to the amdgpu module?
>
> > > Did you manage to try again with KASAN?
>
> Yes, it took a few times to reproduce the issue but I did manage to get
> a dmesg, please find it attached. I build from commit 7d31f1c65cc9 ("swiotlb:
> fix implicit debugfs declarations") in Konrad's tree.
Looking at the logs, the use-after-free bug looked somehow relevant
(and it's nvme again. Qian's crash is about nvme too):
[ 2.468288] BUG: KASAN: use-after-free in __iommu_dma_unmap_swiotlb+0x64/0xb0
[ 2.468288] Read of size 8 at addr ffff8881d7830000 by task swapper/0/0
[ 2.468288] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.12.0-rc3-debug #1
[ 2.468288] Hardware name: HP HP Desktop M01-F1xxx/87D6, BIOS F.12 12/17/2020
[ 2.468288] Call Trace:
[ 2.468288] <IRQ>
[ 2.479433] dump_stack+0x9c/0xcf
[ 2.479433] print_address_description.constprop.0+0x18/0x130
[ 2.479433] ? __iommu_dma_unmap_swiotlb+0x64/0xb0
[ 2.479433] kasan_report.cold+0x7f/0x111
[ 2.479433] ? __iommu_dma_unmap_swiotlb+0x64/0xb0
[ 2.479433] __iommu_dma_unmap_swiotlb+0x64/0xb0
[ 2.479433] nvme_pci_complete_rq+0x73/0x130
[ 2.479433] blk_complete_reqs+0x6f/0x80
[ 2.479433] __do_softirq+0xfc/0x3be
[ 2.479433] irq_exit_rcu+0xce/0x120
[ 2.479433] common_interrupt+0x80/0xa0
[ 2.479433] </IRQ>
[ 2.479433] asm_common_interrupt+0x1e/0x40
[ 2.479433] RIP: 0010:cpuidle_enter_state+0xf9/0x590
I wonder if this ended up unmapping something wrong and messing up the
dev->dma_io_tlb_mem (i.e. io_tlb_default_mem)?
Could you try this patch on top of 7d31f1c65cc9? This patch helps
check if we try to unmap the wrong address.
```
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index b7f76bca89bf..5ac08d50a394 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -613,6 +613,21 @@ void swiotlb_tbl_unmap_single(struct device *dev,
phys_addr_t tlb_addr,
size_t mapping_size, enum dma_data_direction dir,
unsigned long attrs)
{
+ struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
+ unsigned int offset = swiotlb_align_offset(dev, tlb_addr);
+ int index;
+
+ if (!is_swiotlb_buffer(dev, tlb_addr - offset)) {
+ dev_err(dev, "%s: attempt to unmap invalid address
(0x%llx, offset=%u)\n", __func__, tlb_addr, offset);
+ return;
+ }
+
+ index = (tlb_addr - offset - mem->start) >> IO_TLB_SHIFT;
+ if (mem->slots[index].orig_addr == INVALID_PHYS_ADDR) {
+ dev_err(dev, "%s: memory is not mapped before (0x%llx,
offset=%u)\n", __func__, tlb_addr, offset);
+ return;
+ }
+
/*
* First, sync the memory before unmapping the entry
*/
```
It might be useful to have CONFIG_SLUB_DEBUG=y, CONFIG_SLUB_DEBUG_ON=y
and line numbers (scripts/decode_stacktrace.sh) too.
Thank you so much for helping!
>
> > > It might also be worth taking the IOMMU out of the equation, since that
> > > interfaces differently with SWIOTLB and I couldn't figure out the code path
> > > from the log you provided. What happens if you boot with "amd_iommu=off
> > > swiotlb=force"?
> >
> > Oh, now there's a thing... the chat from the IOMMU API in the boot log
> > implies that the IOMMU *should* be in the picture - we see that default
> > domains are IOMMU_DOMAIN_DMA default and the GPU 0000:0c:00.0 was added to a
> > group. That means dev->dma_ops should be set and DMA API calls should be
> > going through iommu-dma, yet the callstack in the crash says we've gone
> > straight from dma_map_page_attrs() to swiotlb_map(), implying the inline
> > dma_direct_map_page() path.
> >
> > If dev->dma_ops didn't look right in the first place, it's perhaps less
> > surprising that dev->dma_io_tlb_mem might be wild as well. It doesn't seem
> > plausible that we should have a race between initialising the device and
> > probing its driver, so maybe the whole dev pointer is getting trampled
> > earlier in the callchain (or is fundamentally wrong to begin with, but from
> > a quick skim of the amdgpu code it did look like adev->dev and adev->pdev
> > are appropriately set early on by amdgpu_pci_probe()).
> >
> > > (although word of warning here: i915 dies horribly on my laptop if I pass
> > > swiotlb=force, even with the distro 5.10 kernel)
> >
> > FWIW I'd imagine you probably need to massively increase the SWIOTLB buffer
> > size to have hope of that working.
>
> Is it worth trying this still then?
>
> Cheers,
> Nathan
Hi Claire,
On Mon, Jul 05, 2021 at 03:29:34PM +0800, Claire Chang wrote:
> Looking at the logs, the use-after-free bug looked somehow relevant
> (and it's nvme again. Qian's crash is about nvme too):
>
> [ 2.468288] BUG: KASAN: use-after-free in __iommu_dma_unmap_swiotlb+0x64/0xb0
> [ 2.468288] Read of size 8 at addr ffff8881d7830000 by task swapper/0/0
>
> [ 2.468288] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.12.0-rc3-debug #1
> [ 2.468288] Hardware name: HP HP Desktop M01-F1xxx/87D6, BIOS F.12 12/17/2020
> [ 2.468288] Call Trace:
> [ 2.468288] <IRQ>
> [ 2.479433] dump_stack+0x9c/0xcf
> [ 2.479433] print_address_description.constprop.0+0x18/0x130
> [ 2.479433] ? __iommu_dma_unmap_swiotlb+0x64/0xb0
> [ 2.479433] kasan_report.cold+0x7f/0x111
> [ 2.479433] ? __iommu_dma_unmap_swiotlb+0x64/0xb0
> [ 2.479433] __iommu_dma_unmap_swiotlb+0x64/0xb0
> [ 2.479433] nvme_pci_complete_rq+0x73/0x130
> [ 2.479433] blk_complete_reqs+0x6f/0x80
> [ 2.479433] __do_softirq+0xfc/0x3be
> [ 2.479433] irq_exit_rcu+0xce/0x120
> [ 2.479433] common_interrupt+0x80/0xa0
> [ 2.479433] </IRQ>
> [ 2.479433] asm_common_interrupt+0x1e/0x40
> [ 2.479433] RIP: 0010:cpuidle_enter_state+0xf9/0x590
>
> I wonder if this ended up unmapping something wrong and messing up the
> dev->dma_io_tlb_mem (i.e. io_tlb_default_mem)?
>
> Could you try this patch on top of 7d31f1c65cc9? This patch helps
> check if we try to unmap the wrong address.
>
> ```
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index b7f76bca89bf..5ac08d50a394 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -613,6 +613,21 @@ void swiotlb_tbl_unmap_single(struct device *dev,
> phys_addr_t tlb_addr,
> size_t mapping_size, enum dma_data_direction dir,
> unsigned long attrs)
> {
> + struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
> + unsigned int offset = swiotlb_align_offset(dev, tlb_addr);
> + int index;
> +
> + if (!is_swiotlb_buffer(dev, tlb_addr - offset)) {
> + dev_err(dev, "%s: attempt to unmap invalid address
> (0x%llx, offset=%u)\n", __func__, tlb_addr, offset);
> + return;
> + }
> +
> + index = (tlb_addr - offset - mem->start) >> IO_TLB_SHIFT;
> + if (mem->slots[index].orig_addr == INVALID_PHYS_ADDR) {
> + dev_err(dev, "%s: memory is not mapped before (0x%llx,
> offset=%u)\n", __func__, tlb_addr, offset);
> + return;
> + }
> +
> /*
> * First, sync the memory before unmapping the entry
> */
> ```
> It might be useful to have CONFIG_SLUB_DEBUG=y, CONFIG_SLUB_DEBUG_ON=y
> and line numbers (scripts/decode_stacktrace.sh) too.
>
> Thank you so much for helping!
Please find attached logs both decoded and not decoded, with
CONFIG_KASAN=y + CONFIG_SLUB_DEBUG_ON=y with the requested patch applied
on top of 7d31f1c65cc9.
If there is any further information I can provide, please let me know!
Cheers,
Nathan
Hi Nathan,
I may have just spotted something in these logs...
On Fri, Jul 02, 2021 at 10:55:17PM -0700, Nathan Chancellor wrote:
> [ 2.340956] pci 0000:0c:00.1: Adding to iommu group 4
> [ 2.340996] pci 0000:0c:00.2: Adding to iommu group 4
> [ 2.341038] pci 0000:0c:00.3: Adding to iommu group 4
> [ 2.341078] pci 0000:0c:00.4: Adding to iommu group 4
> [ 2.341122] pci 0000:0c:00.6: Adding to iommu group 4
> [ 2.341163] pci 0000:0d:00.0: Adding to iommu group 4
> [ 2.341203] pci 0000:0d:00.1: Adding to iommu group 4
> [ 2.361821] pci 0000:00:00.2: AMD-Vi: Found IOMMU cap 0x40
> [ 2.361839] pci 0000:00:00.2: AMD-Vi: Extended features (0x206d73ef22254ade):
> [ 2.361846] PPR X2APIC NX GT IA GA PC GA_vAPIC
> [ 2.361861] AMD-Vi: Interrupt remapping enabled
> [ 2.361865] AMD-Vi: Virtual APIC enabled
> [ 2.361870] AMD-Vi: X2APIC enabled
> [ 2.362272] AMD-Vi: Lazy IO/TLB flushing enabled
So at this point, the AMD IOMMU driver does:
swiotlb = (iommu_default_passthrough() || sme_me_mask) ? 1 : 0;
where 'swiotlb' is a global variable indicating whether or not swiotlb
is in use. It's picked up a bit later on by pci_swiotlb_late_init(), which
will call swiotlb_exit() if 'swiotlb' is false.
Now, that used to work fine, because swiotlb_exit() clears
'io_tlb_default_mem' to NULL, but now with the restricted DMA changes, I
think that all the devices which have successfully probed beforehand will
have stale pointers to the freed structure in their 'dev->dma_io_tlb_mem'
field.
Will
On Mon, Jul 05, 2021 at 08:03:52PM +0100, Will Deacon wrote:
> So at this point, the AMD IOMMU driver does:
>
> swiotlb = (iommu_default_passthrough() || sme_me_mask) ? 1 : 0;
>
> where 'swiotlb' is a global variable indicating whether or not swiotlb
> is in use. It's picked up a bit later on by pci_swiotlb_late_init(), which
> will call swiotlb_exit() if 'swiotlb' is false.
>
> Now, that used to work fine, because swiotlb_exit() clears
> 'io_tlb_default_mem' to NULL, but now with the restricted DMA changes, I
> think that all the devices which have successfully probed beforehand will
> have stale pointers to the freed structure in their 'dev->dma_io_tlb_mem'
> field.
Yeah. I don't think we can do that anymore, and I also think it is
a bad idea to start with.
On Tue, Jul 06, 2021 at 06:48:48AM +0200, Christoph Hellwig wrote:
> On Mon, Jul 05, 2021 at 08:03:52PM +0100, Will Deacon wrote:
> > So at this point, the AMD IOMMU driver does:
> >
> > swiotlb = (iommu_default_passthrough() || sme_me_mask) ? 1 : 0;
> >
> > where 'swiotlb' is a global variable indicating whether or not swiotlb
> > is in use. It's picked up a bit later on by pci_swiotlb_late_init(), which
> > will call swiotlb_exit() if 'swiotlb' is false.
> >
> > Now, that used to work fine, because swiotlb_exit() clears
> > 'io_tlb_default_mem' to NULL, but now with the restricted DMA changes, I
> > think that all the devices which have successfully probed beforehand will
> > have stale pointers to the freed structure in their 'dev->dma_io_tlb_mem'
> > field.
>
> Yeah. I don't think we can do that anymore, and I also think it is
> a bad idea to start with.
I've had a crack at reworking things along the following lines:
- io_tlb_default_mem now lives in the BSS, the flexible array member
is now a pointer and that part is allocated dynamically (downside of
this is an extra indirection to get at the slots).
- io_tlb_default_mem.nslabs tells you whether the thing is valid
- swiotlb_exit() frees the slots array and clears the rest of the
structure to 0. I also extended it to free the actual slabs, but I'm
not sure why it wasn't doing that before.
So a non-NULL dev->dma_io_tlb_mem should always be valid to follow.
Untested diff below... Nathan, it would be ace if you're brave enough
to give this a shot.
Will
--->8
diff --git a/drivers/base/core.c b/drivers/base/core.c
index bbad7c559901..9e1218f89e4b 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -2820,7 +2820,7 @@ void device_initialize(struct device *dev)
dev->dma_coherent = dma_default_coherent;
#endif
#ifdef CONFIG_SWIOTLB
- dev->dma_io_tlb_mem = io_tlb_default_mem;
+ dev->dma_io_tlb_mem = &io_tlb_default_mem;
#endif
}
EXPORT_SYMBOL_GPL(device_initialize);
diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index 785ec7e8be01..f06d9b4f1e0f 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -164,7 +164,7 @@ int __ref xen_swiotlb_init(void)
int rc = -ENOMEM;
char *start;
- if (io_tlb_default_mem != NULL) {
+ if (io_tlb_default_mem.nslabs) {
pr_warn("swiotlb buffer already initialized\n");
return -EEXIST;
}
@@ -547,7 +547,7 @@ xen_swiotlb_sync_sg_for_device(struct device *dev, struct scatterlist *sgl,
static int
xen_swiotlb_dma_supported(struct device *hwdev, u64 mask)
{
- return xen_phys_to_dma(hwdev, io_tlb_default_mem->end - 1) <= mask;
+ return xen_phys_to_dma(hwdev, io_tlb_default_mem.end - 1) <= mask;
}
const struct dma_map_ops xen_swiotlb_dma_ops = {
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 39284ff2a6cd..b0cb2a9973f4 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -103,9 +103,9 @@ struct io_tlb_mem {
phys_addr_t orig_addr;
size_t alloc_size;
unsigned int list;
- } slots[];
+ } *slots;
};
-extern struct io_tlb_mem *io_tlb_default_mem;
+extern struct io_tlb_mem io_tlb_default_mem;
static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
{
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 0ffbaae9fba2..91cd1d413027 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -70,7 +70,7 @@
enum swiotlb_force swiotlb_force;
-struct io_tlb_mem *io_tlb_default_mem;
+struct io_tlb_mem io_tlb_default_mem;
/*
* Max segment that we can provide which (if pages are contingous) will
@@ -101,7 +101,7 @@ early_param("swiotlb", setup_io_tlb_npages);
unsigned int swiotlb_max_segment(void)
{
- return io_tlb_default_mem ? max_segment : 0;
+ return io_tlb_default_mem.nslabs ? max_segment : 0;
}
EXPORT_SYMBOL_GPL(swiotlb_max_segment);
@@ -134,9 +134,9 @@ void __init swiotlb_adjust_size(unsigned long size)
void swiotlb_print_info(void)
{
- struct io_tlb_mem *mem = io_tlb_default_mem;
+ struct io_tlb_mem *mem = &io_tlb_default_mem;
- if (!mem) {
+ if (!mem->nslabs) {
pr_warn("No low mem\n");
return;
}
@@ -163,11 +163,11 @@ static inline unsigned long nr_slots(u64 val)
*/
void __init swiotlb_update_mem_attributes(void)
{
- struct io_tlb_mem *mem = io_tlb_default_mem;
+ struct io_tlb_mem *mem = &io_tlb_default_mem;
void *vaddr;
unsigned long bytes;
- if (!mem || mem->late_alloc)
+ if (!mem->nslabs || mem->late_alloc)
return;
vaddr = phys_to_virt(mem->start);
bytes = PAGE_ALIGN(mem->nslabs << IO_TLB_SHIFT);
@@ -201,25 +201,24 @@ static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start,
int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
{
- struct io_tlb_mem *mem;
+ struct io_tlb_mem *mem = &io_tlb_default_mem;
size_t alloc_size;
if (swiotlb_force == SWIOTLB_NO_FORCE)
return 0;
/* protect against double initialization */
- if (WARN_ON_ONCE(io_tlb_default_mem))
+ if (WARN_ON_ONCE(mem->nslabs))
return -ENOMEM;
- alloc_size = PAGE_ALIGN(struct_size(mem, slots, nslabs));
- mem = memblock_alloc(alloc_size, PAGE_SIZE);
- if (!mem)
+ alloc_size = PAGE_ALIGN(array_size(sizeof(*mem->slots), nslabs));
+ mem->slots = memblock_alloc(alloc_size, PAGE_SIZE);
+ if (!mem->slots)
panic("%s: Failed to allocate %zu bytes align=0x%lx\n",
__func__, alloc_size, PAGE_SIZE);
swiotlb_init_io_tlb_mem(mem, __pa(tlb), nslabs, false);
- io_tlb_default_mem = mem;
if (verbose)
swiotlb_print_info();
swiotlb_set_max_segment(mem->nslabs << IO_TLB_SHIFT);
@@ -304,26 +303,24 @@ swiotlb_late_init_with_default_size(size_t default_size)
int
swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs)
{
- struct io_tlb_mem *mem;
+ struct io_tlb_mem *mem = &io_tlb_default_mem;
unsigned long bytes = nslabs << IO_TLB_SHIFT;
if (swiotlb_force == SWIOTLB_NO_FORCE)
return 0;
/* protect against double initialization */
- if (WARN_ON_ONCE(io_tlb_default_mem))
+ if (WARN_ON_ONCE(mem->nslabs))
return -ENOMEM;
- mem = (void *)__get_free_pages(GFP_KERNEL,
- get_order(struct_size(mem, slots, nslabs)));
- if (!mem)
+ mem->slots = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO,
+ get_order(array_size(sizeof(*mem->slots), nslabs)));
+ if (!mem->slots)
return -ENOMEM;
- memset(mem, 0, sizeof(*mem));
set_memory_decrypted((unsigned long)tlb, bytes >> PAGE_SHIFT);
swiotlb_init_io_tlb_mem(mem, virt_to_phys(tlb), nslabs, true);
- io_tlb_default_mem = mem;
swiotlb_print_info();
swiotlb_set_max_segment(mem->nslabs << IO_TLB_SHIFT);
return 0;
@@ -331,18 +328,23 @@ swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs)
void __init swiotlb_exit(void)
{
- struct io_tlb_mem *mem = io_tlb_default_mem;
- size_t size;
+ struct io_tlb_mem *mem = &io_tlb_default_mem;
+ size_t tbl_size, slots_size;
- if (!mem)
+ if (!mem->nslabs)
return;
- size = struct_size(mem, slots, mem->nslabs);
- if (mem->late_alloc)
- free_pages((unsigned long)mem, get_order(size));
- else
- memblock_free_late(__pa(mem), PAGE_ALIGN(size));
- io_tlb_default_mem = NULL;
+ tbl_size = mem->end - mem->start;
+ slots_size = array_size(sizeof(*mem->slots), mem->nslabs);
+ if (mem->late_alloc) {
+ free_pages((unsigned long)mem->start, get_order(tbl_size));
+ free_pages((unsigned long)mem->slots, get_order(slots_size));
+ } else {
+ memblock_free_late(__pa(mem->start), PAGE_ALIGN(tbl_size));
+ memblock_free_late(__pa(mem->slots), PAGE_ALIGN(slots_size));
+ }
+
+ memset(mem, 0, sizeof(*mem));
}
/*
@@ -682,7 +684,9 @@ size_t swiotlb_max_mapping_size(struct device *dev)
bool is_swiotlb_active(struct device *dev)
{
- return dev->dma_io_tlb_mem != NULL;
+ struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
+
+ return mem && mem->nslabs;
}
EXPORT_SYMBOL_GPL(is_swiotlb_active);
@@ -697,10 +701,10 @@ static void swiotlb_create_debugfs_files(struct io_tlb_mem *mem)
static int __init swiotlb_create_default_debugfs(void)
{
- struct io_tlb_mem *mem = io_tlb_default_mem;
+ struct io_tlb_mem *mem = &io_tlb_default_mem;
debugfs_dir = debugfs_create_dir("swiotlb", NULL);
- if (mem) {
+ if (mem->nslabs) {
mem->debugfs = debugfs_dir;
swiotlb_create_debugfs_files(mem);
}
@@ -754,10 +758,17 @@ static int rmem_swiotlb_device_init(struct reserved_mem *rmem,
* to it.
*/
if (!mem) {
- mem = kzalloc(struct_size(mem, slots, nslabs), GFP_KERNEL);
+ mem = kzalloc(sizeof(*mem), GFP_KERNEL);
if (!mem)
return -ENOMEM;
+ mem->slots = kzalloc(array_size(sizeof(*mem->slots), nslabs),
+ GFP_KERNEL);
+ if (!mem->slots) {
+ kfree(mem);
+ return -ENOMEM;
+ }
+
set_memory_decrypted((unsigned long)phys_to_virt(rmem->base),
rmem->size >> PAGE_SHIFT);
swiotlb_init_io_tlb_mem(mem, rmem->base, nslabs, false);
@@ -781,7 +792,7 @@ static int rmem_swiotlb_device_init(struct reserved_mem *rmem,
static void rmem_swiotlb_device_release(struct reserved_mem *rmem,
struct device *dev)
{
- dev->dma_io_tlb_mem = io_tlb_default_mem;
+ dev->dma_io_tlb_mem = &io_tlb_default_mem;
}
static const struct reserved_mem_ops rmem_swiotlb_ops = {
On Tue, Jul 06, 2021 at 03:01:04PM +0100, Robin Murphy wrote:
> FWIW I was pondering the question of whether to do something along those
> lines or just scrap the default assignment entirely, so since I hadn't got
> round to saying that I've gone ahead and hacked up the alternative
> (similarly untested) for comparison :)
>
> TBH I'm still not sure which one I prefer...
Claire did implement something like your suggestion originally, but
I don't really like it as it doesn't scale for adding multiple global
pools, e.g. for the 64-bit addressable one for the various encrypted
secure guest schemes.
On 2021-07-06 14:24, Will Deacon wrote:
> On Tue, Jul 06, 2021 at 06:48:48AM +0200, Christoph Hellwig wrote:
>> On Mon, Jul 05, 2021 at 08:03:52PM +0100, Will Deacon wrote:
>>> So at this point, the AMD IOMMU driver does:
>>>
>>> swiotlb = (iommu_default_passthrough() || sme_me_mask) ? 1 : 0;
>>>
>>> where 'swiotlb' is a global variable indicating whether or not swiotlb
>>> is in use. It's picked up a bit later on by pci_swiotlb_late_init(), which
>>> will call swiotlb_exit() if 'swiotlb' is false.
>>>
>>> Now, that used to work fine, because swiotlb_exit() clears
>>> 'io_tlb_default_mem' to NULL, but now with the restricted DMA changes, I
>>> think that all the devices which have successfully probed beforehand will
>>> have stale pointers to the freed structure in their 'dev->dma_io_tlb_mem'
>>> field.
>>
>> Yeah. I don't think we can do that anymore, and I also think it is
>> a bad idea to start with.
>
> I've had a crack at reworking things along the following lines:
>
> - io_tlb_default_mem now lives in the BSS, the flexible array member
> is now a pointer and that part is allocated dynamically (downside of
> this is an extra indirection to get at the slots).
>
> - io_tlb_default_mem.nslabs tells you whether the thing is valid
>
> - swiotlb_exit() frees the slots array and clears the rest of the
> structure to 0. I also extended it to free the actual slabs, but I'm
> not sure why it wasn't doing that before.
>
> So a non-NULL dev->dma_io_tlb_mem should always be valid to follow.
FWIW I was pondering the question of whether to do something along those
lines or just scrap the default assignment entirely, so since I hadn't
got round to saying that I've gone ahead and hacked up the alternative
(similarly untested) for comparison :)
TBH I'm still not sure which one I prefer...
Robin.
----->8-----
diff --git a/drivers/base/core.c b/drivers/base/core.c
index ea5b85354526..394abf184c1a 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -2847,9 +2847,6 @@ void device_initialize(struct device *dev)
defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU_ALL)
dev->dma_coherent = dma_default_coherent;
#endif
-#ifdef CONFIG_SWIOTLB
- dev->dma_io_tlb_mem = io_tlb_default_mem;
-#endif
}
EXPORT_SYMBOL_GPL(device_initialize);
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 39284ff2a6cd..620f16d89a98 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -107,16 +107,21 @@ struct io_tlb_mem {
};
extern struct io_tlb_mem *io_tlb_default_mem;
+static inline struct io_tlb_mem *dev_iotlb_mem(struct device *dev)
+{
+ return dev->dma_io_tlb_mem ?: io_tlb_default_mem;
+}
+
static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t
paddr)
{
- struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
+ struct io_tlb_mem *mem = dev_iotlb_mem(dev);
return mem && paddr >= mem->start && paddr < mem->end;
}
static inline bool is_swiotlb_force_bounce(struct device *dev)
{
- struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
+ struct io_tlb_mem *mem = dev_iotlb_mem(dev);
return mem && mem->force_bounce;
}
@@ -167,7 +172,7 @@ bool swiotlb_free(struct device *dev, struct page
*page, size_t size);
static inline bool is_swiotlb_for_alloc(struct device *dev)
{
- return dev->dma_io_tlb_mem->for_alloc;
+ return dev_iotlb_mem(dev)->for_alloc;
}
#else
static inline struct page *swiotlb_alloc(struct device *dev, size_t size)
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index b7f76bca89bf..f4942149f87d 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -359,7 +359,7 @@ static unsigned int swiotlb_align_offset(struct
device *dev, u64 addr)
static void swiotlb_bounce(struct device *dev, phys_addr_t tlb_addr,
size_t size,
enum dma_data_direction dir)
{
- struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
+ struct io_tlb_mem *mem = dev_iotlb_mem(dev);
int index = (tlb_addr - mem->start) >> IO_TLB_SHIFT;
phys_addr_t orig_addr = mem->slots[index].orig_addr;
size_t alloc_size = mem->slots[index].alloc_size;
@@ -440,7 +440,7 @@ static unsigned int wrap_index(struct io_tlb_mem
*mem, unsigned int index)
static int swiotlb_find_slots(struct device *dev, phys_addr_t orig_addr,
size_t alloc_size)
{
- struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
+ struct io_tlb_mem *mem = dev_iotlb_mem(dev);
unsigned long boundary_mask = dma_get_seg_boundary(dev);
dma_addr_t tbl_dma_addr =
phys_to_dma_unencrypted(dev, mem->start) & boundary_mask;
@@ -522,7 +522,7 @@ phys_addr_t swiotlb_tbl_map_single(struct device
*dev, phys_addr_t orig_addr,
size_t mapping_size, size_t alloc_size,
enum dma_data_direction dir, unsigned long attrs)
{
- struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
+ struct io_tlb_mem *mem = dev_iotlb_mem(dev);
unsigned int offset = swiotlb_align_offset(dev, orig_addr);
unsigned int i;
int index;
@@ -565,7 +565,7 @@ phys_addr_t swiotlb_tbl_map_single(struct device
*dev, phys_addr_t orig_addr,
static void swiotlb_release_slots(struct device *dev, phys_addr_t
tlb_addr)
{
- struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
+ struct io_tlb_mem *mem = dev_iotlb_mem(dev);
unsigned long flags;
unsigned int offset = swiotlb_align_offset(dev, tlb_addr);
int index = (tlb_addr - offset - mem->start) >> IO_TLB_SHIFT;
@@ -682,7 +682,7 @@ size_t swiotlb_max_mapping_size(struct device *dev)
bool is_swiotlb_active(struct device *dev)
{
- return dev->dma_io_tlb_mem != NULL;
+ return dev_iotlb_mem(dev) != NULL;
}
EXPORT_SYMBOL_GPL(is_swiotlb_active);
@@ -729,7 +729,7 @@ static void rmem_swiotlb_debugfs_init(struct
reserved_mem *rmem)
struct page *swiotlb_alloc(struct device *dev, size_t size)
{
- struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
+ struct io_tlb_mem *mem = dev_iotlb_mem(dev);
phys_addr_t tlb_addr;
int index;
@@ -792,7 +792,7 @@ static int rmem_swiotlb_device_init(struct
reserved_mem *rmem,
static void rmem_swiotlb_device_release(struct reserved_mem *rmem,
struct device *dev)
{
- dev->dma_io_tlb_mem = io_tlb_default_mem;
+ dev->dma_io_tlb_mem = NULL;
}
static const struct reserved_mem_ops rmem_swiotlb_ops = {
On Tue, Jul 06, 2021 at 04:05:13PM +0200, Christoph Hellwig wrote:
> On Tue, Jul 06, 2021 at 03:01:04PM +0100, Robin Murphy wrote:
> > FWIW I was pondering the question of whether to do something along those
> > lines or just scrap the default assignment entirely, so since I hadn't got
> > round to saying that I've gone ahead and hacked up the alternative
> > (similarly untested) for comparison :)
> >
> > TBH I'm still not sure which one I prefer...
>
> Claire did implement something like your suggestion originally, but
> I don't really like it as it doesn't scale for adding multiple global
> pools, e.g. for the 64-bit addressable one for the various encrypted
> secure guest schemes.
Couple of things:
- I am not pushing to Linus the Claire's patchset until we have a
resolution on this. I hope you all agree that is a sensible way
forward as much as I hate doing that.
- I like Robin's fix as it is simplest looking. Would love to see if it
does fix the problem.
- Christopher - we can always add multiple pools as the next milestone
and just focus on this feature getting tested extensively during this
release.
- Would it be worth (for future or maybe in another tiny fix) to also add
a printk in swiotlb when we de-allocate the buffer so when someone looks
through the `dmesg` it becomes much easier to diagnose issues?
On 2021-07-06 15:05, Christoph Hellwig wrote:
> On Tue, Jul 06, 2021 at 03:01:04PM +0100, Robin Murphy wrote:
>> FWIW I was pondering the question of whether to do something along those
>> lines or just scrap the default assignment entirely, so since I hadn't got
>> round to saying that I've gone ahead and hacked up the alternative
>> (similarly untested) for comparison :)
>>
>> TBH I'm still not sure which one I prefer...
>
> Claire did implement something like your suggestion originally, but
> I don't really like it as it doesn't scale for adding multiple global
> pools, e.g. for the 64-bit addressable one for the various encrypted
> secure guest schemes.
Ah yes, that had slipped my mind, and it's a fair point indeed. Since
we're not concerned with a minimal fix for backports anyway I'm more
than happy to focus on Will's approach. Another thing is that that looks
to take us a quiet step closer to the possibility of dynamically
resizing a SWIOTLB pool, which is something that some of the hypervisor
protection schemes looking to build on top of this series may want to
explore at some point.
Robin.
On Tue, Jul 06, 2021 at 10:46:07AM -0400, Konrad Rzeszutek Wilk wrote:
> On Tue, Jul 06, 2021 at 04:05:13PM +0200, Christoph Hellwig wrote:
> > On Tue, Jul 06, 2021 at 03:01:04PM +0100, Robin Murphy wrote:
> > > FWIW I was pondering the question of whether to do something along those
> > > lines or just scrap the default assignment entirely, so since I hadn't got
> > > round to saying that I've gone ahead and hacked up the alternative
> > > (similarly untested) for comparison :)
> > >
> > > TBH I'm still not sure which one I prefer...
> >
> > Claire did implement something like your suggestion originally, but
> > I don't really like it as it doesn't scale for adding multiple global
> > pools, e.g. for the 64-bit addressable one for the various encrypted
> > secure guest schemes.
>
> Couple of things:
> - I am not pushing to Linus the Claire's patchset until we have a
> resolution on this. I hope you all agree that is a sensible way
> forward as much as I hate doing that.
Sure, it's a pity but we could clearly use a bit more time to get these
just right and we've run out of time for 5.14.
I think the main question I have is how would you like to see patches for
5.15? i.e. as patches on top of devel/for-linus-5.14 or something else?
Cheers,
Will
On Tue, Jul 06, 2021 at 05:57:21PM +0100, Will Deacon wrote:
> On Tue, Jul 06, 2021 at 10:46:07AM -0400, Konrad Rzeszutek Wilk wrote:
> > On Tue, Jul 06, 2021 at 04:05:13PM +0200, Christoph Hellwig wrote:
> > > On Tue, Jul 06, 2021 at 03:01:04PM +0100, Robin Murphy wrote:
> > > > FWIW I was pondering the question of whether to do something along those
> > > > lines or just scrap the default assignment entirely, so since I hadn't got
> > > > round to saying that I've gone ahead and hacked up the alternative
> > > > (similarly untested) for comparison :)
> > > >
> > > > TBH I'm still not sure which one I prefer...
> > >
> > > Claire did implement something like your suggestion originally, but
> > > I don't really like it as it doesn't scale for adding multiple global
> > > pools, e.g. for the 64-bit addressable one for the various encrypted
> > > secure guest schemes.
> >
> > Couple of things:
> > - I am not pushing to Linus the Claire's patchset until we have a
> > resolution on this. I hope you all agree that is a sensible way
> > forward as much as I hate doing that.
>
> Sure, it's a pity but we could clearly use a bit more time to get these
> just right and we've run out of time for 5.14.
>
> I think the main question I have is how would you like to see patches for
> 5.15? i.e. as patches on top of devel/for-linus-5.14 or something else?
Yes that would be perfect. If there are any dependencies on the rc1, I
can rebase it on top of that.
>
> Cheers,
>
> Will
On Tue, Jul 06, 2021 at 04:39:11PM +0100, Robin Murphy wrote:
> On 2021-07-06 15:05, Christoph Hellwig wrote:
> > On Tue, Jul 06, 2021 at 03:01:04PM +0100, Robin Murphy wrote:
> > > FWIW I was pondering the question of whether to do something along those
> > > lines or just scrap the default assignment entirely, so since I hadn't got
> > > round to saying that I've gone ahead and hacked up the alternative
> > > (similarly untested) for comparison :)
> > >
> > > TBH I'm still not sure which one I prefer...
> >
> > Claire did implement something like your suggestion originally, but
> > I don't really like it as it doesn't scale for adding multiple global
> > pools, e.g. for the 64-bit addressable one for the various encrypted
> > secure guest schemes.
>
> Ah yes, that had slipped my mind, and it's a fair point indeed. Since we're
> not concerned with a minimal fix for backports anyway I'm more than happy to
> focus on Will's approach. Another thing is that that looks to take us a
> quiet step closer to the possibility of dynamically resizing a SWIOTLB pool,
> which is something that some of the hypervisor protection schemes looking to
> build on top of this series may want to explore at some point.
Ok, I'll split that nasty diff I posted up into a reviewable series and we
can take it from there.
Will
Hi Will and Robin,
On 7/6/2021 10:06 AM, Will Deacon wrote:
> On Tue, Jul 06, 2021 at 04:39:11PM +0100, Robin Murphy wrote:
>> On 2021-07-06 15:05, Christoph Hellwig wrote:
>>> On Tue, Jul 06, 2021 at 03:01:04PM +0100, Robin Murphy wrote:
>>>> FWIW I was pondering the question of whether to do something along those
>>>> lines or just scrap the default assignment entirely, so since I hadn't got
>>>> round to saying that I've gone ahead and hacked up the alternative
>>>> (similarly untested) for comparison :)
>>>>
>>>> TBH I'm still not sure which one I prefer...
>>>
>>> Claire did implement something like your suggestion originally, but
>>> I don't really like it as it doesn't scale for adding multiple global
>>> pools, e.g. for the 64-bit addressable one for the various encrypted
>>> secure guest schemes.
>>
>> Ah yes, that had slipped my mind, and it's a fair point indeed. Since we're
>> not concerned with a minimal fix for backports anyway I'm more than happy to
>> focus on Will's approach. Another thing is that that looks to take us a
>> quiet step closer to the possibility of dynamically resizing a SWIOTLB pool,
>> which is something that some of the hypervisor protection schemes looking to
>> build on top of this series may want to explore at some point.
>
> Ok, I'll split that nasty diff I posted up into a reviewable series and we
> can take it from there.
For what it's worth, I attempted to boot Will's diff on top of Konrad's
devel/for-linus-5.14 and it did not work; in fact, I got no output on my
monitor period, even with earlyprintk=, and I do not think this machine
has a serial console.
Robin's fix does work, it survived ten reboots with no issues getting to
X and I do not see the KASAN and slub debug messages anymore but I
understand that this is not the preferred solution it seems (although
Konrad did want to know if it works).
I am happy to test any further patches or follow ups as needed, just
keep me on CC.
Cheers,
Nathan
On Tue, Jul 06, 2021 at 12:14:16PM -0700, Nathan Chancellor wrote:
> On 7/6/2021 10:06 AM, Will Deacon wrote:
> > On Tue, Jul 06, 2021 at 04:39:11PM +0100, Robin Murphy wrote:
> > > On 2021-07-06 15:05, Christoph Hellwig wrote:
> > > > On Tue, Jul 06, 2021 at 03:01:04PM +0100, Robin Murphy wrote:
> > > > > FWIW I was pondering the question of whether to do something along those
> > > > > lines or just scrap the default assignment entirely, so since I hadn't got
> > > > > round to saying that I've gone ahead and hacked up the alternative
> > > > > (similarly untested) for comparison :)
> > > > >
> > > > > TBH I'm still not sure which one I prefer...
> > > >
> > > > Claire did implement something like your suggestion originally, but
> > > > I don't really like it as it doesn't scale for adding multiple global
> > > > pools, e.g. for the 64-bit addressable one for the various encrypted
> > > > secure guest schemes.
> > >
> > > Ah yes, that had slipped my mind, and it's a fair point indeed. Since we're
> > > not concerned with a minimal fix for backports anyway I'm more than happy to
> > > focus on Will's approach. Another thing is that that looks to take us a
> > > quiet step closer to the possibility of dynamically resizing a SWIOTLB pool,
> > > which is something that some of the hypervisor protection schemes looking to
> > > build on top of this series may want to explore at some point.
> >
> > Ok, I'll split that nasty diff I posted up into a reviewable series and we
> > can take it from there.
>
> For what it's worth, I attempted to boot Will's diff on top of Konrad's
> devel/for-linus-5.14 and it did not work; in fact, I got no output on my
> monitor period, even with earlyprintk=, and I do not think this machine has
> a serial console.
Looking back at the diff, I completely messed up swiotlb_exit() by mixing up
physical and virtual addresses.
> Robin's fix does work, it survived ten reboots with no issues getting to X
> and I do not see the KASAN and slub debug messages anymore but I understand
> that this is not the preferred solution it seems (although Konrad did want
> to know if it works).
>
> I am happy to test any further patches or follow ups as needed, just keep me
> on CC.
Cheers. Since this isn't 5.14 material any more, I'll CC you on a series
next week.
Will
On Tue, Jul 06, 2021 at 12:59:57PM -0400, Konrad Rzeszutek Wilk wrote:
> On Tue, Jul 06, 2021 at 05:57:21PM +0100, Will Deacon wrote:
> > On Tue, Jul 06, 2021 at 10:46:07AM -0400, Konrad Rzeszutek Wilk wrote:
> > > On Tue, Jul 06, 2021 at 04:05:13PM +0200, Christoph Hellwig wrote:
> > > > On Tue, Jul 06, 2021 at 03:01:04PM +0100, Robin Murphy wrote:
> > > > > FWIW I was pondering the question of whether to do something along those
> > > > > lines or just scrap the default assignment entirely, so since I hadn't got
> > > > > round to saying that I've gone ahead and hacked up the alternative
> > > > > (similarly untested) for comparison :)
> > > > >
> > > > > TBH I'm still not sure which one I prefer...
> > > >
> > > > Claire did implement something like your suggestion originally, but
> > > > I don't really like it as it doesn't scale for adding multiple global
> > > > pools, e.g. for the 64-bit addressable one for the various encrypted
> > > > secure guest schemes.
> > >
> > > Couple of things:
> > > - I am not pushing to Linus the Claire's patchset until we have a
> > > resolution on this. I hope you all agree that is a sensible way
> > > forward as much as I hate doing that.
> >
> > Sure, it's a pity but we could clearly use a bit more time to get these
> > just right and we've run out of time for 5.14.
> >
> > I think the main question I have is how would you like to see patches for
> > 5.15? i.e. as patches on top of devel/for-linus-5.14 or something else?
>
> Yes that would be perfect. If there are any dependencies on the rc1, I
> can rebase it on top of that.
Yes, please, rebasing would be very helpful. The broader rework of
'io_tlb_default_mem' is going to conflict quite badly otherwise.
Cheers,
Will
..snip..
> > > I think the main question I have is how would you like to see patches for
> > > 5.15? i.e. as patches on top of devel/for-linus-5.14 or something else?
> >
> > Yes that would be perfect. If there are any dependencies on the rc1, I
> > can rebase it on top of that.
>
> Yes, please, rebasing would be very helpful. The broader rework of
> 'io_tlb_default_mem' is going to conflict quite badly otherwise.
There is a devel/for-linus-5.15 (based on v5.14-rc1) now.
Thank you!
>
> Cheers,
>
> Will
Hi Claire,
On Thu, Jun 24, 2021 at 11:55:24PM +0800, Claire Chang wrote:
> Add the initialization function to create restricted DMA pools from
> matching reserved-memory nodes.
>
> Regardless of swiotlb setting, the restricted DMA pool is preferred if
> available.
>
> The restricted DMA pools provide a basic level of protection against the
> DMA overwriting buffer contents at unexpected times. However, to protect
> against general data leakage and system memory corruption, the system
> needs to provide a way to lock down the memory access, e.g., MPU.
>
> Signed-off-by: Claire Chang <[email protected]>
> Reviewed-by: Christoph Hellwig <[email protected]>
> Tested-by: Stefano Stabellini <[email protected]>
> Tested-by: Will Deacon <[email protected]>
> ---
> include/linux/swiotlb.h | 3 +-
> kernel/dma/Kconfig | 14 ++++++++
> kernel/dma/swiotlb.c | 76 +++++++++++++++++++++++++++++++++++++++++
> 3 files changed, 92 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
> index 3b9454d1e498..39284ff2a6cd 100644
> --- a/include/linux/swiotlb.h
> +++ b/include/linux/swiotlb.h
> @@ -73,7 +73,8 @@ extern enum swiotlb_force swiotlb_force;
> * range check to see if the memory was in fact allocated by this
> * API.
> * @nslabs: The number of IO TLB blocks (in groups of 64) between @start and
> - * @end. This is command line adjustable via setup_io_tlb_npages.
> + * @end. For default swiotlb, this is command line adjustable via
> + * setup_io_tlb_npages.
> * @used: The number of used IO TLB block.
> * @list: The free list describing the number of free entries available
> * from each index.
> diff --git a/kernel/dma/Kconfig b/kernel/dma/Kconfig
> index 77b405508743..3e961dc39634 100644
> --- a/kernel/dma/Kconfig
> +++ b/kernel/dma/Kconfig
> @@ -80,6 +80,20 @@ config SWIOTLB
> bool
> select NEED_DMA_MAP_STATE
>
> +config DMA_RESTRICTED_POOL
> + bool "DMA Restricted Pool"
> + depends on OF && OF_RESERVED_MEM
> + select SWIOTLB
This makes SWIOTLB user configurable, which in turn results in
mips64-linux-ld: arch/mips/kernel/setup.o: in function `arch_mem_init':
setup.c:(.init.text+0x19c8): undefined reference to `plat_swiotlb_setup'
make[1]: *** [Makefile:1280: vmlinux] Error 1
when building mips:allmodconfig.
Should this possibly be "depends on SWIOTLB" ?
Thanks,
Guenter
On Tue, Aug 24, 2021 at 10:26 PM Guenter Roeck <[email protected]> wrote:
>
> Hi Claire,
>
> On Thu, Jun 24, 2021 at 11:55:24PM +0800, Claire Chang wrote:
> > Add the initialization function to create restricted DMA pools from
> > matching reserved-memory nodes.
> >
> > Regardless of swiotlb setting, the restricted DMA pool is preferred if
> > available.
> >
> > The restricted DMA pools provide a basic level of protection against the
> > DMA overwriting buffer contents at unexpected times. However, to protect
> > against general data leakage and system memory corruption, the system
> > needs to provide a way to lock down the memory access, e.g., MPU.
> >
> > Signed-off-by: Claire Chang <[email protected]>
> > Reviewed-by: Christoph Hellwig <[email protected]>
> > Tested-by: Stefano Stabellini <[email protected]>
> > Tested-by: Will Deacon <[email protected]>
> > ---
> > include/linux/swiotlb.h | 3 +-
> > kernel/dma/Kconfig | 14 ++++++++
> > kernel/dma/swiotlb.c | 76 +++++++++++++++++++++++++++++++++++++++++
> > 3 files changed, 92 insertions(+), 1 deletion(-)
> >
> > diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
> > index 3b9454d1e498..39284ff2a6cd 100644
> > --- a/include/linux/swiotlb.h
> > +++ b/include/linux/swiotlb.h
> > @@ -73,7 +73,8 @@ extern enum swiotlb_force swiotlb_force;
> > * range check to see if the memory was in fact allocated by this
> > * API.
> > * @nslabs: The number of IO TLB blocks (in groups of 64) between @start and
> > - * @end. This is command line adjustable via setup_io_tlb_npages.
> > + * @end. For default swiotlb, this is command line adjustable via
> > + * setup_io_tlb_npages.
> > * @used: The number of used IO TLB block.
> > * @list: The free list describing the number of free entries available
> > * from each index.
> > diff --git a/kernel/dma/Kconfig b/kernel/dma/Kconfig
> > index 77b405508743..3e961dc39634 100644
> > --- a/kernel/dma/Kconfig
> > +++ b/kernel/dma/Kconfig
> > @@ -80,6 +80,20 @@ config SWIOTLB
> > bool
> > select NEED_DMA_MAP_STATE
> >
> > +config DMA_RESTRICTED_POOL
> > + bool "DMA Restricted Pool"
> > + depends on OF && OF_RESERVED_MEM
> > + select SWIOTLB
>
> This makes SWIOTLB user configurable, which in turn results in
>
> mips64-linux-ld: arch/mips/kernel/setup.o: in function `arch_mem_init':
> setup.c:(.init.text+0x19c8): undefined reference to `plat_swiotlb_setup'
> make[1]: *** [Makefile:1280: vmlinux] Error 1
>
> when building mips:allmodconfig.
>
> Should this possibly be "depends on SWIOTLB" ?
Patch is sent here: https://lkml.org/lkml/2021/8/26/932
>
> Thanks,
> Guenter
Thanks,
Claire
On Thu, Jun 24, 2021 at 6:59 PM Claire Chang <[email protected]> wrote:
>
> Add the initialization function to create restricted DMA pools from
> matching reserved-memory nodes.
>
> Regardless of swiotlb setting, the restricted DMA pool is preferred if
> available.
>
> The restricted DMA pools provide a basic level of protection against the
> DMA overwriting buffer contents at unexpected times. However, to protect
> against general data leakage and system memory corruption, the system
> needs to provide a way to lock down the memory access, e.g., MPU.
> +static int rmem_swiotlb_device_init(struct reserved_mem *rmem,
> + struct device *dev)
> +{
> + struct io_tlb_mem *mem = rmem->priv;
> + unsigned long nslabs = rmem->size >> IO_TLB_SHIFT;
> +
> + /*
> + * Since multiple devices can share the same pool, the private data,
> + * io_tlb_mem struct, will be initialized by the first device attached
> + * to it.
> + */
> + if (!mem) {
Can it be rather
if (mem)
goto out_assign;
or so?
> + mem = kzalloc(struct_size(mem, slots, nslabs), GFP_KERNEL);
> + if (!mem)
> + return -ENOMEM;
> +
> + set_memory_decrypted((unsigned long)phys_to_virt(rmem->base),
> + rmem->size >> PAGE_SHIFT);
Below you are using a macro from pfn.h, but not here, I think it's PFN_DOWN().
> + swiotlb_init_io_tlb_mem(mem, rmem->base, nslabs, false);
> + mem->force_bounce = true;
> + mem->for_alloc = true;
> +
> + rmem->priv = mem;
> +
> + if (IS_ENABLED(CONFIG_DEBUG_FS)) {
> + mem->debugfs =
> + debugfs_create_dir(rmem->name, debugfs_dir);
> + swiotlb_create_debugfs_files(mem);
> + }
> + }
> +
> + dev->dma_io_tlb_mem = mem;
> +
> + return 0;
> +}
> +
> +static void rmem_swiotlb_device_release(struct reserved_mem *rmem,
> + struct device *dev)
> +{
> + dev->dma_io_tlb_mem = io_tlb_default_mem;
> +}
> +
> +static const struct reserved_mem_ops rmem_swiotlb_ops = {
> + .device_init = rmem_swiotlb_device_init,
> + .device_release = rmem_swiotlb_device_release,
> +};
> +
> +static int __init rmem_swiotlb_setup(struct reserved_mem *rmem)
> +{
> + unsigned long node = rmem->fdt_node;
> +
> + if (of_get_flat_dt_prop(node, "reusable", NULL) ||
> + of_get_flat_dt_prop(node, "linux,cma-default", NULL) ||
> + of_get_flat_dt_prop(node, "linux,dma-default", NULL) ||
> + of_get_flat_dt_prop(node, "no-map", NULL))
> + return -EINVAL;
> +
> + if (PageHighMem(pfn_to_page(PHYS_PFN(rmem->base)))) {
> + pr_err("Restricted DMA pool must be accessible within the linear mapping.");
> + return -EINVAL;
> + }
> +
> + rmem->ops = &rmem_swiotlb_ops;
> + pr_info("Reserved memory: created restricted DMA pool at %pa, size %ld MiB\n",
> + &rmem->base, (unsigned long)rmem->size / SZ_1M);
Oh là là, besides explicit casting that I believe can be avoided, %ld
!= unsigned long. Can you check the printk-formats.rst document?
> + return 0;
> +}
> +
> +RESERVEDMEM_OF_DECLARE(dma, "restricted-dma-pool", rmem_swiotlb_setup);
> #endif /* CONFIG_DMA_RESTRICTED_POOL */
--
With Best Regards,
Andy Shevchenko