From: Petr Tesarik <[email protected]>
The goal of my work is to provide more flexibility in the sizing of
SWIOTLB.
The software IO TLB was designed with these assumptions:
1. It would not be used much, especially on 64-bit systems.
2. A small fixed memory area (64 MiB by default) is sufficient to
handle the few cases which require a bounce buffer.
3. 64 MiB is small enough that it has no significant impact on the
   rest of the system.
However, these assumptions are no longer universally true.
First, if SEV is active, all DMA must be done through shared
unencrypted pages, and SWIOTLB is used to make this happen without
changing device drivers. The software IO TLB size is increased to
6% of total memory in sev_setup_arch(), but that is more of an
approximation. The actual requirements may vary depending on the
amount of I/O and which drivers are used. These factors may not be
known at boot time, i.e. when SWIOTLB is allocated.
Second, colleagues have noticed that they can reliably get rid of
occasional OOM kills on an Arm embedded device by reducing the
SWIOTLB size. This can be achieved with a kernel parameter, but
determining the right value puts an additional burden on pre-release
testing, which could be avoided if the SWIOTLB were allocated small
and grew only when necessary.
Changes from v1-devel-v7:
- Add comments to acquire/release barriers
- Fix whitespace issues reported by checkpatch.pl
Changes from v1-devel-v6:
- Provide long description of functions
- Fix kernel-doc (Returns: to Return:)
- Rename __lookup_dyn_slot() to lookup_dyn_slot_locked()
Changes from RFC:
- Track dynamic buffers per device instead of per swiotlb
- Use a linked list instead of a maple tree
- Move initialization of swiotlb fields of struct device to a
helper function
- Rename __lookup_dyn_slot() to lookup_dyn_slot_locked()
- Introduce per-device flag if dynamic buffers are in use
- Add one more user of DMA_ATTR_MAY_SLEEP
- Add kernel-doc comments for new (and some old) code
- Properly escape '*' in dma-attributes.rst
Petr Tesarik (7):
swiotlb: Use a helper to initialize swiotlb fields in struct device
swiotlb: Move code around in preparation for dynamic bounce buffers
dma-mapping: introduce the DMA_ATTR_MAY_SLEEP attribute
swiotlb: Dynamically allocated bounce buffers
swiotlb: Add a boot option to enable dynamic bounce buffers
drm: Use DMA_ATTR_MAY_SLEEP from process context
swiotlb: per-device flag if there are dynamically allocated buffers
.../admin-guide/kernel-parameters.txt | 6 +-
Documentation/core-api/dma-attributes.rst | 10 +
drivers/base/core.c | 4 +-
drivers/gpu/drm/drm_gem_shmem_helper.c | 2 +-
drivers/gpu/drm/drm_prime.c | 2 +-
include/linux/device.h | 12 +
include/linux/dma-mapping.h | 6 +
include/linux/swiotlb.h | 54 ++-
kernel/dma/swiotlb.c | 382 ++++++++++++++++--
9 files changed, 443 insertions(+), 35 deletions(-)
--
2.25.1
From: Petr Tesarik <[email protected]>
Move the swiotlb initialization code for struct device to
swiotlb.h. This change also makes it possible to provide a stub
implementation if swiotlb is not configured, getting rid of an
#ifdef in the driver core.
Signed-off-by: Petr Tesarik <[email protected]>
---
drivers/base/core.c | 4 +---
include/linux/swiotlb.h | 12 ++++++++++++
2 files changed, 13 insertions(+), 3 deletions(-)
diff --git a/drivers/base/core.c b/drivers/base/core.c
index 6878dfcbf0d6..a5dc7c673102 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -3060,9 +3060,7 @@ void device_initialize(struct device *dev)
defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU_ALL)
dev->dma_coherent = dma_default_coherent;
#endif
-#ifdef CONFIG_SWIOTLB
- dev->dma_io_tlb_mem = &io_tlb_default_mem;
-#endif
+ swiotlb_dev_init(dev);
}
EXPORT_SYMBOL_GPL(device_initialize);
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index bcef10e20ea4..b65b7330f7e5 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -119,6 +119,15 @@ static inline bool is_swiotlb_force_bounce(struct device *dev)
return mem && mem->force_bounce;
}
+/**
+ * swiotlb_dev_init() - initialize swiotlb fields in &struct device
+ * @dev: device to be initialized
+ */
+static inline void swiotlb_dev_init(struct device *dev)
+{
+ dev->dma_io_tlb_mem = &io_tlb_default_mem;
+}
+
void swiotlb_init(bool addressing_limited, unsigned int flags);
void __init swiotlb_exit(void);
size_t swiotlb_max_mapping_size(struct device *dev);
@@ -128,6 +137,9 @@ void __init swiotlb_adjust_size(unsigned long size);
static inline void swiotlb_init(bool addressing_limited, unsigned int flags)
{
}
+static inline void swiotlb_dev_init(struct device *dev)
+{
+}
static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
{
return false;
--
2.25.1
From: Petr Tesarik <[email protected]>
To prepare for the introduction of dynamically allocated bounce
buffers, separate out common code and code which handles non-dynamic
(aka fixed) bounce buffers.
No functional change, but this commit should make the addition of
dynamic allocations easier to review.
Signed-off-by: Petr Tesarik <[email protected]>
---
include/linux/swiotlb.h | 31 ++++++++++-
kernel/dma/swiotlb.c | 110 +++++++++++++++++++++++++++++++++-------
2 files changed, 122 insertions(+), 19 deletions(-)
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index b65b7330f7e5..a0bb6b30b17c 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -105,11 +105,40 @@ struct io_tlb_mem {
};
extern struct io_tlb_mem io_tlb_default_mem;
+/**
+ * is_swiotlb_fixed() - check if a physical address belongs to a swiotlb slot
+ * @mem: relevant swiotlb pool
+ * @paddr: physical address within the DMA buffer
+ *
+ * Check if @paddr points into a fixed bounce buffer slot.
+ * This check should be as fast as possible.
+ *
+ * Return:
+ * * %true if @paddr points into a @mem fixed slot
+ * * %false otherwise
+ */
+static inline bool is_swiotlb_fixed(struct io_tlb_mem *mem, phys_addr_t paddr)
+{
+ return paddr >= mem->start && paddr < mem->end;
+}
+
+/**
+ * is_swiotlb_buffer() - check if a physical address is allocated from the
+ * swiotlb pool
+ * @dev: device which has mapped the buffer
+ * @paddr: physical address within the DMA buffer
+ *
+ * Check if @paddr points into a bounce buffer.
+ *
+ * Return:
+ * * %true if @paddr points into a bounce buffer
+ * * %false otherwise
+ */
static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
{
struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
- return mem && paddr >= mem->start && paddr < mem->end;
+ return mem && is_swiotlb_fixed(mem, paddr);
}
static inline bool is_swiotlb_force_bounce(struct device *dev)
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index dac42a2ad588..2f09a4ed4215 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -78,6 +78,10 @@ phys_addr_t swiotlb_unencrypted_base;
static unsigned long default_nslabs = IO_TLB_DEFAULT_SIZE >> IO_TLB_SHIFT;
static unsigned long default_nareas;
+static void swiotlb_copy(struct device *dev, phys_addr_t orig_addr,
+ unsigned char *vaddr, size_t size, size_t alloc_size,
+ unsigned int tlb_offset, enum dma_data_direction dir);
+
/**
* struct io_tlb_area - IO TLB memory area descriptor
*
@@ -523,7 +527,6 @@ static void swiotlb_bounce(struct device *dev, phys_addr_t tlb_addr, size_t size
int index = (tlb_addr - mem->start) >> IO_TLB_SHIFT;
phys_addr_t orig_addr = mem->slots[index].orig_addr;
size_t alloc_size = mem->slots[index].alloc_size;
- unsigned long pfn = PFN_DOWN(orig_addr);
unsigned char *vaddr = mem->vaddr + tlb_addr - mem->start;
unsigned int tlb_offset, orig_addr_offset;
@@ -540,6 +543,34 @@ static void swiotlb_bounce(struct device *dev, phys_addr_t tlb_addr, size_t size
}
tlb_offset -= orig_addr_offset;
+ swiotlb_copy(dev, orig_addr, vaddr, size, alloc_size, tlb_offset, dir);
+}
+
+/**
+ * swiotlb_copy() - copy swiotlb buffer content, checking for overflows.
+ * @dev: device which has mapped the bounce buffer
+ * @orig_addr: physical address of the original buffer
+ * @vaddr: virtual address inside the bounce buffer
+ * @size: number of bytes to copy
+ * @alloc_size: total allocated size of the bounce buffer
+ * @tlb_offset: offset within the bounce buffer
+ * @dir: direction of the data transfer
+ *
+ * If @dir is %DMA_TO_DEVICE, copy data from the original buffer to the
+ * bounce buffer, otherwise copy from the bounce buffer to the original
+ * buffer.
+ *
+ * The original buffer may be in high memory; that's why @orig_addr is
+ * a physical address. Note that this is the address of the beginning
+ * of the bounce buffer. Copying starts at offset @tlb_offset. This is
+ * needed to check accesses beyond the allocated size.
+ */
+static void swiotlb_copy(struct device *dev, phys_addr_t orig_addr,
+ unsigned char *vaddr, size_t size, size_t alloc_size,
+ unsigned int tlb_offset, enum dma_data_direction dir)
+{
+ unsigned long pfn = PFN_DOWN(orig_addr);
+
if (tlb_offset > alloc_size) {
dev_WARN_ONCE(dev, 1,
"Buffer overflow detected. Allocation size: %zu. Mapping size: %zu+%u.\n",
@@ -734,15 +765,65 @@ static unsigned long mem_used(struct io_tlb_mem *mem)
return used;
}
+/**
+ * swiotlb_fixed_map() - allocate a bounce buffer from fixed slots
+ * @dev: device which maps the buffer
+ * @orig_addr: address of the original buffer
+ * @alloc_size: total size of the original buffer
+ * @alloc_align_mask:
+ * required physical alignment of the I/O buffer
+ * @attrs: optional DMA attributes for the map operation
+ *
+ * Search for a suitable slot or sequence of slots and initialize them
+ * for use with the original buffer.
+ *
+ * Return: Physical address of the bounce buffer, or %DMA_MAPPING_ERROR.
+ */
+static phys_addr_t swiotlb_fixed_map(struct device *dev, phys_addr_t orig_addr,
+ size_t alloc_size, unsigned int alloc_align_mask,
+ unsigned long attrs)
+{
+ struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
+ unsigned int offset = swiotlb_align_offset(dev, orig_addr);
+ int index = swiotlb_find_slots(dev, orig_addr,
+ alloc_size + offset, alloc_align_mask);
+ unsigned int i;
+
+ if (index == -1)
+ return (phys_addr_t)DMA_MAPPING_ERROR;
+
+ /*
+ * Save away the mapping from the original address to the DMA address.
+ * This is needed when we sync the memory. Then we sync the buffer if
+ * needed.
+ */
+ for (i = 0; i < nr_slots(alloc_size + offset); i++)
+ mem->slots[index + i].orig_addr = slot_addr(orig_addr, i);
+ return slot_addr(mem->start, index) + offset;
+}
+
+/**
+ * swiotlb_tbl_map_single() - map DMA buffer to a bounce buffer
+ * @dev: device which maps the buffer
+ * @orig_addr: address of the original buffer
+ * @mapping_size: size of the original buffer to be synced now
+ * @alloc_size: total size of the original buffer
+ * @alloc_align_mask:
+ * required physical alignment of the I/O buffer
+ * @dir: direction of the data transfer
+ * @attrs: optional DMA attributes for the map operation
+ *
+ * Create a mapping of the DMA buffer into a bounce buffer and copy the
+ * original data.
+ *
+ * Return: Physical address of the bounce buffer, or %DMA_MAPPING_ERROR.
+ */
phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
size_t mapping_size, size_t alloc_size,
unsigned int alloc_align_mask, enum dma_data_direction dir,
unsigned long attrs)
{
struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
- unsigned int offset = swiotlb_align_offset(dev, orig_addr);
- unsigned int i;
- int index;
phys_addr_t tlb_addr;
if (!mem || !mem->nslabs) {
@@ -760,24 +841,17 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
return (phys_addr_t)DMA_MAPPING_ERROR;
}
- index = swiotlb_find_slots(dev, orig_addr,
- alloc_size + offset, alloc_align_mask);
- if (index == -1) {
+ tlb_addr = swiotlb_fixed_map(dev, orig_addr, alloc_size,
+ alloc_align_mask, attrs);
+
+ if (tlb_addr == (phys_addr_t)DMA_MAPPING_ERROR) {
if (!(attrs & DMA_ATTR_NO_WARN))
dev_warn_ratelimited(dev,
- "swiotlb buffer is full (sz: %zd bytes), total %lu (slots), used %lu (slots)\n",
- alloc_size, mem->nslabs, mem_used(mem));
- return (phys_addr_t)DMA_MAPPING_ERROR;
+ "swiotlb buffer is full (sz: %zd bytes), total %lu (slots), used %lu (slots)\n",
+ alloc_size, mem->nslabs, mem_used(mem));
+ return tlb_addr;
}
- /*
- * Save away the mapping from the original address to the DMA address.
- * This is needed when we sync the memory. Then we sync the buffer if
- * needed.
- */
- for (i = 0; i < nr_slots(alloc_size + offset); i++)
- mem->slots[index + i].orig_addr = slot_addr(orig_addr, i);
- tlb_addr = slot_addr(mem->start, index) + offset;
/*
* When dir == DMA_FROM_DEVICE we could omit the copy from the orig
* to the tlb buffer, if we knew for sure the device will
--
2.25.1
From: Petr Tesarik <[email protected]>
Introduce a DMA attribute to tell the DMA-mapping subsystem that
the operation is allowed to sleep.
This patch merely adds the flag, but it does not do anything at
the moment.
Signed-off-by: Petr Tesarik <[email protected]>
---
Documentation/core-api/dma-attributes.rst | 10 ++++++++++
include/linux/dma-mapping.h | 6 ++++++
2 files changed, 16 insertions(+)
diff --git a/Documentation/core-api/dma-attributes.rst b/Documentation/core-api/dma-attributes.rst
index 1887d92e8e92..9ce00926455f 100644
--- a/Documentation/core-api/dma-attributes.rst
+++ b/Documentation/core-api/dma-attributes.rst
@@ -130,3 +130,13 @@ accesses to DMA buffers in both privileged "supervisor" and unprivileged
subsystem that the buffer is fully accessible at the elevated privilege
level (and ideally inaccessible or at least read-only at the
lesser-privileged levels).
+
+DMA_ATTR_MAY_SLEEP
+------------------
+
+This tells the DMA-mapping subsystem that it is allowed to sleep. For example,
+if mapping needs a bounce buffer, software IO TLB may use CMA for the
+allocation if this flag is given.
+
+This attribute is not used for dma_alloc\* functions. Instead, the provided
+GFP flags are used to determine whether the allocation may sleep.
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 0ee20b764000..7a75c503ac38 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -61,6 +61,12 @@
*/
#define DMA_ATTR_PRIVILEGED (1UL << 9)
+/*
+ * DMA_ATTR_MAY_SLEEP: This tells the DMA-mapping subsystem that it is allowed
+ * to sleep.
+ */
+#define DMA_ATTR_MAY_SLEEP (1UL << 10)
+
/*
* A dma_addr_t can hold any valid DMA or bus address for the platform. It can
* be given to a device to use as a DMA source or target. It is specific to a
--
2.25.1
From: Petr Tesarik <[email protected]>
The software IO TLB was designed with the assumption that it is not
used much, especially on 64-bit systems, so a small fixed memory
area (currently 64 MiB) is sufficient to handle the few cases which
still require a bounce buffer. However, these cases are not so rare
in some circumstances.
First, if SEV is active, all DMA must be done through shared
unencrypted pages, and SWIOTLB is used to make this happen without
changing device drivers. The software IO TLB size is increased to 6%
of total memory in sev_setup_arch(), but that is more of an
approximation. The actual requirements may vary depending on which
drivers are used and the amount of I/O.
Second, some embedded devices have very little RAM, so 64 MiB is not
negligible. Sadly, these are exactly the devices that also often
need a software IO TLB. Although the minimum swiotlb size can be
found empirically by extensive testing, it would be easier to
allocate a small swiotlb at boot and let it grow on demand.
Growing the SWIOTLB data structures at run time is impossible. The
whole SWIOTLB region is contiguous in physical memory to allow
combining adjacent slots and also to ensure that alignment
constraints can be met. The SWIOTLB is too big for the buddy
allocator (cf. MAX_ORDER). More importantly, even if a new SWIOTLB
could be allocated (e.g. from CMA), it cannot be extended in-place
(because surrounding pages may be already allocated for other
purposes), and there is no mechanism for relocating already mapped
bounce buffers: The DMA API gets only the address of a buffer, and
the implementation (direct or IOMMU) checks whether it belongs to
the software IO TLB.
It is possible to allocate multiple smaller struct io_tlb_mem
instances. However, they would have to be stored in a non-constant
container (list or tree), which needs synchronization between
readers and writers, creating contention in a hot path for all
devices, not only those which need software IO TLB.
Another option is to allocate a very large SWIOTLB at boot, but
allow migrating pages to other users (like CMA does). This approach
might work, but there are many open issues:
1. After a page is migrated away from SWIOTLB, it must not be used
as a (direct) DMA buffer. Otherwise SWIOTLB code would have to
check which pages have been migrated to determine whether a given
buffer address belongs to a bounce buffer or not, effectively
introducing all the issues of multiple SWIOTLB instances.
2. Unlike SWIOTLB, CMA cannot be used from atomic contexts, for
   several different reasons. This might be changed in theory, but
   it would take a lot of investigation and time. On the other hand,
   an improvement to the SWIOTLB is needed now.
3. If SWIOTLB is implemented separately from CMA and not as its
part, users have to solve the dilemma of how to distribute
precious DMA-able memory between them.
The present patch is a simplistic solution. Bounce buffers are
allocated using the non-coherent DMA API, removing the need to
implement a new custom allocator. These buffers are then tracked in
a per-device linked list, reducing the impact on devices that do not
need the SWIOTLB.
Analysis of real-world I/O patterns has shown that the same buffer
is typically looked up repeatedly (for further sync operations, or
to be unmapped). The most recently used bounce buffer is therefore
always moved to the beginning of the list. The list performed better
than a maple tree when tested with fio against a QEMU SATA drive
backed by a RAM block device in the host (4 cores, 16 iodepth).
Other scenarios are also likely to benefit from this MRU order but
have not been tested.
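The MRU behavior described above can be modeled in plain C. This is
a simplified userspace sketch of the per-device list (the actual
patch uses the kernel's struct list_head and takes a spinlock around
the walk):

```c
#include <assert.h>
#include <stddef.h>

/* Simplified model of a dynamically allocated bounce buffer slot. */
struct dyn_slot {
	struct dyn_slot *prev, *next;
	unsigned long start, size; /* covered physical range */
};

/* Doubly linked list with a dummy head, like the kernel's list_head. */
static void list_move_front(struct dyn_slot *head, struct dyn_slot *s)
{
	/* unlink */
	s->prev->next = s->next;
	s->next->prev = s->prev;
	/* re-insert right after the head (MRU position) */
	s->next = head->next;
	s->prev = head;
	head->next->prev = s;
	head->next = s;
}

/* Walk the list; on a hit, move the slot to the front so that
 * repeated lookups of the same buffer terminate immediately. */
static struct dyn_slot *lookup(struct dyn_slot *head, unsigned long paddr)
{
	struct dyn_slot *s;

	for (s = head->next; s != head; s = s->next)
		if (paddr >= s->start && paddr < s->start + s->size) {
			list_move_front(head, s);
			return s;
		}
	return NULL;
}
```

A repeated sync or unmap of the same buffer then finds its slot at
the head of the list on the first comparison.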
Operations on the list are serialized with a spinlock. It is
unfortunately not possible to use an RCU list, because quiescent
state is not guaranteed to happen before an asynchronous event (e.g.
hardware interrupt) on another CPU. If that CPU used an old version
of the list, it would fail to copy data to and/or from the newly
allocated bounce buffer.
Last but not least, bounce buffers are never allocated dynamically
if the SWIOTLB is in fact a DMA restricted pool, because that would
defeat the purpose of a restricted pool.
Signed-off-by: Petr Tesarik <[email protected]>
---
include/linux/device.h | 8 ++
include/linux/swiotlb.h | 8 +-
kernel/dma/swiotlb.c | 252 ++++++++++++++++++++++++++++++++++++++--
3 files changed, 259 insertions(+), 9 deletions(-)
diff --git a/include/linux/device.h b/include/linux/device.h
index 1508e637bb26..e12d6092bb9c 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -505,6 +505,12 @@ struct device_physical_location {
* @dma_mem: Internal for coherent mem override.
* @cma_area: Contiguous memory area for dma allocations
* @dma_io_tlb_mem: Pointer to the swiotlb pool used. Not for driver use.
+ * @dma_io_tlb_dyn_lock:
+ * Spinlock to protect the list of dynamically allocated bounce
+ * buffers.
+ * @dma_io_tlb_dyn_slots:
+ * Dynamically allocated bounce buffers for this device.
+ * Not for driver use.
* @archdata: For arch-specific additions.
* @of_node: Associated device tree node.
* @fwnode: Associated device node supplied by platform firmware.
@@ -610,6 +616,8 @@ struct device {
#endif
#ifdef CONFIG_SWIOTLB
struct io_tlb_mem *dma_io_tlb_mem;
+ spinlock_t dma_io_tlb_dyn_lock;
+ struct list_head dma_io_tlb_dyn_slots;
#endif
/* arch specific additions */
struct dev_archdata archdata;
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index a0bb6b30b17c..0856eddb9063 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -105,6 +105,8 @@ struct io_tlb_mem {
};
extern struct io_tlb_mem io_tlb_default_mem;
+bool is_swiotlb_dyn(struct device *dev, phys_addr_t paddr);
+
/**
* is_swiotlb_fixed() - check if a physical address belongs to a swiotlb slot
* @mem: relevant swiotlb pool
@@ -138,7 +140,9 @@ static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
{
struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
- return mem && is_swiotlb_fixed(mem, paddr);
+ return mem &&
+ (is_swiotlb_fixed(mem, paddr) ||
+ is_swiotlb_dyn(dev, paddr));
}
static inline bool is_swiotlb_force_bounce(struct device *dev)
@@ -155,6 +159,8 @@ static inline bool is_swiotlb_force_bounce(struct device *dev)
static inline void swiotlb_dev_init(struct device *dev)
{
dev->dma_io_tlb_mem = &io_tlb_default_mem;
+ spin_lock_init(&dev->dma_io_tlb_dyn_lock);
+ INIT_LIST_HEAD(&dev->dma_io_tlb_dyn_slots);
}
void swiotlb_init(bool addressing_limited, unsigned int flags);
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 2f09a4ed4215..f4faee38ead9 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -68,6 +68,22 @@ struct io_tlb_slot {
unsigned int list;
};
+/**
+ * struct io_tlb_dyn_slot - dynamically allocated swiotlb slot
+ * @node: node in the per-device list
+ * @orig_addr: physical address of the original DMA buffer
+ * @alloc_size: total size of the DMA buffer
+ * @page: first page of the bounce buffer
+ * @dma_addr: DMA address of the bounce buffer
+ */
+struct io_tlb_dyn_slot {
+ struct list_head node;
+ phys_addr_t orig_addr;
+ size_t alloc_size;
+ struct page *page;
+ dma_addr_t dma_addr;
+};
+
static bool swiotlb_force_bounce;
static bool swiotlb_force_disable;
@@ -509,6 +525,191 @@ void __init swiotlb_exit(void)
memset(mem, 0, sizeof(*mem));
}
+/**
+ * lookup_dyn_slot_locked() - look up a dynamically allocated bounce buffer
+ * @dev: device which has mapped the buffer
+ * @paddr: physical address within the bounce buffer
+ *
+ * Walk through the list of bounce buffers dynamically allocated for @dev,
+ * looking for a buffer which contains @paddr.
+ *
+ * Context: Any context. Expects dma_io_tlb_dyn_lock lock to be held.
+ * Return:
+ * Address of a &struct io_tlb_dyn_slot, or %NULL if not found.
+ */
+static struct io_tlb_dyn_slot *lookup_dyn_slot_locked(struct device *dev,
+ phys_addr_t paddr)
+{
+ struct io_tlb_dyn_slot *slot;
+
+ list_for_each_entry(slot, &dev->dma_io_tlb_dyn_slots, node) {
+ phys_addr_t start = page_to_phys(slot->page);
+ phys_addr_t end = start + slot->alloc_size - 1;
+
+ if (start <= paddr && paddr <= end)
+ return slot;
+ }
+ return NULL;
+}
+
+/**
+ * lookup_dyn_slot() - look up a dynamically allocated bounce buffer
+ * @dev: device which has mapped the buffer
+ * @paddr: physical address within the bounce buffer
+ *
+ * Search for a dynamically allocated bounce buffer which contains
+ * @paddr. If found, the buffer is moved to the beginning of the linked
+ * list. Use lookup_dyn_slot_locked() directly where this behavior is not
+ * required/desired.
+ *
+ * Context: Any context. Takes and releases dma_io_tlb_dyn_lock.
+ * Return:
+ * Address of a &struct io_tlb_dyn_slot, or %NULL if not found.
+ */
+static struct io_tlb_dyn_slot *lookup_dyn_slot(struct device *dev,
+ phys_addr_t paddr)
+{
+ struct io_tlb_dyn_slot *slot;
+ unsigned long flags;
+
+ spin_lock_irqsave(&dev->dma_io_tlb_dyn_lock, flags);
+ slot = lookup_dyn_slot_locked(dev, paddr);
+ list_move(&slot->node, &dev->dma_io_tlb_dyn_slots);
+ spin_unlock_irqrestore(&dev->dma_io_tlb_dyn_lock, flags);
+ return slot;
+}
+
+/**
+ * is_swiotlb_dyn() - check if a physical address belongs to a dynamically
+ * allocated bounce buffer
+ * @dev: device which has mapped the buffer
+ * @paddr: physical address within the bounce buffer
+ *
+ * Check whether there is a dynamically allocated bounce buffer which
+ * contains @paddr. If found, the buffer is moved to the beginning of
+ * the MRU list.
+ *
+ * Return:
+ * * %true if @paddr points into a dynamically allocated bounce buffer
+ * * %false otherwise
+ */
+bool is_swiotlb_dyn(struct device *dev, phys_addr_t paddr)
+{
+ return !!lookup_dyn_slot(dev, paddr);
+}
+
+/**
+ * swiotlb_dyn_bounce() - copy a dynamically allocated buffer from or back
+ * to the original dma location
+ * @dev: device which has mapped the buffer
+ * @tlb_addr: physical address inside the bounce buffer
+ * @size: size of the region to be copied
+ * @dir: direction of the data transfer
+ *
+ * Copy data to or from a buffer of @size bytes starting at @tlb_addr.
+ * This function works only for dynamically allocated bounce buffers.
+ */
+static void swiotlb_dyn_bounce(struct device *dev, phys_addr_t tlb_addr,
+ size_t size, enum dma_data_direction dir)
+{
+ struct io_tlb_dyn_slot *slot = lookup_dyn_slot(dev, tlb_addr);
+ unsigned int tlb_offset;
+ unsigned char *vaddr;
+
+ if (!slot)
+ return;
+
+ tlb_offset = tlb_addr - page_to_phys(slot->page);
+ vaddr = page_address(slot->page) + tlb_offset;
+
+ swiotlb_copy(dev, slot->orig_addr, vaddr, size, slot->alloc_size,
+ tlb_offset, dir);
+}
+
+/**
+ * swiotlb_dyn_map() - allocate a bounce buffer dynamically
+ * @dev: device which maps the buffer
+ * @orig_addr: address of the original buffer
+ * @alloc_size: total size of the original buffer
+ * @alloc_align_mask:
+ * required physical alignment of the I/O buffer
+ * @dir: direction of the data transfer
+ * @attrs: optional DMA attributes for the map operation
+ *
+ * Allocate a new bounce buffer using DMA non-coherent API. This function
+ * assumes that there is a fallback allocation scheme if the allocation
+ * fails. In fact, it always fails for buffers smaller than a page and
+ * for address constraints that are not (yet) correctly handled by
+ * dma_direct_alloc_pages().
+ *
+ * Return: Physical address of the bounce buffer, or %DMA_MAPPING_ERROR.
+ */
+static phys_addr_t swiotlb_dyn_map(struct device *dev, phys_addr_t orig_addr,
+ size_t alloc_size, unsigned int alloc_align_mask,
+ enum dma_data_direction dir, unsigned long attrs)
+{
+ struct io_tlb_dyn_slot *slot;
+ unsigned long flags;
+ gfp_t gfp;
+
+ /* Allocation has page granularity. Avoid small buffers. */
+ if (alloc_size < PAGE_SIZE)
+ goto err;
+
+ /* DMA direct does not deal with physical address constraints. */
+ if (alloc_align_mask || dma_get_min_align_mask(dev))
+ goto err;
+
+ gfp = (attrs & DMA_ATTR_MAY_SLEEP) ? GFP_NOIO : GFP_NOWAIT;
+ slot = kmalloc(sizeof(*slot), gfp | __GFP_NOWARN);
+ if (!slot)
+ goto err;
+
+ slot->orig_addr = orig_addr;
+ slot->alloc_size = alloc_size;
+ slot->page = dma_direct_alloc_pages(dev, PAGE_ALIGN(alloc_size),
+ &slot->dma_addr, dir,
+ gfp | __GFP_NOWARN);
+ if (!slot->page)
+ goto err_free_slot;
+
+ spin_lock_irqsave(&dev->dma_io_tlb_dyn_lock, flags);
+ list_add(&slot->node, &dev->dma_io_tlb_dyn_slots);
+ spin_unlock_irqrestore(&dev->dma_io_tlb_dyn_lock, flags);
+
+ return page_to_phys(slot->page);
+
+err_free_slot:
+ kfree(slot);
+err:
+ return (phys_addr_t)DMA_MAPPING_ERROR;
+}
+
+/**
+ * swiotlb_dyn_unmap() - unmap a dynamically allocated bounce buffer
+ * @dev: device which mapped the buffer
+ * @tlb_addr: physical address of the bounce buffer
+ * @dir: direction of the data transfer
+ *
+ * Release all resources associated with a dynamically allocated bounce
+ * buffer.
+ */
+static void swiotlb_dyn_unmap(struct device *dev, phys_addr_t tlb_addr,
+ enum dma_data_direction dir)
+{
+ struct io_tlb_dyn_slot *slot;
+ unsigned long flags;
+
+ spin_lock_irqsave(&dev->dma_io_tlb_dyn_lock, flags);
+ slot = lookup_dyn_slot_locked(dev, tlb_addr);
+ list_del(&slot->node);
+ spin_unlock_irqrestore(&dev->dma_io_tlb_dyn_lock, flags);
+
+ dma_direct_free_pages(dev, slot->alloc_size, slot->page,
+ slot->dma_addr, dir);
+ kfree(slot);
+}
+
/*
* Return the offset into a iotlb slot required to keep the device happy.
*/
@@ -517,11 +718,19 @@ static unsigned int swiotlb_align_offset(struct device *dev, u64 addr)
return addr & dma_get_min_align_mask(dev) & (IO_TLB_SIZE - 1);
}
-/*
- * Bounce: copy the swiotlb buffer from or back to the original dma location
+/**
+ * swiotlb_fixed_bounce() - copy a fixed-slot swiotlb buffer from or back
+ * to the original dma location
+ * @dev: device which has mapped the buffer
+ * @tlb_addr: physical address inside the bounce buffer
+ * @size: size of the region to be copied
+ * @dir: direction of the data transfer
+ *
+ * Copy data to or from a buffer of @size bytes starting at @tlb_addr.
+ * This function works only for fixed bounce buffers.
*/
-static void swiotlb_bounce(struct device *dev, phys_addr_t tlb_addr, size_t size,
- enum dma_data_direction dir)
+static void swiotlb_fixed_bounce(struct device *dev, phys_addr_t tlb_addr,
+ size_t size, enum dma_data_direction dir)
{
struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
int index = (tlb_addr - mem->start) >> IO_TLB_SHIFT;
@@ -617,6 +826,25 @@ static void swiotlb_copy(struct device *dev, phys_addr_t orig_addr,
}
}
+/**
+ * swiotlb_bounce() - copy the swiotlb buffer from or back to the original
+ * dma location
+ * @dev: device which has mapped the buffer
+ * @tlb_addr: physical address inside the bounce buffer
+ * @size: size of the region to be copied
+ * @dir: direction of the data transfer
+ *
+ * Copy data to or from a buffer of @size bytes starting at @tlb_addr.
+ */
+static void swiotlb_bounce(struct device *dev, phys_addr_t tlb_addr, size_t size,
+ enum dma_data_direction dir)
+{
+ if (is_swiotlb_fixed(dev->dma_io_tlb_mem, tlb_addr))
+ swiotlb_fixed_bounce(dev, tlb_addr, size, dir);
+ else
+ swiotlb_dyn_bounce(dev, tlb_addr, size, dir);
+}
+
static inline phys_addr_t slot_addr(phys_addr_t start, phys_addr_t idx)
{
return start + (idx << IO_TLB_SHIFT);
@@ -841,8 +1069,13 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
return (phys_addr_t)DMA_MAPPING_ERROR;
}
- tlb_addr = swiotlb_fixed_map(dev, orig_addr, alloc_size,
- alloc_align_mask, attrs);
+ tlb_addr = (phys_addr_t)DMA_MAPPING_ERROR;
+ if (!is_swiotlb_for_alloc(dev))
+ tlb_addr = swiotlb_dyn_map(dev, orig_addr, alloc_size,
+ alloc_align_mask, dir, attrs);
+ if (tlb_addr == (phys_addr_t)DMA_MAPPING_ERROR)
+ tlb_addr = swiotlb_fixed_map(dev, orig_addr, alloc_size,
+ alloc_align_mask, attrs);
if (tlb_addr == (phys_addr_t)DMA_MAPPING_ERROR) {
if (!(attrs & DMA_ATTR_NO_WARN))
@@ -924,7 +1157,10 @@ void swiotlb_tbl_unmap_single(struct device *dev, phys_addr_t tlb_addr,
(dir == DMA_FROM_DEVICE || dir == DMA_BIDIRECTIONAL))
swiotlb_bounce(dev, tlb_addr, mapping_size, DMA_FROM_DEVICE);
- swiotlb_release_slots(dev, tlb_addr);
+ if (is_swiotlb_fixed(dev->dma_io_tlb_mem, tlb_addr))
+ swiotlb_release_slots(dev, tlb_addr);
+ else
+ swiotlb_dyn_unmap(dev, tlb_addr, dir);
}
void swiotlb_sync_single_for_device(struct device *dev, phys_addr_t tlb_addr,
@@ -1055,7 +1291,7 @@ bool swiotlb_free(struct device *dev, struct page *page, size_t size)
{
phys_addr_t tlb_addr = page_to_phys(page);
- if (!is_swiotlb_buffer(dev, tlb_addr))
+ if (!is_swiotlb_fixed(dev->dma_io_tlb_mem, tlb_addr))
return false;
swiotlb_release_slots(dev, tlb_addr);
--
2.25.1
From: Petr Tesarik <[email protected]>
The main goal of allocating bounce buffers dynamically is to allow
allocating a minimal fixed swiotlb at boot time but avoid hard
limits on the amount of I/O that can be handled later.
Compared to fixed IO TLB slots, dynamic allocation of bounce buffers
typically increases the worst-case I/O latency and may also reduce
performance for some workloads.
I did some basic testing with fio against a QEMU SATA drive backed
by a RAM block device in the host to minimize external factors. The
kernel was booted with "swiotlb=force,dynamic". I performed testing
of single-threaded I/O of 4-KiB segments, single-threaded I/O of
1-MiB segments, and 4-core parallel I/O of 64-KiB segments. The last
column is the coefficient of variance in 5 runs of the test:
Read Write Coeff
single 4-KiB +1.9% +1.9% 1.7%
single 1-MiB -8.1% -8.2% 2.2%
parallel -9.4% -9.5% 2.6%
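For reference, a fio job of the kind described might look as follows.
This is a sketch only: the block size, iodepth and job count are taken
from the description above, while the device name, ioengine and
runtime are assumptions, not recorded in this commit message.

```ini
; Hypothetical fio job resembling the "parallel 64-KiB" case above.
; /dev/sdX stands for the QEMU SATA drive; adjust before use.
[global]
filename=/dev/sdX
direct=1
ioengine=libaio
runtime=30
time_based

[parallel-64k]
rw=rw
bs=64k
iodepth=16
numjobs=4
```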
There is a slight increase in bandwidth for single-threaded 4-KiB
segments. This is because the buddy allocator is quite efficient for
order-0 allocations, so the overhead is offset by faster allocation
from an almost empty fixed swiotlb (which is still used for buffers
smaller than one page).
Since the feature is new and does not benefit all workloads, keep
it disabled by default and let people turn it on with
"swiotlb=dynamic" if needed. Because this option can be combined
with "force", the parser is modified to accept multiple options
separated by commas.
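The parser change in the patch below uses the usual strsep() idiom: each call splits off the next comma-separated token and NUL-terminates it. A minimal userspace model of that loop (flag variables renamed for illustration) looks like this:

```c
#define _GNU_SOURCE          /* for strsep() on glibc */
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

static bool opt_force, opt_noforce, opt_dynamic;

/* Mirrors the loop added to setup_io_tlb_npages(): strsep() advances
 * str past each comma, so "force,dynamic" sets both flags in one pass.
 * The buffer is modified in place, as with the kernel boot cmdline. */
static void parse_swiotlb_opts(char *str)
{
	while (str && *str) {
		char *opt = strsep(&str, ",");

		if (!strcmp(opt, "force"))
			opt_force = true;
		else if (!strcmp(opt, "noforce"))
			opt_noforce = true;
		else if (!strcmp(opt, "dynamic"))
			opt_dynamic = true;
		else
			fprintf(stderr, "Invalid swiotlb option: %s\n", opt);
	}
}
```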
A new bool field is added to struct io_tlb_mem to tell whether
dynamic allocations are allowed. This field is always false for DMA
restricted pools. It is also false for other software IO TLBs
unless "swiotlb=dynamic" was specified.
Signed-off-by: Petr Tesarik <[email protected]>
---
.../admin-guide/kernel-parameters.txt | 6 +++++-
include/linux/swiotlb.h | 3 ++-
kernel/dma/swiotlb.c | 20 ++++++++++++++-----
3 files changed, 22 insertions(+), 7 deletions(-)
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 6221a1d057dd..c8bc0c8b8df6 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -6110,14 +6110,18 @@
Execution Facility on pSeries.
swiotlb= [ARM,IA-64,PPC,MIPS,X86]
- Format: { <int> [,<int>] | force | noforce }
+ Format: { <int> [,<int>] [,option-list] | option-list }
<int> -- Number of I/O TLB slabs
<int> -- Second integer after comma. Number of swiotlb
areas with their own lock. Will be rounded up
to a power of 2.
+ <option-list> -- Comma-separated list of options.
+
+ Available options:
force -- force using of bounce buffers even if they
wouldn't be automatically used by the kernel
noforce -- Never use bounce buffers (for debugging)
+ dynamic -- allow dynamic allocation of bounce buffers
switches= [HW,M68k]
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 0856eddb9063..e614aa0f4f64 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -98,6 +98,7 @@ struct io_tlb_mem {
bool late_alloc;
bool force_bounce;
bool for_alloc;
+ bool allow_dyn;
unsigned int nareas;
unsigned int area_nslabs;
struct io_tlb_area *areas;
@@ -142,7 +143,7 @@ static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
return mem &&
(is_swiotlb_fixed(mem, paddr) ||
- is_swiotlb_dyn(dev, paddr));
+ (mem->allow_dyn && is_swiotlb_dyn(dev, paddr)));
}
static inline bool is_swiotlb_force_bounce(struct device *dev)
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index f4faee38ead9..4899fb0e4331 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -86,6 +86,7 @@ struct io_tlb_dyn_slot {
static bool swiotlb_force_bounce;
static bool swiotlb_force_disable;
+static bool swiotlb_dynamic;
struct io_tlb_mem io_tlb_default_mem;
@@ -167,10 +168,18 @@ setup_io_tlb_npages(char *str)
swiotlb_adjust_nareas(simple_strtoul(str, &str, 0));
if (*str == ',')
++str;
- if (!strcmp(str, "force"))
- swiotlb_force_bounce = true;
- else if (!strcmp(str, "noforce"))
- swiotlb_force_disable = true;
+ while (str && *str) {
+ char *opt = strsep(&str, ",");
+
+ if (!strcmp(opt, "force"))
+ swiotlb_force_bounce = true;
+ else if (!strcmp(opt, "noforce"))
+ swiotlb_force_disable = true;
+ else if (!strcmp(opt, "dynamic"))
+ swiotlb_dynamic = true;
+ else
+ pr_warn("Invalid swiotlb option: %s\n", opt);
+ }
return 0;
}
@@ -287,6 +296,7 @@ static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start,
mem->area_nslabs = nslabs / mem->nareas;
mem->force_bounce = swiotlb_force_bounce || (flags & SWIOTLB_FORCE);
+ mem->allow_dyn = swiotlb_dynamic;
for (i = 0; i < mem->nareas; i++) {
spin_lock_init(&mem->areas[i].lock);
@@ -1070,7 +1080,7 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
}
tlb_addr = (phys_addr_t)DMA_MAPPING_ERROR;
- if (!is_swiotlb_for_alloc(dev))
+ if (mem->allow_dyn)
tlb_addr = swiotlb_dyn_map(dev, orig_addr, alloc_size,
alloc_align_mask, dir, attrs);
if (tlb_addr == (phys_addr_t)DMA_MAPPING_ERROR)
--
2.25.1
From: Petr Tesarik <[email protected]>
These mappings are never done from atomic context. If a dynamically
allocated bounce buffer is used for the mapping, this change allows
the buffer to be allocated from CMA.
Signed-off-by: Petr Tesarik <[email protected]>
---
drivers/gpu/drm/drm_gem_shmem_helper.c | 2 +-
drivers/gpu/drm/drm_prime.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/drm_gem_shmem_helper.c b/drivers/gpu/drm/drm_gem_shmem_helper.c
index 2b2163c8138e..b5bb4f9c130a 100644
--- a/drivers/gpu/drm/drm_gem_shmem_helper.c
+++ b/drivers/gpu/drm/drm_gem_shmem_helper.c
@@ -702,7 +702,7 @@ static struct sg_table *drm_gem_shmem_get_pages_sgt_locked(struct drm_gem_shmem_
goto err_put_pages;
}
/* Map the pages for use by the h/w. */
- ret = dma_map_sgtable(obj->dev->dev, sgt, DMA_BIDIRECTIONAL, 0);
+ ret = dma_map_sgtable(obj->dev->dev, sgt, DMA_BIDIRECTIONAL, DMA_ATTR_MAY_SLEEP);
if (ret)
goto err_free_sgt;
diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
index f924b8b4ab6b..f32e12445570 100644
--- a/drivers/gpu/drm/drm_prime.c
+++ b/drivers/gpu/drm/drm_prime.c
@@ -635,7 +635,7 @@ struct sg_table *drm_gem_map_dma_buf(struct dma_buf_attachment *attach,
return sgt;
ret = dma_map_sgtable(attach->dev, sgt, dir,
- DMA_ATTR_SKIP_CPU_SYNC);
+ DMA_ATTR_SKIP_CPU_SYNC | DMA_ATTR_MAY_SLEEP);
if (ret) {
sg_free_table(sgt);
kfree(sgt);
--
2.25.1
From: Petr Tesarik <[email protected]>
Do not walk the list of dynamically allocated bounce buffers if the
list is empty. This avoids taking dma_io_tlb_dyn_lock for devices
which do not use any dynamically allocated bounce buffers.
When unmapping the last dynamically allocated bounce buffer, the
flag is set to false as soon as possible to allow skipping the
spinlock even before the list itself is updated.
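The flag pattern described above can be modelled in userspace with C11 atomics, under the assumption that smp_store_release()/smp_load_acquire() correspond to release/acquire ordering (function names here are illustrative, not kernel APIs):

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace sketch of the dma_io_tlb_have_dyn fast path. The slot list
 * itself is protected by a spinlock (elided here); only the flag is
 * read locklessly. */
static atomic_bool have_dyn;
static int nslots;   /* number of dynamic slots, updated under the lock */

static void dyn_map_one(void)
{
	nslots++;                 /* list_add() under dma_io_tlb_dyn_lock */
	if (nslots == 1)
		/* pairs with the acquire load in is_dyn_buffer() */
		atomic_store_explicit(&have_dyn, true, memory_order_release);
}

static void dyn_unmap_one(void)
{
	if (nslots == 1)          /* list_is_singular(): last slot going away */
		/* clear early so lockless readers can already skip the lock */
		atomic_store_explicit(&have_dyn, false, memory_order_release);
	nslots--;                 /* list_del() under dma_io_tlb_dyn_lock */
}

static bool is_dyn_buffer(void)
{
	/* lockless fast path: a device with no dynamic slots never
	 * takes dma_io_tlb_dyn_lock */
	return atomic_load_explicit(&have_dyn, memory_order_acquire);
}
```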
Signed-off-by: Petr Tesarik <[email protected]>
---
include/linux/device.h | 4 ++++
include/linux/swiotlb.h | 6 +++++-
kernel/dma/swiotlb.c | 6 ++++++
3 files changed, 15 insertions(+), 1 deletion(-)
diff --git a/include/linux/device.h b/include/linux/device.h
index e12d6092bb9c..131b6db7fb3f 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -511,6 +511,9 @@ struct device_physical_location {
* @dma_io_tlb_dyn_slots:
* Dynamically allocated bounce buffers for this device.
* Not for driver use.
+ * @dma_io_tlb_have_dyn:
+ * Does this device have any dynamically allocated bounce
+ * buffers? Not for driver use.
* @archdata: For arch-specific additions.
* @of_node: Associated device tree node.
* @fwnode: Associated device node supplied by platform firmware.
@@ -618,6 +621,7 @@ struct device {
struct io_tlb_mem *dma_io_tlb_mem;
spinlock_t dma_io_tlb_dyn_lock;
struct list_head dma_io_tlb_dyn_slots;
+ bool dma_io_tlb_have_dyn;
#endif
/* arch specific additions */
struct dev_archdata archdata;
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index e614aa0f4f64..9ca4812b5977 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -143,7 +143,11 @@ static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
return mem &&
(is_swiotlb_fixed(mem, paddr) ||
- (mem->allow_dyn && is_swiotlb_dyn(dev, paddr)));
+ /* Pairs with smp_store_release() in swiotlb_dyn_map()
+ * and swiotlb_dyn_unmap().
+ */
+ (smp_load_acquire(&dev->dma_io_tlb_have_dyn) &&
+ is_swiotlb_dyn(dev, paddr)));
}
static inline bool is_swiotlb_force_bounce(struct device *dev)
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 4899fb0e4331..9b4faed7ef8f 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -685,6 +685,9 @@ static phys_addr_t swiotlb_dyn_map(struct device *dev, phys_addr_t orig_addr,
spin_lock_irqsave(&dev->dma_io_tlb_dyn_lock, flags);
list_add(&slot->node, &dev->dma_io_tlb_dyn_slots);
+ if (!dev->dma_io_tlb_have_dyn)
+ /* Pairs with smp_load_acquire() in is_swiotlb_buffer() */
+ smp_store_release(&dev->dma_io_tlb_have_dyn, true);
spin_unlock_irqrestore(&dev->dma_io_tlb_dyn_lock, flags);
return page_to_phys(slot->page);
@@ -711,6 +714,9 @@ static void swiotlb_dyn_unmap(struct device *dev, phys_addr_t tlb_addr,
unsigned long flags;
spin_lock_irqsave(&dev->dma_io_tlb_dyn_lock, flags);
+ if (list_is_singular(&dev->dma_io_tlb_dyn_slots))
+ /* Pairs with smp_load_acquire() in is_swiotlb_buffer() */
+ smp_store_release(&dev->dma_io_tlb_have_dyn, false);
slot = lookup_dyn_slot_locked(dev, tlb_addr);
list_del(&slot->node);
spin_unlock_irqrestore(&dev->dma_io_tlb_dyn_lock, flags);
--
2.25.1
On Wed, Apr 26, 2023 at 02:15:20PM +0200, Petr Tesařík wrote:
> Hi,
>
> On Wed, 19 Apr 2023 12:03:52 +0200
> Petr Tesarik <[email protected]> wrote:
>
> > From: Petr Tesarik <[email protected]>
> >
> > The goal of my work is to provide more flexibility in the sizing of
> > SWIOTLB.
> >
> > The software IO TLB was designed with these assumptions:
> >
> > 1. It would not be used much, especially on 64-bit systems.
> > 2. A small fixed memory area (64 MiB by default) is sufficient to
> > handle the few cases which require a bounce buffer.
> > 3. 64 MiB is little enough that it has no impact on the rest of the
> > system.
> >
> > First, if SEV is active, all DMA must be done through shared
> > unencrypted pages, and SWIOTLB is used to make this happen without
> > changing device drivers. The software IO TLB size is increased to
> > 6% of total memory in sev_setup_arch(), but that is more of an
> > approximation. The actual requirements may vary depending on the
> > amount of I/O and which drivers are used. These factors may not be
> > known at boot time, i.e. when SWIOTLB is allocated.
> >
> > Second, other colleagues have noticed that they can reliably get
> > rid of occasional OOM kills on an Arm embedded device by reducing
> > the SWIOTLB size. This can be achieved with a kernel parameter, but
> > determining the right value puts additional burden on pre-release
> > testing, which could be avoided if SWIOTLB is allocated small and
> > grows only when necessary.
>
> Now that merging into 6.4 has begun, what about this patch series? I'm
> eager to get some feedback (positive or negative) and respin the next
> version.
It's the merge window, we can't add new things that haven't been in
linux-next already. Please resubmit it after -rc1 is out.
thanks,
greg k-h
Hi,
On Wed, 19 Apr 2023 12:03:52 +0200
Petr Tesarik <[email protected]> wrote:
> From: Petr Tesarik <[email protected]>
>
> [...]
Now that merging into 6.4 has begun, what about this patch series? I'm
eager to get some feedback (positive or negative) and respin the next
version.
Petr T
Hi Greg,
On Wed, 26 Apr 2023 14:26:36 +0200
Greg Kroah-Hartman <[email protected]> wrote:
> On Wed, Apr 26, 2023 at 02:15:20PM +0200, Petr Tesařík wrote:
> > Hi,
> >
> > On Wed, 19 Apr 2023 12:03:52 +0200
> > Petr Tesarik <[email protected]> wrote:
> >
> > > From: Petr Tesarik <[email protected]>
> > >
> > > [...]
> >
> > Now that merging into 6.4 has begun, what about this patch series? I'm
> > eager to get some feedback (positive or negative) and respin the next
> > version.
>
> It's the merge window, we can't add new things that haven't been in
> linux-next already.
This is understood. I'm not asking for immediate inclusion.
> Please resubmit it after -rc1 is out.
If you can believe that rebasing to -rc1 will be enough, then I will
also try to believe I'm lucky. ;-)
The kind of feedback I really want to get is e.g. about the extra
per-device DMA-specific fields. If they cannot be added to struct
device, then I'd rather start discussing an interim solution, because
getting all existing DMA fields out of that struct will take a lot of
time...
Thanks,
Petr T
On Wed, Apr 26, 2023 at 02:44:39PM +0200, Petr Tesařík wrote:
> Hi Greg,
>
> On Wed, 26 Apr 2023 14:26:36 +0200
> Greg Kroah-Hartman <[email protected]> wrote:
>
> > On Wed, Apr 26, 2023 at 02:15:20PM +0200, Petr Tesařík wrote:
> > > Hi,
> > >
> > > On Wed, 19 Apr 2023 12:03:52 +0200
> > > Petr Tesarik <[email protected]> wrote:
> > >
> > > > From: Petr Tesarik <[email protected]>
> > > >
> > > > [...]
> > >
> > > Now that merging into 6.4 has begun, what about this patch series? I'm
> > > eager to get some feedback (positive or negative) and respin the next
> > > version.
> >
> > It's the merge window, we can't add new things that haven't been in
> > linux-next already.
>
> This is understood. I'm not asking for immediate inclusion.
>
> > Please resubmit it after -rc1 is out.
>
> If you can believe that rebasing to -rc1 will be enough, then I will
> also try to believe I'm lucky. ;-)
>
> The kind of feedback I really want to get is e.g. about the extra
> per-device DMA-specific fields. If they cannot be added to struct
> device, then I'd rather start discussing an interim solution, because
> getting all existing DMA fields out of that struct will take a lot of
> time...
I thought the goal was to get them out of the device and into the bus
instead right? Or was it the other way around? I can't remember
anymore, sorry...
greg k-h
On Wed, 26 Apr 2023 14:53:52 +0200
Greg Kroah-Hartman <[email protected]> wrote:
> On Wed, Apr 26, 2023 at 02:44:39PM +0200, Petr Tesařík wrote:
> > Hi Greg,
> >
> > On Wed, 26 Apr 2023 14:26:36 +0200
> > Greg Kroah-Hartman <[email protected]> wrote:
> >
> > > On Wed, Apr 26, 2023 at 02:15:20PM +0200, Petr Tesařík wrote:
> > > > Hi,
> > > >
> > > > On Wed, 19 Apr 2023 12:03:52 +0200
> > > > Petr Tesarik <[email protected]> wrote:
> > > >
> > > > > From: Petr Tesarik <[email protected]>
> > > > >
> > > > > [...]
> > > >
> > > > Now that merging into 6.4 has begun, what about this patch series? I'm
> > > > eager to get some feedback (positive or negative) and respin the next
> > > > version.
> > >
> > > It's the merge window, we can't add new things that haven't been in
> > > linux-next already.
> >
> > This is understood. I'm not asking for immediate inclusion.
> >
> > > Please resubmit it after -rc1 is out.
> >
> > If you can believe that rebasing to -rc1 will be enough, then I will
> > also try to believe I'm lucky. ;-)
> >
> > The kind of feedback I really want to get is e.g. about the extra
> > per-device DMA-specific fields. If they cannot be added to struct
> > device, then I'd rather start discussing an interim solution, because
> > getting all existing DMA fields out of that struct will take a lot of
> > time...
>
> I thought the goal was to get them out of the device and into the bus
> instead right? Or was it the other way around? I can't remember
> anymore, sorry...
You remember it almost right. The goal is to get them out of struct
device into the bus (or platform device, or whatever holds struct
device). But I'd like to keep this task decoupled from the dynamic
swiotlb.
Thanks,
Petr T
On Wed, 19 Apr 2023 at 11:05, Petr Tesarik <[email protected]> wrote:
>
> From: Petr Tesarik <[email protected]>
>
> [...]
>
> --
> 2.25.1
>
Hi
Is this a potential fix for
https://bugzilla.kernel.org/show_bug.cgi?id=217310 where I'm manually
setting bigger buffers to keep my wifi working?
Thanks
Mike
On Fri, 28 Apr 2023 09:53:38 +0100
Mike Lothian <[email protected]> wrote:
> On Wed, 19 Apr 2023 at 11:05, Petr Tesarik <[email protected]> wrote:
> >
> > From: Petr Tesarik <[email protected]>
> >
> > [...]
>
> Hi
>
> Is this a potential fix for
> https://bugzilla.kernel.org/show_bug.cgi?id=217310 where I'm manually
> setting bigger buffers to keep my wifi working?
Yes. With these patches applied, your system should run just fine with
swiotlb=dynamic. However, keep in mind that this implementation adds a
bit of overhead. In short, it trades a bit of performance for not
having to figure out the optimal swiotlb size at boot time.
Petr T
Hi
I've not had any issues since using this, but I imagine most people
won't know how to set swiotlb=dynamic if they start seeing this (when
it lands)
Any clue as to why this broke last cycle?
Thanks
Mike
On Fri, 28 Apr 2023 at 10:07, Petr Tesařík <[email protected]> wrote:
>
> On Fri, 28 Apr 2023 09:53:38 +0100
> Mike Lothian <[email protected]> wrote:
>
> > On Wed, 19 Apr 2023 at 11:05, Petr Tesarik <[email protected]> wrote:
> > >
> > > From: Petr Tesarik <[email protected]>
> > >
> > > [...]
> >
> > Hi
> >
> > Is this a potential fix for
> > https://bugzilla.kernel.org/show_bug.cgi?id=217310 where I'm manually
> > setting bigger buffers to keep my wifi working?
>
> Yes. With these patches applied, your system should run just fine with
> swiotlb=dynamic. However, keep in mind that this implementation adds a
> bit of overhead. In short, it trades a bit of performance for not
> having to figure out the optimal swiotlb size at boot time.
>
> Petr T
On Wed, 26 Apr 2023 14:44:39 +0200
Petr Tesařík <[email protected]> wrote:
> Hi Greg,
>
> On Wed, 26 Apr 2023 14:26:36 +0200
> Greg Kroah-Hartman <[email protected]> wrote:
>
> > On Wed, Apr 26, 2023 at 02:15:20PM +0200, Petr Tesařík wrote:
> > > Hi,
> > >
> > > On Wed, 19 Apr 2023 12:03:52 +0200
> > > Petr Tesarik <[email protected]> wrote:
> > >
> > > > From: Petr Tesarik <[email protected]>
> > > >
> > > > The goal of my work is to provide more flexibility in the sizing of
> > > > SWIOTLB.
> > > >
> > > > The software IO TLB was designed with these assumptions:
> > > >
> > > > 1. It would not be used much, especially on 64-bit systems.
> > > > 2. A small fixed memory area (64 MiB by default) is sufficient to
> > > > handle the few cases which require a bounce buffer.
> > > > 3. 64 MiB is little enough that it has no impact on the rest of the
> > > > system.
> > > >
> > > > First, if SEV is active, all DMA must be done through shared
> > > > unencrypted pages, and SWIOTLB is used to make this happen without
> > > > changing device drivers. The software IO TLB size is increased to
> > > > 6% of total memory in sev_setup_arch(), but that is more of an
> > > > approximation. The actual requirements may vary depending on the
> > > > amount of I/O and which drivers are used. These factors may not be
> > > > known at boot time, i.e. when SWIOTLB is allocated.
> > > >
> > > > Second, other colleagues have noticed that they can reliably get
> > > > rid of occasional OOM kills on an Arm embedded device by reducing
> > > > the SWIOTLB size. This can be achieved with a kernel parameter, but
> > > > determining the right value puts additional burden on pre-release
> > > > testing, which could be avoided if SWIOTLB is allocated small and
> > > > grows only when necessary.
> > >
> > > Now that merging into 6.4 has begun, what about this patch series? I'm
> > > eager to get some feedback (positive or negative) and respin the next
> > > version.
> >
> > It's the merge window, we can't add new things that haven't been in
> > linux-next already.
>
> This is understood. I'm not asking for immediate inclusion.
>
> > Please resubmit it after -rc1 is out.
>
> If you can believe that rebasing to -rc1 will be enough, then I will
> also try to believe I'm lucky. ;-)
>
> The kind of feedback I really want to get is e.g. about the extra
> per-device DMA-specific fields. If they cannot be added to struct
> device, then I'd rather start discussing an interim solution, because
> getting all existing DMA fields out of that struct will take a lot of
> time...
All right, 6.4-rc1 is out now. The patch series still applies cleanly.
Any comments on what must be changed (if anything) to get it in?
Thanks,
Petr T
On Tue, May 09, 2023 at 09:16:35AM +0200, Petr Tesařík wrote:
> On Wed, 26 Apr 2023 14:44:39 +0200
> Petr Tesařík <[email protected]> wrote:
>
> > Hi Greg,
> >
> > On Wed, 26 Apr 2023 14:26:36 +0200
> > Greg Kroah-Hartman <[email protected]> wrote:
> >
> > > On Wed, Apr 26, 2023 at 02:15:20PM +0200, Petr Tesařík wrote:
> > > > Hi,
> > > >
> > > > [...]
> > > >
> > > > Now that merging into 6.4 has begun, what about this patch series? I'm
> > > > eager to get some feedback (positive or negative) and respin the next
> > > > version.
> > >
> > > It's the merge window, we can't add new things that haven't been in
> > > linux-next already.
> >
> > This is understood. I'm not asking for immediate inclusion.
> >
> > > Please resubmit it after -rc1 is out.
> >
> > If you can believe that rebasing to -rc1 will be enough, then I will
> > also try to believe I'm lucky. ;-)
> >
> > The kind of feedback I really want to get is e.g. about the extra
> > per-device DMA-specific fields. If they cannot be added to struct
> > device, then I'd rather start discussing an interim solution, because
> > getting all existing DMA fields out of that struct will take a lot of
> > time...
>
> All right, 6.4-rc1 is out now. The patch series still applies cleanly.
>
> > Any comments on what must be changed (if anything) to get it in?
Try resending it, it's long out of my review queue...
thanks,
greg k-h
Hi
I was hoping this might land for 6.5-rc1, is there a new version that
might apply against 6.5?
Cheers
Mike
On Tue, 9 May 2023 at 08:32, Greg Kroah-Hartman
<[email protected]> wrote:
>
> On Tue, May 09, 2023 at 09:16:35AM +0200, Petr Tesařík wrote:
> > On Wed, 26 Apr 2023 14:44:39 +0200
> > Petr Tesařík <[email protected]> wrote:
> >
> > > Hi Greg,
> > >
> > > On Wed, 26 Apr 2023 14:26:36 +0200
> > > Greg Kroah-Hartman <[email protected]> wrote:
> > >
> > > > On Wed, Apr 26, 2023 at 02:15:20PM +0200, Petr Tesařík wrote:
> > > > > Hi,
> > > > >
> > > > > [...]
> > > > >
> > > > > Now that merging into 6.4 has begun, what about this patch series? I'm
> > > > > eager to get some feedback (positive or negative) and respin the next
> > > > > version.
> > > >
> > > > It's the merge window, we can't add new things that haven't been in
> > > > linux-next already.
> > >
> > > This is understood. I'm not asking for immediate inclusion.
> > >
> > > > Please resubmit it after -rc1 is out.
> > >
> > > If you can believe that rebasing to -rc1 will be enough, then I will
> > > also try to believe I'm lucky. ;-)
> > >
> > > The kind of feedback I really want to get is e.g. about the extra
> > > per-device DMA-specific fields. If they cannot be added to struct
> > > device, then I'd rather start discussing an interim solution, because
> > > getting all existing DMA fields out of that struct will take a lot of
> > > time...
> >
> > All right, 6.4-rc1 is out now. The patch series still applies cleanly.
> >
> > Any comments on what must be changed (if anything) to get it in?
>
> Try resending it, it's long out of my review queue...
>
> thanks,
>
> greg k-h
On Mon, 10 Jul 2023 23:23:45 +0100
Mike Lothian <[email protected]> wrote:
> Hi
>
> I was hoping this might land for 6.5-rc1, is there a new version that
> might apply against 6.5?
Yes, there is a v3, which is a complete rewrite based on feedback from
various people on this mailing list:
https://lore.kernel.org/linux-iommu/[email protected]/T/
Petr T
> Cheers
>
> Mike
>
> On Tue, 9 May 2023 at 08:32, Greg Kroah-Hartman
> <[email protected]> wrote:
> >
> > On Tue, May 09, 2023 at 09:16:35AM +0200, Petr Tesařík wrote:
> > > On Wed, 26 Apr 2023 14:44:39 +0200
> > > Petr Tesařík <[email protected]> wrote:
> > >
> > > > Hi Greg,
> > > >
> > > > On Wed, 26 Apr 2023 14:26:36 +0200
> > > > Greg Kroah-Hartman <[email protected]> wrote:
> > > >
> > > > > On Wed, Apr 26, 2023 at 02:15:20PM +0200, Petr Tesařík wrote:
> > > > > > Hi,
> > > > > >
> > > > > > [...]
> > > > > >
> > > > > > Now that merging into 6.4 has begun, what about this patch series? I'm
> > > > > > eager to get some feedback (positive or negative) and respin the next
> > > > > > version.
> > > > >
> > > > > It's the merge window, we can't add new things that haven't been in
> > > > > linux-next already.
> > > >
> > > > This is understood. I'm not asking for immediate inclusion.
> > > >
> > > > > Please resubmit it after -rc1 is out.
> > > >
> > > > If you can believe that rebasing to -rc1 will be enough, then I will
> > > > also try to believe I'm lucky. ;-)
> > > >
> > > > The kind of feedback I really want to get is e.g. about the extra
> > > > per-device DMA-specific fields. If they cannot be added to struct
> > > > device, then I'd rather start discussing an interim solution, because
> > > > getting all existing DMA fields out of that struct will take a lot of
> > > > time...
> > >
> > > All right, 6.4-rc1 is out now. The patch series still applies cleanly.
> > >
> > > Any comments on what must be changed (if anything) to get it in?
> [...]
On Tue, 11 Jul 2023 at 10:03, Petr Tesařík <[email protected]> wrote:
>
> On Mon, 10 Jul 2023 23:23:45 +0100
> Mike Lothian <[email protected]> wrote:
>
> > Hi
> >
> > I was hoping this might land for 6.5-rc1, is there a new version that
> > might apply against 6.5?
>
> Yes, there is a v3, which is a complete rewrite based on feedback from
> various people on this mailing list:
>
> https://lore.kernel.org/linux-iommu/[email protected]/T/
>
> Petr T
>
Patch 2 doesn't apply cleanly for me on 6.5-rc1
On Tue, 11 Jul 2023 12:29:44 +0100
Mike Lothian <[email protected]> wrote:
> On Tue, 11 Jul 2023 at 10:03, Petr Tesařík <[email protected]> wrote:
> >
> > On Mon, 10 Jul 2023 23:23:45 +0100
> > Mike Lothian <[email protected]> wrote:
> >
> > > Hi
> > >
> > > I was hoping this might land for 6.5-rc1, is there a new version that
> > > might apply against 6.5?
> >
> > Yes, there is a v3, which is a complete rewrite based on feedback from
> > various people on this mailing list:
> >
> > https://lore.kernel.org/linux-iommu/[email protected]/T/
> >
> > Petr T
> >
>
>
> Patch 2 doesn't apply cleanly for me on 6.5-rc1
Ah, right. I'm going to rebase the series and include a few other
suggested changes.
I'm a bit worried that Christoph and all other maintainers (all added
back to Cc) have stayed silent about the v3 series.
@Christoph: Are you uncomfortable with something in the idea itself, or
are you just busy with other things?
Petr T
On Tue, 11 Jul 2023 at 14:21, Petr Tesařík <[email protected]> wrote:
>
> On Tue, 11 Jul 2023 12:29:44 +0100
> Mike Lothian <[email protected]> wrote:
>
> > On Tue, 11 Jul 2023 at 10:03, Petr Tesařík <[email protected]> wrote:
> > >
> > > On Mon, 10 Jul 2023 23:23:45 +0100
> > > Mike Lothian <[email protected]> wrote:
> > >
> > > > Hi
> > > >
> > > > I was hoping this might land for 6.5-rc1, is there a new version that
> > > > might apply against 6.5?
> > >
> > > Yes, there is a v3, which is a complete rewrite based on feedback from
> > > various people on this mailing list:
> > >
> > > https://lore.kernel.org/linux-iommu/[email protected]/T/
> > >
> > > Petr T
> > >
> >
> >
> > Patch 2 doesn't apply cleanly for me on 6.5-rc1
>
> Ah, right. I'm going to rebase the series and include a few other
> suggested changes.
>
> I'm a bit worried that Christoph and all other maintainers (all added
> back to Cc) have stayed silent about the v3 series.
>
> @Christoph: Are you uncomfortable with something in the idea itself, or
> are you just busy with other things?
>
> Petr T
I imagine once 6.4 starts appearing in distros, more bug reports will
show up when people's wifi disconnects.
If this is being delayed until 6.6, would it be better to increase the
default size, perhaps based on how much memory the system has, and
maybe backport the fix?