2024-03-21 17:19:46

by Petr Tesarik

Subject: [PATCH v3 0/2] swiotlb: allocate padding slots if necessary

From: Petr Tesarik <[email protected]>

If the allocation alignment is bigger than IO_TLB_SIZE and min_align_mask
covers some bits in the original address between IO_TLB_SIZE and
alloc_align_mask, preserve these bits by allocating additional padding
slots before the actual swiotlb buffer.
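
As a rough illustration of the math (a stand-alone sketch with made-up
values; only the IO_TLB_* constants match the kernel definitions):

#include <stdio.h>

#define IO_TLB_SHIFT	11
#define IO_TLB_SIZE	(1UL << IO_TLB_SHIFT)		/* 2 KiB slots */

int main(void)
{
	unsigned long orig_addr        = 0x12342a10;	/* sample CPU address */
	unsigned long min_align_mask   = 0xffff;	/* 64 KiB granule */
	unsigned long alloc_align_mask = 0x3fff;	/* 16 KiB allocation alignment */

	/* Pre-padding within the allocated space, as computed in patch 1/2. */
	unsigned long offset = orig_addr & min_align_mask &
			       (alloc_align_mask | (IO_TLB_SIZE - 1));
	unsigned long pad_slots = offset / IO_TLB_SIZE;

	offset %= IO_TLB_SIZE;
	/* Prints: pad_slots=5 offset=0x210 */
	printf("pad_slots=%lu offset=%#lx\n", pad_slots, offset);
	return 0;
}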

Changes from v2
---------------
* Fix assignment of an uninitialized variable to pad_slots.
* Improve commit message wrt INVALID_PHYS_ADDR.

Changes from v1
---------------
* Rename padding to pad_slots.
* Set pad_slots only for the first allocated non-padding slot.
* Do not bother initializing orig_addr to INVALID_PHYS_ADDR.
* Change list and pad_slots to unsigned short to avoid growing
struct io_tlb_slot on 32-bit targets.
* Add build-time check that list and pad_slots can hold the maximum
allowed value of IO_TLB_SEGSIZE.

Petr Tesarik (2):
swiotlb: extend buffer pre-padding to alloc_align_mask if necessary
bug: introduce ASSERT_VAR_CAN_HOLD()

 include/linux/build_bug.h | 10 ++++++++++
 kernel/dma/swiotlb.c      | 37 +++++++++++++++++++++++++++++------
 2 files changed, 41 insertions(+), 6 deletions(-)

--
2.34.1



2024-03-21 17:19:56

by Petr Tesarik

Subject: [PATCH v3 1/2] swiotlb: extend buffer pre-padding to alloc_align_mask if necessary

From: Petr Tesarik <[email protected]>

Allow a buffer pre-padding of up to alloc_align_mask. If the allocation
alignment is bigger than IO_TLB_SIZE and min_align_mask covers any non-zero
bits in the original address between IO_TLB_SIZE and alloc_align_mask,
these bits are not preserved in the swiotlb buffer address.

To fix this case, increase the allocation size and use a larger offset
within the allocated buffer. As a result, extra padding slots may be
allocated before the mapping start address.

Leave orig_addr in these padding slots initialized to INVALID_PHYS_ADDR.
These slots do not correspond to any CPU buffer, so attempts to sync the
data should be ignored.

The padding slots should be automatically released when the buffer is
unmapped. However, swiotlb_tbl_unmap_single() takes only the address of the
DMA buffer slot, not the first padding slot. Save the number of padding
slots in struct io_tlb_slot and use it to adjust the slot index in
swiotlb_release_slots(), so all allocated slots are properly freed.

Fixes: 2fd4fa5d3fb5 ("swiotlb: Fix alignment checks when both allocation and DMA masks are present")
Link: https://lore.kernel.org/linux-iommu/[email protected]/
Signed-off-by: Petr Tesarik <[email protected]>
---
kernel/dma/swiotlb.c | 35 +++++++++++++++++++++++++++++------
1 file changed, 29 insertions(+), 6 deletions(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 86fe172b5958..3779a48eec9b 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -69,11 +69,14 @@
* @alloc_size: Size of the allocated buffer.
* @list: The free list describing the number of free entries available
* from each index.
+ * @pad_slots: Number of preceding padding slots. Valid only in the first
+ * allocated non-padding slot.
*/
struct io_tlb_slot {
phys_addr_t orig_addr;
size_t alloc_size;
- unsigned int list;
+ unsigned short list;
+ unsigned short pad_slots;
};

static bool swiotlb_force_bounce;
@@ -287,6 +290,7 @@ static void swiotlb_init_io_tlb_pool(struct io_tlb_pool *mem, phys_addr_t start,
mem->nslabs - i);
mem->slots[i].orig_addr = INVALID_PHYS_ADDR;
mem->slots[i].alloc_size = 0;
+ mem->slots[i].pad_slots = 0;
}

memset(vaddr, 0, bytes);
@@ -1328,11 +1332,12 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
unsigned long attrs)
{
struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
- unsigned int offset = swiotlb_align_offset(dev, orig_addr);
+ unsigned int offset;
struct io_tlb_pool *pool;
unsigned int i;
int index;
phys_addr_t tlb_addr;
+ unsigned short pad_slots;

if (!mem || !mem->nslabs) {
dev_warn_ratelimited(dev,
@@ -1349,6 +1354,15 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
return (phys_addr_t)DMA_MAPPING_ERROR;
}

+ /*
+ * Calculate buffer pre-padding within the allocated space. Use it to
+ * preserve the low bits of the original address according to device's
+ * min_align_mask. Limit the padding to alloc_align_mask or slot size
+ * (whichever is bigger); higher bits of the original address are
+ * preserved by selecting a suitable IO TLB slot.
+ */
+ offset = orig_addr & dma_get_min_align_mask(dev) &
+ (alloc_align_mask | (IO_TLB_SIZE - 1));
index = swiotlb_find_slots(dev, orig_addr,
alloc_size + offset, alloc_align_mask, &pool);
if (index == -1) {
@@ -1364,6 +1378,10 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
* This is needed when we sync the memory. Then we sync the buffer if
* needed.
*/
+ pad_slots = offset / IO_TLB_SIZE;
+ offset %= IO_TLB_SIZE;
+ index += pad_slots;
+ pool->slots[index].pad_slots = pad_slots;
for (i = 0; i < nr_slots(alloc_size + offset); i++)
pool->slots[index + i].orig_addr = slot_addr(orig_addr, i);
tlb_addr = slot_addr(pool->start, index) + offset;
@@ -1385,12 +1403,16 @@ static void swiotlb_release_slots(struct device *dev, phys_addr_t tlb_addr)
struct io_tlb_pool *mem = swiotlb_find_pool(dev, tlb_addr);
unsigned long flags;
unsigned int offset = swiotlb_align_offset(dev, tlb_addr);
- int index = (tlb_addr - offset - mem->start) >> IO_TLB_SHIFT;
- int nslots = nr_slots(mem->slots[index].alloc_size + offset);
- int aindex = index / mem->area_nslabs;
- struct io_tlb_area *area = &mem->areas[aindex];
+ int index, nslots, aindex;
+ struct io_tlb_area *area;
int count, i;

+ index = (tlb_addr - offset - mem->start) >> IO_TLB_SHIFT;
+ index -= mem->slots[index].pad_slots;
+ nslots = nr_slots(mem->slots[index].alloc_size + offset);
+ aindex = index / mem->area_nslabs;
+ area = &mem->areas[aindex];
+
/*
* Return the buffer to the free list by setting the corresponding
* entries to indicate the number of contiguous entries available.
@@ -1413,6 +1435,7 @@ static void swiotlb_release_slots(struct device *dev, phys_addr_t tlb_addr)
mem->slots[i].list = ++count;
mem->slots[i].orig_addr = INVALID_PHYS_ADDR;
mem->slots[i].alloc_size = 0;
+ mem->slots[i].pad_slots = 0;
}

/*
--
2.34.1


2024-03-21 17:21:01

by Petr Tesarik

Subject: [PATCH v3 2/2] bug: introduce ASSERT_VAR_CAN_HOLD()

From: Petr Tesarik <[email protected]>

Introduce an ASSERT_VAR_CAN_HOLD() macro to check at build time that a
variable can hold the given value.

Use this macro in swiotlb to make sure that the list and pad_slots fields
of struct io_tlb_slot are big enough to hold the maximum possible value of
IO_TLB_SEGSIZE.

Signed-off-by: Petr Tesarik <[email protected]>
---
 include/linux/build_bug.h | 10 ++++++++++
 kernel/dma/swiotlb.c      |  2 ++
 2 files changed, 12 insertions(+)

diff --git a/include/linux/build_bug.h b/include/linux/build_bug.h
index 3aa3640f8c18..6e2486508af0 100644
--- a/include/linux/build_bug.h
+++ b/include/linux/build_bug.h
@@ -86,4 +86,14 @@
"Offset of " #field " in " #type " has changed.")


+/*
+ * Compile time check that a variable can hold the given value
+ */
+#define ASSERT_VAR_CAN_HOLD(var, value) ({ \
+ typeof(value) __val = (value); \
+ typeof(var) __tmp = __val; \
+ BUILD_BUG_ON_MSG(__tmp != __val, \
+ #var " cannot hold " #value "."); \
+})
+
#endif /* _LINUX_BUILD_BUG_H */
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 3779a48eec9b..8256fcdc0cf6 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -285,6 +285,8 @@ static void swiotlb_init_io_tlb_pool(struct io_tlb_pool *mem, phys_addr_t start,
mem->areas[i].used = 0;
}

+ ASSERT_VAR_CAN_HOLD(mem->slots[0].list, IO_TLB_SEGSIZE);
+ ASSERT_VAR_CAN_HOLD(mem->slots[0].pad_slots, IO_TLB_SEGSIZE);
for (i = 0; i < mem->nslabs; i++) {
mem->slots[i].list = min(IO_TLB_SEGSIZE - io_tlb_offset(i),
mem->nslabs - i);
--
2.34.1


2024-03-22 04:29:31

by Michael Kelley

Subject: RE: [PATCH v3 1/2] swiotlb: extend buffer pre-padding to alloc_align_mask if necessary

From: Petr Tesarik <[email protected]> Sent: Thursday, March 21, 2024 10:19 AM
>
> Allow a buffer pre-padding of up to alloc_align_mask. If the allocation
> alignment is bigger than IO_TLB_SIZE and min_align_mask covers any non-zero
> bits in the original address between IO_TLB_SIZE and alloc_align_mask,
> these bits are not preserved in the swiotlb buffer address.
>
> To fix this case, increase the allocation size and use a larger offset
> within the allocated buffer. As a result, extra padding slots may be
> allocated before the mapping start address.
>
> Leave orig_addr in these padding slots initialized to INVALID_PHYS_ADDR.
> These slots do not correspond to any CPU buffer, so attempts to sync the
> data should be ignored.
>
> The padding slots should be automatically released when the buffer is
> unmapped. However, swiotlb_tbl_unmap_single() takes only the address of the
> DMA buffer slot, not the first padding slot. Save the number of padding
> slots in struct io_tlb_slot and use it to adjust the slot index in
> swiotlb_release_slots(), so all allocated slots are properly freed.
>
> Fixes: 2fd4fa5d3fb5 ("swiotlb: Fix alignment checks when both allocation and DMA masks are present")
> Link: https://lore.kernel.org/linux-iommu/[email protected]/
> Signed-off-by: Petr Tesarik <[email protected]>
> ---
> kernel/dma/swiotlb.c | 35 +++++++++++++++++++++++++++++------
> 1 file changed, 29 insertions(+), 6 deletions(-)
>
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index 86fe172b5958..3779a48eec9b 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -69,11 +69,14 @@
> * @alloc_size: Size of the allocated buffer.
> * @list: The free list describing the number of free entries available
> * from each index.
> + * @pad_slots: Number of preceding padding slots. Valid only in the first
> + * allocated non-padding slot.
> */
> struct io_tlb_slot {
> phys_addr_t orig_addr;
> size_t alloc_size;
> - unsigned int list;
> + unsigned short list;
> + unsigned short pad_slots;
> };
>
> static bool swiotlb_force_bounce;
> @@ -287,6 +290,7 @@ static void swiotlb_init_io_tlb_pool(struct io_tlb_pool *mem, phys_addr_t start,
> mem->nslabs - i);
> mem->slots[i].orig_addr = INVALID_PHYS_ADDR;
> mem->slots[i].alloc_size = 0;
> + mem->slots[i].pad_slots = 0;
> }
>
> memset(vaddr, 0, bytes);
> @@ -1328,11 +1332,12 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
> unsigned long attrs)
> {
> struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
> - unsigned int offset = swiotlb_align_offset(dev, orig_addr);
> + unsigned int offset;
> struct io_tlb_pool *pool;
> unsigned int i;
> int index;
> phys_addr_t tlb_addr;
> + unsigned short pad_slots;
>
> if (!mem || !mem->nslabs) {
> dev_warn_ratelimited(dev,
> @@ -1349,6 +1354,15 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
> return (phys_addr_t)DMA_MAPPING_ERROR;
> }
>
> + /*
> + * Calculate buffer pre-padding within the allocated space. Use it to
> + * preserve the low bits of the original address according to device's
> + * min_align_mask. Limit the padding to alloc_align_mask or slot size
> + * (whichever is bigger); higher bits of the original address are
> + * preserved by selecting a suitable IO TLB slot.
> + */
> + offset = orig_addr & dma_get_min_align_mask(dev) &
> + (alloc_align_mask | (IO_TLB_SIZE - 1));
> index = swiotlb_find_slots(dev, orig_addr,
> alloc_size + offset, alloc_align_mask, &pool);
> if (index == -1) {
> @@ -1364,6 +1378,10 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
> * This is needed when we sync the memory. Then we sync the buffer if
> * needed.
> */
> + pad_slots = offset / IO_TLB_SIZE;
> + offset %= IO_TLB_SIZE;
> + index += pad_slots;
> + pool->slots[index].pad_slots = pad_slots;
> for (i = 0; i < nr_slots(alloc_size + offset); i++)
> pool->slots[index + i].orig_addr = slot_addr(orig_addr, i);
> tlb_addr = slot_addr(pool->start, index) + offset;
> @@ -1385,12 +1403,16 @@ static void swiotlb_release_slots(struct device *dev, phys_addr_t tlb_addr)
> struct io_tlb_pool *mem = swiotlb_find_pool(dev, tlb_addr);
> unsigned long flags;
> unsigned int offset = swiotlb_align_offset(dev, tlb_addr);
> - int index = (tlb_addr - offset - mem->start) >> IO_TLB_SHIFT;
> - int nslots = nr_slots(mem->slots[index].alloc_size + offset);
> - int aindex = index / mem->area_nslabs;
> - struct io_tlb_area *area = &mem->areas[aindex];
> + int index, nslots, aindex;
> + struct io_tlb_area *area;
> int count, i;
>
> + index = (tlb_addr - offset - mem->start) >> IO_TLB_SHIFT;
> + index -= mem->slots[index].pad_slots;
> + nslots = nr_slots(mem->slots[index].alloc_size + offset);
> + aindex = index / mem->area_nslabs;
> + area = &mem->areas[aindex];
> +
> /*
> * Return the buffer to the free list by setting the corresponding
> * entries to indicate the number of contiguous entries available.
> @@ -1413,6 +1435,7 @@ static void swiotlb_release_slots(struct device *dev, phys_addr_t tlb_addr)
> mem->slots[i].list = ++count;
> mem->slots[i].orig_addr = INVALID_PHYS_ADDR;
> mem->slots[i].alloc_size = 0;
> + mem->slots[i].pad_slots = 0;
> }
>
> /*
> --
> 2.34.1

I've tested this patch in conjunction with Will's series of 6 patches, and
all looks good. I tested on x86/x64 w/4K page size and on arm64
w/64K page size and a variety of min_align_mask values, alloc_align_mask
values, mapping size values, and orig_addr low order bits. The tests are
doing disk I/O through the bounce buffers, and they verify that the data
written can be read back correctly. So the bouncing is working correctly
with the slots that are being set up.

I'm not able to test with min_align_mask less than 4K, because my
synthetic disk driver doesn't work if that alignment isn't maintained.
But min_align_mask values of 8K, 16K, and 64K work correctly. I've
seen up to 5 padding slots be allocated.

I have not tried any 32-bit builds.

Overall, it looks solid to me.

Tested-by: Michael Kelley <[email protected]>
Reviewed-by: Michael Kelley <[email protected]>

2024-03-22 04:33:58

by Michael Kelley

Subject: RE: [PATCH v3 2/2] bug: introduce ASSERT_VAR_CAN_HOLD()

From: Petr Tesarik <[email protected]> Sent: Thursday, March 21, 2024 10:19 AM
>
> Introduce an ASSERT_VAR_CAN_HOLD() macro to check at build time that a
> variable can hold the given value.
>
> Use this macro in swiotlb to make sure that the list and pad_slots fields
> of struct io_tlb_slot are big enough to hold the maximum possible value of
> IO_TLB_SEGSIZE.
>
> Signed-off-by: Petr Tesarik <[email protected]>
> ---
> include/linux/build_bug.h | 10 ++++++++++
> kernel/dma/swiotlb.c | 2 ++
> 2 files changed, 12 insertions(+)
>
> diff --git a/include/linux/build_bug.h b/include/linux/build_bug.h
> index 3aa3640f8c18..6e2486508af0 100644
> --- a/include/linux/build_bug.h
> +++ b/include/linux/build_bug.h
> @@ -86,4 +86,14 @@
> "Offset of " #field " in " #type " has changed.")
>
>
> +/*
> + * Compile time check that a variable can hold the given value
> + */
> +#define ASSERT_VAR_CAN_HOLD(var, value) ({ \
> + typeof(value) __val = (value); \
> + typeof(var) __tmp = __val; \
> + BUILD_BUG_ON_MSG(__tmp != __val, \
> + #var " cannot hold " #value "."); \
> +})
> +
> #endif /* _LINUX_BUILD_BUG_H */
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index 3779a48eec9b..8256fcdc0cf6 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -285,6 +285,8 @@ static void swiotlb_init_io_tlb_pool(struct io_tlb_pool *mem, phys_addr_t start,
> mem->areas[i].used = 0;
> }
>
> + ASSERT_VAR_CAN_HOLD(mem->slots[0].list, IO_TLB_SEGSIZE);
> + ASSERT_VAR_CAN_HOLD(mem->slots[0].pad_slots, IO_TLB_SEGSIZE);
> for (i = 0; i < mem->nslabs; i++) {
> mem->slots[i].list = min(IO_TLB_SEGSIZE - io_tlb_offset(i),
> mem->nslabs - i);
> --
> 2.34.1

This was tested implicitly as part of my broader testing of
Patch 1 of the series. From a code review standpoint, I
don't feel very competent to recognize potential problems
in ASSERT_VAR_CAN_HOLD(), so I haven't given a
"Reviewed-by:" for this patch.

Tested-by: Michael Kelley <[email protected]>

2024-03-22 10:29:53

by Petr Tesarik

Subject: Re: [PATCH v3 1/2] swiotlb: extend buffer pre-padding to alloc_align_mask if necessary

On 3/22/2024 5:29 AM, Michael Kelley wrote:
> From: Petr Tesarik <[email protected]> Sent: Thursday, March 21, 2024 10:19 AM
>>
>> Allow a buffer pre-padding of up to alloc_align_mask. If the allocation
>> alignment is bigger than IO_TLB_SIZE and min_align_mask covers any non-zero
>> bits in the original address between IO_TLB_SIZE and alloc_align_mask,
>> these bits are not preserved in the swiotlb buffer address.
>>
>> To fix this case, increase the allocation size and use a larger offset
>> within the allocated buffer. As a result, extra padding slots may be
>> allocated before the mapping start address.
>>
>> Leave orig_addr in these padding slots initialized to INVALID_PHYS_ADDR.
>> These slots do not correspond to any CPU buffer, so attempts to sync the
>> data should be ignored.
>>
>> The padding slots should be automatically released when the buffer is
>> unmapped. However, swiotlb_tbl_unmap_single() takes only the address of the
>> DMA buffer slot, not the first padding slot. Save the number of padding
>> slots in struct io_tlb_slot and use it to adjust the slot index in
>> swiotlb_release_slots(), so all allocated slots are properly freed.
>>
>> Fixes: 2fd4fa5d3fb5 ("swiotlb: Fix alignment checks when both allocation and DMA masks are present")
>> Link: https://lore.kernel.org/linux-iommu/[email protected]/
>> Signed-off-by: Petr Tesarik <[email protected]>
>> ---
>> kernel/dma/swiotlb.c | 35 +++++++++++++++++++++++++++++------
>> 1 file changed, 29 insertions(+), 6 deletions(-)
>>
>> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
>> index 86fe172b5958..3779a48eec9b 100644
>> --- a/kernel/dma/swiotlb.c
>> +++ b/kernel/dma/swiotlb.c
>> @@ -69,11 +69,14 @@
>> * @alloc_size: Size of the allocated buffer.
>> * @list: The free list describing the number of free entries available
>> * from each index.
>> + * @pad_slots: Number of preceding padding slots. Valid only in the first
>> + * allocated non-padding slot.
>> */
>> struct io_tlb_slot {
>> phys_addr_t orig_addr;
>> size_t alloc_size;
>> - unsigned int list;
>> + unsigned short list;
>> + unsigned short pad_slots;
>> };
>>
>> static bool swiotlb_force_bounce;
>> @@ -287,6 +290,7 @@ static void swiotlb_init_io_tlb_pool(struct io_tlb_pool *mem, phys_addr_t start,
>> mem->nslabs - i);
>> mem->slots[i].orig_addr = INVALID_PHYS_ADDR;
>> mem->slots[i].alloc_size = 0;
>> + mem->slots[i].pad_slots = 0;
>> }
>>
>> memset(vaddr, 0, bytes);
>> @@ -1328,11 +1332,12 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
>> unsigned long attrs)
>> {
>> struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
>> - unsigned int offset = swiotlb_align_offset(dev, orig_addr);
>> + unsigned int offset;
>> struct io_tlb_pool *pool;
>> unsigned int i;
>> int index;
>> phys_addr_t tlb_addr;
>> + unsigned short pad_slots;
>>
>> if (!mem || !mem->nslabs) {
>> dev_warn_ratelimited(dev,
>> @@ -1349,6 +1354,15 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
>> return (phys_addr_t)DMA_MAPPING_ERROR;
>> }
>>
>> + /*
>> + * Calculate buffer pre-padding within the allocated space. Use it to
>> + * preserve the low bits of the original address according to device's
>> + * min_align_mask. Limit the padding to alloc_align_mask or slot size
>> + * (whichever is bigger); higher bits of the original address are
>> + * preserved by selecting a suitable IO TLB slot.
>> + */
>> + offset = orig_addr & dma_get_min_align_mask(dev) &
>> + (alloc_align_mask | (IO_TLB_SIZE - 1));
>> index = swiotlb_find_slots(dev, orig_addr,
>> alloc_size + offset, alloc_align_mask, &pool);
>> if (index == -1) {
>> @@ -1364,6 +1378,10 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
>> * This is needed when we sync the memory. Then we sync the buffer if
>> * needed.
>> */
>> + pad_slots = offset / IO_TLB_SIZE;
>> + offset %= IO_TLB_SIZE;
>> + index += pad_slots;
>> + pool->slots[index].pad_slots = pad_slots;
>> for (i = 0; i < nr_slots(alloc_size + offset); i++)
>> pool->slots[index + i].orig_addr = slot_addr(orig_addr, i);
>> tlb_addr = slot_addr(pool->start, index) + offset;
>> @@ -1385,12 +1403,16 @@ static void swiotlb_release_slots(struct device *dev, phys_addr_t tlb_addr)
>> struct io_tlb_pool *mem = swiotlb_find_pool(dev, tlb_addr);
>> unsigned long flags;
>> unsigned int offset = swiotlb_align_offset(dev, tlb_addr);
>> - int index = (tlb_addr - offset - mem->start) >> IO_TLB_SHIFT;
>> - int nslots = nr_slots(mem->slots[index].alloc_size + offset);
>> - int aindex = index / mem->area_nslabs;
>> - struct io_tlb_area *area = &mem->areas[aindex];
>> + int index, nslots, aindex;
>> + struct io_tlb_area *area;
>> int count, i;
>>
>> + index = (tlb_addr - offset - mem->start) >> IO_TLB_SHIFT;
>> + index -= mem->slots[index].pad_slots;
>> + nslots = nr_slots(mem->slots[index].alloc_size + offset);
>> + aindex = index / mem->area_nslabs;
>> + area = &mem->areas[aindex];
>> +
>> /*
>> * Return the buffer to the free list by setting the corresponding
>> * entries to indicate the number of contiguous entries available.
>> @@ -1413,6 +1435,7 @@ static void swiotlb_release_slots(struct device *dev, phys_addr_t tlb_addr)
>> mem->slots[i].list = ++count;
>> mem->slots[i].orig_addr = INVALID_PHYS_ADDR;
>> mem->slots[i].alloc_size = 0;
>> + mem->slots[i].pad_slots = 0;
>> }
>>
>> /*
>> --
>> 2.34.1
>
> I've tested this patch in conjunction with Will's series of 6 patches, and
> all looks good. I tested on x86/x64 w/4K page size and on arm64
> w/64K page size and a variety of min_align_mask values, alloc_align_mask
> values, mapping size values, and orig_addr low order bits. The tests are
> doing disk I/O through the bounce buffers, and they verify that the data
> written can be read back correctly. So the bouncing is working correctly
> with the slots that are being set up.
>
> I'm not able to test with min_align_mask less than 4K, because my
> synthetic disk driver doesn't work if that alignment isn't maintained.
> But min_align_mask values of 8K, 16K, and 64K work correctly. I've
> seen up to 5 padding slots be allocated.

Thank you for this extensive testing. Just wow!

> I have not tried any 32-bit builds.

I have at least tried my KUnit test with --arch=arm, and that passes all
tests. But I agree it doesn't prove much.

Petr T

> Overall, it looks solid to me.
>
> Tested-by: Michael Kelley <[email protected]>
> Reviewed-by: Michael Kelley <[email protected]>


2024-03-22 15:09:54

by Will Deacon

Subject: Re: [PATCH v3 0/2] swiotlb: allocate padding slots if necessary

Hi Petr,

On Thu, Mar 21, 2024 at 06:19:00PM +0100, Petr Tesarik wrote:
> From: Petr Tesarik <[email protected]>
>
> If the allocation alignment is bigger than IO_TLB_SIZE and min_align_mask
> covers some bits in the original address between IO_TLB_SIZE and
> alloc_align_mask, preserve these bits by allocating additional padding
> slots before the actual swiotlb buffer.

Thanks for fixing this! I was out at a conference last week, so I didn't
get very far with it myself, but I ended up in a pickle trying to avoid
extending 'struct io_tlb_slot'. Your solution is much better than the
crazy avenue I started going down...

With your changes, can we now simplify swiotlb_align_offset() to ignore
dma_get_min_align_mask() altogether and just:

return addr & (IO_TLB_SIZE - 1);

?

Will

2024-03-22 15:38:22

by Will Deacon

Subject: Re: [PATCH v3 2/2] bug: introduce ASSERT_VAR_CAN_HOLD()

On Thu, Mar 21, 2024 at 06:19:02PM +0100, Petr Tesarik wrote:
> From: Petr Tesarik <[email protected]>
>
> Introduce an ASSERT_VAR_CAN_HOLD() macro to check at build time that a
> variable can hold the given value.
>
> Use this macro in swiotlb to make sure that the list and pad_slots fields
> of struct io_tlb_slot are big enough to hold the maximum possible value of
> IO_TLB_SEGSIZE.
>
> Signed-off-by: Petr Tesarik <[email protected]>
> ---
> include/linux/build_bug.h | 10 ++++++++++
> kernel/dma/swiotlb.c | 2 ++
> 2 files changed, 12 insertions(+)
>
> diff --git a/include/linux/build_bug.h b/include/linux/build_bug.h
> index 3aa3640f8c18..6e2486508af0 100644
> --- a/include/linux/build_bug.h
> +++ b/include/linux/build_bug.h
> @@ -86,4 +86,14 @@
> "Offset of " #field " in " #type " has changed.")
>
>
> +/*
> + * Compile time check that a variable can hold the given value
> + */
> +#define ASSERT_VAR_CAN_HOLD(var, value) ({ \
> + typeof(value) __val = (value); \
> + typeof(var) __tmp = __val; \
> + BUILD_BUG_ON_MSG(__tmp != __val, \
> + #var " cannot hold " #value "."); \
> +})

nit, but I think this prevents putting negative values into unsigned
types. Not sure whether we care? Arguably it's even correct to complain.

e.g.

u16 s;
ASSERT_VAR_CAN_HOLD(s, -1);

explodes for me.

Will

2024-03-22 17:33:35

by Petr Tesařík

Subject: Re: [PATCH v3 2/2] bug: introduce ASSERT_VAR_CAN_HOLD()

On Fri, 22 Mar 2024 15:37:38 +0000
Will Deacon <[email protected]> wrote:

> On Thu, Mar 21, 2024 at 06:19:02PM +0100, Petr Tesarik wrote:
> > From: Petr Tesarik <[email protected]>
> >
> > Introduce an ASSERT_VAR_CAN_HOLD() macro to check at build time that a
> > variable can hold the given value.
> >
> > Use this macro in swiotlb to make sure that the list and pad_slots fields
> > of struct io_tlb_slot are big enough to hold the maximum possible value of
> > IO_TLB_SEGSIZE.
> >
> > Signed-off-by: Petr Tesarik <[email protected]>
> > ---
> > include/linux/build_bug.h | 10 ++++++++++
> > kernel/dma/swiotlb.c | 2 ++
> > 2 files changed, 12 insertions(+)
> >
> > diff --git a/include/linux/build_bug.h b/include/linux/build_bug.h
> > index 3aa3640f8c18..6e2486508af0 100644
> > --- a/include/linux/build_bug.h
> > +++ b/include/linux/build_bug.h
> > @@ -86,4 +86,14 @@
> > "Offset of " #field " in " #type " has changed.")
> >
> >
> > +/*
> > + * Compile time check that a variable can hold the given value
> > + */
> > +#define ASSERT_VAR_CAN_HOLD(var, value) ({ \
> > + typeof(value) __val = (value); \
> > + typeof(var) __tmp = __val; \
> > + BUILD_BUG_ON_MSG(__tmp != __val, \
> > + #var " cannot hold " #value "."); \
> > +})
>
> nit, but I think this prevents putting negative values into unsigned
> types. Not sure whether we care? Arguably it's even correct to complain.
>
> e.g.
>
> u16 s;
> ASSERT_VAR_CAN_HOLD(s, -1);
>
> explodes for me.

Then it works as intended. I specifically aimed at making a macro that
checks at build time whether a given constant is within the value range
of a variable, so it explodes for a signed overflow (in either
direction).

To check the size of a variable, I could have gone with something
simpler like BUG_ON(sizeof(var) < sizeof(val)).
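
For illustration, how the range check is meant to behave (a made-up
fragment, not taken from the patch; assume it sits inside a function):

	u16 s;

	ASSERT_VAR_CAN_HOLD(s, 65535);	/* round-trips through u16: builds fine */
	ASSERT_VAR_CAN_HOLD(s, 65536);	/* truncated to 0: BUILD_BUG_ON fires */
	ASSERT_VAR_CAN_HOLD(s, -1);	/* converts to 0xffff != -1: fires too */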

Petr T

2024-03-22 17:52:06

by Petr Tesařík

Subject: Re: [PATCH v3 0/2] swiotlb: allocate padding slots if necessary

On Fri, 22 Mar 2024 15:09:41 +0000
Will Deacon <[email protected]> wrote:

> Hi Petr,
>
> On Thu, Mar 21, 2024 at 06:19:00PM +0100, Petr Tesarik wrote:
> > From: Petr Tesarik <[email protected]>
> >
> > If the allocation alignment is bigger than IO_TLB_SIZE and min_align_mask
> > covers some bits in the original address between IO_TLB_SIZE and
> > alloc_align_mask, preserve these bits by allocating additional padding
> > slots before the actual swiotlb buffer.
>
> Thanks for fixing this! I was out at a conference last week, so I didn't
> get very far with it myself, but I ended up in a pickle trying to avoid
> extending 'struct io_tlb_slot'. Your solution is much better than the
> crazy avenue I started going down...
>
> With your changes, can we now simplify swiotlb_align_offset() to ignore
> dma_get_min_align_mask() altogether and just:
>
> return addr & (IO_TLB_SIZE - 1);

I have also thought about this but I don't think it's right. If we
removed dma_get_min_align_mask() from swiotlb_align_offset(), we would
always ask to preserve the lowest IO_TLB_SHIFT bits. This may cause
less efficient use of the SWIOTLB.

For example, if a device does not specify any min_align_mask, it is
presumably happy with any buffer alignment, so SWIOTLB may allocate at
the beginning of a slot, like here:

   orig_addr   |     ++|++     |
   tlb_addr    |++++   |       |

Without dma_get_min_align_mask() in swiotlb_align_offset(), it would
have to allocate two mostly-empty slots:

   tlb_addr    |     ++|++     |

where:
   |   marks a multiple of IO_TLB_SIZE (in physical address space)
   +   used memory
       free memory
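
To put rough numbers on it (hypothetical values, not from the thread;
IO_TLB_SIZE is 2048 here):

	unsigned long orig_addr      = 0x12345600;
	unsigned long min_align_mask = 0;	/* device sets no mask */

	/* current code: orig_addr & min_align_mask & (IO_TLB_SIZE - 1) == 0x0   */
	/* simplified:   orig_addr & (IO_TLB_SIZE - 1)                  == 0x600 */

With an offset of 0 the buffer can start at the beginning of a slot; with
a forced offset of 0x600, even a 2 KiB mapping would spill into a second
slot.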

Petr T

2024-03-22 18:12:04

by Michael Kelley

Subject: RE: [PATCH v3 0/2] swiotlb: allocate padding slots if necessary

From: Will Deacon <[email protected]> Sent: Friday, March 22, 2024 8:10 AM
>
> Hi Petr,
>
> On Thu, Mar 21, 2024 at 06:19:00PM +0100, Petr Tesarik wrote:
> > From: Petr Tesarik <[email protected]>
> >
> > If the allocation alignment is bigger than IO_TLB_SIZE and min_align_mask
> > covers some bits in the original address between IO_TLB_SIZE and
> > alloc_align_mask, preserve these bits by allocating additional padding
> > slots before the actual swiotlb buffer.
>
> Thanks for fixing this! I was out at a conference last week, so I didn't
> get very far with it myself, but I ended up in a pickle trying to avoid
> extending 'struct io_tlb_slot'. Your solution is much better than the
> crazy avenue I started going down...
>
> With your changes, can we now simplify swiotlb_align_offset() to ignore
> dma_get_min_align_mask() altogether and just:
>
> return addr & (IO_TLB_SIZE - 1);
>
> ?
>

I don't think such a change is correct, since we want to allow the
DMA min_align_mask to work if it is set to 0x3FF or 0x1FF or
something else smaller than IO_TLB_SIZE - 1.

Petr's new offset calculation in swiotlb_tbl_map_single() is this:

offset = orig_addr & dma_get_min_align_mask(dev) &
(alloc_align_mask | (IO_TLB_SIZE - 1));

In the normal stream mapping case, where alloc_align_mask is
zero, Petr's new statement is equivalent to swiotlb_align_offset().
And I think it needs to continue to be equivalent so that
swiotlb_search_pool_area(), swiotlb_bounce() and
swiotlb_release_slots() calculate the same offset as
swiotlb_tbl_map_single() uses after it separately processes
the padding slots.

Perhaps a better approach to maintaining the equivalence is
to modify swiotlb_align_offset() to be Petr's new calculation,
with alloc_align_mask passed as an argument.
swiotlb_search_pool_area(), swiotlb_bounce(), and
swiotlb_release_slots() would all pass 0 as the alloc_align_mask
argument.
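
Roughly, that suggested shape might look like this (a sketch only, not a
posted patch):

static unsigned int swiotlb_align_offset(struct device *dev,
					 unsigned int align_mask, u64 addr)
{
	return addr & dma_get_min_align_mask(dev) &
	       (align_mask | (IO_TLB_SIZE - 1));
}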

Michael

2024-03-22 19:31:42

by Petr Tesařík

Subject: Re: [PATCH v3 0/2] swiotlb: allocate padding slots if necessary

On Fri, 22 Mar 2024 18:11:50 +0000
Michael Kelley <[email protected]> wrote:

> From: Will Deacon <[email protected]> Sent: Friday, March 22, 2024 8:10 AM
> >
> > Hi Petr,
> >
> > On Thu, Mar 21, 2024 at 06:19:00PM +0100, Petr Tesarik wrote:
> > > From: Petr Tesarik <[email protected]>
> > >
> > > If the allocation alignment is bigger than IO_TLB_SIZE and min_align_mask
> > > covers some bits in the original address between IO_TLB_SIZE and
> > > alloc_align_mask, preserve these bits by allocating additional padding
> > > slots before the actual swiotlb buffer.
> >
> > Thanks for fixing this! I was out at a conference last week, so I didn't
> > get very far with it myself, but I ended up in a pickle trying to avoid
> > extending 'struct io_tlb_slot'. Your solution is much better than the
> > crazy avenue I started going down...
> >
> > With your changes, can we now simplify swiotlb_align_offset() to ignore
> > dma_get_min_align_mask() altogether and just:
> >
> > return addr & (IO_TLB_SIZE - 1);
> >
> > ?
> >
>
> I don't think such a change is correct, since we want to allow the
> DMA min_align_mask to work if it is set to 0x3FF or 0x1FF or
> something else smaller than IO_TLB_SIZE - 1.
>
> Petr's new offset calculation in swiotlb_tbl_map_single() is this:
>
> offset = orig_addr & dma_get_min_align_mask(dev) &
> (alloc_align_mask | (IO_TLB_SIZE - 1));
>
> In the normal stream mapping case, where alloc_align_mask is
> zero, Petr's new statement is equivalent to swiotlb_align_offset().
> And I think it needs to continue to be equivalent so that
> swiotlb_search_pool_area(), swiotlb_bounce() and
> swiotlb_release_slots() calculate the same offset as
> swiotlb_tbl_map_single() uses after it separately processes
> the padding slots.
>
> Perhaps a better approach to maintaining the equivalence is
> to modify swiotlb_align_offset() to be Petr's new calculation,
> with alloc_align_mask passed as an argument.
> swiotlb_search_pool_area(), swiotlb_bounce(), and
> swiotlb_release_slots() would all pass 0 as the alloc_align_mask
> argument.

I like this idea.

Petr T

2024-03-27 19:42:13

by Will Deacon

Subject: Re: [PATCH v3 0/2] swiotlb: allocate padding slots if necessary

On Fri, Mar 22, 2024 at 06:51:38PM +0100, Petr Tesařík wrote:
> On Fri, 22 Mar 2024 15:09:41 +0000
> Will Deacon <[email protected]> wrote:
>
> > Hi Petr,
> >
> > On Thu, Mar 21, 2024 at 06:19:00PM +0100, Petr Tesarik wrote:
> > > From: Petr Tesarik <[email protected]>
> > >
> > > If the allocation alignment is bigger than IO_TLB_SIZE and min_align_mask
> > > covers some bits in the original address between IO_TLB_SIZE and
> > > alloc_align_mask, preserve these bits by allocating additional padding
> > > slots before the actual swiotlb buffer.
> >
> > Thanks for fixing this! I was out at a conference last week, so I didn't
> > get very far with it myself, but I ended up in a pickle trying to avoid
> > extending 'struct io_tlb_slot'. Your solution is much better than the
> > crazy avenue I started going down...
> >
> > With your changes, can we now simplify swiotlb_align_offset() to ignore
> > dma_get_min_align_mask() altogether and just:
> >
> > return addr & (IO_TLB_SIZE - 1);
>
> I have also thought about this but I don't think it's right. If we
> removed dma_get_min_align_mask() from swiotlb_align_offset(), we would
> always ask to preserve the lowest IO_TLB_SHIFT bits. This may cause
> less efficient use of the SWIOTLB.
>
> For example, if a device does not specify any min_align_mask, it is
> presumably happy with any buffer alignment, so SWIOTLB may allocate at
> the beginning of a slot, like here:
>
>    orig_addr   |     ++|++     |
>    tlb_addr    |++++   |       |
>
> Without dma_get_min_align_mask() in swiotlb_align_offset(), it would
> have to allocate two mostly-empty slots:
>
>    tlb_addr    |     ++|++     |
>
> where:
>    |   marks a multiple of IO_TLB_SIZE (in physical address space)
>    +   used memory
>        free memory

Thanks for the patient explanation. I'd got so caught up with the DMA
alignment mask that I forgot the usual case where it's not specified at
all!

Will