2024-03-12 13:42:34

by Petr Tesarik

Subject: [PATCH 1/1] swiotlb: extend buffer pre-padding to alloc_align_mask if necessary

From: Petr Tesarik <[email protected]>

Allow a buffer pre-padding of up to alloc_align_mask. If the allocation
alignment is bigger than IO_TLB_SIZE and min_align_mask covers any non-zero
bits in the original address between IO_TLB_SIZE and alloc_align_mask,
these bits are not preserved in the swiotlb buffer address.

To fix this case, increase the allocation size and use a larger offset
within the allocated buffer. As a result, extra padding slots may be
allocated before the mapping start address.

Set the orig_addr in these padding slots to INVALID_PHYS_ADDR, because they
do not correspond to any CPU buffer and the data must never be synced.

The padding slots should be automatically released when the buffer is
unmapped. However, swiotlb_tbl_unmap_single() takes only the address of the
DMA buffer slot, not the first padding slot. Save the number of padding
slots in struct io_tlb_slot and use it to adjust the slot index in
swiotlb_release_slots(), so all allocated slots are properly freed.

Fixes: 2fd4fa5d3fb5 ("swiotlb: Fix alignment checks when both allocation and DMA masks are present")
Link: https://lore.kernel.org/linux-iommu/[email protected]/
Signed-off-by: Petr Tesarik <[email protected]>
---
kernel/dma/swiotlb.c | 34 +++++++++++++++++++++++++++++-----
1 file changed, 29 insertions(+), 5 deletions(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 86fe172b5958..8ce11abc691f 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -69,11 +69,13 @@
* @alloc_size: Size of the allocated buffer.
* @list: The free list describing the number of free entries available
* from each index.
+ * @padding: Number of preceding padding slots.
*/
struct io_tlb_slot {
phys_addr_t orig_addr;
size_t alloc_size;
unsigned int list;
+ unsigned int padding;
};

static bool swiotlb_force_bounce;
@@ -287,6 +289,7 @@ static void swiotlb_init_io_tlb_pool(struct io_tlb_pool *mem, phys_addr_t start,
mem->nslabs - i);
mem->slots[i].orig_addr = INVALID_PHYS_ADDR;
mem->slots[i].alloc_size = 0;
+ mem->slots[i].padding = 0;
}

memset(vaddr, 0, bytes);
@@ -1328,11 +1331,12 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
unsigned long attrs)
{
struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
- unsigned int offset = swiotlb_align_offset(dev, orig_addr);
+ unsigned int offset;
struct io_tlb_pool *pool;
unsigned int i;
int index;
phys_addr_t tlb_addr;
+ unsigned int padding;

if (!mem || !mem->nslabs) {
dev_warn_ratelimited(dev,
@@ -1349,6 +1353,15 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
return (phys_addr_t)DMA_MAPPING_ERROR;
}

+ /*
+ * Calculate buffer pre-padding within the allocated space. Use it to
+ * preserve the low bits of the original address according to device's
+ * min_align_mask. Limit the padding to alloc_align_mask or slot size
+ * (whichever is bigger); higher bits of the original address are
+ * preserved by selecting a suitable IO TLB slot.
+ */
+ offset = orig_addr & dma_get_min_align_mask(dev) &
+ (alloc_align_mask | (IO_TLB_SIZE - 1));
index = swiotlb_find_slots(dev, orig_addr,
alloc_size + offset, alloc_align_mask, &pool);
if (index == -1) {
@@ -1364,6 +1377,12 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
* This is needed when we sync the memory. Then we sync the buffer if
* needed.
*/
+ padding = 0;
+ while (offset >= IO_TLB_SIZE) {
+ pool->slots[index++].orig_addr = INVALID_PHYS_ADDR;
+ pool->slots[index].padding = ++padding;
+ offset -= IO_TLB_SIZE;
+ }
for (i = 0; i < nr_slots(alloc_size + offset); i++)
pool->slots[index + i].orig_addr = slot_addr(orig_addr, i);
tlb_addr = slot_addr(pool->start, index) + offset;
@@ -1385,12 +1404,16 @@ static void swiotlb_release_slots(struct device *dev, phys_addr_t tlb_addr)
struct io_tlb_pool *mem = swiotlb_find_pool(dev, tlb_addr);
unsigned long flags;
unsigned int offset = swiotlb_align_offset(dev, tlb_addr);
- int index = (tlb_addr - offset - mem->start) >> IO_TLB_SHIFT;
- int nslots = nr_slots(mem->slots[index].alloc_size + offset);
- int aindex = index / mem->area_nslabs;
- struct io_tlb_area *area = &mem->areas[aindex];
+ int index, nslots, aindex;
+ struct io_tlb_area *area;
int count, i;

+ index = (tlb_addr - offset - mem->start) >> IO_TLB_SHIFT;
+ index -= mem->slots[index].padding;
+ nslots = nr_slots(mem->slots[index].alloc_size + offset);
+ aindex = index / mem->area_nslabs;
+ area = &mem->areas[aindex];
+
/*
* Return the buffer to the free list by setting the corresponding
* entries to indicate the number of contiguous entries available.
@@ -1413,6 +1436,7 @@ static void swiotlb_release_slots(struct device *dev, phys_addr_t tlb_addr)
mem->slots[i].list = ++count;
mem->slots[i].orig_addr = INVALID_PHYS_ADDR;
mem->slots[i].alloc_size = 0;
+ mem->slots[i].padding = 0;
}

/*
--
2.34.1



2024-03-15 02:53:25

by Michael Kelley

Subject: RE: [PATCH 1/1] swiotlb: extend buffer pre-padding to alloc_align_mask if necessary

From: Petr Tesarik <[email protected]> Sent: Tuesday, March 12, 2024 6:42 AM
>
> Allow a buffer pre-padding of up to alloc_align_mask. If the allocation
> alignment is bigger than IO_TLB_SIZE and min_align_mask covers any non-zero
> bits in the original address between IO_TLB_SIZE and alloc_align_mask,
> these bits are not preserved in the swiotlb buffer address.
>
> To fix this case, increase the allocation size and use a larger offset
> within the allocated buffer. As a result, extra padding slots may be
> allocated before the mapping start address.
>
> Set the orig_addr in these padding slots to INVALID_PHYS_ADDR, because they
> do not correspond to any CPU buffer and the data must never be synced.
>
> The padding slots should be automatically released when the buffer is
> unmapped. However, swiotlb_tbl_unmap_single() takes only the address of the
> DMA buffer slot, not the first padding slot. Save the number of padding
> slots in struct io_tlb_slot and use it to adjust the slot index in
> swiotlb_release_slots(), so all allocated slots are properly freed.
>

I've been staring at this the past two days, and have run tests with
various min_align_masks and alloc_masks against orig_addr's with
various alignments and mapping sizes. I'm planning to run additional
tests over the weekend, but I think it's solid. See a few comments
below.

> Fixes: 2fd4fa5d3fb5 ("swiotlb: Fix alignment checks when both allocation and DMA masks are present")
> Link: https://lore.kernel.org/linux-iommu/[email protected]/
> Signed-off-by: Petr Tesarik <[email protected]>
> ---
> kernel/dma/swiotlb.c | 34 +++++++++++++++++++++++++++++-----
> 1 file changed, 29 insertions(+), 5 deletions(-)
>
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index 86fe172b5958..8ce11abc691f 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -69,11 +69,13 @@
> * @alloc_size: Size of the allocated buffer.
> * @list: The free list describing the number of free entries available
> * from each index.
> + * @padding: Number of preceding padding slots.
> */
> struct io_tlb_slot {
> phys_addr_t orig_addr;
> size_t alloc_size;
> unsigned int list;
> + unsigned int padding;

Even without the padding field, I presume that in
64-bit builds this struct is already 24 bytes in size so as
to maintain 64-bit alignment for the orig_addr and
alloc_size fields. If that's the case, then adding the
padding field doesn't change the amount of memory
required for the slot array. Is that correct? Both the
"list" and "padding" fields contain only small integers,
but presumably reducing their size from "int" to "short"
wouldn't help except in 32-bit builds.

> };
>
> static bool swiotlb_force_bounce;
> @@ -287,6 +289,7 @@ static void swiotlb_init_io_tlb_pool(struct io_tlb_pool *mem, phys_addr_t start,
> mem->nslabs - i);
> mem->slots[i].orig_addr = INVALID_PHYS_ADDR;
> mem->slots[i].alloc_size = 0;
> + mem->slots[i].padding = 0;
> }
>
> memset(vaddr, 0, bytes);
> @@ -1328,11 +1331,12 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
> unsigned long attrs)
> {
> struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
> - unsigned int offset = swiotlb_align_offset(dev, orig_addr);
> + unsigned int offset;
> struct io_tlb_pool *pool;
> unsigned int i;
> int index;
> phys_addr_t tlb_addr;
> + unsigned int padding;
>
> if (!mem || !mem->nslabs) {
> dev_warn_ratelimited(dev,
> @@ -1349,6 +1353,15 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
> return (phys_addr_t)DMA_MAPPING_ERROR;
> }
>
> + /*
> + * Calculate buffer pre-padding within the allocated space. Use it to
> + * preserve the low bits of the original address according to device's
> + * min_align_mask. Limit the padding to alloc_align_mask or slot size
> + * (whichever is bigger); higher bits of the original address are
> + * preserved by selecting a suitable IO TLB slot.
> + */
> + offset = orig_addr & dma_get_min_align_mask(dev) &
> + (alloc_align_mask | (IO_TLB_SIZE - 1));
> index = swiotlb_find_slots(dev, orig_addr,
> alloc_size + offset, alloc_align_mask, &pool);
> if (index == -1) {
> @@ -1364,6 +1377,12 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
> * This is needed when we sync the memory. Then we sync the buffer if
> * needed.
> */
> + padding = 0;
> + while (offset >= IO_TLB_SIZE) {
> + pool->slots[index++].orig_addr = INVALID_PHYS_ADDR;
> + pool->slots[index].padding = ++padding;
> + offset -= IO_TLB_SIZE;
> + }

Looping to fill in the padding slots seems unnecessary.
The orig_addr field should already be initialized to
INVALID_PHYS_ADDR, and the padding field should already
be initialized to zero. The value of "padding" should be just
(offset / IO_TLB_SIZE), and it only needs to be stored in the
first non-padding slot where swiotlb_release_slots() will
find it.

FWIW, your while loop above feels a bit weird in that it
updates two different slots each time through the loop,
and the padding field of the first padding slot isn't
updated. It took me a little while to figure that out. :-)
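
Something like this (an untested sketch; "pad_slots" is just an
illustrative name) would replace the loop, leaving the padding slots
in their initialized state:

	unsigned int pad_slots = offset >> IO_TLB_SHIFT;

	index += pad_slots;		/* first non-padding slot */
	pool->slots[index].padding = pad_slots;
	offset &= IO_TLB_SIZE - 1;	/* offset within the first data slot */

The for loop and the tlb_addr calculation below then work unchanged.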

> for (i = 0; i < nr_slots(alloc_size + offset); i++)
> pool->slots[index + i].orig_addr = slot_addr(orig_addr, i);
> tlb_addr = slot_addr(pool->start, index) + offset;
> @@ -1385,12 +1404,16 @@ static void swiotlb_release_slots(struct device *dev, phys_addr_t tlb_addr)
> struct io_tlb_pool *mem = swiotlb_find_pool(dev, tlb_addr);
> unsigned long flags;
> unsigned int offset = swiotlb_align_offset(dev, tlb_addr);
> - int index = (tlb_addr - offset - mem->start) >> IO_TLB_SHIFT;
> - int nslots = nr_slots(mem->slots[index].alloc_size + offset);
> - int aindex = index / mem->area_nslabs;
> - struct io_tlb_area *area = &mem->areas[aindex];
> + int index, nslots, aindex;
> + struct io_tlb_area *area;
> int count, i;
>
> + index = (tlb_addr - offset - mem->start) >> IO_TLB_SHIFT;
> + index -= mem->slots[index].padding;
> + nslots = nr_slots(mem->slots[index].alloc_size + offset);
> + aindex = index / mem->area_nslabs;
> + area = &mem->areas[aindex];
> +
> /*
> * Return the buffer to the free list by setting the corresponding
> * entries to indicate the number of contiguous entries available.
> @@ -1413,6 +1436,7 @@ static void swiotlb_release_slots(struct device *dev, phys_addr_t tlb_addr)
> mem->slots[i].list = ++count;
> mem->slots[i].orig_addr = INVALID_PHYS_ADDR;
> mem->slots[i].alloc_size = 0;
> + mem->slots[i].padding = 0;
> }
>
> /*
> --
> 2.34.1


2024-03-15 12:26:30

by Petr Tesařík

Subject: Re: [PATCH 1/1] swiotlb: extend buffer pre-padding to alloc_align_mask if necessary

On Fri, 15 Mar 2024 02:53:10 +0000
Michael Kelley <[email protected]> wrote:

> From: Petr Tesarik <[email protected]> Sent: Tuesday, March 12, 2024 6:42 AM
> >
> > Allow a buffer pre-padding of up to alloc_align_mask. If the allocation
> > alignment is bigger than IO_TLB_SIZE and min_align_mask covers any non-zero
> > bits in the original address between IO_TLB_SIZE and alloc_align_mask,
> > these bits are not preserved in the swiotlb buffer address.
> >
> > To fix this case, increase the allocation size and use a larger offset
> > within the allocated buffer. As a result, extra padding slots may be
> > allocated before the mapping start address.
> >
> > Set the orig_addr in these padding slots to INVALID_PHYS_ADDR, because they
> > do not correspond to any CPU buffer and the data must never be synced.
> >
> > The padding slots should be automatically released when the buffer is
> > unmapped. However, swiotlb_tbl_unmap_single() takes only the address of the
> > DMA buffer slot, not the first padding slot. Save the number of padding
> > slots in struct io_tlb_slot and use it to adjust the slot index in
> > swiotlb_release_slots(), so all allocated slots are properly freed.
> >
>
> I've been staring at this the past two days, and have run tests with
> various min_align_masks and alloc_masks against orig_addr's with
> various alignments and mapping sizes. I'm planning to run additional
> tests over the weekend, but I think it's solid. See a few comments
> below.
>
> > Fixes: 2fd4fa5d3fb5 ("swiotlb: Fix alignment checks when both allocation and DMA masks are present")
> > Link: https://lore.kernel.org/linux-iommu/[email protected]/
> > Signed-off-by: Petr Tesarik <[email protected]>
> > ---
> > kernel/dma/swiotlb.c | 34 +++++++++++++++++++++++++++++-----
> > 1 file changed, 29 insertions(+), 5 deletions(-)
> >
> > diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> > index 86fe172b5958..8ce11abc691f 100644
> > --- a/kernel/dma/swiotlb.c
> > +++ b/kernel/dma/swiotlb.c
> > @@ -69,11 +69,13 @@
> > * @alloc_size: Size of the allocated buffer.
> > * @list: The free list describing the number of free entries available
> > * from each index.
> > + * @padding: Number of preceding padding slots.
> > */
> > struct io_tlb_slot {
> > phys_addr_t orig_addr;
> > size_t alloc_size;
> > unsigned int list;
> > + unsigned int padding;
>
> Even without the padding field, I presume that in
> 64-bit builds this struct is already 24 bytes in size so as
> to maintain 64-bit alignment for the orig_addr and
> alloc_size fields. If that's the case, then adding the
> padding field doesn't change the amount of memory
> required for the slot array. Is that correct? Both the
> "list" and "padding" fields contain only small integers,
> but presumably reducing their size from "int" to "short"
> wouldn't help except in 32-bit builds.

That's correct. In 64-bit builds, this is the layout before my patch:

/* offset | size */ type = struct io_tlb_slot {
/* 0 | 8 */ phys_addr_t orig_addr;
/* 8 | 8 */ size_t alloc_size;
/* 16 | 4 */ unsigned int list;
/* XXX 4-byte padding */

/* total size (bytes): 24 */
}

And this is the layout after my patch:

/* offset | size */ type = struct io_tlb_slot {
/* 0 | 8 */ phys_addr_t orig_addr;
/* 8 | 8 */ size_t alloc_size;
/* 16 | 4 */ unsigned int list;
/* 20 | 4 */ unsigned int padding;

/* total size (bytes): 24 */
}

Admittedly, I haven't really considered 32-bit builds. The layout
depends on the size of phys_addr_t. For example, on 32-bit Arm with
32-bit physical addresses, the struct grows from 12 bytes to 16 bytes.
With 64-bit physical addresses (LPAE), it grows from 16 bytes to 24
bytes (with 4 bytes of padding at the end, due to the stricter Arm ABI
alignment requirements for 64-bit types). On other platforms, it would
be only 20 bytes.

The default SWIOTLB size is 64M, divided into 32K slots of 2K each.
If each slot grows by 8 bytes, runtime memory consumption increases by
256K. That's not much, but I agree that the increase can be zero if the
type of both fields is changed to unsigned short. And there is no
downside AFAICS.
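
I.e. the struct would become (just a sketch):

	struct io_tlb_slot {
		phys_addr_t orig_addr;
		size_t alloc_size;
		unsigned short list;
		unsigned short padding;
	};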

FTR the maximum value that can be stored in these two fields is
IO_TLB_SEGSIZE. This constant has never ever changed, but I can add a
BUILD_BUG_ON() to swiotlb_init_io_tlb_pool(), just to make sure.
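Something like this (sketch):

	/* list and padding would be unsigned short */
	BUILD_BUG_ON(IO_TLB_SEGSIZE > USHRT_MAX);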

> > };
> >
> > static bool swiotlb_force_bounce;
> > @@ -287,6 +289,7 @@ static void swiotlb_init_io_tlb_pool(struct io_tlb_pool *mem, phys_addr_t start,
> > mem->nslabs - i);
> > mem->slots[i].orig_addr = INVALID_PHYS_ADDR;
> > mem->slots[i].alloc_size = 0;
> > + mem->slots[i].padding = 0;
> > }
> >
> > memset(vaddr, 0, bytes);
> > @@ -1328,11 +1331,12 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
> > unsigned long attrs)
> > {
> > struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
> > - unsigned int offset = swiotlb_align_offset(dev, orig_addr);
> > + unsigned int offset;
> > struct io_tlb_pool *pool;
> > unsigned int i;
> > int index;
> > phys_addr_t tlb_addr;
> > + unsigned int padding;
> >
> > if (!mem || !mem->nslabs) {
> > dev_warn_ratelimited(dev,
> > @@ -1349,6 +1353,15 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
> > return (phys_addr_t)DMA_MAPPING_ERROR;
> > }
> >
> > + /*
> > + * Calculate buffer pre-padding within the allocated space. Use it to
> > + * preserve the low bits of the original address according to device's
> > + * min_align_mask. Limit the padding to alloc_align_mask or slot size
> > + * (whichever is bigger); higher bits of the original address are
> > + * preserved by selecting a suitable IO TLB slot.
> > + */
> > + offset = orig_addr & dma_get_min_align_mask(dev) &
> > + (alloc_align_mask | (IO_TLB_SIZE - 1));
> > index = swiotlb_find_slots(dev, orig_addr,
> > alloc_size + offset, alloc_align_mask, &pool);
> > if (index == -1) {
> > @@ -1364,6 +1377,12 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
> > * This is needed when we sync the memory. Then we sync the buffer if
> > * needed.
> > */
> > + padding = 0;
> > + while (offset >= IO_TLB_SIZE) {
> > + pool->slots[index++].orig_addr = INVALID_PHYS_ADDR;
> > + pool->slots[index].padding = ++padding;
> > + offset -= IO_TLB_SIZE;
> > + }
>
> Looping to fill in the padding slots seems unnecessary.
> The orig_addr field should already be initialized to
> INVALID_PHYS_ADDR, and the padding field should already
> be initialized to zero.

Ack.

> The value of "padding" should be just
> (offset / IO_TLB_SIZE), and it only needs to be stored in the
> first non-padding slot where swiotlb_release_slots() will
> find it.

This is also right. I asked myself what should happen if somebody
passes a padding slot's address to swiotlb_tbl_unmap_single(). Of
course, it's an invalid address, but as a proponent of defensive
programming, I still asked myself what the best response would be. If
I store the padding in each slot, the whole block is released. If I
store it only in the first non-padding slot, some slots may leak.

On second thought, the resulting SWIOTLB state is consistent either
way, and we don't need to care about leaking some slots if a driver is
buggy. Maybe it's even better, because the leak will be noticed.

In short, I agree, let's keep the code simple.

Thank you for your review!

Petr T

> FWIW, your while loop above feels a bit weird in that it
> updates two different slots each time through the loop,
> and the padding field of the first padding slot isn't
> updated. It took me a little while to figure that out. :-)
>
> > for (i = 0; i < nr_slots(alloc_size + offset); i++)
> > pool->slots[index + i].orig_addr = slot_addr(orig_addr, i);
> > tlb_addr = slot_addr(pool->start, index) + offset;
> > @@ -1385,12 +1404,16 @@ static void swiotlb_release_slots(struct device *dev, phys_addr_t tlb_addr)
> > struct io_tlb_pool *mem = swiotlb_find_pool(dev, tlb_addr);
> > unsigned long flags;
> > unsigned int offset = swiotlb_align_offset(dev, tlb_addr);
> > - int index = (tlb_addr - offset - mem->start) >> IO_TLB_SHIFT;
> > - int nslots = nr_slots(mem->slots[index].alloc_size + offset);
> > - int aindex = index / mem->area_nslabs;
> > - struct io_tlb_area *area = &mem->areas[aindex];
> > + int index, nslots, aindex;
> > + struct io_tlb_area *area;
> > int count, i;
> >
> > + index = (tlb_addr - offset - mem->start) >> IO_TLB_SHIFT;
> > + index -= mem->slots[index].padding;
> > + nslots = nr_slots(mem->slots[index].alloc_size + offset);
> > + aindex = index / mem->area_nslabs;
> > + area = &mem->areas[aindex];
> > +
> > /*
> > * Return the buffer to the free list by setting the corresponding
> > * entries to indicate the number of contiguous entries available.
> > @@ -1413,6 +1436,7 @@ static void swiotlb_release_slots(struct device *dev, phys_addr_t tlb_addr)
> > mem->slots[i].list = ++count;
> > mem->slots[i].orig_addr = INVALID_PHYS_ADDR;
> > mem->slots[i].alloc_size = 0;
> > + mem->slots[i].padding = 0;
> > }
> >
> > /*
> > --
> > 2.34.1
>
>


2024-03-15 14:59:38

by Michael Kelley

Subject: RE: [PATCH 1/1] swiotlb: extend buffer pre-padding to alloc_align_mask if necessary

From: Petr Tesařík <[email protected]> Sent: Friday, March 15, 2024 5:26 AM
>
> On Fri, 15 Mar 2024 02:53:10 +0000
> Michael Kelley <[email protected]> wrote:
>
> > From: Petr Tesarik <[email protected]> Sent: Tuesday, March 12, 2024 6:42 AM
> > >

[snip]

> > > @@ -1349,6 +1353,15 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
> > > return (phys_addr_t)DMA_MAPPING_ERROR;
> > > }
> > >
> > > + /*
> > > + * Calculate buffer pre-padding within the allocated space. Use it to
> > > + * preserve the low bits of the original address according to device's
> > > + * min_align_mask. Limit the padding to alloc_align_mask or slot size
> > > + * (whichever is bigger); higher bits of the original address are
> > > + * preserved by selecting a suitable IO TLB slot.
> > > + */
> > > + offset = orig_addr & dma_get_min_align_mask(dev) &
> > > + (alloc_align_mask | (IO_TLB_SIZE - 1));
> > > index = swiotlb_find_slots(dev, orig_addr,
> > > alloc_size + offset, alloc_align_mask, &pool);
> > > if (index == -1) {
> > > @@ -1364,6 +1377,12 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
> > > * This is needed when we sync the memory. Then we sync the buffer if
> > > * needed.
> > > */
> > > + padding = 0;
> > > + while (offset >= IO_TLB_SIZE) {
> > > + pool->slots[index++].orig_addr = INVALID_PHYS_ADDR;
> > > + pool->slots[index].padding = ++padding;
> > > + offset -= IO_TLB_SIZE;
> > > + }
> >
> > Looping to fill in the padding slots seems unnecessary.
> > The orig_addr field should already be initialized to
> > INVALID_PHYS_ADDR, and the padding field should already
> > be initialized to zero.
>
> Ack.
>
> > The value of "padding" should be just
> > (offset / IO_TLB_SIZE), and it only needs to be stored in the
> > first non-padding slot where swiotlb_release_slots() will
> > find it.
>
> This is also right. I asked myself what should happen if somebody
> passes a padding slot's address to swiotlb_tbl_unmap_single(). Of
> course, it's an invalid address, but as a proponent of defensive
> programming, I still asked myself what the best response would be. If
> I store the padding in each slot, the whole block is released. If I
> store it only in the first non-padding slot, some slots may leak.
>
> On second thought, the resulting SWIOTLB state is consistent either
> way, and we don't need to care about leaking some slots if a driver is
> buggy. Maybe it's even better, because the leak will be noticed.
>
> In short, I agree, let's keep the code simple.
>

One other thought: Fundamentally, fixing the core problem
requires swiotlb_tbl_unmap_single() to have some information
it doesn't have in the current code. It needs to know the
number of padding slots, so that it can free them correctly.
Your solution is to store the # of padding slots in the
io_tlb_slot array.

Another approach would be to have callers pass the
alloc_align_mask as an argument to swiotlb_tbl_unmap_single().
It can then calculate the offset and the number of padding
slots just like swiotlb_tbl_map_single() does. Nothing
additional would need to be stored in the io_tlb_slot array.
The IOMMU code is the only caller that uses a non-zero
alloc_align_mask. From a quick look at that code, the
unmap path has the iova_mask() available, so that would
work. Other callers would pass zero, just like they do for
swiotlb_tbl_map_single().
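
Roughly like this (a sketch from memory; I haven't checked the exact
current prototype or all the call sites):

	void swiotlb_tbl_unmap_single(struct device *dev,
			phys_addr_t tlb_addr, size_t mapping_size,
			enum dma_data_direction dir, unsigned long attrs,
			unsigned int alloc_align_mask);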

I don't immediately have a strong opinion either way, but
it's something to think about a bit more.

Michael

2024-03-15 15:09:46

by Petr Tesařík

Subject: Re: [PATCH 1/1] swiotlb: extend buffer pre-padding to alloc_align_mask if necessary

On Fri, 15 Mar 2024 14:59:08 +0000
Michael Kelley <[email protected]> wrote:

> From: Petr Tesařík <[email protected]> Sent: Friday, March 15, 2024 5:26 AM
> >
> > On Fri, 15 Mar 2024 02:53:10 +0000
> > Michael Kelley <[email protected]> wrote:
> >
> > > From: Petr Tesarik <[email protected]> Sent: Tuesday, March 12, 2024 6:42 AM
> > > >
>
> [snip]
>
> > > > @@ -1349,6 +1353,15 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
> > > > return (phys_addr_t)DMA_MAPPING_ERROR;
> > > > }
> > > >
> > > > + /*
> > > > + * Calculate buffer pre-padding within the allocated space. Use it to
> > > > + * preserve the low bits of the original address according to device's
> > > > + * min_align_mask. Limit the padding to alloc_align_mask or slot size
> > > > + * (whichever is bigger); higher bits of the original address are
> > > > + * preserved by selecting a suitable IO TLB slot.
> > > > + */
> > > > + offset = orig_addr & dma_get_min_align_mask(dev) &
> > > > + (alloc_align_mask | (IO_TLB_SIZE - 1));
> > > > index = swiotlb_find_slots(dev, orig_addr,
> > > > alloc_size + offset, alloc_align_mask, &pool);
> > > > if (index == -1) {
> > > > @@ -1364,6 +1377,12 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
> > > > * This is needed when we sync the memory. Then we sync the buffer if
> > > > * needed.
> > > > */
> > > > + padding = 0;
> > > > + while (offset >= IO_TLB_SIZE) {
> > > > + pool->slots[index++].orig_addr = INVALID_PHYS_ADDR;
> > > > + pool->slots[index].padding = ++padding;
> > > > + offset -= IO_TLB_SIZE;
> > > > + }
> > >
> > > Looping to fill in the padding slots seems unnecessary.
> > > The orig_addr field should already be initialized to
> > > INVALID_PHYS_ADDR, and the padding field should already
> > > be initialized to zero.
> >
> > Ack.
> >
> > > The value of "padding" should be just
> > > (offset / IO_TLB_SIZE), and it only needs to be stored in the
> > > first non-padding slot where swiotlb_release_slots() will
> > > find it.
> >
> > This is also right. I asked myself what should happen if somebody
> > passes a padding slot's address to swiotlb_tbl_unmap_single(). Of
> > course, it's an invalid address, but as a proponent of defensive
> > programming, I still asked myself what the best response would be. If
> > I store the padding in each slot, the whole block is released. If I
> > store it only in the first non-padding slot, some slots may leak.
> >
> > On second thought, the resulting SWIOTLB state is consistent either
> > way, and we don't need to care about leaking some slots if a driver is
> > buggy. Maybe it's even better, because the leak will be noticed.
> >
> > In short, I agree, let's keep the code simple.
> >
>
> One other thought: Fundamentally, fixing the core problem
> requires swiotlb_tbl_unmap_single() to have some information
> it doesn't have in the current code. It needs to know the
> number of padding slots, so that it can free them correctly.
> Your solution is to store the # of padding slots in the
> io_tlb_slot array.
>
> Another approach would be to have callers pass the
> alloc_align_mask as an argument to swiotlb_tbl_unmap_single().
> It can then calculate the offset and the number of padding
> slots just like swiotlb_tbl_map_single() does. Nothing
> additional would need to be stored in the io_tlb_slot array.
> The IOMMU code is the only caller that uses a non-zero
> alloc_align_mask. From a quick look at that code, the
> unmap path has the iova_mask() available, so that would
> work. Other callers would pass zero, just like they do for
> swiotlb_tbl_map_single().
>
> I don't immediately have a strong opinion either way, but
> it's something to think about a bit more.

I believe it's slightly more robust to store how the buffer was
actually allocated than to rely on the caller. It seems to me that this
was also a design goal of the original author. For example, note that
swiotlb_tbl_unmap_single() uses mapping_size only to do the final
buffer sync, but not to determine how many slots should be released.
This information is taken from struct io_tlb_slot.alloc_size.
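
For reference, this is the line in swiotlb_release_slots() that I mean:

	nslots = nr_slots(mem->slots[index].alloc_size + offset);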

Petr T

2024-03-18 13:05:48

by Robin Murphy

Subject: Re: [PATCH 1/1] swiotlb: extend buffer pre-padding to alloc_align_mask if necessary

On 15/03/2024 2:53 am, Michael Kelley wrote:
[...]
>> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
>> index 86fe172b5958..8ce11abc691f 100644
>> --- a/kernel/dma/swiotlb.c
>> +++ b/kernel/dma/swiotlb.c
>> @@ -69,11 +69,13 @@
>> * @alloc_size: Size of the allocated buffer.
>> * @list: The free list describing the number of free entries available
>> * from each index.
>> + * @padding: Number of preceding padding slots.
>> */
>> struct io_tlb_slot {
>> phys_addr_t orig_addr;
>> size_t alloc_size;
>> unsigned int list;
>> + unsigned int padding;
>
> Even without the padding field, I presume that in
> 64-bit builds this struct is already 24 bytes in size so as
> to maintain 64-bit alignment for the orig_addr and
> alloc_size fields. If that's the case, then adding the
> padding field doesn't change the amount of memory
> required for the slot array. Is that correct? Both the
> "list" and "padding" fields contain only small integers,
> but presumably reducing their size from "int" to "short"
> wouldn't help except in 32-bit builds.

Technically I think we could shrink the whole thing down to 16 bytes*,
since just 24 bits of size should still be more than enough, with the
remaining 8 bits similarly plenty for a padding slot count. Depends if
we think the overall memory saving is worth the marginal extra
complexity of packing values into bitfields.
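
For illustration, one possible packing (just a sketch):

	struct io_tlb_slot {
		phys_addr_t orig_addr;
		unsigned int alloc_size:24;	/* max allocation is 256K */
		unsigned int padding:8;
		unsigned int list;
	};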

Thanks,
Robin.


* The relevance of SWIOTLB to 32-bit builds is primarily going to be for
PAE cases where phys_addr_t is still 64-bit.