2024-03-08 15:28:46

by Will Deacon

Subject: [PATCH v6 0/6] Fix double allocation in swiotlb_alloc()

Hi again, folks,

This is version six of the patches which I previously posted at:

v1: https://lore.kernel.org/r/[email protected]
v2: https://lore.kernel.org/r/[email protected]
v3: https://lore.kernel.org/r/[email protected]
v4: https://lore.kernel.org/r/[email protected]
v5: https://lore.kernel.org/r/[email protected]

Changes since v5 include:

- Rework the final patch to preserve page-alignment for streaming
requests without a DMA alignment mask

- Added Reviewed-by tags from Michael

Cheers,

Will

Cc: [email protected]
Cc: Christoph Hellwig <[email protected]>
Cc: Marek Szyprowski <[email protected]>
Cc: Robin Murphy <[email protected]>
Cc: Petr Tesarik <[email protected]>
Cc: Dexuan Cui <[email protected]>
Cc: Nicolin Chen <[email protected]>
Cc: Michael Kelley <[email protected]>

--->8

Nicolin Chen (1):
iommu/dma: Force swiotlb_max_mapping_size on an untrusted device

Will Deacon (5):
swiotlb: Fix double-allocation of slots due to broken alignment
handling
swiotlb: Enforce page alignment in swiotlb_alloc()
swiotlb: Honour dma_alloc_coherent() alignment in swiotlb_alloc()
swiotlb: Fix alignment checks when both allocation and DMA masks are
present
swiotlb: Reinstate page-alignment for mappings >= PAGE_SIZE

drivers/iommu/dma-iommu.c | 9 ++++++++
kernel/dma/swiotlb.c | 47 ++++++++++++++++++++++++++++-----------
2 files changed, 43 insertions(+), 13 deletions(-)

--
2.44.0.278.ge034bb2e1d-goog



2024-03-08 15:28:57

by Will Deacon

Subject: [PATCH v6 1/6] swiotlb: Fix double-allocation of slots due to broken alignment handling

Commit bbb73a103fbb ("swiotlb: fix a braino in the alignment check fix"),
which was a fix for commit 0eee5ae10256 ("swiotlb: fix slot alignment
checks"), causes a functional regression with vsock in a virtual machine
using bouncing via a restricted DMA SWIOTLB pool.

When virtio allocates the virtqueues for the vsock device using
dma_alloc_coherent(), the SWIOTLB search can return page-unaligned
allocations if 'area->index' was left unaligned by a previous allocation
from the buffer:

# Final address in brackets is the SWIOTLB address returned to the caller
| virtio-pci 0000:00:07.0: orig_addr 0x0 alloc_size 0x2000, iotlb_align_mask 0x800 stride 0x2: got slot 1645-1649/7168 (0x98326800)
| virtio-pci 0000:00:07.0: orig_addr 0x0 alloc_size 0x2000, iotlb_align_mask 0x800 stride 0x2: got slot 1649-1653/7168 (0x98328800)
| virtio-pci 0000:00:07.0: orig_addr 0x0 alloc_size 0x2000, iotlb_align_mask 0x800 stride 0x2: got slot 1653-1657/7168 (0x9832a800)

This ends badly (typically buffer corruption and/or a hang) because
swiotlb_alloc() is expecting a page-aligned allocation and so blindly
returns a pointer to the 'struct page' corresponding to the allocation,
therefore double-allocating the first half (2KiB slot) of the 4KiB page.

Fix the problem by treating the allocation alignment separately to any
additional alignment requirements from the device, using the maximum
of the two as the stride to search the buffer slots and taking care
to ensure a minimum of page-alignment for buffers larger than a page.

This also resolves swiotlb allocation failures occurring due to the
inclusion of ~PAGE_MASK in 'iotlb_align_mask' for large allocations, which
resulted in alignment requirements exceeding swiotlb_max_mapping_size().
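
For illustration, the stride selection described above can be sketched as a
small standalone C program (IO_TLB_SHIFT is assumed to be 11, i.e. 2KiB
slots, and the mask values are made up; this is not the kernel code):

#include <stdio.h>

#define IO_TLB_SHIFT    11                      /* 2KiB swiotlb slots */
#define IO_TLB_SIZE     (1UL << IO_TLB_SHIFT)

/* Simplified take on get_max_slots(): slots needed to honour 'mask'. */
static unsigned long slots_for_mask(unsigned long mask)
{
        return (mask >> IO_TLB_SHIFT) + 1;
}

int main(void)
{
        unsigned long alloc_align_mask = 0xfff; /* 4KiB allocation alignment */
        unsigned long min_align_mask   = 0x800; /* hypothetical device mask */
        unsigned long iotlb_align_mask = min_align_mask & ~(IO_TLB_SIZE - 1);
        unsigned long stride;

        /* Search with the larger of the two alignments, as the fix does. */
        stride = slots_for_mask(alloc_align_mask > iotlb_align_mask ?
                                alloc_align_mask : iotlb_align_mask);
        printf("stride = %lu slots\n", stride); /* 2 slots, i.e. 4KiB steps */
        return 0;
}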

Fixes: bbb73a103fbb ("swiotlb: fix a braino in the alignment check fix")
Fixes: 0eee5ae10256 ("swiotlb: fix slot alignment checks")
Cc: Christoph Hellwig <[email protected]>
Cc: Marek Szyprowski <[email protected]>
Cc: Robin Murphy <[email protected]>
Cc: Dexuan Cui <[email protected]>
Reviewed-by: Michael Kelley <[email protected]>
Reviewed-by: Petr Tesarik <[email protected]>
Tested-by: Nicolin Chen <[email protected]>
Signed-off-by: Will Deacon <[email protected]>
---
kernel/dma/swiotlb.c | 28 +++++++++++++++-------------
1 file changed, 15 insertions(+), 13 deletions(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index b079a9a8e087..2ec2cc81f1a2 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -982,7 +982,7 @@ static int swiotlb_search_pool_area(struct device *dev, struct io_tlb_pool *pool
phys_to_dma_unencrypted(dev, pool->start) & boundary_mask;
unsigned long max_slots = get_max_slots(boundary_mask);
unsigned int iotlb_align_mask =
- dma_get_min_align_mask(dev) | alloc_align_mask;
+ dma_get_min_align_mask(dev) & ~(IO_TLB_SIZE - 1);
unsigned int nslots = nr_slots(alloc_size), stride;
unsigned int offset = swiotlb_align_offset(dev, orig_addr);
unsigned int index, slots_checked, count = 0, i;
@@ -993,19 +993,18 @@ static int swiotlb_search_pool_area(struct device *dev, struct io_tlb_pool *pool
BUG_ON(!nslots);
BUG_ON(area_index >= pool->nareas);

+ /*
+ * For mappings with an alignment requirement don't bother looping to
+ * unaligned slots once we found an aligned one.
+ */
+ stride = get_max_slots(max(alloc_align_mask, iotlb_align_mask));
+
/*
* For allocations of PAGE_SIZE or larger only look for page aligned
* allocations.
*/
if (alloc_size >= PAGE_SIZE)
- iotlb_align_mask |= ~PAGE_MASK;
- iotlb_align_mask &= ~(IO_TLB_SIZE - 1);
-
- /*
- * For mappings with an alignment requirement don't bother looping to
- * unaligned slots once we found an aligned one.
- */
- stride = (iotlb_align_mask >> IO_TLB_SHIFT) + 1;
+ stride = umax(stride, PAGE_SHIFT - IO_TLB_SHIFT + 1);

spin_lock_irqsave(&area->lock, flags);
if (unlikely(nslots > pool->area_nslabs - area->used))
@@ -1015,11 +1014,14 @@ static int swiotlb_search_pool_area(struct device *dev, struct io_tlb_pool *pool
index = area->index;

for (slots_checked = 0; slots_checked < pool->area_nslabs; ) {
- slot_index = slot_base + index;
+ phys_addr_t tlb_addr;

- if (orig_addr &&
- (slot_addr(tbl_dma_addr, slot_index) &
- iotlb_align_mask) != (orig_addr & iotlb_align_mask)) {
+ slot_index = slot_base + index;
+ tlb_addr = slot_addr(tbl_dma_addr, slot_index);
+
+ if ((tlb_addr & alloc_align_mask) ||
+ (orig_addr && (tlb_addr & iotlb_align_mask) !=
+ (orig_addr & iotlb_align_mask))) {
index = wrap_area_index(pool, index + 1);
slots_checked++;
continue;
--
2.44.0.278.ge034bb2e1d-goog


2024-03-08 15:29:10

by Will Deacon

Subject: [PATCH v6 2/6] swiotlb: Enforce page alignment in swiotlb_alloc()

When allocating pages from a restricted DMA pool in swiotlb_alloc(),
the buffer address is blindly converted to a 'struct page *' that is
returned to the caller. In the unlikely event of an allocation bug,
page-unaligned addresses are not detected and slots can silently be
double-allocated.

Add a simple check of the buffer alignment in swiotlb_alloc() to make
debugging a little easier if something has gone wonky.

Cc: Christoph Hellwig <[email protected]>
Cc: Marek Szyprowski <[email protected]>
Cc: Robin Murphy <[email protected]>
Cc: Dexuan Cui <[email protected]>
Reviewed-by: Petr Tesarik <[email protected]>
Tested-by: Nicolin Chen <[email protected]>
Reviewed-by: Michael Kelley <[email protected]>
Signed-off-by: Will Deacon <[email protected]>
---
kernel/dma/swiotlb.c | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 2ec2cc81f1a2..ab7fbb40bc55 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -1643,6 +1643,12 @@ struct page *swiotlb_alloc(struct device *dev, size_t size)
return NULL;

tlb_addr = slot_addr(pool->start, index);
+ if (unlikely(!PAGE_ALIGNED(tlb_addr))) {
+ dev_WARN_ONCE(dev, 1, "Cannot allocate pages from non page-aligned swiotlb addr 0x%pa.\n",
+ &tlb_addr);
+ swiotlb_release_slots(dev, tlb_addr);
+ return NULL;
+ }

return pfn_to_page(PFN_DOWN(tlb_addr));
}
--
2.44.0.278.ge034bb2e1d-goog


2024-03-08 15:29:23

by Will Deacon

Subject: [PATCH v6 3/6] swiotlb: Honour dma_alloc_coherent() alignment in swiotlb_alloc()

core-api/dma-api-howto.rst states the following properties of
dma_alloc_coherent():

| The CPU virtual address and the DMA address are both guaranteed to
| be aligned to the smallest PAGE_SIZE order which is greater than or
| equal to the requested size.

However, swiotlb_alloc() passes zero for the 'alloc_align_mask'
parameter of swiotlb_find_slots() and so this property is not upheld.
Instead, allocations larger than a page are aligned only to PAGE_SIZE.

Calculate the mask corresponding to the page order suitable for holding
the allocation and pass that to swiotlb_find_slots().
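
As a rough worked example (assuming 4KiB pages; the request size is made
up), the mask computation looks like this in a standalone sketch:

#include <stdio.h>

#define PAGE_SHIFT      12                      /* assume 4KiB pages */
#define PAGE_SIZE       (1UL << PAGE_SHIFT)

/* Simplified get_order(): smallest order whose size covers 'size'. */
static unsigned int order_for_size(unsigned long size)
{
        unsigned int order = 0;

        while ((PAGE_SIZE << order) < size)
                order++;
        return order;
}

int main(void)
{
        unsigned long size = 20 * 1024;         /* hypothetical 20KiB request */
        unsigned long align = (1UL << (order_for_size(size) + PAGE_SHIFT)) - 1;

        /* 20KiB rounds up to order 3 (32KiB), so the mask is 0x7fff. */
        printf("alloc_align_mask = 0x%lx\n", align);
        return 0;
}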

Cc: Christoph Hellwig <[email protected]>
Cc: Marek Szyprowski <[email protected]>
Cc: Robin Murphy <[email protected]>
Cc: Dexuan Cui <[email protected]>
Fixes: e81e99bacc9f ("swiotlb: Support aligned swiotlb buffers")
Reviewed-by: Petr Tesarik <[email protected]>
Tested-by: Nicolin Chen <[email protected]>
Reviewed-by: Michael Kelley <[email protected]>
Signed-off-by: Will Deacon <[email protected]>
---
kernel/dma/swiotlb.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index ab7fbb40bc55..c20324fba814 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -1633,12 +1633,14 @@ struct page *swiotlb_alloc(struct device *dev, size_t size)
struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
struct io_tlb_pool *pool;
phys_addr_t tlb_addr;
+ unsigned int align;
int index;

if (!mem)
return NULL;

- index = swiotlb_find_slots(dev, 0, size, 0, &pool);
+ align = (1 << (get_order(size) + PAGE_SHIFT)) - 1;
+ index = swiotlb_find_slots(dev, 0, size, align, &pool);
if (index == -1)
return NULL;

--
2.44.0.278.ge034bb2e1d-goog


2024-03-08 15:29:32

by Will Deacon

Subject: [PATCH v6 4/6] swiotlb: Fix alignment checks when both allocation and DMA masks are present

Nicolin reports that swiotlb buffer allocations fail for an NVME device
behind an IOMMU using 64KiB pages. This is because we end up with a
minimum allocation alignment of 64KiB (for the IOMMU to map the buffer
safely) but a minimum DMA alignment mask corresponding to a 4KiB NVME
page (i.e. preserving the 4KiB page offset from the original allocation).
If the original address is not 4KiB-aligned, the allocation will fail
because swiotlb_search_pool_area() erroneously compares these unmasked
bits with the 64KiB-aligned candidate allocation.

Tweak swiotlb_search_pool_area() so that the DMA alignment mask is
reduced based on the required alignment of the allocation.
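
To illustrate with the numbers from Nicolin's report (64KiB IOMMU granule,
4KiB NVMe pages; a standalone sketch, not the kernel code):

#include <stdio.h>

#define IO_TLB_SIZE     0x800UL                 /* 2KiB swiotlb slots */

int main(void)
{
        unsigned long alloc_align_mask = 0xffff; /* 64KiB IOMMU alignment */
        unsigned long iotlb_align_mask = 0xfff;  /* NVMe min_align_mask */

        /* Make the allocation at least slot-aligned... */
        alloc_align_mask |= IO_TLB_SIZE - 1;
        /* ...and drop bits already covered by the allocation alignment. */
        iotlb_align_mask &= ~alloc_align_mask;

        /*
         * iotlb_align_mask is now 0, so an orig_addr that is not 4KiB
         * aligned no longer disqualifies a 64KiB-aligned candidate.
         */
        printf("alloc=0x%lx iotlb=0x%lx\n", alloc_align_mask, iotlb_align_mask);
        return 0;
}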

Fixes: 82612d66d51d ("iommu: Allow the dma-iommu api to use bounce buffers")
Reported-by: Nicolin Chen <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Tested-by: Nicolin Chen <[email protected]>
Reviewed-by: Michael Kelley <[email protected]>
Signed-off-by: Will Deacon <[email protected]>
---
kernel/dma/swiotlb.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index c20324fba814..c381a7ed718f 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -981,8 +981,7 @@ static int swiotlb_search_pool_area(struct device *dev, struct io_tlb_pool *pool
dma_addr_t tbl_dma_addr =
phys_to_dma_unencrypted(dev, pool->start) & boundary_mask;
unsigned long max_slots = get_max_slots(boundary_mask);
- unsigned int iotlb_align_mask =
- dma_get_min_align_mask(dev) & ~(IO_TLB_SIZE - 1);
+ unsigned int iotlb_align_mask = dma_get_min_align_mask(dev);
unsigned int nslots = nr_slots(alloc_size), stride;
unsigned int offset = swiotlb_align_offset(dev, orig_addr);
unsigned int index, slots_checked, count = 0, i;
@@ -993,6 +992,14 @@ static int swiotlb_search_pool_area(struct device *dev, struct io_tlb_pool *pool
BUG_ON(!nslots);
BUG_ON(area_index >= pool->nareas);

+ /*
+ * Ensure that the allocation is at least slot-aligned and update
+ * 'iotlb_align_mask' to ignore bits that will be preserved when
+ * offsetting into the allocation.
+ */
+ alloc_align_mask |= (IO_TLB_SIZE - 1);
+ iotlb_align_mask &= ~alloc_align_mask;
+
/*
* For mappings with an alignment requirement don't bother looping to
* unaligned slots once we found an aligned one.
--
2.44.0.278.ge034bb2e1d-goog


2024-03-08 15:29:45

by Will Deacon

Subject: [PATCH v6 5/6] iommu/dma: Force swiotlb_max_mapping_size on an untrusted device

From: Nicolin Chen <[email protected]>

The swiotlb does not support a mapping size > swiotlb_max_mapping_size().
On the other hand, with a 64KB PAGE_SIZE configuration, an NVMe device is
observed to map sizes between 300KB and 512KB, which certainly fail the
swiotlb mappings, even though the default swiotlb pool has many slots:
systemd[1]: Started Journal Service.
=> nvme 0000:00:01.0: swiotlb buffer is full (sz: 327680 bytes), total 32768 (slots), used 32 (slots)
note: journal-offline[392] exited with irqs disabled
note: journal-offline[392] exited with preempt_count 1

Call trace:
[ 3.099918] swiotlb_tbl_map_single+0x214/0x240
[ 3.099921] iommu_dma_map_page+0x218/0x328
[ 3.099928] dma_map_page_attrs+0x2e8/0x3a0
[ 3.101985] nvme_prep_rq.part.0+0x408/0x878 [nvme]
[ 3.102308] nvme_queue_rqs+0xc0/0x300 [nvme]
[ 3.102313] blk_mq_flush_plug_list.part.0+0x57c/0x600
[ 3.102321] blk_add_rq_to_plug+0x180/0x2a0
[ 3.102323] blk_mq_submit_bio+0x4c8/0x6b8
[ 3.103463] __submit_bio+0x44/0x220
[ 3.103468] submit_bio_noacct_nocheck+0x2b8/0x360
[ 3.103470] submit_bio_noacct+0x180/0x6c8
[ 3.103471] submit_bio+0x34/0x130
[ 3.103473] ext4_bio_write_folio+0x5a4/0x8c8
[ 3.104766] mpage_submit_folio+0xa0/0x100
[ 3.104769] mpage_map_and_submit_buffers+0x1a4/0x400
[ 3.104771] ext4_do_writepages+0x6a0/0xd78
[ 3.105615] ext4_writepages+0x80/0x118
[ 3.105616] do_writepages+0x90/0x1e8
[ 3.105619] filemap_fdatawrite_wbc+0x94/0xe0
[ 3.105622] __filemap_fdatawrite_range+0x68/0xb8
[ 3.106656] file_write_and_wait_range+0x84/0x120
[ 3.106658] ext4_sync_file+0x7c/0x4c0
[ 3.106660] vfs_fsync_range+0x3c/0xa8
[ 3.106663] do_fsync+0x44/0xc0

Since untrusted devices might go down the swiotlb pathway with dma-iommu,
these devices should not map a size larger than swiotlb_max_mapping_size().

To fix this bug, add iommu_dma_max_mapping_size() for untrusted devices so
that swiotlb_max_mapping_size() is taken into account alongside the
iova_rcache_range() value returned by iommu_dma_opt_mapping_size().

Fixes: 82612d66d51d ("iommu: Allow the dma-iommu api to use bounce buffers")
Signed-off-by: Nicolin Chen <[email protected]>
Link: https://lore.kernel.org/r/ee51a3a5c32cf885b18f6416171802669f4a718a.1707851466.git.nicolinc@nvidia.com
Acked-by: Robin Murphy <[email protected]>
Reviewed-by: Michael Kelley <[email protected]>
[will: Drop redundant is_swiotlb_active(dev) check]
Signed-off-by: Will Deacon <[email protected]>
---
drivers/iommu/dma-iommu.c | 9 +++++++++
1 file changed, 9 insertions(+)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 50ccc4f1ef81..639efa0c4072 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -1706,6 +1706,14 @@ static size_t iommu_dma_opt_mapping_size(void)
return iova_rcache_range();
}

+static size_t iommu_dma_max_mapping_size(struct device *dev)
+{
+ if (dev_is_untrusted(dev))
+ return swiotlb_max_mapping_size(dev);
+
+ return SIZE_MAX;
+}
+
static const struct dma_map_ops iommu_dma_ops = {
.flags = DMA_F_PCI_P2PDMA_SUPPORTED,
.alloc = iommu_dma_alloc,
@@ -1728,6 +1736,7 @@ static const struct dma_map_ops iommu_dma_ops = {
.unmap_resource = iommu_dma_unmap_resource,
.get_merge_boundary = iommu_dma_get_merge_boundary,
.opt_mapping_size = iommu_dma_opt_mapping_size,
+ .max_mapping_size = iommu_dma_max_mapping_size,
};

/*
--
2.44.0.278.ge034bb2e1d-goog


2024-03-08 15:29:56

by Will Deacon

Subject: [PATCH v6 6/6] swiotlb: Reinstate page-alignment for mappings >= PAGE_SIZE

For swiotlb allocations >= PAGE_SIZE, the slab search historically
adjusted the stride to avoid checking unaligned slots. This had the
side-effect of aligning large mapping requests to PAGE_SIZE, but that
was broken by 0eee5ae10256 ("swiotlb: fix slot alignment checks").

Since this alignment could be relied upon by drivers, reinstate PAGE_SIZE
alignment for swiotlb mappings >= PAGE_SIZE.

Reported-by: Michael Kelley <[email protected]>
Signed-off-by: Will Deacon <[email protected]>
---
kernel/dma/swiotlb.c | 18 +++++++++++-------
1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index c381a7ed718f..c5851034523f 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -992,6 +992,17 @@ static int swiotlb_search_pool_area(struct device *dev, struct io_tlb_pool *pool
BUG_ON(!nslots);
BUG_ON(area_index >= pool->nareas);

+ /*
+ * Historically, swiotlb allocations >= PAGE_SIZE were guaranteed to be
+ * page-aligned in the absence of any other alignment requirements.
+ * 'alloc_align_mask' was later introduced to specify the alignment
+ * explicitly, however this is passed as zero for streaming mappings
+ * and so we preserve the old behaviour there in case any drivers are
+ * relying on it.
+ */
+ if (!alloc_align_mask && !iotlb_align_mask && alloc_size >= PAGE_SIZE)
+ alloc_align_mask = PAGE_SIZE - 1;
+
/*
* Ensure that the allocation is at least slot-aligned and update
* 'iotlb_align_mask' to ignore bits that will be preserved when
@@ -1006,13 +1017,6 @@ static int swiotlb_search_pool_area(struct device *dev, struct io_tlb_pool *pool
*/
stride = get_max_slots(max(alloc_align_mask, iotlb_align_mask));

- /*
- * For allocations of PAGE_SIZE or larger only look for page aligned
- * allocations.
- */
- if (alloc_size >= PAGE_SIZE)
- stride = umax(stride, PAGE_SHIFT - IO_TLB_SHIFT + 1);
-
spin_lock_irqsave(&area->lock, flags);
if (unlikely(nslots > pool->area_nslabs - area->used))
goto not_found;
--
2.44.0.278.ge034bb2e1d-goog


2024-03-08 16:08:15

by Robin Murphy

Subject: Re: [PATCH v6 6/6] swiotlb: Reinstate page-alignment for mappings >= PAGE_SIZE

On 2024-03-08 3:28 pm, Will Deacon wrote:
> For swiotlb allocations >= PAGE_SIZE, the slab search historically
> adjusted the stride to avoid checking unaligned slots. This had the
> side-effect of aligning large mapping requests to PAGE_SIZE, but that
> was broken by 0eee5ae10256 ("swiotlb: fix slot alignment checks").
>
> Since this alignment could be relied upon by drivers, reinstate PAGE_SIZE
> alignment for swiotlb mappings >= PAGE_SIZE.

This seems clear enough to keep me happy now, thanks! And apologies that
I managed to confuse even myself in the previous thread...

Reviewed-by: Robin Murphy <[email protected]>

> Reported-by: Michael Kelley <[email protected]>
> Signed-off-by: Will Deacon <[email protected]>
> ---
> kernel/dma/swiotlb.c | 18 +++++++++++-------
> 1 file changed, 11 insertions(+), 7 deletions(-)
>
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index c381a7ed718f..c5851034523f 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -992,6 +992,17 @@ static int swiotlb_search_pool_area(struct device *dev, struct io_tlb_pool *pool
> BUG_ON(!nslots);
> BUG_ON(area_index >= pool->nareas);
>
> + /*
> + * Historically, swiotlb allocations >= PAGE_SIZE were guaranteed to be
> + * page-aligned in the absence of any other alignment requirements.
> + * 'alloc_align_mask' was later introduced to specify the alignment
> + * explicitly, however this is passed as zero for streaming mappings
> + * and so we preserve the old behaviour there in case any drivers are
> + * relying on it.
> + */
> + if (!alloc_align_mask && !iotlb_align_mask && alloc_size >= PAGE_SIZE)
> + alloc_align_mask = PAGE_SIZE - 1;
> +
> /*
> * Ensure that the allocation is at least slot-aligned and update
> * 'iotlb_align_mask' to ignore bits that will be preserved when
> @@ -1006,13 +1017,6 @@ static int swiotlb_search_pool_area(struct device *dev, struct io_tlb_pool *pool
> */
> stride = get_max_slots(max(alloc_align_mask, iotlb_align_mask));
>
> - /*
> - * For allocations of PAGE_SIZE or larger only look for page aligned
> - * allocations.
> - */
> - if (alloc_size >= PAGE_SIZE)
> - stride = umax(stride, PAGE_SHIFT - IO_TLB_SHIFT + 1);
> -
> spin_lock_irqsave(&area->lock, flags);
> if (unlikely(nslots > pool->area_nslabs - area->used))
> goto not_found;

2024-03-08 16:47:48

by Petr Tesařík

Subject: Re: [PATCH v6 6/6] swiotlb: Reinstate page-alignment for mappings >= PAGE_SIZE

On Fri, 8 Mar 2024 16:08:01 +0000
Robin Murphy <[email protected]> wrote:

> On 2024-03-08 3:28 pm, Will Deacon wrote:
> > For swiotlb allocations >= PAGE_SIZE, the slab search historically
> > adjusted the stride to avoid checking unaligned slots. This had the
> > side-effect of aligning large mapping requests to PAGE_SIZE, but that
> > was broken by 0eee5ae10256 ("swiotlb: fix slot alignment checks").
> >
> > Since this alignment could be relied upon by drivers, reinstate PAGE_SIZE
> > alignment for swiotlb mappings >= PAGE_SIZE.
>
> This seems clear enough to keep me happy now, thanks! And apologies that
> I managed to confuse even myself in the previous thread...
>
> Reviewed-by: Robin Murphy <[email protected]>

I thought we agreed that this stricter alignment is unnecessary:

https://lore.kernel.org/linux-iommu/[email protected]/

But if everybody else wants to have it...

Petr T

> > Reported-by: Michael Kelley <[email protected]>
> > Signed-off-by: Will Deacon <[email protected]>
> > ---
> > kernel/dma/swiotlb.c | 18 +++++++++++-------
> > 1 file changed, 11 insertions(+), 7 deletions(-)
> >
> > diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> > index c381a7ed718f..c5851034523f 100644
> > --- a/kernel/dma/swiotlb.c
> > +++ b/kernel/dma/swiotlb.c
> > @@ -992,6 +992,17 @@ static int swiotlb_search_pool_area(struct device *dev, struct io_tlb_pool *pool
> > BUG_ON(!nslots);
> > BUG_ON(area_index >= pool->nareas);
> >
> > + /*
> > + * Historically, swiotlb allocations >= PAGE_SIZE were guaranteed to be
> > + * page-aligned in the absence of any other alignment requirements.
> > + * 'alloc_align_mask' was later introduced to specify the alignment
> > + * explicitly, however this is passed as zero for streaming mappings
> > + * and so we preserve the old behaviour there in case any drivers are
> > + * relying on it.
> > + */
> > + if (!alloc_align_mask && !iotlb_align_mask && alloc_size >= PAGE_SIZE)
> > + alloc_align_mask = PAGE_SIZE - 1;
> > +
> > /*
> > * Ensure that the allocation is at least slot-aligned and update
> > * 'iotlb_align_mask' to ignore bits that will be preserved when
> > @@ -1006,13 +1017,6 @@ static int swiotlb_search_pool_area(struct device *dev, struct io_tlb_pool *pool
> > */
> > stride = get_max_slots(max(alloc_align_mask, iotlb_align_mask));
> >
> > - /*
> > - * For allocations of PAGE_SIZE or larger only look for page aligned
> > - * allocations.
> > - */
> > - if (alloc_size >= PAGE_SIZE)
> > - stride = umax(stride, PAGE_SHIFT - IO_TLB_SHIFT + 1);
> > -
> > spin_lock_irqsave(&area->lock, flags);
> > if (unlikely(nslots > pool->area_nslabs - area->used))
> > goto not_found;
>


2024-03-08 16:51:00

by Christoph Hellwig

Subject: Re: [PATCH v6 6/6] swiotlb: Reinstate page-alignment for mappings >= PAGE_SIZE

On Fri, Mar 08, 2024 at 05:38:16PM +0100, Petr Tesařík wrote:
> I thought we agreed that this stricter alignment is unnecessary:
>
> https://lore.kernel.org/linux-iommu/[email protected]/
>
> But if everybody else wants to have it...

Let's get the fixes in with the existing documented alignment, and
then lift it in the next merge window with enough soaking time in
linux-next.

2024-03-08 17:18:05

by Robin Murphy

Subject: Re: [PATCH v6 6/6] swiotlb: Reinstate page-alignment for mappings >= PAGE_SIZE

On 2024-03-08 4:38 pm, Petr Tesařík wrote:
> On Fri, 8 Mar 2024 16:08:01 +0000
> Robin Murphy <[email protected]> wrote:
>
>> On 2024-03-08 3:28 pm, Will Deacon wrote:
>>> For swiotlb allocations >= PAGE_SIZE, the slab search historically
>>> adjusted the stride to avoid checking unaligned slots. This had the
>>> side-effect of aligning large mapping requests to PAGE_SIZE, but that
>>> was broken by 0eee5ae10256 ("swiotlb: fix slot alignment checks").
>>>
>>> Since this alignment could be relied upon by drivers, reinstate PAGE_SIZE
>>> alignment for swiotlb mappings >= PAGE_SIZE.
>>
>> This seems clear enough to keep me happy now, thanks! And apologies that
>> I managed to confuse even myself in the previous thread...
>>
>> Reviewed-by: Robin Murphy <[email protected]>
>
> I thought we agreed that this stricter alignment is unnecessary:
>
> https://lore.kernel.org/linux-iommu/[email protected]/

No, that was about dma_alloc_coherent() again (and TBH I'm not sure we
should actually relax it anyway, since there definitely are callers who
rely on size-alignment beyond PAGE_SIZE, however they're typically going
to be using the common implementations which end up in alloc_pages() or
CMA and so do offer that, rather than the oddball ones which don't -
e.g. we're never going to be allocating SMMUv3 Stream Tables out of some
restricted pool via the emergency swiotlb_alloc() path). If anywhere,
the place to argue that point would be patch #3 (which as mentioned I'd
managed to forget about before...)

This one's just about preserving a SWIOTLB-specific behaviour which has
the practical effect of making SWIOTLB a bit less visible to dma_map_*()
callers. The impact of keeping this is fairly low, so seems preferable
to the risk of facing issues 2 or 3 years down the line when someone
finally upgrades their distro and their data gets eaten because it turns
out some obscure driver should really have been updated to use
min_align_mask.

Thanks,
Robin.

> But if everybody else wants to have it...
>
> Petr T
>
>>> Reported-by: Michael Kelley <[email protected]>
>>> Signed-off-by: Will Deacon <[email protected]>
>>> ---
>>> kernel/dma/swiotlb.c | 18 +++++++++++-------
>>> 1 file changed, 11 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
>>> index c381a7ed718f..c5851034523f 100644
>>> --- a/kernel/dma/swiotlb.c
>>> +++ b/kernel/dma/swiotlb.c
>>> @@ -992,6 +992,17 @@ static int swiotlb_search_pool_area(struct device *dev, struct io_tlb_pool *pool
>>> BUG_ON(!nslots);
>>> BUG_ON(area_index >= pool->nareas);
>>>
>>> + /*
>>> + * Historically, swiotlb allocations >= PAGE_SIZE were guaranteed to be
>>> + * page-aligned in the absence of any other alignment requirements.
>>> + * 'alloc_align_mask' was later introduced to specify the alignment
>>> + * explicitly, however this is passed as zero for streaming mappings
>>> + * and so we preserve the old behaviour there in case any drivers are
>>> + * relying on it.
>>> + */
>>> + if (!alloc_align_mask && !iotlb_align_mask && alloc_size >= PAGE_SIZE)
>>> + alloc_align_mask = PAGE_SIZE - 1;
>>> +
>>> /*
>>> * Ensure that the allocation is at least slot-aligned and update
>>> * 'iotlb_align_mask' to ignore bits that will be preserved when
>>> @@ -1006,13 +1017,6 @@ static int swiotlb_search_pool_area(struct device *dev, struct io_tlb_pool *pool
>>> */
>>> stride = get_max_slots(max(alloc_align_mask, iotlb_align_mask));
>>>
>>> - /*
>>> - * For allocations of PAGE_SIZE or larger only look for page aligned
>>> - * allocations.
>>> - */
>>> - if (alloc_size >= PAGE_SIZE)
>>> - stride = umax(stride, PAGE_SHIFT - IO_TLB_SHIFT + 1);
>>> -
>>> spin_lock_irqsave(&area->lock, flags);
>>> if (unlikely(nslots > pool->area_nslabs - area->used))
>>> goto not_found;
>>
>

2024-03-08 17:27:18

by Petr Tesařík

Subject: Re: [PATCH v6 6/6] swiotlb: Reinstate page-alignment for mappings >= PAGE_SIZE

On Fri, 8 Mar 2024 17:50:25 +0100
Christoph Hellwig <[email protected]> wrote:

> On Fri, Mar 08, 2024 at 05:38:16PM +0100, Petr Tesařík wrote:
> > I thought we agreed that this stricter alignment is unnecessary:
> >
> > https://lore.kernel.org/linux-iommu/[email protected]/
> >
> > But if everybody else wants to have it...
>
> Let's get the fixes in with the existing documented alignment, and
> then lift it in the next merge window with enough soaking time in
> linux-next.

I agree. It makes sense.

Petr T

2024-03-08 17:30:13

by Petr Tesařík

Subject: Re: [PATCH v6 6/6] swiotlb: Reinstate page-alignment for mappings >= PAGE_SIZE

On Fri, 8 Mar 2024 15:28:29 +0000
Will Deacon <[email protected]> wrote:

> For swiotlb allocations >= PAGE_SIZE, the slab search historically
> adjusted the stride to avoid checking unaligned slots. This had the
> side-effect of aligning large mapping requests to PAGE_SIZE, but that
> was broken by 0eee5ae10256 ("swiotlb: fix slot alignment checks").
>
> Since this alignment could be relied upon by drivers, reinstate PAGE_SIZE
> alignment for swiotlb mappings >= PAGE_SIZE.
>
> Reported-by: Michael Kelley <[email protected]>
> Signed-off-by: Will Deacon <[email protected]>
>
> ---
> kernel/dma/swiotlb.c | 18 +++++++++++-------
> 1 file changed, 11 insertions(+), 7 deletions(-)
>
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index c381a7ed718f..c5851034523f 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -992,6 +992,17 @@ static int swiotlb_search_pool_area(struct device *dev, struct io_tlb_pool *pool
> BUG_ON(!nslots);
> BUG_ON(area_index >= pool->nareas);
>
> + /*
> + * Historically, swiotlb allocations >= PAGE_SIZE were guaranteed to be
> + * page-aligned in the absence of any other alignment requirements.
> + * 'alloc_align_mask' was later introduced to specify the alignment
> + * explicitly, however this is passed as zero for streaming mappings
> + * and so we preserve the old behaviour there in case any drivers are
> + * relying on it.
> + */
> + if (!alloc_align_mask && !iotlb_align_mask && alloc_size >= PAGE_SIZE)
> + alloc_align_mask = PAGE_SIZE - 1;
> +

This could be moved up the call chain to swiotlb_tbl_map_single(), but
since there is already a plan to modify the call signatures for the
next release, let's keep it here.

Reviewed-by: Petr Tesarik <[email protected]>

Petr T

> /*
> * Ensure that the allocation is at least slot-aligned and update
> * 'iotlb_align_mask' to ignore bits that will be preserved when
> @@ -1006,13 +1017,6 @@ static int swiotlb_search_pool_area(struct device *dev, struct io_tlb_pool *pool
> */
> stride = get_max_slots(max(alloc_align_mask, iotlb_align_mask));
>
> - /*
> - * For allocations of PAGE_SIZE or larger only look for page aligned
> - * allocations.
> - */
> - if (alloc_size >= PAGE_SIZE)
> - stride = umax(stride, PAGE_SHIFT - IO_TLB_SHIFT + 1);
> -
> spin_lock_irqsave(&area->lock, flags);
> if (unlikely(nslots > pool->area_nslabs - area->used))
> goto not_found;


2024-03-08 18:08:18

by Petr Tesařík

Subject: Re: [PATCH v6 6/6] swiotlb: Reinstate page-alignment for mappings >= PAGE_SIZE

On Fri, 8 Mar 2024 17:17:51 +0000
Robin Murphy <[email protected]> wrote:

> On 2024-03-08 4:38 pm, Petr Tesařík wrote:
> > On Fri, 8 Mar 2024 16:08:01 +0000
> > Robin Murphy <[email protected]> wrote:
> >
> >> On 2024-03-08 3:28 pm, Will Deacon wrote:
> >>> For swiotlb allocations >= PAGE_SIZE, the slab search historically
> >>> adjusted the stride to avoid checking unaligned slots. This had the
> >>> side-effect of aligning large mapping requests to PAGE_SIZE, but that
> >>> was broken by 0eee5ae10256 ("swiotlb: fix slot alignment checks").
> >>>
> >>> Since this alignment could be relied upon by drivers, reinstate PAGE_SIZE
> >>> alignment for swiotlb mappings >= PAGE_SIZE.
> >>
> >> This seems clear enough to keep me happy now, thanks! And apologies that
> >> I managed to confuse even myself in the previous thread...
> >>
> >> Reviewed-by: Robin Murphy <[email protected]>
> >
> > I thought we agreed that this stricter alignment is unnecessary:
> >
> > https://lore.kernel.org/linux-iommu/[email protected]/
>
> No, that was about dma_alloc_coherent() again (and TBH I'm not sure we
> should actually relax it anyway, since there definitely are callers who
> rely on size-alignment beyond PAGE_SIZE, however they're typically going
> to be using the common implementations which end up in alloc_pages() or
> CMA and so do offer that, rather than the oddball ones which don't -
> e.g. we're never going to be allocating SMMUv3 Stream Tables out of some
> restricted pool via the emergency swiotlb_alloc() path). If anywhere,
> the place to argue that point would be patch #3 (which as mentioned I'd
> managed to forget about before...)

Sure, swiotlb_alloc() ensures that alloc_align_mask is non-zero, so the
condition in this patch cannot be met. In fact, that's why it could be
moved up to swiotlb_tbl_map_single().

> This one's just about preserving a SWIOTLB-specific behaviour which has
> the practical effect of making SWIOTLB a bit less visible to dma_map_*()
> callers. The impact of keeping this is fairly low, so seems preferable
> to the risk of facing issues 2 or 3 years down the line when someone
> finally upgrades their distro and their data gets eaten because it turns
> out some obscure driver should really have been updated to use
> min_align_mask.

The impact is indeed negligible with 4K pages. It may put a bit of
stress on the SWIOTLB with 64K pages, but if the mapping size somehow
correlates with page size, such drivers would need a larger SWIOTLB
anyway.

I had some doubts whether there are any guarantees at all for dma_map_*().
Now I see that the email I linked dealt with dma_alloc_coherent().

OK, let's not open the other discussion now.

Petr T

> Thanks,
> Robin.
>
> > But if everybody else wants to have it...
> >
> > Petr T
> >
> >>> Reported-by: Michael Kelley <[email protected]>
> >>> Signed-off-by: Will Deacon <[email protected]>
> >>> ---
> >>> kernel/dma/swiotlb.c | 18 +++++++++++-------
> >>> 1 file changed, 11 insertions(+), 7 deletions(-)
> >>>
> >>> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> >>> index c381a7ed718f..c5851034523f 100644
> >>> --- a/kernel/dma/swiotlb.c
> >>> +++ b/kernel/dma/swiotlb.c
> >>> @@ -992,6 +992,17 @@ static int swiotlb_search_pool_area(struct device *dev, struct io_tlb_pool *pool
> >>> BUG_ON(!nslots);
> >>> BUG_ON(area_index >= pool->nareas);
> >>>
> >>> + /*
> >>> + * Historically, swiotlb allocations >= PAGE_SIZE were guaranteed to be
> >>> + * page-aligned in the absence of any other alignment requirements.
> >>> + * 'alloc_align_mask' was later introduced to specify the alignment
> >>> + * explicitly, however this is passed as zero for streaming mappings
> >>> + * and so we preserve the old behaviour there in case any drivers are
> >>> + * relying on it.
> >>> + */
> >>> + if (!alloc_align_mask && !iotlb_align_mask && alloc_size >= PAGE_SIZE)
> >>> + alloc_align_mask = PAGE_SIZE - 1;
> >>> +
> >>> /*
> >>> * Ensure that the allocation is at least slot-aligned and update
> >>> * 'iotlb_align_mask' to ignore bits that will be preserved when
> >>> @@ -1006,13 +1017,6 @@ static int swiotlb_search_pool_area(struct device *dev, struct io_tlb_pool *pool
> >>> */
> >>> stride = get_max_slots(max(alloc_align_mask, iotlb_align_mask));
> >>>
> >>> - /*
> >>> - * For allocations of PAGE_SIZE or larger only look for page aligned
> >>> - * allocations.
> >>> - */
> >>> - if (alloc_size >= PAGE_SIZE)
> >>> - stride = umax(stride, PAGE_SHIFT - IO_TLB_SHIFT + 1);
> >>> -
> >>> spin_lock_irqsave(&area->lock, flags);
> >>> if (unlikely(nslots > pool->area_nslabs - area->used))
> >>> goto not_found;
> >>
> >
>


2024-03-08 19:02:43

by Michael Kelley

Subject: RE: [PATCH v6 6/6] swiotlb: Reinstate page-alignment for mappings >= PAGE_SIZE

From: Will Deacon <[email protected]> Sent: Friday, March 8, 2024 7:28 AM
>
> For swiotlb allocations >= PAGE_SIZE, the slab search historically
> adjusted the stride to avoid checking unaligned slots. This had the
> side-effect of aligning large mapping requests to PAGE_SIZE, but that
> was broken by 0eee5ae10256 ("swiotlb: fix slot alignment checks").
>
> Since this alignment could be relied upon by drivers, reinstate PAGE_SIZE
> alignment for swiotlb mappings >= PAGE_SIZE.
>
> Reported-by: Michael Kelley <[email protected]>
> Signed-off-by: Will Deacon <[email protected]>

Reviewed-by: Michael Kelley <[email protected]>

> ---
> kernel/dma/swiotlb.c | 18 +++++++++++-------
> 1 file changed, 11 insertions(+), 7 deletions(-)
>
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index c381a7ed718f..c5851034523f 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -992,6 +992,17 @@ static int swiotlb_search_pool_area(struct device *dev, struct io_tlb_pool *pool
> BUG_ON(!nslots);
> BUG_ON(area_index >= pool->nareas);
>
> + /*
> + * Historically, swiotlb allocations >= PAGE_SIZE were guaranteed to be
> + * page-aligned in the absence of any other alignment requirements.
> + * 'alloc_align_mask' was later introduced to specify the alignment
> + * explicitly, however this is passed as zero for streaming mappings
> + * and so we preserve the old behaviour there in case any drivers are
> + * relying on it.
> + */
> + if (!alloc_align_mask && !iotlb_align_mask && alloc_size >= PAGE_SIZE)
> + alloc_align_mask = PAGE_SIZE - 1;
> +
> /*
> * Ensure that the allocation is at least slot-aligned and update
> * 'iotlb_align_mask' to ignore bits that will be preserved when
> @@ -1006,13 +1017,6 @@ static int swiotlb_search_pool_area(struct device *dev, struct io_tlb_pool *pool
> */
> stride = get_max_slots(max(alloc_align_mask, iotlb_align_mask));
>
> - /*
> - * For allocations of PAGE_SIZE or larger only look for page aligned
> - * allocations.
> - */
> - if (alloc_size >= PAGE_SIZE)
> - stride = umax(stride, PAGE_SHIFT - IO_TLB_SHIFT + 1);
> -
> spin_lock_irqsave(&area->lock, flags);
> if (unlikely(nslots > pool->area_nslabs - area->used))
> goto not_found;
> --
> 2.44.0.278.ge034bb2e1d-goog


2024-03-08 19:21:36

by Michael Kelley

Subject: RE: [PATCH v6 0/6] Fix double allocation in swiotlb_alloc()

From: Will Deacon <[email protected]> Sent: Friday, March 8, 2024 7:28 AM
>

[snip]

>
> Changes since v5 include:
>
> - Rework the final patch to preserve page-alignment for streaming
> requests without a DMA alignment mask
>
> - Added Reviewed-by tags from Michael
>
> Cheers,
>
> Will
>

I've tested the full v6 of this series on an ARM64 VM in the Azure
public cloud with 64K page size, and swiotlb=force on the kernel
boot line. The path exercised is the DMA Direct path on a synthetic
SCSI disk device with min_align_mask = 0xFFF. I have a standalone
user-space test program using writev() to generate direct disk
I/Os with a variety of memory buffer alignments and physically
contiguous extents. It's designed to test the paths through the
iovec level, scatter/gather lists, and the driver DMA descriptors
to make sure all is working correctly.

Prior to this patch series, some tests were failing with 64K page
size when swiotlb=force, but they are all working now.

Tested-by: Michael Kelley <[email protected]>

2024-03-11 13:36:29

by Christoph Hellwig

Subject: Re: [PATCH v6 0/6] Fix double allocation in swiotlb_alloc()


Thanks, I've applied this to the dma-mapping tree and plan to send it
onto Linus after a bit of soaking time.

2024-03-11 20:05:21

by Petr Tesařík

Subject: Re: [PATCH v6 4/6] swiotlb: Fix alignment checks when both allocation and DMA masks are present

On Fri, 8 Mar 2024 15:28:27 +0000
Will Deacon <[email protected]> wrote:

> Nicolin reports that swiotlb buffer allocations fail for an NVME device
> behind an IOMMU using 64KiB pages. This is because we end up with a
> minimum allocation alignment of 64KiB (for the IOMMU to map the buffer
> safely) but a minimum DMA alignment mask corresponding to a 4KiB NVME
> page (i.e. preserving the 4KiB page offset from the original allocation).
> If the original address is not 4KiB-aligned, the allocation will fail
> because swiotlb_search_pool_area() erroneously compares these unmasked
> bits with the 64KiB-aligned candidate allocation.
>
> Tweak swiotlb_search_pool_area() so that the DMA alignment mask is
> reduced based on the required alignment of the allocation.
>
> Fixes: 82612d66d51d ("iommu: Allow the dma-iommu api to use bounce buffers")
> Reported-by: Nicolin Chen <[email protected]>
> Link: https://lore.kernel.org/r/[email protected]
> Tested-by: Nicolin Chen <[email protected]>
> Reviewed-by: Michael Kelley <[email protected]>
> Signed-off-by: Will Deacon <[email protected]>
> ---
> kernel/dma/swiotlb.c | 11 +++++++++--
> 1 file changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index c20324fba814..c381a7ed718f 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -981,8 +981,7 @@ static int swiotlb_search_pool_area(struct device *dev, struct io_tlb_pool *pool
> dma_addr_t tbl_dma_addr =
> phys_to_dma_unencrypted(dev, pool->start) & boundary_mask;
> unsigned long max_slots = get_max_slots(boundary_mask);
> - unsigned int iotlb_align_mask =
> - dma_get_min_align_mask(dev) & ~(IO_TLB_SIZE - 1);
> + unsigned int iotlb_align_mask = dma_get_min_align_mask(dev);
> unsigned int nslots = nr_slots(alloc_size), stride;
> unsigned int offset = swiotlb_align_offset(dev, orig_addr);
> unsigned int index, slots_checked, count = 0, i;
> @@ -993,6 +992,14 @@ static int swiotlb_search_pool_area(struct device *dev, struct io_tlb_pool *pool
> BUG_ON(!nslots);
> BUG_ON(area_index >= pool->nareas);
>
> + /*
> + * Ensure that the allocation is at least slot-aligned and update
> + * 'iotlb_align_mask' to ignore bits that will be preserved when
> + * offsetting into the allocation.
> + */
> + alloc_align_mask |= (IO_TLB_SIZE - 1);
> + iotlb_align_mask &= ~alloc_align_mask;
> +

I have started writing the KUnit test suite, and the results look
incorrect to me for this case.

I'm calling swiotlb_tbl_map_single() with:

* alloc_align_mask = 0xfff
* a device with min_align_mask = 0xfff
* the 12 lowest bits of orig_addr are 0xfa0

The min_align_mask becomes zero after the masking added by this patch,
and the 12 lowest bits of the returned address are 0x7a0, i.e. not
equal to 0xfa0.

In other words, the min_align_mask constraint is not honored. Of course,
given the above values, it is not possible to honor both min_align_mask
and alloc_align_mask. I find it somewhat surprising that NVMe does not
in fact require that the NVME_CTRL_PAGE_SHIFT low bits are preserved,
as suggested by Nicolin's successful testing.

Why is that?

Does IOMMU do some additional post-processing of the bounce buffer
address to restore the value of bit 11?

Or is this bit always zero in all real-world scenarios?

Petr T

2024-03-11 21:36:22

by Michael Kelley

Subject: RE: [PATCH v6 4/6] swiotlb: Fix alignment checks when both allocation and DMA masks are present

From: Petr Tesařík <[email protected]>
> On Fri, 8 Mar 2024 15:28:27 +0000
> Will Deacon <[email protected]> wrote:

[snip]

> >
> > diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> > index c20324fba814..c381a7ed718f 100644
> > --- a/kernel/dma/swiotlb.c
> > +++ b/kernel/dma/swiotlb.c
> > @@ -981,8 +981,7 @@ static int swiotlb_search_pool_area(struct device *dev, struct io_tlb_pool *pool
> > dma_addr_t tbl_dma_addr =
> > phys_to_dma_unencrypted(dev, pool->start) & boundary_mask;
> > unsigned long max_slots = get_max_slots(boundary_mask);
> > - unsigned int iotlb_align_mask =
> > - dma_get_min_align_mask(dev) & ~(IO_TLB_SIZE - 1);
> > + unsigned int iotlb_align_mask = dma_get_min_align_mask(dev);
> > unsigned int nslots = nr_slots(alloc_size), stride;
> > unsigned int offset = swiotlb_align_offset(dev, orig_addr);
> > unsigned int index, slots_checked, count = 0, i;
> > @@ -993,6 +992,14 @@ static int swiotlb_search_pool_area(struct device *dev, struct io_tlb_pool *pool
> > BUG_ON(!nslots);
> > BUG_ON(area_index >= pool->nareas);
> >
> > + /*
> > + * Ensure that the allocation is at least slot-aligned and update
> > + * 'iotlb_align_mask' to ignore bits that will be preserved when
> > + * offsetting into the allocation.
> > + */
> > + alloc_align_mask |= (IO_TLB_SIZE - 1);
> > + iotlb_align_mask &= ~alloc_align_mask;
> > +
>
> I have started writing the KUnit test suite, and the results look
> incorrect to me for this case.
>
> I'm calling swiotlb_tbl_map_single() with:
>
> * alloc_align_mask = 0xfff
> * a device with min_align_mask = 0xfff
> * the 12 lowest bits of orig_addr are 0xfa0
>
> The min_align_mask becomes zero after the masking added by this patch,
> and the 12 lowest bits of the returned address are 0x7a0, i.e. not
> equal to 0xfa0.

The address returned by swiotlb_tbl_map_single() is the slot index
converted to an address, plus the offset modulo the min_align_mask for
the device. The local variable "offset" in swiotlb_tbl_map_single()
should be 0xfa0. The slot index should be an even number to meet
the alloc_align_mask requirement. And the pool->start address should
be at least page aligned, producing a page-aligned address *before* the
offset is added. Can you debug which of these isn't true for the case
you are seeing?

>
> In other words, the min_align_mask constraint is not honored. Of course,
> given the above values, it is not possible to honor both min_align_mask
> and alloc_align_mask.

When orig_addr is specified and min_align_mask is set, alloc_align_mask
governs the address of the _allocated_ space, which is not necessarily the
returned physical address. The min_align_mask may dictate some
pre-padding of size "offset" within the allocated space, and the returned
address is *after* that pre-padding. In this way, both can be honored.
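
A minimal standalone sketch of that reconstruction, with made-up numbers
(this is how the pieces are *supposed* to fit together, not the current
swiotlb code):

#include <stdio.h>

int main(void)
{
        unsigned long pool_start     = 0x98000000UL;  /* page-aligned pool */
        unsigned long io_tlb_size    = 0x800UL;       /* 2KiB slots */
        unsigned long min_align_mask = 0xfffUL;       /* device min_align_mask */
        unsigned long orig_addr      = 0x12345fa0UL;  /* low 12 bits are 0xfa0 */
        unsigned long slot_index     = 4;             /* even, so 4KiB-aligned */
        unsigned long offset         = orig_addr & min_align_mask;
        unsigned long tlb_addr;

        /*
         * The allocation starts at the aligned slot; the caller gets the
         * address after 'offset' bytes of pre-padding, so the low bits of
         * orig_addr are preserved.
         */
        tlb_addr = pool_start + slot_index * io_tlb_size + offset;
        printf("returned 0x%lx, low 12 bits 0x%lx\n",
               tlb_addr, tlb_addr & min_align_mask);
        return 0;
}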

> I find it somewhat surprising that NVMe does not
> in fact require that the NVME_CTRL_PAGE_SHIFT low bits are preserved,
> as suggested by Nicolin's successful testing.
>
> Why is that?

I saw only one stack trace from Nicolin, and it was file system buffer
flushing code that initiated the I/O. In such cases, it's very likely that the
original address is at least 4K aligned. Hence the offset is zero and
the low bits will typically be correct.

>
> Does IOMMU do some additional post-processing of the bounce buffer
> address to restore the value of bit 11?

Not that I can see.

>
> Or is this bit always zero in all real-world scenarios?

For file system initiated I/Os, probably yes. But for raw disk I/Os
initiated from user space, not necessarily.

Michael

2024-03-11 22:49:31

by Will Deacon

Subject: Re: [PATCH v6 4/6] swiotlb: Fix alignment checks when both allocation and DMA masks are present

On Mon, Mar 11, 2024 at 09:36:10PM +0000, Michael Kelley wrote:
> From: Petr Tesařík <[email protected]>
> > On Fri, 8 Mar 2024 15:28:27 +0000
> > Will Deacon <[email protected]> wrote:
> > > diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> > > index c20324fba814..c381a7ed718f 100644
> > > --- a/kernel/dma/swiotlb.c
> > > +++ b/kernel/dma/swiotlb.c
> > > @@ -981,8 +981,7 @@ static int swiotlb_search_pool_area(struct device *dev, struct io_tlb_pool *pool
> > > dma_addr_t tbl_dma_addr =
> > > phys_to_dma_unencrypted(dev, pool->start) & boundary_mask;
> > > unsigned long max_slots = get_max_slots(boundary_mask);
> > > - unsigned int iotlb_align_mask =
> > > - dma_get_min_align_mask(dev) & ~(IO_TLB_SIZE - 1);
> > > + unsigned int iotlb_align_mask = dma_get_min_align_mask(dev);
> > > unsigned int nslots = nr_slots(alloc_size), stride;
> > > unsigned int offset = swiotlb_align_offset(dev, orig_addr);
> > > unsigned int index, slots_checked, count = 0, i;
> > > @@ -993,6 +992,14 @@ static int swiotlb_search_pool_area(struct device *dev, struct io_tlb_pool *pool
> > > BUG_ON(!nslots);
> > > BUG_ON(area_index >= pool->nareas);
> > >
> > > + /*
> > > + * Ensure that the allocation is at least slot-aligned and update
> > > + * 'iotlb_align_mask' to ignore bits that will be preserved when
> > > + * offsetting into the allocation.
> > > + */
> > > + alloc_align_mask |= (IO_TLB_SIZE - 1);
> > > + iotlb_align_mask &= ~alloc_align_mask;
> > > +
> >
> > I have started writing the KUnit test suite, and the results look
> > incorrect to me for this case.
> >
> > I'm calling swiotlb_tbl_map_single() with:
> >
> > * alloc_align_mask = 0xfff
> > * a device with min_align_mask = 0xfff
> > * the 12 lowest bits of orig_addr are 0xfa0
> >
> > The min_align_mask becomes zero after the masking added by this patch,
> > and the 12 lowest bits of the returned address are 0x7a0, i.e. not
> > equal to 0xfa0.
>
> The address returned by swiotlb_tbl_map_single() is the slot index
> converted to an address, plus the offset modulo the min_align_mask for
> the device. The local variable "offset" in swiotlb_tbl_map_single()
> should be 0xfa0. The slot index should be an even number to meet
> the alloc_align_mask requirement. And the pool->start address should
> be at least page aligned, producing a page-aligned address *before* the
> offset is added. Can you debug which of these isn't true for the case
> you are seeing?

I was just looking into this, and I think the problem starts because
swiotlb_align_offset() doesn't return the offset modulo the min_align_mask,
but instead returns the offset *into the slot*:

return addr & dma_get_min_align_mask(dev) & (IO_TLB_SIZE - 1);

so this presumably lops off bit 11 without adjusting the slot number.
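
A quick standalone check of that arithmetic with Petr's numbers (2KiB
slots assumed):

#include <stdio.h>

int main(void)
{
        unsigned long orig_addr   = 0xfa0;      /* low 12 bits from the test */
        unsigned long min_align   = 0xfff;      /* device min_align_mask */
        unsigned long io_tlb_size = 0x800;      /* 2KiB slots */

        /* What swiotlb_align_offset() currently computes: */
        unsigned long offset = orig_addr & min_align & (io_tlb_size - 1);

        printf("offset = 0x%lx\n", offset);     /* 0x7a0: bit 11 is gone */
        return 0;
}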

I don't think swiotlb_find_slots() should be handling this though; it's
more about how swiotlb_tbl_map_single() puts the address back together
again.

> > In other words, the min_align_mask constraint is not honored. Of course,
> > given the above values, it is not possible to honor both min_align_mask
> > and alloc_align_mask.
>
> When orig_addr is specified and min_align_mask is set, alloc_align_mask
> governs the address of the _allocated_ space, which is not necessarily the
> returned physical address. The min_align_mask may dictate some
> pre-padding of size "offset" within the allocated space, and the returned
> address is *after* that pre-padding. In this way, both can be honored.

I agree, modulo the issue with the offset calculation.

Will

2024-03-11 22:56:18

by Nicolin Chen

Subject: Re: [PATCH v6 0/6] Fix double allocation in swiotlb_alloc()

On Fri, Mar 08, 2024 at 03:28:23PM +0000, Will Deacon wrote:
> Hi again, folks,
>
> This is version six of the patches which I previously posted at:
>
> v1: https://lore.kernel.org/r/[email protected]
> v2: https://lore.kernel.org/r/[email protected]
> v3: https://lore.kernel.org/r/[email protected]
> v4: https://lore.kernel.org/r/[email protected]
> v5: https://lore.kernel.org/r/[email protected]
>
> Changes since v5 include:
>
> - Rework the final patch to preserve page-alignment for streaming
> requests without a DMA alignment mask
>
> - Added Reviewed-by tags from Michael
>
> Cheers,
>
> Will
>
> Cc: [email protected]
> Cc: Christoph Hellwig <[email protected]>
> Cc: Marek Szyprowski <[email protected]>
> Cc: Robin Murphy <[email protected]>
> Cc: Petr Tesarik <[email protected]>
> Cc: Dexuan Cui <[email protected]>
> Cc: Nicolin Chen <[email protected]>

Was out of office, so couldn't rerun the test. If it still matters:

Tested-by: Nicolin Chen <[email protected]>

2024-03-11 23:19:42

by Will Deacon

Subject: Re: [PATCH v6 0/6] Fix double allocation in swiotlb_alloc()

On Mon, Mar 11, 2024 at 02:36:17PM +0100, Christoph Hellwig wrote:
>
> Thanks, I've applied this to the dma-mapping tree and plan to send it
> onto Linus after a bit of soaking time.

Cheers, Christoph. I'm suspecting we'll have a few extra fixes as we
collectively get a better understanding of this code (and also as Petr
develops the selftests), but we can send those on top of this series.

Will

2024-03-12 00:53:07

by Nicolin Chen

Subject: Re: [PATCH v6 4/6] swiotlb: Fix alignment checks when both allocation and DMA masks are present

On Mon, Mar 11, 2024 at 09:36:10PM +0000, Michael Kelley wrote:
> From: Petr Tesařík <[email protected]>
> > I find it somewhat surprising that NVMe does not
> > in fact require that the NVME_CTRL_PAGE_SHIFT low bits are preserved,
> > as suggested by Nicolin's successful testing.
> >
> > Why is that?
>
> I saw only one stack trace from Nicolin, and it was file system buffer
> flushing code that initiated the I/O. In such cases, it's very likely that the
> original address is at least 4K aligned. Hence the offset is zero and
> the low bits will typically be correct.

Though I didn't dig any deeper here, I do see some unaligned
original addresses passed in at the top level:
fsck.ext4-286 [004] ..... 2.594190: iommu_dma_map_page: calling swiotlb_tbl_map_single with phys: be750600
fsck.ext4-286 [004] ..... 2.613032: iommu_dma_map_page: calling swiotlb_tbl_map_single with phys: be780400
fsck.ext4-286 [004] ..... 2.614096: iommu_dma_map_page: calling swiotlb_tbl_map_single with phys: be7c0600
fsck.ext4-286 [004] ..... 2.614103: iommu_dma_map_page: calling swiotlb_tbl_map_single with phys: be7e0400
mount-288 [005] ..... 2.615157: iommu_dma_map_page: calling swiotlb_tbl_map_single with phys: be800400
multipathd-405 [003] ..... 3.062878: iommu_dma_map_page: calling swiotlb_tbl_map_single with phys: bda40218
multipathd-502 [002] ..... 3.231721: iommu_dma_map_page: calling swiotlb_tbl_map_single with phys: bd3107b8
mount-525 [002] ..... 3.250281: iommu_dma_map_page: calling swiotlb_tbl_map_single with phys: bd340200
multipathd-529 [004] ..... 3.259053: iommu_dma_map_page: calling swiotlb_tbl_map_single with phys: be620478
multipathd-571 [005] ..... 3.292893: iommu_dma_map_page: calling swiotlb_tbl_map_single with phys: be900328
multipathd-580 [005] ..... 3.318832: iommu_dma_map_page: calling swiotlb_tbl_map_single with phys: be9207c8

Or is that a different "original address"?

Thanks
Nicolin

2024-03-12 08:53:56

by Petr Tesařík

Subject: Re: [PATCH v6 4/6] swiotlb: Fix alignment checks when both allocation and DMA masks are present

On Mon, 11 Mar 2024 22:49:11 +0000
Will Deacon <[email protected]> wrote:

> On Mon, Mar 11, 2024 at 09:36:10PM +0000, Michael Kelley wrote:
> > From: Petr Tesařík <[email protected]>
> > > On Fri, 8 Mar 2024 15:28:27 +0000
> > > Will Deacon <[email protected]> wrote:
> > > > diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> > > > index c20324fba814..c381a7ed718f 100644
> > > > --- a/kernel/dma/swiotlb.c
> > > > +++ b/kernel/dma/swiotlb.c
> > > > @@ -981,8 +981,7 @@ static int swiotlb_search_pool_area(struct device *dev, struct io_tlb_pool *pool
> > > > dma_addr_t tbl_dma_addr =
> > > > phys_to_dma_unencrypted(dev, pool->start) & boundary_mask;
> > > > unsigned long max_slots = get_max_slots(boundary_mask);
> > > > - unsigned int iotlb_align_mask =
> > > > - dma_get_min_align_mask(dev) & ~(IO_TLB_SIZE - 1);
> > > > + unsigned int iotlb_align_mask = dma_get_min_align_mask(dev);
> > > > unsigned int nslots = nr_slots(alloc_size), stride;
> > > > unsigned int offset = swiotlb_align_offset(dev, orig_addr);
> > > > unsigned int index, slots_checked, count = 0, i;
> > > > @@ -993,6 +992,14 @@ static int swiotlb_search_pool_area(struct device *dev, struct io_tlb_pool *pool
> > > > BUG_ON(!nslots);
> > > > BUG_ON(area_index >= pool->nareas);
> > > >
> > > > + /*
> > > > + * Ensure that the allocation is at least slot-aligned and update
> > > > + * 'iotlb_align_mask' to ignore bits that will be preserved when
> > > > + * offsetting into the allocation.
> > > > + */
> > > > + alloc_align_mask |= (IO_TLB_SIZE - 1);
> > > > + iotlb_align_mask &= ~alloc_align_mask;
> > > > +
> > >
> > > I have started writing the KUnit test suite, and the results look
> > > incorrect to me for this case.
> > >
> > > I'm calling swiotlb_tbl_map_single() with:
> > >
> > > * alloc_align_mask = 0xfff
> > > * a device with min_align_mask = 0xfff
> > > * the 12 lowest bits of orig_addr are 0xfa0
> > >
> > > The min_align_mask becomes zero after the masking added by this patch,
> > > and the 12 lowest bits of the returned address are 0x7a0, i.e. not
> > > equal to 0xfa0.
> >
> > The address returned by swiotlb_tbl_map_single() is the slot index
> > converted to an address, plus the offset modulo the min_align_mask for
> > the device. The local variable "offset" in swiotlb_tbl_map_single()
> > should be 0xfa0. The slot index should be an even number to meet
> > the alloc_align_mask requirement. And the pool->start address should
> > be at least page aligned, producing a page-aligned address *before* the
> > offset is added. Can you debug which of these isn't true for the case
> > you are seeing?
>
> I was just looking into this, and I think the problem starts because
> swiotlb_align_offset() doesn't return the offset modulo the min_align_mask,
> but instead returns the offset *into the slot*:
>
> return addr & dma_get_min_align_mask(dev) & (IO_TLB_SIZE - 1);
>
> so this presumably lops off bit 11 without adjusting the slot number.

Yes. You will never see an offset bigger than IO_TLB_SIZE.
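
For illustration, a minimal user-space sketch of that masking (the constants
0xfff and 0xbe750fa0 are hypothetical stand-ins for dma_get_min_align_mask()
and orig_addr; this models the current swiotlb_align_offset(), it is not the
kernel code itself):

#include <stdio.h>

#define IO_TLB_SIZE	2048	/* 2 KiB slots, as in the kernel */

/* Model of swiotlb_align_offset(). */
static unsigned int align_offset(unsigned long addr, unsigned long min_align_mask)
{
	return addr & min_align_mask & (IO_TLB_SIZE - 1);
}

int main(void)
{
	unsigned long orig_addr = 0xbe750fa0UL;	/* hypothetical: low 12 bits are 0xfa0 */

	/* Prints 0x7a0: bit 11 of the 0xfa0 offset is lost to the IO_TLB_SIZE - 1 mask. */
	printf("offset = %#x\n", align_offset(orig_addr, 0xfff));
	return 0;
}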

> I don't think swiotlb_find_slots() should be handling this though; it's
> more about how swiotlb_tbl_map_single() puts the address back together
> again.
> > > In other words, the min_align_mask constraint is not honored. Of course,
> > > given the above values, it is not possible to honor both min_align_mask
> > > and alloc_align_mask.
> >
> > When orig_addr is specified and min_align_mask is set, alloc_align_mask
> > governs the address of the _allocated_ space, which is not necessarily the
> > returned physical address. The min_align_mask may dictate some
> > pre-padding of size "offset" within the allocated space, and the returned
> > address is *after* that pre-padding. In this way, both can be honored.
>
> I agree, modulo the issue with the offset calculation.

*sigh*

This is exactly what I tried to suggest here:

https://lore.kernel.org/linux-iommu/[email protected]/

To which Robin Murphy replied:

> That doesn't make sense - a caller asks to map some range of kernel
> addresses and they get back a corresponding range of DMA addresses; they
> cannot make any reasonable assumptions about DMA addresses *outside*
> that range.

It sounded like a misunderstanding back then already, but in light of
the present findings, should I send the corresponding patch after all?

Petr T

2024-03-12 09:38:50

by Robin Murphy

[permalink] [raw]
Subject: Re: [PATCH v6 4/6] swiotlb: Fix alignment checks when both allocation and DMA masks are present

On 2024-03-12 8:52 am, Petr Tesařík wrote:
> On Mon, 11 Mar 2024 22:49:11 +0000
> Will Deacon <[email protected]> wrote:
>
>> On Mon, Mar 11, 2024 at 09:36:10PM +0000, Michael Kelley wrote:
>>> From: Petr Tesařík <[email protected]>
>>>> On Fri, 8 Mar 2024 15:28:27 +0000
>>>> Will Deacon <[email protected]> wrote:
>>>>> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
>>>>> index c20324fba814..c381a7ed718f 100644
>>>>> --- a/kernel/dma/swiotlb.c
>>>>> +++ b/kernel/dma/swiotlb.c
>>>>> @@ -981,8 +981,7 @@ static int swiotlb_search_pool_area(struct device *dev, struct io_tlb_pool *pool
>>>>> dma_addr_t tbl_dma_addr =
>>>>> phys_to_dma_unencrypted(dev, pool->start) & boundary_mask;
>>>>> unsigned long max_slots = get_max_slots(boundary_mask);
>>>>> - unsigned int iotlb_align_mask =
>>>>> - dma_get_min_align_mask(dev) & ~(IO_TLB_SIZE - 1);
>>>>> + unsigned int iotlb_align_mask = dma_get_min_align_mask(dev);
>>>>> unsigned int nslots = nr_slots(alloc_size), stride;
>>>>> unsigned int offset = swiotlb_align_offset(dev, orig_addr);
>>>>> unsigned int index, slots_checked, count = 0, i;
>>>>> @@ -993,6 +992,14 @@ static int swiotlb_search_pool_area(struct device *dev, struct io_tlb_pool *pool
>>>>> BUG_ON(!nslots);
>>>>> BUG_ON(area_index >= pool->nareas);
>>>>>
>>>>> + /*
>>>>> + * Ensure that the allocation is at least slot-aligned and update
>>>>> + * 'iotlb_align_mask' to ignore bits that will be preserved when
>>>>> + * offsetting into the allocation.
>>>>> + */
>>>>> + alloc_align_mask |= (IO_TLB_SIZE - 1);
>>>>> + iotlb_align_mask &= ~alloc_align_mask;
>>>>> +
>>>>
>>>> I have started writing the KUnit test suite, and the results look
>>>> incorrect to me for this case.
>>>>
>>>> I'm calling swiotlb_tbl_map_single() with:
>>>>
>>>> * alloc_align_mask = 0xfff
>>>> * a device with min_align_mask = 0xfff
>>>> * the 12 lowest bits of orig_addr are 0xfa0
>>>>
>>>> The min_align_mask becomes zero after the masking added by this patch,
>>>> and the 12 lowest bits of the returned address are 0x7a0, i.e. not
>>>> equal to 0xfa0.
>>>
>>> The address returned by swiotlb_tbl_map_single() is the slot index
>>> converted to an address, plus the offset modulo the min_align_mask for
>>> the device. The local variable "offset" in swiotlb_tbl_map_single()
>>> should be 0xfa0. The slot index should be an even number to meet
>>> the alloc_align_mask requirement. And the pool->start address should
>>> be at least page aligned, producing a page-aligned address *before* the
>>> offset is added. Can you debug which of these isn't true for the case
>>> you are seeing?
>>
>> I was just looking into this, and I think the problem starts because
>> swiotlb_align_offset() doesn't return the offset modulo the min_align_mask,
>> but instead returns the offset *into the slot*:
>>
>> return addr & dma_get_min_align_mask(dev) & (IO_TLB_SIZE - 1);
>>
>> so this presumably lops off bit 11 without adjusting the slot number.
>
> Yes. You will never see an offset bigger than IO_TLB_SIZE.
>
>> I don't think swiotlb_find_slots() should be handling this though; it's
>> more about how swiotlb_tbl_map_single() puts the address back together
>> again.
>>>> In other words, the min_align_mask constraint is not honored. Of course,
>>>> given the above values, it is not possible to honor both min_align_mask
>>>> and alloc_align_mask.
>>>
>>> When orig_addr is specified and min_align_mask is set, alloc_align_mask
>>> governs the address of the _allocated_ space, which is not necessarily the
>>> returned physical address. The min_align_mask may dictate some
>>> pre-padding of size "offset" within the allocated space, and the returned
>>> address is *after* that pre-padding. In this way, both can be honored.
>>
>> I agree, modulo the issue with the offset calculation.
>
> *sigh*
>
> This is exactly what I tried to suggest here:
>
> https://lore.kernel.org/linux-iommu/[email protected]/
>
> To which Robin Murphy replied:
>
>> That doesn't make sense - a caller asks to map some range of kernel
>> addresses and they get back a corresponding range of DMA addresses; they
>> cannot make any reasonable assumptions about DMA addresses *outside*
>> that range.
>
> It sounded like a misunderstanding back then already, but in light of
> the present findings, should I send the corresponding patch after all?

No, that comment was in reference to the idea of effectively forcing
alloc_align_mask in order to honour min_align_mask - specifically that
the reasoning given for it was spurious, but it's clear now it would
also simply exacerbate this problem.

Simply put, if min_align_mask is specified alone, SWIOTLB can allocate a
roughly-aligned range of slots such that the bounce offset is always
less than IO_TLB_SIZE from the start of the allocation; if both
min_align_mask and alloc_align_mask are specified, then the bounce
offset may be larger than IO_TLB_SIZE, and SWIOTLB needs to be able to
handle that correctly. There is still no benefit in forcing the latter
case to happen more often than it needs to.
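
As a rough numeric sketch of the two regimes (hypothetical values: 2 KiB
slots, min_align_mask = 0xfff, and the 0xfa0 low bits from the KUnit case
above; this is a model, not the kernel's search logic):

#include <stdio.h>

#define IO_TLB_SIZE	2048

int main(void)
{
	unsigned long min_align_mask = 0xfff;
	unsigned long orig_low = 0xfa0;		/* low bits of orig_addr to be preserved */

	/*
	 * min_align_mask only: the allocation may start at any slot whose
	 * address already matches orig_addr in the bits above IO_TLB_SHIFT,
	 * so the remaining bounce offset stays below IO_TLB_SIZE.
	 */
	unsigned long off_min_only = orig_low & (IO_TLB_SIZE - 1);	/* 0x7a0 */

	/*
	 * min_align_mask plus alloc_align_mask = 0xfff: the allocation must
	 * start 4 KiB-aligned, so the whole low-bit pattern becomes the
	 * bounce offset and may exceed a single slot.
	 */
	unsigned long off_both = orig_low & min_align_mask;		/* 0xfa0 */

	printf("min_align_mask only: offset %#lx (< IO_TLB_SIZE)\n", off_min_only);
	printf("both masks:          offset %#lx (>= IO_TLB_SIZE)\n", off_both);
	return 0;
}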

Thanks,
Robin.

2024-03-12 10:44:20

by Petr Tesařík

[permalink] [raw]
Subject: Re: [PATCH v6 4/6] swiotlb: Fix alignment checks when both allocation and DMA masks are present

On Mon, 11 Mar 2024 17:52:31 -0700
Nicolin Chen <[email protected]> wrote:

> On Mon, Mar 11, 2024 at 09:36:10PM +0000, Michael Kelley wrote:
> > From: Petr Tesařík <[email protected]>
> > > I find it somewhat surprising that NVMe does not
> > > in fact require that the NVME_CTRL_PAGE_SHIFT low bits are preserved,
> > > as suggested by Nicolin's successful testing.
> > >
> > > Why is that?
> >
> > I saw only one stack trace from Nicolin, and it was file system buffer
> > flushing code that initiated the I/O. In such cases, it's very likely that the
> > original address is at least 4K aligned. Hence the offset is zero and
> > the low bits will typically be correct.
>
> Though I didn't dig any deeper here, I do see some unaligned
> original addresses passed in at the top level:
> fsck.ext4-286 [004] ..... 2.594190: iommu_dma_map_page: calling swiotlb_tbl_map_single with phys: be750600
> fsck.ext4-286 [004] ..... 2.613032: iommu_dma_map_page: calling swiotlb_tbl_map_single with phys: be780400
> fsck.ext4-286 [004] ..... 2.614096: iommu_dma_map_page: calling swiotlb_tbl_map_single with phys: be7c0600
> fsck.ext4-286 [004] ..... 2.614103: iommu_dma_map_page: calling swiotlb_tbl_map_single with phys: be7e0400
> mount-288 [005] ..... 2.615157: iommu_dma_map_page: calling swiotlb_tbl_map_single with phys: be800400
> multipathd-405 [003] ..... 3.062878: iommu_dma_map_page: calling swiotlb_tbl_map_single with phys: bda40218
> multipathd-502 [002] ..... 3.231721: iommu_dma_map_page: calling swiotlb_tbl_map_single with phys: bd3107b8
> mount-525 [002] ..... 3.250281: iommu_dma_map_page: calling swiotlb_tbl_map_single with phys: bd340200
> multipathd-529 [004] ..... 3.259053: iommu_dma_map_page: calling swiotlb_tbl_map_single with phys: be620478
> multipathd-571 [005] ..... 3.292893: iommu_dma_map_page: calling swiotlb_tbl_map_single with phys: be900328
> multipathd-580 [005] ..... 3.318832: iommu_dma_map_page: calling swiotlb_tbl_map_single with phys: be9207c8

Thank you. Indeed, bit 11 of the physical address was zero in all the
above calls, and that's why it works.
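
For anyone who wants to double-check the bit-11 observation, here is a
throwaway C snippet over the traced addresses above:

#include <stdio.h>

int main(void)
{
	/* Physical addresses from the trace above. */
	unsigned long addrs[] = {
		0xbe750600, 0xbe780400, 0xbe7c0600, 0xbe7e0400,
		0xbe800400, 0xbda40218, 0xbd3107b8, 0xbd340200,
		0xbe620478, 0xbe900328, 0xbe9207c8,
	};
	unsigned int i;

	for (i = 0; i < sizeof(addrs) / sizeof(addrs[0]); i++)
		printf("%#lx: bit 11 = %lu\n", addrs[i], (addrs[i] >> 11) & 1);
	/* All eleven print 0, so masking bit 11 away does not change the offset here. */
	return 0;
}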

Petr T

2024-03-12 12:57:24

by Robin Murphy

[permalink] [raw]
Subject: Re: [PATCH v6 4/6] swiotlb: Fix alignment checks when both allocation and DMA masks are present

On 12/03/2024 9:51 am, Petr Tesařík wrote:
> On Tue, 12 Mar 2024 09:38:36 +0000
> Robin Murphy <[email protected]> wrote:
>
>> On 2024-03-12 8:52 am, Petr Tesařík wrote:
>>> On Mon, 11 Mar 2024 22:49:11 +0000
>>> Will Deacon <[email protected]> wrote:
>>>
>>>> On Mon, Mar 11, 2024 at 09:36:10PM +0000, Michael Kelley wrote:
>>>>> From: Petr Tesařík <[email protected]>
>>>>>> On Fri, 8 Mar 2024 15:28:27 +0000
>>>>>> Will Deacon <[email protected]> wrote:
>>>>>>> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
>>>>>>> index c20324fba814..c381a7ed718f 100644
>>>>>>> --- a/kernel/dma/swiotlb.c
>>>>>>> +++ b/kernel/dma/swiotlb.c
>>>>>>> @@ -981,8 +981,7 @@ static int swiotlb_search_pool_area(struct device *dev, struct io_tlb_pool *pool
>>>>>>> dma_addr_t tbl_dma_addr =
>>>>>>> phys_to_dma_unencrypted(dev, pool->start) & boundary_mask;
>>>>>>> unsigned long max_slots = get_max_slots(boundary_mask);
>>>>>>> - unsigned int iotlb_align_mask =
>>>>>>> - dma_get_min_align_mask(dev) & ~(IO_TLB_SIZE - 1);
>>>>>>> + unsigned int iotlb_align_mask = dma_get_min_align_mask(dev);
>>>>>>> unsigned int nslots = nr_slots(alloc_size), stride;
>>>>>>> unsigned int offset = swiotlb_align_offset(dev, orig_addr);
>>>>>>> unsigned int index, slots_checked, count = 0, i;
>>>>>>> @@ -993,6 +992,14 @@ static int swiotlb_search_pool_area(struct device *dev, struct io_tlb_pool *pool
>>>>>>> BUG_ON(!nslots);
>>>>>>> BUG_ON(area_index >= pool->nareas);
>>>>>>>
>>>>>>> + /*
>>>>>>> + * Ensure that the allocation is at least slot-aligned and update
>>>>>>> + * 'iotlb_align_mask' to ignore bits that will be preserved when
>>>>>>> + * offsetting into the allocation.
>>>>>>> + */
>>>>>>> + alloc_align_mask |= (IO_TLB_SIZE - 1);
>>>>>>> + iotlb_align_mask &= ~alloc_align_mask;
>>>>>>> +
>>>>>>
>>>>>> I have started writing the KUnit test suite, and the results look
>>>>>> incorrect to me for this case.
>>>>>>
>>>>>> I'm calling swiotlb_tbl_map_single() with:
>>>>>>
>>>>>> * alloc_align_mask = 0xfff
>>>>>> * a device with min_align_mask = 0xfff
>>>>>> * the 12 lowest bits of orig_addr are 0xfa0
>>>>>>
>>>>>> The min_align_mask becomes zero after the masking added by this patch,
>>>>>> and the 12 lowest bits of the returned address are 0x7a0, i.e. not
>>>>>> equal to 0xfa0.
>>>>>
>>>>> The address returned by swiotlb_tbl_map_single() is the slot index
>>>>> converted to an address, plus the offset modulo the min_align_mask for
>>>>> the device. The local variable "offset" in swiotlb_tbl_map_single()
>>>>> should be 0xfa0. The slot index should be an even number to meet
>>>>> the alloc_align_mask requirement. And the pool->start address should
>>>>> be at least page aligned, producing a page-aligned address *before* the
>>>>> offset is added. Can you debug which of these isn't true for the case
>>>>> you are seeing?
>>>>
>>>> I was just looking into this, and I think the problem starts because
>>>> swiotlb_align_offset() doesn't return the offset modulo the min_align_mask,
>>>> but instead returns the offset *into the slot*:
>>>>
>>>> return addr & dma_get_min_align_mask(dev) & (IO_TLB_SIZE - 1);
>>>>
>>>> so this presumably lops off bit 11 without adjusting the slot number.
>>>
>>> Yes. You will never see an offset bigger than IO_TLB_SIZE.
>>>
>>>> I don't think swiotlb_find_slots() should be handling this though; it's
>>>> more about how swiotlb_tbl_map_single() puts the address back together
>>>> again.
>>>>>> In other words, the min_align_mask constraint is not honored. Of course,
>>>>>> given the above values, it is not possible to honor both min_align_mask
>>>>>> and alloc_align_mask.
>>>>>
>>>>> When orig_addr is specified and min_align_mask is set, alloc_align_mask
>>>>> governs the address of the _allocated_ space, which is not necessarily the
>>>>> returned physical address. The min_align_mask may dictate some
>>>>> pre-padding of size "offset" within the allocated space, and the returned
>>>>> address is *after* that pre-padding. In this way, both can be honored.
>>>>
>>>> I agree, modulo the issue with the offset calculation.
>>>
>>> *sigh*
>>>
>>> This is exactly what I tried to suggest here:
>>>
>>> https://lore.kernel.org/linux-iommu/[email protected]/
>>>
>>> To which Robin Murphy replied:
>>>
>>>> That doesn't make sense - a caller asks to map some range of kernel
>>>> addresses and they get back a corresponding range of DMA addresses; they
>>>> cannot make any reasonable assumptions about DMA addresses *outside*
>>>> that range.
>>>
>>> It sounded like a misunderstanding back then already, but in light of
>>> the present findings, should I send the corresponding patch after all?
>>
>> No, that comment was in reference to the idea of effectively forcing
>> alloc_align_mask in order to honour min_align_mask - specifically that
>> the reasoning given for it was spurious, but it's clear now it would
>> also simply exacerbate this problem.
>>
>> Simply put, if min_align_mask is specified alone, SWIOTLB can allocate a
>> roughly-aligned range of slots such that the bounce offset is always
>> less than IO_TLB_SIZE from the start of the allocation; if both
>> min_align_mask and alloc_align_mask are specified, then the bounce
>> offset may be larger than IO_TLB_SIZE, and SWIOTLB needs to be able to
>> handle that correctly. There is still no benefit in forcing the latter
>> case to happen more often than it needs to.
>
> So yes, it was a misunderstanding. Here's what I wrote:
>
> I thought about it some more, and I believe I know what should happen
> if the first two constraints appear to be mutually exclusive.
>
> I thought it was clear that the two constraints "appear mutually
> exclusive" only if both are specified. Admittedly, I also tried to
> combine the historic page alignment with the explicit alloc_align_mask
> somehow, so that could have caused confusion.

The constraints aren't mutually exclusive though, since they represent
different (but slightly overlapping) things. Any context in which they
appear to be is simply wrong.

> Anyway, let me send the patch and discuss it in a new thread.

I think it gets worse: it looks like swiotlb_release_slots() actually
relies on offset < IO_TLB_SIZE and will free the wrong slots if not. And
since it no longer has the original alloc_align_mask, that seems...
tricky :(
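
Roughly, the worry is that the release path recomputes the offset with the
same sub-slot masking and subtracts it to find the starting slot. A small
model with hypothetical numbers (not the kernel code verbatim):

#include <stdio.h>

#define IO_TLB_SHIFT	11
#define IO_TLB_SIZE	(1UL << IO_TLB_SHIFT)

int main(void)
{
	unsigned long pool_start = 0x98000000UL;	/* hypothetical, page-aligned */
	unsigned long alloc_start = pool_start;		/* 4 KiB-aligned allocation, slot 0 */
	unsigned long bounce_off = 0xfa0;		/* pre-padding to honour min_align_mask */
	unsigned long tlb_addr = alloc_start + bounce_off;

	/* What the unmap path can recompute: only the sub-slot part of the offset. */
	unsigned long recomputed_off = tlb_addr & 0xfff & (IO_TLB_SIZE - 1);	/* 0x7a0 */

	unsigned long true_index = (tlb_addr - bounce_off - pool_start) >> IO_TLB_SHIFT;
	unsigned long freed_index = (tlb_addr - recomputed_off - pool_start) >> IO_TLB_SHIFT;

	/* Prints 0 vs 1: freeing would start one slot into the allocation. */
	printf("allocated at slot %lu, release starts at slot %lu\n",
	       true_index, freed_index);
	return 0;
}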

Thanks,
Robin.

2024-03-12 09:52:34

by Petr Tesařík

[permalink] [raw]
Subject: Re: [PATCH v6 4/6] swiotlb: Fix alignment checks when both allocation and DMA masks are present

On Tue, 12 Mar 2024 09:38:36 +0000
Robin Murphy <[email protected]> wrote:

> On 2024-03-12 8:52 am, Petr Tesařík wrote:
> > On Mon, 11 Mar 2024 22:49:11 +0000
> > Will Deacon <[email protected]> wrote:
> >
> >> On Mon, Mar 11, 2024 at 09:36:10PM +0000, Michael Kelley wrote:
> >>> From: Petr Tesařík <[email protected]>
> >>>> On Fri, 8 Mar 2024 15:28:27 +0000
> >>>> Will Deacon <[email protected]> wrote:
> >>>>> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> >>>>> index c20324fba814..c381a7ed718f 100644
> >>>>> --- a/kernel/dma/swiotlb.c
> >>>>> +++ b/kernel/dma/swiotlb.c
> >>>>> @@ -981,8 +981,7 @@ static int swiotlb_search_pool_area(struct device *dev, struct io_tlb_pool *pool
> >>>>> dma_addr_t tbl_dma_addr =
> >>>>> phys_to_dma_unencrypted(dev, pool->start) & boundary_mask;
> >>>>> unsigned long max_slots = get_max_slots(boundary_mask);
> >>>>> - unsigned int iotlb_align_mask =
> >>>>> - dma_get_min_align_mask(dev) & ~(IO_TLB_SIZE - 1);
> >>>>> + unsigned int iotlb_align_mask = dma_get_min_align_mask(dev);
> >>>>> unsigned int nslots = nr_slots(alloc_size), stride;
> >>>>> unsigned int offset = swiotlb_align_offset(dev, orig_addr);
> >>>>> unsigned int index, slots_checked, count = 0, i;
> >>>>> @@ -993,6 +992,14 @@ static int swiotlb_search_pool_area(struct device *dev, struct io_tlb_pool *pool
> >>>>> BUG_ON(!nslots);
> >>>>> BUG_ON(area_index >= pool->nareas);
> >>>>>
> >>>>> + /*
> >>>>> + * Ensure that the allocation is at least slot-aligned and update
> >>>>> + * 'iotlb_align_mask' to ignore bits that will be preserved when
> >>>>> + * offsetting into the allocation.
> >>>>> + */
> >>>>> + alloc_align_mask |= (IO_TLB_SIZE - 1);
> >>>>> + iotlb_align_mask &= ~alloc_align_mask;
> >>>>> +
> >>>>
> >>>> I have started writing the KUnit test suite, and the results look
> >>>> incorrect to me for this case.
> >>>>
> >>>> I'm calling swiotlb_tbl_map_single() with:
> >>>>
> >>>> * alloc_align_mask = 0xfff
> >>>> * a device with min_align_mask = 0xfff
> >>>> * the 12 lowest bits of orig_addr are 0xfa0
> >>>>
> >>>> The min_align_mask becomes zero after the masking added by this patch,
> >>>> and the 12 lowest bits of the returned address are 0x7a0, i.e. not
> >>>> equal to 0xfa0.
> >>>
> >>> The address returned by swiotlb_tbl_map_single() is the slot index
> >>> converted to an address, plus the offset modulo the min_align_mask for
> >>> the device. The local variable "offset" in swiotlb_tbl_map_single()
> >>> should be 0xfa0. The slot index should be an even number to meet
> >>> the alloc_align_mask requirement. And the pool->start address should
> >>> be at least page aligned, producing a page-aligned address *before* the
> >>> offset is added. Can you debug which of these isn't true for the case
> >>> you are seeing?
> >>
> >> I was just looking into this, and I think the problem starts because
> >> swiotlb_align_offset() doesn't return the offset modulo the min_align_mask,
> >> but instead returns the offset *into the slot*:
> >>
> >> return addr & dma_get_min_align_mask(dev) & (IO_TLB_SIZE - 1);
> >>
> >> so this presumably lops off bit 11 without adjusting the slot number.
> >
> > Yes. You will never see an offset bigger than IO_TLB_SIZE.
> >
> >> I don't think swiotlb_find_slots() should be handling this though; it's
> >> more about how swiotlb_tbl_map_single() puts the address back together
> >> again.
> >>>> In other words, the min_align_mask constraint is not honored. Of course,
> >>>> given the above values, it is not possible to honor both min_align_mask
> >>>> and alloc_align_mask.
> >>>
> >>> When orig_addr is specified and min_align_mask is set, alloc_align_mask
> >>> governs the address of the _allocated_ space, which is not necessarily the
> >>> returned physical address. The min_align_mask may dictate some
> >>> pre-padding of size "offset" within the allocated space, and the returned
> >>> address is *after* that pre-padding. In this way, both can be honored.
> >>
> >> I agree, modulo the issue with the offset calculation.
> >
> > *sigh*
> >
> > This is exactly what I tried to suggest here:
> >
> > https://lore.kernel.org/linux-iommu/[email protected]/
> >
> > To which Robin Murphy replied:
> >
> >> That doesn't make sense - a caller asks to map some range of kernel
> >> addresses and they get back a corresponding range of DMA addresses; they
> >> cannot make any reasonable assumptions about DMA addresses *outside*
> >> that range.
> >
> > It sounded like a misunderstanding back then already, but in light of
> > the present findings, should I send the corresponding patch after all?
>
> No, that comment was in reference to the idea of effectively forcing
> alloc_align_mask in order to honour min_align_mask - specifically that
> the reasoning given for it was spurious, but it's clear now it would
> also simply exacerbate this problem.
>
> Simply put, if min_align_mask is specified alone, SWIOTLB can allocate a
> roughly-aligned range of slots such that the bounce offset is always
> less than IO_TLB_SIZE from the start of the allocation; if both
> min_align_mask and alloc_align_mask are specified, then the bounce
> offset may be larger than IO_TLB_SIZE, and SWIOTLB needs to be able to
> handle that correctly. There is still no benefit in forcing the latter
> case to happen more often than it needs to.

So yes, it was a misunderstanding. Here's what I wrote:

I thought about it some more, and I believe I know what should happen
if the first two constraints appear to be mutually exclusive.

I thought it was clear that the two constraints "appear mutually
exclusive" only if both are specified. Admittedly, I also tried to
combine the historic page alignment with the explicit alloc_align_mask
somehow, so that could have caused confusion.

Anyway, let me send the patch and discuss it in a new thread.

Petr T

2024-03-18 03:39:28

by Michael Kelley

[permalink] [raw]
Subject: RE: [PATCH v6 1/6] swiotlb: Fix double-allocation of slots due to broken alignment handling

From: Will Deacon <[email protected]> Sent: Friday, March 8, 2024 7:28 AM
>
>
> Fix the problem by treating the allocation alignment separately to any
> additional alignment requirements from the device, using the maximum
> of the two as the stride to search the buffer slots and taking care
> to ensure a minimum of page-alignment for buffers larger than a page.
>
> This also resolves swiotlb allocation failures occurring due to the
> inclusion of ~PAGE_MASK in 'iotlb_align_mask' for large allocations and
> resulting in alignment requirements exceeding swiotlb_max_mapping_size().
>
> Fixes: bbb73a103fbb ("swiotlb: fix a braino in the alignment check fix")
> Fixes: 0eee5ae10256 ("swiotlb: fix slot alignment checks")
> Cc: Christoph Hellwig <[email protected]>
> Cc: Marek Szyprowski <[email protected]>
> Cc: Robin Murphy <[email protected]>
> Cc: Dexuan Cui <[email protected]>
> Reviewed-by: Michael Kelley <[email protected]>
> Reviewed-by: Petr Tesarik <[email protected]>
> Tested-by: Nicolin Chen <[email protected]>
> Signed-off-by: Will Deacon <[email protected]>
> ---
> kernel/dma/swiotlb.c | 28 +++++++++++++++-------------
> 1 file changed, 15 insertions(+), 13 deletions(-)
>
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index b079a9a8e087..2ec2cc81f1a2 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -982,7 +982,7 @@ static int swiotlb_search_pool_area(struct device
> *dev, struct io_tlb_pool *pool
> phys_to_dma_unencrypted(dev, pool->start) & boundary_mask;
> unsigned long max_slots = get_max_slots(boundary_mask);
> unsigned int iotlb_align_mask =
> - dma_get_min_align_mask(dev) | alloc_align_mask;
> + dma_get_min_align_mask(dev) & ~(IO_TLB_SIZE - 1);
> unsigned int nslots = nr_slots(alloc_size), stride;
> unsigned int offset = swiotlb_align_offset(dev, orig_addr);
> unsigned int index, slots_checked, count = 0, i;
> @@ -993,19 +993,18 @@ static int swiotlb_search_pool_area(struct device *dev, struct io_tlb_pool *pool
> BUG_ON(!nslots);
> BUG_ON(area_index >= pool->nareas);
>
> + /*
> + * For mappings with an alignment requirement don't bother looping to
> + * unaligned slots once we found an aligned one.
> + */
> + stride = get_max_slots(max(alloc_align_mask, iotlb_align_mask));
> +
> /*
> * For allocations of PAGE_SIZE or larger only look for page aligned
> * allocations.
> */
> if (alloc_size >= PAGE_SIZE)
> - iotlb_align_mask |= ~PAGE_MASK;
> - iotlb_align_mask &= ~(IO_TLB_SIZE - 1);
> -
> - /*
> - * For mappings with an alignment requirement don't bother looping to
> - * unaligned slots once we found an aligned one.
> - */
> - stride = (iotlb_align_mask >> IO_TLB_SHIFT) + 1;
> + stride = umax(stride, PAGE_SHIFT - IO_TLB_SHIFT + 1);
>
> spin_lock_irqsave(&area->lock, flags);
> if (unlikely(nslots > pool->area_nslabs - area->used))
> @@ -1015,11 +1014,14 @@ static int swiotlb_search_pool_area(struct device *dev, struct io_tlb_pool *pool
> index = area->index;
>
> for (slots_checked = 0; slots_checked < pool->area_nslabs; ) {
> - slot_index = slot_base + index;
> + phys_addr_t tlb_addr;
>
> - if (orig_addr &&
> - (slot_addr(tbl_dma_addr, slot_index) &
> - iotlb_align_mask) != (orig_addr & iotlb_align_mask)) {
> + slot_index = slot_base + index;
> + tlb_addr = slot_addr(tbl_dma_addr, slot_index);
> +
> + if ((tlb_addr & alloc_align_mask) ||
> + (orig_addr && (tlb_addr & iotlb_align_mask) !=
> + (orig_addr & iotlb_align_mask))) {
> index = wrap_area_index(pool, index + 1);
> slots_checked++;
> continue;
> --

Question for IOMMU folks: alloc_align_mask is set only in
iommu_dma_map_page(), using the IOMMU granule size.
Can the granule ever be larger than PAGE_SIZE? If so,
swiotlb_search_pool_area() can fail to find slots even when
the swiotlb is empty.

The failure happens when alloc_align_mask is larger than
PAGE_SIZE and the alloc_size is the swiotlb max of 256 Kbytes
(or even a bit smaller in some cases). The swiotlb memory
pool is allocated in swiotlb_memblock_alloc() with PAGE_SIZE
alignment. On x86/x64, if alloc_align_mask is 8191 and the
pool start address is something like XXXX1000, slot 0 won't
satisfy alloc_align_mask. Slot 1 satisfies alloc_align_mask,
but has a size of 127 slots and can't fulfill a 256 Kbyte request.
The problem repeats through the entire swiotlb and the
allocation fails.

Updating swiotlb_memblock_alloc() to use an alignment of
IO_TLB_SIZE * IO_TLB_SEGSIZE (i.e., 256 Kbytes) solves the
problem for all viable configurations.
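
A stand-alone sketch of that scenario (hypothetical pool addresses; it
assumes a 256 Kbyte request must begin at a 128-slot segment boundary,
which simplifies the real slot bookkeeping):

#include <stdio.h>

#define IO_TLB_SIZE	2048UL
#define IO_TLB_SEGSIZE	128UL			/* slots per segment: 256 Kbytes */
#define SEG_BYTES	(IO_TLB_SIZE * IO_TLB_SEGSIZE)

/* Can any segment of this pool host a fully aligned 256 Kbyte allocation? */
static int pool_can_fit(unsigned long pool_start, unsigned long nsegs,
			unsigned long alloc_align_mask)
{
	unsigned long seg;

	for (seg = 0; seg < nsegs; seg++)
		if (((pool_start + seg * SEG_BYTES) & alloc_align_mask) == 0)
			return 1;
	return 0;
}

int main(void)
{
	unsigned long mask = 8191;	/* alloc_align_mask for an 8 Kbyte IOMMU granule */

	/* PAGE_SIZE-aligned pool start, as swiotlb_memblock_alloc() uses today. */
	printf("pool at 0xf0001000: %s\n",
	       pool_can_fit(0xf0001000UL, 16, mask) ? "fits" : "never fits");

	/* Pool aligned to IO_TLB_SIZE * IO_TLB_SEGSIZE, as suggested above. */
	printf("pool at 0xf0000000: %s\n",
	       pool_can_fit(0xf0000000UL, 16, mask) ? "fits" : "never fits");
	return 0;
}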

Michael

2024-03-18 12:37:03

by Petr Tesařík

[permalink] [raw]
Subject: Re: [PATCH v6 1/6] swiotlb: Fix double-allocation of slots due to broken alignment handling

On Mon, 18 Mar 2024 03:39:07 +0000
Michael Kelley <[email protected]> wrote:

> From: Will Deacon <[email protected]> Sent: Friday, March 8, 2024 7:28 AM
> >
> >
> > Fix the problem by treating the allocation alignment separately to any
> > additional alignment requirements from the device, using the maximum
> > of the two as the stride to search the buffer slots and taking care
> > to ensure a minimum of page-alignment for buffers larger than a page.
> >
> > This also resolves swiotlb allocation failures occurring due to the
> > inclusion of ~PAGE_MASK in 'iotlb_align_mask' for large allocations and
> > resulting in alignment requirements exceeding swiotlb_max_mapping_size().
> >
> > Fixes: bbb73a103fbb ("swiotlb: fix a braino in the alignment check fix")
> > Fixes: 0eee5ae10256 ("swiotlb: fix slot alignment checks")
> > Cc: Christoph Hellwig <[email protected]>
> > Cc: Marek Szyprowski <[email protected]>
> > Cc: Robin Murphy <[email protected]>
> > Cc: Dexuan Cui <[email protected]>
> > Reviewed-by: Michael Kelley <[email protected]>
> > Reviewed-by: Petr Tesarik <[email protected]>
> > Tested-by: Nicolin Chen <[email protected]>
> > Signed-off-by: Will Deacon <[email protected]>
> > ---
> > kernel/dma/swiotlb.c | 28 +++++++++++++++-------------
> > 1 file changed, 15 insertions(+), 13 deletions(-)
> >
> > diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> > index b079a9a8e087..2ec2cc81f1a2 100644
> > --- a/kernel/dma/swiotlb.c
> > +++ b/kernel/dma/swiotlb.c
> > @@ -982,7 +982,7 @@ static int swiotlb_search_pool_area(struct device
> > *dev, struct io_tlb_pool *pool
> > phys_to_dma_unencrypted(dev, pool->start) & boundary_mask;
> > unsigned long max_slots = get_max_slots(boundary_mask);
> > unsigned int iotlb_align_mask =
> > - dma_get_min_align_mask(dev) | alloc_align_mask;
> > + dma_get_min_align_mask(dev) & ~(IO_TLB_SIZE - 1);
> > unsigned int nslots = nr_slots(alloc_size), stride;
> > unsigned int offset = swiotlb_align_offset(dev, orig_addr);
> > unsigned int index, slots_checked, count = 0, i;
> > @@ -993,19 +993,18 @@ static int swiotlb_search_pool_area(struct device *dev, struct io_tlb_pool *pool
> > BUG_ON(!nslots);
> > BUG_ON(area_index >= pool->nareas);
> >
> > + /*
> > + * For mappings with an alignment requirement don't bother looping to
> > + * unaligned slots once we found an aligned one.
> > + */
> > + stride = get_max_slots(max(alloc_align_mask, iotlb_align_mask));
> > +
> > /*
> > * For allocations of PAGE_SIZE or larger only look for page aligned
> > * allocations.
> > */
> > if (alloc_size >= PAGE_SIZE)
> > - iotlb_align_mask |= ~PAGE_MASK;
> > - iotlb_align_mask &= ~(IO_TLB_SIZE - 1);
> > -
> > - /*
> > - * For mappings with an alignment requirement don't bother looping to
> > - * unaligned slots once we found an aligned one.
> > - */
> > - stride = (iotlb_align_mask >> IO_TLB_SHIFT) + 1;
> > + stride = umax(stride, PAGE_SHIFT - IO_TLB_SHIFT + 1);
> >
> > spin_lock_irqsave(&area->lock, flags);
> > if (unlikely(nslots > pool->area_nslabs - area->used))
> > @@ -1015,11 +1014,14 @@ static int swiotlb_search_pool_area(struct device *dev, struct io_tlb_pool *pool
> > index = area->index;
> >
> > for (slots_checked = 0; slots_checked < pool->area_nslabs; ) {
> > - slot_index = slot_base + index;
> > + phys_addr_t tlb_addr;
> >
> > - if (orig_addr &&
> > - (slot_addr(tbl_dma_addr, slot_index) &
> > - iotlb_align_mask) != (orig_addr & iotlb_align_mask)) {
> > + slot_index = slot_base + index;
> > + tlb_addr = slot_addr(tbl_dma_addr, slot_index);
> > +
> > + if ((tlb_addr & alloc_align_mask) ||
> > + (orig_addr && (tlb_addr & iotlb_align_mask) !=
> > + (orig_addr & iotlb_align_mask))) {
> > index = wrap_area_index(pool, index + 1);
> > slots_checked++;
> > continue;
> > --
>
> Question for IOMMU folks: alloc_align_mask is set only in
> iommu_dma_map_page(), using the IOMMU granule size.
> Can the granule ever be larger than PAGE_SIZE?

I don't consider myself part of the IOMMU folks, but since I have spent
one and a half aeons looking around DMA-related code, let me answer anyway.

No, this is not possible, see here:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/iommu/iova.c?h=v6.8#n61

HTH
Petr T

> If so,
> swiotlb_search_pool_area() can fail to find slots even when
> the swiotlb is empty.
>
> The failure happens when alloc_align_mask is larger than
> PAGE_SIZE and the alloc_size is the swiotlb max of 256 Kbytes
> (or even a bit smaller in some cases). The swiotlb memory
> pool is allocated in swiotlb_memblock_alloc() with PAGE_SIZE
> alignment. On x86/x64, if alloc_align_mask is 8191 and the
> pool start address is something like XXXX1000, slot 0 won't
> satisfy alloc_align_mask. Slot 1 satisfies alloc_align_mask,
> but has a size of 127 slots and can't fulfill a 256 Kbyte request.
> The problem repeats through the entire swiotlb and the
> allocation fails.
>
> Updating swiotlb_memblock_alloc() to use an alignment of
> IO_TLB_SIZE * IO_TLB_SEGSIZE (i.e., 256 Kbytes) solves the
> problem for all viable configurations.
>
> Michael
>