Hi folks,
These two patches fix a nasty double allocation problem in swiotlb_alloc()
and add a diagnostic to help catch any similar issues in future. This was
a royal pain to track down and I've had to make a bit of a leap at the
correct alignment semantics (i.e. iotlb_align_mask vs alloc_align_mask).
Without these changes, we've been observing random vsock hangs when
communicating with virtual machines in Android.
Please have a look!
Cheers,
Will
Cc: [email protected]
Cc: Christoph Hellwig <[email protected]>
Cc: Marek Szyprowski <[email protected]>
Cc: Robin Murphy <[email protected]>
Cc: Petr Tesarik <[email protected]>
Cc: Dexuan Cui <[email protected]>
--->8
Will Deacon (2):
swiotlb: Fix allocation alignment requirement when searching slots
swiotlb: Enforce page alignment in swiotlb_alloc()
kernel/dma/swiotlb.c | 32 ++++++++++++++++++++++----------
1 file changed, 22 insertions(+), 10 deletions(-)
--
2.43.0.429.g432eaa2c6b-goog
Commit bbb73a103fbb ("swiotlb: fix a braino in the alignment check fix"),
which was a fix for commit 0eee5ae10256 ("swiotlb: fix slot alignment
checks"), causes a functional regression with vsock in a virtual machine
using bouncing via a restricted DMA SWIOTLB pool.
When virtio allocates the virtqueues for the vsock device using
dma_alloc_coherent(), the SWIOTLB search fails to take into account the
8KiB buffer size and returns page-unaligned allocations if 'area->index'
was left unaligned by a previous allocation from the buffer:
# Final address in brackets is the SWIOTLB address returned to the caller
| virtio-pci 0000:00:07.0: orig_addr 0x0 alloc_size 0x2000, iotlb_align_mask 0x800 stride 0x2: got slot 1645-1649/7168 (0x98326800)
| virtio-pci 0000:00:07.0: orig_addr 0x0 alloc_size 0x2000, iotlb_align_mask 0x800 stride 0x2: got slot 1649-1653/7168 (0x98328800)
| virtio-pci 0000:00:07.0: orig_addr 0x0 alloc_size 0x2000, iotlb_align_mask 0x800 stride 0x2: got slot 1653-1657/7168 (0x9832a800)
This ends in tears (typically buffer corruption and/or a hang) because
swiotlb_alloc() blindly returns the 'struct page' corresponding to the
allocation and therefore the first half of the page ends up being
allocated twice.
Fix the problem by treating the allocation alignment separately to any
additional alignment requirements from the device, using the maximum
of the two as the stride to search the buffer slots.
Fixes: bbb73a103fbb ("swiotlb: fix a braino in the alignment check fix")
Fixes: 0eee5ae10256 ("swiotlb: fix slot alignment checks")
Cc: Christoph Hellwig <[email protected]>
Cc: Marek Szyprowski <[email protected]>
Cc: Robin Murphy <[email protected]>
Cc: Petr Tesarik <[email protected]>
Cc: Dexuan Cui <[email protected]>
Signed-off-by: Will Deacon <[email protected]>
---
kernel/dma/swiotlb.c | 26 ++++++++++++++++----------
1 file changed, 16 insertions(+), 10 deletions(-)
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index b079a9a8e087..25febb9e670c 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -982,7 +982,7 @@ static int swiotlb_search_pool_area(struct device *dev, struct io_tlb_pool *pool
phys_to_dma_unencrypted(dev, pool->start) & boundary_mask;
unsigned long max_slots = get_max_slots(boundary_mask);
unsigned int iotlb_align_mask =
- dma_get_min_align_mask(dev) | alloc_align_mask;
+ dma_get_min_align_mask(dev) & ~(IO_TLB_SIZE - 1);
unsigned int nslots = nr_slots(alloc_size), stride;
unsigned int offset = swiotlb_align_offset(dev, orig_addr);
unsigned int index, slots_checked, count = 0, i;
@@ -998,14 +998,13 @@ static int swiotlb_search_pool_area(struct device *dev, struct io_tlb_pool *pool
* allocations.
*/
if (alloc_size >= PAGE_SIZE)
- iotlb_align_mask |= ~PAGE_MASK;
- iotlb_align_mask &= ~(IO_TLB_SIZE - 1);
+ alloc_align_mask |= ~PAGE_MASK;
/*
* For mappings with an alignment requirement don't bother looping to
* unaligned slots once we found an aligned one.
*/
- stride = (iotlb_align_mask >> IO_TLB_SHIFT) + 1;
+ stride = (max(alloc_align_mask, iotlb_align_mask) >> IO_TLB_SHIFT) + 1;
spin_lock_irqsave(&area->lock, flags);
if (unlikely(nslots > pool->area_nslabs - area->used))
@@ -1015,15 +1014,18 @@ static int swiotlb_search_pool_area(struct device *dev, struct io_tlb_pool *pool
index = area->index;
for (slots_checked = 0; slots_checked < pool->area_nslabs; ) {
+ phys_addr_t tlb_addr;
+
slot_index = slot_base + index;
+ tlb_addr = slot_addr(tbl_dma_addr, slot_index);
+
+ if (tlb_addr & alloc_align_mask)
+ goto next_slot;
if (orig_addr &&
- (slot_addr(tbl_dma_addr, slot_index) &
- iotlb_align_mask) != (orig_addr & iotlb_align_mask)) {
- index = wrap_area_index(pool, index + 1);
- slots_checked++;
- continue;
- }
+ (tlb_addr & iotlb_align_mask) !=
+ (orig_addr & iotlb_align_mask))
+ goto next_slot;
if (!iommu_is_span_boundary(slot_index, nslots,
nr_slots(tbl_dma_addr),
@@ -1033,6 +1035,10 @@ static int swiotlb_search_pool_area(struct device *dev, struct io_tlb_pool *pool
}
index = wrap_area_index(pool, index + stride);
slots_checked += stride;
+ continue;
+next_slot:
+ index = wrap_area_index(pool, index + 1);
+ slots_checked++;
}
not_found:
--
2.43.0.429.g432eaa2c6b-goog
When allocating pages from a restricted DMA pool in swiotlb_alloc(),
the buffer address is blindly converted to a 'struct page *' that is
returned to the caller. In the unlikely event of an allocation bug,
page-unaligned addresses are not detected and slots can silently be
double-allocated.
Add a simple check of the buffer alignment in swiotlb_alloc() to make
debugging a little easier if something has gone wonky.
Cc: Christoph Hellwig <[email protected]>
Cc: Marek Szyprowski <[email protected]>
Cc: Robin Murphy <[email protected]>
Cc: Petr Tesarik <[email protected]>
Cc: Dexuan Cui <[email protected]>
Signed-off-by: Will Deacon <[email protected]>
---
kernel/dma/swiotlb.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 25febb9e670c..92433ea9f2d2 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -1647,6 +1647,12 @@ struct page *swiotlb_alloc(struct device *dev, size_t size)
return NULL;
tlb_addr = slot_addr(pool->start, index);
+ if (!PAGE_ALIGNED(tlb_addr)) {
+ dev_WARN_ONCE(dev, 1, "Cannot return 'struct page *' for non page-aligned swiotlb addr 0x%pa.\n",
+ &tlb_addr);
+ swiotlb_release_slots(dev, tlb_addr);
+ return NULL;
+ }
return pfn_to_page(PFN_DOWN(tlb_addr));
}
--
2.43.0.429.g432eaa2c6b-goog
Hi Will,
On Fri, 26 Jan 2024 15:19:54 +0000
Will Deacon <[email protected]> wrote:
> Hi folks,
>
> These two patches fix a nasty double allocation problem in swiotlb_alloc()
> and add a diagnostic to help catch any similar issues in future. This was
> a royal pain to track down and I've had to make a bit of a leap at the
> correct alignment semantics (i.e. iotlb_align_mask vs alloc_align_mask).
Welcome to the club. I believe you had to re-discover what I described here:
https://lore.kernel.org/linux-iommu/[email protected]/
The relevant part would be this:
To sum it up, there are two types of alignment:
1. specified by a device's min_align_mask; this says how many low
bits of a buffer's physical address must be preserved,
2. specified by allocation size and/or the alignment parameter;
this says how many low bits in the first IO TLB slot's physical
address must be zero.
Fix for that has been sitting on my TODO list for too long. :-(
Petr T
> Without these changes, we've been observing random vsock hangs when
> communicating with virtual machines in Android.
>
> Please have a look!
>
> Cheers,
>
> Will
>
> Cc: [email protected]
> Cc: Christoph Hellwig <[email protected]>
> Cc: Marek Szyprowski <[email protected]>
> Cc: Robin Murphy <[email protected]>
> Cc: Petr Tesarik <[email protected]>
> Cc: Dexuan Cui <[email protected]>
>
> --->8
>
> Will Deacon (2):
> swiotlb: Fix allocation alignment requirement when searching slots
> swiotlb: Enforce page alignment in swiotlb_alloc()
>
> kernel/dma/swiotlb.c | 32 ++++++++++++++++++++++----------
> 1 file changed, 22 insertions(+), 10 deletions(-)
>
On Fri, 26 Jan 2024 15:19:56 +0000
Will Deacon <[email protected]> wrote:
> When allocating pages from a restricted DMA pool in swiotlb_alloc(),
> the buffer address is blindly converted to a 'struct page *' that is
> returned to the caller. In the unlikely event of an allocation bug,
> page-unaligned addresses are not detected and slots can silently be
> double-allocated.
>
> Add a simple check of the buffer alignment in swiotlb_alloc() to make
> debugging a little easier if something has gone wonky.
>
> Cc: Christoph Hellwig <[email protected]>
> Cc: Marek Szyprowski <[email protected]>
> Cc: Robin Murphy <[email protected]>
> Cc: Petr Tesarik <[email protected]>
> Cc: Dexuan Cui <[email protected]>
> Signed-off-by: Will Deacon <[email protected]>
> ---
> kernel/dma/swiotlb.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index 25febb9e670c..92433ea9f2d2 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -1647,6 +1647,12 @@ struct page *swiotlb_alloc(struct device *dev, size_t size)
> return NULL;
>
> tlb_addr = slot_addr(pool->start, index);
> + if (!PAGE_ALIGNED(tlb_addr)) {
> + dev_WARN_ONCE(dev, 1, "Cannot return 'struct page *' for non page-aligned swiotlb addr 0x%pa.\n",
> + &tlb_addr);
> + swiotlb_release_slots(dev, tlb_addr);
> + return NULL;
> + }
Is there a reason not to use BUG_ON()? If yes, I would at least go for:
+ if (unlikely(!PAGE_ALIGNED(tlb_addr))) {
Other than that, yes, such cheap sanity checking looks like a good idea.
Petr T
On Fri, Jan 26, 2024 at 03:19:56PM +0000, Will Deacon wrote:
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index 25febb9e670c..92433ea9f2d2 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -1647,6 +1647,12 @@ struct page *swiotlb_alloc(struct device *dev, size_t size)
> return NULL;
>
> tlb_addr = slot_addr(pool->start, index);
> + if (!PAGE_ALIGNED(tlb_addr)) {
> + dev_WARN_ONCE(dev, 1, "Cannot return 'struct page *' for non page-aligned swiotlb addr 0x%pa.\n",
> + &tlb_addr);
> + swiotlb_release_slots(dev, tlb_addr);
> + return NULL;
> + }
>
> return pfn_to_page(PFN_DOWN(tlb_addr));
So PFN_DOWN aligns the address and thus per se converting the unaligned
address isn't a problem. That being said swiotlb obviously should never
allocate unaligned addresses, but the placement of this check feels
odd to me. Also because it only catches swiotlb_alloc and not the
map side.
Maybe just throw a WARN_ON_ONCE into slot_addr() ?
> }
> --
> 2.43.0.429.g432eaa2c6b-goog
---end quoted text---
On Mon, 29 Jan 2024 07:08:53 +0100
Christoph Hellwig <[email protected]> wrote:
> On Fri, Jan 26, 2024 at 03:19:56PM +0000, Will Deacon wrote:
> > diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> > index 25febb9e670c..92433ea9f2d2 100644
> > --- a/kernel/dma/swiotlb.c
> > +++ b/kernel/dma/swiotlb.c
> > @@ -1647,6 +1647,12 @@ struct page *swiotlb_alloc(struct device *dev, size_t size)
> > return NULL;
> >
> > tlb_addr = slot_addr(pool->start, index);
> > + if (!PAGE_ALIGNED(tlb_addr)) {
> > + dev_WARN_ONCE(dev, 1, "Cannot return 'struct page *' for non page-aligned swiotlb addr 0x%pa.\n",
> > + &tlb_addr);
> > + swiotlb_release_slots(dev, tlb_addr);
> > + return NULL;
> > + }
> >
> > return pfn_to_page(PFN_DOWN(tlb_addr));
>
> So PFN_DOWN aligns the address and thus per se converting the unaligned
> address isn't a problem. That being said swiotlb obviously should never
> allocate unaligned addresses, but the placement of this check feels
> odd to me. Also because it only catches swiotlb_alloc and not the
> map side.
We may have to rethink how alignment constraints are interpreted. See
also my reply to PATCH 1/2.
> Maybe just throw a WARN_ON_ONCE into slot_addr() ?
Yes.
Or, what if I write a KUnit test suite for swiotlb to combat this
constant stream of various regressions?
Petr T
On Mon, Jan 29, 2024 at 08:43:26AM +0100, Petr Tesařík wrote:
> > So PFN_DOWN aligns the address and thus per se converting the unaligned
> > address isn't a problem. That being said swiotlb obviously should never
> > allocate unaligned addresses, but the placement of this check feels
> > odd to me. Also because it only catches swiotlb_alloc and not the
> > map side.
>
> We may have to rethink how alignment constraints are interpreted. See
> also my reply to PATCH 1/2.
>
> > Maybe just throw a WARN_ON_ONCE into slot_addr() ?
>
> Yes.
>
> Or, what if I write a KUnit test suite for swiotlb to combat this
> constant stream of various regressions?
Both sounds good to me.
On Fri, Jan 26, 2024 at 05:20:59PM +0100, Petr Tesařík wrote:
> On Fri, 26 Jan 2024 15:19:54 +0000
> Will Deacon <[email protected]> wrote:
>
> > Hi folks,
> >
> > These two patches fix a nasty double allocation problem in swiotlb_alloc()
> > and add a diagnostic to help catch any similar issues in future. This was
> > a royal pain to track down and I've had to make a bit of a leap at the
> > correct alignment semantics (i.e. iotlb_align_mask vs alloc_align_mask).
>
> Welcome to the club. I believe you had to re-discover what I described here:
>
> https://lore.kernel.org/linux-iommu/[email protected]/
Lucky me...
> The relevant part would be this:
>
> To sum it up, there are two types of alignment:
>
> 1. specified by a device's min_align_mask; this says how many low
> bits of a buffer's physical address must be preserved,
>
> 2. specified by allocation size and/or the alignment parameter;
> this says how many low bits in the first IO TLB slot's physical
> address must be zero.
>
> Fix for that has been sitting on my TODO list for too long. :-(
FWIW, it did _used_ to work (or appear to work), so it would be good to
at least get it back to the old behaviour if nothing else.
Anyway, cheers for reviewing the patches. I'll go through your comments
now...
Will
On Mon, 29 Jan 2024 18:42:55 +0000
Will Deacon <[email protected]> wrote:
> On Fri, Jan 26, 2024 at 05:20:59PM +0100, Petr Tesařík wrote:
> > On Fri, 26 Jan 2024 15:19:54 +0000
> > Will Deacon <[email protected]> wrote:
> >
> > > Hi folks,
> > >
> > > These two patches fix a nasty double allocation problem in swiotlb_alloc()
> > > and add a diagnostic to help catch any similar issues in future. This was
> > > a royal pain to track down and I've had to make a bit of a leap at the
> > > correct alignment semantics (i.e. iotlb_align_mask vs alloc_align_mask).
> >
> > Welcome to the club. I believe you had to re-discover what I described here:
> >
> > https://lore.kernel.org/linux-iommu/[email protected]/
>
> Lucky me...
>
> > The relevant part would be this:
> >
> > To sum it up, there are two types of alignment:
> >
> > 1. specified by a device's min_align_mask; this says how many low
> > bits of a buffer's physical address must be preserved,
> >
> > 2. specified by allocation size and/or the alignment parameter;
> > this says how many low bits in the first IO TLB slot's physical
> > address must be zero.
> >
> > Fix for that has been sitting on my TODO list for too long. :-(
>
> FWIW, it did _used_ to work (or appear to work), so it would be good to
> at least get it back to the old behaviour if nothing else.
Yes, now that I look at the code, it was probably misunderstanding on
my side as to how the three different alignment requirements are
supposed to work together.
AFAICT your patch addresses everything that has ever worked. The rest
needs some more thought, and before I touch this loop again, I'll write
a proper test case.
Petr T
> Anyway, cheers for reviewing the patches. I'll go through your
> comments now...
>
> Will
On Mon, Jan 29, 2024 at 08:26:19PM +0100, Petr Tesařík wrote:
> On Mon, 29 Jan 2024 18:42:55 +0000
> Will Deacon <[email protected]> wrote:
>
> > On Fri, Jan 26, 2024 at 05:20:59PM +0100, Petr Tesařík wrote:
> > > On Fri, 26 Jan 2024 15:19:54 +0000
> > > Will Deacon <[email protected]> wrote:
> > >
> > > > Hi folks,
> > > >
> > > > These two patches fix a nasty double allocation problem in swiotlb_alloc()
> > > > and add a diagnostic to help catch any similar issues in future. This was
> > > > a royal pain to track down and I've had to make a bit of a leap at the
> > > > correct alignment semantics (i.e. iotlb_align_mask vs alloc_align_mask).
> > >
> > > Welcome to the club. I believe you had to re-discover what I described here:
> > >
> > > https://lore.kernel.org/linux-iommu/[email protected]/
> >
> > Lucky me...
> >
> > > The relevant part would be this:
> > >
> > > To sum it up, there are two types of alignment:
> > >
> > > 1. specified by a device's min_align_mask; this says how many low
> > > bits of a buffer's physical address must be preserved,
> > >
> > > 2. specified by allocation size and/or the alignment parameter;
> > > this says how many low bits in the first IO TLB slot's physical
> > > address must be zero.
> > >
> > > Fix for that has been sitting on my TODO list for too long. :-(
> >
> > FWIW, it did _used_ to work (or appear to work), so it would be good to
> > at least get it back to the old behaviour if nothing else.
>
> Yes, now that I look at the code, it was probably misunderstanding on
> my side as to how the three different alignment requirements are
> supposed to work together.
>
> AFAICT your patch addresses everything that has ever worked. The rest
> needs some more thought, and before I touch this loop again, I'll write
> a proper test case.
Thanks, that would be much appreciated!
Will
On Fri, Jan 26, 2024 at 06:01:27PM +0100, Petr Tesařík wrote:
> On Fri, 26 Jan 2024 15:19:55 +0000
> Will Deacon <[email protected]> wrote:
> > diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> > index b079a9a8e087..25febb9e670c 100644
> > --- a/kernel/dma/swiotlb.c
> > +++ b/kernel/dma/swiotlb.c
> > @@ -982,7 +982,7 @@ static int swiotlb_search_pool_area(struct device *dev, struct io_tlb_pool *pool
> > phys_to_dma_unencrypted(dev, pool->start) & boundary_mask;
> > unsigned long max_slots = get_max_slots(boundary_mask);
> > unsigned int iotlb_align_mask =
> > - dma_get_min_align_mask(dev) | alloc_align_mask;
> > + dma_get_min_align_mask(dev) & ~(IO_TLB_SIZE - 1);
>
> Good. So, iotlb_align_mask now specifies how many low bits of orig_addr
> should be preserved in the bounce buffer address, ignoring the offset
> within the TLB slot...
Yup, this is basically restoring the old behaviour.
> > unsigned int nslots = nr_slots(alloc_size), stride;
> > unsigned int offset = swiotlb_align_offset(dev, orig_addr);
> > unsigned int index, slots_checked, count = 0, i;
> > @@ -998,14 +998,13 @@ static int swiotlb_search_pool_area(struct device *dev, struct io_tlb_pool *pool
> > * allocations.
> > */
> > if (alloc_size >= PAGE_SIZE)
> > - iotlb_align_mask |= ~PAGE_MASK;
> > - iotlb_align_mask &= ~(IO_TLB_SIZE - 1);
> > + alloc_align_mask |= ~PAGE_MASK;
>
> ...and alloc_align_mask specifies the desired TLB slot alignment.
Yes, although actually I'm now wondering whether there's another bug here
in that we don't return naturally aligned buffers for allocations bigger
than a page. I think that was broken in 0eee5ae10256 ("swiotlb: fix slot
alignment checks") because that stopped aligning the initial search index
to the stride (which was in turn previously aligned to the allocation size).
> > /*
> > * For mappings with an alignment requirement don't bother looping to
> > * unaligned slots once we found an aligned one.
> > */
> > - stride = (iotlb_align_mask >> IO_TLB_SHIFT) + 1;
> > + stride = (max(alloc_align_mask, iotlb_align_mask) >> IO_TLB_SHIFT) + 1;
>
> I'm not quite sure about this one.
>
> And I'm not even sure all combinations make sense!
>
> For example, take these values:
>
> * TLB_SIZE == 0x800 (2K)
> * alloc_align_mask == 0xffffffffffffc000 (16K alignment, could be page size)
> * iotlb_align_mask == 0xffffffffffff0000 (64K alignment)
> * orig_addr == 0x0000000000001234
>
> Only the lowest 16 bits are relevant for the alignment check.
> Device alignment requires 0x1000.
> Alloc alignment requires one of 0x0000, 0x4000, 0x8000, or 0xc000.
> Obviously, such allocation must always fail...
Having an iotlb_align_mask with all those upper bits set looks wrong to me.
Is that the same "braino" as bbb73a103fbb?
> > spin_lock_irqsave(&area->lock, flags);
> > if (unlikely(nslots > pool->area_nslabs - area->used))
> > @@ -1015,15 +1014,18 @@ static int swiotlb_search_pool_area(struct device *dev, struct io_tlb_pool *pool
> > index = area->index;
> >
> > for (slots_checked = 0; slots_checked < pool->area_nslabs; ) {
> > + phys_addr_t tlb_addr;
> > +
> > slot_index = slot_base + index;
> > + tlb_addr = slot_addr(tbl_dma_addr, slot_index);
> > +
> > + if (tlb_addr & alloc_align_mask)
> > + goto next_slot;
>
> Awww, come on. So your code jumps to a label and then inserts an
> unconditional continue just before that label? I'm sure we'll find a
> cleaner way to convey the loop logic. What about this:
>
> if ((tlb_addr & alloc_align_mask) != 0 ||
> (orig_addr && (tlb_addr & io_tlb_align_mask !=
> orig_addr & iotlb_align_mask))) {
> index = wrap_area_index(pool, index + 1);
> slots_checked++;
> continue;
> }
I'm hoping I can drop the alloc_align_mask check entirely if I restore
the alignment of the index.
> But yes, this patch looks like it should finally do the right thing.
I don't think we're quite there yet. I'll spin a v2.
Thanks for the review,
Will
On Fri, Jan 26, 2024 at 05:23:55PM +0100, Petr Tesařík wrote:
> On Fri, 26 Jan 2024 15:19:56 +0000
> Will Deacon <[email protected]> wrote:
>
> > When allocating pages from a restricted DMA pool in swiotlb_alloc(),
> > the buffer address is blindly converted to a 'struct page *' that is
> > returned to the caller. In the unlikely event of an allocation bug,
> > page-unaligned addresses are not detected and slots can silently be
> > double-allocated.
> >
> > Add a simple check of the buffer alignment in swiotlb_alloc() to make
> > debugging a little easier if something has gone wonky.
> >
> > Cc: Christoph Hellwig <[email protected]>
> > Cc: Marek Szyprowski <[email protected]>
> > Cc: Robin Murphy <[email protected]>
> > Cc: Petr Tesarik <[email protected]>
> > Cc: Dexuan Cui <[email protected]>
> > Signed-off-by: Will Deacon <[email protected]>
> > ---
> > kernel/dma/swiotlb.c | 6 ++++++
> > 1 file changed, 6 insertions(+)
> >
> > diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> > index 25febb9e670c..92433ea9f2d2 100644
> > --- a/kernel/dma/swiotlb.c
> > +++ b/kernel/dma/swiotlb.c
> > @@ -1647,6 +1647,12 @@ struct page *swiotlb_alloc(struct device *dev, size_t size)
> > return NULL;
> >
> > tlb_addr = slot_addr(pool->start, index);
> > + if (!PAGE_ALIGNED(tlb_addr)) {
> > + dev_WARN_ONCE(dev, 1, "Cannot return 'struct page *' for non page-aligned swiotlb addr 0x%pa.\n",
> > + &tlb_addr);
> > + swiotlb_release_slots(dev, tlb_addr);
> > + return NULL;
> > + }
>
> Is there a reason not to use BUG_ON()? If yes, I would at least go for:
>
> + if (unlikely(!PAGE_ALIGNED(tlb_addr))) {
>
> Other than that, yes, such cheap sanity checking looks like a good idea.
BUG_ON() is generally frowned upon unless there's really no way to proceed.
Since we can fail the allocation here, I think that's the best bet (and hope
that whoever wanted the buffer isn't all that important).
I'll add the unlikely() in v2, although it sounds like Christoph wants
this moving anyway.
Will
On Mon, Jan 29, 2024 at 07:08:53AM +0100, Christoph Hellwig wrote:
> On Fri, Jan 26, 2024 at 03:19:56PM +0000, Will Deacon wrote:
> > diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> > index 25febb9e670c..92433ea9f2d2 100644
> > --- a/kernel/dma/swiotlb.c
> > +++ b/kernel/dma/swiotlb.c
> > @@ -1647,6 +1647,12 @@ struct page *swiotlb_alloc(struct device *dev, size_t size)
> > return NULL;
> >
> > tlb_addr = slot_addr(pool->start, index);
> > + if (!PAGE_ALIGNED(tlb_addr)) {
> > + dev_WARN_ONCE(dev, 1, "Cannot return 'struct page *' for non page-aligned swiotlb addr 0x%pa.\n",
> > + &tlb_addr);
> > + swiotlb_release_slots(dev, tlb_addr);
> > + return NULL;
> > + }
> >
> > return pfn_to_page(PFN_DOWN(tlb_addr));
>
> So PFN_DOWN aligns the address and thus per se converting the unaligned
> address isn't a problem.
Hmm, I'm not sure I follow why it isn't a problem. If the first 2KiB slot
of the 4KiB page has already been allocated to somebody else, isn't it a
big problem to align down like that? Maybe I should word the warning
message a bit better -- how about:
"Cannot allocate pages from non page-aligned swiotlb addr 0x%pa.\n"
?
> That being said swiotlb obviously should never
> allocate unaligned addresses, but the placement of this check feels
> odd to me. Also because it only catches swiotlb_alloc and not the
> map side.
>
> Maybe just throw a WARN_ON_ONCE into slot_addr() ?
Everything is slot-aligned, so I don't think slot_addr() can detect
this. I put the check in swiotlb_alloc() because I think that's the only
place where we assume that a slot address is page-aligned. I don't think
the map path particularly cares, but if you prefer to have the warning
there too then I think we'd have to stick it at the end of
swiotlb_search_pool_area() (effectively just checking that the returned
slot is consistent with the 'alloc_align_mask' parameter).
Will
On Mon, 29 Jan 2024 19:32:50 +0000
Will Deacon <[email protected]> wrote:
> On Fri, Jan 26, 2024 at 06:01:27PM +0100, Petr Tesařík wrote:
> > On Fri, 26 Jan 2024 15:19:55 +0000
> > Will Deacon <[email protected]> wrote:
> > > diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> > > index b079a9a8e087..25febb9e670c 100644
> > > --- a/kernel/dma/swiotlb.c
> > > +++ b/kernel/dma/swiotlb.c
> > > @@ -982,7 +982,7 @@ static int swiotlb_search_pool_area(struct device *dev, struct io_tlb_pool *pool
> > > phys_to_dma_unencrypted(dev, pool->start) & boundary_mask;
> > > unsigned long max_slots = get_max_slots(boundary_mask);
> > > unsigned int iotlb_align_mask =
> > > - dma_get_min_align_mask(dev) | alloc_align_mask;
> > > + dma_get_min_align_mask(dev) & ~(IO_TLB_SIZE - 1);
> >
> > Good. So, iotlb_align_mask now specifies how many low bits of orig_addr
> > should be preserved in the bounce buffer address, ignoring the offset
> > within the TLB slot...
>
> Yup, this is basically restoring the old behaviour.
>
> > > unsigned int nslots = nr_slots(alloc_size), stride;
> > > unsigned int offset = swiotlb_align_offset(dev, orig_addr);
> > > unsigned int index, slots_checked, count = 0, i;
> > > @@ -998,14 +998,13 @@ static int swiotlb_search_pool_area(struct device *dev, struct io_tlb_pool *pool
> > > * allocations.
> > > */
> > > if (alloc_size >= PAGE_SIZE)
> > > - iotlb_align_mask |= ~PAGE_MASK;
> > > - iotlb_align_mask &= ~(IO_TLB_SIZE - 1);
> > > + alloc_align_mask |= ~PAGE_MASK;
> >
> > ...and alloc_align_mask specifies the desired TLB slot alignment.
>
> Yes, although actually I'm now wondering whether there's another bug here
> in that we don't return naturally aligned buffers for allocations bigger
> than a page. I think that was broken in 0eee5ae10256 ("swiotlb: fix slot
> alignment checks") because that stopped aligning the initial search index
> to the stride (which was in turn previously aligned to the allocation size).
The question is whether there is any NEED that allocations bigger than
a page are naturally aligned. For my part, I don't see why there should
be, but I might be missing something.
> > > /*
> > > * For mappings with an alignment requirement don't bother looping to
> > > * unaligned slots once we found an aligned one.
> > > */
> > > - stride = (iotlb_align_mask >> IO_TLB_SHIFT) + 1;
> > > + stride = (max(alloc_align_mask, iotlb_align_mask) >> IO_TLB_SHIFT) + 1;
> >
> > I'm not quite sure about this one.
> >
> > And I'm not even sure all combinations make sense!
> >
> > For example, take these values:
> >
> > * TLB_SIZE == 0x800 (2K)
> > * alloc_align_mask == 0xffffffffffffc000 (16K alignment, could be page size)
> > * iotlb_align_mask == 0xffffffffffff0000 (64K alignment)
> > * orig_addr == 0x0000000000001234
> >
> > Only the lowest 16 bits are relevant for the alignment check.
> > Device alignment requires 0x1000.
> > Alloc alignment requires one of 0x0000, 0x4000, 0x8000, or 0xc000.
> > Obviously, such allocation must always fail...
>
> Having an iotlb_align_mask with all those upper bits set looks wrong to me.
> Is that the same "braino" as bbb73a103fbb?
I must always stop and think at least twice before I can be sure
whether a "mask" has the high bits set, or the low bits set...
On an x86, PAGE_SHIFT is 12, PAGE_SIZE is 1UL << PAGE_SHIFT or 0x1000,
PAGE_MASK is ~(PAGE_SIZE-1)) or 0xfffffffffffff000, and there's one
more bitwise negation, so you're right. Both masks above should be
inverted, and using max() to find the stride is correct.
Petr T
On Mon, Jan 29, 2024 at 09:40:34PM +0100, Petr Tesařík wrote:
> On Mon, 29 Jan 2024 19:32:50 +0000
> Will Deacon <[email protected]> wrote:
>
> > On Fri, Jan 26, 2024 at 06:01:27PM +0100, Petr Tesařík wrote:
> > > On Fri, 26 Jan 2024 15:19:55 +0000
> > > Will Deacon <[email protected]> wrote:
> > > > diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> > > > index b079a9a8e087..25febb9e670c 100644
> > > > --- a/kernel/dma/swiotlb.c
> > > > +++ b/kernel/dma/swiotlb.c
> > > > @@ -982,7 +982,7 @@ static int swiotlb_search_pool_area(struct device *dev, struct io_tlb_pool *pool
> > > > phys_to_dma_unencrypted(dev, pool->start) & boundary_mask;
> > > > unsigned long max_slots = get_max_slots(boundary_mask);
> > > > unsigned int iotlb_align_mask =
> > > > - dma_get_min_align_mask(dev) | alloc_align_mask;
> > > > + dma_get_min_align_mask(dev) & ~(IO_TLB_SIZE - 1);
> > >
> > > Good. So, iotlb_align_mask now specifies how many low bits of orig_addr
> > > should be preserved in the bounce buffer address, ignoring the offset
> > > within the TLB slot...
> >
> > Yup, this is basically restoring the old behaviour.
> >
> > > > unsigned int nslots = nr_slots(alloc_size), stride;
> > > > unsigned int offset = swiotlb_align_offset(dev, orig_addr);
> > > > unsigned int index, slots_checked, count = 0, i;
> > > > @@ -998,14 +998,13 @@ static int swiotlb_search_pool_area(struct device *dev, struct io_tlb_pool *pool
> > > > * allocations.
> > > > */
> > > > if (alloc_size >= PAGE_SIZE)
> > > > - iotlb_align_mask |= ~PAGE_MASK;
> > > > - iotlb_align_mask &= ~(IO_TLB_SIZE - 1);
> > > > + alloc_align_mask |= ~PAGE_MASK;
> > >
> > > ...and alloc_align_mask specifies the desired TLB slot alignment.
> >
> > Yes, although actually I'm now wondering whether there's another bug here
> > in that we don't return naturally aligned buffers for allocations bigger
> > than a page. I think that was broken in 0eee5ae10256 ("swiotlb: fix slot
> > alignment checks") because that stopped aligning the initial search index
> > to the stride (which was in turn previously aligned to the allocation size).
>
> The question is whether there is any NEED that allocations bigger than
> a page are naturally aligned. For my part, I don't see why there should
> be, but I might be missing something.
I think some drivers rely on that. As per core-api/dma-api-howto.rst:
(Using Consistent DMA mappings::dma_alloc_coherent())
| The CPU virtual address and the DMA address are both guaranteed to
| be aligned to the smallest PAGE_SIZE order which is greater than or
| equal to the requested size.
I've certainly written code that relies on it and the swiotlb logic used
to honour that requirement.
> > > > /*
> > > > * For mappings with an alignment requirement don't bother looping to
> > > > * unaligned slots once we found an aligned one.
> > > > */
> > > > - stride = (iotlb_align_mask >> IO_TLB_SHIFT) + 1;
> > > > + stride = (max(alloc_align_mask, iotlb_align_mask) >> IO_TLB_SHIFT) + 1;
> > >
> > > I'm not quite sure about this one.
> > >
> > > And I'm not even sure all combinations make sense!
> > >
> > > For example, take these values:
> > >
> > > * TLB_SIZE == 0x800 (2K)
> > > * alloc_align_mask == 0xffffffffffffc000 (16K alignment, could be page size)
> > > * iotlb_align_mask == 0xffffffffffff0000 (64K alignment)
> > > * orig_addr == 0x0000000000001234
> > >
> > > Only the lowest 16 bits are relevant for the alignment check.
> > > Device alignment requires 0x1000.
> > > Alloc alignment requires one of 0x0000, 0x4000, 0x8000, or 0xc000.
> > > Obviously, such allocation must always fail...
> >
> > Having an iotlb_align_mask with all those upper bits set looks wrong to me.
> > Is that the same "braino" as bbb73a103fbb?
>
> I must always stop and think at least twice before I can be sure
> whether a "mask" has the high bits set, or the low bits set...
>
> On an x86, PAGE_SHIFT is 12, PAGE_SIZE is 1UL << PAGE_SHIFT or 0x1000,
> PAGE_MASK is ~(PAGE_SIZE-1)) or 0xfffffffffffff000, and there's one
> more bitwise negation, so you're right. Both masks above should be
> inverted, and using max() to find the stride is correct.
Heh. It's not straightforward, is it?
Will
On Mon, Jan 29, 2024 at 07:49:40PM +0000, Will Deacon wrote:
> > > return pfn_to_page(PFN_DOWN(tlb_addr));
> >
> > So PFN_DOWN aligns the address and thus per se converting the unaligned
> > address isn't a problem.
>
> Hmm, I'm not sure I follow why it isn't a problem. If the first 2KiB slot
> of the 4KiB page has already been allocated to somebody else, isn't it a
> big problem to align down like that? Maybe I should word the warning
> message a bit better -- how about:
But the problem is that it's used, not that we can't create a page
from a non-aligned address.
>
> "Cannot allocate pages from non page-aligned swiotlb addr 0x%pa.\n"
That sounds better.