2024-03-27 03:46:16

by Michael Kelley

Subject: [PATCH 1/1] swiotlb: Fix swiotlb_bounce() to do partial sync's correctly

From: Michael Kelley <[email protected]>

In current code, swiotlb_bounce() may do partial sync's correctly in
some circumstances, but may incorrectly fail in other circumstances.
The failure cases require both of these to be true:

1) swiotlb_align_offset() returns a non-zero "offset" value
2) the tlb_addr of the partial sync area points into the first
"offset" bytes of the _second_ or subsequent swiotlb slot allocated
for the mapping

Code added in commit 868c9ddc182b ("swiotlb: add overflow checks
to swiotlb_bounce") attempts to WARN on the invalid case where
tlb_addr points into the first "offset" bytes of the _first_
allocated slot. But there's no way for swiotlb_bounce() to distinguish
the first slot from the second and subsequent slots, so the WARN
can be triggered incorrectly when #2 above is true.

Related, current code calculates an adjustment to the orig_addr stored
in the swiotlb slot. The adjustment compensates for the difference
in the tlb_addr used for the partial sync vs. the tlb_addr for the full
mapping. The adjustment is stored in the local variable tlb_offset.
But when #1 and #2 above are true, it's valid for this adjustment to
be negative. In such case the arithmetic to adjust orig_addr produces
the wrong result due to tlb_offset being declared as unsigned.

Fix these problems by removing the over-constraining validations added
in 868c9ddc182b. Change the declaration of tlb_offset to be signed
instead of unsigned so the adjustment arithmetic works correctly.

Tested with a test-only hack to how swiotlb_tbl_map_single() calls
swiotlb_bounce(). Instead of calling swiotlb_bounce() just once
for the entire mapped area, do a loop with each iteration doing
only a 128 byte partial sync until the entire mapped area is
sync'ed. Then with swiotlb=force on the kernel boot line, run a
variety of raw disk writes followed by read and verification of
all bytes of the written data. The storage device has DMA
min_align_mask set, and the writes are done with a variety of
original buffer memory address alignments and overall buffer
sizes. For many of the combinations, current code triggers the
WARN statements, or the data verification fails. With the fixes,
no WARNs occur and all verifications pass.

Fixes: 5f89468e2f06 ("swiotlb: manipulate orig_addr when tlb_addr has offset")
Fixes: 868c9ddc182b ("swiotlb: add overflow checks to swiotlb_bounce")
Signed-off-by: Michael Kelley <[email protected]>
---
This patch is built against the 6.9-rc1 tree plus Petr Tesarik's
"swiotlb: extend buffer pre-padding to alloc_align_mask" patch
that's in-flight.
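
For reference, the test-only hack described above amounts to roughly
this (a hypothetical sketch, not part of this patch; the 128-byte chunk
size and the "offset" variable are made up for illustration):

	/*
	 * Test-only hack in swiotlb_tbl_map_single(): instead of one
	 * full-size bounce of the mapped area, loop over it in 128-byte
	 * partial syncs to exercise the partial-sync arithmetic.
	 */
	for (offset = 0; offset < mapping_size; offset += 128)
		swiotlb_bounce(dev, tlb_addr + offset,
			       min_t(size_t, mapping_size - offset, 128),
			       DMA_TO_DEVICE);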

kernel/dma/swiotlb.c | 30 +++++++++++++-----------------
1 file changed, 13 insertions(+), 17 deletions(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index d7a8cb93ef2d..d57c8837c813 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -863,27 +863,23 @@ static void swiotlb_bounce(struct device *dev, phys_addr_t tlb_addr, size_t size
size_t alloc_size = mem->slots[index].alloc_size;
unsigned long pfn = PFN_DOWN(orig_addr);
unsigned char *vaddr = mem->vaddr + tlb_addr - mem->start;
- unsigned int tlb_offset, orig_addr_offset;
+ int tlb_offset;

if (orig_addr == INVALID_PHYS_ADDR)
return;

- tlb_offset = tlb_addr & (IO_TLB_SIZE - 1);
- orig_addr_offset = swiotlb_align_offset(dev, 0, orig_addr);
- if (tlb_offset < orig_addr_offset) {
- dev_WARN_ONCE(dev, 1,
- "Access before mapping start detected. orig offset %u, requested offset %u.\n",
- orig_addr_offset, tlb_offset);
- return;
- }
-
- tlb_offset -= orig_addr_offset;
- if (tlb_offset > alloc_size) {
- dev_WARN_ONCE(dev, 1,
- "Buffer overflow detected. Allocation size: %zu. Mapping size: %zu+%u.\n",
- alloc_size, size, tlb_offset);
- return;
- }
+ /*
+ * It's valid for tlb_offset to be negative. This can happen when the
+ * "offset" returned by swiotlb_align_offset() is non-zero, and the
+ * tlb_addr is pointing within the first "offset" bytes of the second
+ * or subsequent slots of the allocated swiotlb area. While it's not
+ * valid for tlb_addr to be pointing within the first "offset" bytes
+ * of the first slot, there's no way to check for such an error since
+ * this function can't distinguish the first slot from the second and
+ * subsequent slots.
+ */
+ tlb_offset = (tlb_addr & (IO_TLB_SIZE - 1)) -
+ swiotlb_align_offset(dev, 0, orig_addr);

orig_addr += tlb_offset;
alloc_size -= tlb_offset;
--
2.25.1



2024-03-27 06:13:38

by Dominique Martinet

Subject: Re: [PATCH 1/1] swiotlb: Fix swiotlb_bounce() to do partial sync's correctly

Hi Michael,

[email protected] wrote on Tue, Mar 26, 2024 at 08:45:48PM -0700:
> From: Michael Kelley <[email protected]>
>
> In current code, swiotlb_bounce() may do partial sync's correctly in
> some circumstances, but may incorrectly fail in other circumstances.
> The failure cases require both of these to be true:
>
> 1) swiotlb_align_offset() returns a non-zero "offset" value
> 2) the tlb_addr of the partial sync area points into the first
> "offset" bytes of the _second_ or subsequent swiotlb slot allocated
> for the mapping
>
> Code added in commit 868c9ddc182b ("swiotlb: add overflow checks
> to swiotlb_bounce") attempts to WARN on the invalid case where
> tlb_addr points into the first "offset" bytes of the _first_
> allocated slot. But there's no way for swiotlb_bounce() to distinguish
> the first slot from the second and subsequent slots, so the WARN
> can be triggered incorrectly when #2 above is true.
>
> Related, current code calculates an adjustment to the orig_addr stored
> in the swiotlb slot. The adjustment compensates for the difference
> in the tlb_addr used for the partial sync vs. the tlb_addr for the full
> mapping. The adjustment is stored in the local variable tlb_offset.
> But when #1 and #2 above are true, it's valid for this adjustment to
> be negative. In such case the arithmetic to adjust orig_addr produces
> the wrong result due to tlb_offset being declared as unsigned.
>
> Fix these problems by removing the over-constraining validations added
> in 868c9ddc182b. Change the declaration of tlb_offset to be signed
> instead of unsigned so the adjustment arithmetic works correctly.
>
> Tested with a test-only hack to how swiotlb_tbl_map_single() calls
> swiotlb_bounce(). Instead of calling swiotlb_bounce() just once
> for the entire mapped area, do a loop with each iteration doing
> only a 128 byte partial sync until the entire mapped area is
> sync'ed. Then with swiotlb=force on the kernel boot line, run a
> variety of raw disk writes followed by read and verification of
> all bytes of the written data. The storage device has DMA
> min_align_mask set, and the writes are done with a variety of
> original buffer memory address alignments and overall buffer
> sizes. For many of the combinations, current code triggers the
> WARN statements, or the data verification fails. With the fixes,
> no WARNs occur and all verifications pass.

Thanks for the detailed analysis & test, and sorry I didn't reply to
your mail last week.
For everyone's benefit I'll reply here (the question was "what were the
two bad commits for"): the previous rework introduced a regression with
caamjr on unaligned mappings that I had reported here:
https://lore.kernel.org/lkml/[email protected]/#t

(took me a while to find the threads, Link: tags have been sparse...)

Unfortunately that was ages ago, so I don't remember exactly which
device it was reproduced on... Given the Cc I'm sure Lukas had hit it
on the MNT Reform (i.MX8MQ), but I did say I tested it, so I could
probably reproduce on my i.MX8MP?
I'll try to reproduce the old bug, and if I do, test your fix over the
next week.


>
> Fixes: 5f89468e2f06 ("swiotlb: manipulate orig_addr when tlb_addr has offset")
> Fixes: 868c9ddc182b ("swiotlb: add overflow checks to swiotlb_bounce")
> Signed-off-by: Michael Kelley <[email protected]>
> ---
> This patch is built against the 6.9-rc1 tree plus Petr Tesarik's
> "swiotlb: extend buffer pre-padding to alloc_align_mask" patch
> that's in-flight.
>
> kernel/dma/swiotlb.c | 30 +++++++++++++-----------------
> 1 file changed, 13 insertions(+), 17 deletions(-)
>
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index d7a8cb93ef2d..d57c8837c813 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -863,27 +863,23 @@ static void swiotlb_bounce(struct device *dev, phys_addr_t tlb_addr, size_t size
> size_t alloc_size = mem->slots[index].alloc_size;
> unsigned long pfn = PFN_DOWN(orig_addr);
> unsigned char *vaddr = mem->vaddr + tlb_addr - mem->start;
> - unsigned int tlb_offset, orig_addr_offset;
> + int tlb_offset;
>
> if (orig_addr == INVALID_PHYS_ADDR)
> return;
>
> - tlb_offset = tlb_addr & (IO_TLB_SIZE - 1);
> - orig_addr_offset = swiotlb_align_offset(dev, 0, orig_addr);
> - if (tlb_offset < orig_addr_offset) {
> - dev_WARN_ONCE(dev, 1,
> - "Access before mapping start detected. orig offset %u, requested offset %u.\n",
> - orig_addr_offset, tlb_offset);
> - return;
> - }
> -
> - tlb_offset -= orig_addr_offset;
> - if (tlb_offset > alloc_size) {
> - dev_WARN_ONCE(dev, 1,
> - "Buffer overflow detected. Allocation size: %zu. Mapping size: %zu+%u.\n",
> - alloc_size, size, tlb_offset);
> - return;
> - }
> + /*
> + * It's valid for tlb_offset to be negative. This can happen when the
> + * "offset" returned by swiotlb_align_offset() is non-zero, and the
> + * tlb_addr is pointing within the first "offset" bytes of the second
> + * or subsequent slots of the allocated swiotlb area. While it's not
> + * valid for tlb_addr to be pointing within the first "offset" bytes
> + * of the first slot, there's no way to check for such an error since
> + * this function can't distinguish the first slot from the second and
> + * subsequent slots.
> + */
> + tlb_offset = (tlb_addr & (IO_TLB_SIZE - 1)) -
> + swiotlb_align_offset(dev, 0, orig_addr);
>
> orig_addr += tlb_offset;
> alloc_size -= tlb_offset;

--
Dominique Martinet | Asmadeus

2024-03-29 05:09:48

by Dominique Martinet

Subject: Re: [PATCH 1/1] swiotlb: Fix swiotlb_bounce() to do partial sync's correctly

Dominique Martinet wrote on Wed, Mar 27, 2024 at 03:05:18PM +0900:
> Unfortunately that was ages ago, so I don't remember exactly which
> device it was reproduced on... Given the Cc I'm sure Lukas had hit it
> on the MNT Reform (i.MX8MQ), but I did say I tested it, so I could
> probably reproduce on my i.MX8MP?
> I'll try to reproduce the old bug, and if I do, test your fix over the
> next week.

grmbl, sorry, I cannot reproduce the problem on the devices I have
readily accessible, and I don't have time to dig out my Reform to test
there in the foreseeable future, so I cannot confirm whether this also
fixes the problem we reported two years ago.

However, I had misunderstood your patch: I thought you were also
reverting commit 5f89468e2f06 ("swiotlb: manipulate orig_addr when
tlb_addr has offset"), but you're keeping it and just making the offset
signed. That should cause no problem for the use case I was concerned
about, as it fell within the bounds I had defined, so this is basically
a no-op patch for my use case and I don't expect this particular
failure to pop back up here.


Code-wise, I agree the checks I added in commit 868c9ddc182b ("swiotlb:
add overflow checks to swiotlb_bounce") are too strict - I failed to
consider the device minimum alignment part of swiotlb_align_offset, and
thinking it through, this can get weird.
... And now I'm looking again: even with a big minimum alignment it's
also too strict, so, right, let's work through an example.


From my understanding of how orig_addr is computed, in the multi-block
case we have something like this:

(+ marks the device's minimum alignment; the bigger blocks delimited
by | are IO_TLB_SIZE blocks)

          10        20        30        40        50        60
01234567890123456789012345678901234567890123456789012345678901234...
|---+---+---+-block1+---+---+---|---+---+---+-block2+---+---+---|...
       ^                               ^
       mem->slots[n].orig_addr         mem->slots[n+1].orig_addr
       (=7)                            (=32+7=39)

(memo of the code with your patch:
index = (tlb_addr - mem->start) >> IO_TLB_SHIFT;
orig_addr = mem->slots[index].orig_addr;
swiotlb_align_offset(dev, 0, orig_addr) = orig_addr & dev min align mask & (IO_TLB_SIZE-1)
tlb_offset = (tlb_addr & (IO_TLB_SIZE - 1)) - swiotlb_align_offset(dev, 0, orig_addr);
orig_addr += tlb_offset;
alloc_size -= tlb_offset;
vaddr = mem->vaddr + tlb_addr - mem->start
)

So for this example we have IO_TLB_SIZE=32, dev min alignment=4,
orig_addr's align value would be 7%4=3.
Given say tlb_addr at 33, we'd find slot n, and compute
tlb_offset = 1 - 3 = -2

... And then I just don't follow anymore?

Computing the rest mechanically, for the example's sake let's say n=0
so mem->start=7, index=0, and also set size to 4 bytes, vaddr to 0x1000.

vaddr = 0x1000 + 33 - 7 = 0x1000 + 26
orig_addr = 7 + -2 = 5
size = 4 - -2 = 6

then we'd proceed to memcpy either (vaddr, p2v(orig_addr), size) or the
other way around, but this cannot be right:
- size is bigger than what was requested, I fail to see how that can be
allowed. I'd understand a smaller size assuming swiotlb_bounce gets
called for each interval, but not a bigger size.
- orig_addr is nowhere near 33.


I thought this didn't make sense because of the minimum device
alignment, but even with device alignment >= io tlb's with the very same
example we would get
tlb_offset = 1 - 7 = -6
now that one could make sense if we had used the following slot e.g.
orig_addr being slot[1].orig_addr and we'd get back to 31, but that's
not the case, and the size calculation is still off.


So, long story short it took me half a day to get back into this code
and the only thing I understand about it is that I don't understand it.

I'm sure it works in most cases because everything is nicely aligned
(since nobody complained about my checks before, and if there's no
warning with them the code works), but I'd require some convincing to
give a Reviewed-by tag to this patch.

Thanks for working on this though; I'll be happy to have the flaws in
my logic pointed out, or to look at another attempt...!
--
Dominique

2024-03-29 16:26:53

by Michael Kelley

Subject: RE: [PATCH 1/1] swiotlb: Fix swiotlb_bounce() to do partial sync's correctly

From: Dominique Martinet <[email protected]> Sent: Thursday, March 28, 2024 10:09 PM
>
> Dominique Martinet wrote on Wed, Mar 27, 2024 at 03:05:18PM +0900:
> > Unfortunately that was ages ago, so I don't remember exactly which
> > device it was reproduced on... Given the Cc I'm sure Lukas had hit it
> > on the MNT Reform (i.MX8MQ), but I did say I tested it, so I could
> > probably reproduce on my i.MX8MP?
> > I'll try to reproduce the old bug, and if I do, test your fix over the
> > next week.
>
> grmbl, sorry, I cannot reproduce the problem on the devices I have
> readily accessible, and I don't have time to dig out my Reform to test
> there in the foreseeable future, so I cannot confirm whether this also
> fixes the problem we reported two years ago.
>
> However, I had misunderstood your patch: I thought you were also
> reverting commit 5f89468e2f06 ("swiotlb: manipulate orig_addr when
> tlb_addr has offset"), but you're keeping it and just making the offset
> signed. That should cause no problem for the use case I was concerned
> about, as it fell within the bounds I had defined, so this is basically
> a no-op patch for my use case and I don't expect this particular
> failure to pop back up here.
>
> Code-wise, I agree the checks I added in commit 868c9ddc182b ("swiotlb:
> add overflow checks to swiotlb_bounce") are too strict - I failed to
> consider the device minimum alignment part of swiotlb_align_offset, and
> thinking it through, this can get weird.
> ... And now I'm looking again: even with a big minimum alignment it's
> also too strict, so, right, let's work through an example.
>
>
> From my understanding of how orig_addr is computed, in the multi-block
> case we have something like this:
>
> (+ marks the device's minimum alignment; the bigger blocks delimited
> by | are IO_TLB_SIZE blocks)
>
>           10        20        30        40        50        60
> 01234567890123456789012345678901234567890123456789012345678901234...
> |---+---+---+-block1+---+---+---|---+---+---+-block2+---+---+---|...
>        ^                               ^
>        mem->slots[n].orig_addr         mem->slots[n+1].orig_addr
>        (=7)                            (=32+7=39)
>
> (memo of the code with your patch:
> index = (tlb_addr - mem->start) >> IO_TLB_SHIFT;
> orig_addr = mem->slots[index].orig_addr;
> swiotlb_align_offset(dev, 0, orig_addr) = orig_addr & dev min align mask & (IO_TLB_SIZE-1)
> tlb_offset = (tlb_addr & (IO_TLB_SIZE - 1)) - swiotlb_align_offset(dev, 0, orig_addr);
> orig_addr += tlb_offset;
> alloc_size -= tlb_offset;
> vaddr = mem->vaddr + tlb_addr - mem->start
> )
>
> So for this example we have IO_TLB_SIZE=32, dev min alignment=4,
> orig_addr's align value would be 7%4=3.
> Given say tlb_addr at 33, we'd find slot n, and compute
> tlb_offset = 1 - 3 = -2
>
> ... And then I just don't follow anymore?
>
> Computing the rest mechanically, for the example's sake let's say n=0
> so mem->start=7, index=0, and also set size to 4 bytes, vaddr to 0x1000.

I think your logic goes awry here. mem->vaddr is just the virtual
address equivalent of mem->start. See swiotlb_init_io_tlb_pool().
The computation here of vaddr is a back-handed way of getting
the virtual address equivalent of tlb_addr. For the purposes of
this discussion, we can just use tlb_addr and ignore the virt vs.
phys difference.

Your example is saying that the originally mapped area started at
orig_addr 7. You didn't specify the original size, but let's assume
it is at least 40. Since you've specified that swiotlb_bounce() is
called with a tlb_addr of 33, let's assume the tlb_addr for the
full mapped area as returned by swiotlb_tbl_map_single() is 3.
It must end in 0x3 because with dev min alignment = 4, the
low order two bits of the tlb_addr of the full mapped area
must match the low order two bits of the orig_addr.

Continuing your example, subsequently swiotlb_bounce() is
called with a tlb_addr of 33, and size 4. So we want to copy
4 bytes starting at the 30th byte (33 - 3) of the originally mapped
area, which is address 7 + 30 = 37.

In this case,
* tlb_offset = 1 - 3 = -2, as you describe above
* orig_addr = 39 + -2 = 37. The computation uses 39 from
slot[1], not the 7 from slot[0]. This computed 37 is the
correct orig_addr to use for the memcpy().
* size is still 4. There's no computation in swiotlb_bounce()
that changes "size".
* alloc_size is pulled from slot[1], and is adjusted by tlb_offset.
This adjusted alloc_size isn't used for anything except as a sanity
check against "size".

So I think your example works correctly with the updated
code. Note that if a driver calls swiotlb_bounce() with a
tlb_addr that is out-of-range, swiotlb_bounce() can't detect
that. A bogus "size" parameter _is_ detected because of
the check against the adjusted alloc_size.
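
To make that concrete, here is the arithmetic of the patched code with
the example's toy values (a sketch only: IO_TLB_SIZE = 32 so
IO_TLB_SHIFT = 5, min alignment 4, and mem->start assumed to be 0 so
that tlb_addr 33 falls in slot 1):

	index      = (33 - 0) >> 5;           /* = 1, i.e. slot[1]    */
	orig_addr  = mem->slots[1].orig_addr; /* = 39                 */
	tlb_offset = (33 & 31) - (39 & 3);    /* = 1 - 3 = -2         */
	orig_addr += tlb_offset;              /* = 37, memcpy address */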

Michael

>
> vaddr = 0x1000 + 33 - 7 = 0x1000 + 26
> orig_addr = 7 + -2 = 5
> size = 4 - -2 = 6
>
> then we'd proceed to memcpy either (vaddr, p2v(orig_addr), size) or the
> other way around, but this cannot be right:
> - size is bigger than what was requested, I fail to see how that can be
> allowed. I'd understand a smaller size assuming swiotlb_bounce gets
> called for each interval, but not a bigger size.
> - orig_addr is nowhere near 33.
>
>
> I thought this didn't make sense because of the minimum device
> alignment, but even with device alignment >= io tlb's with the very same
> example we would get
> tlb_offset = 1 - 7 = -6
> now that one could make sense if we had used the following slot e.g.
> orig_addr being slot[1].orig_addr and we'd get back to 31, but that's
> not the case, and the size calculation is still off.
>
>
> So, long story short it took me half a day to get back into this code
> and the only thing I understand about it is that I don't understand it.
>
> I'm sure it works in most cases because everything is nicely aligned
> (since nobody complained about my checks before, and if there's no
> warning with them the code works), but I'd require some convincing to
> give a Reviewed-by tag to this patch.
>
> Thanks for working on this though; I'll be happy to have the flaws in
> my logic pointed out, or to look at another attempt...!
> --
> Dominique

2024-03-30 02:56:01

by Dominique Martinet

Subject: Re: [PATCH 1/1] swiotlb: Fix swiotlb_bounce() to do partial sync's correctly

Michael Kelley wrote on Fri, Mar 29, 2024 at 03:18:16PM +0000:
> * tlb_offset = 1 - 3 = -2, as you describe above
> * orig_addr = 39 + -2 = 37. The computation uses 39 from
> slot[1], not the 7 from slot[0]. This computed 37 is the
> correct orig_addr to use for the memcpy().

There are two things I don't understand here:
1/ Why orig_addr would come from slot[1] ?

We have index = (tlb_addr - mem->start) >> IO_TLB_SHIFT,
so index = (33 - 7) >> 5 = 26 >> 5 = 0

As such, orig_addr = mem->slots[0].orig_addr and we'd need the offset to
be 30, not -2 ?
Well, either work - if we fix index to point to the next slot in the
negative case that's also acceptable if we're sure it's valid, but I'm
worried it might not be in cases there was only one slot e.g. mapping
[7; 34] and calling with 33 size 2 would try to access slot 1 with a
negative offset in your example, but slot[0] is the last valid slot.


2/ Why is orig_addr 37 the correct address to use for memcpy, and not
33? I'd think it's off by a "minimum alignment page", for me this
computation only works if the dma_get_min_align size is bigger than io
tlb size.


> * size is still 4. There's no computation in swiotlb_bounce()
> that changes "size".
> * alloc_size is pulled from slot[1], and is adjusted by tlb_offset.
> This adjusted alloc_size isn't used for anything except as a sanity
> check against "size".

Right, sorry - so size is ok (assuming slot[1] is used; I conflated the
two sizes).


I'm probably still missing something here, thanks for bearing with me.
--
Dominique

2024-03-30 04:16:56

by Michael Kelley

Subject: RE: [PATCH 1/1] swiotlb: Fix swiotlb_bounce() to do partial sync's correctly

From: Dominique Martinet <[email protected]> Sent: Friday, March 29, 2024 7:56 PM
>
> Michael Kelley wrote on Fri, Mar 29, 2024 at 03:18:16PM +0000:
> > * tlb_offset = 1 - 3 = -2, as you describe above
> > * orig_addr = 39 + -2 = 37. The computation uses 39 from
> > slot[1], not the 7 from slot[0]. This computed 37 is the
> > correct orig_addr to use for the memcpy().
>
> There are two things I don't understand here:
> 1/ Why orig_addr would come from slot[1] ?
>
> We have index = (tlb_addr - mem->start) >> IO_TLB_SHIFT,
> so index = (33 - 7) >> 5 = 26 >> 5 = 0
>
> As such, orig_addr = mem->slots[0].orig_addr and we'd need the offset to
> be 30, not -2 ?

mem->start is the physical address of the global pool of
memory allocated for swiotlb buffers. That global pool defaults
to 64 Mbytes in size, and is allocated starting on a page boundary
(which is important). It is divided into "slots", which in a real
kernel are 2 Kbytes each (IO_TLB_SHIFT is 11). When a mapping
is created, swiotlb_tbl_map_single() returns a tlb_addr between
mem->start and mem->start + 64 Mbytes - 1. The slot index
for any tlb_addr can be obtained by doing

index = (tlb_addr - mem->start) >> IO_TLB_SHIFT;

assuming that mem->start is on a boundary that is at least
as big as IO_TLB_SHIFT. So your example of mem->start being
7 isn't valid. If we're using an example where tlb_addr "3" is
returned from swiotlb_tbl_map_single(), and tlb_addr 33
is passed as the argument to swiotlb_bounce(), then
mem->start would need to be zero for things to work. If
mem->start is zero, then tlb_addr's 0 thru 31 are in slot 0,
and tlb_addr's 32 thru 63 are in slot 1, etc.
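
With the real constants the same arithmetic looks like this (sketch;
the addresses are made up for illustration):

	/* IO_TLB_SHIFT is 11, i.e. 2 Kbyte slots */
	index = (tlb_addr - mem->start) >> IO_TLB_SHIFT;
	/* e.g. mem->start = 0x40000000, tlb_addr = 0x40000850 -> index 1 */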

> Well, either work - if we fix index to point to the next slot in the
> negative case that's also acceptable if we're sure it's valid, but I'm
> worried it might not be in cases there was only one slot e.g. mapping
> [7; 34] and calling with 33 size 2 would try to access slot 1 with a
> negative offset in your example, but slot[0] is the last valid slot.

Right, but there wouldn't be one slot mapping [7; 34] if the
alignment rules are followed when the global swiotlb memory
pool is originally created. The low order IO_TLB_SHIFT bits
of slot physical addresses must be zero for the arithmetic
using shifts to work, so [7; 34] will cross a slot boundary and
two slots are needed.

>
>
> 2/ Why is orig_addr 37 the correct address to use for memcpy, and not
> 33? I'd think it's off by a "minimum alignment page", for me this
> computation only works if the dma_get_min_align size is bigger than io
> tlb size.

The swiotlb mapping operation establishes a pair-wise mapping between
an orig_addr and tlb_addr, with the mapping extending for a specified
number of bytes. Your example started with orig_addr = 7, and I
posited that the mapping extends for 40 bytes. I further posited
that the tlb_addr returned by swiotlb_tbl_map_single() would
be 3 to meet the min alignment requirement (which again
only works if mem->start is 0). So the pair-wise mapping is (7, 3).
Now when swiotlb_bounce() is called with a tlb_addr of 33 and
a size of 4, we know:

* tlb_addr 33 is in slot 1
* tlb_addr 33 is 30 bytes (33 - 3) from the beginning of the
swiotlb area allocated for the mapping by swiotlb_tbl_map_single()
* So we want to add 30 bytes to the orig_addr to get the address
where we want to do the memcpy. That is 7 + 30 = 37.
* The swiotlb area allocated for the mapping goes from
tlb_addr 3 through tlb_addr 43 since the size of the mapping
is 40 bytes. If we do a partial sync of 4 bytes starting at
tlb_addr 33, then that's valid because all 4 bytes fit within
the originally mapped 40 bytes.
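
In other words, the partial-sync arithmetic reduces to this invariant
(sketch; "tlb_base" and "orig_base" name the tlb_addr/orig_addr pair
established when the mapping was created, and are not variables in the
real code):

	/* The memcpy address preserves the offset into the mapping:
	 * 7 + (33 - 3) = 37 in this example.
	 */
	memcpy_addr = orig_base + (tlb_addr - tlb_base);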

Michael

>
>
> > * size is still 4. There's no computation in swiotlb_bounce()
> > that changes "size".
> > * alloc_size is pulled from slot[1], and is adjusted by tlb_offset.
> > This adjusted alloc_size isn't used for anything except as a sanity
> > check against "size".
>
> Right, sorry - so size is ok (assuming slot[1] is used, I conflated the
> two sizes.
>
>
> I'm probably still missing something here, thanks for bearing with me.
> --
> Dominique

2024-04-01 07:56:27

by Dominique Martinet

Subject: Re: [PATCH 1/1] swiotlb: Fix swiotlb_bounce() to do partial sync's correctly

Michael Kelley wrote on Sat, Mar 30, 2024 at 04:16:30AM +0000:
> From: Dominique Martinet <[email protected]> Sent: Friday, March 29, 2024 7:56 PM
> > There are two things I don't understand here:
> > 1/ Why orig_addr would come from slot[1] ?
> >
> > We have index = (tlb_addr - mem->start) >> IO_TLB_SHIFT,
> > so index = (33 - 7) >> 5 = 26 >> 5 = 0
> >
> > As such, orig_addr = mem->slots[0].orig_addr and we'd need the offset to
> > be 30, not -2 ?
>
> mem->start is the physical address of the global pool of
> memory allocated for swiotlb buffers.

Argh.
Okay, that clears up a misunderstanding I've had since day one... I
should have done a little more reading there.
I've re-checked now and indeed mem->start comes from the pool init, and
corresponds to the reserved memory base.
(I'm not actually 100% sure reserved memory has to be aligned, but at the
very least I've never seen any that isn't on my hardware so I'll pretend
it must be without checking)

That makes much more sense with your fix, I agree offset must be
negative in that case, and it'll work out.

> > Well, either work - if we fix index to point to the next slot in the
> > negative case that's also acceptable if we're sure it's valid, but I'm
> > worried it might not be in cases there was only one slot e.g. mapping
> > [7; 34] and calling with 33 size 2 would try to access slot 1 with a
> > negative offset in your example, but slot[0] is the last valid slot.
>
> Right, but there wouldn't be one slot mapping [7; 34] if the
> alignment rules are followed when the global swiotlb memory
> pool is originally created. The low order IO_TLB_SHIFT bits
> of slot physical addresses must be zero for the arithmetic
> using shifts to work, so [7; 34] will cross a slot boundary and
> two slots are needed.

Yes, since mem->start and the slots belong to the pool and not the
mapping, this didn't make sense either; there's no problem here.

> > 2/ Why is orig_addr 37 the correct address to use for memcpy, and not
> > 33? I'd think it's off by a "minimum alignment page", for me this
> > computation only works if the dma_get_min_align size is bigger than io
> > tlb size.
>
> The swiotlb mapping operation establishes a pair-wise mapping between
> an orig_addr and tlb_addr, with the mapping extending for a specified
> number of bytes. Your example started with orig_addr = 7, and I
> posited that the mapping extends for 40 bytes.

Sure.

> I further posited that the tlb_addr returned by
> swiotlb_tbl_map_single() would be 3 to meet the min alignment
> requirement (which again only works if mem->start is 0).

Okay that's where I'm lost.
1/ I agree that swiotlb_bounce() called from swiotlb_tbl_map_single()
cannot be called with a tlb_addr past a single segment (which I'm not
sure is acceptable in itself, taking the real value of 2KB for io tlb
"pages", if the device requires 512 bytes alignment you won't be able to
access [512-2048[ ?)
2/ swiotlb_bounce() can be called from swiotlb_sync_single_for_device()
or swiotlb_sync_single_for_cpu() with no alignment check on tlb_addr,
we're just trusting that they only ever pass address within the same
constraint ?

If you assume tlb addr can only go from the start of a slot to as far as
the min alignment allows then I agree there's no more problem, but I
don't understand where that comes from.


Either way, that's not a new problem, and the old checks aren't making
this any better so as far as I'm concerned this patch is Progress:
Reviewed-by: Dominique Martinet <[email protected]>


Thanks for taking me by the hand here; if you want to keep discussing
this I'll be happy to give it a little bit more time but I might be a
little bit slower.
--
Dominique

2024-04-01 15:05:28

by Michael Kelley

Subject: RE: [PATCH 1/1] swiotlb: Fix swiotlb_bounce() to do partial sync's correctly

From: Dominique Martinet <[email protected]> Sent: Monday, April 1, 2024 12:56 AM
>
> Michael Kelley wrote on Sat, Mar 30, 2024 at 04:16:30AM +0000:
> > From: Dominique Martinet <[email protected]>
> Sent: Friday, March 29, 2024 7:56 PM
> > > There are two things I don't understand here:
> > > 1/ Why orig_addr would come from slot[1] ?
> > >
> > > We have index = (tlb_addr - mem->start) >> IO_TLB_SHIFT,
> > > so index = (33 - 7) >> 5 = 26 >> 5 = 0
> > >
> > > As such, orig_addr = mem->slots[0].orig_addr and we'd need the offset to
> > > be 30, not -2 ?
> >
> > mem->start is the physical address of the global pool of
> > memory allocated for swiotlb buffers.
>
> Argh.
> Okay, that clears up a misunderstanding I've had since day one... I
> should have done a little more reading there.
> I've re-checked now and indeed mem->start comes from the pool init, and
> corresponds to the reserved memory base.
> (I'm not actually 100% sure reserved memory has to be aligned, but at the
> very least I've never seen any that isn't on my hardware so I'll pretend
> it must be without checking)

That memory is allocated in swiotlb_memblock_alloc() with a
call to memblock_alloc(). The second parameter passed to
memblock_alloc() is PAGE_SIZE, which is the required alignment.
There's a complicated dependency that this value must be >=
the largest value that's ever used for the min alignment. Only
a few drivers specify a min alignment, and the largest value used
is 4K, so that dependency is met by passing PAGE_SIZE.
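
For reference, the allocation in question is roughly (paraphrasing
swiotlb_memblock_alloc(); the second argument is the alignment):

	/* The pool is PAGE_SIZE aligned, which must be >= the largest
	 * min_align_mask any driver uses (currently 4K).
	 */
	tlb = memblock_alloc(bytes, PAGE_SIZE);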

>
> That makes much more sense with your fix, I agree offset must be
> negative in that case, and it'll work out.
>
> > > Well, either work - if we fix index to point to the next slot in the
> > > negative case that's also acceptable if we're sure it's valid, but I'm
> > > worried it might not be in cases there was only one slot e.g. mapping
> > > [7; 34] and calling with 33 size 2 would try to access slot 1 with a
> > > negative offset in your example, but slot[0] is the last valid slot.
> >
> > Right, but there wouldn't be one slot mapping [7; 34] if the
> > alignment rules are followed when the global swiotlb memory
> > pool is originally created. The low order IO_TLB_SHIFT bits
> > of slot physical addresses must be zero for the arithmetic
> > using shifts to work, so [7; 34] will cross a slot boundary and
> > two slots are needed.
>
> Yes, since the mem->start/slots belongs to the pool and not the mapping
> this didn't make sense either; there's no problem here.
>
> > > 2/ Why is orig_addr 37 the correct address to use for memcpy, and not
> > > 33? I'd think it's off by a "minimum alignment page", for me this
> > > computation only works if the dma_get_min_align size is bigger than io
> > > tlb size.
> >
> > The swiotlb mapping operation establishes a pair-wise mapping between
> > an orig_addr and tlb_addr, with the mapping extending for a specified
> > number of bytes. Your example started with orig_addr = 7, and I
> > posited that the mapping extends for 40 bytes.
>
> Sure.
>
> > I further posited that the tlb_addr returned by
> > swiotlb_tbl_map_single() would be 3 to meet the min alignment
> > requirement (which again only works if mem->start is 0).
>
> Okay that's where I'm lost.
> 1/ I agree that swiotlb_bounce() called from swiotlb_tbl_map_single()
> cannot be called with a tlb_addr past a single segment (which I'm not
> sure is acceptable in itself, taking the real value of 2KB for io tlb
> "pages", if the device requires 512 bytes alignment you won't be able to
> access [512-2048[ ?)

The swiotlb_bounce() call within swiotlb_tbl_map_single() is always
made with the tlb_addr that swiotlb_tbl_map_single() will return to
the calling driver, and for a size that is the mapping size passed as
a parameter to swiotlb_tbl_map_single(). In other words, that
swiotlb_bounce() call is never a partial sync -- it always sync's the
entire mapped area. And there's no problem with accessing the
entire mapped area as swiotlb_bounce() can't validate the
tlb_addr that's passed in. It _can_ validate the size, but the size
passed in is always valid if it is bouncing exactly the entire mapped
area.
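
That is, the call inside swiotlb_tbl_map_single() is roughly:

	/* Always a full-size bounce of the entire mapped area,
	 * never a partial sync.
	 */
	swiotlb_bounce(dev, tlb_addr, mapping_size, DMA_TO_DEVICE);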

> 2/ swiotlb_bounce() can be called from swiotlb_sync_single_for_device()
> or swiotlb_sync_single_for_cpu() with no alignment check on tlb_addr,
> we're just trusting that they only ever pass address within the same
> constraint ?

Yes, this is the case that is at issue in this entire discussion. When
the calling device driver is doing a sync of only part of a mapped
area (i.e., a "partial sync"), the tlb_addr passed to swiotlb_bounce()
might not be the tlb_addr returned by swiotlb_tbl_map_single(). It
might be that tlb_addr plus some increment. And indeed,
there's no way for swiotlb_bounce() to validate that the tlb_addr
plus the increment is really within the swiotlb slots allocated for
the mapping. So the swiotlb code is trusting that the calling
device driver isn't doing something bogus.

But in the bigger picture, swiotlb_tbl_map_single() returns a
tlb_addr to the device driver. We already assume that the device
driver will pass that value to the actual device for it to do the
DMA. If the device driver is badly behaved and passes a bad
tlb_addr to the device for the DMA, it can already wreak havoc.
So not being able to validate the tlb_addr passed to
swiotlb_bounce() doesn't really make things any worse. Drivers
must be well-behaved.

>
> If you assume tlb addr can only go from the start of a slot to as far as
> the min alignment allows then I agree there's no more problem, but I
> don't understand where that comes from.

No, the code doesn't assume that. The tlb_addr and size passed to
swiotlb_bounce() via one of the "swiotlb_sync_*()" calls can
legitimately identify any subset of the originally mapped area. If
the originally mapped area occupies 10 slots in the swiotlb, then
swiotlb_bounce() can sync any subset range across those 10 slots
(starting after the minimum alignment "padding" in the 1st slot).
For example, the tlb_addr might point into the 9th of the 10 slots,
as long as that tlb_addr plus the size doesn't extend past the end
of the originally mapped area.
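
From the driver side such a partial sync is just, e.g. (hypothetical
sketch; "offset" and "len" are any subrange of the original mapping):

	/* Legitimate as long as dma_handle + offset + len stays
	 * within the originally mapped area.
	 */
	dma_sync_single_for_cpu(dev, dma_handle + offset, len,
				DMA_FROM_DEVICE);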

>
>
> Either way, that's not a new problem, and the old checks aren't making
> this any better so as far as I'm concerned this patch is Progress:
> Reviewed-by: Dominique Martinet <[email protected]>
>

Thanks.

>
> Thanks for taking me by the hand here; if you want to keep discussing
> this I'll be happy to give it a little bit more time but I might be a
> little bit slower.

No problem.

> --
> Dominique

2024-04-02 15:10:32

by Christoph Hellwig

Subject: Re: [PATCH 1/1] swiotlb: Fix swiotlb_bounce() to do partial sync's correctly

Thanks,

applied to the dma-mapping tree for Linux 6.9-rc.