2021-06-08 02:45:00

by 'Dominique MARTINET'

[permalink] [raw]
Subject: swiotlb/caamjr regression (Was: [GIT PULL] (swiotlb) stable/for-linus-5.12)

I'm not able to find any individual mails for Christoph's patches so
replying to the PR.

In particular, this commit:
Konrad Rzeszutek Wilk wrote on Fri, Feb 26, 2021 at 11:00:08AM -0500:
> Christoph Hellwig (8):
> swiotlb: don't modify orig_addr in swiotlb_tbl_sync_single

merged as 16fc3cef33a0, breaks caam as used today (thanks Lukas for
bisecting it!)


More precisely, drivers/crypto/caam/caamalg.c does a lot of mappings
that aren't aligned:

dma_sync_single_for_device(jrdev, ctx->sh_desc_enc_dma,
desc_bytes(desc), ctx->dir);
dma_sync_single_for_device(jrdev, ctx->sh_desc_dec_dma,
desc_bytes(desc), ctx->dir);


which can be caught by crypto tests with this caam enabled, for example
adding a warning when an unaligned mapping happens I get this trace:
--------
[ 1628.670226] swiotlb_tbl_sync_single+0x74/0xa0
[ 1628.674677] dma_sync_single_for_device+0xe4/0x110
[ 1628.679472] skcipher_setkey+0xd0/0xf0
[ 1628.683224] des3_skcipher_setkey+0x74/0xac
[ 1628.687416] crypto_skcipher_setkey+0x54/0x110
[ 1628.691866] crypto_authenc_setkey+0x94/0xd0
[ 1628.696138] crypto_aead_setkey+0x34/0x10c
[ 1628.700236] test_aead_vec_cfg+0x3a0/0x770
[ 1628.704338] test_aead+0xac/0x130
[ 1628.707656] alg_test_aead+0xa8/0x190
[ 1628.711324] alg_test.part.0+0xf4/0x41c
[ 1628.715161] alg_test+0x1c/0x60
[ 1628.718307] do_test+0x37ec/0x4c50
[ 1628.721709] do_test+0x4bec/0x4c50
[ 1628.725114] tcrypt_mod_init+0x54/0xac
[ 1628.728864] do_one_initcall+0x4c/0x1b0
[ 1628.732701] kernel_init_freeable+0x1d0/0x234
[ 1628.737060] kernel_init+0x10/0x114
[ 1628.740550] ret_from_fork+0x10/0x24
-----

and the tests themselves also fail (all or at least most of them) with
e.g.
------
[ 8.454233] caam_jr 30901000.jr: 40001713: DECO: desc idx 23: Header Error. Invalid length or parity, or certain other problems.
[ 8.465820] alg: ahash: hmac-sha256-caam final() failed with err -22 on test vector 0, cfg="init+update+final aligned buffer"
[ 8.477149] ------------[ cut here ]------------
[ 8.481781] alg: self-tests for hmac-sha256-caam (hmac(sha256)) failed (rc=-22)
[ 8.481818] WARNING: CPU: 2 PID: 295 at crypto/testmgr.c:5645 alg_test.part.0+0x128/0x41c
[ 8.497307] Modules linked in:
[ 8.500365] CPU: 2 PID: 295 Comm: cryptomgr_test Tainted: G W 5.13.0-rc5-00002-gc98cdee6172e #23
[ 8.510455] Hardware name: NXP i.MX8MPlus EVK board (DT)
[ 8.515767] pstate: 60000005 (nZCv daif -PAN -UAO -TCO BTYPE=--)
[ 8.521778] pc : alg_test.part.0+0x128/0x41c
[ 8.526050] lr : alg_test.part.0+0x128/0x41c
[ 8.530324] sp : ffff80001371bd10
[ 8.533637] x29: ffff80001371bd10 x28: 000000000000008f x27: 000000000000008f
[ 8.540785] x26: 000000000000008f x25: 0000000000000400 x24: ffff8000111658c8
[ 8.547930] x23: ffff0000c02aaa80 x22: 000000000001008f x21: ffff0000c02aaa00
[ 8.555075] x20: 0000000000000085 x19: 00000000ffffffea x18: 00000000fffffffc
[ 8.562221] x17: 0000000000000001 x16: 0000000000000003 x15: 0000000000000020
[ 8.569365] x14: ffffffffffffffff x13: 00000000000003e7 x12: ffff80001371b9e0
[ 8.576511] x11: ffff80001188c940 x10: ffff800011844300 x9 : ffff800011886b98
[ 8.583658] x8 : ffff80001182eb98 x7 : ffff800011886b98 x6 : ffffffffffff0888
[ 8.590801] x5 : 0000000000000000 x4 : 0000000000000000 x3 : 00000000ffffffff
[ 8.597945] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff0000c1684e00
[ 8.605093] Call trace:
[ 8.607540] alg_test.part.0+0x128/0x41c
[ 8.611467] alg_test+0x1c/0x60
[ 8.614608] cryptomgr_test+0x28/0x50
[ 8.618275] kthread+0x154/0x160
[ 8.621511] ret_from_fork+0x10/0x24
[ 8.625088] ---[ end trace 2d195377ee3c219e ]---
------



Looking at it a bit further it seems to me that swiotlb_bounce() should
either keep the offset (re-adding the line that was removed except it
would go back in swiotlb_bounce, diff at end of mail), or the size
should be adjusted to cover from the start of the page up until the
original offset + size which would also probably work (not tested)

That, or make unaligned mappings forbidden and warn when we see one, but
I have no idea what other component could be doing some -- I'm not sure
if what the caam code does it legitimate (e.g. would it be possible to
do the mappings once at init and use them?), but the swiotlb code
doesn't look quite right.


For now I'll just revert this locally but there must have been a reason
the adjustment got removed in the first place, what's the best way
forward?





--- (for 5.13-rc1+)
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 8ca7d505d61c..3acdd77edfed 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -349,6 +349,7 @@ static void swiotlb_bounce(struct device *dev, phys_addr_t tlb_addr, size_t size

if (orig_addr == INVALID_PHYS_ADDR)
return;
+ orig_addr += (unsigned long)tlb_addr & (IO_TLB_SIZE - 1);

if (size > alloc_size) {
dev_WARN_ONCE(dev, 1,
---

Thanks,
--
Dominique


2021-06-10 14:52:48

by Horia Geantă

[permalink] [raw]
Subject: Re: swiotlb/caamjr regression (Was: [GIT PULL] (swiotlb) stable/for-linus-5.12)

On 6/8/2021 5:35 AM, Dominique MARTINET wrote:
> I'm not able to find any individual mails for Christoph's patches so
> replying to the PR.
>
The patch set is here:
https://lore.kernel.org/linux-iommu/[email protected]

> In particular, this commit:
> Konrad Rzeszutek Wilk wrote on Fri, Feb 26, 2021 at 11:00:08AM -0500:
>> Christoph Hellwig (8):
>> swiotlb: don't modify orig_addr in swiotlb_tbl_sync_single
>
> merged as 16fc3cef33a0, breaks caam as used today (thanks Lukas for
> bisecting it!)
>
Thanks.

I've noticed the failure also in v5.10 and v5.11 stable kernels,
since the patch set has been backported.

>
> More precisely, drivers/crypto/caam/caamalg.c does a lot of mappings
> that aren't aligned:
>
> dma_sync_single_for_device(jrdev, ctx->sh_desc_enc_dma,
> desc_bytes(desc), ctx->dir);
> dma_sync_single_for_device(jrdev, ctx->sh_desc_dec_dma,
> desc_bytes(desc), ctx->dir);
>
Right. These dma sync ops are in caaamalg.c and should be fixed.

OTOH there are other dma sync ops in caam driver - e.g. caamhash.c:
dma_sync_single_for_device(jrdev, ctx->sh_desc_update_dma,
desc_bytes(desc), ctx->dir);
where the mappings are aligned (see struct caam_hash_ctx),
but even in this case the crypto algorithms are failing.

>
> which can be caught by crypto tests with this caam enabled, for example
> adding a warning when an unaligned mapping happens I get this trace:
> --------
> [ 1628.670226] swiotlb_tbl_sync_single+0x74/0xa0
> [ 1628.674677] dma_sync_single_for_device+0xe4/0x110
> [ 1628.679472] skcipher_setkey+0xd0/0xf0
> [ 1628.683224] des3_skcipher_setkey+0x74/0xac
> [ 1628.687416] crypto_skcipher_setkey+0x54/0x110
> [ 1628.691866] crypto_authenc_setkey+0x94/0xd0
> [ 1628.696138] crypto_aead_setkey+0x34/0x10c
> [ 1628.700236] test_aead_vec_cfg+0x3a0/0x770
> [ 1628.704338] test_aead+0xac/0x130
> [ 1628.707656] alg_test_aead+0xa8/0x190
> [ 1628.711324] alg_test.part.0+0xf4/0x41c
> [ 1628.715161] alg_test+0x1c/0x60
> [ 1628.718307] do_test+0x37ec/0x4c50
> [ 1628.721709] do_test+0x4bec/0x4c50
> [ 1628.725114] tcrypt_mod_init+0x54/0xac
> [ 1628.728864] do_one_initcall+0x4c/0x1b0
> [ 1628.732701] kernel_init_freeable+0x1d0/0x234
> [ 1628.737060] kernel_init+0x10/0x114
> [ 1628.740550] ret_from_fork+0x10/0x24
> -----
>
> and the tests themselves also fail (all or at least most of them) with
> e.g.
> ------
> [ 8.454233] caam_jr 30901000.jr: 40001713: DECO: desc idx 23: Header Error. Invalid length or parity, or certain other problems.
> [ 8.465820] alg: ahash: hmac-sha256-caam final() failed with err -22 on test vector 0, cfg="init+update+final aligned buffer"
> [ 8.477149] ------------[ cut here ]------------
> [ 8.481781] alg: self-tests for hmac-sha256-caam (hmac(sha256)) failed (rc=-22)
> [ 8.481818] WARNING: CPU: 2 PID: 295 at crypto/testmgr.c:5645 alg_test.part.0+0x128/0x41c
> [ 8.497307] Modules linked in:
> [ 8.500365] CPU: 2 PID: 295 Comm: cryptomgr_test Tainted: G W 5.13.0-rc5-00002-gc98cdee6172e #23
> [ 8.510455] Hardware name: NXP i.MX8MPlus EVK board (DT)
> [ 8.515767] pstate: 60000005 (nZCv daif -PAN -UAO -TCO BTYPE=--)
> [ 8.521778] pc : alg_test.part.0+0x128/0x41c
> [ 8.526050] lr : alg_test.part.0+0x128/0x41c
> [ 8.530324] sp : ffff80001371bd10
> [ 8.533637] x29: ffff80001371bd10 x28: 000000000000008f x27: 000000000000008f
> [ 8.540785] x26: 000000000000008f x25: 0000000000000400 x24: ffff8000111658c8
> [ 8.547930] x23: ffff0000c02aaa80 x22: 000000000001008f x21: ffff0000c02aaa00
> [ 8.555075] x20: 0000000000000085 x19: 00000000ffffffea x18: 00000000fffffffc
> [ 8.562221] x17: 0000000000000001 x16: 0000000000000003 x15: 0000000000000020
> [ 8.569365] x14: ffffffffffffffff x13: 00000000000003e7 x12: ffff80001371b9e0
> [ 8.576511] x11: ffff80001188c940 x10: ffff800011844300 x9 : ffff800011886b98
> [ 8.583658] x8 : ffff80001182eb98 x7 : ffff800011886b98 x6 : ffffffffffff0888
> [ 8.590801] x5 : 0000000000000000 x4 : 0000000000000000 x3 : 00000000ffffffff
> [ 8.597945] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff0000c1684e00
> [ 8.605093] Call trace:
> [ 8.607540] alg_test.part.0+0x128/0x41c
> [ 8.611467] alg_test+0x1c/0x60
> [ 8.614608] cryptomgr_test+0x28/0x50
> [ 8.618275] kthread+0x154/0x160
> [ 8.621511] ret_from_fork+0x10/0x24
> [ 8.625088] ---[ end trace 2d195377ee3c219e ]---
> ------
>
>
>
> Looking at it a bit further it seems to me that swiotlb_bounce() should
> either keep the offset (re-adding the line that was removed except it
> would go back in swiotlb_bounce, diff at end of mail), or the size
> should be adjusted to cover from the start of the page up until the
> original offset + size which would also probably work (not tested)
>
> That, or make unaligned mappings forbidden and warn when we see one, but
> I have no idea what other component could be doing some -- I'm not sure
> if what the caam code does it legitimate (e.g. would it be possible to
> do the mappings once at init and use them?), but the swiotlb code
> doesn't look quite right.
>
Well, it's not only about unaligned accesses.

It's also about partial syncs, e.g.
dma_handle = dma_map_single(dev, cpu_addr, size, DMA_BIDIRECTIONAL);
[...]
dma_sync_single_for_device(dev, dma_handle + offset, size - offset,
DMA_BIDIRECTIONAL);
(where dma_handle + offset should be cacheline-aligned).

Documentation/core-api/dma-api.rst explicitly allows for partial syncs:
Synchronise a single contiguous or scatter/gather mapping for the CPU
and device. With the sync_sg API, all the parameters must be the same
as those passed into the single mapping API. With the sync_single API,
you can use dma_handle and size parameters that aren't identical to
those passed into the single mapping API to do a partial sync.

AFAICS commit 16fc3cef33a0 ("swiotlb: don't modify orig_addr in swiotlb_tbl_sync_single")
is breaking this functionality.

Thanks,
Horia

2021-06-10 19:43:18

by Linus Torvalds

[permalink] [raw]
Subject: Re: swiotlb/caamjr regression (Was: [GIT PULL] (swiotlb) stable/for-linus-5.12)

On Thu, Jun 10, 2021 at 7:52 AM Horia Geantă <[email protected]> wrote:
>
> Documentation/core-api/dma-api.rst explicitly allows for partial syncs:
> Synchronise a single contiguous or scatter/gather mapping for the CPU
> and device. With the sync_sg API, all the parameters must be the same
> as those passed into the single mapping API. With the sync_single API,
> you can use dma_handle and size parameters that aren't identical to
> those passed into the single mapping API to do a partial sync.
>
> AFAICS commit 16fc3cef33a0 ("swiotlb: don't modify orig_addr in swiotlb_tbl_sync_single")
> is breaking this functionality.

How about a patch like the attached? Does that fix things for you.

Christoph? Comments - that commit removed the offset calculation
entirely, because the old

(unsigned long)tlb_addr & (IO_TLB_SIZE - 1)

was wrong, but instead of removing it, I think it should have just
fixed it to be

(tlb_addr - mem->start) & (IO_TLB_SIZE - 1);

instead. That way the slot offset always matches the slot index calculation.

I also made it then take the offset into account for the alloc_size checks.

Does this UNTESTED patch perhaps do the right thing?

Linus


Attachments:
patch.diff (1.10 kB)

2021-06-10 23:21:22

by Horia Geantă

[permalink] [raw]
Subject: Re: swiotlb/caamjr regression (Was: [GIT PULL] (swiotlb) stable/for-linus-5.12)

On 6/10/2021 10:41 PM, Linus Torvalds wrote:
>
> How about a patch like the attached? Does that fix things for you.
>
Yes, it fixes the caam driver regression.

Tested-by: Horia Geantă <[email protected]>

on top of next-20210610.

Thank you,
Horia

2021-06-11 06:22:31

by Christoph Hellwig

[permalink] [raw]
Subject: Re: swiotlb/caamjr regression (Was: [GIT PULL] (swiotlb) stable/for-linus-5.12)

On Thu, Jun 10, 2021 at 05:52:07PM +0300, Horia Geantă wrote:
> I've noticed the failure also in v5.10 and v5.11 stable kernels,
> since the patch set has been backported.

FYI, there has been a patch on the list that should have fixed this
for about a month:

https://lore.kernel.org/linux-iommu/[email protected]/T/#m0d0df6490350a08dcc24c9086c8edc165b402d6f

but it seems like it never got picked up.

2021-06-11 10:38:19

by Konrad Rzeszutek Wilk

[permalink] [raw]
Subject: Re: swiotlb/caamjr regression (Was: [GIT PULL] (swiotlb) stable/for-linus-5.12)

On Fri, Jun 11, 2021 at 08:21:53AM +0200, Christoph Hellwig wrote:
> On Thu, Jun 10, 2021 at 05:52:07PM +0300, Horia Geantă wrote:
> > I've noticed the failure also in v5.10 and v5.11 stable kernels,
> > since the patch set has been backported.
>
> FYI, there has been a patch on the list that should have fixed this
> for about a month:
>
> https://lore.kernel.org/linux-iommu/[email protected]/T/#m0d0df6490350a08dcc24c9086c8edc165b402d6f
>
> but it seems like it never got picked up.

Yikes!

Dominique,

Would you be up to testing the attached (and inline) patch please?

Linus,

Would you be terribly offended if I took your code (s/unsigned
long/unsigned int), and used Chanho's description of the problem (see below)?

Christoph,
I took the liberty of putting your Reviewed-by on the patch, you OK with
that?

Jianxiong,
Would you be up for testing this patch on your NVMe rig please? I don't
forsee a problem.. but just in case

From f06da55596675383fbf2563fe2919b2a8f68901b Mon Sep 17 00:00:00 2001
From: Linus Torvalds <[email protected]>
Date: Thu, 10 Jun 2021 12:41:26 -0700
Subject: [PATCH] swiotlb: manipulate orig_addr when tlb_addr has offset
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

in case of driver wants to sync part of ranges with offset,
swiotlb_tbl_sync_single() copies from orig_addr base to tlb_addr with
offset and ends up with data mismatch.

It was removed from
"swiotlb: don't modify orig_addr in swiotlb_tbl_sync_single",
but said logic has to be added back in.

[From Linus's email:
That commit which the removed the offset calculation entirely, because the old

(unsigned long)tlb_addr & (IO_TLB_SIZE - 1)

was wrong, but instead of removing it, I think it should have just
fixed it to be

(tlb_addr - mem->start) & (IO_TLB_SIZE - 1);

instead. That way the slot offset always matches the slot index calculation.]

The use-case that drivers are hitting is as follow:

1. Get dma_addr_t from dma_map_single()

dma_addr_t tlb_addr = dma_map_single(dev, vaddr, vsize, DMA_TO_DEVICE);

|<---------------vsize------------->|
+-----------------------------------+
| | original buffer
+-----------------------------------+
vaddr

swiotlb_align_offset
|<----->|<---------------vsize------------->|
+-------+-----------------------------------+
| | | swiotlb buffer
+-------+-----------------------------------+
tlb_addr

2. Do something
3. Sync dma_addr_t through dma_sync_single_for_device(..)

dma_sync_single_for_device(dev, tlb_addr + offset, size, DMA_TO_DEVICE);

Error case.
Copy data to original buffer but it is from base addr (instead of
base addr + offset) in original buffer:

swiotlb_align_offset
|<----->|<- offset ->|<- size ->|
+-------+-----------------------------------+
| | |##########| | swiotlb buffer
+-------+-----------------------------------+
tlb_addr

|<- size ->|
+-----------------------------------+
|##########| | original buffer
+-----------------------------------+
vaddr

The fix is to copy the data to the original buffer and take into
account the offset, like so:

swiotlb_align_offset
|<----->|<- offset ->|<- size ->|
+-------+-----------------------------------+
| | |##########| | swiotlb buffer
+-------+-----------------------------------+
tlb_addr

|<- offset ->|<- size ->|
+-----------------------------------+
| |##########| | original buffer
+-----------------------------------+
vaddr

[This patch text is from Bumyong's email; and his solution was very close
to Linus's, so incorporating his text]

Fixes: 16fc3cef33a0 ("swiotlb: don't modify orig_addr in swiotlb_tbl_sync_single")
Signed-off-by: Bumyong Lee <[email protected]>
Signed-off-by: Chanho Park <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reported-by: Dominique MARTINET <[email protected]>
Reported-by: Horia Geantă <[email protected]>
Tested-by: Horia Geantă <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
CC: [email protected]
Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
---
kernel/dma/swiotlb.c | 9 +++++++++
1 file changed, 9 insertions(+)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 8ca7d50..dc438b5 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -342,6 +342,7 @@ static void swiotlb_bounce(struct device *dev, phys_addr_t tlb_addr, size_t size
{
struct io_tlb_mem *mem = io_tlb_default_mem;
int index = (tlb_addr - mem->start) >> IO_TLB_SHIFT;
+ unsigned int offset = (tlb_addr - mem->start) & (IO_TLB_SIZE - 1);
phys_addr_t orig_addr = mem->slots[index].orig_addr;
size_t alloc_size = mem->slots[index].alloc_size;
unsigned long pfn = PFN_DOWN(orig_addr);
@@ -350,6 +351,14 @@ static void swiotlb_bounce(struct device *dev, phys_addr_t tlb_addr, size_t size
if (orig_addr == INVALID_PHYS_ADDR)
return;

+ if (offset > alloc_size) {
+ dev_WARN_ONCE(dev, 1,
+ "Buffer overflow detected. Offset: %lu. Mapping size: %zu.\n",
+ offset, size);
+ return;
+ }
+ alloc_size -= offset;
+ orig_addr += offset;
if (size > alloc_size) {
dev_WARN_ONCE(dev, 1,
"Buffer overflow detected. Allocation size: %zu. Mapping size: %zu.\n",
--
1.8.3.1


Attachments:
(No filename) (5.61 kB)
0001-swiotlb-manipulate-orig_addr-when-tlb_addr-has-offse.patch (4.65 kB)
Download all attachments

2021-06-11 11:00:38

by Horia Geantă

[permalink] [raw]
Subject: Re: swiotlb/caamjr regression (Was: [GIT PULL] (swiotlb) stable/for-linus-5.12)

On 6/11/2021 1:35 PM, Konrad Rzeszutek Wilk wrote:
> On Fri, Jun 11, 2021 at 08:21:53AM +0200, Christoph Hellwig wrote:
>> On Thu, Jun 10, 2021 at 05:52:07PM +0300, Horia Geantă wrote:
>>> I've noticed the failure also in v5.10 and v5.11 stable kernels,
>>> since the patch set has been backported.
>>
>> FYI, there has been a patch on the list that should have fixed this
>> for about a month:
>>
>> https://lore.kernel.org/linux-iommu/[email protected]/T/#m0d0df6490350a08dcc24c9086c8edc165b402d6f
>>
>> but it seems like it never got picked up.
>
> Yikes!
>
> Dominique,
>
> Would you be up to testing the attached (and inline) patch please?
>
> Linus,
>
> Would you be terribly offended if I took your code (s/unsigned
> long/unsigned int), and used Chanho's description of the problem (see below)?
>
Both patches work for my case.

However, there's yet another, possibly significant, difference b/w the two:
offset = (tlb_addr - mem->start) & (IO_TLB_SIZE - 1);
vs.
offset = (tlb_addr & (IO_TLB_SIZE - 1)) -
swiotlb_align_offset(dev, orig_addr);

I think accounting for the alignment offset (swiotlb_align_offset())
has to be kept.

Horia

2021-06-11 16:24:04

by Linus Torvalds

[permalink] [raw]
Subject: Re: swiotlb/caamjr regression (Was: [GIT PULL] (swiotlb) stable/for-linus-5.12)

On Fri, Jun 11, 2021 at 3:35 AM Konrad Rzeszutek Wilk
<[email protected]> wrote:
>
> Linus,
>
> Would you be terribly offended if I took your code (s/unsigned
> long/unsigned int), and used Chanho's description of the problem (see below)?

No offense to that at all - that looks like the right solution. See my
answer to Christoph: I do think my patch does the right one, but I
can't test it and my knowledge of the swiotlb code is not complete
enough to really do anything else than "this looks right".

And adding my sign-off to the patch is fine, but I don't necessarily
need the authorship credit - mine was a throw-away patch just looking
at what the bisection report said. All the real effort was by the
reporters (and for the commit message, Bumyong Lee & co).

Finally - looking at the two places that do have that
swiotlb_align_offset(), my reaction is "I don't understand what that
code is doing".

The index does that

index = find_slots(dev, orig_addr, alloc_size + offset);

so that offset does seem to depend on how the find_slots code works.
Which in turn does use the same dma_get_min_align_mask() thing that
swiotlb_align_offset() uses. So the offsets do seem to match, but
find_slots(dev() does a lot of stuff that I don't know. so...

IOW, it does reinforce my "I don't know this code AT ALL". Which just
makes me more convinced that I shouldn't get authorship of the patch
because if something goes wrong with it, I can't help.

So at most maybe a "Suggested-by:".

My patch really was based on very little context and "this is the
calculation that makes sense given the other calculations in the
function".

Linus

2021-06-17 02:41:22

by Konrad Rzeszutek Wilk

[permalink] [raw]
Subject: Re: swiotlb/caamjr regression (Was: [GIT PULL] (swiotlb) stable/for-linus-5.12)

On Wed, Jun 16, 2021 at 01:49:54PM -0700, Jianxiong Gao wrote:
> On Fri, Jun 11, 2021 at 3:35 AM Konrad Rzeszutek Wilk
> <[email protected]> wrote:
> >
> > On Fri, Jun 11, 2021 at 08:21:53AM +0200, Christoph Hellwig wrote:
> > > On Thu, Jun 10, 2021 at 05:52:07PM +0300, Horia Geantă wrote:
> > > > I've noticed the failure also in v5.10 and v5.11 stable kernels,
> > > > since the patch set has been backported.
> > >
> > > FYI, there has been a patch on the list that should have fixed this
> > > for about a month:
> > >
> > > https://lore.kernel.org/linux-iommu/[email protected]/T/#m0d0df6490350a08dcc24c9086c8edc165b402d6f
> > >
> > > but it seems like it never got picked up.
> >
> > Jianxiong,
> > Would you be up for testing this patch on your NVMe rig please? I don't
> > forsee a problem.. but just in case
> >
> I have tested the attached patch and it generates an error when
> formatting a disk to xfs format in Rhel 8 environment:

Thank you for testing that - and this is a bummer indeed.

Jianxiong,
How unique is this NVMe? Should I be able to reproduce this with any
type or is it specific to Google Cloud?

Dominique, Horia,

Are those crypto devices somehow easily available to test out the
patches?

P.S.
Most unfortunate timing - I am out in rural areas in US with not great
Internet, so won't be able to get fully down to this until Monday.

2021-06-17 02:42:20

by 'Dominique MARTINET'

[permalink] [raw]
Subject: Re: swiotlb/caamjr regression (Was: [GIT PULL] (swiotlb) stable/for-linus-5.12)

Konrad Rzeszutek Wilk wrote on Wed, Jun 16, 2021 at 08:27:39PM -0400:
> Thank you for testing that - and this is a bummer indeed.

Hm, actually not that surprising if it was working without the offset
adjustments and doing non-aligned mappings -- perhaps the nvme code just
needs to round the offsets down instead of expecting swiotlb to do it?

Note I didn't look at that part of the code at all, so I might be
stating the obvious in a way that's difficult to adjust...


> Dominique, Horia,
>
> Are those crypto devices somehow easily available to test out the
> patches?

The one I have is included in the iMX8MP and iMX8MQ socs, the later is
included in the mnt reform and librem 5 and both have evaluation
toolkits but I wouldn't quite say they are easy to get...

I'm happy to test different patch variants if Horia doesn't beat me to
it though, it's not as practical as having the device but don't hesitate
to ask if I can run with extra debugs or something.

--
Dominique

2021-06-17 05:13:00

by Christoph Hellwig

[permalink] [raw]
Subject: Re: swiotlb/caamjr regression (Was: [GIT PULL] (swiotlb) stable/for-linus-5.12)

On Thu, Jun 17, 2021 at 09:39:15AM +0900, Dominique MARTINET wrote:
> Konrad Rzeszutek Wilk wrote on Wed, Jun 16, 2021 at 08:27:39PM -0400:
> > Thank you for testing that - and this is a bummer indeed.
>
> Hm, actually not that surprising if it was working without the offset
> adjustments and doing non-aligned mappings -- perhaps the nvme code just
> needs to round the offsets down instead of expecting swiotlb to do it?

It can't. The whole point of the series was to keep the original offsets.

2021-06-17 05:38:32

by 'Dominique MARTINET'

[permalink] [raw]
Subject: Re: swiotlb/caamjr regression (Was: [GIT PULL] (swiotlb) stable/for-linus-5.12)

Christoph Hellwig wrote on Thu, Jun 17, 2021 at 07:12:32AM +0200:
> On Thu, Jun 17, 2021 at 09:39:15AM +0900, Dominique MARTINET wrote:
> > Konrad Rzeszutek Wilk wrote on Wed, Jun 16, 2021 at 08:27:39PM -0400:
> > > Thank you for testing that - and this is a bummer indeed.
> >
> > Hm, actually not that surprising if it was working without the offset
> > adjustments and doing non-aligned mappings -- perhaps the nvme code just
> > needs to round the offsets down instead of expecting swiotlb to do it?
>
> It can't. The whole point of the series was to keep the original offsets.

Right, now I'm reading this again there are two kind of offsets (quoting
code from today's master)
---
static void swiotlb_bounce(struct device *dev, phys_addr_t tlb_addr, size_t size,
enum dma_data_direction dir)
{
struct io_tlb_mem *mem = io_tlb_default_mem;
int index = (tlb_addr - mem->start) >> IO_TLB_SHIFT;
phys_addr_t orig_addr = mem->slots[index].orig_addr;
---

There is:
- (tlb_addr - mem->start) alignment that Linus added up
- mem->slots[index].orig_addr alignment (within IO_TLB_SIZE blocks)


I would assume that series made it possible to preserve offsets within a
block for orig_addr, but in the process broke the offsets of a bounce
within an memory slot (the first one) ; I assume we want to restore here
the offset within the IO_TLB_SIZE block in orig_addr so it needs another
offseting of that orig_addr offset e.g. taking a block and offsets
within blocks, we have at the start of function:

|-----------------|-------------------|--------------------------|
^ ^ ^
block start slot orig addr tlb_addr

and want the orig_addr variable to align with tlb_addr.


So I was a bit hasty in saying nvme needs to remove offsets, it's more
that current code only has the second one working while the quick fix
breaks the second one in the process of fixing the first...



Jianxiong Gao, before spending more time on this, could you also try
Chanho Park's patch?
https://lore.kernel.org/linux-iommu/[email protected]/T/#m0d0df6490350a08dcc24c9086c8edc165b402d6f

I frankly don't understand many details of that code at this point,
in particular I have no idea why or if the patch needs another offset
with mem->start or where the dma_get_min_align_mask(dev) comes from,
but it'll be interesting to test.


Thanks,
--
Dominique

2021-06-17 11:42:06

by Christoph Hellwig

[permalink] [raw]
Subject: Re: swiotlb/caamjr regression (Was: [GIT PULL] (swiotlb) stable/for-linus-5.12)

On Wed, Jun 16, 2021 at 08:27:39PM -0400, Konrad Rzeszutek Wilk wrote:
> How unique is this NVMe? Should I be able to reproduce this with any
> type or is it specific to Google Cloud?

With swiotlb=force this should be reproducable everywhere.

2021-06-18 19:31:26

by Jianxiong Gao

[permalink] [raw]
Subject: Re: swiotlb/caamjr regression (Was: [GIT PULL] (swiotlb) stable/for-linus-5.12)

> Jianxiong Gao, before spending more time on this, could you also try
> Chanho Park's patch?
> https://lore.kernel.org/linux-iommu/2[email protected]/T/#m0d0df6490350a08dcc24c9086c8edc165b402d6f
>
I have tested Chanho Parks's patch and it works for us.
The NVMe driver performs correctly with the patch.

I have teste the patch on 06af8679449d

--
Jianxiong Gao

2021-06-21 02:14:01

by 'Dominique MARTINET'

[permalink] [raw]
Subject: Re: swiotlb/caamjr regression (Was: [GIT PULL] (swiotlb) stable/for-linus-5.12)

Jianxiong Gao wrote on Fri, Jun 18, 2021 at 11:01:59AM -0700:
> > Jianxiong Gao, before spending more time on this, could you also try
> > Chanho Park's patch?
> > https://lore.kernel.org/linux-iommu/[email protected]/T/#m0d0df6490350a08dcc24c9086c8edc165b402d6f
> >
> I have tested Chanho Parks's patch and it works for us.
> The NVMe driver performs correctly with the patch.
>
> I have teste the patch on 06af8679449d

Thanks!
(a bit late, but added Chanho Park in Cc...)

I can confirm it also works for our caam problem, as Horia said.

I've also come to term with the use of swiotlb_align_offset() through
testing, or rather many devices seem to have a 0 mask so it will almost
always be cancelled out, so if it works for Jianxiong then it's probably
good enough and I'll just assume that's how the orig_addr has been
designed...

I think it's missing a couple of checks like the one Linus had in his
patch, and would be comfortable with something like the attached patch
(in practice for me exactly the same as the original patch, except I've
added two checks: offsets smaller than orig addr offset are refused as
well as offsets bigger than the mapping size)

I'm sorry Jianxiong but would you be willing to take the time to test
again just to make sure there were no such offsets in your case?


If we're good with that I'll send it as an official v2 keeping Chanho's
from, unless he wants to.


Thanks everyone,
--
Dominique



Attachments:
(No filename) (1.46 kB)
swiotlb.patch (2.12 kB)
swiotlb.patch
Download all attachments

2021-06-21 03:49:10

by Chanho Park

[permalink] [raw]
Subject: RE: swiotlb/caamjr regression (Was: [GIT PULL] (swiotlb) stable/for-linus-5.12)

+ Bumyong who is the original author of the patch.

Hi Dominique,

> Thanks!
> (a bit late, but added Chanho Park in Cc...)
>
> I can confirm it also works for our caam problem, as Horia said.
>
> I've also come to term with the use of swiotlb_align_offset() through
> testing, or rather many devices seem to have a 0 mask so it will almost
> always be cancelled out, so if it works for Jianxiong then it's probably
> good enough and I'll just assume that's how the orig_addr has been
> designed...
>
> I think it's missing a couple of checks like the one Linus had in his
> patch, and would be comfortable with something like the attached patch (in
> practice for me exactly the same as the original patch, except I've added
> two checks: offsets smaller than orig addr offset are refused as well as
> offsets bigger than the mapping size)
>
> I'm sorry Jianxiong but would you be willing to take the time to test
> again just to make sure there were no such offsets in your case?
>
>
> If we're good with that I'll send it as an official v2 keeping Chanho's
> from, unless he wants to.
>

Sure. No problem. But, the patch was already stacked on Konrad's tree
and linux-next as well.

https://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb.git/commit/?h=devel/for-linus-5.14&id=33d1641f38f0c327bc3e5c21de585c77a6512bc6

Best Regards,
Chanho Park


2021-06-21 04:16:35

by 'Dominique MARTINET'

[permalink] [raw]
Subject: Re: swiotlb/caamjr regression (Was: [GIT PULL] (swiotlb) stable/for-linus-5.12)

Chanho Park wrote on Mon, Jun 21, 2021 at 11:55:22AM +0900:
> Sure. No problem. But, the patch was already stacked on Konrad's tree
> and linux-next as well.
>
> https://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb.git/commit/?h=devel/for-linus-5.14&id=33d1641f38f0c327bc3e5c21de585c77a6512bc6

That patch is slightly different, it's a rewrite Konrad did that mixes
in Linus' suggestion[1], which breaks things for the NVMe usecase
Jianxiong Gao has.

[1] offset = (tlb_addr - mem->start) & (IO_TLB_SIZE - 1)


Konrad is aware so I think it shouldn't be submitted :)

--
Dominique

2021-06-22 07:49:24

by 'Dominique MARTINET'

[permalink] [raw]
Subject: Re: swiotlb/caamjr regression (Was: [GIT PULL] (swiotlb) stable/for-linus-5.12)

Konrad Rzeszutek Wilk wrote on Mon, Jun 21, 2021 at 09:16:43AM -0400:
> The beaty of 'devel' and 'linux-next' is that they can be reshuffled and
> mangled. I pushed them original patch from Bumyong there and will let
> it sit for a day and then create a stable branch and give it to Linus.

Thanks, that should be good.

Do you want me to send a follow-up patch with the two extra checks
(tlb_addr & (IO_TLB_SIZE -1)) > swiotlb_align_offset(dev, orig_addr)
tlb_offset < alloc_size

or are we certain this can't ever happen?
(I didn't see any hit in dmesg when I ran with these, but my opinion is
better safe than sorry...)


> Then I need to expand the test-regression bucket so that this does not
> happen again. Dominique, how easy would it be to purchase one of those
> devices?

My company is making such a device, but it's not on the market yet
(was planned for august, with some delay in approvisionning it'll
probably be a bit late), and would mean buying from Japan so I'm not
sure how convenient that would be...

These are originally NXP devices so I assume Horia would have better
suggestions, if you would?


> I was originally thinking to create a crypto device in QEMU to simulate
> this but that may take longer to write than just getting the real thing.
>
> Or I could create some fake devices with weird offsets and write a driver
> for it to exercise this.. like this one I had done some time ago that
> needs some brushing off.

Just a fake device with fake offsets as a test is probably good enough,
ideally would need to exerce both failures we've seen (offset in
dma_sync_single_for_device like caam does and in the original mapping (I
assume?) like the NVMe driver does), but that sounds possible :)


Thanks again!
--
Dominique

2021-06-22 21:58:57

by Konrad Rzeszutek Wilk

[permalink] [raw]
Subject: Re: swiotlb/caamjr regression (Was: [GIT PULL] (swiotlb) stable/for-linus-5.12)

On Tue, Jun 22, 2021 at 04:48:24PM +0900, 'Dominique MARTINET' wrote:
> Konrad Rzeszutek Wilk wrote on Mon, Jun 21, 2021 at 09:16:43AM -0400:
> > The beaty of 'devel' and 'linux-next' is that they can be reshuffled and
> > mangled. I pushed them original patch from Bumyong there and will let
> > it sit for a day and then create a stable branch and give it to Linus.
>
> Thanks, that should be good.
>
> Do you want me to send a follow-up patch with the two extra checks
> (tlb_addr & (IO_TLB_SIZE -1)) > swiotlb_align_offset(dev, orig_addr)
> tlb_offset < alloc_size
>
> or are we certain this can't ever happen?

I would love more patches and I saw the previous one you posted.

But we only got two (or one) weeks before the next merge window opens
so I am sending to Linus the one that was tested with NVMe and crypto
(see above).

That is the
https://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb.git/commit/?h=stable/for-linus-5.14

And then after Linus releases the 5.14 - I would love to take your
cleanup on top of that and test it?

> (I didn't see any hit in dmesg when I ran with these, but my opinion is
> better safe than sorry...)
>
>
> > Then I need to expand the test-regression bucket so that this does not
> > happen again. Dominique, how easy would it be to purchase one of those
> > devices?
>
> My company is making such a device, but it's not on the market yet
> (was planned for august, with some delay in approvisionning it'll
> probably be a bit late), and would mean buying from Japan so I'm not
> sure how convenient that would be...
>
> These are originally NXP devices so I assume Horia would have better
> suggestions, if you would?
>
>
> > I was originally thinking to create a crypto device in QEMU to simulate
> > this but that may take longer to write than just getting the real thing.
> >
> > Or I could create some fake devices with weird offsets and write a driver
> > for it to exercise this.. like this one I had done some time ago that
> > needs some brushing off.
>
> Just a fake device with fake offsets as a test is probably good enough,
> ideally would need to exerce both failures we've seen (offset in
> dma_sync_single_for_device like caam does and in the original mapping (I
> assume?) like the NVMe driver does), but that sounds possible :)

Yup. Working on that now.
>
>
> Thanks again!
> --
> Dominique