2019-05-22 07:21:59

by Horia Geanta

[permalink] [raw]
Subject: [PATCH] swiotlb: sync buffer when mapping FROM_DEVICE

From the very beginning, the swiotlb implementation (and even before that,
pci implementation if we look in full git history) did not sync
the bounced buffer in case of DMA mapping using DMA_FROM_DEVICE direction.

However, this is incorrect since the device might not write to that area
at all (or might partially write to it), which leads to data corruption
in the sense that data in original buffer is lost (overwritten with
uninitialized data in the bounced buffer at DMA unmap time).

In general, DMA mapping using DMA_FROM_DEVICE does not mean existing data
should be thrown away.

Fix this by sync-ing the bounced buffer at DMA mapping time
irrespective of DMA direction.

Link: https://lore.kernel.org/lkml/[email protected]
Signed-off-by: Horia Geantă <[email protected]>
---

I haven't provided a Fixes tag since this approach goes way back in time.
If you agree with the fix, we'll have to decide if it should go
into -stable and what's the earliest LTS branch to get the backport.

Patch is based on konrad/swiotlb.git, devel/for-linus-5.2 branch.

kernel/dma/swiotlb.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 38d57218809c..f330222f0eb5 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -545,13 +545,14 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev,

/*
* Save away the mapping from the original address to the DMA address.
- * This is needed when we sync the memory. Then we sync the buffer if
- * needed.
+ * This is needed when we sync the memory. Then we sync the buffer
+ * irrespective of mapping direction - since for FROM_DEVICE we want to
+ * make sure original data is not lost in the case of device not fully
+ * overwriting the area mapped.
*/
for (i = 0; i < nslots; i++)
io_tlb_orig_addr[index+i] = orig_addr + (i << IO_TLB_SHIFT);
- if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
- (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL))
+ if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
swiotlb_bounce(orig_addr, tlb_addr, size, DMA_TO_DEVICE);

return tlb_addr;
--
2.17.1


2019-05-22 12:34:50

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH] swiotlb: sync buffer when mapping FROM_DEVICE

I'm a little worried about this. While it looks functionally correct
we have surived without it, and doing another copy for every swiotlb
dma mapping from the device looks extremely painful for the typical use
cases where we expect the device to transfer the whole mapping.

I'd be tempted to instead properl document the current behavior and
introduce a new DMA_ATTR_PARTIAL flag to allow for partial mappings.

2019-05-22 12:52:28

by Robin Murphy

[permalink] [raw]
Subject: Re: [PATCH] swiotlb: sync buffer when mapping FROM_DEVICE

On 22/05/2019 13:32, Christoph Hellwig wrote:
> I'm a little worried about this. While it looks functionally correct
> we have surived without it, and doing another copy for every swiotlb
> dma mapping from the device looks extremely painful for the typical use
> cases where we expect the device to transfer the whole mapping.
>
> I'd be tempted to instead properl document the current behavior and
> introduce a new DMA_ATTR_PARTIAL flag to allow for partial mappings.

Would that work out any different from the existing
DMA_ATTR_SKIP_CPU_SYNC? If drivers are prepared to handle this issue
from their end, they can already do so for single mappings by using that
attr along with explicit partial syncs via dma_sync_single(). For
page/sg mappings we'd still have the problem of identifying what part of
"partial" actually matters, and probably having to add some additional
new sync operations to cope.

Robin.

2019-05-22 13:11:34

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH] swiotlb: sync buffer when mapping FROM_DEVICE

On Wed, May 22, 2019 at 01:50:47PM +0100, Robin Murphy wrote:
> Would that work out any different from the existing DMA_ATTR_SKIP_CPU_SYNC?
>
> If drivers are prepared to handle this issue from their end, they can
> already do so for single mappings by using that attr along with explicit
> partial syncs via dma_sync_single(). For page/sg mappings we'd still have
> the problem of identifying what part of "partial" actually matters, and
> probably having to add some additional new sync operations to cope.

Except that the same optimization we are tripping over here is also
present in dma_sync_* - dma_sync_*_to_device with DMA_FROM_DEVICE is a
no-op in swiotlb.

2019-05-22 13:28:57

by Robin Murphy

[permalink] [raw]
Subject: Re: [PATCH] swiotlb: sync buffer when mapping FROM_DEVICE

On 2019-05-22 2:09 pm, Christoph Hellwig wrote:
> On Wed, May 22, 2019 at 01:50:47PM +0100, Robin Murphy wrote:
>> Would that work out any different from the existing DMA_ATTR_SKIP_CPU_SYNC?
>>
>> If drivers are prepared to handle this issue from their end, they can
>> already do so for single mappings by using that attr along with explicit
>> partial syncs via dma_sync_single(). For page/sg mappings we'd still have
>> the problem of identifying what part of "partial" actually matters, and
>> probably having to add some additional new sync operations to cope.
>
> Except that the same optimization we are tripping over here is also
> present in dma_sync_* - dma_sync_*_to_device with DMA_FROM_DEVICE is a
> no-op in swiotlb.

Sure, but that should be irrelevant since the effective problem here is
in the sync_*_for_cpu direction, and it's the unmap which nobbles the
buffer. If the driver does this:

dma_map_single(whole buffer);
<device writes to part of buffer>
dma_unmap_single(whole buffer);
<contents of rest of buffer now undefined>

then it could instead do this and be happy:

dma_map_single(whole buffer, SKIP_CPU_SYNC);
<device writes to part of buffer>
dma_sync_single_for_cpu(updated part of buffer);
dma_unmap_single(whole buffer, SKIP_CPU_SYNC);
<contents of rest of buffer still valid>

Robin.

2019-05-22 13:36:11

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH] swiotlb: sync buffer when mapping FROM_DEVICE

On Wed, May 22, 2019 at 02:25:38PM +0100, Robin Murphy wrote:
> Sure, but that should be irrelevant since the effective problem here is in
> the sync_*_for_cpu direction, and it's the unmap which nobbles the buffer.
> If the driver does this:
>
> dma_map_single(whole buffer);
> <device writes to part of buffer>
> dma_unmap_single(whole buffer);
> <contents of rest of buffer now undefined>
>
> then it could instead do this and be happy:
>
> dma_map_single(whole buffer, SKIP_CPU_SYNC);
> <device writes to part of buffer>
> dma_sync_single_for_cpu(updated part of buffer);
> dma_unmap_single(whole buffer, SKIP_CPU_SYNC);
> <contents of rest of buffer still valid>

Assuming the driver knows how much was actually DMAed this would
solve the issue. Horia, does this work for you?

2019-05-22 13:57:23

by Robin Murphy

[permalink] [raw]
Subject: Re: [PATCH] swiotlb: sync buffer when mapping FROM_DEVICE

On 22/05/2019 14:34, Christoph Hellwig wrote:
> On Wed, May 22, 2019 at 02:25:38PM +0100, Robin Murphy wrote:
>> Sure, but that should be irrelevant since the effective problem here is in
>> the sync_*_for_cpu direction, and it's the unmap which nobbles the buffer.
>> If the driver does this:
>>
>> dma_map_single(whole buffer);
>> <device writes to part of buffer>
>> dma_unmap_single(whole buffer);
>> <contents of rest of buffer now undefined>
>>
>> then it could instead do this and be happy:
>>
>> dma_map_single(whole buffer, SKIP_CPU_SYNC);
>> <device writes to part of buffer>
>> dma_sync_single_for_cpu(updated part of buffer);
>> dma_unmap_single(whole buffer, SKIP_CPU_SYNC);
>> <contents of rest of buffer still valid>
>
> Assuming the driver knows how much was actually DMAed this would
> solve the issue. Horia, does this work for you?

Ohhh, and now I've just twigged what you were suggesting - your
DMA_ATTR_PARTIAL flag would mean "treat this as a read-modify-write of
the buffer because we *don't* know exactly which parts the device may
write to". So indeed if we did go down that route we wouldn't need any
of the sync stuff I was worrying about (but I might suggest naming it
DMA_ATTR_UPDATE instead). Apologies for being slow :)

Robin.

2019-05-23 05:37:34

by Marek Szyprowski

[permalink] [raw]
Subject: Re: [PATCH] swiotlb: sync buffer when mapping FROM_DEVICE

Hi Robin,

On 2019-05-22 15:55, Robin Murphy wrote:
> On 22/05/2019 14:34, Christoph Hellwig wrote:
>> On Wed, May 22, 2019 at 02:25:38PM +0100, Robin Murphy wrote:
>>> Sure, but that should be irrelevant since the effective problem here
>>> is in
>>> the sync_*_for_cpu direction, and it's the unmap which nobbles the
>>> buffer.
>>> If the driver does this:
>>>
>>>     dma_map_single(whole buffer);
>>>     <device writes to part of buffer>
>>>     dma_unmap_single(whole buffer);
>>>     <contents of rest of buffer now undefined>
>>>
>>> then it could instead do this and be happy:
>>>
>>>     dma_map_single(whole buffer, SKIP_CPU_SYNC);
>>>     <device writes to part of buffer>
>>>     dma_sync_single_for_cpu(updated part of buffer);
>>>     dma_unmap_single(whole buffer, SKIP_CPU_SYNC);
>>>     <contents of rest of buffer still valid>
>>
>> Assuming the driver knows how much was actually DMAed this would
>> solve the issue.  Horia, does this work for you?
>
> Ohhh, and now I've just twigged what you were suggesting - your
> DMA_ATTR_PARTIAL flag would mean "treat this as a read-modify-write of
> the buffer because we *don't* know exactly which parts the device may
> write to". So indeed if we did go down that route we wouldn't need any
> of the sync stuff I was worrying about (but I might suggest naming it
> DMA_ATTR_UPDATE instead). Apologies for being slow :)

Don't we have DMA_BIDIRECTIONAL for such case? Maybe we should update
documentation a bit to point that DMA_FROM_DEVICE expects the whole
buffer to be filled by the device?

Best regards
--
Marek Szyprowski, PhD
Samsung R&D Institute Poland

2019-05-23 16:27:33

by Horia Geanta

[permalink] [raw]
Subject: Re: [PATCH] swiotlb: sync buffer when mapping FROM_DEVICE

On 5/23/2019 8:35 AM, Marek Szyprowski wrote:
> Hi Robin,
>
> On 2019-05-22 15:55, Robin Murphy wrote:
>> On 22/05/2019 14:34, Christoph Hellwig wrote:
>>> On Wed, May 22, 2019 at 02:25:38PM +0100, Robin Murphy wrote:
>>>> Sure, but that should be irrelevant since the effective problem here
>>>> is in
>>>> the sync_*_for_cpu direction, and it's the unmap which nobbles the
>>>> buffer.
>>>> If the driver does this:
>>>>
>>>> ????dma_map_single(whole buffer);
>>>> ????<device writes to part of buffer>
>>>> ????dma_unmap_single(whole buffer);
>>>> ????<contents of rest of buffer now undefined>
>>>>
>>>> then it could instead do this and be happy:
>>>>
>>>> ????dma_map_single(whole buffer, SKIP_CPU_SYNC);
>>>> ????<device writes to part of buffer>
>>>> ????dma_sync_single_for_cpu(updated part of buffer);
>>>> ????dma_unmap_single(whole buffer, SKIP_CPU_SYNC);
>>>> ????<contents of rest of buffer still valid>
>>>
>>> Assuming the driver knows how much was actually DMAed this would
>>> solve the issue.? Horia, does this work for you?
In my particular case, input is provided as a scatterlist, out of which first N
bytes are problematic (not written to by device and corrupted when swiotlb
bouncing is needed), while remaining bytes (Total - N) are updated by the device.

>>
>> Ohhh, and now I've just twigged what you were suggesting - your
>> DMA_ATTR_PARTIAL flag would mean "treat this as a read-modify-write of
>> the buffer because we *don't* know exactly which parts the device may
>> write to". So indeed if we did go down that route we wouldn't need any
>> of the sync stuff I was worrying about (but I might suggest naming it
>> DMA_ATTR_UPDATE instead). Apologies for being slow :)
>
> Don't we have DMA_BIDIRECTIONAL for such case? Maybe we should update
> documentation a bit to point that DMA_FROM_DEVICE expects the whole
> buffer to be filled by the device?
>
Or, put more bluntly, driver must not rely on previous data in the area mapped
DMA_FROM_DEVICE. This limitation stems from the buffer bouncing mechanism of the
swiotlb DMA API backend, which other backends might not suffer from (e.g. IOMMU).

Btw, the device I am working on (caam crypto engine) is deployed in several SoCs
configured differently - with or without an IOMMU (and coherent or non-coherent
etc.). IOW it's a "power user" of the DMA API and I appreciate all the help in
solving / clarifying this kind of implicit assumptions.

Thanks,
Horia

2019-05-23 16:45:59

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH] swiotlb: sync buffer when mapping FROM_DEVICE

On Thu, May 23, 2019 at 07:35:07AM +0200, Marek Szyprowski wrote:
> Don't we have DMA_BIDIRECTIONAL for such case?

Not sure if it was intended for that case, but it definitively should
do the right thing for swiotlb, and it should also do the right thing
in terms of cache maintainance.

> Maybe we should update
> documentation a bit to point that DMA_FROM_DEVICE expects the whole
> buffer to be filled by the device?

Probably. Horia, can you try to use DMA_BIDIRECTIONAL?

2019-05-23 17:56:47

by Horia Geanta

[permalink] [raw]
Subject: Re: [PATCH] swiotlb: sync buffer when mapping FROM_DEVICE

On 5/23/2019 7:43 PM, Christoph Hellwig wrote:
> On Thu, May 23, 2019 at 07:35:07AM +0200, Marek Szyprowski wrote:
>> Don't we have DMA_BIDIRECTIONAL for such case?
>
> Not sure if it was intended for that case, but it definitively should
> do the right thing for swiotlb, and it should also do the right thing
> in terms of cache maintainance.
>
>> Maybe we should update
>> documentation a bit to point that DMA_FROM_DEVICE expects the whole
>> buffer to be filled by the device?
>
> Probably. Horia, can you try to use DMA_BIDIRECTIONAL?
>
This works, but at the cost of performance - all the cache lines being written
back to memory, just to be overwritten by the device.

Thanks,
Horia

2019-05-23 18:07:34

by Robin Murphy

[permalink] [raw]
Subject: Re: [PATCH] swiotlb: sync buffer when mapping FROM_DEVICE

On 23/05/2019 17:43, Christoph Hellwig wrote:
> On Thu, May 23, 2019 at 07:35:07AM +0200, Marek Szyprowski wrote:
>> Don't we have DMA_BIDIRECTIONAL for such case?
>
> Not sure if it was intended for that case, but it definitively should
> do the right thing for swiotlb, and it should also do the right thing
> in terms of cache maintainance.
>
>> Maybe we should update
>> documentation a bit to point that DMA_FROM_DEVICE expects the whole
>> buffer to be filled by the device?
>
> Probably. Horia, can you try to use DMA_BIDIRECTIONAL?
>

Yes, in general that should be a viable option. I got rather focused on
the distinction that a "partial" FROM_DEVICE mapping would still be
allowed to physically prevent the device from making any reads, whereas
BIDIRECTIONAL would not, but I suspect any benefit being lost there is
mostly one of debugging visibility rather than appreciable security, and
probably not enough to justify additional complexity on its own - I
couldn't say off-hand how many IOMMUs actually support write-only
permissions anyway.

Whichever way, I'd certainly have no objection to formalising what seems
to be the existing behaviour (both SWIOTLB and ARM dmabounce look
consistent, at least) as something like "for the DMA_FROM_DEVICE
direction, any parts of the buffer not written to by the device will
become undefined". The IOMMU bounce page stuff is going to be another
one in this boat, too.

Robin.