2021-01-11 15:47:48

by Marc Orr

Subject: [PATCH] dma: mark unmapped DMA scatter/gather invalid

This patch updates dma_direct_unmap_sg() to mark each scatter/gather
entry invalid after it is unmapped. This fixes two issues:

1. It makes the unmapping code able to tolerate a double unmap.
2. It prevents the NVMe driver from erroneously treating an unmapped DMA
address as mapped.

The bug that motivated this patch was the following sequence, which
occurred within the NVMe driver, with the kernel flag `swiotlb=force`.

* NVMe driver calls dma_direct_map_sg()
* dma_direct_map_sg() fails partway through the scatter/gather list
* dma_direct_map_sg() calls dma_direct_unmap_sg() to unmap any entries
  that were successfully mapped
* NVMe driver calls dma_direct_unmap_sg(), redundantly, leading to a
  double unmap, which is a bug (see the sketch below)
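
To make the sequence concrete, here is a minimal sketch of the pattern
(hypothetical foo_* code for illustration only, not the actual NVMe
paths):

    #include <linux/dma-mapping.h>

    struct foo_req {
            struct scatterlist *sgl;
            int nents;
    };

    static int foo_map(struct device *dev, struct foo_req *req)
    {
            /*
             * dma_direct_map_sg() can fail partway through the list; it
             * then unmaps the entries it had already mapped and returns 0.
             */
            if (!dma_map_sg(dev, req->sgl, req->nents, DMA_BIDIRECTIONAL))
                    return -ENOMEM;
            return 0;
    }

    static void foo_teardown(struct device *dev, struct foo_req *req)
    {
            /*
             * Bug: if this teardown path also runs after the mapping
             * failed, the entries are unmapped a second time.
             */
            dma_unmap_sg(dev, req->sgl, req->nents, DMA_BIDIRECTIONAL);
    }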

With this patch, a Hadoop workload running on a cluster of three AMD
SEV VMs is able to succeed. Without the patch, the Hadoop workload
suffers application-level and even VM-level failures.

Tested-by: Jianxiong Gao <[email protected]>
Tested-by: Marc Orr <[email protected]>
Reviewed-by: Jianxiong Gao <[email protected]>
Signed-off-by: Marc Orr <[email protected]>
---
kernel/dma/direct.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 0a4881e59aa7..3d9b17fe5771 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -374,9 +374,11 @@ void dma_direct_unmap_sg(struct device *dev, struct scatterlist *sgl,
        struct scatterlist *sg;
        int i;

-       for_each_sg(sgl, sg, nents, i)
+       for_each_sg(sgl, sg, nents, i) {
                dma_direct_unmap_page(dev, sg->dma_address, sg_dma_len(sg), dir,
                                attrs);
+               sg->dma_address = DMA_MAPPING_ERROR;
+       }
}
EXPORT_SYMBOL(dma_direct_unmap_sg);
#endif
--
2.30.0.284.gd98b1dd5eaa7-goog


2021-01-11 16:14:22

by Robin Murphy

Subject: Re: [PATCH] dma: mark unmapped DMA scatter/gather invalid

On 2021-01-11 15:43, Marc Orr wrote:
> This patch updates dma_direct_unmap_sg() to mark each scatter/gather
> entry invalid after it is unmapped. This fixes two issues:

s/fixes/bodges around (badly)/

> 1. It makes the unmapping code able to tolerate a double unmap.
> 2. It prevents the NVMe driver from erroneously treating an unmapped DMA
> address as mapped.
>
> The bug that motivated this patch was the following sequence, which
> occurred within the NVMe driver, with the kernel flag `swiotlb=force`.
>
> * NVMe driver calls dma_direct_map_sg()
> * dma_direct_map_sg() fails partway through the scatter/gather list
> * dma_direct_map_sg() calls dma_direct_unmap_sg() to unmap any entries
>   that were successfully mapped
> * NVMe driver calls dma_direct_unmap_sg(), redundantly, leading to a
> double unmap, which is a bug.

So why not just fix that actual bug?

> With this patch, a Hadoop workload running on a cluster of three AMD
> SEV VMs is able to succeed. Without the patch, the Hadoop workload
> suffers application-level and even VM-level failures.
>
> Tested-by: Jianxiong Gao <[email protected]>
> Tested-by: Marc Orr <[email protected]>
> Reviewed-by: Jianxiong Gao <[email protected]>
> Signed-off-by: Marc Orr <[email protected]>
> ---
> kernel/dma/direct.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
> index 0a4881e59aa7..3d9b17fe5771 100644
> --- a/kernel/dma/direct.c
> +++ b/kernel/dma/direct.c
> @@ -374,9 +374,11 @@ void dma_direct_unmap_sg(struct device *dev, struct scatterlist *sgl,
> struct scatterlist *sg;
> int i;
>
> - for_each_sg(sgl, sg, nents, i)
> + for_each_sg(sgl, sg, nents, i) {
> dma_direct_unmap_page(dev, sg->dma_address, sg_dma_len(sg), dir,
> attrs);
> + sg->dma_address = DMA_MAPPING_ERROR;

There are more DMA API backends than just dma-direct, so while this
might help paper over bugs when SWIOTLB is in use, it's not going to
have any effect when those same bugs are hit under other circumstances.
Once again, the moral of the story is that effort is better spent just
fixing the bugs ;)

Robin.

> + }
> }
> EXPORT_SYMBOL(dma_direct_unmap_sg);
> #endif
>

2021-01-11 16:19:43

by Greg Kroah-Hartman

Subject: Re: [PATCH] dma: mark unmapped DMA scatter/gather invalid

On Mon, Jan 11, 2021 at 07:43:35AM -0800, Marc Orr wrote:
> This patch updates dma_direct_unmap_sg() to mark each scatter/gather
> entry invalid after it is unmapped. This fixes two issues:
>
> 1. It makes the unmapping code able to tolerate a double unmap.
> 2. It prevents the NVMe driver from erroneously treating an unmapped DMA
> address as mapped.
>
> The bug that motivated this patch was the following sequence, which
> occurred within the NVMe driver, with the kernel flag `swiotlb=force`.
>
> * NVMe driver calls dma_direct_map_sg()
> * dma_direct_map_sg() fails partway through the scatter/gather list
> * dma_direct_map_sg() calls dma_direct_unmap_sg() to unmap any entries
>   that were successfully mapped
> * NVMe driver calls dma_direct_unmap_sg(), redundantly, leading to a
> double unmap, which is a bug.
>
> With this patch, a Hadoop workload running on a cluster of three AMD
> SEV VMs is able to succeed. Without the patch, the Hadoop workload
> suffers application-level and even VM-level failures.
>
> Tested-by: Jianxiong Gao <[email protected]>
> Tested-by: Marc Orr <[email protected]>
> Reviewed-by: Jianxiong Gao <[email protected]>
> Signed-off-by: Marc Orr <[email protected]>
> ---
> kernel/dma/direct.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)

<formletter>

This is not the correct way to submit patches for inclusion in the
stable kernel tree. Please read:
https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html
for how to do this properly.

</formletter>

2021-01-11 18:07:18

by Marc Orr

Subject: Re: [PATCH] dma: mark unmapped DMA scatter/gather invalid

> On 2021-01-11 15:43, Marc Orr wrote:

minus [email protected], per gregkh@'s email.

> > diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
> > index 0a4881e59aa7..3d9b17fe5771 100644
> > --- a/kernel/dma/direct.c
> > +++ b/kernel/dma/direct.c
> > @@ -374,9 +374,11 @@ void dma_direct_unmap_sg(struct device *dev, struct scatterlist *sgl,
> > struct scatterlist *sg;
> > int i;
> >
> > - for_each_sg(sgl, sg, nents, i)
> > + for_each_sg(sgl, sg, nents, i) {
> > dma_direct_unmap_page(dev, sg->dma_address, sg_dma_len(sg), dir,
> > attrs);
> > + sg->dma_address = DMA_MAPPING_ERROR;
>
> There are more DMA API backends than just dma-direct, so while this
> might help paper over bugs when SWIOTLB is in use, it's not going to
> have any effect when those same bugs are hit under other circumstances.
> Once again, the moral of the story is that effort is better spent just
> fixing the bugs ;)

Thanks for the quick feedback. What is the correct fix? I understand
the first half. The NVMe driver should be updated to not call unmap on
an address that has already been unmapped within the DMA direct code.
Where I'm less certain is how to communicate to the NVMe driver that
the mapping failed. In particular, the NVMe code explicitly checks if
the first DMA address in the scatter/gather list is set to
DMA_MAPPING_ERROR. Thus, don't we need to update the DMA direct code
to propagate DMA_MAPPING_ERROR back up to the driver, via the
scatter/gather struct?

I skimmed arch/arm/mm/dma-mapping.c, just now. I can see that this
code sets the address within the scatter/gather struct to
DMA_MAPPING_ERROR before trying to map an IO address and write it into
the struct. Is this a good example to follow?

Thanks,
Marc

2021-01-11 20:45:47

by Robin Murphy

Subject: Re: [PATCH] dma: mark unmapped DMA scatter/gather invalid

On 2021-01-11 18:03, Marc Orr wrote:
>> On 2021-01-11 15:43, Marc Orr wrote:
>
> minus [email protected], per gregkh@'s email.
>
>>> diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
>>> index 0a4881e59aa7..3d9b17fe5771 100644
>>> --- a/kernel/dma/direct.c
>>> +++ b/kernel/dma/direct.c
>>> @@ -374,9 +374,11 @@ void dma_direct_unmap_sg(struct device *dev, struct scatterlist *sgl,
>>> struct scatterlist *sg;
>>> int i;
>>>
>>> - for_each_sg(sgl, sg, nents, i)
>>> + for_each_sg(sgl, sg, nents, i) {
>>> dma_direct_unmap_page(dev, sg->dma_address, sg_dma_len(sg), dir,
>>> attrs);
>>> + sg->dma_address = DMA_MAPPING_ERROR;
>>
>> There are more DMA API backends than just dma-direct, so while this
>> might help paper over bugs when SWIOTLB is in use, it's not going to
>> have any effect when those same bugs are hit under other circumstances.
>> Once again, the moral of the story is that effort is better spent just
>> fixing the bugs ;)
>
> Thanks for the quick feedback. What is the correct fix? I understand
> the first half. The NVMe driver should be updated to not call unmap on
> an address that has already been unmapped within the DMA direct code.
> Where I'm less certain is how to communicate to the NVMe driver that
> the mapping failed. In particular, the NVMe code explicitly checks if
> the first DMA address in the scatter/gather list is set to
> DMA_MAPPING_ERROR. Thus, don't we need to update the DMA direct code
> to propagate DMA_MAPPING_ERROR back up to the driver, via the
> scatter/gather struct?

Erm, you check the return value of dma_map_sg(). If it's zero, the
request failed; if it's nonzero, that's how many DMA segments you now
have to process. See Documentation/core-api/dma-api.rst.
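
Roughly, the documented pattern looks like this (a from-memory sketch,
with hw_program_segment() standing in for whatever actually consumes
the addresses):

    int i, count;
    struct scatterlist *sg;

    count = dma_map_sg(dev, sgl, nents, dir);
    if (!count)
            return -ENOMEM;         /* nothing is mapped; do not unmap */

    /* program the hardware with exactly 'count' DMA segments */
    for_each_sg(sgl, sg, count, i)
            hw_program_segment(i, sg_dma_address(sg), sg_dma_len(sg));

    ...

    /* later: unmap once, passing the original 'nents', not 'count' */
    dma_unmap_sg(dev, sgl, nents, dir);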

The only guarantee offered about the state of the scatterlist itself is
that if it is successfully mapped, then the dma_address and dma_length
fields are valid for that many segments, and if that is fewer than the
total number of physical segments then the next one after the final DMA
segment will have a dma_length of 0. In particular there are no
guarantees at all about the state if the mapping was unsuccessful.

If a driver is failing to keep track of the success/failure status and
later down the line trying to guess what to do with a list that may or
may not have been mapped, then frankly that driver should be redesigned
because that is a terrible anti-pattern. At the very very least it
should explicitly encode its own "known bad" state upon failure that it
can then reliably recognise later.
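
For example (purely illustrative, not the actual NVMe structures):

    struct foo_request {
            struct scatterlist *sgl;
            int nents;
            bool dma_mapped;        /* false until dma_map_sg() succeeds */
    };

    /* mapping path */
    if (!dma_map_sg(dev, req->sgl, req->nents, dir))
            return -ENOMEM;
    req->dma_mapped = true;

    /* teardown path - safe whether or not the mapping ever succeeded */
    if (req->dma_mapped) {
            dma_unmap_sg(dev, req->sgl, req->nents, dir);
            req->dma_mapped = false;
    }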

Robin.

> I skimmed arch/arm/mm/dma-mapping.c, just now. I can see that this
> code sets the address within the scatter/gather struct to
> DMA_MAPPING_ERROR before trying to map an IO address and write it into
> the struct. Is this a good example to follow?
>
> Thanks,
> Marc
>

2021-01-11 21:41:14

by Marc Orr

Subject: Re: [PATCH] dma: mark unmapped DMA scatter/gather invalid

> >>> diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
> >>> index 0a4881e59aa7..3d9b17fe5771 100644
> >>> --- a/kernel/dma/direct.c
> >>> +++ b/kernel/dma/direct.c
> >>> @@ -374,9 +374,11 @@ void dma_direct_unmap_sg(struct device *dev, struct scatterlist *sgl,
> >>> struct scatterlist *sg;
> >>> int i;
> >>>
> >>> - for_each_sg(sgl, sg, nents, i)
> >>> + for_each_sg(sgl, sg, nents, i) {
> >>> dma_direct_unmap_page(dev, sg->dma_address, sg_dma_len(sg), dir,
> >>> attrs);
> >>> + sg->dma_address = DMA_MAPPING_ERROR;
> >>
> >> There are more DMA API backends than just dma-direct, so while this
> >> might help paper over bugs when SWIOTLB is in use, it's not going to
> >> have any effect when those same bugs are hit under other circumstances.
> >> Once again, the moral of the story is that effort is better spent just
> >> fixing the bugs ;)
> >
> > Thanks for the quick feedback. What is the correct fix? I understand
> > the first half. The NVMe driver should be updated to not call unmap on
> > an address that has already been unmapped within the DMA direct code.
> > Where I'm less certain is how to communicate to the NVMe driver that
> > the mapping failed. In particular, the NVMe code explicitly checks if
> > the first DMA address in the scatter/gather list is set to
> > DMA_MAPPING_ERROR. Thus, don't we need to update the DMA direct code
> > to propagate DMA_MAPPING_ERROR back up to the driver, via the
> > scatter/gather struct?
>
> Erm, you check the return value of dma_map_sg(). If it's zero, the
> request failed; if it's nonzero, that's how many DMA segments you now
> have to process. See Documentation/core-api/dma-api.rst.
>
> The only guarantee offered about the state of the scatterlist itself is
> that if it is successfully mapped, then the dma_address and dma_length
> fields are valid for that many segments, and if that is fewer than the
> total number of physical segments then the next one after the final DMA
> segment will have a dma_length of 0. In particular there are no
> guarantees at all about the state if the mapping was unsuccessful.
>
> If a driver is failing to keep track of the success/failure status and
> later down the line trying to guess what to do with a list that may or
> may not have been mapped, then frankly that driver should be redesigned
> because that is a terrible anti-pattern. At the very very least it
> should explicitly encode its own "known bad" state upon failure that it
> can then reliably recognise later.

Got it now. I'll get to work on a patch for the NVMe driver to fix the
bug. Thanks for all of these pointers. They are immensely helpful.

Thanks,
Marc