2021-01-19 19:01:14

by Christoph Hellwig

Subject: Re: [PATCH] nvme: fix handling mapping failure

On Tue, Jan 19, 2021 at 09:53:36AM -0800, Marc Orr wrote:
> This patch ensures that when `nvme_map_data()` fails to map the
> addresses in a scatter/gather list:
>
> * The addresses are not incorrectly unmapped. The underlying
> scatter/gather code unmaps the addresses after detecting a failure.
> Thus, unmapping them again in the driver is a bug.
> * The DMA pool allocations are not deallocated when they were never
> allocated.
>
> The bug that motivated this patch was the following sequence, which
> occurred within the NVMe driver, with the kernel flag `swiotlb=force`.
>
> * NVMe driver calls dma_direct_map_sg()
> * dma_direct_map_sg() fails partway through the scatter/gather list
> * dma_direct_map_sg() calls dma_direct_unmap_sg() to unmap any entries
> that succeeded.
> * NVMe driver calls dma_direct_unmap_sg(), redundantly, leading to a
> double unmap, which is a bug.
>
> Before this patch, I observed intermittent application- and VM-level
> failures when running a benchmark, fio, in an AMD SEV guest. This patch
> resolves the failures.

I think the right way to fix this is to just do a proper unwind instead
of calling a catchall function. Can you try this patch?

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 25456d02eddb8c..47d7075053b6b2 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -842,7 +842,7 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
 	sg_init_table(iod->sg, blk_rq_nr_phys_segments(req));
 	iod->nents = blk_rq_map_sg(req->q, req, iod->sg);
 	if (!iod->nents)
-		goto out;
+		goto out_free_sg;

 	if (is_pci_p2pdma_page(sg_page(iod->sg)))
 		nr_mapped = pci_p2pdma_map_sg_attrs(dev->dev, iod->sg,
@@ -851,16 +851,25 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
 		nr_mapped = dma_map_sg_attrs(dev->dev, iod->sg, iod->nents,
 					     rq_dma_dir(req), DMA_ATTR_NO_WARN);
 	if (!nr_mapped)
-		goto out;
+		goto out_free_sg;

 	iod->use_sgl = nvme_pci_use_sgls(dev, req);
 	if (iod->use_sgl)
 		ret = nvme_pci_setup_sgls(dev, req, &cmnd->rw, nr_mapped);
 	else
 		ret = nvme_pci_setup_prps(dev, req, &cmnd->rw);
-out:
 	if (ret != BLK_STS_OK)
-		nvme_unmap_data(dev, req);
+		goto out_dma_unmap;
+	return BLK_STS_OK;
+
+out_dma_unmap:
+	if (is_pci_p2pdma_page(sg_page(iod->sg)))
+		pci_p2pdma_unmap_sg(dev->dev, iod->sg, iod->nents,
+				    rq_dma_dir(req));
+	else
+		dma_unmap_sg(dev->dev, iod->sg, iod->nents, rq_dma_dir(req));
+out_free_sg:
+	mempool_free(iod->sg, dev->iod_mempool);
 	return ret;
 }
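
The unwind structure in the patch above can be illustrated outside the kernel. What follows is only a minimal userspace sketch of the idiom, not driver code: `struct fake_iod`, `enum fail_at`, and `fake_map_data()` are hypothetical names invented for this example, and `malloc()`/`free()` stand in for the mempool. Each label undoes exactly the steps that completed before the failure, so a failed mapping step never triggers an unmap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the resources nvme_map_data() acquires. */
struct fake_iod {
	void *sg;        /* stands in for the mempool-allocated sg table */
	bool dma_mapped; /* stands in for a live DMA mapping */
	int unmap_calls; /* counts unmaps so a double unmap is detectable */
};

/* Which step the caller wants to fail (illustration only). */
enum fail_at { FAIL_NONE, FAIL_MAP, FAIL_SETUP };

static int fake_map_data(struct fake_iod *iod, enum fail_at fail)
{
	int ret = -1;

	/* The sketch expects a fresh, unmapped iod. */
	assert(iod->sg == NULL && !iod->dma_mapped);

	/* Step 1: allocate the sg table. */
	iod->sg = malloc(64);
	if (!iod->sg)
		return ret;

	/* Step 2: "DMA map". On failure nothing is mapped, so skip the unmap. */
	if (fail == FAIL_MAP)
		goto out_free_sg;
	iod->dma_mapped = true;

	/* Step 3: "PRP/SGL setup". On failure, undo the mapping exactly once. */
	if (fail == FAIL_SETUP)
		goto out_dma_unmap;

	return 0;

out_dma_unmap:
	iod->dma_mapped = false;
	iod->unmap_calls++;
out_free_sg:
	free(iod->sg);
	iod->sg = NULL;
	return ret;
}
```

Calling `fake_map_data(&iod, FAIL_MAP)` on a zero-initialized `iod` returns an error with `unmap_calls` still 0, which mirrors the case the original report describes: the mapping layer has already cleaned up after itself, so the caller must only free the sg table.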


2021-01-19 23:14:48

by Marc Orr

Subject: Re: [PATCH] nvme: fix handling mapping failure

On Tue, Jan 19, 2021 at 10:00 AM Christoph Hellwig wrote:
>
> On Tue, Jan 19, 2021 at 09:53:36AM -0800, Marc Orr wrote:
> > This patch ensures that when `nvme_map_data()` fails to map the
> > addresses in a scatter/gather list:
> >
> > * The addresses are not incorrectly unmapped. The underlying
> > scatter/gather code unmaps the addresses after detecting a failure.
> > Thus, unmapping them again in the driver is a bug.
> > * The DMA pool allocations are not deallocated when they were never
> > allocated.
> >
> > The bug that motivated this patch was the following sequence, which
> > occurred within the NVMe driver, with the kernel flag `swiotlb=force`.
> >
> > * NVMe driver calls dma_direct_map_sg()
> > * dma_direct_map_sg() fails partway through the scatter/gather list
> > * dma_direct_map_sg() calls dma_direct_unmap_sg() to unmap any entries
> > that succeeded.
> > * NVMe driver calls dma_direct_unmap_sg(), redundantly, leading to a
> > double unmap, which is a bug.
> >
> > Before this patch, I observed intermittent application- and VM-level
> > failures when running a benchmark, fio, in an AMD SEV guest. This patch
> > resolves the failures.
>
> I think the right way to fix this is to just do a proper unwind instead
> of calling a catchall function. Can you try this patch?

Done. It works great, thanks! Shall I send out a v2 with what you've proposed?

> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index 25456d02eddb8c..47d7075053b6b2 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -842,7 +842,7 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
>  	sg_init_table(iod->sg, blk_rq_nr_phys_segments(req));
>  	iod->nents = blk_rq_map_sg(req->q, req, iod->sg);
>  	if (!iod->nents)
> -		goto out;
> +		goto out_free_sg;
>
>  	if (is_pci_p2pdma_page(sg_page(iod->sg)))
>  		nr_mapped = pci_p2pdma_map_sg_attrs(dev->dev, iod->sg,
> @@ -851,16 +851,25 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
>  		nr_mapped = dma_map_sg_attrs(dev->dev, iod->sg, iod->nents,
>  					     rq_dma_dir(req), DMA_ATTR_NO_WARN);
>  	if (!nr_mapped)
> -		goto out;
> +		goto out_free_sg;
>
>  	iod->use_sgl = nvme_pci_use_sgls(dev, req);
>  	if (iod->use_sgl)
>  		ret = nvme_pci_setup_sgls(dev, req, &cmnd->rw, nr_mapped);
>  	else
>  		ret = nvme_pci_setup_prps(dev, req, &cmnd->rw);
> -out:
>  	if (ret != BLK_STS_OK)
> -		nvme_unmap_data(dev, req);
> +		goto out_dma_unmap;
> +	return BLK_STS_OK;
> +
> +out_dma_unmap:
> +	if (is_pci_p2pdma_page(sg_page(iod->sg)))
> +		pci_p2pdma_unmap_sg(dev->dev, iod->sg, iod->nents,
> +				    rq_dma_dir(req));
> +	else
> +		dma_unmap_sg(dev->dev, iod->sg, iod->nents, rq_dma_dir(req));

Do you think it's worth hoisting this sg unmap snippet into a helper
that can be called from both here, as well as nvme_unmap_data()?
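
To make the question concrete, here is a rough userspace mock of what such a shared helper could look like. None of these names exist in the kernel: `struct mock_iod`, `mock_unmap_sg()`, `mock_map_data_unwind()`, and `mock_unmap_data()` are hypothetical, and counters replace the real unmap and pool-free calls. The sketch also assumes that the full teardown path frees pool memory while the error path does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; none of these names exist in the kernel. */
struct mock_iod {
	bool is_p2pdma; /* which mapping API was used for this request */
	int p2p_unmaps; /* calls routed to the "P2PDMA" unmap */
	int dma_unmaps; /* calls routed to the regular "DMA" unmap */
	int pool_frees; /* pool frees, done only by the full teardown */
};

/* The shared helper: pick the unmap variant that matches the mapping. */
static void mock_unmap_sg(struct mock_iod *iod)
{
	assert(iod != NULL);
	if (iod->is_p2pdma)
		iod->p2p_unmaps++;
	else
		iod->dma_unmaps++;
}

/* Error path of the mapping function: only the sg mapping exists so far. */
static void mock_map_data_unwind(struct mock_iod *iod)
{
	mock_unmap_sg(iod);
}

/* Full teardown after a completed request: unmap, then free pool memory. */
static void mock_unmap_data(struct mock_iod *iod)
{
	mock_unmap_sg(iod);
	iod->pool_frees++;
}
```

With this shape, the P2PDMA-versus-regular branch lives in one place, and the two callers differ only in what else they release.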

> +out_free_sg:
> +	mempool_free(iod->sg, dev->iod_mempool);
>  	return ret;
>  }
>