2020-11-23 22:23:29

by Jianxiong Gao

Subject: [PATCH] Adding offset keeping option when mapping data via SWIOTLB.

The NVMe driver and other consumers of the DMA API may depend on
the data offset to operate correctly. Currently, when unaligned
data is mapped via the SWIOTLB it is mapped slab-aligned, and the
original sub-page offset is lost. When booting with swiotlb=force
and using NVMe, running mkfs.xfs on RHEL fails because of this
misalignment.
This patch adds an option to make sure the mapped data preserves
the offset of its original address. Tested on the latest kernel;
this patch fixes the issue.
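
For context, a minimal sketch of why the offset matters, modeled
loosely on nvme_pci_setup_prps() in drivers/nvme/host/pci.c
(illustrative only, not part of this patch; NVME_CTRL_PAGE_SIZE is
the controller page size):

	/*
	 * The driver derives the first PRP entry from the page offset
	 * of the DMA address it gets back, so a bounce buffer that
	 * changes that offset corrupts the transfer.
	 */
	static int nvme_first_prp_len_sketch(u64 dma_addr, int length)
	{
		int offset = dma_addr & (NVME_CTRL_PAGE_SIZE - 1);

		/* A slab-aligned bounce buffer makes offset wrong here. */
		return min_t(int, length, NVME_CTRL_PAGE_SIZE - offset);
	}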

Signed-off-by: Jianxiong Gao <[email protected]>
Acked-by: David Rientjes <[email protected]>
---
 drivers/nvme/host/pci.c     |  3 ++-
 include/linux/dma-mapping.h |  8 ++++++++
 kernel/dma/swiotlb.c        | 13 +++++++++++++
 3 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 0578ff253c47..a366fb8a1ff0 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -833,7 +833,8 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
 				iod->nents, rq_dma_dir(req), DMA_ATTR_NO_WARN);
 	else
 		nr_mapped = dma_map_sg_attrs(dev->dev, iod->sg, iod->nents,
-					     rq_dma_dir(req), DMA_ATTR_NO_WARN);
+					     rq_dma_dir(req),
+					     DMA_ATTR_NO_WARN|DMA_ATTR_SWIOTLB_KEEP_OFFSET);
 	if (!nr_mapped)
 		goto out;

diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 956151052d45..e46d23d9fa20 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -61,6 +61,14 @@
  */
 #define DMA_ATTR_PRIVILEGED		(1UL << 9)
 
+/*
+ * DMA_ATTR_SWIOTLB_KEEP_OFFSET: used to indicate that the buffer has to keep
+ * its offset when mapped via SWIOTLB. Some driver functionality depends on
+ * the address offset, thus when buffers are mapped via SWIOTLB the offset
+ * needs to be preserved.
+ */
+#define DMA_ATTR_SWIOTLB_KEEP_OFFSET	(1UL << 10)
+
 /*
  * A dma_addr_t can hold any valid DMA or bus address for the platform. It can
  * be given to a device to use as a DMA source or target. It is specific to a
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 781b9dca197c..f43d7be1342d 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -483,6 +483,13 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, phys_addr_t orig_addr,
 	max_slots = mask + 1
 		    ? ALIGN(mask + 1, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT
 		    : 1UL << (BITS_PER_LONG - IO_TLB_SHIFT);
+
+	/*
+	 * If we need to keep the offset when mapping, we need to add the
+	 * offset to the total size we need to allocate in the SWIOTLB.
+	 */
+	if (attrs & DMA_ATTR_SWIOTLB_KEEP_OFFSET)
+		alloc_size += offset_in_page(orig_addr);
 
 	/*
 	 * For mappings greater than or equal to a page, we limit the stride
@@ -567,6 +574,12 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, phys_addr_t orig_addr,
 	 */
 	for (i = 0; i < nslots; i++)
 		io_tlb_orig_addr[index+i] = orig_addr + (i << IO_TLB_SHIFT);
+	/*
+	 * When keeping the offset of the original data, we need to advance
+	 * the tlb_addr by the offset of orig_addr.
+	 */
+	if (attrs & DMA_ATTR_SWIOTLB_KEEP_OFFSET)
+		tlb_addr += orig_addr & (PAGE_SIZE - 1);
 	if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
 	    (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL))
 		swiotlb_bounce(orig_addr, tlb_addr, mapping_size, DMA_TO_DEVICE);
--
2.27.0
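
For driver writers, a minimal usage sketch of the new attribute,
assuming this patch is applied (the flag only changes behaviour when
the buffer actually bounces through the SWIOTLB):

	dma_addr_t addr;

	/* Ask the DMA API to keep the sub-page offset across a bounce. */
	addr = dma_map_single_attrs(dev, buf, len, DMA_TO_DEVICE,
				    DMA_ATTR_SWIOTLB_KEEP_OFFSET);
	if (dma_mapping_error(dev, addr))
		return -ENOMEM;
	/* addr now keeps offset_in_page(buf), bounced or not. */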



2020-11-25 00:04:25

by Konrad Rzeszutek Wilk

Subject: Re: [PATCH] Adding offset keeping option when mapping data via SWIOTLB.

On Mon, Nov 23, 2020 at 02:18:07PM -0800, Jianxiong Gao wrote:
> The NVMe driver and other consumers of the DMA API may depend on
> the data offset to operate correctly. Currently, when unaligned
> data is mapped via the SWIOTLB it is mapped slab-aligned, and the
> original sub-page offset is lost. When booting with swiotlb=force
> and using NVMe, running mkfs.xfs on RHEL fails because of this
> misalignment.

RHEL? So a specific RHEL kernel. Is there a Red Hat bug created
for this that can be linked to this patch, to make it easier
for folks to figure this out?

Why would you be using swiotlb=force?
Ah, you are using AMD SEV!

> This patch adds an option to make sure the mapped data preserves
> its offset of the orginal addrss. Tested on latest kernel that

s/addrss/address/
> this patch fixes the issue.
>
> [remainder of the patch quoted in full; snipped, see the original
> message above]

2020-11-25 02:04:09

by Christoph Hellwig

Subject: Re: [PATCH] Adding offset keeping option when mapping data via SWIOTLB.

On Mon, Nov 23, 2020 at 02:18:07PM -0800, Jianxiong Gao wrote:
> The NVMe driver and other consumers of the DMA API may depend on
> the data offset to operate correctly. Currently, when unaligned
> data is mapped via the SWIOTLB it is mapped slab-aligned, and the
> original sub-page offset is lost. When booting with swiotlb=force
> and using NVMe, running mkfs.xfs on RHEL fails because of this
> misalignment.
> This patch adds an option to make sure the mapped data preserves
> the offset of its original address. Tested on the latest kernel;
> this patch fixes the issue.
>
> Signed-off-by: Jianxiong Gao <[email protected]>
> Acked-by: David Rientjes <[email protected]>

I think we actually need to do this by default. There is plenty of
other hardware that relies on the DMA mapping not introducing an
offset that did not exist, e.g. AHCI and various RDMA NICs.
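
A sketch of what that default could look like, mirroring the names in
swiotlb_tbl_map_single() above (hypothetical helper, not code from
this thread): allocate the bounce slot with room for the sub-page
offset, then return a correspondingly offset address.

	/* Hypothetical: always preserve the offset, with no opt-in flag. */
	static phys_addr_t swiotlb_offset_tlb_addr(phys_addr_t orig_addr,
						   phys_addr_t aligned_tlb_addr)
	{
		/* Slot was sized with offset_in_page(orig_addr) of slack. */
		return aligned_tlb_addr + offset_in_page(orig_addr);
	}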