2021-05-14 01:25:19

by Logan Gunthorpe

Subject: [PATCH v2 00/22] Add new DMA mapping operation for P2PDMA

Hi,

This patchset continues my work to add P2PDMA support to the common
DMA mapping operations. This allows creating SGLs that contain both
P2PDMA and regular pages, which is a necessary step toward allowing
P2PDMA pages in userspace.

The earlier RFC[1] and v1[2] postings generated a lot of great feedback.
This version adds a bunch more cleanup at the start of the series. I'll
probably look to split the earlier patches off and get them merged
independently after a round of review, as this series has gotten quite
long.

I'm happy to do a few more passes if anyone has any further feedback
or better ideas.

This series is based on v5.13-rc1 and a git branch can be found here:

https://github.com/sbates130272/linux-p2pmem/ p2pdma_map_ops_v2

Thanks,

Logan

[1] https://lore.kernel.org/linux-block/[email protected]/
[2] https://lore.kernel.org/linux-block/[email protected]/

Changes since v1:
* Rebased onto v5.13-rc1
* Add some cleanup to the existing P2PDMA code to fix up some naming
conventions and documentation as the code has evolved a bit since the
names were chosen. (As suggested by John)
* Add a patch that adds a warning if a host bridge is not in the whitelist
(as suggested by Don)
* Change to using dma_map_sgtable() instead of creating a new
interface. For this, a couple of .map_sg implementations were changed
to return full error codes. (as per Christoph)
* Renamed the scatterlist functions to include the term "dma" to
indicate that they apply to the DMA side of the sg. (per Jason)
* Introduce ib_dma_pci_p2p_dma_supported() helper instead of open
coding the check (per Jason)
* Numerous minor adjustments and documentation fixes

Changes since the RFC:
* Added comment and fixed up the pci_get_slot patch. (per Bjorn)
* Fixed glaring sg_phys() double offset bug. (per Robin)
* Created a new map operation (dma_map_sg_p2pdma()) with a new calling
convention instead of modifying the calling convention of
dma_map_sg(). (per Robin)
* Integrated the two similar pci_p2pdma_dma_map_type() and
pci_p2pdma_map_type() functions into one (per Ira)
* Reworked some of the logic in the map_sg() implementations into
helpers in the p2pdma code. (per Christoph)
* Dropped a bunch of unnecessary symbol exports (per Christoph)
* Expanded the code in dma_pci_p2pdma_supported() for clarity. (per
Ira and Christoph)
* Finished off using the new dma_map_sg_p2pdma() call in rdma_rw
and removed the old pci_p2pdma_[un]map_sg(). (per Jason)

--

Logan Gunthorpe (22):
PCI/P2PDMA: Rename upstream_bridge_distance() and rework documentation
PCI/P2PDMA: Use a buffer on the stack for collecting the acs list
PCI/P2PDMA: Cleanup type for return value of calc_map_type_and_dist()
PCI/P2PDMA: Avoid pci_get_slot() which sleeps
PCI/P2PDMA: Print a warning if the host bridge is not in the whitelist
PCI/P2PDMA: Attempt to set map_type if it has not been set
PCI/P2PDMA: Refactor pci_p2pdma_map_type() to take pagemap and device
dma-mapping: Allow map_sg() ops to return negative error codes
dma-direct: Return appropriate error code from dma_direct_map_sg()
iommu: Return full error code from iommu_map_sg[_atomic]()
dma-iommu: Return error code from iommu_dma_map_sg()
lib/scatterlist: Add flag for indicating P2PDMA segments in an SGL
PCI/P2PDMA: Make pci_p2pdma_map_type() non-static
PCI/P2PDMA: Introduce helpers for dma_map_sg implementations
dma-direct: Support PCI P2PDMA pages in dma-direct map_sg
dma-mapping: Add flags to dma_map_ops to indicate PCI P2PDMA support
iommu/dma: Support PCI P2PDMA pages in dma-iommu map_sg
nvme-pci: Check DMA ops when indicating support for PCI P2PDMA
nvme-pci: Convert to using dma_map_sgtable()
RDMA/core: Introduce ib_dma_pci_p2p_dma_supported()
RDMA/rw: use dma_map_sgtable()
PCI/P2PDMA: Remove pci_p2pdma_[un]map_sg()

drivers/infiniband/core/rw.c | 75 ++++------
drivers/iommu/dma-iommu.c | 86 +++++++++--
drivers/iommu/iommu.c | 15 +-
drivers/nvme/host/core.c | 3 +-
drivers/nvme/host/nvme.h | 2 +-
drivers/nvme/host/pci.c | 80 +++++-----
drivers/nvme/target/rdma.c | 2 +-
drivers/pci/Kconfig | 2 +-
drivers/pci/p2pdma.c | 273 +++++++++++++++++++----------------
include/linux/dma-map-ops.h | 18 ++-
include/linux/dma-mapping.h | 46 +++++-
include/linux/iommu.h | 22 +--
include/linux/pci-p2pdma.h | 81 ++++++++---
include/linux/scatterlist.h | 50 ++++++-
include/rdma/ib_verbs.h | 30 ++++
kernel/dma/direct.c | 44 +++++-
kernel/dma/mapping.c | 31 +++-
17 files changed, 570 insertions(+), 290 deletions(-)


base-commit: 6efb943b8616ec53a5e444193dccf1af9ad627b5
--
2.20.1


2021-05-14 01:25:59

by Logan Gunthorpe

Subject: [PATCH v2 20/22] RDMA/core: Introduce ib_dma_pci_p2p_dma_supported()

Introduce the helper function ib_dma_pci_p2p_dma_supported() to check
if a given ib_device can be used in P2PDMA transfers. This ensures
the ib_device is not using virt_dma and also that the underlying
dma_device supports P2PDMA.

Use the new helper in nvmet-rdma to replace the existing check for
ib_uses_virt_dma(). Adding the dma_pci_p2pdma_supported() check allows
switching away from pci_p2pdma_[un]map_sg().
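
For illustration, a ULP could gate its buffer selection on the new
helper roughly like this (a sketch only; the buffer helpers are
hypothetical, only ib_dma_pci_p2p_dma_supported() comes from this
patch):

	/* Fall back to host memory when the device can't do P2PDMA */
	if (ib_dma_pci_p2p_dma_supported(ibdev))
		buf = alloc_p2p_buffer(len);	/* hypothetical */
	else
		buf = kmalloc(len, GFP_KERNEL);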

Signed-off-by: Logan Gunthorpe <[email protected]>
---
drivers/nvme/target/rdma.c | 2 +-
include/rdma/ib_verbs.h | 11 +++++++++++
2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
index 6c1f3ab7649c..86f0bf4dc1ca 100644
--- a/drivers/nvme/target/rdma.c
+++ b/drivers/nvme/target/rdma.c
@@ -414,7 +414,7 @@ static int nvmet_rdma_alloc_rsp(struct nvmet_rdma_device *ndev,
if (ib_dma_mapping_error(ndev->device, r->send_sge.addr))
goto out_free_rsp;

- if (!ib_uses_virt_dma(ndev->device))
+ if (ib_dma_pci_p2p_dma_supported(ndev->device))
r->req.p2p_client = &ndev->device->dev;
r->send_sge.length = sizeof(*r->req.cqe);
r->send_sge.lkey = ndev->pd->local_dma_lkey;
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 7e2f3699b898..724e80a656f7 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -3943,6 +3943,17 @@ static inline bool ib_uses_virt_dma(struct ib_device *dev)
return IS_ENABLED(CONFIG_INFINIBAND_VIRT_DMA) && !dev->dma_device;
}

+/*
+ * Check if an IB device's underlying DMA mapping supports P2PDMA transfers.
+ */
+static inline bool ib_dma_pci_p2p_dma_supported(struct ib_device *dev)
+{
+ if (ib_uses_virt_dma(dev))
+ return false;
+
+ return dma_pci_p2pdma_supported(dev->dma_device);
+}
+
/**
* ib_dma_mapping_error - check a DMA addr for error
* @dev: The device for which the dma_addr was created
--
2.20.1


2021-05-14 01:27:11

by Logan Gunthorpe

Subject: [PATCH v2 18/22] nvme-pci: Check DMA ops when indicating support for PCI P2PDMA

Introduce a supports_pci_p2pdma() operation in nvme_ctrl_ops to
replace the fixed NVME_F_PCI_P2PDMA flag such that the dma_map_ops
flags can be checked for PCI P2PDMA support.

Signed-off-by: Logan Gunthorpe <[email protected]>
---
drivers/nvme/host/core.c | 3 ++-
drivers/nvme/host/nvme.h | 2 +-
drivers/nvme/host/pci.c | 11 +++++++++--
3 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 522c9b229f80..9544a8948bc4 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -3682,7 +3682,8 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid,
blk_queue_flag_set(QUEUE_FLAG_STABLE_WRITES, ns->queue);

blk_queue_flag_set(QUEUE_FLAG_NONROT, ns->queue);
- if (ctrl->ops->flags & NVME_F_PCI_P2PDMA)
+ if (ctrl->ops->supports_pci_p2pdma &&
+ ctrl->ops->supports_pci_p2pdma(ctrl))
blk_queue_flag_set(QUEUE_FLAG_PCI_P2PDMA, ns->queue);

ns->queue->queuedata = ns;
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 05f31a2c64bb..84649ce6c206 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -487,7 +487,6 @@ struct nvme_ctrl_ops {
unsigned int flags;
#define NVME_F_FABRICS (1 << 0)
#define NVME_F_METADATA_SUPPORTED (1 << 1)
-#define NVME_F_PCI_P2PDMA (1 << 2)
int (*reg_read32)(struct nvme_ctrl *ctrl, u32 off, u32 *val);
int (*reg_write32)(struct nvme_ctrl *ctrl, u32 off, u32 val);
int (*reg_read64)(struct nvme_ctrl *ctrl, u32 off, u64 *val);
@@ -495,6 +494,7 @@ struct nvme_ctrl_ops {
void (*submit_async_event)(struct nvme_ctrl *ctrl);
void (*delete_ctrl)(struct nvme_ctrl *ctrl);
int (*get_address)(struct nvme_ctrl *ctrl, char *buf, int size);
+ bool (*supports_pci_p2pdma)(struct nvme_ctrl *ctrl);
};

#ifdef CONFIG_FAULT_INJECTION_DEBUG_FS
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index a29b170701fc..9912291f43af 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2756,17 +2756,24 @@ static int nvme_pci_get_address(struct nvme_ctrl *ctrl, char *buf, int size)
return snprintf(buf, size, "%s\n", dev_name(&pdev->dev));
}

+static bool nvme_pci_supports_pci_p2pdma(struct nvme_ctrl *ctrl)
+{
+ struct nvme_dev *dev = to_nvme_dev(ctrl);
+
+ return dma_pci_p2pdma_supported(dev->dev);
+}
+
static const struct nvme_ctrl_ops nvme_pci_ctrl_ops = {
.name = "pcie",
.module = THIS_MODULE,
- .flags = NVME_F_METADATA_SUPPORTED |
- NVME_F_PCI_P2PDMA,
+ .flags = NVME_F_METADATA_SUPPORTED,
.reg_read32 = nvme_pci_reg_read32,
.reg_write32 = nvme_pci_reg_write32,
.reg_read64 = nvme_pci_reg_read64,
.free_ctrl = nvme_pci_free_ctrl,
.submit_async_event = nvme_pci_submit_async_event,
.get_address = nvme_pci_get_address,
+ .supports_pci_p2pdma = nvme_pci_supports_pci_p2pdma,
};

static int nvme_dev_map(struct nvme_dev *dev)
--
2.20.1


2021-05-14 01:27:47

by Logan Gunthorpe

Subject: [PATCH v2 19/22] nvme-pci: Convert to using dma_map_sgtable()

The dma_map operations now support P2PDMA pages directly. So remove
the calls to pci_p2pdma_[un]map_sg_attrs() and replace them with calls
to dma_map_sgtable().

dma_map_sgtable() returns more complete error codes than dma_map_sg()
and allows differentiating -EREMOTEIO errors, which indicate that an
unsupported P2PDMA transfer was requested. When this happens, return
BLK_STS_TARGET so the request isn't retried.
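
The core of that translation, excerpted from the hunk below:

	rc = dma_map_sgtable(dev->dev, &iod->sgt, rq_dma_dir(req),
			     DMA_ATTR_NO_WARN);
	if (rc) {
		/* -EREMOTEIO: the P2PDMA transfer is unsupported */
		if (rc == -EREMOTEIO)
			ret = BLK_STS_TARGET;
		goto out_free_sg;
	}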

Signed-off-by: Logan Gunthorpe <[email protected]>
---
drivers/nvme/host/pci.c | 69 +++++++++++++++++------------------------
1 file changed, 29 insertions(+), 40 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 9912291f43af..8844ef1005c3 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -229,11 +229,10 @@ struct nvme_iod {
bool use_sgl;
int aborted;
int npages; /* In the PRP list. 0 means small pool in use */
- int nents; /* Used in scatterlist */
dma_addr_t first_dma;
unsigned int dma_len; /* length of single DMA segment mapping */
dma_addr_t meta_dma;
- struct scatterlist *sg;
+ struct sg_table sgt;
};

static inline unsigned int nvme_dbbuf_size(struct nvme_dev *dev)
@@ -525,7 +524,7 @@ static void nvme_commit_rqs(struct blk_mq_hw_ctx *hctx)
static void **nvme_pci_iod_list(struct request *req)
{
struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
- return (void **)(iod->sg + blk_rq_nr_phys_segments(req));
+ return (void **)(iod->sgt.sgl + blk_rq_nr_phys_segments(req));
}

static inline bool nvme_pci_use_sgls(struct nvme_dev *dev, struct request *req)
@@ -579,17 +578,6 @@ static void nvme_free_sgls(struct nvme_dev *dev, struct request *req)

}

-static void nvme_unmap_sg(struct nvme_dev *dev, struct request *req)
-{
- struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
-
- if (is_pci_p2pdma_page(sg_page(iod->sg)))
- pci_p2pdma_unmap_sg(dev->dev, iod->sg, iod->nents,
- rq_dma_dir(req));
- else
- dma_unmap_sg(dev->dev, iod->sg, iod->nents, rq_dma_dir(req));
-}
-
static void nvme_unmap_data(struct nvme_dev *dev, struct request *req)
{
struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
@@ -600,9 +588,10 @@ static void nvme_unmap_data(struct nvme_dev *dev, struct request *req)
return;
}

- WARN_ON_ONCE(!iod->nents);
+ WARN_ON_ONCE(!iod->sgt.nents);
+
+ dma_unmap_sgtable(dev->dev, &iod->sgt, rq_dma_dir(req), 0);

- nvme_unmap_sg(dev, req);
if (iod->npages == 0)
dma_pool_free(dev->prp_small_pool, nvme_pci_iod_list(req)[0],
iod->first_dma);
@@ -610,7 +599,7 @@ static void nvme_unmap_data(struct nvme_dev *dev, struct request *req)
nvme_free_sgls(dev, req);
else
nvme_free_prps(dev, req);
- mempool_free(iod->sg, dev->iod_mempool);
+ mempool_free(iod->sgt.sgl, dev->iod_mempool);
}

static void nvme_print_sgl(struct scatterlist *sgl, int nents)
@@ -633,7 +622,7 @@ static blk_status_t nvme_pci_setup_prps(struct nvme_dev *dev,
struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
struct dma_pool *pool;
int length = blk_rq_payload_bytes(req);
- struct scatterlist *sg = iod->sg;
+ struct scatterlist *sg = iod->sgt.sgl;
int dma_len = sg_dma_len(sg);
u64 dma_addr = sg_dma_address(sg);
int offset = dma_addr & (NVME_CTRL_PAGE_SIZE - 1);
@@ -706,16 +695,16 @@ static blk_status_t nvme_pci_setup_prps(struct nvme_dev *dev,
dma_len = sg_dma_len(sg);
}
done:
- cmnd->dptr.prp1 = cpu_to_le64(sg_dma_address(iod->sg));
+ cmnd->dptr.prp1 = cpu_to_le64(sg_dma_address(iod->sgt.sgl));
cmnd->dptr.prp2 = cpu_to_le64(iod->first_dma);
return BLK_STS_OK;
free_prps:
nvme_free_prps(dev, req);
return BLK_STS_RESOURCE;
bad_sgl:
- WARN(DO_ONCE(nvme_print_sgl, iod->sg, iod->nents),
+ WARN(DO_ONCE(nvme_print_sgl, iod->sgt.sgl, iod->sgt.nents),
"Invalid SGL for payload:%d nents:%d\n",
- blk_rq_payload_bytes(req), iod->nents);
+ blk_rq_payload_bytes(req), iod->sgt.nents);
return BLK_STS_IOERR;
}

@@ -741,12 +730,13 @@ static void nvme_pci_sgl_set_seg(struct nvme_sgl_desc *sge,
}

static blk_status_t nvme_pci_setup_sgls(struct nvme_dev *dev,
- struct request *req, struct nvme_rw_command *cmd, int entries)
+ struct request *req, struct nvme_rw_command *cmd)
{
struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
struct dma_pool *pool;
struct nvme_sgl_desc *sg_list;
- struct scatterlist *sg = iod->sg;
+ struct scatterlist *sg = iod->sgt.sgl;
+ int entries = iod->sgt.nents;
dma_addr_t sgl_dma;
int i = 0;

@@ -844,7 +834,7 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
{
struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
blk_status_t ret = BLK_STS_RESOURCE;
- int nr_mapped;
+ int rc;

if (blk_rq_nr_phys_segments(req) == 1) {
struct bio_vec bv = req_bvec(req);
@@ -862,26 +852,25 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
}

iod->dma_len = 0;
- iod->sg = mempool_alloc(dev->iod_mempool, GFP_ATOMIC);
- if (!iod->sg)
+ iod->sgt.sgl = mempool_alloc(dev->iod_mempool, GFP_ATOMIC);
+ if (!iod->sgt.sgl)
return BLK_STS_RESOURCE;
- sg_init_table(iod->sg, blk_rq_nr_phys_segments(req));
- iod->nents = blk_rq_map_sg(req->q, req, iod->sg);
- if (!iod->nents)
+ sg_init_table(iod->sgt.sgl, blk_rq_nr_phys_segments(req));
+ iod->sgt.orig_nents = blk_rq_map_sg(req->q, req, iod->sgt.sgl);
+ if (!iod->sgt.orig_nents)
goto out_free_sg;

- if (is_pci_p2pdma_page(sg_page(iod->sg)))
- nr_mapped = pci_p2pdma_map_sg_attrs(dev->dev, iod->sg,
- iod->nents, rq_dma_dir(req), DMA_ATTR_NO_WARN);
- else
- nr_mapped = dma_map_sg_attrs(dev->dev, iod->sg, iod->nents,
- rq_dma_dir(req), DMA_ATTR_NO_WARN);
- if (!nr_mapped)
+ rc = dma_map_sgtable(dev->dev, &iod->sgt, rq_dma_dir(req),
+ DMA_ATTR_NO_WARN);
+ if (rc) {
+ if (rc == -EREMOTEIO)
+ ret = BLK_STS_TARGET;
goto out_free_sg;
+ }

iod->use_sgl = nvme_pci_use_sgls(dev, req);
if (iod->use_sgl)
- ret = nvme_pci_setup_sgls(dev, req, &cmnd->rw, nr_mapped);
+ ret = nvme_pci_setup_sgls(dev, req, &cmnd->rw);
else
ret = nvme_pci_setup_prps(dev, req, &cmnd->rw);
if (ret != BLK_STS_OK)
@@ -889,9 +878,9 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
return BLK_STS_OK;

out_unmap_sg:
- nvme_unmap_sg(dev, req);
+ dma_unmap_sgtable(dev->dev, &iod->sgt, rq_dma_dir(req), 0);
out_free_sg:
- mempool_free(iod->sg, dev->iod_mempool);
+ mempool_free(iod->sgt.sgl, dev->iod_mempool);
return ret;
}

@@ -924,7 +913,7 @@ static blk_status_t nvme_queue_rq(struct blk_mq_hw_ctx *hctx,

iod->aborted = 0;
iod->npages = -1;
- iod->nents = 0;
+ iod->sgt.nents = 0;

/*
* We should not need to do this, but we're still using this to
--
2.20.1


2021-05-14 01:29:31

by Logan Gunthorpe

Subject: [PATCH v2 17/22] iommu/dma: Support PCI P2PDMA pages in dma-iommu map_sg

When a PCI P2PDMA page is seen, set the IOVA length of the segment
to zero so that it is not mapped into the IOVA. Then, in finalise_sg(),
apply the appropriate bus address to the segment. The IOVA is not
created if the scatterlist only consists of P2PDMA pages.

A P2PDMA page may have three possible outcomes when being mapped:
1) If the data path between the two devices doesn't go through
the root port, then it should be mapped with a PCI bus address
2) If the data path goes through the host bridge, it should be mapped
normally with an IOMMU IOVA.
3) It is not possible for the two devices to communicate and thus
the mapping operation should fail (and it will return -EREMOTEIO).
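
Condensed from the hunk below, the per-segment decision in
iommu_dma_map_sg() then reduces to roughly:

	if (is_pci_p2pdma_page(sg_page(s))) {
		switch (pci_p2pdma_map_type(sg_page(s)->pgmap, dev)) {
		case PCI_P2PDMA_MAP_BUS_ADDR:
			/* no IOVA; bus address applied in __finalise_sg() */
			s->length = 0;
			continue;
		case PCI_P2PDMA_MAP_THRU_HOST_BRIDGE:
			break;	/* map through the IOMMU as usual */
		default:
			ret = -EREMOTEIO;
			goto out_restore_sg;
		}
	}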

Similar to dma-direct, sg_dma_mark_pci_p2pdma() is used to flag bus
address segments. On unmap, P2PDMA segments are skipped over when
determining the start and end IOVA addresses.

With this change, the flags variable in the dma_map_ops is set to
DMA_F_PCI_P2PDMA_SUPPORTED to indicate support for P2PDMA pages.

Signed-off-by: Logan Gunthorpe <[email protected]>
---
drivers/iommu/dma-iommu.c | 66 +++++++++++++++++++++++++++++++++++----
1 file changed, 60 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 0d69f509a95d..7cc6650e1fa5 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -20,6 +20,7 @@
#include <linux/mm.h>
#include <linux/mutex.h>
#include <linux/pci.h>
+#include <linux/pci-p2pdma.h>
#include <linux/swiotlb.h>
#include <linux/scatterlist.h>
#include <linux/vmalloc.h>
@@ -891,6 +892,16 @@ static int __finalise_sg(struct device *dev, struct scatterlist *sg, int nents,
sg_dma_address(s) = DMA_MAPPING_ERROR;
sg_dma_len(s) = 0;

+ if (is_pci_p2pdma_page(sg_page(s)) && !s_iova_len) {
+ if (i > 0)
+ cur = sg_next(cur);
+
+ pci_p2pdma_map_bus_segment(s, cur);
+ count++;
+ cur_len = 0;
+ continue;
+ }
+
/*
* Now fill in the real DMA data. If...
* - there is a valid output segment to append to
@@ -988,6 +999,8 @@ static int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
struct iova_domain *iovad = &cookie->iovad;
struct scatterlist *s, *prev = NULL;
int prot = dma_info_to_prot(dir, dev_is_dma_coherent(dev), attrs);
+ struct dev_pagemap *pgmap = NULL;
+ enum pci_p2pdma_map_type map_type;
dma_addr_t iova;
size_t iova_len = 0;
unsigned long mask = dma_get_seg_boundary(dev);
@@ -1023,6 +1036,35 @@ static int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
s_length = iova_align(iovad, s_length + s_iova_off);
s->length = s_length;

+ if (is_pci_p2pdma_page(sg_page(s))) {
+ if (sg_page(s)->pgmap != pgmap) {
+ pgmap = sg_page(s)->pgmap;
+ map_type = pci_p2pdma_map_type(pgmap, dev);
+ }
+
+ switch (map_type) {
+ case PCI_P2PDMA_MAP_BUS_ADDR:
+ /*
+ * A zero length will be ignored by
+ * iommu_map_sg() and then can be detected
+ * in __finalise_sg() to actually map the
+ * bus address.
+ */
+ s->length = 0;
+ continue;
+ case PCI_P2PDMA_MAP_THRU_HOST_BRIDGE:
+ /*
+ * Mapping through host bridge should be
+ * mapped with regular IOVAs, thus we
+ * do nothing here and continue below.
+ */
+ break;
+ default:
+ ret = -EREMOTEIO;
+ goto out_restore_sg;
+ }
+ }
+
/*
* Due to the alignment of our single IOVA allocation, we can
* depend on these assumptions about the segment boundary mask:
@@ -1045,6 +1087,9 @@ static int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
prev = s;
}

+ if (!iova_len)
+ return __finalise_sg(dev, sg, nents, 0);
+
iova = iommu_dma_alloc_iova(domain, iova_len, dma_get_mask(dev), dev);
if (!iova) {
ret = -ENOMEM;
@@ -1071,7 +1116,7 @@ static int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
static void iommu_dma_unmap_sg(struct device *dev, struct scatterlist *sg,
int nents, enum dma_data_direction dir, unsigned long attrs)
{
- dma_addr_t start, end;
+ dma_addr_t end, start = DMA_MAPPING_ERROR;
struct scatterlist *tmp;
int i;

@@ -1087,14 +1132,22 @@ static void iommu_dma_unmap_sg(struct device *dev, struct scatterlist *sg,
* The scatterlist segments are mapped into a single
* contiguous IOVA allocation, so this is incredibly easy.
*/
- start = sg_dma_address(sg);
- for_each_sg(sg_next(sg), tmp, nents - 1, i) {
+ for_each_sg(sg, tmp, nents, i) {
+ if (sg_is_dma_pci_p2pdma(tmp)) {
+ sg_dma_unmark_pci_p2pdma(tmp);
+ continue;
+ }
if (sg_dma_len(tmp) == 0)
break;
- sg = tmp;
+
+ if (start == DMA_MAPPING_ERROR)
+ start = sg_dma_address(tmp);
+
+ end = sg_dma_address(tmp) + sg_dma_len(tmp);
}
- end = sg_dma_address(sg) + sg_dma_len(sg);
- __iommu_dma_unmap(dev, start, end - start);
+
+ if (start != DMA_MAPPING_ERROR)
+ __iommu_dma_unmap(dev, start, end - start);
}

static dma_addr_t iommu_dma_map_resource(struct device *dev, phys_addr_t phys,
@@ -1287,6 +1340,7 @@ static unsigned long iommu_dma_get_merge_boundary(struct device *dev)
}

static const struct dma_map_ops iommu_dma_ops = {
+ .flags = DMA_F_PCI_P2PDMA_SUPPORTED,
.alloc = iommu_dma_alloc,
.free = iommu_dma_free,
.alloc_pages = dma_common_alloc_pages,
--
2.20.1


2021-05-14 01:31:11

by Logan Gunthorpe

Subject: [PATCH v2 16/22] dma-mapping: Add flags to dma_map_ops to indicate PCI P2PDMA support

Add a flags member to the dma_map_ops structure with one flag to
indicate support for PCI P2PDMA.

Also, add a helper to check if a device supports PCI P2PDMA.
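
For example, a driver can now query support before advertising a
P2PDMA-capable queue (a sketch of how nvme-pci uses it later in this
series):

	if (dma_pci_p2pdma_supported(&pdev->dev))
		blk_queue_flag_set(QUEUE_FLAG_PCI_P2PDMA, q);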

Signed-off-by: Logan Gunthorpe <[email protected]>
---
include/linux/dma-map-ops.h | 10 ++++++++++
include/linux/dma-mapping.h | 5 +++++
kernel/dma/mapping.c | 18 ++++++++++++++++++
3 files changed, 33 insertions(+)

diff --git a/include/linux/dma-map-ops.h b/include/linux/dma-map-ops.h
index eaa969be8284..b7b51b89b054 100644
--- a/include/linux/dma-map-ops.h
+++ b/include/linux/dma-map-ops.h
@@ -11,7 +11,17 @@

struct cma;

+/*
+ * Values for struct dma_map_ops.flags:
+ *
+ * DMA_F_PCI_P2PDMA_SUPPORTED: Indicates the dma_map_ops implementation can
+ * handle PCI P2PDMA pages in the map_sg/unmap_sg operation.
+ */
+#define DMA_F_PCI_P2PDMA_SUPPORTED (1 << 0)
+
struct dma_map_ops {
+ unsigned int flags;
+
void *(*alloc)(struct device *dev, size_t size,
dma_addr_t *dma_handle, gfp_t gfp,
unsigned long attrs);
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 2b0ecf0aa4df..1149540e969f 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -138,6 +138,7 @@ int dma_mmap_attrs(struct device *dev, struct vm_area_struct *vma,
unsigned long attrs);
bool dma_can_mmap(struct device *dev);
int dma_supported(struct device *dev, u64 mask);
+bool dma_pci_p2pdma_supported(struct device *dev);
int dma_set_mask(struct device *dev, u64 mask);
int dma_set_coherent_mask(struct device *dev, u64 mask);
u64 dma_get_required_mask(struct device *dev);
@@ -242,6 +243,10 @@ static inline int dma_supported(struct device *dev, u64 mask)
{
return 0;
}
+static inline bool dma_pci_p2pdma_supported(struct device *dev)
+{
+ return false;
+}
static inline int dma_set_mask(struct device *dev, u64 mask)
{
return -EIO;
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index 700a2bb7cc9e..2940977b1030 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -658,6 +658,24 @@ int dma_supported(struct device *dev, u64 mask)
}
EXPORT_SYMBOL(dma_supported);

+bool dma_pci_p2pdma_supported(struct device *dev)
+{
+ const struct dma_map_ops *ops = get_dma_ops(dev);
+
+ /* if ops is not set, dma-direct will be used, which supports P2PDMA */
+ if (!ops)
+ return true;
+
+ /*
+ * Note: dma_ops_bypass is not checked here because P2PDMA should
+ * not be used with dma mapping ops that do not have support even
+ * if the specific device is bypassing them.
+ */
+
+ return ops->flags & DMA_F_PCI_P2PDMA_SUPPORTED;
+}
+EXPORT_SYMBOL_GPL(dma_pci_p2pdma_supported);
+
#ifdef CONFIG_ARCH_HAS_DMA_SET_MASK
void arch_dma_set_mask(struct device *dev, u64 mask);
#else
--
2.20.1


2021-05-14 01:32:09

by Logan Gunthorpe

Subject: [PATCH v2 14/22] PCI/P2PDMA: Introduce helpers for dma_map_sg implementations

Add pci_p2pdma_map_segment() as a helper for simple dma_map_sg()
implementations. It takes a scatterlist segment that must point to a
pci_p2pdma struct page and will map it if the mapping requires a bus
address.

The return value indicates whether the mapping required a bus address
or whether the caller still needs to map the segment normally. If the
segment should not be mapped, -EREMOTEIO is returned.

This helper uses a state structure to track changes to the pgmap
across calls and to avoid needing to look up the xarray for every
page.
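
The expected calling pattern in a simple dma_map_sg() implementation
is roughly the following (a sketch; dma-direct adopts this shape in a
later patch):

	struct pci_p2pdma_map_state p2pdma_state = {};

	for_each_sg(sgl, sg, nents, i) {
		if (is_pci_p2pdma_page(sg_page(sg))) {
			switch (pci_p2pdma_map_segment(&p2pdma_state, dev,
						       sg)) {
			case PCI_P2PDMA_MAP_BUS_ADDR:
				continue;	/* already mapped */
			case PCI_P2PDMA_MAP_THRU_HOST_BRIDGE:
				break;		/* map normally below */
			default:
				goto out_unmap;	/* -EREMOTEIO */
			}
		}
		/* ... regular per-page mapping ... */
	}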

Also add pci_p2pdma_map_bus_segment() which is useful for IOMMU
dma_map_sg() implementations where the sg segment containing the page
differs from the sg segment containing the DMA address.

Signed-off-by: Logan Gunthorpe <[email protected]>
---
drivers/pci/p2pdma.c | 59 ++++++++++++++++++++++++++++++++++++++
include/linux/pci-p2pdma.h | 21 ++++++++++++++
2 files changed, 80 insertions(+)

diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
index 0568604fd8b4..b0d779aeade9 100644
--- a/drivers/pci/p2pdma.c
+++ b/drivers/pci/p2pdma.c
@@ -938,6 +938,65 @@ void pci_p2pdma_unmap_sg_attrs(struct device *dev, struct scatterlist *sg,
}
EXPORT_SYMBOL_GPL(pci_p2pdma_unmap_sg_attrs);

+/**
+ * pci_p2pdma_map_segment - map an sg segment determining the mapping type
+ * @state: State structure that should be declared outside of the for_each_sg()
+ * loop and initialized to zero.
+ * @dev: DMA device that's doing the mapping operation
+ * @sg: scatterlist segment to map
+ *
+ * This is a helper to be used by non-iommu dma_map_sg() implementations where
+ * the sg segment is the same for the page_link and the dma_address.
+ *
+ * Attempt to map a single segment in an SGL with the PCI bus address.
+ * The segment must point to a PCI P2PDMA page and thus must be
+ * wrapped in an is_pci_p2pdma_page(sg_page(sg)) check.
+ *
+ * Returns the type of mapping used and maps the page if the type is
+ * PCI_P2PDMA_MAP_BUS_ADDR.
+ */
+enum pci_p2pdma_map_type
+pci_p2pdma_map_segment(struct pci_p2pdma_map_state *state, struct device *dev,
+ struct scatterlist *sg)
+{
+ if (state->pgmap != sg_page(sg)->pgmap) {
+ state->pgmap = sg_page(sg)->pgmap;
+ state->map = pci_p2pdma_map_type(state->pgmap, dev);
+ state->bus_off = to_p2p_pgmap(state->pgmap)->bus_offset;
+ }
+
+ if (state->map == PCI_P2PDMA_MAP_BUS_ADDR) {
+ sg->dma_address = sg_phys(sg) + state->bus_off;
+ sg_dma_len(sg) = sg->length;
+ sg_dma_mark_pci_p2pdma(sg);
+ }
+
+ return state->map;
+}
+
+/**
+ * pci_p2pdma_map_bus_segment - map an sg segment predetermined to
+ * be mapped with PCI_P2PDMA_MAP_BUS_ADDR
+ * @pg_sg: scatterlist segment with the page to map
+ * @dma_sg: scatterlist segment to assign a dma address to
+ *
+ * This is a helper for iommu dma_map_sg() implementations when the
+ * segment for the dma address differs from the segment containing the
+ * source page.
+ *
+ * pci_p2pdma_map_type() must have already been called on the pg_sg and
+ * returned PCI_P2PDMA_MAP_BUS_ADDR.
+ */
+void pci_p2pdma_map_bus_segment(struct scatterlist *pg_sg,
+ struct scatterlist *dma_sg)
+{
+ struct pci_p2pdma_pagemap *pgmap = to_p2p_pgmap(sg_page(pg_sg)->pgmap);
+
+ dma_sg->dma_address = sg_phys(pg_sg) + pgmap->bus_offset;
+ sg_dma_len(dma_sg) = pg_sg->length;
+ sg_dma_mark_pci_p2pdma(dma_sg);
+}
+
/**
* pci_p2pdma_enable_store - parse a configfs/sysfs attribute store
* to enable p2pdma
diff --git a/include/linux/pci-p2pdma.h b/include/linux/pci-p2pdma.h
index caac2d023f8f..e5a8d5bc0f51 100644
--- a/include/linux/pci-p2pdma.h
+++ b/include/linux/pci-p2pdma.h
@@ -13,6 +13,12 @@

#include <linux/pci.h>

+struct pci_p2pdma_map_state {
+ struct dev_pagemap *pgmap;
+ int map;
+ u64 bus_off;
+};
+
struct block_device;
struct scatterlist;

@@ -70,6 +76,11 @@ int pci_p2pdma_map_sg_attrs(struct device *dev, struct scatterlist *sg,
int nents, enum dma_data_direction dir, unsigned long attrs);
void pci_p2pdma_unmap_sg_attrs(struct device *dev, struct scatterlist *sg,
int nents, enum dma_data_direction dir, unsigned long attrs);
+enum pci_p2pdma_map_type
+pci_p2pdma_map_segment(struct pci_p2pdma_map_state *state, struct device *dev,
+ struct scatterlist *sg);
+void pci_p2pdma_map_bus_segment(struct scatterlist *pg_sg,
+ struct scatterlist *dma_sg);
int pci_p2pdma_enable_store(const char *page, struct pci_dev **p2p_dev,
bool *use_p2pdma);
ssize_t pci_p2pdma_enable_show(char *page, struct pci_dev *p2p_dev,
@@ -135,6 +146,16 @@ static inline void pci_p2pdma_unmap_sg_attrs(struct device *dev,
unsigned long attrs)
{
}
+static inline enum pci_p2pdma_map_type
+pci_p2pdma_map_segment(struct pci_p2pdma_map_state *state, struct device *dev,
+ struct scatterlist *sg)
+{
+ return PCI_P2PDMA_MAP_NOT_SUPPORTED;
+}
+static inline void pci_p2pdma_map_bus_segment(struct scatterlist *pg_sg,
+ struct scatterlist *dma_sg)
+{
+}
static inline int pci_p2pdma_enable_store(const char *page,
struct pci_dev **p2p_dev, bool *use_p2pdma)
{
--
2.20.1


2021-05-14 01:32:19

by Logan Gunthorpe

Subject: [PATCH v2 15/22] dma-direct: Support PCI P2PDMA pages in dma-direct map_sg

Add PCI P2PDMA support for dma_direct_map_sg() so that it can map
PCI P2PDMA pages directly without a hack in the callers. This allows
for heterogeneous SGLs that contain both P2PDMA and regular pages.

A P2PDMA page may have three possible outcomes when being mapped:
1) If the data path between the two devices doesn't go through the
root port, then it should be mapped with a PCI bus address
2) If the data path goes through the host bridge, it should be mapped
normally, as though it were a CPU physical address
3) It is not possible for the two devices to communicate and thus
the mapping operation should fail (and it will return -EREMOTEIO).

SGL segments that contain PCI bus addresses are marked with
sg_dma_mark_pci_p2pdma() and are ignored when unmapped.

Signed-off-by: Logan Gunthorpe <[email protected]>
---
kernel/dma/direct.c | 44 ++++++++++++++++++++++++++++++++++++++------
1 file changed, 38 insertions(+), 6 deletions(-)

diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 803ee9321170..567dac942e89 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -13,6 +13,7 @@
#include <linux/vmalloc.h>
#include <linux/set_memory.h>
#include <linux/slab.h>
+#include <linux/pci-p2pdma.h>
#include "direct.h"

/*
@@ -381,29 +382,60 @@ void dma_direct_sync_sg_for_cpu(struct device *dev,
arch_sync_dma_for_cpu_all();
}

+/*
+ * Unmaps segments, except for ones marked as pci_p2pdma, which do not
+ * require any further action as they contain a bus address.
+ */
void dma_direct_unmap_sg(struct device *dev, struct scatterlist *sgl,
int nents, enum dma_data_direction dir, unsigned long attrs)
{
struct scatterlist *sg;
int i;

- for_each_sg(sgl, sg, nents, i)
- dma_direct_unmap_page(dev, sg->dma_address, sg_dma_len(sg), dir,
- attrs);
+ for_each_sg(sgl, sg, nents, i) {
+ if (sg_is_dma_pci_p2pdma(sg)) {
+ sg_dma_unmark_pci_p2pdma(sg);
+ } else {
+ dma_direct_unmap_page(dev, sg->dma_address,
+ sg_dma_len(sg), dir, attrs);
+ }
+ }
}
#endif

int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl, int nents,
enum dma_data_direction dir, unsigned long attrs)
{
- int i;
+ struct pci_p2pdma_map_state p2pdma_state = {};
+ enum pci_p2pdma_map_type map;
struct scatterlist *sg;
+ int i, ret;

for_each_sg(sgl, sg, nents, i) {
+ if (is_pci_p2pdma_page(sg_page(sg))) {
+ map = pci_p2pdma_map_segment(&p2pdma_state, dev, sg);
+ switch (map) {
+ case PCI_P2PDMA_MAP_BUS_ADDR:
+ continue;
+ case PCI_P2PDMA_MAP_THRU_HOST_BRIDGE:
+ /*
+ * Mappings through the host bridge should
+ * be mapped normally, so do nothing here
+ * and continue below.
+ */
+ break;
+ default:
+ ret = -EREMOTEIO;
+ goto out_unmap;
+ }
+ }
+
sg->dma_address = dma_direct_map_page(dev, sg_page(sg),
sg->offset, sg->length, dir, attrs);
- if (sg->dma_address == DMA_MAPPING_ERROR)
+ if (sg->dma_address == DMA_MAPPING_ERROR) {
+ ret = -EINVAL;
goto out_unmap;
+ }
sg_dma_len(sg) = sg->length;
}

@@ -411,7 +443,7 @@ int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl, int nents,

out_unmap:
dma_direct_unmap_sg(dev, sgl, i, dir, attrs | DMA_ATTR_SKIP_CPU_SYNC);
- return -EINVAL;
+ return ret;
}

dma_addr_t dma_direct_map_resource(struct device *dev, phys_addr_t paddr,
--
2.20.1


2021-05-14 01:35:08

by Logan Gunthorpe

Subject: [PATCH v2 08/22] dma-mapping: Allow map_sg() ops to return negative error codes

Allow dma_map_sgtable() to pass errors from the map_sg() ops. This
will be required for returning appropriate error codes when mapping
P2PDMA memory.

Introduce __dma_map_sg_attrs() which will return the raw error code
from the map_sg operation (whether it be negative or zero). Then add a
dma_map_sg_attrs() wrapper to convert any negative errors to zero to
satisfy the existing calling convention.

dma_map_sgtable() will convert a zero error return for old map_sg() ops
into a -EINVAL return and return any negative errors as reported.

This allows map_sg implementations to start returning distinct
negative error codes. Legacy map_sg implementations can continue
to return zero until they are all converted.
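
The resulting convention in dma_map_sgtable() boils down to (from the
hunk below):

	nents = __dma_map_sg_attrs(dev, sgt->sgl, sgt->orig_nents, dir, attrs);
	if (nents == 0)
		return -EINVAL;	/* legacy map_sg signalled failure */
	else if (nents < 0)
		return nents;	/* converted map_sg's error code */
	sgt->nents = nents;
	return 0;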

Signed-off-by: Logan Gunthorpe <[email protected]>
---
include/linux/dma-map-ops.h | 8 ++++++--
include/linux/dma-mapping.h | 41 ++++++++++++++++++++++++++++++++-----
kernel/dma/mapping.c | 13 +++++-------
3 files changed, 47 insertions(+), 15 deletions(-)

diff --git a/include/linux/dma-map-ops.h b/include/linux/dma-map-ops.h
index 0d53a96a3d64..eaa969be8284 100644
--- a/include/linux/dma-map-ops.h
+++ b/include/linux/dma-map-ops.h
@@ -41,8 +41,12 @@ struct dma_map_ops {
size_t size, enum dma_data_direction dir,
unsigned long attrs);
/*
- * map_sg returns 0 on error and a value > 0 on success.
- * It should never return a value < 0.
+ * map_sg should return a negative error code on error.
+ * dma_map_sgtable() will return the error code returned and convert
+ * a zero return (for legacy implementations) into -EINVAL.
+ *
+ * dma_map_sg() will always return zero on any negative or zero
+ * return to satisfy its own calling convention.
*/
int (*map_sg)(struct device *dev, struct scatterlist *sg, int nents,
enum dma_data_direction dir, unsigned long attrs);
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 183e7103a66d..2b0ecf0aa4df 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -105,7 +105,7 @@ dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
unsigned long attrs);
void dma_unmap_page_attrs(struct device *dev, dma_addr_t addr, size_t size,
enum dma_data_direction dir, unsigned long attrs);
-int dma_map_sg_attrs(struct device *dev, struct scatterlist *sg, int nents,
+int __dma_map_sg_attrs(struct device *dev, struct scatterlist *sg, int nents,
enum dma_data_direction dir, unsigned long attrs);
void dma_unmap_sg_attrs(struct device *dev, struct scatterlist *sg,
int nents, enum dma_data_direction dir,
@@ -164,7 +164,7 @@ static inline void dma_unmap_page_attrs(struct device *dev, dma_addr_t addr,
size_t size, enum dma_data_direction dir, unsigned long attrs)
{
}
-static inline int dma_map_sg_attrs(struct device *dev, struct scatterlist *sg,
+static inline int __dma_map_sg_attrs(struct device *dev, struct scatterlist *sg,
int nents, enum dma_data_direction dir, unsigned long attrs)
{
return 0;
@@ -343,6 +343,34 @@ static inline void dma_sync_single_range_for_device(struct device *dev,
return dma_sync_single_for_device(dev, addr + offset, size, dir);
}

+/**
+ * dma_map_sg_attrs - Map the given buffer for DMA
+ * @dev: The device for which to perform the DMA operation
+ * @sg: The sg_table object describing the buffer
+ * @dir: DMA direction
+ * @attrs: Optional DMA attributes for the map operation
+ *
+ * Maps a buffer described by a scatterlist passed in the sg argument with
+ * nents segments for the @dir DMA operation by the @dev device.
+ *
+ * Returns the number of mapped entries (which can be less than nents)
+ * on success. Zero is returned for any error.
+ *
+ * dma_unmap_sg_attrs() should be used to unmap the buffer with the
+ * original sg and original nents (not the value returned by this function).
+ */
+static inline int dma_map_sg_attrs(struct device *dev, struct scatterlist *sg,
+ int nents, enum dma_data_direction dir, unsigned long attrs)
+{
+ int ret;
+
+ ret = __dma_map_sg_attrs(dev, sg, nents, dir, attrs);
+ if (ret < 0)
+ ret = 0;
+
+ return ret;
+}
+
/**
* dma_map_sgtable - Map the given buffer for DMA
* @dev: The device for which to perform the DMA operation
@@ -357,16 +385,19 @@ static inline void dma_sync_single_range_for_device(struct device *dev,
* ownership of the buffer back to the CPU domain before touching the
* buffer by the CPU.
*
- * Returns 0 on success or -EINVAL on error during mapping the buffer.
+ * Returns 0 on success or a negative error code on error.
*/
static inline int dma_map_sgtable(struct device *dev, struct sg_table *sgt,
enum dma_data_direction dir, unsigned long attrs)
{
int nents;

- nents = dma_map_sg_attrs(dev, sgt->sgl, sgt->orig_nents, dir, attrs);
- if (nents <= 0)
+ nents = __dma_map_sg_attrs(dev, sgt->sgl, sgt->orig_nents, dir, attrs);
+ if (nents == 0)
return -EINVAL;
+ else if (nents < 0)
+ return nents;
+
sgt->nents = nents;
return 0;
}
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index 2b06a809d0b9..700a2bb7cc9e 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -177,11 +177,7 @@ void dma_unmap_page_attrs(struct device *dev, dma_addr_t addr, size_t size,
}
EXPORT_SYMBOL(dma_unmap_page_attrs);

-/*
- * dma_maps_sg_attrs returns 0 on error and > 0 on success.
- * It should never return a value < 0.
- */
-int dma_map_sg_attrs(struct device *dev, struct scatterlist *sg, int nents,
+int __dma_map_sg_attrs(struct device *dev, struct scatterlist *sg, int nents,
enum dma_data_direction dir, unsigned long attrs)
{
const struct dma_map_ops *ops = get_dma_ops(dev);
@@ -197,12 +193,13 @@ int dma_map_sg_attrs(struct device *dev, struct scatterlist *sg, int nents,
ents = dma_direct_map_sg(dev, sg, nents, dir, attrs);
else
ents = ops->map_sg(dev, sg, nents, dir, attrs);
- BUG_ON(ents < 0);
- debug_dma_map_sg(dev, sg, nents, ents, dir);
+
+ if (ents > 0)
+ debug_dma_map_sg(dev, sg, nents, ents, dir);

return ents;
}
-EXPORT_SYMBOL(dma_map_sg_attrs);
+EXPORT_SYMBOL(__dma_map_sg_attrs);

void dma_unmap_sg_attrs(struct device *dev, struct scatterlist *sg,
int nents, enum dma_data_direction dir,
--
2.20.1


2021-05-14 01:35:52

by Logan Gunthorpe

Subject: [PATCH v2 07/22] PCI/P2PDMA: Refactor pci_p2pdma_map_type() to take pagemap and device

All callers of pci_p2pdma_map_type() have a struct dev_pagemap and a
struct device (for the client doing the DMA transfer). Thus, move the
conversion to struct pci_dev for both the provider and the client into
this function.

Signed-off-by: Logan Gunthorpe <[email protected]>
---
drivers/pci/p2pdma.c | 29 +++++++++++------------------
1 file changed, 11 insertions(+), 18 deletions(-)

diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
index 4034ffa0eb06..6c2057cd3f37 100644
--- a/drivers/pci/p2pdma.c
+++ b/drivers/pci/p2pdma.c
@@ -844,14 +844,21 @@ void pci_p2pmem_publish(struct pci_dev *pdev, bool publish)
}
EXPORT_SYMBOL_GPL(pci_p2pmem_publish);

-static enum pci_p2pdma_map_type pci_p2pdma_map_type(struct pci_dev *provider,
- struct pci_dev *client)
+static enum pci_p2pdma_map_type pci_p2pdma_map_type(struct dev_pagemap *pgmap,
+ struct device *dev)
{
+ struct pci_dev *provider = to_p2p_pgmap(pgmap)->provider;
enum pci_p2pdma_map_type ret;
+ struct pci_dev *client;

if (!provider->p2pdma)
return PCI_P2PDMA_MAP_NOT_SUPPORTED;

+ if (!dev_is_pci(dev))
+ return PCI_P2PDMA_MAP_NOT_SUPPORTED;
+
+ client = to_pci_dev(dev);
+
ret = xa_to_value(xa_load(&provider->p2pdma->map_types,
map_types_idx(client)));
if (ret == PCI_P2PDMA_MAP_UNKNOWN)
@@ -892,14 +899,8 @@ int pci_p2pdma_map_sg_attrs(struct device *dev, struct scatterlist *sg,
{
struct pci_p2pdma_pagemap *p2p_pgmap =
to_p2p_pgmap(sg_page(sg)->pgmap);
- struct pci_dev *client;
-
- if (WARN_ON_ONCE(!dev_is_pci(dev)))
- return 0;

- client = to_pci_dev(dev);
-
- switch (pci_p2pdma_map_type(p2p_pgmap->provider, client)) {
+ switch (pci_p2pdma_map_type(sg_page(sg)->pgmap, dev)) {
case PCI_P2PDMA_MAP_THRU_HOST_BRIDGE:
return dma_map_sg_attrs(dev, sg, nents, dir, attrs);
case PCI_P2PDMA_MAP_BUS_ADDR:
@@ -922,17 +923,9 @@ EXPORT_SYMBOL_GPL(pci_p2pdma_map_sg_attrs);
void pci_p2pdma_unmap_sg_attrs(struct device *dev, struct scatterlist *sg,
int nents, enum dma_data_direction dir, unsigned long attrs)
{
- struct pci_p2pdma_pagemap *p2p_pgmap =
- to_p2p_pgmap(sg_page(sg)->pgmap);
enum pci_p2pdma_map_type map_type;
- struct pci_dev *client;
-
- if (WARN_ON_ONCE(!dev_is_pci(dev)))
- return;
-
- client = to_pci_dev(dev);

- map_type = pci_p2pdma_map_type(p2p_pgmap->provider, client);
+ map_type = pci_p2pdma_map_type(sg_page(sg)->pgmap, dev);

if (map_type == PCI_P2PDMA_MAP_THRU_HOST_BRIDGE)
dma_unmap_sg_attrs(dev, sg, nents, dir, attrs);
--
2.20.1


2021-05-14 01:41:33

by Logan Gunthorpe

Subject: [PATCH v2 10/22] iommu: Return full error code from iommu_map_sg[_atomic]()

Convert to a ssize_t return code so the return code from __iommu_map()
can be returned all the way down through iommu_dma_map_sg().

Signed-off-by: Logan Gunthorpe <[email protected]>
---
drivers/iommu/iommu.c | 15 +++++++--------
include/linux/iommu.h | 22 +++++++++++-----------
2 files changed, 18 insertions(+), 19 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 808ab70d5df5..df428206ea6e 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2567,9 +2567,9 @@ size_t iommu_unmap_fast(struct iommu_domain *domain,
}
EXPORT_SYMBOL_GPL(iommu_unmap_fast);

-static size_t __iommu_map_sg(struct iommu_domain *domain, unsigned long iova,
- struct scatterlist *sg, unsigned int nents, int prot,
- gfp_t gfp)
+static ssize_t __iommu_map_sg(struct iommu_domain *domain, unsigned long iova,
+ struct scatterlist *sg, unsigned int nents, int prot,
+ gfp_t gfp)
{
const struct iommu_ops *ops = domain->ops;
size_t len = 0, mapped = 0;
@@ -2610,19 +2610,18 @@ static size_t __iommu_map_sg(struct iommu_domain *domain, unsigned long iova,
/* undo mappings already done */
iommu_unmap(domain, iova, mapped);

- return 0;
-
+ return ret;
}

-size_t iommu_map_sg(struct iommu_domain *domain, unsigned long iova,
- struct scatterlist *sg, unsigned int nents, int prot)
+ssize_t iommu_map_sg(struct iommu_domain *domain, unsigned long iova,
+ struct scatterlist *sg, unsigned int nents, int prot)
{
might_sleep();
return __iommu_map_sg(domain, iova, sg, nents, prot, GFP_KERNEL);
}
EXPORT_SYMBOL_GPL(iommu_map_sg);

-size_t iommu_map_sg_atomic(struct iommu_domain *domain, unsigned long iova,
+ssize_t iommu_map_sg_atomic(struct iommu_domain *domain, unsigned long iova,
struct scatterlist *sg, unsigned int nents, int prot)
{
return __iommu_map_sg(domain, iova, sg, nents, prot, GFP_ATOMIC);
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 32d448050bf7..9369458ba1bd 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -414,11 +414,11 @@ extern size_t iommu_unmap(struct iommu_domain *domain, unsigned long iova,
extern size_t iommu_unmap_fast(struct iommu_domain *domain,
unsigned long iova, size_t size,
struct iommu_iotlb_gather *iotlb_gather);
-extern size_t iommu_map_sg(struct iommu_domain *domain, unsigned long iova,
- struct scatterlist *sg,unsigned int nents, int prot);
-extern size_t iommu_map_sg_atomic(struct iommu_domain *domain,
- unsigned long iova, struct scatterlist *sg,
- unsigned int nents, int prot);
+extern ssize_t iommu_map_sg(struct iommu_domain *domain, unsigned long iova,
+ struct scatterlist *sg, unsigned int nents, int prot);
+extern ssize_t iommu_map_sg_atomic(struct iommu_domain *domain,
+ unsigned long iova, struct scatterlist *sg,
+ unsigned int nents, int prot);
extern phys_addr_t iommu_iova_to_phys(struct iommu_domain *domain, dma_addr_t iova);
extern void iommu_set_fault_handler(struct iommu_domain *domain,
iommu_fault_handler_t handler, void *token);
@@ -679,18 +679,18 @@ static inline size_t iommu_unmap_fast(struct iommu_domain *domain,
return 0;
}

-static inline size_t iommu_map_sg(struct iommu_domain *domain,
- unsigned long iova, struct scatterlist *sg,
- unsigned int nents, int prot)
+static inline ssize_t iommu_map_sg(struct iommu_domain *domain,
+ unsigned long iova, struct scatterlist *sg,
+ unsigned int nents, int prot)
{
- return 0;
+ return -ENODEV;
}

-static inline size_t iommu_map_sg_atomic(struct iommu_domain *domain,
+static inline ssize_t iommu_map_sg_atomic(struct iommu_domain *domain,
unsigned long iova, struct scatterlist *sg,
unsigned int nents, int prot)
{
- return 0;
+ return -ENODEV;
}

static inline void iommu_flush_iotlb_all(struct iommu_domain *domain)
--
2.20.1


2021-05-14 01:46:23

by Logan Gunthorpe

Subject: [PATCH v2 12/22] lib/scatterlist: Add flag for indicating P2PDMA segments in an SGL

Make use of the third free LSB in scatterlist's page_link on 64-bit
systems.

The extra bit will be used by the dma_[un]map_sg() implementations to
determine when a given SGL segment's dma_address points to a PCI bus
address. dma_unmap_sg() will need to perform different cleanup when a
segment is marked as P2PDMA.

Using this bit requires adding an additional dependency on CONFIG_64BIT to
CONFIG_PCI_P2PDMA. This should be acceptable as the majority of P2PDMA
use cases are restricted to newer root complexes and roughly require the
extra address space for memory BARs used in the transactions.
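
The intended usage by the mapping code (as later patches in this
series do) is a mark on map and a check-and-clear on unmap; a sketch,
where bus_offset stands in for the provider's bus offset:

	/* map: dma_address holds a PCI bus address, so flag it */
	sg->dma_address = sg_phys(sg) + bus_offset;
	sg_dma_len(sg) = sg->length;
	sg_dma_mark_pci_p2pdma(sg);

	/* unmap: flagged segments need no dma_unmap_page() */
	if (sg_is_dma_pci_p2pdma(sg))
		sg_dma_unmark_pci_p2pdma(sg);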

Signed-off-by: Logan Gunthorpe <[email protected]>
Reviewed-by: John Hubbard <[email protected]>
---
drivers/pci/Kconfig | 2 +-
include/linux/scatterlist.h | 50 ++++++++++++++++++++++++++++++++++---
2 files changed, 47 insertions(+), 5 deletions(-)

diff --git a/drivers/pci/Kconfig b/drivers/pci/Kconfig
index 0c473d75e625..90b4bddb3300 100644
--- a/drivers/pci/Kconfig
+++ b/drivers/pci/Kconfig
@@ -163,7 +163,7 @@ config PCI_PASID

config PCI_P2PDMA
bool "PCI peer-to-peer transfer support"
- depends on ZONE_DEVICE
+ depends on ZONE_DEVICE && 64BIT
select GENERIC_ALLOCATOR
help
Enables drivers to do PCI peer-to-peer transactions to and from
diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h
index 6f70572b2938..562c0aa264f5 100644
--- a/include/linux/scatterlist.h
+++ b/include/linux/scatterlist.h
@@ -58,6 +58,21 @@ struct sg_table {
#define SG_CHAIN 0x01UL
#define SG_END 0x02UL

+/*
+ * bit 2 is the third free bit in the page_link on 64bit systems which
+ * is used by dma_unmap_sg() to determine if the dma_address is a PCI
+ * bus address when doing P2PDMA.
+ * Note: CONFIG_PCI_P2PDMA depends on CONFIG_64BIT because of this.
+ */
+
+#ifdef CONFIG_PCI_P2PDMA
+#define SG_DMA_PCI_P2PDMA 0x04UL
+#else
+#define SG_DMA_PCI_P2PDMA 0x00UL
+#endif
+
+#define SG_PAGE_LINK_MASK (SG_CHAIN | SG_END | SG_DMA_PCI_P2PDMA)
+
/*
* We overload the LSB of the page pointer to indicate whether it's
* a valid sg entry, or whether it points to the start of a new scatterlist.
@@ -66,7 +81,9 @@ struct sg_table {
#define sg_is_chain(sg) ((sg)->page_link & SG_CHAIN)
#define sg_is_last(sg) ((sg)->page_link & SG_END)
#define sg_chain_ptr(sg) \
- ((struct scatterlist *) ((sg)->page_link & ~(SG_CHAIN | SG_END)))
+ ((struct scatterlist *) ((sg)->page_link & ~SG_PAGE_LINK_MASK))
+
+#define sg_is_dma_pci_p2pdma(sg) ((sg)->page_link & SG_DMA_PCI_P2PDMA)

/**
* sg_assign_page - Assign a given page to an SG entry
@@ -80,13 +97,13 @@ struct sg_table {
**/
static inline void sg_assign_page(struct scatterlist *sg, struct page *page)
{
- unsigned long page_link = sg->page_link & (SG_CHAIN | SG_END);
+ unsigned long page_link = sg->page_link & SG_PAGE_LINK_MASK;

/*
* In order for the low bit stealing approach to work, pages
* must be aligned at a 32-bit boundary as a minimum.
*/
- BUG_ON((unsigned long) page & (SG_CHAIN | SG_END));
+ BUG_ON((unsigned long) page & SG_PAGE_LINK_MASK);
#ifdef CONFIG_DEBUG_SG
BUG_ON(sg_is_chain(sg));
#endif
@@ -120,7 +137,7 @@ static inline struct page *sg_page(struct scatterlist *sg)
#ifdef CONFIG_DEBUG_SG
BUG_ON(sg_is_chain(sg));
#endif
- return (struct page *)((sg)->page_link & ~(SG_CHAIN | SG_END));
+ return (struct page *)((sg)->page_link & ~SG_PAGE_LINK_MASK);
}

/**
@@ -222,6 +239,31 @@ static inline void sg_unmark_end(struct scatterlist *sg)
sg->page_link &= ~SG_END;
}

+/**
+ * sg_dma_mark_pci_p2pdma - Mark the scatterlist entry for PCI p2pdma
+ * @sg: SG entry
+ *
+ * Description:
+ * Marks the passed in sg entry to indicate that the dma_address is
+ * a PCI bus address.
+ **/
+static inline void sg_dma_mark_pci_p2pdma(struct scatterlist *sg)
+{
+ sg->page_link |= SG_DMA_PCI_P2PDMA;
+}
+
+/**
+ * sg_dma_unmark_pci_p2pdma - Unmark the scatterlist entry for PCI p2pdma
+ * @sg: SG entry
+ *
+ * Description:
+ * Clears the PCI P2PDMA mark
+ **/
+static inline void sg_dma_unmark_pci_p2pdma(struct scatterlist *sg)
+{
+ sg->page_link &= ~SG_DMA_PCI_P2PDMA;
+}
+
/**
* sg_phys - Return physical address of an sg entry
* @sg: SG entry
--
2.20.1


2021-05-14 01:47:01

by Logan Gunthorpe

Subject: [PATCH v2 13/22] PCI/P2PDMA: Make pci_p2pdma_map_type() non-static

pci_p2pdma_map_type() will be needed by the dma-iommu map_sg
implementation because it will need to determine the mapping type
ahead of actually doing the mapping to create the actual iommu mapping.

Signed-off-by: Logan Gunthorpe <[email protected]>
---
drivers/pci/p2pdma.c | 24 +++++++++++++---------
include/linux/pci-p2pdma.h | 41 ++++++++++++++++++++++++++++++++++++++
2 files changed, 56 insertions(+), 9 deletions(-)

diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
index 6c2057cd3f37..0568604fd8b4 100644
--- a/drivers/pci/p2pdma.c
+++ b/drivers/pci/p2pdma.c
@@ -20,13 +20,6 @@
#include <linux/seq_buf.h>
#include <linux/xarray.h>

-enum pci_p2pdma_map_type {
- PCI_P2PDMA_MAP_UNKNOWN = 0,
- PCI_P2PDMA_MAP_NOT_SUPPORTED,
- PCI_P2PDMA_MAP_BUS_ADDR,
- PCI_P2PDMA_MAP_THRU_HOST_BRIDGE,
-};
-
struct pci_p2pdma {
struct gen_pool *pool;
bool p2pmem_published;
@@ -844,8 +837,21 @@ void pci_p2pmem_publish(struct pci_dev *pdev, bool publish)
}
EXPORT_SYMBOL_GPL(pci_p2pmem_publish);

-static enum pci_p2pdma_map_type pci_p2pdma_map_type(struct dev_pagemap *pgmap,
- struct device *dev)
+/**
+ * pci_p2pdma_map_type - return the type of mapping that should be used for
+ * a given device and pgmap
+ * @pgmap: the pagemap of a page to determine the mapping type for
+ * @dev: device that is mapping the page
+ *
+ * Returns one of:
+ * PCI_P2PDMA_MAP_NOT_SUPPORTED - The mapping should not be done
+ * PCI_P2PDMA_MAP_BUS_ADDR - The mapping should use the PCI bus address
+ * PCI_P2PDMA_MAP_THRU_HOST_BRIDGE - The mapping should be done normally
+ * using the CPU physical address (in dma-direct) or an IOVA
+ * mapping for the IOMMU.
+ */
+enum pci_p2pdma_map_type pci_p2pdma_map_type(struct dev_pagemap *pgmap,
+ struct device *dev)
{
struct pci_dev *provider = to_p2p_pgmap(pgmap)->provider;
enum pci_p2pdma_map_type ret;
diff --git a/include/linux/pci-p2pdma.h b/include/linux/pci-p2pdma.h
index 8318a97c9c61..caac2d023f8f 100644
--- a/include/linux/pci-p2pdma.h
+++ b/include/linux/pci-p2pdma.h
@@ -16,6 +16,40 @@
struct block_device;
struct scatterlist;

+enum pci_p2pdma_map_type {
+ /*
+ * PCI_P2PDMA_MAP_UNKNOWN: Used internally for indicating the mapping
+ * type hasn't been calculated yet. Functions that return this enum
+ * never return this value.
+ */
+ PCI_P2PDMA_MAP_UNKNOWN = 0,
+
+ /*
+ * PCI_P2PDMA_MAP_NOT_SUPPORTED: Indicates the transaction will
+ * traverse the host bridge and the host bridge is not in the
+ * whitelist. DMA Mapping routines should return an error when
+ * this is returned.
+ */
+ PCI_P2PDMA_MAP_NOT_SUPPORTED,
+
+ /*
+ * PCI_P2PDMA_MAP_BUS_ADDR: Indicates that two devices can talk to
+ * each other directly through a PCI switch and the transaction will
+ * not traverse the host bridge. Such a mapping should program
+ * the DMA engine with PCI bus addresses.
+ */
+ PCI_P2PDMA_MAP_BUS_ADDR,
+
+ /*
+ * PCI_P2PDMA_MAP_THRU_HOST_BRIDGE: Indicates two devices can talk
+ * to each other, but the transaction traverses a host bridge on the
+ * whitelist. In this case, a normal mapping either with CPU physical
+ * addresses (in the case of dma-direct) or IOVA addresses (in the
+ * case of IOMMUs) should be used to program the DMA engine.
+ */
+ PCI_P2PDMA_MAP_THRU_HOST_BRIDGE,
+};
+
#ifdef CONFIG_PCI_P2PDMA
int pci_p2pdma_add_resource(struct pci_dev *pdev, int bar, size_t size,
u64 offset);
@@ -30,6 +64,8 @@ struct scatterlist *pci_p2pmem_alloc_sgl(struct pci_dev *pdev,
unsigned int *nents, u32 length);
void pci_p2pmem_free_sgl(struct pci_dev *pdev, struct scatterlist *sgl);
void pci_p2pmem_publish(struct pci_dev *pdev, bool publish);
+enum pci_p2pdma_map_type pci_p2pdma_map_type(struct dev_pagemap *pgmap,
+ struct device *dev);
int pci_p2pdma_map_sg_attrs(struct device *dev, struct scatterlist *sg,
int nents, enum dma_data_direction dir, unsigned long attrs);
void pci_p2pdma_unmap_sg_attrs(struct device *dev, struct scatterlist *sg,
@@ -83,6 +119,11 @@ static inline void pci_p2pmem_free_sgl(struct pci_dev *pdev,
static inline void pci_p2pmem_publish(struct pci_dev *pdev, bool publish)
{
}
+static inline enum pci_p2pdma_map_type
+pci_p2pdma_map_type(struct dev_pagemap *pgmap, struct device *dev)
+{
+ return PCI_P2PDMA_MAP_NOT_SUPPORTED;
+}
static inline int pci_p2pdma_map_sg_attrs(struct device *dev,
struct scatterlist *sg, int nents, enum dma_data_direction dir,
unsigned long attrs)
--
2.20.1


2021-05-14 01:51:47

by Logan Gunthorpe

Subject: [PATCH v2 06/22] PCI/P2PDMA: Attempt to set map_type if it has not been set

Attempt to find the mapping type for P2PDMA pages on the first
DMA map attempt if it has not been done ahead of time.

Previously, the mapping type was expected to be calculated ahead of
time, but if pages are to come from userspace then there's no way to
ensure the path was checked in advance.

With this change it's no longer invalid to call pci_p2pdma_map_sg()
before the mapping type is calculated, so drop the WARN_ON for that
case.

Signed-off-by: Logan Gunthorpe <[email protected]>
---
drivers/pci/p2pdma.c | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
index 0e0b2218eacd..4034ffa0eb06 100644
--- a/drivers/pci/p2pdma.c
+++ b/drivers/pci/p2pdma.c
@@ -847,11 +847,17 @@ EXPORT_SYMBOL_GPL(pci_p2pmem_publish);
static enum pci_p2pdma_map_type pci_p2pdma_map_type(struct pci_dev *provider,
struct pci_dev *client)
{
+ enum pci_p2pdma_map_type ret;
+
if (!provider->p2pdma)
return PCI_P2PDMA_MAP_NOT_SUPPORTED;

- return xa_to_value(xa_load(&provider->p2pdma->map_types,
- map_types_idx(client)));
+ ret = xa_to_value(xa_load(&provider->p2pdma->map_types,
+ map_types_idx(client)));
+ if (ret == PCI_P2PDMA_MAP_UNKNOWN)
+ return calc_map_type_and_dist_warn(provider, client, NULL);
+
+ return ret;
}

static int __pci_p2pdma_map_sg(struct pci_p2pdma_pagemap *p2p_pgmap,
@@ -899,7 +905,6 @@ int pci_p2pdma_map_sg_attrs(struct device *dev, struct scatterlist *sg,
case PCI_P2PDMA_MAP_BUS_ADDR:
return __pci_p2pdma_map_sg(p2p_pgmap, dev, sg, nents);
default:
- WARN_ON_ONCE(1);
return 0;
}
}
--
2.20.1


2021-05-14 01:56:15

by Logan Gunthorpe

Subject: [PATCH v2 05/22] PCI/P2PDMA: Print a warning if the host bridge is not in the whitelist

If the host bridge is not in the whitelist, print a warning in the
calc_map_type_and_dist_warn() path detailing the vendor and device IDs
that would need to be added to the whitelist.

Suggested-by: Don Dutile <[email protected]>
Signed-off-by: Logan Gunthorpe <[email protected]>
---
drivers/pci/p2pdma.c | 17 +++++++++++------
1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
index 175da3a9c8fb..0e0b2218eacd 100644
--- a/drivers/pci/p2pdma.c
+++ b/drivers/pci/p2pdma.c
@@ -340,7 +340,7 @@ static struct pci_dev *pci_host_bridge_dev(struct pci_host_bridge *host)
}

static bool __host_bridge_whitelist(struct pci_host_bridge *host,
- bool same_host_bridge)
+ bool same_host_bridge, bool warn)
{
struct pci_dev *root = pci_host_bridge_dev(host);
const struct pci_p2pdma_whitelist_entry *entry;
@@ -361,6 +361,10 @@ static bool __host_bridge_whitelist(struct pci_host_bridge *host,
return true;
}

+ if (warn)
+ pci_warn(root, "Host bridge not in P2PDMA whitelist: %04x:%04x\n",
+ vendor, device);
+
return false;
}

@@ -368,16 +372,17 @@ static bool __host_bridge_whitelist(struct pci_host_bridge *host,
* If we can't find a common upstream bridge take a look at the root
* complex and compare it to a whitelist of known good hardware.
*/
-static bool host_bridge_whitelist(struct pci_dev *a, struct pci_dev *b)
+static bool host_bridge_whitelist(struct pci_dev *a, struct pci_dev *b,
+ bool warn)
{
struct pci_host_bridge *host_a = pci_find_host_bridge(a->bus);
struct pci_host_bridge *host_b = pci_find_host_bridge(b->bus);

if (host_a == host_b)
- return __host_bridge_whitelist(host_a, true);
+ return __host_bridge_whitelist(host_a, true, warn);

- if (__host_bridge_whitelist(host_a, false) &&
- __host_bridge_whitelist(host_b, false))
+ if (__host_bridge_whitelist(host_a, false, warn) &&
+ __host_bridge_whitelist(host_b, false, warn))
return true;

return false;
@@ -513,7 +518,7 @@ calc_map_type_and_dist(struct pci_dev *provider, struct pci_dev *client,

if (map_type == PCI_P2PDMA_MAP_THRU_HOST_BRIDGE) {
if (!cpu_supports_p2pdma() &&
- !host_bridge_whitelist(provider, client))
+ !host_bridge_whitelist(provider, client, acs_redirects))
map_type = PCI_P2PDMA_MAP_NOT_SUPPORTED;
}

--
2.20.1


2021-05-14 07:54:45

by Logan Gunthorpe

[permalink] [raw]
Subject: [PATCH v2 21/22] RDMA/rw: use dma_map_sgtable()

dma_map_sg() now supports P2PDMA pages, so pci_p2pdma_map_sg() is no
longer necessary and may be dropped.

Switch to the dma_map_sgtable() interface, which allows for better
error reporting if the P2PDMA pages are unsupported.

The change to sgtable also appears to fix a couple of subtle error-path
bugs (see the sketch after this list):

- In rdma_rw_ctx_init(), dma_unmap would be called with an sg pointer
  that could have been advanced past the one originally mapped, as
  well as an nents value that was not the original number of nents
  passed when mapping.
- Similarly, in rdma_rw_ctx_signature_init(), both sg and prot_sg
  were unmapped with the incorrect number of nents.
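
A minimal sketch of why the sgtable form avoids this, using the names
from the diff below (abbreviated, for illustration only):

        struct sg_table sgt = {
                .sgl = sg,              /* head of the caller's SGL */
                .orig_nents = sg_cnt,   /* entry count handed to map */
        };

        ret = ib_dma_map_sgtable(dev, &sgt, dir, 0);
        if (ret)
                return ret;

        /* ... local sg/sg_cnt may be advanced while building WRs ... */

        /* The error path unmaps with sgt, which still holds the
         * original sgl and orig_nents, so the unmap always matches
         * the map regardless of the local variables. */
        ib_dma_unmap_sgtable(dev, &sgt, dir, 0);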

Signed-off-by: Logan Gunthorpe <[email protected]>
---
drivers/infiniband/core/rw.c | 75 +++++++++++++++---------------------
include/rdma/ib_verbs.h | 19 +++++++++
2 files changed, 51 insertions(+), 43 deletions(-)

diff --git a/drivers/infiniband/core/rw.c b/drivers/infiniband/core/rw.c
index a588c2038479..68f2dda56138 100644
--- a/drivers/infiniband/core/rw.c
+++ b/drivers/infiniband/core/rw.c
@@ -273,26 +273,6 @@ static int rdma_rw_init_single_wr(struct rdma_rw_ctx *ctx, struct ib_qp *qp,
return 1;
}

-static void rdma_rw_unmap_sg(struct ib_device *dev, struct scatterlist *sg,
- u32 sg_cnt, enum dma_data_direction dir)
-{
- if (is_pci_p2pdma_page(sg_page(sg)))
- pci_p2pdma_unmap_sg(dev->dma_device, sg, sg_cnt, dir);
- else
- ib_dma_unmap_sg(dev, sg, sg_cnt, dir);
-}
-
-static int rdma_rw_map_sg(struct ib_device *dev, struct scatterlist *sg,
- u32 sg_cnt, enum dma_data_direction dir)
-{
- if (is_pci_p2pdma_page(sg_page(sg))) {
- if (WARN_ON_ONCE(ib_uses_virt_dma(dev)))
- return 0;
- return pci_p2pdma_map_sg(dev->dma_device, sg, sg_cnt, dir);
- }
- return ib_dma_map_sg(dev, sg, sg_cnt, dir);
-}
-
/**
* rdma_rw_ctx_init - initialize a RDMA READ/WRITE context
* @ctx: context to initialize
@@ -313,12 +293,16 @@ int rdma_rw_ctx_init(struct rdma_rw_ctx *ctx, struct ib_qp *qp, u32 port_num,
u64 remote_addr, u32 rkey, enum dma_data_direction dir)
{
struct ib_device *dev = qp->pd->device;
+ struct sg_table sgt = {
+ .sgl = sg,
+ .orig_nents = sg_cnt,
+ };
int ret;

- ret = rdma_rw_map_sg(dev, sg, sg_cnt, dir);
- if (!ret)
- return -ENOMEM;
- sg_cnt = ret;
+ ret = ib_dma_map_sgtable(dev, &sgt, dir, 0);
+ if (ret)
+ return ret;
+ sg_cnt = sgt.nents;

/*
* Skip to the S/G entry that sg_offset falls into:
@@ -354,7 +338,7 @@ int rdma_rw_ctx_init(struct rdma_rw_ctx *ctx, struct ib_qp *qp, u32 port_num,
return ret;

out_unmap_sg:
- rdma_rw_unmap_sg(dev, sg, sg_cnt, dir);
+ ib_dma_unmap_sgtable(dev, &sgt, dir, 0);
return ret;
}
EXPORT_SYMBOL(rdma_rw_ctx_init);
@@ -387,6 +371,14 @@ int rdma_rw_ctx_signature_init(struct rdma_rw_ctx *ctx, struct ib_qp *qp,
qp->integrity_en);
struct ib_rdma_wr *rdma_wr;
int count = 0, ret;
+ struct sg_table sgt = {
+ .sgl = sg,
+ .orig_nents = sg_cnt,
+ };
+ struct sg_table prot_sgt = {
+ .sgl = prot_sg,
+ .orig_nents = prot_sg_cnt,
+ };

if (sg_cnt > pages_per_mr || prot_sg_cnt > pages_per_mr) {
pr_err("SG count too large: sg_cnt=%d, prot_sg_cnt=%d, pages_per_mr=%d\n",
@@ -394,18 +386,14 @@ int rdma_rw_ctx_signature_init(struct rdma_rw_ctx *ctx, struct ib_qp *qp,
return -EINVAL;
}

- ret = rdma_rw_map_sg(dev, sg, sg_cnt, dir);
- if (!ret)
- return -ENOMEM;
- sg_cnt = ret;
+ ret = ib_dma_map_sgtable(dev, &sgt, dir, 0);
+ if (ret)
+ return ret;

if (prot_sg_cnt) {
- ret = rdma_rw_map_sg(dev, prot_sg, prot_sg_cnt, dir);
- if (!ret) {
- ret = -ENOMEM;
+ ret = ib_dma_map_sgtable(dev, &prot_sgt, dir, 0);
+ if (ret)
goto out_unmap_sg;
- }
- prot_sg_cnt = ret;
}

ctx->type = RDMA_RW_SIG_MR;
@@ -426,10 +414,11 @@ int rdma_rw_ctx_signature_init(struct rdma_rw_ctx *ctx, struct ib_qp *qp,

memcpy(ctx->reg->mr->sig_attrs, sig_attrs, sizeof(struct ib_sig_attrs));

- ret = ib_map_mr_sg_pi(ctx->reg->mr, sg, sg_cnt, NULL, prot_sg,
- prot_sg_cnt, NULL, SZ_4K);
+ ret = ib_map_mr_sg_pi(ctx->reg->mr, sg, sgt.nents, NULL, prot_sg,
+ prot_sgt.nents, NULL, SZ_4K);
if (unlikely(ret)) {
- pr_err("failed to map PI sg (%d)\n", sg_cnt + prot_sg_cnt);
+ pr_err("failed to map PI sg (%d)\n",
+ sgt.nents + prot_sgt.nents);
goto out_destroy_sig_mr;
}

@@ -468,10 +457,10 @@ int rdma_rw_ctx_signature_init(struct rdma_rw_ctx *ctx, struct ib_qp *qp,
out_free_ctx:
kfree(ctx->reg);
out_unmap_prot_sg:
- if (prot_sg_cnt)
- rdma_rw_unmap_sg(dev, prot_sg, prot_sg_cnt, dir);
+ if (prot_sgt.nents)
+ ib_dma_unmap_sgtable(dev, &prot_sgt, dir, 0);
out_unmap_sg:
- rdma_rw_unmap_sg(dev, sg, sg_cnt, dir);
+ ib_dma_unmap_sgtable(dev, &sgt, dir, 0);
return ret;
}
EXPORT_SYMBOL(rdma_rw_ctx_signature_init);
@@ -604,7 +593,7 @@ void rdma_rw_ctx_destroy(struct rdma_rw_ctx *ctx, struct ib_qp *qp,
break;
}

- rdma_rw_unmap_sg(qp->pd->device, sg, sg_cnt, dir);
+ ib_dma_unmap_sg(qp->pd->device, sg, sg_cnt, dir);
}
EXPORT_SYMBOL(rdma_rw_ctx_destroy);

@@ -632,8 +621,8 @@ void rdma_rw_ctx_destroy_signature(struct rdma_rw_ctx *ctx, struct ib_qp *qp,
kfree(ctx->reg);

if (prot_sg_cnt)
- rdma_rw_unmap_sg(qp->pd->device, prot_sg, prot_sg_cnt, dir);
- rdma_rw_unmap_sg(qp->pd->device, sg, sg_cnt, dir);
+ ib_dma_unmap_sg(qp->pd->device, prot_sg, prot_sg_cnt, dir);
+ ib_dma_unmap_sg(qp->pd->device, sg, sg_cnt, dir);
}
EXPORT_SYMBOL(rdma_rw_ctx_destroy_signature);

diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 724e80a656f7..3db6de41aa2d 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -4053,6 +4053,25 @@ static inline void ib_dma_unmap_sg_attrs(struct ib_device *dev,
dma_attrs);
}

+static inline int ib_dma_map_sgtable(struct ib_device *dev,
+ struct sg_table *sgt,
+ enum dma_data_direction direction,
+ unsigned long dma_attrs)
+{
+ if (ib_uses_virt_dma(dev))
+ return ib_dma_virt_map_sg(dev, sgt->sgl, sgt->orig_nents);
+ return dma_map_sgtable(dev->dma_device, sgt, direction, dma_attrs);
+}
+
+static inline void ib_dma_unmap_sgtable(struct ib_device *dev,
+ struct sg_table *sgt,
+ enum dma_data_direction direction,
+ unsigned long dma_attrs)
+{
+ if (!ib_uses_virt_dma(dev))
+ dma_unmap_sgtable(dev->dma_device, sgt, direction, dma_attrs);
+}
+
/**
* ib_dma_map_sg - Map a scatter/gather list to DMA addresses
* @dev: The device for which the DMA addresses are to be created
--
2.20.1


2021-05-14 08:08:40

by Logan Gunthorpe

[permalink] [raw]
Subject: [PATCH v2 22/22] PCI/P2PDMA: Remove pci_p2pdma_[un]map_sg()

This interface is superseded by dma_map_sg(), which now supports
heterogeneous scatterlists. There are no longer any users, so remove it.

Signed-off-by: Logan Gunthorpe <[email protected]>
---
drivers/pci/p2pdma.c | 65 --------------------------------------
include/linux/pci-p2pdma.h | 27 ----------------
2 files changed, 92 deletions(-)

diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
index b0d779aeade9..767122e0a43f 100644
--- a/drivers/pci/p2pdma.c
+++ b/drivers/pci/p2pdma.c
@@ -873,71 +873,6 @@ enum pci_p2pdma_map_type pci_p2pdma_map_type(struct dev_pagemap *pgmap,
return ret;
}

-static int __pci_p2pdma_map_sg(struct pci_p2pdma_pagemap *p2p_pgmap,
- struct device *dev, struct scatterlist *sg, int nents)
-{
- struct scatterlist *s;
- int i;
-
- for_each_sg(sg, s, nents, i) {
- s->dma_address = sg_phys(s) - p2p_pgmap->bus_offset;
- sg_dma_len(s) = s->length;
- }
-
- return nents;
-}
-
-/**
- * pci_p2pdma_map_sg_attrs - map a PCI peer-to-peer scatterlist for DMA
- * @dev: device doing the DMA request
- * @sg: scatter list to map
- * @nents: elements in the scatterlist
- * @dir: DMA direction
- * @attrs: DMA attributes passed to dma_map_sg() (if called)
- *
- * Scatterlists mapped with this function should be unmapped using
- * pci_p2pdma_unmap_sg_attrs().
- *
- * Returns the number of SG entries mapped or 0 on error.
- */
-int pci_p2pdma_map_sg_attrs(struct device *dev, struct scatterlist *sg,
- int nents, enum dma_data_direction dir, unsigned long attrs)
-{
- struct pci_p2pdma_pagemap *p2p_pgmap =
- to_p2p_pgmap(sg_page(sg)->pgmap);
-
- switch (pci_p2pdma_map_type(sg_page(sg)->pgmap, dev)) {
- case PCI_P2PDMA_MAP_THRU_HOST_BRIDGE:
- return dma_map_sg_attrs(dev, sg, nents, dir, attrs);
- case PCI_P2PDMA_MAP_BUS_ADDR:
- return __pci_p2pdma_map_sg(p2p_pgmap, dev, sg, nents);
- default:
- return 0;
- }
-}
-EXPORT_SYMBOL_GPL(pci_p2pdma_map_sg_attrs);
-
-/**
- * pci_p2pdma_unmap_sg_attrs - unmap a PCI peer-to-peer scatterlist that was
- * mapped with pci_p2pdma_map_sg()
- * @dev: device doing the DMA request
- * @sg: scatter list to map
- * @nents: number of elements returned by pci_p2pdma_map_sg()
- * @dir: DMA direction
- * @attrs: DMA attributes passed to dma_unmap_sg() (if called)
- */
-void pci_p2pdma_unmap_sg_attrs(struct device *dev, struct scatterlist *sg,
- int nents, enum dma_data_direction dir, unsigned long attrs)
-{
- enum pci_p2pdma_map_type map_type;
-
- map_type = pci_p2pdma_map_type(sg_page(sg)->pgmap, dev);
-
- if (map_type == PCI_P2PDMA_MAP_THRU_HOST_BRIDGE)
- dma_unmap_sg_attrs(dev, sg, nents, dir, attrs);
-}
-EXPORT_SYMBOL_GPL(pci_p2pdma_unmap_sg_attrs);
-
/**
* pci_p2pdma_map_segment - map an sg segment determining the mapping type
* @state: State structure that should be declared outside of the for_each_sg()
diff --git a/include/linux/pci-p2pdma.h b/include/linux/pci-p2pdma.h
index e5a8d5bc0f51..0c33a40a86e7 100644
--- a/include/linux/pci-p2pdma.h
+++ b/include/linux/pci-p2pdma.h
@@ -72,10 +72,6 @@ void pci_p2pmem_free_sgl(struct pci_dev *pdev, struct scatterlist *sgl);
void pci_p2pmem_publish(struct pci_dev *pdev, bool publish);
enum pci_p2pdma_map_type pci_p2pdma_map_type(struct dev_pagemap *pgmap,
struct device *dev);
-int pci_p2pdma_map_sg_attrs(struct device *dev, struct scatterlist *sg,
- int nents, enum dma_data_direction dir, unsigned long attrs);
-void pci_p2pdma_unmap_sg_attrs(struct device *dev, struct scatterlist *sg,
- int nents, enum dma_data_direction dir, unsigned long attrs);
enum pci_p2pdma_map_type
pci_p2pdma_map_segment(struct pci_p2pdma_map_state *state, struct device *dev,
struct scatterlist *sg);
@@ -135,17 +131,6 @@ pci_p2pdma_map_type(struct dev_pagemap *pgmap, struct device *dev)
{
return PCI_P2PDMA_MAP_NOT_SUPPORTED;
}
-static inline int pci_p2pdma_map_sg_attrs(struct device *dev,
- struct scatterlist *sg, int nents, enum dma_data_direction dir,
- unsigned long attrs)
-{
- return 0;
-}
-static inline void pci_p2pdma_unmap_sg_attrs(struct device *dev,
- struct scatterlist *sg, int nents, enum dma_data_direction dir,
- unsigned long attrs)
-{
-}
static inline enum pci_p2pdma_map_type
pci_p2pdma_map_segment(struct pci_p2pdma_map_state *state, struct device *dev,
struct scatterlist *sg)
@@ -181,16 +166,4 @@ static inline struct pci_dev *pci_p2pmem_find(struct device *client)
return pci_p2pmem_find_many(&client, 1);
}

-static inline int pci_p2pdma_map_sg(struct device *dev, struct scatterlist *sg,
- int nents, enum dma_data_direction dir)
-{
- return pci_p2pdma_map_sg_attrs(dev, sg, nents, dir, 0);
-}
-
-static inline void pci_p2pdma_unmap_sg(struct device *dev,
- struct scatterlist *sg, int nents, enum dma_data_direction dir)
-{
- pci_p2pdma_unmap_sg_attrs(dev, sg, nents, dir, 0);
-}
-
#endif /* _LINUX_PCI_P2P_H */
--
2.20.1


2021-05-14 08:11:35

by Logan Gunthorpe

[permalink] [raw]
Subject: [PATCH v2 11/22] dma-iommu: Return error code from iommu_dma_map_sg()

Return appropriate error codes from iommu_dma_map_sg() now that
dma_map_sgtable() will pass the error code through to its callers.
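
For example, a hypothetical dma_map_sgtable() caller can now see why
the mapping failed (a sketch, not code from this series):

        ret = dma_map_sgtable(dev, &sgt, DMA_TO_DEVICE, 0);
        if (ret)
                /* e.g. -ENOMEM from IOVA allocation, instead of an
                 * uninformative zero under the old convention */
                return ret;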

Signed-off-by: Logan Gunthorpe <[email protected]>
---
drivers/iommu/dma-iommu.c | 20 +++++++++++++-------
1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 7bcdd1205535..0d69f509a95d 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -970,7 +970,7 @@ static int iommu_dma_map_sg_swiotlb(struct device *dev, struct scatterlist *sg,

out_unmap:
iommu_dma_unmap_sg_swiotlb(dev, sg, i, dir, attrs | DMA_ATTR_SKIP_CPU_SYNC);
- return 0;
+ return -EINVAL;
}

/*
@@ -991,11 +991,14 @@ static int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
dma_addr_t iova;
size_t iova_len = 0;
unsigned long mask = dma_get_seg_boundary(dev);
+ ssize_t ret;
int i;

- if (static_branch_unlikely(&iommu_deferred_attach_enabled) &&
- iommu_deferred_attach(dev, domain))
- return 0;
+ if (static_branch_unlikely(&iommu_deferred_attach_enabled)) {
+ ret = iommu_deferred_attach(dev, domain);
+ if (ret)
+ return ret;
+ }

if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
iommu_dma_sync_sg_for_device(dev, sg, nents, dir);
@@ -1043,14 +1046,17 @@ static int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
}

iova = iommu_dma_alloc_iova(domain, iova_len, dma_get_mask(dev), dev);
- if (!iova)
+ if (!iova) {
+ ret = -ENOMEM;
goto out_restore_sg;
+ }

/*
* We'll leave any physical concatenation to the IOMMU driver's
* implementation - it knows better than we do.
*/
- if (iommu_map_sg_atomic(domain, iova, sg, nents, prot) < iova_len)
+ ret = iommu_map_sg_atomic(domain, iova, sg, nents, prot);
+ if (ret < iova_len)
goto out_free_iova;

return __finalise_sg(dev, sg, nents, iova);
@@ -1059,7 +1065,7 @@ static int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
iommu_dma_free_iova(cookie, iova, iova_len, NULL);
out_restore_sg:
__invalidate_sg(sg, nents);
- return 0;
+ return ret;
}

static void iommu_dma_unmap_sg(struct device *dev, struct scatterlist *sg,
--
2.20.1


2021-05-14 08:12:00

by Logan Gunthorpe

[permalink] [raw]
Subject: [PATCH v2 09/22] dma-direct: Return appropriate error code from dma_direct_map_sg()

Now that the map_sg() op expects error codes instead of returning zero
on error, convert dma_direct_map_sg() to return an error code. The
only error returned presently is -EINVAL, for the case where a page
could not be mapped.
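
A sketch of the assumed .map_sg return convention after this series
(illustrative comment only):

        /*
         * .map_sg return convention (as assumed here):
         *   > 0 - number of DMA segments mapped
         *   < 0 - negative errno; dma_direct_map_sg() now returns
         *         -EINVAL when a page could not be mapped
         * A bare 0 no longer signals an error.
         */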

Signed-off-by: Logan Gunthorpe <[email protected]>
---
kernel/dma/direct.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index f737e3347059..803ee9321170 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -411,7 +411,7 @@ int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl, int nents,

out_unmap:
dma_direct_unmap_sg(dev, sgl, i, dir, attrs | DMA_ATTR_SKIP_CPU_SYNC);
- return 0;
+ return -EINVAL;
}

dma_addr_t dma_direct_map_resource(struct device *dev, phys_addr_t paddr,
--
2.20.1


2021-05-14 18:49:00

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH v2 15/22] dma-direct: Support PCI P2PDMA pages in dma-direct map_sg

> + for_each_sg(sgl, sg, nents, i) {
> + if (sg_is_dma_pci_p2pdma(sg)) {
> + sg_dma_unmark_pci_p2pdma(sg);
> + } else {

Double space here. We also don't really need the curly braces to start
with.

> + struct pci_p2pdma_map_state p2pdma_state = {};
> + enum pci_p2pdma_map_type map;
> struct scatterlist *sg;
> + int i, ret;
>
> for_each_sg(sgl, sg, nents, i) {
> + if (is_pci_p2pdma_page(sg_page(sg))) {
> + map = pci_p2pdma_map_segment(&p2pdma_state, dev, sg);
> + switch (map) {

Why not just:

switch (pci_p2pdma_map_segment(&p2pdma_state, dev,
sg)) {

(even better with a shorter name for p2pdma_state so that it all fits on
a single line)?

> + case PCI_P2PDMA_MAP_BUS_ADDR:
> + continue;
> + case PCI_P2PDMA_MAP_THRU_HOST_BRIDGE:
> + /*
> + * Mapping through host bridge should be
> + * mapped normally, thus we do nothing
> + * and continue below.
> + */

I have a bit of a hard time parsing this comment.

> + if (sg->dma_address == DMA_MAPPING_ERROR) {
> + ret = -EINVAL;
> goto out_unmap;
> + }
> sg_dma_len(sg) = sg->length;
> }
>
> @@ -411,7 +443,7 @@ int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl, int nents,
>
> out_unmap:
> dma_direct_unmap_sg(dev, sgl, i, dir, attrs | DMA_ATTR_SKIP_CPU_SYNC);
> - return -EINVAL;
> + return ret;

Maybe just initialize ret to -EINVAL at declaration time to simplify this
a bit?

2021-05-14 18:53:49

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH v2 00/22] Add new DMA mapping operation for P2PDMA

On Thu, May 13, 2021 at 04:31:41PM -0600, Logan Gunthorpe wrote:
> 17 files changed, 570 insertions(+), 290 deletions(-)

I'm a little worried about all this extra code for no new functionality
at all.

2021-05-14 19:41:08

by Logan Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH v2 14/22] PCI/P2PDMA: Introduce helpers for dma_map_sg implementations



On 2021-05-14 7:53 a.m., Christoph Hellwig wrote:
> I think helpers for the dma mapping implementation should probably
> go into dma-map-ops.h so that the header for the public API doesn't get
> polluted with them.

Ok. Will do for v3.

Logan

2021-05-14 19:41:08

by Logan Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH v2 08/22] dma-mapping: Allow map_sg() ops to return negative error codes



On 2021-05-14 7:51 a.m., Christoph Hellwig wrote:
>> +int __dma_map_sg_attrs(struct device *dev, struct scatterlist *sg, int nents,
>> enum dma_data_direction dir, unsigned long attrs);
>
> I don't think it makes sense to expose this __dma_map_sg_attrs helper.
> Just keep it static and move the sgtable helper to kernel/dma/mapping.c
> as well.

Makes sense. I'll fix this for v3.

Logan

2021-05-14 19:44:12

by Logan Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH v2 15/22] dma-direct: Support PCI P2PDMA pages in dma-direct map_sg



On 2021-05-14 7:57 a.m., Christoph Hellwig wrote:
>> + for_each_sg(sgl, sg, nents, i) {
>> + if (sg_is_dma_pci_p2pdma(sg)) {
>> + sg_dma_unmark_pci_p2pdma(sg);
>> + } else {
>
> Double space here. We also don't really need the curly braces to start
> with.
>
>> + struct pci_p2pdma_map_state p2pdma_state = {};
>> + enum pci_p2pdma_map_type map;
>> struct scatterlist *sg;
>> + int i, ret;
>>
>> for_each_sg(sgl, sg, nents, i) {
>> + if (is_pci_p2pdma_page(sg_page(sg))) {
>> + map = pci_p2pdma_map_segment(&p2pdma_state, dev, sg);
>> + switch (map) {
>
> Why not just:
>
> switch (pci_p2pdma_map_segment(&p2pdma_state, dev,
> sg)) {
>
> (even better with a shorter name for p2pdma_state so that it all fits on
> a single line)?
>
>> + case PCI_P2PDMA_MAP_BUS_ADDR:
>> + continue;
>> + case PCI_P2PDMA_MAP_THRU_HOST_BRIDGE:
>> + /*
>> + * Mapping through host bridge should be
>> + * mapped normally, thus we do nothing
>> + * and continue below.
>> + */
>
> I have a bit of a hard time parsing this comment.
>
>> + if (sg->dma_address == DMA_MAPPING_ERROR) {
>> + ret = -EINVAL;
>> goto out_unmap;
>> + }
>> sg_dma_len(sg) = sg->length;
>> }
>>
>> @@ -411,7 +443,7 @@ int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl, int nents,
>>
>> out_unmap:
>> dma_direct_unmap_sg(dev, sgl, i, dir, attrs | DMA_ATTR_SKIP_CPU_SYNC);
>> - return -EINVAL;
>> + return ret;
>
> Maybe just initialize ret to -EINVAL at declaration time to simplify this
> a bit?
>

All fair points, will fix in v3.

Logan

2021-05-14 22:12:06

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH v2 06/22] PCI/P2PDMA: Attempt to set map_type if it has not been set

On Thu, May 13, 2021 at 04:31:47PM -0600, Logan Gunthorpe wrote:
> Attempt to find the mapping type for P2PDMA pages on the first
> DMA map if it has not been determined ahead of time.
>
> Previously, the mapping type was expected to be calculated ahead of
> time, but if pages come from userspace there is no way to ensure
> the path was checked ahead of time.
>
> With this change it is no longer invalid to call pci_p2pdma_map_sg()
> before the mapping type is calculated, so drop the WARN_ON for that
> case.

Why?

2021-05-14 22:12:15

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH v2 08/22] dma-mapping: Allow map_sg() ops to return negative error codes

> +int __dma_map_sg_attrs(struct device *dev, struct scatterlist *sg, int nents,
> enum dma_data_direction dir, unsigned long attrs);

I don't think it makes sense to expose this __dma_map_sg_attrs helper.
Just keep it static and move the sgtable helper to kernel/dma/mapping.c
as well.

2021-05-14 22:12:21

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH v2 14/22] PCI/P2PDMA: Introduce helpers for dma_map_sg implementations

I think helpers for the dma mapping implementation should probably
go into dma-map-ops.h so that the header for the public API doesn't get
polluted with them.

2021-05-15 01:18:19

by Logan Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH v2 06/22] PCI/P2PDMA: Attempt to set map_type if it has not been set



On 2021-05-14 7:49 a.m., Christoph Hellwig wrote:
> On Thu, May 13, 2021 at 04:31:47PM -0600, Logan Gunthorpe wrote:
>> Attempt to find the mapping type for P2PDMA pages on the first
>> DMA map if it has not been determined ahead of time.
>>
>> Previously, the mapping type was expected to be calculated ahead of
>> time, but if pages come from userspace there is no way to ensure
>> the path was checked ahead of time.
>>
>> With this change it is no longer invalid to call pci_p2pdma_map_sg()
>> before the mapping type is calculated, so drop the WARN_ON for that
>> case.
>
> Why?

Before this change, if the mapping type wasn't already calculated,
pci_p2pdma_map_sg() would just fail. This was fine for NVMe-oF as it
always called pci_p2pdma_distance() ahead of time, which calculated the
mapping type, stored it in the xarray, and did not proceed if the two
devices could not talk to each other.

This patch makes it so that if the mapping type has not already been
calculated by DMA map time, the calculation is done then. This means
the DMA map operation can fail if the two devices aren't able to talk
to each other.
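
i.e., roughly the logic added in the patch:

        ret = xa_to_value(xa_load(&provider->p2pdma->map_types,
                                  map_types_idx(client)));
        if (ret == PCI_P2PDMA_MAP_UNKNOWN)
                /* not pre-calculated: calculate now, warning (and
                 * failing) if the path is unsupported */
                return calc_map_type_and_dist_warn(provider, client, NULL);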

Logan

2021-05-15 01:48:22

by Logan Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH v2 00/22] Add new DMA mapping operation for P2PDMA



On 2021-05-14 8:00 a.m., Christoph Hellwig wrote:
> On Thu, May 13, 2021 at 04:31:41PM -0600, Logan Gunthorpe wrote:
>> 17 files changed, 570 insertions(+), 290 deletions(-)
>
> I'm a little worried about all this extra code for no new functionality
> at all.

Yes, this series is really just prep work for allowing new
functionality, with a bunch of cleanup tacked onto it. It should make
adding P2PDMA in userspace a lot easier, and I expect other P2PDMA use
cases will be better enabled by it.

A lot of people don't like the pci_p2pdma_map_sg() special case and
can't use it because it only accepts homogeneous SGLs. This series gets
rid of the special case and allows the mapping interface to be used
more generally.
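
For a consumer that means a single call can cover a mixed SGL; a
hypothetical sketch (not code from this series):

        /* sgt may now mix P2PDMA and regular pages */
        ret = dma_map_sgtable(dev, &sgt, DMA_BIDIRECTIONAL, 0);
        if (ret)
                return ret;     /* negative errno if the P2P path is
                                 * unsupported */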

Logan