2015-10-08 07:51:37

by Sagi Grimberg

[permalink] [raw]
Subject: [PATCH v3 00/26] New fast registration API

Hi all,

As discussed on the linux-rdma list, there is plenty of room for
improvement in our memory registration APIs. We keep finding
ULPs that are duplicating code, sometimes use wrong strategies
and mis-use our current API.

As a first step, this patch set replaces the fast registration API
to accept a kernel common struct scatterlist and takes care of
the page vector construction in the core layer with hooks for the
drivers HW specific assignments. This allows to remove a common
code duplication as it was done in each and every ULP driver.

Changes from v2:
- Fixed alignment for page lists allocations in mlx4, mlx5 (Bart)

- Rebased against Doug's for-4.4 tree (4.3.0-rc1) + 4.3-rc fixes

- Added Acked/Tested tags

Changes from v1:
- Add ib_map_mr_sg_zbva() for RDS which uses it (preferred it over
polluting the API).

- Replaced coherent allocations in mlx4, mlx5 with DMA streaming
APIs (Bart)

- Changed ib_map_mr_sg description (Bart)

- Split SRP driver patches (Bart)

- Added missing wr->next = NULL from various ULPs (Steve, Santosh)

- Fixed 0-day testing errors in nes driver, xprtrdma and svcrdma

- Fixed checkpatch issues

Changes from v0:
- Rebased on top of 4.3-rc1 + Christoph's ib_send_wr conversion patches

- Allow the ULP to pass page_size argument to ib_map_mr_sg in order
to have it work better in some specific workloads. This suggestion
came from Bart Van Assche which pointed out that some applications
might use page sizes significantly smaller than the system PAGE_SIZE
of specific architectures

- Fixed some logical bugs in ib_sg_to_pages

- Added a set_page function pointer for drivers to pass to ib_sg_to_pages
so some drivers (e.g mlx4, mlx5, nes) can avoid keeping a second page
vector and/or re-iterate on the page vector in order to perform HW specific
assignments (big/little endian conversion, extra flags)

- Converted SRP initiator and RDS iwarp ULPs to the new API

- Removed fast registration code from hfi1 driver (as it isn't supported
anyway). I assume that the correct place to get the support back would
be in a shared SW library (hfi1, qib, rxe).

- Updated the change logs

The code is available at: https://github.com/sagigrimberg/linux/tree/reg_api.6

Sagi Grimberg (26):
IB/core: Introduce new fast registration API
IB/mlx5: Remove dead fmr code
IB/mlx5: Support the new memory registration API
IB/mlx4: Support the new memory registration API
RDMA/ocrdma: Support the new memory registration API
RDMA/cxgb3: Support the new memory registration API
iw_cxgb4: Support the new memory registration API
IB/qib: Support the new memory registration API
RDMA/nes: Support the new memory registration API
IB/iser: Port to new fast registration API
iser-target: Port to new memory registration API
xprtrdma: Port to new memory registration API
svcrdma: Port to new memory registration API
RDS/IW: Convert to new memory registration API
IB/srp: Split srp_map_sg
IB/srp: Convert to new registration API
IB/srp: Remove srp_finish_mapping
IB/srp: Dont allocate a page vector when using fast_reg
IB/mlx5: Remove old FRWR API support
IB/mlx4: Remove old FRWR API support
RDMA/ocrdma: Remove old FRWR API
RDMA/cxgb3: Remove old FRWR API
iw_cxgb4: Remove old FRWR API
IB/qib: Remove old FRWR API
RDMA/nes: Remove old FRWR API
IB/core: Remove old fast registration API

drivers/infiniband/core/verbs.c | 132 +++++++++++---
drivers/infiniband/hw/cxgb3/iwch_cq.c | 2 +-
drivers/infiniband/hw/cxgb3/iwch_provider.c | 39 +++--
drivers/infiniband/hw/cxgb3/iwch_provider.h | 2 +
drivers/infiniband/hw/cxgb3/iwch_qp.c | 37 ++--
drivers/infiniband/hw/cxgb4/cq.c | 2 +-
drivers/infiniband/hw/cxgb4/iw_cxgb4.h | 25 +--
drivers/infiniband/hw/cxgb4/mem.c | 61 +++----
drivers/infiniband/hw/cxgb4/provider.c | 3 +-
drivers/infiniband/hw/cxgb4/qp.c | 47 +++--
drivers/infiniband/hw/mlx4/cq.c | 2 +-
drivers/infiniband/hw/mlx4/main.c | 3 +-
drivers/infiniband/hw/mlx4/mlx4_ib.h | 25 ++-
drivers/infiniband/hw/mlx4/mr.c | 149 ++++++++++------
drivers/infiniband/hw/mlx4/qp.c | 34 ++--
drivers/infiniband/hw/mlx5/cq.c | 4 +-
drivers/infiniband/hw/mlx5/main.c | 3 +-
drivers/infiniband/hw/mlx5/mlx5_ib.h | 48 +-----
drivers/infiniband/hw/mlx5/mr.c | 135 ++++++++++-----
drivers/infiniband/hw/mlx5/qp.c | 140 +++++++--------
drivers/infiniband/hw/nes/nes_hw.h | 6 -
drivers/infiniband/hw/nes/nes_verbs.c | 163 +++++++-----------
drivers/infiniband/hw/nes/nes_verbs.h | 4 +
drivers/infiniband/hw/ocrdma/ocrdma.h | 2 +
drivers/infiniband/hw/ocrdma/ocrdma_main.c | 3 +-
drivers/infiniband/hw/ocrdma/ocrdma_verbs.c | 154 ++++++++---------
drivers/infiniband/hw/ocrdma/ocrdma_verbs.h | 7 +-
drivers/infiniband/hw/qib/qib_keys.c | 42 ++---
drivers/infiniband/hw/qib/qib_mr.c | 46 ++---
drivers/infiniband/hw/qib/qib_verbs.c | 13 +-
drivers/infiniband/hw/qib/qib_verbs.h | 13 +-
drivers/infiniband/ulp/iser/iscsi_iser.h | 8 +-
drivers/infiniband/ulp/iser/iser_memory.c | 54 +++---
drivers/infiniband/ulp/iser/iser_verbs.c | 16 +-
drivers/infiniband/ulp/isert/ib_isert.c | 130 +++-----------
drivers/infiniband/ulp/isert/ib_isert.h | 2 -
drivers/infiniband/ulp/srp/ib_srp.c | 256 +++++++++++++++++-----------
drivers/infiniband/ulp/srp/ib_srp.h | 11 +-
include/linux/sunrpc/svc_rdma.h | 6 +-
include/rdma/ib_verbs.h | 86 +++++-----
net/rds/iw.h | 5 +-
net/rds/iw_rdma.c | 128 +++++---------
net/rds/iw_send.c | 57 +++----
net/sunrpc/xprtrdma/frwr_ops.c | 112 +++++++-----
net/sunrpc/xprtrdma/svc_rdma_recvfrom.c | 76 +++++----
net/sunrpc/xprtrdma/svc_rdma_transport.c | 34 ++--
net/sunrpc/xprtrdma/xprt_rdma.h | 3 +-
47 files changed, 1157 insertions(+), 1173 deletions(-)

--
1.8.4.3



2015-10-08 07:51:37

by Sagi Grimberg

[permalink] [raw]
Subject: [PATCH v3 06/26] RDMA/cxgb3: Support the new memory registration API

Support the new memory registration API by allocating a
private page list array in iwch_mr and populate it when
iwch_map_mr_sg is invoked. Also, support IB_WR_REG_MR
by duplicating build_fastreg just take the needed information
from different places:
- page_size, iova, length (ib_mr)
- page array (iwch_mr)
- key, access flags (ib_reg_wr)

The IB_WR_FAST_REG_MR handlers will be removed later when
all the ULPs will be converted.

Signed-off-by: Sagi Grimberg <[email protected]>
Acked-by: Christoph Hellwig <[email protected]>
---
drivers/infiniband/hw/cxgb3/iwch_provider.c | 33 ++++++++++++++++++++
drivers/infiniband/hw/cxgb3/iwch_provider.h | 2 ++
drivers/infiniband/hw/cxgb3/iwch_qp.c | 48 +++++++++++++++++++++++++++++
3 files changed, 83 insertions(+)

diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c b/drivers/infiniband/hw/cxgb3/iwch_provider.c
index 93308c45f298..ee3d5ca7de6c 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_provider.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c
@@ -463,6 +463,7 @@ static int iwch_dereg_mr(struct ib_mr *ib_mr)
return -EINVAL;

mhp = to_iwch_mr(ib_mr);
+ kfree(mhp->pages);
rhp = mhp->rhp;
mmid = mhp->attr.stag >> 8;
cxio_dereg_mem(&rhp->rdev, mhp->attr.stag, mhp->attr.pbl_size,
@@ -821,6 +822,12 @@ static struct ib_mr *iwch_alloc_mr(struct ib_pd *pd,
if (!mhp)
goto err;

+ mhp->pages = kcalloc(max_num_sg, sizeof(u64), GFP_KERNEL);
+ if (!mhp->pages) {
+ ret = -ENOMEM;
+ goto pl_err;
+ }
+
mhp->rhp = rhp;
ret = iwch_alloc_pbl(mhp, max_num_sg);
if (ret)
@@ -847,11 +854,36 @@ err3:
err2:
iwch_free_pbl(mhp);
err1:
+ kfree(mhp->pages);
+pl_err:
kfree(mhp);
err:
return ERR_PTR(ret);
}

+static int iwch_set_page(struct ib_mr *ibmr, u64 addr)
+{
+ struct iwch_mr *mhp = to_iwch_mr(ibmr);
+
+ if (unlikely(mhp->npages == mhp->attr.pbl_size))
+ return -ENOMEM;
+
+ mhp->pages[mhp->npages++] = addr;
+
+ return 0;
+}
+
+static int iwch_map_mr_sg(struct ib_mr *ibmr,
+ struct scatterlist *sg,
+ unsigned int sg_nents)
+{
+ struct iwch_mr *mhp = to_iwch_mr(ibmr);
+
+ mhp->npages = 0;
+
+ return ib_sg_to_pages(ibmr, sg, sg_nents, iwch_set_page);
+}
+
static struct ib_fast_reg_page_list *iwch_alloc_fastreg_pbl(
struct ib_device *device,
int page_list_len)
@@ -1450,6 +1482,7 @@ int iwch_register_device(struct iwch_dev *dev)
dev->ibdev.bind_mw = iwch_bind_mw;
dev->ibdev.dealloc_mw = iwch_dealloc_mw;
dev->ibdev.alloc_mr = iwch_alloc_mr;
+ dev->ibdev.map_mr_sg = iwch_map_mr_sg;
dev->ibdev.alloc_fast_reg_page_list = iwch_alloc_fastreg_pbl;
dev->ibdev.free_fast_reg_page_list = iwch_free_fastreg_pbl;
dev->ibdev.attach_mcast = iwch_multicast_attach;
diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.h b/drivers/infiniband/hw/cxgb3/iwch_provider.h
index 87c14b0c5ac0..2ac85b86a680 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_provider.h
+++ b/drivers/infiniband/hw/cxgb3/iwch_provider.h
@@ -77,6 +77,8 @@ struct iwch_mr {
struct iwch_dev *rhp;
u64 kva;
struct tpt_attributes attr;
+ u64 *pages;
+ u32 npages;
};

typedef struct iwch_mw iwch_mw_handle;
diff --git a/drivers/infiniband/hw/cxgb3/iwch_qp.c b/drivers/infiniband/hw/cxgb3/iwch_qp.c
index bac0508fedd9..a09ea538e990 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_qp.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_qp.c
@@ -146,6 +146,49 @@ static int build_rdma_read(union t3_wr *wqe, struct ib_send_wr *wr,
return 0;
}

+static int build_memreg(union t3_wr *wqe, struct ib_reg_wr *wr,
+ u8 *flit_cnt, int *wr_cnt, struct t3_wq *wq)
+{
+ struct iwch_mr *mhp = to_iwch_mr(wr->mr);
+ int i;
+ __be64 *p;
+
+ if (mhp->npages > T3_MAX_FASTREG_DEPTH)
+ return -EINVAL;
+ *wr_cnt = 1;
+ wqe->fastreg.stag = cpu_to_be32(wr->key);
+ wqe->fastreg.len = cpu_to_be32(mhp->ibmr.length);
+ wqe->fastreg.va_base_hi = cpu_to_be32(mhp->ibmr.iova >> 32);
+ wqe->fastreg.va_base_lo_fbo =
+ cpu_to_be32(mhp->ibmr.iova & 0xffffffff);
+ wqe->fastreg.page_type_perms = cpu_to_be32(
+ V_FR_PAGE_COUNT(mhp->npages) |
+ V_FR_PAGE_SIZE(ilog2(wr->mr->page_size) - 12) |
+ V_FR_TYPE(TPT_VATO) |
+ V_FR_PERMS(iwch_ib_to_tpt_access(wr->access)));
+ p = &wqe->fastreg.pbl_addrs[0];
+ for (i = 0; i < mhp->npages; i++, p++) {
+
+ /* If we need a 2nd WR, then set it up */
+ if (i == T3_MAX_FASTREG_FRAG) {
+ *wr_cnt = 2;
+ wqe = (union t3_wr *)(wq->queue +
+ Q_PTR2IDX((wq->wptr+1), wq->size_log2));
+ build_fw_riwrh((void *)wqe, T3_WR_FASTREG, 0,
+ Q_GENBIT(wq->wptr + 1, wq->size_log2),
+ 0, 1 + mhp->npages - T3_MAX_FASTREG_FRAG,
+ T3_EOP);
+
+ p = &wqe->pbl_frag.pbl_addrs[0];
+ }
+ *p = cpu_to_be64((u64)mhp->pages[i]);
+ }
+ *flit_cnt = 5 + mhp->npages;
+ if (*flit_cnt > 15)
+ *flit_cnt = 15;
+ return 0;
+}
+
static int build_fastreg(union t3_wr *wqe, struct ib_send_wr *send_wr,
u8 *flit_cnt, int *wr_cnt, struct t3_wq *wq)
{
@@ -419,6 +462,11 @@ int iwch_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
err = build_fastreg(wqe, wr, &t3_wr_flit_cnt,
&wr_cnt, &qhp->wq);
break;
+ case IB_WR_REG_MR:
+ t3_wr_opcode = T3_WR_FASTREG;
+ err = build_memreg(wqe, reg_wr(wr), &t3_wr_flit_cnt,
+ &wr_cnt, &qhp->wq);
+ break;
case IB_WR_LOCAL_INV:
if (wr->send_flags & IB_SEND_FENCE)
t3_wr_flags |= T3_LOCAL_FENCE_FLAG;
--
1.8.4.3


2015-10-08 07:51:37

by Sagi Grimberg

[permalink] [raw]
Subject: [PATCH v3 10/26] IB/iser: Port to new fast registration API

Remove fastreg page list allocation as the page vector
is now private to the provider. Instead of constructing
the page list and fast_req work request, call ib_map_mr_sg
and construct ib_reg_wr.

Signed-off-by: Sagi Grimberg <[email protected]>
Acked-by: Christoph Hellwig <[email protected]>
---
drivers/infiniband/ulp/iser/iscsi_iser.h | 8 ++---
drivers/infiniband/ulp/iser/iser_memory.c | 54 ++++++++++++++-----------------
drivers/infiniband/ulp/iser/iser_verbs.c | 16 +--------
3 files changed, 27 insertions(+), 51 deletions(-)

diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.h b/drivers/infiniband/ulp/iser/iscsi_iser.h
index 2484bee993ec..271aa71e827c 100644
--- a/drivers/infiniband/ulp/iser/iscsi_iser.h
+++ b/drivers/infiniband/ulp/iser/iscsi_iser.h
@@ -297,7 +297,7 @@ struct iser_tx_desc {
u8 wr_idx;
union iser_wr {
struct ib_send_wr send;
- struct ib_fast_reg_wr fast_reg;
+ struct ib_reg_wr fast_reg;
struct ib_sig_handover_wr sig;
} wrs[ISER_MAX_WRS];
struct iser_mem_reg data_reg;
@@ -412,7 +412,6 @@ struct iser_device {
*
* @mr: memory region
* @fmr_pool: pool of fmrs
- * @frpl: fast reg page list used by frwrs
* @page_vec: fast reg page list used by fmr pool
* @mr_valid: is mr valid indicator
*/
@@ -421,10 +420,7 @@ struct iser_reg_resources {
struct ib_mr *mr;
struct ib_fmr_pool *fmr_pool;
};
- union {
- struct ib_fast_reg_page_list *frpl;
- struct iser_page_vec *page_vec;
- };
+ struct iser_page_vec *page_vec;
u8 mr_valid:1;
};

diff --git a/drivers/infiniband/ulp/iser/iser_memory.c b/drivers/infiniband/ulp/iser/iser_memory.c
index b29fda3e8e74..ea765fb9664d 100644
--- a/drivers/infiniband/ulp/iser/iser_memory.c
+++ b/drivers/infiniband/ulp/iser/iser_memory.c
@@ -472,7 +472,7 @@ iser_reg_sig_mr(struct iscsi_iser_task *iser_task,
sig_reg->sge.addr = 0;
sig_reg->sge.length = scsi_transfer_length(iser_task->sc);

- iser_dbg("sig reg: lkey: 0x%x, rkey: 0x%x, addr: 0x%llx, length: %u\n",
+ iser_dbg("lkey=0x%x rkey=0x%x addr=0x%llx length=%u\n",
sig_reg->sge.lkey, sig_reg->rkey, sig_reg->sge.addr,
sig_reg->sge.length);
err:
@@ -484,47 +484,41 @@ static int iser_fast_reg_mr(struct iscsi_iser_task *iser_task,
struct iser_reg_resources *rsc,
struct iser_mem_reg *reg)
{
- struct ib_conn *ib_conn = &iser_task->iser_conn->ib_conn;
- struct iser_device *device = ib_conn->device;
- struct ib_mr *mr = rsc->mr;
- struct ib_fast_reg_page_list *frpl = rsc->frpl;
struct iser_tx_desc *tx_desc = &iser_task->desc;
- struct ib_fast_reg_wr *wr;
- int offset, size, plen;
-
- plen = iser_sg_to_page_vec(mem, device->ib_device, frpl->page_list,
- &offset, &size);
- if (plen * SIZE_4K < size) {
- iser_err("fast reg page_list too short to hold this SG\n");
- return -EINVAL;
- }
+ struct ib_mr *mr = rsc->mr;
+ struct ib_reg_wr *wr;
+ int n;

if (!rsc->mr_valid)
iser_inv_rkey(iser_tx_next_wr(tx_desc), mr);

- wr = fast_reg_wr(iser_tx_next_wr(tx_desc));
- wr->wr.opcode = IB_WR_FAST_REG_MR;
+ n = ib_map_mr_sg(mr, mem->sg, mem->size, SIZE_4K);
+ if (unlikely(n != mem->size)) {
+ iser_err("failed to map sg (%d/%d)\n",
+ n, mem->size);
+ return n < 0 ? n : -EINVAL;
+ }
+
+ wr = reg_wr(iser_tx_next_wr(tx_desc));
+ wr->wr.opcode = IB_WR_REG_MR;
wr->wr.wr_id = ISER_FASTREG_LI_WRID;
wr->wr.send_flags = 0;
- wr->iova_start = frpl->page_list[0] + offset;
- wr->page_list = frpl;
- wr->page_list_len = plen;
- wr->page_shift = SHIFT_4K;
- wr->length = size;
- wr->rkey = mr->rkey;
- wr->access_flags = (IB_ACCESS_LOCAL_WRITE |
- IB_ACCESS_REMOTE_WRITE |
- IB_ACCESS_REMOTE_READ);
+ wr->wr.num_sge = 0;
+ wr->mr = mr;
+ wr->key = mr->rkey;
+ wr->access = IB_ACCESS_LOCAL_WRITE |
+ IB_ACCESS_REMOTE_WRITE |
+ IB_ACCESS_REMOTE_READ;
+
rsc->mr_valid = 0;

reg->sge.lkey = mr->lkey;
reg->rkey = mr->rkey;
- reg->sge.addr = frpl->page_list[0] + offset;
- reg->sge.length = size;
+ reg->sge.addr = mr->iova;
+ reg->sge.length = mr->length;

- iser_dbg("fast reg: lkey=0x%x, rkey=0x%x, addr=0x%llx,"
- " length=0x%x\n", reg->sge.lkey, reg->rkey,
- reg->sge.addr, reg->sge.length);
+ iser_dbg("lkey=0x%x rkey=0x%x addr=0x%llx length=0x%x\n",
+ reg->sge.lkey, reg->rkey, reg->sge.addr, reg->sge.length);

return 0;
}
diff --git a/drivers/infiniband/ulp/iser/iser_verbs.c b/drivers/infiniband/ulp/iser/iser_verbs.c
index b26022e30af1..a66b9dea96d8 100644
--- a/drivers/infiniband/ulp/iser/iser_verbs.c
+++ b/drivers/infiniband/ulp/iser/iser_verbs.c
@@ -293,35 +293,21 @@ iser_alloc_reg_res(struct ib_device *ib_device,
{
int ret;

- res->frpl = ib_alloc_fast_reg_page_list(ib_device, size);
- if (IS_ERR(res->frpl)) {
- ret = PTR_ERR(res->frpl);
- iser_err("Failed to allocate ib_fast_reg_page_list err=%d\n",
- ret);
- return PTR_ERR(res->frpl);
- }
-
res->mr = ib_alloc_mr(pd, IB_MR_TYPE_MEM_REG, size);
if (IS_ERR(res->mr)) {
ret = PTR_ERR(res->mr);
iser_err("Failed to allocate ib_fast_reg_mr err=%d\n", ret);
- goto fast_reg_mr_failure;
+ return ret;
}
res->mr_valid = 1;

return 0;
-
-fast_reg_mr_failure:
- ib_free_fast_reg_page_list(res->frpl);
-
- return ret;
}

static void
iser_free_reg_res(struct iser_reg_resources *rsc)
{
ib_dereg_mr(rsc->mr);
- ib_free_fast_reg_page_list(rsc->frpl);
}

static int
--
1.8.4.3


2015-10-08 07:51:37

by Sagi Grimberg

[permalink] [raw]
Subject: [PATCH v3 03/26] IB/mlx5: Support the new memory registration API

Support the new memory registration API by allocating a
private page list array in mlx5_ib_mr and populate it when
mlx5_ib_map_mr_sg is invoked. Also, support IB_WR_REG_MR
by setting the exact WQE as IB_WR_FAST_REG_MR, just take the
needed information from different places:
- page_size, iova, length, access flags (ib_mr)
- page array (mlx5_ib_mr)
- key (ib_reg_wr)

The IB_WR_FAST_REG_MR handlers will be removed later when
all the ULPs will be converted.

Signed-off-by: Sagi Grimberg <[email protected]>
Acked-by: Christoph Hellwig <[email protected]>
---
drivers/infiniband/hw/mlx5/cq.c | 3 ++
drivers/infiniband/hw/mlx5/main.c | 1 +
drivers/infiniband/hw/mlx5/mlx5_ib.h | 9 ++++
drivers/infiniband/hw/mlx5/mr.c | 93 ++++++++++++++++++++++++++++++++++++
drivers/infiniband/hw/mlx5/qp.c | 83 ++++++++++++++++++++++++++++++++
5 files changed, 189 insertions(+)

diff --git a/drivers/infiniband/hw/mlx5/cq.c b/drivers/infiniband/hw/mlx5/cq.c
index 2d0dbbf38ceb..206930096d56 100644
--- a/drivers/infiniband/hw/mlx5/cq.c
+++ b/drivers/infiniband/hw/mlx5/cq.c
@@ -109,6 +109,9 @@ static enum ib_wc_opcode get_umr_comp(struct mlx5_ib_wq *wq, int idx)
case IB_WR_LOCAL_INV:
return IB_WC_LOCAL_INV;

+ case IB_WR_REG_MR:
+ return IB_WC_REG_MR;
+
case IB_WR_FAST_REG_MR:
return IB_WC_FAST_REG_MR;

diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index f1ccd40beae9..7e93044ea6ce 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -1425,6 +1425,7 @@ static void *mlx5_ib_add(struct mlx5_core_dev *mdev)
dev->ib_dev.detach_mcast = mlx5_ib_mcg_detach;
dev->ib_dev.process_mad = mlx5_ib_process_mad;
dev->ib_dev.alloc_mr = mlx5_ib_alloc_mr;
+ dev->ib_dev.map_mr_sg = mlx5_ib_map_mr_sg;
dev->ib_dev.alloc_fast_reg_page_list = mlx5_ib_alloc_fast_reg_page_list;
dev->ib_dev.free_fast_reg_page_list = mlx5_ib_free_fast_reg_page_list;
dev->ib_dev.check_mr_status = mlx5_ib_check_mr_status;
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index f789a3e6c215..72672ae48296 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -319,6 +319,11 @@ enum mlx5_ib_mtt_access_flags {

struct mlx5_ib_mr {
struct ib_mr ibmr;
+ void *descs;
+ dma_addr_t desc_map;
+ int ndescs;
+ int max_descs;
+ int desc_size;
struct mlx5_core_mr mmr;
struct ib_umem *umem;
struct mlx5_shared_mr_info *smr_info;
@@ -330,6 +335,7 @@ struct mlx5_ib_mr {
struct mlx5_create_mkey_mbox_out out;
struct mlx5_core_sig_ctx *sig;
int live;
+ void *descs_alloc;
};

struct mlx5_ib_fast_reg_page_list {
@@ -560,6 +566,9 @@ int mlx5_ib_dereg_mr(struct ib_mr *ibmr);
struct ib_mr *mlx5_ib_alloc_mr(struct ib_pd *pd,
enum ib_mr_type mr_type,
u32 max_num_sg);
+int mlx5_ib_map_mr_sg(struct ib_mr *ibmr,
+ struct scatterlist *sg,
+ unsigned int sg_nents);
struct ib_fast_reg_page_list *mlx5_ib_alloc_fast_reg_page_list(struct ib_device *ibdev,
int page_list_len);
void mlx5_ib_free_fast_reg_page_list(struct ib_fast_reg_page_list *page_list);
diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c
index 6a2fed3015ab..cabfe4190657 100644
--- a/drivers/infiniband/hw/mlx5/mr.c
+++ b/drivers/infiniband/hw/mlx5/mr.c
@@ -1153,6 +1153,52 @@ error:
return err;
}

+static int
+mlx5_alloc_priv_descs(struct ib_device *device,
+ struct mlx5_ib_mr *mr,
+ int ndescs,
+ int desc_size)
+{
+ int size = ndescs * desc_size;
+ int add_size;
+ int ret;
+
+ add_size = max_t(int, MLX5_UMR_ALIGN - ARCH_KMALLOC_MINALIGN, 0);
+
+ mr->descs_alloc = kzalloc(size + add_size, GFP_KERNEL);
+ if (!mr->descs_alloc)
+ return -ENOMEM;
+
+ mr->descs = PTR_ALIGN(mr->descs_alloc, MLX5_UMR_ALIGN);
+
+ mr->desc_map = dma_map_single(device->dma_device, mr->descs,
+ size, DMA_TO_DEVICE);
+ if (dma_mapping_error(device->dma_device, mr->desc_map)) {
+ ret = -ENOMEM;
+ goto err;
+ }
+
+ return 0;
+err:
+ kfree(mr->descs_alloc);
+
+ return ret;
+}
+
+static void
+mlx5_free_priv_descs(struct mlx5_ib_mr *mr)
+{
+ if (mr->descs) {
+ struct ib_device *device = mr->ibmr.device;
+ int size = mr->max_descs * mr->desc_size;
+
+ dma_unmap_single(device->dma_device, mr->desc_map,
+ size, DMA_TO_DEVICE);
+ kfree(mr->descs_alloc);
+ mr->descs = NULL;
+ }
+}
+
static int clean_mr(struct mlx5_ib_mr *mr)
{
struct mlx5_ib_dev *dev = to_mdev(mr->ibmr.device);
@@ -1172,6 +1218,8 @@ static int clean_mr(struct mlx5_ib_mr *mr)
mr->sig = NULL;
}

+ mlx5_free_priv_descs(mr);
+
if (!umred) {
err = destroy_mkey(dev, mr);
if (err) {
@@ -1261,6 +1309,14 @@ struct ib_mr *mlx5_ib_alloc_mr(struct ib_pd *pd,
if (mr_type == IB_MR_TYPE_MEM_REG) {
access_mode = MLX5_ACCESS_MODE_MTT;
in->seg.log2_page_size = PAGE_SHIFT;
+
+ err = mlx5_alloc_priv_descs(pd->device, mr,
+ ndescs, sizeof(u64));
+ if (err)
+ goto err_free_in;
+
+ mr->desc_size = sizeof(u64);
+ mr->max_descs = ndescs;
} else if (mr_type == IB_MR_TYPE_SIGNATURE) {
u32 psv_index[2];

@@ -1317,6 +1373,7 @@ err_destroy_psv:
mlx5_ib_warn(dev, "failed to destroy wire psv %d\n",
mr->sig->psv_wire.psv_idx);
}
+ mlx5_free_priv_descs(mr);
err_free_sig:
kfree(mr->sig);
err_free_in:
@@ -1408,3 +1465,39 @@ int mlx5_ib_check_mr_status(struct ib_mr *ibmr, u32 check_mask,
done:
return ret;
}
+
+static int mlx5_set_page(struct ib_mr *ibmr, u64 addr)
+{
+ struct mlx5_ib_mr *mr = to_mmr(ibmr);
+ __be64 *descs;
+
+ if (unlikely(mr->ndescs == mr->max_descs))
+ return -ENOMEM;
+
+ descs = mr->descs;
+ descs[mr->ndescs++] = cpu_to_be64(addr | MLX5_EN_RD | MLX5_EN_WR);
+
+ return 0;
+}
+
+int mlx5_ib_map_mr_sg(struct ib_mr *ibmr,
+ struct scatterlist *sg,
+ unsigned int sg_nents)
+{
+ struct mlx5_ib_mr *mr = to_mmr(ibmr);
+ int n;
+
+ mr->ndescs = 0;
+
+ ib_dma_sync_single_for_cpu(ibmr->device, mr->desc_map,
+ mr->desc_size * mr->max_descs,
+ DMA_TO_DEVICE);
+
+ n = ib_sg_to_pages(ibmr, sg, sg_nents, mlx5_set_page);
+
+ ib_dma_sync_single_for_device(ibmr->device, mr->desc_map,
+ mr->desc_size * mr->max_descs,
+ DMA_TO_DEVICE);
+
+ return n;
+}
diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index d4c36af4270f..a71e43a4f028 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -65,6 +65,7 @@ static const u32 mlx5_ib_opcode[] = {
[IB_WR_SEND_WITH_INV] = MLX5_OPCODE_SEND_INVAL,
[IB_WR_LOCAL_INV] = MLX5_OPCODE_UMR,
[IB_WR_FAST_REG_MR] = MLX5_OPCODE_UMR,
+ [IB_WR_REG_MR] = MLX5_OPCODE_UMR,
[IB_WR_MASKED_ATOMIC_CMP_AND_SWP] = MLX5_OPCODE_ATOMIC_MASKED_CS,
[IB_WR_MASKED_ATOMIC_FETCH_AND_ADD] = MLX5_OPCODE_ATOMIC_MASKED_FA,
[MLX5_IB_WR_UMR] = MLX5_OPCODE_UMR,
@@ -1896,6 +1897,17 @@ static __be64 sig_mkey_mask(void)
return cpu_to_be64(result);
}

+static void set_reg_umr_seg(struct mlx5_wqe_umr_ctrl_seg *umr,
+ struct mlx5_ib_mr *mr)
+{
+ int ndescs = mr->ndescs;
+
+ memset(umr, 0, sizeof(*umr));
+ umr->flags = MLX5_UMR_CHECK_NOT_FREE;
+ umr->klm_octowords = get_klm_octo(ndescs);
+ umr->mkey_mask = frwr_mkey_mask();
+}
+
static void set_frwr_umr_segment(struct mlx5_wqe_umr_ctrl_seg *umr,
struct ib_send_wr *wr, int li)
{
@@ -1987,6 +1999,22 @@ static u8 get_umr_flags(int acc)
MLX5_PERM_LOCAL_READ | MLX5_PERM_UMR_EN;
}

+static void set_reg_mkey_seg(struct mlx5_mkey_seg *seg,
+ struct mlx5_ib_mr *mr,
+ u32 key, int access)
+{
+ int ndescs = ALIGN(mr->ndescs, 8) >> 1;
+
+ memset(seg, 0, sizeof(*seg));
+ seg->flags = get_umr_flags(access) | MLX5_ACCESS_MODE_MTT;
+ seg->qpn_mkey7_0 = cpu_to_be32((key & 0xff) | 0xffffff00);
+ seg->flags_pd = cpu_to_be32(MLX5_MKEY_REMOTE_INVAL);
+ seg->start_addr = cpu_to_be64(mr->ibmr.iova);
+ seg->len = cpu_to_be64(mr->ibmr.length);
+ seg->xlt_oct_size = cpu_to_be32(ndescs);
+ seg->log2_page_size = ilog2(mr->ibmr.page_size);
+}
+
static void set_mkey_segment(struct mlx5_mkey_seg *seg, struct ib_send_wr *wr,
int li, int *writ)
{
@@ -2028,6 +2056,17 @@ static void set_reg_mkey_segment(struct mlx5_mkey_seg *seg, struct ib_send_wr *w
mlx5_mkey_variant(umrwr->mkey));
}

+static void set_reg_data_seg(struct mlx5_wqe_data_seg *dseg,
+ struct mlx5_ib_mr *mr,
+ struct mlx5_ib_pd *pd)
+{
+ int bcount = mr->desc_size * mr->ndescs;
+
+ dseg->addr = cpu_to_be64(mr->desc_map);
+ dseg->byte_count = cpu_to_be32(ALIGN(bcount, 64));
+ dseg->lkey = cpu_to_be32(pd->ibpd.local_dma_lkey);
+}
+
static void set_frwr_pages(struct mlx5_wqe_data_seg *dseg,
struct ib_send_wr *wr,
struct mlx5_core_dev *mdev,
@@ -2433,6 +2472,38 @@ static int set_psv_wr(struct ib_sig_domain *domain,
return 0;
}

+static int set_reg_wr(struct mlx5_ib_qp *qp,
+ struct ib_reg_wr *wr,
+ void **seg, int *size)
+{
+ struct mlx5_ib_mr *mr = to_mmr(wr->mr);
+ struct mlx5_ib_pd *pd = to_mpd(qp->ibqp.pd);
+
+ if (unlikely(wr->wr.send_flags & IB_SEND_INLINE)) {
+ mlx5_ib_warn(to_mdev(qp->ibqp.device),
+ "Invalid IB_SEND_INLINE send flag\n");
+ return -EINVAL;
+ }
+
+ set_reg_umr_seg(*seg, mr);
+ *seg += sizeof(struct mlx5_wqe_umr_ctrl_seg);
+ *size += sizeof(struct mlx5_wqe_umr_ctrl_seg) / 16;
+ if (unlikely((*seg == qp->sq.qend)))
+ *seg = mlx5_get_send_wqe(qp, 0);
+
+ set_reg_mkey_seg(*seg, mr, wr->key, wr->access);
+ *seg += sizeof(struct mlx5_mkey_seg);
+ *size += sizeof(struct mlx5_mkey_seg) / 16;
+ if (unlikely((*seg == qp->sq.qend)))
+ *seg = mlx5_get_send_wqe(qp, 0);
+
+ set_reg_data_seg(*seg, mr, pd);
+ *seg += sizeof(struct mlx5_wqe_data_seg);
+ *size += (sizeof(struct mlx5_wqe_data_seg) / 16);
+
+ return 0;
+}
+
static int set_frwr_li_wr(void **seg, struct ib_send_wr *wr, int *size,
struct mlx5_core_dev *mdev, struct mlx5_ib_pd *pd, struct mlx5_ib_qp *qp)
{
@@ -2676,6 +2747,18 @@ int mlx5_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
num_sge = 0;
break;

+ case IB_WR_REG_MR:
+ next_fence = MLX5_FENCE_MODE_INITIATOR_SMALL;
+ qp->sq.wr_data[idx] = IB_WR_REG_MR;
+ ctrl->imm = cpu_to_be32(reg_wr(wr)->key);
+ err = set_reg_wr(qp, reg_wr(wr), &seg, &size);
+ if (err) {
+ *bad_wr = wr;
+ goto out;
+ }
+ num_sge = 0;
+ break;
+
case IB_WR_REG_SIG_MR:
qp->sq.wr_data[idx] = IB_WR_REG_SIG_MR;
mr = to_mmr(sig_handover_wr(wr)->sig_mr);
--
1.8.4.3


2015-10-08 07:51:37

by Sagi Grimberg

[permalink] [raw]
Subject: [PATCH v3 09/26] RDMA/nes: Support the new memory registration API

Support the new memory registration API by allocating a
private page list array in nes_mr and populate it when
nes_map_mr_sg is invoked. Also, support IB_WR_REG_MR
by duplicating IB_WR_FAST_REG_MR handling and take the
needed information from different places:
- page_size, iova, length (ib_mr)
- page array (nes_mr)
- key, access flags (ib_reg_wr)

The IB_WR_FAST_REG_MR handlers will be removed later when
all the ULPs will be converted.

Signed-off-by: Sagi Grimberg <[email protected]>
Acked-by: Christoph Hellwig <[email protected]>
---
drivers/infiniband/hw/nes/nes_verbs.c | 117 +++++++++++++++++++++++++++++++++-
drivers/infiniband/hw/nes/nes_verbs.h | 4 ++
2 files changed, 120 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/hw/nes/nes_verbs.c b/drivers/infiniband/hw/nes/nes_verbs.c
index f71b37b75f82..f5b0cc972403 100644
--- a/drivers/infiniband/hw/nes/nes_verbs.c
+++ b/drivers/infiniband/hw/nes/nes_verbs.c
@@ -51,6 +51,7 @@ atomic_t qps_created;
atomic_t sw_qps_destroyed;

static void nes_unregister_ofa_device(struct nes_ib_device *nesibdev);
+static int nes_dereg_mr(struct ib_mr *ib_mr);

/**
* nes_alloc_mw
@@ -443,9 +444,46 @@ static struct ib_mr *nes_alloc_mr(struct ib_pd *ibpd,
} else {
kfree(nesmr);
nes_free_resource(nesadapter, nesadapter->allocated_mrs, stag_index);
- ibmr = ERR_PTR(-ENOMEM);
+ return ERR_PTR(-ENOMEM);
}
+
+ nesmr->pages = pci_alloc_consistent(nesdev->pcidev,
+ max_num_sg * sizeof(u64),
+ &nesmr->paddr);
+ if (!nesmr->paddr)
+ goto err;
+
+ nesmr->max_pages = max_num_sg;
+
return ibmr;
+
+err:
+ nes_dereg_mr(ibmr);
+
+ return ERR_PTR(-ENOMEM);
+}
+
+static int nes_set_page(struct ib_mr *ibmr, u64 addr)
+{
+ struct nes_mr *nesmr = to_nesmr(ibmr);
+
+ if (unlikely(nesmr->npages == nesmr->max_pages))
+ return -ENOMEM;
+
+ nesmr->pages[nesmr->npages++] = cpu_to_le64(addr);
+
+ return 0;
+}
+
+static int nes_map_mr_sg(struct ib_mr *ibmr,
+ struct scatterlist *sg,
+ unsigned int sg_nents)
+{
+ struct nes_mr *nesmr = to_nesmr(ibmr);
+
+ nesmr->npages = 0;
+
+ return ib_sg_to_pages(ibmr, sg, sg_nents, nes_set_page);
}

/*
@@ -2683,6 +2721,13 @@ static int nes_dereg_mr(struct ib_mr *ib_mr)
u16 major_code;
u16 minor_code;

+
+ if (nesmr->pages)
+ pci_free_consistent(nesdev->pcidev,
+ nesmr->max_pages * sizeof(u64),
+ nesmr->pages,
+ nesmr->paddr);
+
if (nesmr->region) {
ib_umem_release(nesmr->region);
}
@@ -3513,6 +3558,75 @@ static int nes_post_send(struct ib_qp *ibqp, struct ib_send_wr *ib_wr,
wqe_misc);
break;
}
+ case IB_WR_REG_MR:
+ {
+ struct nes_mr *mr = to_nesmr(reg_wr(ib_wr)->mr);
+ int page_shift = ilog2(reg_wr(ib_wr)->mr->page_size);
+ int flags = reg_wr(ib_wr)->access;
+
+ if (mr->npages > (NES_4K_PBL_CHUNK_SIZE / sizeof(u64))) {
+ nes_debug(NES_DBG_IW_TX, "SQ_FMR: bad page_list_len\n");
+ err = -EINVAL;
+ break;
+ }
+ wqe_misc = NES_IWARP_SQ_OP_FAST_REG;
+ set_wqe_64bit_value(wqe->wqe_words,
+ NES_IWARP_SQ_FMR_WQE_VA_FBO_LOW_IDX,
+ mr->ibmr.iova);
+ set_wqe_32bit_value(wqe->wqe_words,
+ NES_IWARP_SQ_FMR_WQE_LENGTH_LOW_IDX,
+ mr->ibmr.length);
+ set_wqe_32bit_value(wqe->wqe_words,
+ NES_IWARP_SQ_FMR_WQE_LENGTH_HIGH_IDX, 0);
+ set_wqe_32bit_value(wqe->wqe_words,
+ NES_IWARP_SQ_FMR_WQE_MR_STAG_IDX,
+ reg_wr(ib_wr)->key);
+
+ if (page_shift == 12) {
+ wqe_misc |= NES_IWARP_SQ_FMR_WQE_PAGE_SIZE_4K;
+ } else if (page_shift == 21) {
+ wqe_misc |= NES_IWARP_SQ_FMR_WQE_PAGE_SIZE_2M;
+ } else {
+ nes_debug(NES_DBG_IW_TX, "Invalid page shift,"
+ " ib_wr=%u, max=1\n", ib_wr->num_sge);
+ err = -EINVAL;
+ break;
+ }
+
+ /* Set access_flags */
+ wqe_misc |= NES_IWARP_SQ_FMR_WQE_RIGHTS_ENABLE_LOCAL_READ;
+ if (flags & IB_ACCESS_LOCAL_WRITE)
+ wqe_misc |= NES_IWARP_SQ_FMR_WQE_RIGHTS_ENABLE_LOCAL_WRITE;
+
+ if (flags & IB_ACCESS_REMOTE_WRITE)
+ wqe_misc |= NES_IWARP_SQ_FMR_WQE_RIGHTS_ENABLE_REMOTE_WRITE;
+
+ if (flags & IB_ACCESS_REMOTE_READ)
+ wqe_misc |= NES_IWARP_SQ_FMR_WQE_RIGHTS_ENABLE_REMOTE_READ;
+
+ if (flags & IB_ACCESS_MW_BIND)
+ wqe_misc |= NES_IWARP_SQ_FMR_WQE_RIGHTS_ENABLE_WINDOW_BIND;
+
+ /* Fill in PBL info: */
+ set_wqe_64bit_value(wqe->wqe_words,
+ NES_IWARP_SQ_FMR_WQE_PBL_ADDR_LOW_IDX,
+ mr->paddr);
+
+ set_wqe_32bit_value(wqe->wqe_words,
+ NES_IWARP_SQ_FMR_WQE_PBL_LENGTH_IDX,
+ mr->npages * 8);
+
+ nes_debug(NES_DBG_IW_TX, "SQ_REG_MR: iova_start: %llx, "
+ "length: %d, rkey: %0x, pgl_paddr: %llx, "
+ "page_list_len: %u, wqe_misc: %x\n",
+ (unsigned long long) mr->ibmr.iova,
+ mr->ibmr.length,
+ reg_wr(ib_wr)->key,
+ (unsigned long long) mr->paddr,
+ mr->npages,
+ wqe_misc);
+ break;
+ }
default:
/* error */
err = -EINVAL;
@@ -3940,6 +4054,7 @@ struct nes_ib_device *nes_init_ofa_device(struct net_device *netdev)
nesibdev->ibdev.bind_mw = nes_bind_mw;

nesibdev->ibdev.alloc_mr = nes_alloc_mr;
+ nesibdev->ibdev.map_mr_sg = nes_map_mr_sg;
nesibdev->ibdev.alloc_fast_reg_page_list = nes_alloc_fast_reg_page_list;
nesibdev->ibdev.free_fast_reg_page_list = nes_free_fast_reg_page_list;

diff --git a/drivers/infiniband/hw/nes/nes_verbs.h b/drivers/infiniband/hw/nes/nes_verbs.h
index 309b31c31ae1..a204b677af22 100644
--- a/drivers/infiniband/hw/nes/nes_verbs.h
+++ b/drivers/infiniband/hw/nes/nes_verbs.h
@@ -79,6 +79,10 @@ struct nes_mr {
u16 pbls_used;
u8 mode;
u8 pbl_4k;
+ __le64 *pages;
+ dma_addr_t paddr;
+ u32 max_pages;
+ u32 npages;
};

struct nes_hw_pb {
--
1.8.4.3


2015-10-08 07:51:37

by Sagi Grimberg

[permalink] [raw]
Subject: [PATCH v3 04/26] IB/mlx4: Support the new memory registration API

Support the new memory registration API by allocating a
private page list array in mlx4_ib_mr and populate it when
mlx4_ib_map_mr_sg is invoked. Also, support IB_WR_REG_MR
by setting the exact WQE as IB_WR_FAST_REG_MR, just take the
needed information from different places:
- page_size, iova, length, access flags (ib_mr)
- page array (mlx4_ib_mr)
- key (ib_reg_wr)

The IB_WR_FAST_REG_MR handlers will be removed later when
all the ULPs will be converted.

Signed-off-by: Sagi Grimberg <[email protected]>
Tested-by: Christoph Hellwig <[email protected]>
---
drivers/infiniband/hw/mlx4/cq.c | 1 +
drivers/infiniband/hw/mlx4/main.c | 1 +
drivers/infiniband/hw/mlx4/mlx4_ib.h | 10 ++++
drivers/infiniband/hw/mlx4/mr.c | 101 ++++++++++++++++++++++++++++++++---
drivers/infiniband/hw/mlx4/qp.c | 25 +++++++++
5 files changed, 132 insertions(+), 6 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/cq.c b/drivers/infiniband/hw/mlx4/cq.c
index 5fd49f9435f9..2ea4125b7903 100644
--- a/drivers/infiniband/hw/mlx4/cq.c
+++ b/drivers/infiniband/hw/mlx4/cq.c
@@ -819,6 +819,7 @@ repoll:
break;
case MLX4_OPCODE_FMR:
wc->opcode = IB_WC_FAST_REG_MR;
+ /* TODO: wc->opcode = IB_WC_REG_MR; */
break;
case MLX4_OPCODE_LOCAL_INVAL:
wc->opcode = IB_WC_LOCAL_INV;
diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index 38be8dc2932e..19191ac0783c 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -2249,6 +2249,7 @@ static void *mlx4_ib_add(struct mlx4_dev *dev)
ibdev->ib_dev.rereg_user_mr = mlx4_ib_rereg_user_mr;
ibdev->ib_dev.dereg_mr = mlx4_ib_dereg_mr;
ibdev->ib_dev.alloc_mr = mlx4_ib_alloc_mr;
+ ibdev->ib_dev.map_mr_sg = mlx4_ib_map_mr_sg;
ibdev->ib_dev.alloc_fast_reg_page_list = mlx4_ib_alloc_fast_reg_page_list;
ibdev->ib_dev.free_fast_reg_page_list = mlx4_ib_free_fast_reg_page_list;
ibdev->ib_dev.attach_mcast = mlx4_ib_mcg_attach;
diff --git a/drivers/infiniband/hw/mlx4/mlx4_ib.h b/drivers/infiniband/hw/mlx4/mlx4_ib.h
index 1e7b23bb2eb0..d6214577ecf8 100644
--- a/drivers/infiniband/hw/mlx4/mlx4_ib.h
+++ b/drivers/infiniband/hw/mlx4/mlx4_ib.h
@@ -129,10 +129,17 @@ struct mlx4_ib_cq {
struct list_head recv_qp_list;
};

+#define MLX4_MR_PAGES_ALIGN 0x40
+
struct mlx4_ib_mr {
struct ib_mr ibmr;
+ __be64 *pages;
+ dma_addr_t page_map;
+ u32 npages;
+ u32 max_pages;
struct mlx4_mr mmr;
struct ib_umem *umem;
+ void *pages_alloc;
};

struct mlx4_ib_mw {
@@ -706,6 +713,9 @@ int mlx4_ib_dealloc_mw(struct ib_mw *mw);
struct ib_mr *mlx4_ib_alloc_mr(struct ib_pd *pd,
enum ib_mr_type mr_type,
u32 max_num_sg);
+int mlx4_ib_map_mr_sg(struct ib_mr *ibmr,
+ struct scatterlist *sg,
+ unsigned int sg_nents);
struct ib_fast_reg_page_list *mlx4_ib_alloc_fast_reg_page_list(struct ib_device *ibdev,
int page_list_len);
void mlx4_ib_free_fast_reg_page_list(struct ib_fast_reg_page_list *page_list);
diff --git a/drivers/infiniband/hw/mlx4/mr.c b/drivers/infiniband/hw/mlx4/mr.c
index 5bba176e9dfa..96fc7ed99fb8 100644
--- a/drivers/infiniband/hw/mlx4/mr.c
+++ b/drivers/infiniband/hw/mlx4/mr.c
@@ -59,7 +59,7 @@ struct ib_mr *mlx4_ib_get_dma_mr(struct ib_pd *pd, int acc)
struct mlx4_ib_mr *mr;
int err;

- mr = kmalloc(sizeof *mr, GFP_KERNEL);
+ mr = kzalloc(sizeof(*mr), GFP_KERNEL);
if (!mr)
return ERR_PTR(-ENOMEM);

@@ -140,7 +140,7 @@ struct ib_mr *mlx4_ib_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
int err;
int n;

- mr = kmalloc(sizeof *mr, GFP_KERNEL);
+ mr = kzalloc(sizeof(*mr), GFP_KERNEL);
if (!mr)
return ERR_PTR(-ENOMEM);

@@ -271,11 +271,59 @@ release_mpt_entry:
return err;
}

+static int
+mlx4_alloc_priv_pages(struct ib_device *device,
+ struct mlx4_ib_mr *mr,
+ int max_pages)
+{
+ int size = max_pages * sizeof(u64);
+ int add_size;
+ int ret;
+
+ add_size = max_t(int, MLX4_MR_PAGES_ALIGN - ARCH_KMALLOC_MINALIGN, 0);
+
+ mr->pages_alloc = kzalloc(size + add_size, GFP_KERNEL);
+ if (!mr->pages_alloc)
+ return -ENOMEM;
+
+ mr->pages = PTR_ALIGN(mr->pages_alloc, MLX4_MR_PAGES_ALIGN);
+
+ mr->page_map = dma_map_single(device->dma_device, mr->pages,
+ size, DMA_TO_DEVICE);
+
+ if (dma_mapping_error(device->dma_device, mr->page_map)) {
+ ret = -ENOMEM;
+ goto err;
+ }
+
+ return 0;
+err:
+ kfree(mr->pages_alloc);
+
+ return ret;
+}
+
+static void
+mlx4_free_priv_pages(struct mlx4_ib_mr *mr)
+{
+ if (mr->pages) {
+ struct ib_device *device = mr->ibmr.device;
+ int size = mr->max_pages * sizeof(u64);
+
+ dma_unmap_single(device->dma_device, mr->page_map,
+ size, DMA_TO_DEVICE);
+ kfree(mr->pages_alloc);
+ mr->pages = NULL;
+ }
+}
+
int mlx4_ib_dereg_mr(struct ib_mr *ibmr)
{
struct mlx4_ib_mr *mr = to_mmr(ibmr);
int ret;

+ mlx4_free_priv_pages(mr);
+
ret = mlx4_mr_free(to_mdev(ibmr->device)->dev, &mr->mmr);
if (ret)
return ret;
@@ -362,7 +410,7 @@ struct ib_mr *mlx4_ib_alloc_mr(struct ib_pd *pd,
max_num_sg > MLX4_MAX_FAST_REG_PAGES)
return ERR_PTR(-EINVAL);

- mr = kmalloc(sizeof *mr, GFP_KERNEL);
+ mr = kzalloc(sizeof(*mr), GFP_KERNEL);
if (!mr)
return ERR_PTR(-ENOMEM);

@@ -371,18 +419,25 @@ struct ib_mr *mlx4_ib_alloc_mr(struct ib_pd *pd,
if (err)
goto err_free;

+ err = mlx4_alloc_priv_pages(pd->device, mr, max_num_sg);
+ if (err)
+ goto err_free_mr;
+
+ mr->max_pages = max_num_sg;
+
err = mlx4_mr_enable(dev->dev, &mr->mmr);
if (err)
- goto err_mr;
+ goto err_free_pl;

mr->ibmr.rkey = mr->ibmr.lkey = mr->mmr.key;
mr->umem = NULL;

return &mr->ibmr;

-err_mr:
+err_free_pl:
+ mlx4_free_priv_pages(mr);
+err_free_mr:
(void) mlx4_mr_free(dev->dev, &mr->mmr);
-
err_free:
kfree(mr);
return ERR_PTR(err);
@@ -528,3 +583,37 @@ int mlx4_ib_fmr_dealloc(struct ib_fmr *ibfmr)

return err;
}
+
+static int mlx4_set_page(struct ib_mr *ibmr, u64 addr)
+{
+ struct mlx4_ib_mr *mr = to_mmr(ibmr);
+
+ if (unlikely(mr->npages == mr->max_pages))
+ return -ENOMEM;
+
+ mr->pages[mr->npages++] = cpu_to_be64(addr | MLX4_MTT_FLAG_PRESENT);
+
+ return 0;
+}
+
+int mlx4_ib_map_mr_sg(struct ib_mr *ibmr,
+ struct scatterlist *sg,
+ unsigned int sg_nents)
+{
+ struct mlx4_ib_mr *mr = to_mmr(ibmr);
+ int rc;
+
+ mr->npages = 0;
+
+ ib_dma_sync_single_for_cpu(ibmr->device, mr->page_map,
+ sizeof(u64) * mr->max_pages,
+ DMA_TO_DEVICE);
+
+ rc = ib_sg_to_pages(ibmr, sg, sg_nents, mlx4_set_page);
+
+ ib_dma_sync_single_for_device(ibmr->device, mr->page_map,
+ sizeof(u64) * mr->max_pages,
+ DMA_TO_DEVICE);
+
+ return rc;
+}
diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c
index 3831cddb551f..850d4e9c13c5 100644
--- a/drivers/infiniband/hw/mlx4/qp.c
+++ b/drivers/infiniband/hw/mlx4/qp.c
@@ -112,6 +112,7 @@ static const __be32 mlx4_ib_opcode[] = {
[IB_WR_SEND_WITH_INV] = cpu_to_be32(MLX4_OPCODE_SEND_INVAL),
[IB_WR_LOCAL_INV] = cpu_to_be32(MLX4_OPCODE_LOCAL_INVAL),
[IB_WR_FAST_REG_MR] = cpu_to_be32(MLX4_OPCODE_FMR),
+ [IB_WR_REG_MR] = cpu_to_be32(MLX4_OPCODE_FMR),
[IB_WR_MASKED_ATOMIC_CMP_AND_SWP] = cpu_to_be32(MLX4_OPCODE_MASKED_ATOMIC_CS),
[IB_WR_MASKED_ATOMIC_FETCH_AND_ADD] = cpu_to_be32(MLX4_OPCODE_MASKED_ATOMIC_FA),
[IB_WR_BIND_MW] = cpu_to_be32(MLX4_OPCODE_BIND_MW),
@@ -2405,6 +2406,22 @@ static __be32 convert_access(int acc)
cpu_to_be32(MLX4_WQE_FMR_PERM_LOCAL_READ);
}

+static void set_reg_seg(struct mlx4_wqe_fmr_seg *fseg,
+ struct ib_reg_wr *wr)
+{
+ struct mlx4_ib_mr *mr = to_mmr(wr->mr);
+
+ fseg->flags = convert_access(wr->access);
+ fseg->mem_key = cpu_to_be32(wr->key);
+ fseg->buf_list = cpu_to_be64(mr->page_map);
+ fseg->start_addr = cpu_to_be64(mr->ibmr.iova);
+ fseg->reg_len = cpu_to_be64(mr->ibmr.length);
+ fseg->offset = 0; /* XXX -- is this just for ZBVA? */
+ fseg->page_size = cpu_to_be32(ilog2(mr->ibmr.page_size));
+ fseg->reserved[0] = 0;
+ fseg->reserved[1] = 0;
+}
+
static void set_fmr_seg(struct mlx4_wqe_fmr_seg *fseg,
struct ib_fast_reg_wr *wr)
{
@@ -2766,6 +2783,14 @@ int mlx4_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
size += sizeof (struct mlx4_wqe_fmr_seg) / 16;
break;

+ case IB_WR_REG_MR:
+ ctrl->srcrb_flags |=
+ cpu_to_be32(MLX4_WQE_CTRL_STRONG_ORDER);
+ set_reg_seg(wqe, reg_wr(wr));
+ wqe += sizeof(struct mlx4_wqe_fmr_seg);
+ size += sizeof(struct mlx4_wqe_fmr_seg) / 16;
+ break;
+
case IB_WR_BIND_MW:
ctrl->srcrb_flags |=
cpu_to_be32(MLX4_WQE_CTRL_STRONG_ORDER);
--
1.8.4.3


2015-10-08 07:51:37

by Sagi Grimberg

[permalink] [raw]
Subject: [PATCH v3 01/26] IB/core: Introduce new fast registration API

The new fast registration verb ib_map_mr_sg receives a scatterlist
and converts it to a page list under the verbs API thus hiding
the specific HW mapping details away from the consumer.

The provider drivers are provided with a generic helper ib_sg_to_pages
that converts a scatterlist into a vector of page addresses. The
drivers can still perform any HW specific page address setting
by passing a set_page function pointer which will be invoked for
each page address. This allows drivers to avoid keeping a shadow
page vectors and convert them to HW specific translations by doing
extra copies.

This API will allow ULPs to remove the duplicated code of constructing
a page vector from a given sg list.

The send work request ib_reg_wr also shrinks as it will contain only
mr, key and access flags in addition.

Signed-off-by: Sagi Grimberg <[email protected]>
Tested-by: Christoph Hellwig <[email protected]>
---
drivers/infiniband/core/verbs.c | 107 ++++++++++++++++++++++++++++++++++++++++
include/rdma/ib_verbs.h | 44 +++++++++++++++++
2 files changed, 151 insertions(+)

diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index e1f2c9887f3f..e18a8bce8130 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -1469,3 +1469,110 @@ int ib_check_mr_status(struct ib_mr *mr, u32 check_mask,
mr->device->check_mr_status(mr, check_mask, mr_status) : -ENOSYS;
}
EXPORT_SYMBOL(ib_check_mr_status);
+
+/**
+ * ib_map_mr_sg() - Map the largest prefix of a dma mapped SG list
+ * and set it the memory region.
+ * @mr: memory region
+ * @sg: dma mapped scatterlist
+ * @sg_nents: number of entries in sg
+ * @page_size: page vector desired page size
+ *
+ * Constraints:
+ * - The first sg element is allowed to have an offset.
+ * - Each sg element must be aligned to page_size (or physically
+ * contiguous to the previous element). In case an sg element has a
+ * non contiguous offset, the mapping prefix will not include it.
+ * - The last sg element is allowed to have length less than page_size.
+ * - If sg_nents total byte length exceeds the mr max_num_sge * page_size
+ * then only max_num_sg entries will be mapped.
+ *
+ * Returns the number of sg elements that were mapped to the memory region.
+ *
+ * After this completes successfully, the memory region
+ * is ready for registration.
+ */
+int ib_map_mr_sg(struct ib_mr *mr,
+ struct scatterlist *sg,
+ unsigned int sg_nents,
+ unsigned int page_size)
+{
+ if (unlikely(!mr->device->map_mr_sg))
+ return -ENOSYS;
+
+ mr->page_size = page_size;
+
+ return mr->device->map_mr_sg(mr, sg, sg_nents);
+}
+EXPORT_SYMBOL(ib_map_mr_sg);
+
+/**
+ * ib_sg_to_pages() - Convert the largest prefix of a sg list
+ * to a page vector
+ * @mr: memory region
+ * @sgl: dma mapped scatterlist
+ * @sg_nents: number of entries in sg
+ * @set_page: driver page assignment function pointer
+ *
+ * Core service helper for drivers to covert the largest
+ * prefix of given sg list to a page vector. The sg list
+ * prefix converted is the prefix that meet the requirements
+ * of ib_map_mr_sg.
+ *
+ * Returns the number of sg elements that were assigned to
+ * a page vector.
+ */
+int ib_sg_to_pages(struct ib_mr *mr,
+ struct scatterlist *sgl,
+ unsigned int sg_nents,
+ int (*set_page)(struct ib_mr *, u64))
+{
+ struct scatterlist *sg;
+ u64 last_end_dma_addr = 0, last_page_addr = 0;
+ unsigned int last_page_off = 0;
+ u64 page_mask = ~((u64)mr->page_size - 1);
+ int i;
+
+ mr->iova = sg_dma_address(&sgl[0]);
+ mr->length = 0;
+
+ for_each_sg(sgl, sg, sg_nents, i) {
+ u64 dma_addr = sg_dma_address(sg);
+ unsigned int dma_len = sg_dma_len(sg);
+ u64 end_dma_addr = dma_addr + dma_len;
+ u64 page_addr = dma_addr & page_mask;
+
+ if (i && page_addr != dma_addr) {
+ if (last_end_dma_addr != dma_addr) {
+ /* gap */
+ goto done;
+
+ } else if (last_page_off + dma_len <= mr->page_size) {
+ /* chunk this fragment with the last */
+ mr->length += dma_len;
+ last_end_dma_addr += dma_len;
+ last_page_off += dma_len;
+ continue;
+ } else {
+ /* map starting from the next page */
+ page_addr = last_page_addr + mr->page_size;
+ dma_len -= mr->page_size - last_page_off;
+ }
+ }
+
+ do {
+ if (unlikely(set_page(mr, page_addr)))
+ goto done;
+ page_addr += mr->page_size;
+ } while (page_addr < end_dma_addr);
+
+ mr->length += dma_len;
+ last_end_dma_addr = end_dma_addr;
+ last_page_addr = end_dma_addr & page_mask;
+ last_page_off = end_dma_addr & ~page_mask;
+ }
+
+done:
+ return i;
+}
+EXPORT_SYMBOL(ib_sg_to_pages);
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index c9abc8fec620..24bb87f16afb 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -739,6 +739,7 @@ enum ib_wc_opcode {
IB_WC_LSO,
IB_WC_LOCAL_INV,
IB_WC_FAST_REG_MR,
+ IB_WC_REG_MR,
IB_WC_MASKED_COMP_SWAP,
IB_WC_MASKED_FETCH_ADD,
/*
@@ -1030,6 +1031,7 @@ enum ib_wr_opcode {
IB_WR_RDMA_READ_WITH_INV,
IB_WR_LOCAL_INV,
IB_WR_FAST_REG_MR,
+ IB_WR_REG_MR,
IB_WR_MASKED_ATOMIC_CMP_AND_SWP,
IB_WR_MASKED_ATOMIC_FETCH_AND_ADD,
IB_WR_BIND_MW,
@@ -1163,6 +1165,18 @@ static inline struct ib_fast_reg_wr *fast_reg_wr(struct ib_send_wr *wr)
return container_of(wr, struct ib_fast_reg_wr, wr);
}

+struct ib_reg_wr {
+ struct ib_send_wr wr;
+ struct ib_mr *mr;
+ u32 key;
+ int access;
+};
+
+static inline struct ib_reg_wr *reg_wr(struct ib_send_wr *wr)
+{
+ return container_of(wr, struct ib_reg_wr, wr);
+}
+
struct ib_bind_mw_wr {
struct ib_send_wr wr;
struct ib_mw *mw;
@@ -1375,6 +1389,9 @@ struct ib_mr {
struct ib_uobject *uobject;
u32 lkey;
u32 rkey;
+ u64 iova;
+ u32 length;
+ unsigned int page_size;
atomic_t usecnt; /* count number of MWs */
};

@@ -1759,6 +1776,9 @@ struct ib_device {
struct ib_mr * (*alloc_mr)(struct ib_pd *pd,
enum ib_mr_type mr_type,
u32 max_num_sg);
+ int (*map_mr_sg)(struct ib_mr *mr,
+ struct scatterlist *sg,
+ unsigned int sg_nents);
struct ib_fast_reg_page_list * (*alloc_fast_reg_page_list)(struct ib_device *device,
int page_list_len);
void (*free_fast_reg_page_list)(struct ib_fast_reg_page_list *page_list);
@@ -3064,4 +3084,28 @@ struct net_device *ib_get_net_dev_by_params(struct ib_device *dev, u8 port,
u16 pkey, const union ib_gid *gid,
const struct sockaddr *addr);

+int ib_map_mr_sg(struct ib_mr *mr,
+ struct scatterlist *sg,
+ unsigned int sg_nents,
+ unsigned int page_size);
+
+static inline int
+ib_map_mr_sg_zbva(struct ib_mr *mr,
+ struct scatterlist *sg,
+ unsigned int sg_nents,
+ unsigned int page_size)
+{
+ int n;
+
+ n = ib_map_mr_sg(mr, sg, sg_nents, page_size);
+ mr->iova = 0;
+
+ return n;
+}
+
+int ib_sg_to_pages(struct ib_mr *mr,
+ struct scatterlist *sgl,
+ unsigned int sg_nents,
+ int (*set_page)(struct ib_mr *, u64));
+
#endif /* IB_VERBS_H */
--
1.8.4.3


2015-10-08 07:51:37

by Sagi Grimberg

[permalink] [raw]
Subject: [PATCH v3 02/26] IB/mlx5: Remove dead fmr code

Just function declarations - no need for those
laying arround. If for some reason someone will want
FMR support in mlx5, it should be easy enough to restore
a few structs.

Signed-off-by: Sagi Grimberg <[email protected]>
Reviewed-by: Bart Van Assche <[email protected]>
Acked-by: Christoph Hellwig <[email protected]>
---
drivers/infiniband/hw/mlx5/mlx5_ib.h | 25 -------------------------
1 file changed, 25 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index 29f3ecdbe790..f789a3e6c215 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -364,20 +364,6 @@ enum {
MLX5_FMR_BUSY,
};

-struct mlx5_ib_fmr {
- struct ib_fmr ibfmr;
- struct mlx5_core_mr mr;
- int access_flags;
- int state;
- /* protect fmr state
- */
- spinlock_t lock;
- u64 wrid;
- struct ib_send_wr wr[2];
- u8 page_shift;
- struct ib_fast_reg_page_list page_list;
-};
-
struct mlx5_cache_ent {
struct list_head head;
/* sync access to the cahce entry
@@ -462,11 +448,6 @@ static inline struct mlx5_ib_dev *to_mdev(struct ib_device *ibdev)
return container_of(ibdev, struct mlx5_ib_dev, ib_dev);
}

-static inline struct mlx5_ib_fmr *to_mfmr(struct ib_fmr *ibfmr)
-{
- return container_of(ibfmr, struct mlx5_ib_fmr, ibfmr);
-}
-
static inline struct mlx5_ib_cq *to_mcq(struct ib_cq *ibcq)
{
return container_of(ibcq, struct mlx5_ib_cq, ibcq);
@@ -582,12 +563,6 @@ struct ib_mr *mlx5_ib_alloc_mr(struct ib_pd *pd,
struct ib_fast_reg_page_list *mlx5_ib_alloc_fast_reg_page_list(struct ib_device *ibdev,
int page_list_len);
void mlx5_ib_free_fast_reg_page_list(struct ib_fast_reg_page_list *page_list);
-struct ib_fmr *mlx5_ib_fmr_alloc(struct ib_pd *pd, int acc,
- struct ib_fmr_attr *fmr_attr);
-int mlx5_ib_map_phys_fmr(struct ib_fmr *ibfmr, u64 *page_list,
- int npages, u64 iova);
-int mlx5_ib_unmap_fmr(struct list_head *fmr_list);
-int mlx5_ib_fmr_dealloc(struct ib_fmr *ibfmr);
int mlx5_ib_process_mad(struct ib_device *ibdev, int mad_flags, u8 port_num,
const struct ib_wc *in_wc, const struct ib_grh *in_grh,
const struct ib_mad_hdr *in, size_t in_mad_size,
--
1.8.4.3


2015-10-08 07:51:37

by Sagi Grimberg

[permalink] [raw]
Subject: [PATCH v3 07/26] iw_cxgb4: Support the new memory registration API

Support the new memory registration API by allocating a
private page list array in c4iw_mr and populate it when
c4iw_map_mr_sg is invoked. Also, support IB_WR_REG_MR
by duplicating build_fastreg just take the needed information
from different places:
- page_size, iova, length (ib_mr)
- page array (c4iw_mr)
- key, access flags (ib_reg_wr)

The IB_WR_FAST_REG_MR handlers will be removed later when
all the ULPs will be converted.

Signed-off-by: Sagi Grimberg <[email protected]>
Acked-by: Christoph Hellwig <[email protected]>
Tested-by: Steve Wise <[email protected]>
---
drivers/infiniband/hw/cxgb4/iw_cxgb4.h | 7 ++++
drivers/infiniband/hw/cxgb4/mem.c | 38 +++++++++++++++++
drivers/infiniband/hw/cxgb4/provider.c | 1 +
drivers/infiniband/hw/cxgb4/qp.c | 74 ++++++++++++++++++++++++++++++++++
4 files changed, 120 insertions(+)

diff --git a/drivers/infiniband/hw/cxgb4/iw_cxgb4.h b/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
index c7bb38c931a5..032f90aa8ac9 100644
--- a/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
+++ b/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
@@ -386,6 +386,10 @@ struct c4iw_mr {
struct c4iw_dev *rhp;
u64 kva;
struct tpt_attributes attr;
+ u64 *mpl;
+ dma_addr_t mpl_addr;
+ u32 max_mpl_len;
+ u32 mpl_len;
};

static inline struct c4iw_mr *to_c4iw_mr(struct ib_mr *ibmr)
@@ -973,6 +977,9 @@ struct ib_fast_reg_page_list *c4iw_alloc_fastreg_pbl(
struct ib_mr *c4iw_alloc_mr(struct ib_pd *pd,
enum ib_mr_type mr_type,
u32 max_num_sg);
+int c4iw_map_mr_sg(struct ib_mr *ibmr,
+ struct scatterlist *sg,
+ unsigned int sg_nents);
int c4iw_dealloc_mw(struct ib_mw *mw);
struct ib_mw *c4iw_alloc_mw(struct ib_pd *pd, enum ib_mw_type type);
struct ib_mr *c4iw_reg_user_mr(struct ib_pd *pd, u64 start,
diff --git a/drivers/infiniband/hw/cxgb4/mem.c b/drivers/infiniband/hw/cxgb4/mem.c
index 026b91ebd5e2..86ec65721797 100644
--- a/drivers/infiniband/hw/cxgb4/mem.c
+++ b/drivers/infiniband/hw/cxgb4/mem.c
@@ -863,6 +863,7 @@ struct ib_mr *c4iw_alloc_mr(struct ib_pd *pd,
u32 mmid;
u32 stag = 0;
int ret = 0;
+ int length = roundup(max_num_sg * sizeof(u64), 32);

if (mr_type != IB_MR_TYPE_MEM_REG ||
max_num_sg > t4_max_fr_depth(use_dsgl))
@@ -876,6 +877,14 @@ struct ib_mr *c4iw_alloc_mr(struct ib_pd *pd,
goto err;
}

+ mhp->mpl = dma_alloc_coherent(&rhp->rdev.lldi.pdev->dev,
+ length, &mhp->mpl_addr, GFP_KERNEL);
+ if (!mhp->mpl) {
+ ret = -ENOMEM;
+ goto err_mpl;
+ }
+ mhp->max_mpl_len = length;
+
mhp->rhp = rhp;
ret = alloc_pbl(mhp, max_num_sg);
if (ret)
@@ -905,11 +914,37 @@ err2:
c4iw_pblpool_free(&mhp->rhp->rdev, mhp->attr.pbl_addr,
mhp->attr.pbl_size << 3);
err1:
+ dma_free_coherent(&mhp->rhp->rdev.lldi.pdev->dev,
+ mhp->max_mpl_len, mhp->mpl, mhp->mpl_addr);
+err_mpl:
kfree(mhp);
err:
return ERR_PTR(ret);
}

+static int c4iw_set_page(struct ib_mr *ibmr, u64 addr)
+{
+ struct c4iw_mr *mhp = to_c4iw_mr(ibmr);
+
+ if (unlikely(mhp->mpl_len == mhp->max_mpl_len))
+ return -ENOMEM;
+
+ mhp->mpl[mhp->mpl_len++] = addr;
+
+ return 0;
+}
+
+int c4iw_map_mr_sg(struct ib_mr *ibmr,
+ struct scatterlist *sg,
+ unsigned int sg_nents)
+{
+ struct c4iw_mr *mhp = to_c4iw_mr(ibmr);
+
+ mhp->mpl_len = 0;
+
+ return ib_sg_to_pages(ibmr, sg, sg_nents, c4iw_set_page);
+}
+
struct ib_fast_reg_page_list *c4iw_alloc_fastreg_pbl(struct ib_device *device,
int page_list_len)
{
@@ -970,6 +1005,9 @@ int c4iw_dereg_mr(struct ib_mr *ib_mr)
rhp = mhp->rhp;
mmid = mhp->attr.stag >> 8;
remove_handle(rhp, &rhp->mmidr, mmid);
+ if (mhp->mpl)
+ dma_free_coherent(&mhp->rhp->rdev.lldi.pdev->dev,
+ mhp->max_mpl_len, mhp->mpl, mhp->mpl_addr);
dereg_mem(&rhp->rdev, mhp->attr.stag, mhp->attr.pbl_size,
mhp->attr.pbl_addr);
if (mhp->attr.pbl_size)
diff --git a/drivers/infiniband/hw/cxgb4/provider.c b/drivers/infiniband/hw/cxgb4/provider.c
index 7746113552e7..55dedadcffaa 100644
--- a/drivers/infiniband/hw/cxgb4/provider.c
+++ b/drivers/infiniband/hw/cxgb4/provider.c
@@ -557,6 +557,7 @@ int c4iw_register_device(struct c4iw_dev *dev)
dev->ibdev.bind_mw = c4iw_bind_mw;
dev->ibdev.dealloc_mw = c4iw_dealloc_mw;
dev->ibdev.alloc_mr = c4iw_alloc_mr;
+ dev->ibdev.map_mr_sg = c4iw_map_mr_sg;
dev->ibdev.alloc_fast_reg_page_list = c4iw_alloc_fastreg_pbl;
dev->ibdev.free_fast_reg_page_list = c4iw_free_fastreg_pbl;
dev->ibdev.attach_mcast = c4iw_multicast_attach;
diff --git a/drivers/infiniband/hw/cxgb4/qp.c b/drivers/infiniband/hw/cxgb4/qp.c
index b60498fff99a..12a23bc935f3 100644
--- a/drivers/infiniband/hw/cxgb4/qp.c
+++ b/drivers/infiniband/hw/cxgb4/qp.c
@@ -605,10 +605,76 @@ static int build_rdma_recv(struct c4iw_qp *qhp, union t4_recv_wr *wqe,
return 0;
}

+static int build_memreg(struct t4_sq *sq, union t4_wr *wqe,
+ struct ib_reg_wr *wr, u8 *len16, u8 t5dev)
+{
+ struct c4iw_mr *mhp = to_c4iw_mr(wr->mr);
+ struct fw_ri_immd *imdp;
+ __be64 *p;
+ int i;
+ int pbllen = roundup(mhp->mpl_len * sizeof(u64), 32);
+ int rem;
+
+ if (mhp->mpl_len > t4_max_fr_depth(use_dsgl))
+ return -EINVAL;
+
+ wqe->fr.qpbinde_to_dcacpu = 0;
+ wqe->fr.pgsz_shift = ilog2(wr->mr->page_size) - 12;
+ wqe->fr.addr_type = FW_RI_VA_BASED_TO;
+ wqe->fr.mem_perms = c4iw_ib_to_tpt_access(wr->access);
+ wqe->fr.len_hi = 0;
+ wqe->fr.len_lo = cpu_to_be32(mhp->ibmr.length);
+ wqe->fr.stag = cpu_to_be32(wr->key);
+ wqe->fr.va_hi = cpu_to_be32(mhp->ibmr.iova >> 32);
+ wqe->fr.va_lo_fbo = cpu_to_be32(mhp->ibmr.iova &
+ 0xffffffff);
+
+ if (t5dev && use_dsgl && (pbllen > max_fr_immd)) {
+ struct fw_ri_dsgl *sglp;
+
+ for (i = 0; i < mhp->mpl_len; i++)
+ mhp->mpl[i] = (__force u64)cpu_to_be64((u64)mhp->mpl[i]);
+
+ sglp = (struct fw_ri_dsgl *)(&wqe->fr + 1);
+ sglp->op = FW_RI_DATA_DSGL;
+ sglp->r1 = 0;
+ sglp->nsge = cpu_to_be16(1);
+ sglp->addr0 = cpu_to_be64(mhp->mpl_addr);
+ sglp->len0 = cpu_to_be32(pbllen);
+
+ *len16 = DIV_ROUND_UP(sizeof(wqe->fr) + sizeof(*sglp), 16);
+ } else {
+ imdp = (struct fw_ri_immd *)(&wqe->fr + 1);
+ imdp->op = FW_RI_DATA_IMMD;
+ imdp->r1 = 0;
+ imdp->r2 = 0;
+ imdp->immdlen = cpu_to_be32(pbllen);
+ p = (__be64 *)(imdp + 1);
+ rem = pbllen;
+ for (i = 0; i < mhp->mpl_len; i++) {
+ *p = cpu_to_be64((u64)mhp->mpl[i]);
+ rem -= sizeof(*p);
+ if (++p == (__be64 *)&sq->queue[sq->size])
+ p = (__be64 *)sq->queue;
+ }
+ BUG_ON(rem < 0);
+ while (rem) {
+ *p = 0;
+ rem -= sizeof(*p);
+ if (++p == (__be64 *)&sq->queue[sq->size])
+ p = (__be64 *)sq->queue;
+ }
+ *len16 = DIV_ROUND_UP(sizeof(wqe->fr) + sizeof(*imdp)
+ + pbllen, 16);
+ }
+ return 0;
+}
+
static int build_fastreg(struct t4_sq *sq, union t4_wr *wqe,
struct ib_send_wr *send_wr, u8 *len16, u8 t5dev)
{
struct ib_fast_reg_wr *wr = fast_reg_wr(send_wr);
+
struct fw_ri_immd *imdp;
__be64 *p;
int i;
@@ -817,6 +883,14 @@ int c4iw_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
qhp->rhp->rdev.lldi.adapter_type) ?
1 : 0);
break;
+ case IB_WR_REG_MR:
+ fw_opcode = FW_RI_FR_NSMR_WR;
+ swsqe->opcode = FW_RI_FAST_REGISTER;
+ err = build_memreg(&qhp->wq.sq, wqe, reg_wr(wr), &len16,
+ is_t5(
+ qhp->rhp->rdev.lldi.adapter_type) ?
+ 1 : 0);
+ break;
case IB_WR_LOCAL_INV:
if (wr->send_flags & IB_SEND_FENCE)
fw_flags |= FW_RI_LOCAL_FENCE_FLAG;
--
1.8.4.3


2015-10-08 07:51:37

by Sagi Grimberg

[permalink] [raw]
Subject: [PATCH v3 05/26] RDMA/ocrdma: Support the new memory registration API

Support the new memory registration API by allocating a
private page list array in ocrdma_mr and populate it when
ocrdma_map_mr_sg is invoked. Also, support IB_WR_REG_MR
by duplicating IB_WR_FAST_REG_MR, but take the needed
information from different places:
- page_size, iova, length, access flags (ib_mr)
- page array (ocrdma_mr)
- key (ib_reg_wr)

The IB_WR_FAST_REG_MR handlers will be removed later when
all the ULPs will be converted.

Signed-off-by: Sagi Grimberg <[email protected]>
Acked-by: Christoph Hellwig <[email protected]>
---
drivers/infiniband/hw/ocrdma/ocrdma.h | 2 +
drivers/infiniband/hw/ocrdma/ocrdma_main.c | 1 +
drivers/infiniband/hw/ocrdma/ocrdma_verbs.c | 90 +++++++++++++++++++++++++++++
drivers/infiniband/hw/ocrdma/ocrdma_verbs.h | 3 +
4 files changed, 96 insertions(+)

diff --git a/drivers/infiniband/hw/ocrdma/ocrdma.h b/drivers/infiniband/hw/ocrdma/ocrdma.h
index b4091ab48db0..c2f3af5d5194 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma.h
+++ b/drivers/infiniband/hw/ocrdma/ocrdma.h
@@ -193,6 +193,8 @@ struct ocrdma_mr {
struct ib_mr ibmr;
struct ib_umem *umem;
struct ocrdma_hw_mr hwmr;
+ u64 *pages;
+ u32 npages;
};

struct ocrdma_stats {
diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_main.c b/drivers/infiniband/hw/ocrdma/ocrdma_main.c
index 87aa55df7c82..874beb4b07a1 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_main.c
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_main.c
@@ -182,6 +182,7 @@ static int ocrdma_register_device(struct ocrdma_dev *dev)
dev->ibdev.reg_user_mr = ocrdma_reg_user_mr;

dev->ibdev.alloc_mr = ocrdma_alloc_mr;
+ dev->ibdev.map_mr_sg = ocrdma_map_mr_sg;
dev->ibdev.alloc_fast_reg_page_list = ocrdma_alloc_frmr_page_list;
dev->ibdev.free_fast_reg_page_list = ocrdma_free_frmr_page_list;

diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
index eb09e224acb9..66cc15a78f63 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
@@ -1013,6 +1013,7 @@ int ocrdma_dereg_mr(struct ib_mr *ib_mr)

(void) ocrdma_mbx_dealloc_lkey(dev, mr->hwmr.fr_mr, mr->hwmr.lkey);

+ kfree(mr->pages);
ocrdma_free_mr_pbl_tbl(dev, &mr->hwmr);

/* it could be user registered memory. */
@@ -2177,6 +2178,61 @@ static int get_encoded_page_size(int pg_sz)
return i;
}

+static int ocrdma_build_reg(struct ocrdma_qp *qp,
+ struct ocrdma_hdr_wqe *hdr,
+ struct ib_reg_wr *wr)
+{
+ u64 fbo;
+ struct ocrdma_ewqe_fr *fast_reg = (struct ocrdma_ewqe_fr *)(hdr + 1);
+ struct ocrdma_mr *mr = get_ocrdma_mr(wr->mr);
+ struct ocrdma_pbl *pbl_tbl = mr->hwmr.pbl_table;
+ struct ocrdma_pbe *pbe;
+ u32 wqe_size = sizeof(*fast_reg) + sizeof(*hdr);
+ int num_pbes = 0, i;
+
+ wqe_size = roundup(wqe_size, OCRDMA_WQE_ALIGN_BYTES);
+
+ hdr->cw |= (OCRDMA_FR_MR << OCRDMA_WQE_OPCODE_SHIFT);
+ hdr->cw |= ((wqe_size / OCRDMA_WQE_STRIDE) << OCRDMA_WQE_SIZE_SHIFT);
+
+ if (wr->access & IB_ACCESS_LOCAL_WRITE)
+ hdr->rsvd_lkey_flags |= OCRDMA_LKEY_FLAG_LOCAL_WR;
+ if (wr->access & IB_ACCESS_REMOTE_WRITE)
+ hdr->rsvd_lkey_flags |= OCRDMA_LKEY_FLAG_REMOTE_WR;
+ if (wr->access & IB_ACCESS_REMOTE_READ)
+ hdr->rsvd_lkey_flags |= OCRDMA_LKEY_FLAG_REMOTE_RD;
+ hdr->lkey = wr->key;
+ hdr->total_len = mr->ibmr.length;
+
+ fbo = mr->ibmr.iova - mr->pages[0];
+
+ fast_reg->va_hi = upper_32_bits(mr->ibmr.iova);
+ fast_reg->va_lo = (u32) (mr->ibmr.iova & 0xffffffff);
+ fast_reg->fbo_hi = upper_32_bits(fbo);
+ fast_reg->fbo_lo = (u32) fbo & 0xffffffff;
+ fast_reg->num_sges = mr->npages;
+ fast_reg->size_sge = get_encoded_page_size(mr->ibmr.page_size);
+
+ pbe = pbl_tbl->va;
+ for (i = 0; i < mr->npages; i++) {
+ u64 buf_addr = mr->pages[i];
+
+ pbe->pa_lo = cpu_to_le32((u32) (buf_addr & PAGE_MASK));
+ pbe->pa_hi = cpu_to_le32((u32) upper_32_bits(buf_addr));
+ num_pbes += 1;
+ pbe++;
+
+ /* if the pbl is full storing the pbes,
+ * move to next pbl.
+ */
+ if (num_pbes == (mr->hwmr.pbl_size/sizeof(u64))) {
+ pbl_tbl++;
+ pbe = (struct ocrdma_pbe *)pbl_tbl->va;
+ }
+ }
+
+ return 0;
+}

static int ocrdma_build_fr(struct ocrdma_qp *qp, struct ocrdma_hdr_wqe *hdr,
struct ib_send_wr *send_wr)
@@ -2304,6 +2360,9 @@ int ocrdma_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
case IB_WR_FAST_REG_MR:
status = ocrdma_build_fr(qp, hdr, wr);
break;
+ case IB_WR_REG_MR:
+ status = ocrdma_build_reg(qp, hdr, reg_wr(wr));
+ break;
default:
status = -EINVAL;
break;
@@ -3059,6 +3118,12 @@ struct ib_mr *ocrdma_alloc_mr(struct ib_pd *ibpd,
if (!mr)
return ERR_PTR(-ENOMEM);

+ mr->pages = kcalloc(max_num_sg, sizeof(u64), GFP_KERNEL);
+ if (!mr->pages) {
+ status = -ENOMEM;
+ goto pl_err;
+ }
+
status = ocrdma_get_pbl_info(dev, mr, max_num_sg);
if (status)
goto pbl_err;
@@ -3082,6 +3147,8 @@ struct ib_mr *ocrdma_alloc_mr(struct ib_pd *ibpd,
mbx_err:
ocrdma_free_mr_pbl_tbl(dev, &mr->hwmr);
pbl_err:
+ kfree(mr->pages);
+pl_err:
kfree(mr);
return ERR_PTR(-ENOMEM);
}
@@ -3268,3 +3335,26 @@ pbl_err:
kfree(mr);
return ERR_PTR(status);
}
+
+static int ocrdma_set_page(struct ib_mr *ibmr, u64 addr)
+{
+ struct ocrdma_mr *mr = get_ocrdma_mr(ibmr);
+
+ if (unlikely(mr->npages == mr->hwmr.num_pbes))
+ return -ENOMEM;
+
+ mr->pages[mr->npages++] = addr;
+
+ return 0;
+}
+
+int ocrdma_map_mr_sg(struct ib_mr *ibmr,
+ struct scatterlist *sg,
+ unsigned int sg_nents)
+{
+ struct ocrdma_mr *mr = get_ocrdma_mr(ibmr);
+
+ mr->npages = 0;
+
+ return ib_sg_to_pages(ibmr, sg, sg_nents, ocrdma_set_page);
+}
diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h
index 308c16857a5d..4edf63f9c6c7 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h
@@ -125,6 +125,9 @@ struct ib_mr *ocrdma_reg_user_mr(struct ib_pd *, u64 start, u64 length,
struct ib_mr *ocrdma_alloc_mr(struct ib_pd *pd,
enum ib_mr_type mr_type,
u32 max_num_sg);
+int ocrdma_map_mr_sg(struct ib_mr *ibmr,
+ struct scatterlist *sg,
+ unsigned int sg_nents);
struct ib_fast_reg_page_list *ocrdma_alloc_frmr_page_list(struct ib_device
*ibdev,
int page_list_len);
--
1.8.4.3


2015-10-08 07:51:55

by Sagi Grimberg

[permalink] [raw]
Subject: [PATCH v3 11/26] iser-target: Port to new memory registration API

Remove fastreg page list allocation as the page vector
is now private to the provider. Instead of constructing
the page list and fast_req work request, call ib_map_mr_sg
and construct ib_reg_wr.

Signed-off-by: Sagi Grimberg <[email protected]>
Acked-by: Christoph Hellwig <[email protected]>
---
drivers/infiniband/ulp/isert/ib_isert.c | 130 +++++++-------------------------
drivers/infiniband/ulp/isert/ib_isert.h | 2 -
2 files changed, 28 insertions(+), 104 deletions(-)

diff --git a/drivers/infiniband/ulp/isert/ib_isert.c b/drivers/infiniband/ulp/isert/ib_isert.c
index 70d55d381515..7dbae56e576a 100644
--- a/drivers/infiniband/ulp/isert/ib_isert.c
+++ b/drivers/infiniband/ulp/isert/ib_isert.c
@@ -473,10 +473,8 @@ isert_conn_free_fastreg_pool(struct isert_conn *isert_conn)
list_for_each_entry_safe(fr_desc, tmp,
&isert_conn->fr_pool, list) {
list_del(&fr_desc->list);
- ib_free_fast_reg_page_list(fr_desc->data_frpl);
ib_dereg_mr(fr_desc->data_mr);
if (fr_desc->pi_ctx) {
- ib_free_fast_reg_page_list(fr_desc->pi_ctx->prot_frpl);
ib_dereg_mr(fr_desc->pi_ctx->prot_mr);
ib_dereg_mr(fr_desc->pi_ctx->sig_mr);
kfree(fr_desc->pi_ctx);
@@ -504,22 +502,13 @@ isert_create_pi_ctx(struct fast_reg_descriptor *desc,
return -ENOMEM;
}

- pi_ctx->prot_frpl = ib_alloc_fast_reg_page_list(device,
- ISCSI_ISER_SG_TABLESIZE);
- if (IS_ERR(pi_ctx->prot_frpl)) {
- isert_err("Failed to allocate prot frpl err=%ld\n",
- PTR_ERR(pi_ctx->prot_frpl));
- ret = PTR_ERR(pi_ctx->prot_frpl);
- goto err_pi_ctx;
- }
-
pi_ctx->prot_mr = ib_alloc_mr(pd, IB_MR_TYPE_MEM_REG,
ISCSI_ISER_SG_TABLESIZE);
if (IS_ERR(pi_ctx->prot_mr)) {
isert_err("Failed to allocate prot frmr err=%ld\n",
PTR_ERR(pi_ctx->prot_mr));
ret = PTR_ERR(pi_ctx->prot_mr);
- goto err_prot_frpl;
+ goto err_pi_ctx;
}
desc->ind |= ISERT_PROT_KEY_VALID;

@@ -539,8 +528,6 @@ isert_create_pi_ctx(struct fast_reg_descriptor *desc,

err_prot_mr:
ib_dereg_mr(pi_ctx->prot_mr);
-err_prot_frpl:
- ib_free_fast_reg_page_list(pi_ctx->prot_frpl);
err_pi_ctx:
kfree(pi_ctx);

@@ -551,34 +538,18 @@ static int
isert_create_fr_desc(struct ib_device *ib_device, struct ib_pd *pd,
struct fast_reg_descriptor *fr_desc)
{
- int ret;
-
- fr_desc->data_frpl = ib_alloc_fast_reg_page_list(ib_device,
- ISCSI_ISER_SG_TABLESIZE);
- if (IS_ERR(fr_desc->data_frpl)) {
- isert_err("Failed to allocate data frpl err=%ld\n",
- PTR_ERR(fr_desc->data_frpl));
- return PTR_ERR(fr_desc->data_frpl);
- }
-
fr_desc->data_mr = ib_alloc_mr(pd, IB_MR_TYPE_MEM_REG,
ISCSI_ISER_SG_TABLESIZE);
if (IS_ERR(fr_desc->data_mr)) {
isert_err("Failed to allocate data frmr err=%ld\n",
PTR_ERR(fr_desc->data_mr));
- ret = PTR_ERR(fr_desc->data_mr);
- goto err_data_frpl;
+ return PTR_ERR(fr_desc->data_mr);
}
fr_desc->ind |= ISERT_DATA_KEY_VALID;

isert_dbg("Created fr_desc %p\n", fr_desc);

return 0;
-
-err_data_frpl:
- ib_free_fast_reg_page_list(fr_desc->data_frpl);
-
- return ret;
}

static int
@@ -2535,45 +2506,6 @@ unmap_cmd:
return ret;
}

-static int
-isert_map_fr_pagelist(struct ib_device *ib_dev,
- struct scatterlist *sg_start, int sg_nents, u64 *fr_pl)
-{
- u64 start_addr, end_addr, page, chunk_start = 0;
- struct scatterlist *tmp_sg;
- int i = 0, new_chunk, last_ent, n_pages;
-
- n_pages = 0;
- new_chunk = 1;
- last_ent = sg_nents - 1;
- for_each_sg(sg_start, tmp_sg, sg_nents, i) {
- start_addr = ib_sg_dma_address(ib_dev, tmp_sg);
- if (new_chunk)
- chunk_start = start_addr;
- end_addr = start_addr + ib_sg_dma_len(ib_dev, tmp_sg);
-
- isert_dbg("SGL[%d] dma_addr: 0x%llx len: %u\n",
- i, (unsigned long long)tmp_sg->dma_address,
- tmp_sg->length);
-
- if ((end_addr & ~PAGE_MASK) && i < last_ent) {
- new_chunk = 0;
- continue;
- }
- new_chunk = 1;
-
- page = chunk_start & PAGE_MASK;
- do {
- fr_pl[n_pages++] = page;
- isert_dbg("Mapped page_list[%d] page_addr: 0x%llx\n",
- n_pages - 1, page);
- page += PAGE_SIZE;
- } while (page < end_addr);
- }
-
- return n_pages;
-}
-
static inline void
isert_inv_rkey(struct ib_send_wr *inv_wr, struct ib_mr *mr)
{
@@ -2599,11 +2531,9 @@ isert_fast_reg_mr(struct isert_conn *isert_conn,
struct isert_device *device = isert_conn->device;
struct ib_device *ib_dev = device->ib_device;
struct ib_mr *mr;
- struct ib_fast_reg_page_list *frpl;
- struct ib_fast_reg_wr fr_wr;
+ struct ib_reg_wr reg_wr;
struct ib_send_wr inv_wr, *bad_wr, *wr = NULL;
- int ret, pagelist_len;
- u32 page_off;
+ int ret, n;

if (mem->dma_nents == 1) {
sge->lkey = device->pd->local_dma_lkey;
@@ -2614,45 +2544,41 @@ isert_fast_reg_mr(struct isert_conn *isert_conn,
return 0;
}

- if (ind == ISERT_DATA_KEY_VALID) {
+ if (ind == ISERT_DATA_KEY_VALID)
/* Registering data buffer */
mr = fr_desc->data_mr;
- frpl = fr_desc->data_frpl;
- } else {
+ else
/* Registering protection buffer */
mr = fr_desc->pi_ctx->prot_mr;
- frpl = fr_desc->pi_ctx->prot_frpl;
- }
-
- page_off = mem->offset % PAGE_SIZE;
-
- isert_dbg("Use fr_desc %p sg_nents %d offset %u\n",
- fr_desc, mem->nents, mem->offset);
-
- pagelist_len = isert_map_fr_pagelist(ib_dev, mem->sg, mem->nents,
- &frpl->page_list[0]);

if (!(fr_desc->ind & ind)) {
isert_inv_rkey(&inv_wr, mr);
wr = &inv_wr;
}

- /* Prepare FASTREG WR */
- memset(&fr_wr, 0, sizeof(fr_wr));
- fr_wr.wr.wr_id = ISER_FASTREG_LI_WRID;
- fr_wr.wr.opcode = IB_WR_FAST_REG_MR;
- fr_wr.iova_start = frpl->page_list[0] + page_off;
- fr_wr.page_list = frpl;
- fr_wr.page_list_len = pagelist_len;
- fr_wr.page_shift = PAGE_SHIFT;
- fr_wr.length = mem->len;
- fr_wr.rkey = mr->rkey;
- fr_wr.access_flags = IB_ACCESS_LOCAL_WRITE;
+ n = ib_map_mr_sg(mr, mem->sg, mem->nents, PAGE_SIZE);
+ if (unlikely(n != mem->nents)) {
+ isert_err("failed to map mr sg (%d/%d)\n",
+ n, mem->nents);
+ return n < 0 ? n : -EINVAL;
+ }
+
+ isert_dbg("Use fr_desc %p sg_nents %d offset %u\n",
+ fr_desc, mem->nents, mem->offset);
+
+ reg_wr.wr.next = NULL;
+ reg_wr.wr.opcode = IB_WR_REG_MR;
+ reg_wr.wr.wr_id = ISER_FASTREG_LI_WRID;
+ reg_wr.wr.send_flags = 0;
+ reg_wr.wr.num_sge = 0;
+ reg_wr.mr = mr;
+ reg_wr.key = mr->lkey;
+ reg_wr.access = IB_ACCESS_LOCAL_WRITE;

if (!wr)
- wr = &fr_wr.wr;
+ wr = &reg_wr.wr;
else
- wr->next = &fr_wr.wr;
+ wr->next = &reg_wr.wr;

ret = ib_post_send(isert_conn->qp, wr, &bad_wr);
if (ret) {
@@ -2662,8 +2588,8 @@ isert_fast_reg_mr(struct isert_conn *isert_conn,
fr_desc->ind &= ~ind;

sge->lkey = mr->lkey;
- sge->addr = frpl->page_list[0] + page_off;
- sge->length = mem->len;
+ sge->addr = mr->iova;
+ sge->length = mr->length;

isert_dbg("sge: addr: 0x%llx length: %u lkey: %x\n",
sge->addr, sge->length, sge->lkey);
diff --git a/drivers/infiniband/ulp/isert/ib_isert.h b/drivers/infiniband/ulp/isert/ib_isert.h
index 360461819452..3d7fbc47c343 100644
--- a/drivers/infiniband/ulp/isert/ib_isert.h
+++ b/drivers/infiniband/ulp/isert/ib_isert.h
@@ -84,14 +84,12 @@ enum isert_indicator {

struct pi_context {
struct ib_mr *prot_mr;
- struct ib_fast_reg_page_list *prot_frpl;
struct ib_mr *sig_mr;
};

struct fast_reg_descriptor {
struct list_head list;
struct ib_mr *data_mr;
- struct ib_fast_reg_page_list *data_frpl;
u8 ind;
struct pi_context *pi_ctx;
};
--
1.8.4.3


2015-10-08 07:51:56

by Sagi Grimberg

[permalink] [raw]
Subject: [PATCH v3 13/26] svcrdma: Port to new memory registration API

Instead of maintaining a fastreg page list, keep an sg table
and convert an array of pages to a sg list. Then call ib_map_mr_sg
and construct ib_reg_wr.

Signed-off-by: Sagi Grimberg <[email protected]>
Acked-by: Christoph Hellwig <[email protected]>
Tested-by: Steve Wise <[email protected]>
Tested-by: Selvin Xavier <[email protected]>
---
include/linux/sunrpc/svc_rdma.h | 6 +--
net/sunrpc/xprtrdma/svc_rdma_recvfrom.c | 76 ++++++++++++++++++--------------
net/sunrpc/xprtrdma/svc_rdma_transport.c | 34 +++++---------
3 files changed, 55 insertions(+), 61 deletions(-)

diff --git a/include/linux/sunrpc/svc_rdma.h b/include/linux/sunrpc/svc_rdma.h
index 7ccc961f33e9..e8147d535588 100644
--- a/include/linux/sunrpc/svc_rdma.h
+++ b/include/linux/sunrpc/svc_rdma.h
@@ -105,11 +105,9 @@ struct svc_rdma_chunk_sge {
};
struct svc_rdma_fastreg_mr {
struct ib_mr *mr;
- void *kva;
- struct ib_fast_reg_page_list *page_list;
- int page_list_len;
+ struct scatterlist *sg;
+ unsigned int sg_nents;
unsigned long access_flags;
- unsigned long map_len;
enum dma_data_direction direction;
struct list_head frmr_list;
};
diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
index 7be42d0da19e..303f194970f9 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
@@ -220,12 +220,12 @@ int rdma_read_chunk_frmr(struct svcxprt_rdma *xprt,
{
struct ib_rdma_wr read_wr;
struct ib_send_wr inv_wr;
- struct ib_fast_reg_wr fastreg_wr;
+ struct ib_reg_wr reg_wr;
u8 key;
- int pages_needed = PAGE_ALIGN(*page_offset + rs_length) >> PAGE_SHIFT;
+ unsigned int nents = PAGE_ALIGN(*page_offset + rs_length) >> PAGE_SHIFT;
struct svc_rdma_op_ctxt *ctxt = svc_rdma_get_context(xprt);
struct svc_rdma_fastreg_mr *frmr = svc_rdma_get_frmr(xprt);
- int ret, read, pno;
+ int ret, read, pno, dma_nents, n;
u32 pg_off = *page_offset;
u32 pg_no = *page_no;

@@ -234,16 +234,14 @@ int rdma_read_chunk_frmr(struct svcxprt_rdma *xprt,

ctxt->direction = DMA_FROM_DEVICE;
ctxt->frmr = frmr;
- pages_needed = min_t(int, pages_needed, xprt->sc_frmr_pg_list_len);
- read = min_t(int, pages_needed << PAGE_SHIFT, rs_length);
+ nents = min_t(unsigned int, nents, xprt->sc_frmr_pg_list_len);
+ read = min_t(int, nents << PAGE_SHIFT, rs_length);

- frmr->kva = page_address(rqstp->rq_arg.pages[pg_no]);
frmr->direction = DMA_FROM_DEVICE;
frmr->access_flags = (IB_ACCESS_LOCAL_WRITE|IB_ACCESS_REMOTE_WRITE);
- frmr->map_len = pages_needed << PAGE_SHIFT;
- frmr->page_list_len = pages_needed;
+ frmr->sg_nents = nents;

- for (pno = 0; pno < pages_needed; pno++) {
+ for (pno = 0; pno < nents; pno++) {
int len = min_t(int, rs_length, PAGE_SIZE - pg_off);

head->arg.pages[pg_no] = rqstp->rq_arg.pages[pg_no];
@@ -251,17 +249,12 @@ int rdma_read_chunk_frmr(struct svcxprt_rdma *xprt,
head->arg.len += len;
if (!pg_off)
head->count++;
+
+ sg_set_page(&frmr->sg[pno], rqstp->rq_arg.pages[pg_no],
+ len, pg_off);
+
rqstp->rq_respages = &rqstp->rq_arg.pages[pg_no+1];
rqstp->rq_next_page = rqstp->rq_respages + 1;
- frmr->page_list->page_list[pno] =
- ib_dma_map_page(xprt->sc_cm_id->device,
- head->arg.pages[pg_no], 0,
- PAGE_SIZE, DMA_FROM_DEVICE);
- ret = ib_dma_mapping_error(xprt->sc_cm_id->device,
- frmr->page_list->page_list[pno]);
- if (ret)
- goto err;
- atomic_inc(&xprt->sc_dma_used);

/* adjust offset and wrap to next page if needed */
pg_off += len;
@@ -277,28 +270,42 @@ int rdma_read_chunk_frmr(struct svcxprt_rdma *xprt,
else
clear_bit(RDMACTXT_F_LAST_CTXT, &ctxt->flags);

+ dma_nents = ib_dma_map_sg(xprt->sc_cm_id->device,
+ frmr->sg, frmr->sg_nents,
+ frmr->direction);
+ if (!dma_nents) {
+ pr_err("svcrdma: failed to dma map sg %p\n",
+ frmr->sg);
+ return -ENOMEM;
+ }
+ atomic_inc(&xprt->sc_dma_used);
+
+ n = ib_map_mr_sg(frmr->mr, frmr->sg, frmr->sg_nents, PAGE_SIZE);
+ if (unlikely(n != frmr->sg_nents)) {
+ pr_err("svcrdma: failed to map mr %p (%d/%d elements)\n",
+ frmr->mr, n, frmr->sg_nents);
+ return n < 0 ? n : -EINVAL;
+ }
+
/* Bump the key */
key = (u8)(frmr->mr->lkey & 0x000000FF);
ib_update_fast_reg_key(frmr->mr, ++key);

- ctxt->sge[0].addr = (unsigned long)frmr->kva + *page_offset;
+ ctxt->sge[0].addr = frmr->mr->iova;
ctxt->sge[0].lkey = frmr->mr->lkey;
- ctxt->sge[0].length = read;
+ ctxt->sge[0].length = frmr->mr->length;
ctxt->count = 1;
ctxt->read_hdr = head;

- /* Prepare FASTREG WR */
- memset(&fastreg_wr, 0, sizeof(fastreg_wr));
- fastreg_wr.wr.opcode = IB_WR_FAST_REG_MR;
- fastreg_wr.wr.send_flags = IB_SEND_SIGNALED;
- fastreg_wr.iova_start = (unsigned long)frmr->kva;
- fastreg_wr.page_list = frmr->page_list;
- fastreg_wr.page_list_len = frmr->page_list_len;
- fastreg_wr.page_shift = PAGE_SHIFT;
- fastreg_wr.length = frmr->map_len;
- fastreg_wr.access_flags = frmr->access_flags;
- fastreg_wr.rkey = frmr->mr->lkey;
- fastreg_wr.wr.next = &read_wr.wr;
+ /* Prepare REG WR */
+ reg_wr.wr.opcode = IB_WR_REG_MR;
+ reg_wr.wr.wr_id = 0;
+ reg_wr.wr.send_flags = IB_SEND_SIGNALED;
+ reg_wr.wr.num_sge = 0;
+ reg_wr.mr = frmr->mr;
+ reg_wr.key = frmr->mr->lkey;
+ reg_wr.access = frmr->access_flags;
+ reg_wr.wr.next = &read_wr.wr;

/* Prepare RDMA_READ */
memset(&read_wr, 0, sizeof(read_wr));
@@ -324,7 +331,7 @@ int rdma_read_chunk_frmr(struct svcxprt_rdma *xprt,
ctxt->wr_op = read_wr.wr.opcode;

/* Post the chain */
- ret = svc_rdma_send(xprt, &fastreg_wr.wr);
+ ret = svc_rdma_send(xprt, &reg_wr.wr);
if (ret) {
pr_err("svcrdma: Error %d posting RDMA_READ\n", ret);
set_bit(XPT_CLOSE, &xprt->sc_xprt.xpt_flags);
@@ -338,7 +345,8 @@ int rdma_read_chunk_frmr(struct svcxprt_rdma *xprt,
atomic_inc(&rdma_stat_read);
return ret;
err:
- svc_rdma_unmap_dma(ctxt);
+ ib_dma_unmap_sg(xprt->sc_cm_id->device,
+ frmr->sg, frmr->sg_nents, frmr->direction);
svc_rdma_put_context(ctxt, 0);
svc_rdma_put_frmr(xprt, frmr);
return ret;
diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
index fcc3eb80c265..6ee928fc06cb 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
@@ -732,7 +732,7 @@ static struct svc_xprt *svc_rdma_create(struct svc_serv *serv,
static struct svc_rdma_fastreg_mr *rdma_alloc_frmr(struct svcxprt_rdma *xprt)
{
struct ib_mr *mr;
- struct ib_fast_reg_page_list *pl;
+ struct scatterlist *sg;
struct svc_rdma_fastreg_mr *frmr;
u32 num_sg;

@@ -745,13 +745,14 @@ static struct svc_rdma_fastreg_mr *rdma_alloc_frmr(struct svcxprt_rdma *xprt)
if (IS_ERR(mr))
goto err_free_frmr;

- pl = ib_alloc_fast_reg_page_list(xprt->sc_cm_id->device,
- num_sg);
- if (IS_ERR(pl))
+ sg = kcalloc(RPCSVC_MAXPAGES, sizeof(*sg), GFP_KERNEL);
+ if (!sg)
goto err_free_mr;

+ sg_init_table(sg, RPCSVC_MAXPAGES);
+
frmr->mr = mr;
- frmr->page_list = pl;
+ frmr->sg = sg;
INIT_LIST_HEAD(&frmr->frmr_list);
return frmr;

@@ -771,8 +772,8 @@ static void rdma_dealloc_frmr_q(struct svcxprt_rdma *xprt)
frmr = list_entry(xprt->sc_frmr_q.next,
struct svc_rdma_fastreg_mr, frmr_list);
list_del_init(&frmr->frmr_list);
+ kfree(frmr->sg);
ib_dereg_mr(frmr->mr);
- ib_free_fast_reg_page_list(frmr->page_list);
kfree(frmr);
}
}
@@ -786,8 +787,7 @@ struct svc_rdma_fastreg_mr *svc_rdma_get_frmr(struct svcxprt_rdma *rdma)
frmr = list_entry(rdma->sc_frmr_q.next,
struct svc_rdma_fastreg_mr, frmr_list);
list_del_init(&frmr->frmr_list);
- frmr->map_len = 0;
- frmr->page_list_len = 0;
+ frmr->sg_nents = 0;
}
spin_unlock_bh(&rdma->sc_frmr_q_lock);
if (frmr)
@@ -796,25 +796,13 @@ struct svc_rdma_fastreg_mr *svc_rdma_get_frmr(struct svcxprt_rdma *rdma)
return rdma_alloc_frmr(rdma);
}

-static void frmr_unmap_dma(struct svcxprt_rdma *xprt,
- struct svc_rdma_fastreg_mr *frmr)
-{
- int page_no;
- for (page_no = 0; page_no < frmr->page_list_len; page_no++) {
- dma_addr_t addr = frmr->page_list->page_list[page_no];
- if (ib_dma_mapping_error(frmr->mr->device, addr))
- continue;
- atomic_dec(&xprt->sc_dma_used);
- ib_dma_unmap_page(frmr->mr->device, addr, PAGE_SIZE,
- frmr->direction);
- }
-}
-
void svc_rdma_put_frmr(struct svcxprt_rdma *rdma,
struct svc_rdma_fastreg_mr *frmr)
{
if (frmr) {
- frmr_unmap_dma(rdma, frmr);
+ ib_dma_unmap_sg(rdma->sc_cm_id->device,
+ frmr->sg, frmr->sg_nents, frmr->direction);
+ atomic_dec(&rdma->sc_dma_used);
spin_lock_bh(&rdma->sc_frmr_q_lock);
WARN_ON_ONCE(!list_empty(&frmr->frmr_list));
list_add(&frmr->frmr_list, &rdma->sc_frmr_q);
--
1.8.4.3


2015-10-08 07:51:57

by Sagi Grimberg

[permalink] [raw]
Subject: [PATCH v3 17/26] IB/srp: Remove srp_finish_mapping

No callers left, remove it.

Signed-off-by: Sagi Grimberg <[email protected]>
Tested-by: Bart Van Assche <[email protected]>
---
drivers/infiniband/ulp/srp/ib_srp.c | 10 ----------
1 file changed, 10 deletions(-)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 6d399378928d..d4a5a9b86390 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -1368,16 +1368,6 @@ static int srp_map_finish_fr(struct srp_map_state *state,
return n;
}

-static int srp_finish_mapping(struct srp_map_state *state,
- struct srp_rdma_ch *ch)
-{
- struct srp_target_port *target = ch->target;
- struct srp_device *dev = target->srp_host->srp_dev;
-
- return dev->use_fast_reg ? srp_map_finish_fr(state, ch) :
- srp_map_finish_fmr(state, ch);
-}
-
static int srp_map_sg_entry(struct srp_map_state *state,
struct srp_rdma_ch *ch,
struct scatterlist *sg, int sg_index)
--
1.8.4.3


2015-10-08 07:52:02

by Sagi Grimberg

[permalink] [raw]
Subject: [PATCH v3 25/26] RDMA/nes: Remove old FRWR API

No ULP uses it anymore, go ahead and remove it.

Signed-off-by: Sagi Grimberg <[email protected]>
Acked-by: Christoph Hellwig <[email protected]>
---
drivers/infiniband/hw/nes/nes_hw.h | 6 --
drivers/infiniband/hw/nes/nes_verbs.c | 162 +---------------------------------
2 files changed, 1 insertion(+), 167 deletions(-)

diff --git a/drivers/infiniband/hw/nes/nes_hw.h b/drivers/infiniband/hw/nes/nes_hw.h
index d748e4b31b8d..c9080208aad2 100644
--- a/drivers/infiniband/hw/nes/nes_hw.h
+++ b/drivers/infiniband/hw/nes/nes_hw.h
@@ -1200,12 +1200,6 @@ struct nes_fast_mr_wqe_pbl {
dma_addr_t paddr;
};

-struct nes_ib_fast_reg_page_list {
- struct ib_fast_reg_page_list ibfrpl;
- struct nes_fast_mr_wqe_pbl nes_wqe_pbl;
- u64 pbl;
-};
-
struct nes_listener {
struct work_struct work;
struct workqueue_struct *wq;
diff --git a/drivers/infiniband/hw/nes/nes_verbs.c b/drivers/infiniband/hw/nes/nes_verbs.c
index f5b0cc972403..c58091a6635f 100644
--- a/drivers/infiniband/hw/nes/nes_verbs.c
+++ b/drivers/infiniband/hw/nes/nes_verbs.c
@@ -486,76 +486,6 @@ static int nes_map_mr_sg(struct ib_mr *ibmr,
return ib_sg_to_pages(ibmr, sg, sg_nents, nes_set_page);
}

-/*
- * nes_alloc_fast_reg_page_list
- */
-static struct ib_fast_reg_page_list *nes_alloc_fast_reg_page_list(
- struct ib_device *ibdev,
- int page_list_len)
-{
- struct nes_vnic *nesvnic = to_nesvnic(ibdev);
- struct nes_device *nesdev = nesvnic->nesdev;
- struct ib_fast_reg_page_list *pifrpl;
- struct nes_ib_fast_reg_page_list *pnesfrpl;
-
- if (page_list_len > (NES_4K_PBL_CHUNK_SIZE / sizeof(u64)))
- return ERR_PTR(-E2BIG);
- /*
- * Allocate the ib_fast_reg_page_list structure, the
- * nes_fast_bpl structure, and the PLB table.
- */
- pnesfrpl = kmalloc(sizeof(struct nes_ib_fast_reg_page_list) +
- page_list_len * sizeof(u64), GFP_KERNEL);
-
- if (!pnesfrpl)
- return ERR_PTR(-ENOMEM);
-
- pifrpl = &pnesfrpl->ibfrpl;
- pifrpl->page_list = &pnesfrpl->pbl;
- pifrpl->max_page_list_len = page_list_len;
- /*
- * Allocate the WQE PBL
- */
- pnesfrpl->nes_wqe_pbl.kva = pci_alloc_consistent(nesdev->pcidev,
- page_list_len * sizeof(u64),
- &pnesfrpl->nes_wqe_pbl.paddr);
-
- if (!pnesfrpl->nes_wqe_pbl.kva) {
- kfree(pnesfrpl);
- return ERR_PTR(-ENOMEM);
- }
- nes_debug(NES_DBG_MR, "nes_alloc_fast_reg_pbl: nes_frpl = %p, "
- "ibfrpl = %p, ibfrpl.page_list = %p, pbl.kva = %p, "
- "pbl.paddr = %llx\n", pnesfrpl, &pnesfrpl->ibfrpl,
- pnesfrpl->ibfrpl.page_list, pnesfrpl->nes_wqe_pbl.kva,
- (unsigned long long) pnesfrpl->nes_wqe_pbl.paddr);
-
- return pifrpl;
-}
-
-/*
- * nes_free_fast_reg_page_list
- */
-static void nes_free_fast_reg_page_list(struct ib_fast_reg_page_list *pifrpl)
-{
- struct nes_vnic *nesvnic = to_nesvnic(pifrpl->device);
- struct nes_device *nesdev = nesvnic->nesdev;
- struct nes_ib_fast_reg_page_list *pnesfrpl;
-
- pnesfrpl = container_of(pifrpl, struct nes_ib_fast_reg_page_list, ibfrpl);
- /*
- * Free the WQE PBL.
- */
- pci_free_consistent(nesdev->pcidev,
- pifrpl->max_page_list_len * sizeof(u64),
- pnesfrpl->nes_wqe_pbl.kva,
- pnesfrpl->nes_wqe_pbl.paddr);
- /*
- * Free the PBL structure
- */
- kfree(pnesfrpl);
-}
-
/**
* nes_query_device
*/
@@ -3470,94 +3400,6 @@ static int nes_post_send(struct ib_qp *ibqp, struct ib_send_wr *ib_wr,
NES_IWARP_SQ_LOCINV_WQE_INV_STAG_IDX,
ib_wr->ex.invalidate_rkey);
break;
- case IB_WR_FAST_REG_MR:
- {
- int i;
- struct ib_fast_reg_wr *fwr = fast_reg_wr(ib_wr);
- int flags = fwr->access_flags;
- struct nes_ib_fast_reg_page_list *pnesfrpl =
- container_of(fwr->page_list,
- struct nes_ib_fast_reg_page_list,
- ibfrpl);
- u64 *src_page_list = pnesfrpl->ibfrpl.page_list;
- u64 *dst_page_list = pnesfrpl->nes_wqe_pbl.kva;
-
- if (fwr->page_list_len >
- (NES_4K_PBL_CHUNK_SIZE / sizeof(u64))) {
- nes_debug(NES_DBG_IW_TX, "SQ_FMR: bad page_list_len\n");
- err = -EINVAL;
- break;
- }
- wqe_misc = NES_IWARP_SQ_OP_FAST_REG;
- set_wqe_64bit_value(wqe->wqe_words,
- NES_IWARP_SQ_FMR_WQE_VA_FBO_LOW_IDX,
- fwr->iova_start);
- set_wqe_32bit_value(wqe->wqe_words,
- NES_IWARP_SQ_FMR_WQE_LENGTH_LOW_IDX,
- fwr->length);
- set_wqe_32bit_value(wqe->wqe_words,
- NES_IWARP_SQ_FMR_WQE_LENGTH_HIGH_IDX, 0);
- set_wqe_32bit_value(wqe->wqe_words,
- NES_IWARP_SQ_FMR_WQE_MR_STAG_IDX,
- fwr->rkey);
- /* Set page size: */
- if (fwr->page_shift == 12) {
- wqe_misc |= NES_IWARP_SQ_FMR_WQE_PAGE_SIZE_4K;
- } else if (fwr->page_shift == 21) {
- wqe_misc |= NES_IWARP_SQ_FMR_WQE_PAGE_SIZE_2M;
- } else {
- nes_debug(NES_DBG_IW_TX, "Invalid page shift,"
- " ib_wr=%u, max=1\n", ib_wr->num_sge);
- err = -EINVAL;
- break;
- }
- /* Set access_flags */
- wqe_misc |= NES_IWARP_SQ_FMR_WQE_RIGHTS_ENABLE_LOCAL_READ;
- if (flags & IB_ACCESS_LOCAL_WRITE)
- wqe_misc |= NES_IWARP_SQ_FMR_WQE_RIGHTS_ENABLE_LOCAL_WRITE;
-
- if (flags & IB_ACCESS_REMOTE_WRITE)
- wqe_misc |= NES_IWARP_SQ_FMR_WQE_RIGHTS_ENABLE_REMOTE_WRITE;
-
- if (flags & IB_ACCESS_REMOTE_READ)
- wqe_misc |= NES_IWARP_SQ_FMR_WQE_RIGHTS_ENABLE_REMOTE_READ;
-
- if (flags & IB_ACCESS_MW_BIND)
- wqe_misc |= NES_IWARP_SQ_FMR_WQE_RIGHTS_ENABLE_WINDOW_BIND;
-
- /* Fill in PBL info: */
- if (fwr->page_list_len >
- pnesfrpl->ibfrpl.max_page_list_len) {
- nes_debug(NES_DBG_IW_TX, "Invalid page list length,"
- " ib_wr=%p, value=%u, max=%u\n",
- ib_wr, fwr->page_list_len,
- pnesfrpl->ibfrpl.max_page_list_len);
- err = -EINVAL;
- break;
- }
-
- set_wqe_64bit_value(wqe->wqe_words,
- NES_IWARP_SQ_FMR_WQE_PBL_ADDR_LOW_IDX,
- pnesfrpl->nes_wqe_pbl.paddr);
-
- set_wqe_32bit_value(wqe->wqe_words,
- NES_IWARP_SQ_FMR_WQE_PBL_LENGTH_IDX,
- fwr->page_list_len * 8);
-
- for (i = 0; i < fwr->page_list_len; i++)
- dst_page_list[i] = cpu_to_le64(src_page_list[i]);
-
- nes_debug(NES_DBG_IW_TX, "SQ_FMR: iova_start: %llx, "
- "length: %d, rkey: %0x, pgl_paddr: %llx, "
- "page_list_len: %u, wqe_misc: %x\n",
- (unsigned long long) fwr->iova_start,
- fwr->length,
- fwr->rkey,
- (unsigned long long) pnesfrpl->nes_wqe_pbl.paddr,
- fwr->page_list_len,
- wqe_misc);
- break;
- }
case IB_WR_REG_MR:
{
struct nes_mr *mr = to_nesmr(reg_wr(ib_wr)->mr);
@@ -3866,7 +3708,7 @@ static int nes_poll_cq(struct ib_cq *ibcq, int num_entries, struct ib_wc *entry)
entry->opcode = IB_WC_LOCAL_INV;
break;
case NES_IWARP_SQ_OP_FAST_REG:
- entry->opcode = IB_WC_FAST_REG_MR;
+ entry->opcode = IB_WC_REG_MR;
break;
}

@@ -4055,8 +3897,6 @@ struct nes_ib_device *nes_init_ofa_device(struct net_device *netdev)

nesibdev->ibdev.alloc_mr = nes_alloc_mr;
nesibdev->ibdev.map_mr_sg = nes_map_mr_sg;
- nesibdev->ibdev.alloc_fast_reg_page_list = nes_alloc_fast_reg_page_list;
- nesibdev->ibdev.free_fast_reg_page_list = nes_free_fast_reg_page_list;

nesibdev->ibdev.attach_mcast = nes_multicast_attach;
nesibdev->ibdev.detach_mcast = nes_multicast_detach;
--
1.8.4.3


2015-10-08 07:51:56

by Sagi Grimberg

[permalink] [raw]
Subject: [PATCH v3 12/26] xprtrdma: Port to new memory registration API

Instead of maintaining a fastreg page list, keep an sg table
and convert an array of pages to a sg list. Then call ib_map_mr_sg
and construct ib_reg_wr.

Note that the next step would be to have NFS work with sg lists
as it maps well to sk_frags (see comment from hch
http://marc.info/?l=linux-rdma&m=143677002622296&w=2).

Signed-off-by: Sagi Grimberg <[email protected]>
Acked-by: Christoph Hellwig <[email protected]>
Tested-by: Steve Wise <[email protected]>
Tested-by: Selvin Xavier <[email protected]>
---
net/sunrpc/xprtrdma/frwr_ops.c | 112 +++++++++++++++++++++++-----------------
net/sunrpc/xprtrdma/xprt_rdma.h | 3 +-
2 files changed, 67 insertions(+), 48 deletions(-)

diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c
index 0d2f46f600b6..3c89f8052786 100644
--- a/net/sunrpc/xprtrdma/frwr_ops.c
+++ b/net/sunrpc/xprtrdma/frwr_ops.c
@@ -151,9 +151,13 @@ __frwr_init(struct rpcrdma_mw *r, struct ib_pd *pd, struct ib_device *device,
f->fr_mr = ib_alloc_mr(pd, IB_MR_TYPE_MEM_REG, depth);
if (IS_ERR(f->fr_mr))
goto out_mr_err;
- f->fr_pgl = ib_alloc_fast_reg_page_list(device, depth);
- if (IS_ERR(f->fr_pgl))
+
+ f->sg = kcalloc(depth, sizeof(*f->sg), GFP_KERNEL);
+ if (!f->sg)
goto out_list_err;
+
+ sg_init_table(f->sg, depth);
+
return 0;

out_mr_err:
@@ -163,7 +167,7 @@ out_mr_err:
return rc;

out_list_err:
- rc = PTR_ERR(f->fr_pgl);
+ rc = -ENOMEM;
dprintk("RPC: %s: ib_alloc_fast_reg_page_list status %i\n",
__func__, rc);
ib_dereg_mr(f->fr_mr);
@@ -179,7 +183,7 @@ __frwr_release(struct rpcrdma_mw *r)
if (rc)
dprintk("RPC: %s: ib_dereg_mr status %i\n",
__func__, rc);
- ib_free_fast_reg_page_list(r->r.frmr.fr_pgl);
+ kfree(r->r.frmr.sg);
}

static int
@@ -312,14 +316,11 @@ frwr_op_map(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg,
struct rpcrdma_mw *mw;
struct rpcrdma_frmr *frmr;
struct ib_mr *mr;
- struct ib_fast_reg_wr fastreg_wr;
+ struct ib_reg_wr reg_wr;
struct ib_send_wr *bad_wr;
+ unsigned int dma_nents;
u8 key;
- int len, pageoff;
- int i, rc;
- int seg_len;
- u64 pa;
- int page_no;
+ int i, rc, len, n;

mw = seg1->rl_mw;
seg1->rl_mw = NULL;
@@ -332,64 +333,81 @@ frwr_op_map(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg,
} while (mw->r.frmr.fr_state != FRMR_IS_INVALID);
frmr = &mw->r.frmr;
frmr->fr_state = FRMR_IS_VALID;
+ mr = frmr->fr_mr;

- pageoff = offset_in_page(seg1->mr_offset);
- seg1->mr_offset -= pageoff; /* start of page */
- seg1->mr_len += pageoff;
- len = -pageoff;
if (nsegs > ia->ri_max_frmr_depth)
nsegs = ia->ri_max_frmr_depth;

- for (page_no = i = 0; i < nsegs;) {
- rpcrdma_map_one(device, seg, direction);
- pa = seg->mr_dma;
- for (seg_len = seg->mr_len; seg_len > 0; seg_len -= PAGE_SIZE) {
- frmr->fr_pgl->page_list[page_no++] = pa;
- pa += PAGE_SIZE;
- }
+ for (len = 0, i = 0; i < nsegs;) {
+ if (seg->mr_page)
+ sg_set_page(&frmr->sg[i],
+ seg->mr_page,
+ seg->mr_len,
+ offset_in_page(seg->mr_offset));
+ else
+ sg_set_buf(&frmr->sg[i], seg->mr_offset,
+ seg->mr_len);
+
len += seg->mr_len;
++seg;
++i;
+
/* Check for holes */
if ((i < nsegs && offset_in_page(seg->mr_offset)) ||
offset_in_page((seg-1)->mr_offset + (seg-1)->mr_len))
break;
}
+ frmr->sg_nents = i;
+
+ dma_nents = ib_dma_map_sg(device, frmr->sg, frmr->sg_nents, direction);
+ if (!dma_nents) {
+ pr_err("RPC: %s: failed to dma map sg %p sg_nents %d\n",
+ __func__, frmr->sg, frmr->sg_nents);
+ return -ENOMEM;
+ }
+
+ n = ib_map_mr_sg(mr, frmr->sg, frmr->sg_nents, PAGE_SIZE);
+ if (unlikely(n != frmr->sg_nents)) {
+ pr_err("RPC: %s: failed to map mr %p (%d/%d)\n",
+ __func__, frmr->fr_mr, n, frmr->sg_nents);
+ rc = n < 0 ? n : -EINVAL;
+ goto out_senderr;
+ }
+
dprintk("RPC: %s: Using frmr %p to map %d segments (%d bytes)\n",
- __func__, mw, i, len);
-
- memset(&fastreg_wr, 0, sizeof(fastreg_wr));
- fastreg_wr.wr.wr_id = (unsigned long)(void *)mw;
- fastreg_wr.wr.opcode = IB_WR_FAST_REG_MR;
- fastreg_wr.iova_start = seg1->mr_dma + pageoff;
- fastreg_wr.page_list = frmr->fr_pgl;
- fastreg_wr.page_shift = PAGE_SHIFT;
- fastreg_wr.page_list_len = page_no;
- fastreg_wr.length = len;
- fastreg_wr.access_flags = writing ?
- IB_ACCESS_REMOTE_WRITE | IB_ACCESS_LOCAL_WRITE :
- IB_ACCESS_REMOTE_READ;
- mr = frmr->fr_mr;
+ __func__, mw, frmr->sg_nents, mr->length);
+
key = (u8)(mr->rkey & 0x000000FF);
ib_update_fast_reg_key(mr, ++key);
- fastreg_wr.rkey = mr->rkey;
+
+ reg_wr.wr.next = NULL;
+ reg_wr.wr.opcode = IB_WR_REG_MR;
+ reg_wr.wr.wr_id = (uintptr_t)mw;
+ reg_wr.wr.num_sge = 0;
+ reg_wr.wr.send_flags = 0;
+ reg_wr.mr = mr;
+ reg_wr.key = mr->rkey;
+ reg_wr.access = writing ?
+ IB_ACCESS_REMOTE_WRITE | IB_ACCESS_LOCAL_WRITE :
+ IB_ACCESS_REMOTE_READ;

DECR_CQCOUNT(&r_xprt->rx_ep);
- rc = ib_post_send(ia->ri_id->qp, &fastreg_wr.wr, &bad_wr);
+ rc = ib_post_send(ia->ri_id->qp, &reg_wr.wr, &bad_wr);
if (rc)
goto out_senderr;

+ seg1->mr_dir = direction;
seg1->rl_mw = mw;
seg1->mr_rkey = mr->rkey;
- seg1->mr_base = seg1->mr_dma + pageoff;
- seg1->mr_nsegs = i;
- seg1->mr_len = len;
- return i;
+ seg1->mr_base = mr->iova;
+ seg1->mr_nsegs = frmr->sg_nents;
+ seg1->mr_len = mr->length;
+
+ return frmr->sg_nents;

out_senderr:
dprintk("RPC: %s: ib_post_send status %i\n", __func__, rc);
- while (i--)
- rpcrdma_unmap_one(device, --seg);
+ ib_dma_unmap_sg(device, frmr->sg, frmr->sg_nents, direction);
__frwr_queue_recovery(mw);
return rc;
}
@@ -403,22 +421,22 @@ frwr_op_unmap(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg)
struct rpcrdma_mr_seg *seg1 = seg;
struct rpcrdma_ia *ia = &r_xprt->rx_ia;
struct rpcrdma_mw *mw = seg1->rl_mw;
+ struct rpcrdma_frmr *frmr = &mw->r.frmr;
struct ib_send_wr invalidate_wr, *bad_wr;
int rc, nsegs = seg->mr_nsegs;

dprintk("RPC: %s: FRMR %p\n", __func__, mw);

seg1->rl_mw = NULL;
- mw->r.frmr.fr_state = FRMR_IS_INVALID;
+ frmr->fr_state = FRMR_IS_INVALID;

memset(&invalidate_wr, 0, sizeof(invalidate_wr));
invalidate_wr.wr_id = (unsigned long)(void *)mw;
invalidate_wr.opcode = IB_WR_LOCAL_INV;
- invalidate_wr.ex.invalidate_rkey = mw->r.frmr.fr_mr->rkey;
+ invalidate_wr.ex.invalidate_rkey = frmr->fr_mr->rkey;
DECR_CQCOUNT(&r_xprt->rx_ep);

- while (seg1->mr_nsegs--)
- rpcrdma_unmap_one(ia->ri_device, seg++);
+ ib_dma_unmap_sg(ia->ri_device, frmr->sg, frmr->sg_nents, seg1->mr_dir);
read_lock(&ia->ri_qplock);
rc = ib_post_send(ia->ri_id->qp, &invalidate_wr, &bad_wr);
read_unlock(&ia->ri_qplock);
diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index c09414e6f91b..426a9107c0ff 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -193,7 +193,8 @@ enum rpcrdma_frmr_state {
};

struct rpcrdma_frmr {
- struct ib_fast_reg_page_list *fr_pgl;
+ struct scatterlist *sg;
+ unsigned int sg_nents;
struct ib_mr *fr_mr;
enum rpcrdma_frmr_state fr_state;
struct work_struct fr_work;
--
1.8.4.3


2015-10-08 07:52:02

by Sagi Grimberg

[permalink] [raw]
Subject: [PATCH v3 26/26] IB/core: Remove old fast registration API

No callers and no providers left, go ahead and remove it.

Signed-off-by: Sagi Grimberg <[email protected]>
Acked-by: Christoph Hellwig <[email protected]>
---
drivers/infiniband/core/verbs.c | 25 -------------------
include/rdma/ib_verbs.h | 54 -----------------------------------------
2 files changed, 79 deletions(-)

diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index e18a8bce8130..ddfdd02ac35d 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -1253,31 +1253,6 @@ struct ib_mr *ib_alloc_mr(struct ib_pd *pd,
}
EXPORT_SYMBOL(ib_alloc_mr);

-struct ib_fast_reg_page_list *ib_alloc_fast_reg_page_list(struct ib_device *device,
- int max_page_list_len)
-{
- struct ib_fast_reg_page_list *page_list;
-
- if (!device->alloc_fast_reg_page_list)
- return ERR_PTR(-ENOSYS);
-
- page_list = device->alloc_fast_reg_page_list(device, max_page_list_len);
-
- if (!IS_ERR(page_list)) {
- page_list->device = device;
- page_list->max_page_list_len = max_page_list_len;
- }
-
- return page_list;
-}
-EXPORT_SYMBOL(ib_alloc_fast_reg_page_list);
-
-void ib_free_fast_reg_page_list(struct ib_fast_reg_page_list *page_list)
-{
- page_list->device->free_fast_reg_page_list(page_list);
-}
-EXPORT_SYMBOL(ib_free_fast_reg_page_list);
-
/* Memory windows */

struct ib_mw *ib_alloc_mw(struct ib_pd *pd, enum ib_mw_type type)
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 24bb87f16afb..f5d706ea2e19 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -738,7 +738,6 @@ enum ib_wc_opcode {
IB_WC_BIND_MW,
IB_WC_LSO,
IB_WC_LOCAL_INV,
- IB_WC_FAST_REG_MR,
IB_WC_REG_MR,
IB_WC_MASKED_COMP_SWAP,
IB_WC_MASKED_FETCH_ADD,
@@ -1030,7 +1029,6 @@ enum ib_wr_opcode {
IB_WR_SEND_WITH_INV,
IB_WR_RDMA_READ_WITH_INV,
IB_WR_LOCAL_INV,
- IB_WR_FAST_REG_MR,
IB_WR_REG_MR,
IB_WR_MASKED_ATOMIC_CMP_AND_SWP,
IB_WR_MASKED_ATOMIC_FETCH_AND_ADD,
@@ -1069,12 +1067,6 @@ struct ib_sge {
u32 lkey;
};

-struct ib_fast_reg_page_list {
- struct ib_device *device;
- u64 *page_list;
- unsigned int max_page_list_len;
-};
-
/**
* struct ib_mw_bind_info - Parameters for a memory window bind operation.
* @mr: A memory region to bind the memory window to.
@@ -1149,22 +1141,6 @@ static inline struct ib_ud_wr *ud_wr(struct ib_send_wr *wr)
return container_of(wr, struct ib_ud_wr, wr);
}

-struct ib_fast_reg_wr {
- struct ib_send_wr wr;
- u64 iova_start;
- struct ib_fast_reg_page_list *page_list;
- unsigned int page_shift;
- unsigned int page_list_len;
- u32 length;
- int access_flags;
- u32 rkey;
-};
-
-static inline struct ib_fast_reg_wr *fast_reg_wr(struct ib_send_wr *wr)
-{
- return container_of(wr, struct ib_fast_reg_wr, wr);
-}
-
struct ib_reg_wr {
struct ib_send_wr wr;
struct ib_mr *mr;
@@ -1779,9 +1755,6 @@ struct ib_device {
int (*map_mr_sg)(struct ib_mr *mr,
struct scatterlist *sg,
unsigned int sg_nents);
- struct ib_fast_reg_page_list * (*alloc_fast_reg_page_list)(struct ib_device *device,
- int page_list_len);
- void (*free_fast_reg_page_list)(struct ib_fast_reg_page_list *page_list);
int (*rereg_phys_mr)(struct ib_mr *mr,
int mr_rereg_mask,
struct ib_pd *pd,
@@ -2890,33 +2863,6 @@ struct ib_mr *ib_alloc_mr(struct ib_pd *pd,
u32 max_num_sg);

/**
- * ib_alloc_fast_reg_page_list - Allocates a page list array
- * @device - ib device pointer.
- * @page_list_len - size of the page list array to be allocated.
- *
- * This allocates and returns a struct ib_fast_reg_page_list * and a
- * page_list array that is at least page_list_len in size. The actual
- * size is returned in max_page_list_len. The caller is responsible
- * for initializing the contents of the page_list array before posting
- * a send work request with the IB_WC_FAST_REG_MR opcode.
- *
- * The page_list array entries must be translated using one of the
- * ib_dma_*() functions just like the addresses passed to
- * ib_map_phys_fmr(). Once the ib_post_send() is issued, the struct
- * ib_fast_reg_page_list must not be modified by the caller until the
- * IB_WC_FAST_REG_MR work request completes.
- */
-struct ib_fast_reg_page_list *ib_alloc_fast_reg_page_list(
- struct ib_device *device, int page_list_len);
-
-/**
- * ib_free_fast_reg_page_list - Deallocates a previously allocated
- * page list array.
- * @page_list - struct ib_fast_reg_page_list pointer to be deallocated.
- */
-void ib_free_fast_reg_page_list(struct ib_fast_reg_page_list *page_list);
-
-/**
* ib_update_fast_reg_key - updates the key portion of the fast_reg MR
* R_Key and L_Key.
* @mr - struct ib_mr pointer to be updated.
--
1.8.4.3


2015-10-08 07:52:02

by Sagi Grimberg

[permalink] [raw]
Subject: [PATCH v3 18/26] IB/srp: Dont allocate a page vector when using fast_reg

The new fast registration API does not reuqire a page vector
so we can't avoid allocating it.

Signed-off-by: Sagi Grimberg <[email protected]>
Tested-by: Bart Van Assche <[email protected]>
---
drivers/infiniband/ulp/srp/ib_srp.c | 20 +++++++++++---------
1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index d4a5a9b86390..d00f819c09b0 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -840,11 +840,12 @@ static void srp_free_req_data(struct srp_target_port *target,

for (i = 0; i < target->req_ring_size; ++i) {
req = &ch->req_ring[i];
- if (dev->use_fast_reg)
+ if (dev->use_fast_reg) {
kfree(req->fr_list);
- else
+ } else {
kfree(req->fmr_list);
- kfree(req->map_page);
+ kfree(req->map_page);
+ }
if (req->indirect_dma_addr) {
ib_dma_unmap_single(ibdev, req->indirect_dma_addr,
target->indirect_size,
@@ -878,14 +879,15 @@ static int srp_alloc_req_data(struct srp_rdma_ch *ch)
GFP_KERNEL);
if (!mr_list)
goto out;
- if (srp_dev->use_fast_reg)
+ if (srp_dev->use_fast_reg) {
req->fr_list = mr_list;
- else
+ } else {
req->fmr_list = mr_list;
- req->map_page = kmalloc(srp_dev->max_pages_per_mr *
- sizeof(void *), GFP_KERNEL);
- if (!req->map_page)
- goto out;
+ req->map_page = kmalloc(srp_dev->max_pages_per_mr *
+ sizeof(void *), GFP_KERNEL);
+ if (!req->map_page)
+ goto out;
+ }
req->indirect_desc = kmalloc(target->indirect_size, GFP_KERNEL);
if (!req->indirect_desc)
goto out;
--
1.8.4.3


2015-10-08 07:52:02

by Sagi Grimberg

[permalink] [raw]
Subject: [PATCH v3 08/26] IB/qib: Support the new memory registration API

Support the new memory registration API by allocating a
private page list array in qib_mr and populate it when
qib_map_mr_sg is invoked. Also, support IB_WR_REG_MR
by duplicating qib_fastreg_mr just take the needed information
from different places:
- page_size, iova, length (ib_mr)
- page array (qib_mr)
- key, access flags (ib_reg_wr)

The IB_WR_FAST_REG_MR handlers will be removed later when
all the ULPs will be converted.

Signed-off-by: Sagi Grimberg <[email protected]>
Acked-by: Christoph Hellwig <[email protected]>
---
drivers/infiniband/hw/qib/qib_keys.c | 56 +++++++++++++++++++++++++++++++++++
drivers/infiniband/hw/qib/qib_mr.c | 32 ++++++++++++++++++++
drivers/infiniband/hw/qib/qib_verbs.c | 9 +++++-
drivers/infiniband/hw/qib/qib_verbs.h | 8 +++++
4 files changed, 104 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/hw/qib/qib_keys.c b/drivers/infiniband/hw/qib/qib_keys.c
index eaf139a33b2e..95b8b9110fc6 100644
--- a/drivers/infiniband/hw/qib/qib_keys.c
+++ b/drivers/infiniband/hw/qib/qib_keys.c
@@ -390,3 +390,59 @@ bail:
spin_unlock_irqrestore(&rkt->lock, flags);
return ret;
}
+
+/*
+ * Initialize the memory region specified by the work request.
+ */
+int qib_reg_mr(struct qib_qp *qp, struct ib_reg_wr *wr)
+{
+ struct qib_lkey_table *rkt = &to_idev(qp->ibqp.device)->lk_table;
+ struct qib_pd *pd = to_ipd(qp->ibqp.pd);
+ struct qib_mr *mr = to_imr(wr->mr);
+ struct qib_mregion *mrg;
+ u32 key = wr->key;
+ unsigned i, n, m;
+ int ret = -EINVAL;
+ unsigned long flags;
+ u64 *page_list;
+ size_t ps;
+
+ spin_lock_irqsave(&rkt->lock, flags);
+ if (pd->user || key == 0)
+ goto bail;
+
+ mrg = rcu_dereference_protected(
+ rkt->table[(key >> (32 - ib_qib_lkey_table_size))],
+ lockdep_is_held(&rkt->lock));
+ if (unlikely(mrg == NULL || qp->ibqp.pd != mrg->pd))
+ goto bail;
+
+ if (mr->npages > mrg->max_segs)
+ goto bail;
+
+ ps = mr->ibmr.page_size;
+ if (mr->ibmr.length > ps * mr->npages)
+ goto bail;
+
+ mrg->user_base = mr->ibmr.iova;
+ mrg->iova = mr->ibmr.iova;
+ mrg->lkey = key;
+ mrg->length = mr->ibmr.length;
+ mrg->access_flags = wr->access;
+ page_list = mr->pages;
+ m = 0;
+ n = 0;
+ for (i = 0; i < mr->npages; i++) {
+ mrg->map[m]->segs[n].vaddr = (void *) page_list[i];
+ mrg->map[m]->segs[n].length = ps;
+ if (++n == QIB_SEGSZ) {
+ m++;
+ n = 0;
+ }
+ }
+
+ ret = 0;
+bail:
+ spin_unlock_irqrestore(&rkt->lock, flags);
+ return ret;
+}
diff --git a/drivers/infiniband/hw/qib/qib_mr.c b/drivers/infiniband/hw/qib/qib_mr.c
index 19220dcb9a3b..0fa4b0de8074 100644
--- a/drivers/infiniband/hw/qib/qib_mr.c
+++ b/drivers/infiniband/hw/qib/qib_mr.c
@@ -303,6 +303,7 @@ int qib_dereg_mr(struct ib_mr *ibmr)
int ret = 0;
unsigned long timeout;

+ kfree(mr->pages);
qib_free_lkey(&mr->mr);

qib_put_mr(&mr->mr); /* will set completion if last */
@@ -340,7 +341,38 @@ struct ib_mr *qib_alloc_mr(struct ib_pd *pd,
if (IS_ERR(mr))
return (struct ib_mr *)mr;

+ mr->pages = kcalloc(max_num_sg, sizeof(u64), GFP_KERNEL);
+ if (!mr->pages)
+ goto err;
+
return &mr->ibmr;
+
+err:
+ qib_dereg_mr(&mr->ibmr);
+ return ERR_PTR(-ENOMEM);
+}
+
+static int qib_set_page(struct ib_mr *ibmr, u64 addr)
+{
+ struct qib_mr *mr = to_imr(ibmr);
+
+ if (unlikely(mr->npages == mr->mr.max_segs))
+ return -ENOMEM;
+
+ mr->pages[mr->npages++] = addr;
+
+ return 0;
+}
+
+int qib_map_mr_sg(struct ib_mr *ibmr,
+ struct scatterlist *sg,
+ unsigned int sg_nents)
+{
+ struct qib_mr *mr = to_imr(ibmr);
+
+ mr->npages = 0;
+
+ return ib_sg_to_pages(ibmr, sg, sg_nents, qib_set_page);
}

struct ib_fast_reg_page_list *
diff --git a/drivers/infiniband/hw/qib/qib_verbs.c b/drivers/infiniband/hw/qib/qib_verbs.c
index a6b0b098ff30..a1e53d7b662b 100644
--- a/drivers/infiniband/hw/qib/qib_verbs.c
+++ b/drivers/infiniband/hw/qib/qib_verbs.c
@@ -362,7 +362,10 @@ static int qib_post_one_send(struct qib_qp *qp, struct ib_send_wr *wr,
* undefined operations.
* Make sure buffer is large enough to hold the result for atomics.
*/
- if (wr->opcode == IB_WR_FAST_REG_MR) {
+ if (wr->opcode == IB_WR_REG_MR) {
+ if (qib_reg_mr(qp, reg_wr(wr)))
+ goto bail_inval;
+ } else if (wr->opcode == IB_WR_FAST_REG_MR) {
if (qib_fast_reg_mr(qp, wr))
goto bail_inval;
} else if (qp->ibqp.qp_type == IB_QPT_UC) {
@@ -401,6 +404,9 @@ static int qib_post_one_send(struct qib_qp *qp, struct ib_send_wr *wr,
if (qp->ibqp.qp_type != IB_QPT_UC &&
qp->ibqp.qp_type != IB_QPT_RC)
memcpy(&wqe->ud_wr, ud_wr(wr), sizeof(wqe->ud_wr));
+ else if (wr->opcode == IB_WR_REG_MR)
+ memcpy(&wqe->reg_wr, reg_wr(wr),
+ sizeof(wqe->reg_wr));
else if (wr->opcode == IB_WR_FAST_REG_MR)
memcpy(&wqe->fast_reg_wr, fast_reg_wr(wr),
sizeof(wqe->fast_reg_wr));
@@ -2260,6 +2266,7 @@ int qib_register_ib_device(struct qib_devdata *dd)
ibdev->reg_user_mr = qib_reg_user_mr;
ibdev->dereg_mr = qib_dereg_mr;
ibdev->alloc_mr = qib_alloc_mr;
+ ibdev->map_mr_sg = qib_map_mr_sg;
ibdev->alloc_fast_reg_page_list = qib_alloc_fast_reg_page_list;
ibdev->free_fast_reg_page_list = qib_free_fast_reg_page_list;
ibdev->alloc_fmr = qib_alloc_fmr;
diff --git a/drivers/infiniband/hw/qib/qib_verbs.h b/drivers/infiniband/hw/qib/qib_verbs.h
index 8aa16851a5e6..dbc81c5761e3 100644
--- a/drivers/infiniband/hw/qib/qib_verbs.h
+++ b/drivers/infiniband/hw/qib/qib_verbs.h
@@ -330,6 +330,8 @@ struct qib_mr {
struct ib_mr ibmr;
struct ib_umem *umem;
struct qib_mregion mr; /* must be last */
+ u64 *pages;
+ u32 npages;
};

/*
@@ -341,6 +343,7 @@ struct qib_swqe {
union {
struct ib_send_wr wr; /* don't use wr.sg_list */
struct ib_ud_wr ud_wr;
+ struct ib_reg_wr reg_wr;
struct ib_fast_reg_wr fast_reg_wr;
struct ib_rdma_wr rdma_wr;
struct ib_atomic_wr atomic_wr;
@@ -1044,12 +1047,17 @@ struct ib_mr *qib_alloc_mr(struct ib_pd *pd,
enum ib_mr_type mr_type,
u32 max_entries);

+int qib_map_mr_sg(struct ib_mr *ibmr,
+ struct scatterlist *sg,
+ unsigned int sg_nents);
+
struct ib_fast_reg_page_list *qib_alloc_fast_reg_page_list(
struct ib_device *ibdev, int page_list_len);

void qib_free_fast_reg_page_list(struct ib_fast_reg_page_list *pl);

int qib_fast_reg_mr(struct qib_qp *qp, struct ib_send_wr *wr);
+int qib_reg_mr(struct qib_qp *qp, struct ib_reg_wr *wr);

struct ib_fmr *qib_alloc_fmr(struct ib_pd *pd, int mr_access_flags,
struct ib_fmr_attr *fmr_attr);
--
1.8.4.3


2015-10-08 07:52:02

by Sagi Grimberg

[permalink] [raw]
Subject: [PATCH v3 24/26] IB/qib: Remove old FRWR API

No ULP uses it anymore, go ahead and remove it.

Signed-off-by: Sagi Grimberg <[email protected]>
Acked-by: Christoph Hellwig <[email protected]>
---
drivers/infiniband/hw/qib/qib_keys.c | 56 -----------------------------------
drivers/infiniband/hw/qib/qib_mr.c | 32 +-------------------
drivers/infiniband/hw/qib/qib_verbs.c | 8 -----
drivers/infiniband/hw/qib/qib_verbs.h | 7 -----
4 files changed, 1 insertion(+), 102 deletions(-)

diff --git a/drivers/infiniband/hw/qib/qib_keys.c b/drivers/infiniband/hw/qib/qib_keys.c
index 95b8b9110fc6..d725c565518d 100644
--- a/drivers/infiniband/hw/qib/qib_keys.c
+++ b/drivers/infiniband/hw/qib/qib_keys.c
@@ -336,62 +336,6 @@ bail:
}

/*
- * Initialize the memory region specified by the work reqeust.
- */
-int qib_fast_reg_mr(struct qib_qp *qp, struct ib_send_wr *send_wr)
-{
- struct ib_fast_reg_wr *wr = fast_reg_wr(send_wr);
- struct qib_lkey_table *rkt = &to_idev(qp->ibqp.device)->lk_table;
- struct qib_pd *pd = to_ipd(qp->ibqp.pd);
- struct qib_mregion *mr;
- u32 rkey = wr->rkey;
- unsigned i, n, m;
- int ret = -EINVAL;
- unsigned long flags;
- u64 *page_list;
- size_t ps;
-
- spin_lock_irqsave(&rkt->lock, flags);
- if (pd->user || rkey == 0)
- goto bail;
-
- mr = rcu_dereference_protected(
- rkt->table[(rkey >> (32 - ib_qib_lkey_table_size))],
- lockdep_is_held(&rkt->lock));
- if (unlikely(mr == NULL || qp->ibqp.pd != mr->pd))
- goto bail;
-
- if (wr->page_list_len > mr->max_segs)
- goto bail;
-
- ps = 1UL << wr->page_shift;
- if (wr->length > ps * wr->page_list_len)
- goto bail;
-
- mr->user_base = wr->iova_start;
- mr->iova = wr->iova_start;
- mr->lkey = rkey;
- mr->length = wr->length;
- mr->access_flags = wr->access_flags;
- page_list = wr->page_list->page_list;
- m = 0;
- n = 0;
- for (i = 0; i < wr->page_list_len; i++) {
- mr->map[m]->segs[n].vaddr = (void *) page_list[i];
- mr->map[m]->segs[n].length = ps;
- if (++n == QIB_SEGSZ) {
- m++;
- n = 0;
- }
- }
-
- ret = 0;
-bail:
- spin_unlock_irqrestore(&rkt->lock, flags);
- return ret;
-}
-
-/*
* Initialize the memory region specified by the work request.
*/
int qib_reg_mr(struct qib_qp *qp, struct ib_reg_wr *wr)
diff --git a/drivers/infiniband/hw/qib/qib_mr.c b/drivers/infiniband/hw/qib/qib_mr.c
index 0fa4b0de8074..73f78c0f9522 100644
--- a/drivers/infiniband/hw/qib/qib_mr.c
+++ b/drivers/infiniband/hw/qib/qib_mr.c
@@ -324,7 +324,7 @@ out:

/*
* Allocate a memory region usable with the
- * IB_WR_FAST_REG_MR send work request.
+ * IB_WR_REG_MR send work request.
*
* Return the memory region on success, otherwise return an errno.
*/
@@ -375,36 +375,6 @@ int qib_map_mr_sg(struct ib_mr *ibmr,
return ib_sg_to_pages(ibmr, sg, sg_nents, qib_set_page);
}

-struct ib_fast_reg_page_list *
-qib_alloc_fast_reg_page_list(struct ib_device *ibdev, int page_list_len)
-{
- unsigned size = page_list_len * sizeof(u64);
- struct ib_fast_reg_page_list *pl;
-
- if (size > PAGE_SIZE)
- return ERR_PTR(-EINVAL);
-
- pl = kzalloc(sizeof(*pl), GFP_KERNEL);
- if (!pl)
- return ERR_PTR(-ENOMEM);
-
- pl->page_list = kzalloc(size, GFP_KERNEL);
- if (!pl->page_list)
- goto err_free;
-
- return pl;
-
-err_free:
- kfree(pl);
- return ERR_PTR(-ENOMEM);
-}
-
-void qib_free_fast_reg_page_list(struct ib_fast_reg_page_list *pl)
-{
- kfree(pl->page_list);
- kfree(pl);
-}
-
/**
* qib_alloc_fmr - allocate a fast memory region
* @pd: the protection domain for this memory region
diff --git a/drivers/infiniband/hw/qib/qib_verbs.c b/drivers/infiniband/hw/qib/qib_verbs.c
index a1e53d7b662b..de6cb6fcda8d 100644
--- a/drivers/infiniband/hw/qib/qib_verbs.c
+++ b/drivers/infiniband/hw/qib/qib_verbs.c
@@ -365,9 +365,6 @@ static int qib_post_one_send(struct qib_qp *qp, struct ib_send_wr *wr,
if (wr->opcode == IB_WR_REG_MR) {
if (qib_reg_mr(qp, reg_wr(wr)))
goto bail_inval;
- } else if (wr->opcode == IB_WR_FAST_REG_MR) {
- if (qib_fast_reg_mr(qp, wr))
- goto bail_inval;
} else if (qp->ibqp.qp_type == IB_QPT_UC) {
if ((unsigned) wr->opcode >= IB_WR_RDMA_READ)
goto bail_inval;
@@ -407,9 +404,6 @@ static int qib_post_one_send(struct qib_qp *qp, struct ib_send_wr *wr,
else if (wr->opcode == IB_WR_REG_MR)
memcpy(&wqe->reg_wr, reg_wr(wr),
sizeof(wqe->reg_wr));
- else if (wr->opcode == IB_WR_FAST_REG_MR)
- memcpy(&wqe->fast_reg_wr, fast_reg_wr(wr),
- sizeof(wqe->fast_reg_wr));
else if (wr->opcode == IB_WR_RDMA_WRITE_WITH_IMM ||
wr->opcode == IB_WR_RDMA_WRITE ||
wr->opcode == IB_WR_RDMA_READ)
@@ -2267,8 +2261,6 @@ int qib_register_ib_device(struct qib_devdata *dd)
ibdev->dereg_mr = qib_dereg_mr;
ibdev->alloc_mr = qib_alloc_mr;
ibdev->map_mr_sg = qib_map_mr_sg;
- ibdev->alloc_fast_reg_page_list = qib_alloc_fast_reg_page_list;
- ibdev->free_fast_reg_page_list = qib_free_fast_reg_page_list;
ibdev->alloc_fmr = qib_alloc_fmr;
ibdev->map_phys_fmr = qib_map_phys_fmr;
ibdev->unmap_fmr = qib_unmap_fmr;
diff --git a/drivers/infiniband/hw/qib/qib_verbs.h b/drivers/infiniband/hw/qib/qib_verbs.h
index dbc81c5761e3..bb11fe56ac6e 100644
--- a/drivers/infiniband/hw/qib/qib_verbs.h
+++ b/drivers/infiniband/hw/qib/qib_verbs.h
@@ -344,7 +344,6 @@ struct qib_swqe {
struct ib_send_wr wr; /* don't use wr.sg_list */
struct ib_ud_wr ud_wr;
struct ib_reg_wr reg_wr;
- struct ib_fast_reg_wr fast_reg_wr;
struct ib_rdma_wr rdma_wr;
struct ib_atomic_wr atomic_wr;
};
@@ -1051,12 +1050,6 @@ int qib_map_mr_sg(struct ib_mr *ibmr,
struct scatterlist *sg,
unsigned int sg_nents);

-struct ib_fast_reg_page_list *qib_alloc_fast_reg_page_list(
- struct ib_device *ibdev, int page_list_len);
-
-void qib_free_fast_reg_page_list(struct ib_fast_reg_page_list *pl);
-
-int qib_fast_reg_mr(struct qib_qp *qp, struct ib_send_wr *wr);
int qib_reg_mr(struct qib_qp *qp, struct ib_reg_wr *wr);

struct ib_fmr *qib_alloc_fmr(struct ib_pd *pd, int mr_access_flags,
--
1.8.4.3


2015-10-08 07:52:18

by Sagi Grimberg

[permalink] [raw]
Subject: [PATCH v3 19/26] IB/mlx5: Remove old FRWR API support

No ULP uses it anymore, go ahead and remove it.
Keep only the local invalidate part of the handlers.

Signed-off-by: Sagi Grimberg <[email protected]>
Acked-by: Christoph Hellwig <[email protected]>
---
drivers/infiniband/hw/mlx5/cq.c | 3 --
drivers/infiniband/hw/mlx5/main.c | 2 -
drivers/infiniband/hw/mlx5/mlx5_ib.h | 14 ------
drivers/infiniband/hw/mlx5/mr.c | 42 ----------------
drivers/infiniband/hw/mlx5/qp.c | 97 ++++--------------------------------
5 files changed, 9 insertions(+), 149 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/cq.c b/drivers/infiniband/hw/mlx5/cq.c
index 206930096d56..3dfd287256d6 100644
--- a/drivers/infiniband/hw/mlx5/cq.c
+++ b/drivers/infiniband/hw/mlx5/cq.c
@@ -112,9 +112,6 @@ static enum ib_wc_opcode get_umr_comp(struct mlx5_ib_wq *wq, int idx)
case IB_WR_REG_MR:
return IB_WC_REG_MR;

- case IB_WR_FAST_REG_MR:
- return IB_WC_FAST_REG_MR;
-
default:
pr_warn("unknown completion status\n");
return 0;
diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 7e93044ea6ce..bdd60a69be2d 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -1426,8 +1426,6 @@ static void *mlx5_ib_add(struct mlx5_core_dev *mdev)
dev->ib_dev.process_mad = mlx5_ib_process_mad;
dev->ib_dev.alloc_mr = mlx5_ib_alloc_mr;
dev->ib_dev.map_mr_sg = mlx5_ib_map_mr_sg;
- dev->ib_dev.alloc_fast_reg_page_list = mlx5_ib_alloc_fast_reg_page_list;
- dev->ib_dev.free_fast_reg_page_list = mlx5_ib_free_fast_reg_page_list;
dev->ib_dev.check_mr_status = mlx5_ib_check_mr_status;
dev->ib_dev.get_port_immutable = mlx5_port_immutable;

diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index 72672ae48296..86bf036887d0 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -338,12 +338,6 @@ struct mlx5_ib_mr {
void *descs_alloc;
};

-struct mlx5_ib_fast_reg_page_list {
- struct ib_fast_reg_page_list ibfrpl;
- __be64 *mapped_page_list;
- dma_addr_t map;
-};
-
struct mlx5_ib_umr_context {
enum ib_wc_status status;
struct completion done;
@@ -494,11 +488,6 @@ static inline struct mlx5_ib_mr *to_mmr(struct ib_mr *ibmr)
return container_of(ibmr, struct mlx5_ib_mr, ibmr);
}

-static inline struct mlx5_ib_fast_reg_page_list *to_mfrpl(struct ib_fast_reg_page_list *ibfrpl)
-{
- return container_of(ibfrpl, struct mlx5_ib_fast_reg_page_list, ibfrpl);
-}
-
struct mlx5_ib_ah {
struct ib_ah ibah;
struct mlx5_av av;
@@ -569,9 +558,6 @@ struct ib_mr *mlx5_ib_alloc_mr(struct ib_pd *pd,
int mlx5_ib_map_mr_sg(struct ib_mr *ibmr,
struct scatterlist *sg,
unsigned int sg_nents);
-struct ib_fast_reg_page_list *mlx5_ib_alloc_fast_reg_page_list(struct ib_device *ibdev,
- int page_list_len);
-void mlx5_ib_free_fast_reg_page_list(struct ib_fast_reg_page_list *page_list);
int mlx5_ib_process_mad(struct ib_device *ibdev, int mad_flags, u8 port_num,
const struct ib_wc *in_wc, const struct ib_grh *in_grh,
const struct ib_mad_hdr *in, size_t in_mad_size,
diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c
index cabfe4190657..277499c4f156 100644
--- a/drivers/infiniband/hw/mlx5/mr.c
+++ b/drivers/infiniband/hw/mlx5/mr.c
@@ -1383,48 +1383,6 @@ err_free:
return ERR_PTR(err);
}

-struct ib_fast_reg_page_list *mlx5_ib_alloc_fast_reg_page_list(struct ib_device *ibdev,
- int page_list_len)
-{
- struct mlx5_ib_fast_reg_page_list *mfrpl;
- int size = page_list_len * sizeof(u64);
-
- mfrpl = kmalloc(sizeof(*mfrpl), GFP_KERNEL);
- if (!mfrpl)
- return ERR_PTR(-ENOMEM);
-
- mfrpl->ibfrpl.page_list = kmalloc(size, GFP_KERNEL);
- if (!mfrpl->ibfrpl.page_list)
- goto err_free;
-
- mfrpl->mapped_page_list = dma_alloc_coherent(ibdev->dma_device,
- size, &mfrpl->map,
- GFP_KERNEL);
- if (!mfrpl->mapped_page_list)
- goto err_free;
-
- WARN_ON(mfrpl->map & 0x3f);
-
- return &mfrpl->ibfrpl;
-
-err_free:
- kfree(mfrpl->ibfrpl.page_list);
- kfree(mfrpl);
- return ERR_PTR(-ENOMEM);
-}
-
-void mlx5_ib_free_fast_reg_page_list(struct ib_fast_reg_page_list *page_list)
-{
- struct mlx5_ib_fast_reg_page_list *mfrpl = to_mfrpl(page_list);
- struct mlx5_ib_dev *dev = to_mdev(page_list->device);
- int size = page_list->max_page_list_len * sizeof(u64);
-
- dma_free_coherent(&dev->mdev->pdev->dev, size, mfrpl->mapped_page_list,
- mfrpl->map);
- kfree(mfrpl->ibfrpl.page_list);
- kfree(mfrpl);
-}
-
int mlx5_ib_check_mr_status(struct ib_mr *ibmr, u32 check_mask,
struct ib_mr_status *mr_status)
{
diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index a71e43a4f028..8378d6a9221c 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -64,7 +64,6 @@ static const u32 mlx5_ib_opcode[] = {
[IB_WR_ATOMIC_FETCH_AND_ADD] = MLX5_OPCODE_ATOMIC_FA,
[IB_WR_SEND_WITH_INV] = MLX5_OPCODE_SEND_INVAL,
[IB_WR_LOCAL_INV] = MLX5_OPCODE_UMR,
- [IB_WR_FAST_REG_MR] = MLX5_OPCODE_UMR,
[IB_WR_REG_MR] = MLX5_OPCODE_UMR,
[IB_WR_MASKED_ATOMIC_CMP_AND_SWP] = MLX5_OPCODE_ATOMIC_MASKED_CS,
[IB_WR_MASKED_ATOMIC_FETCH_AND_ADD] = MLX5_OPCODE_ATOMIC_MASKED_FA,
@@ -1908,20 +1907,11 @@ static void set_reg_umr_seg(struct mlx5_wqe_umr_ctrl_seg *umr,
umr->mkey_mask = frwr_mkey_mask();
}

-static void set_frwr_umr_segment(struct mlx5_wqe_umr_ctrl_seg *umr,
- struct ib_send_wr *wr, int li)
+static void set_linv_umr_seg(struct mlx5_wqe_umr_ctrl_seg *umr)
{
memset(umr, 0, sizeof(*umr));
-
- if (li) {
- umr->mkey_mask = cpu_to_be64(MLX5_MKEY_MASK_FREE);
- umr->flags = 1 << 7;
- return;
- }
-
- umr->flags = (1 << 5); /* fail if not free */
- umr->klm_octowords = get_klm_octo(fast_reg_wr(wr)->page_list_len);
- umr->mkey_mask = frwr_mkey_mask();
+ umr->mkey_mask = cpu_to_be64(MLX5_MKEY_MASK_FREE);
+ umr->flags = 1 << 7;
}

static __be64 get_umr_reg_mr_mask(void)
@@ -2015,24 +2005,10 @@ static void set_reg_mkey_seg(struct mlx5_mkey_seg *seg,
seg->log2_page_size = ilog2(mr->ibmr.page_size);
}

-static void set_mkey_segment(struct mlx5_mkey_seg *seg, struct ib_send_wr *wr,
- int li, int *writ)
+static void set_linv_mkey_seg(struct mlx5_mkey_seg *seg)
{
memset(seg, 0, sizeof(*seg));
- if (li) {
- seg->status = MLX5_MKEY_STATUS_FREE;
- return;
- }
-
- seg->flags = get_umr_flags(fast_reg_wr(wr)->access_flags) |
- MLX5_ACCESS_MODE_MTT;
- *writ = seg->flags & (MLX5_PERM_LOCAL_WRITE | IB_ACCESS_REMOTE_WRITE);
- seg->qpn_mkey7_0 = cpu_to_be32((fast_reg_wr(wr)->rkey & 0xff) | 0xffffff00);
- seg->flags_pd = cpu_to_be32(MLX5_MKEY_REMOTE_INVAL);
- seg->start_addr = cpu_to_be64(fast_reg_wr(wr)->iova_start);
- seg->len = cpu_to_be64(fast_reg_wr(wr)->length);
- seg->xlt_oct_size = cpu_to_be32((fast_reg_wr(wr)->page_list_len + 1) / 2);
- seg->log2_page_size = fast_reg_wr(wr)->page_shift;
+ seg->status = MLX5_MKEY_STATUS_FREE;
}

static void set_reg_mkey_segment(struct mlx5_mkey_seg *seg, struct ib_send_wr *wr)
@@ -2067,24 +2043,6 @@ static void set_reg_data_seg(struct mlx5_wqe_data_seg *dseg,
dseg->lkey = cpu_to_be32(pd->ibpd.local_dma_lkey);
}

-static void set_frwr_pages(struct mlx5_wqe_data_seg *dseg,
- struct ib_send_wr *wr,
- struct mlx5_core_dev *mdev,
- struct mlx5_ib_pd *pd,
- int writ)
-{
- struct mlx5_ib_fast_reg_page_list *mfrpl = to_mfrpl(fast_reg_wr(wr)->page_list);
- u64 *page_list = fast_reg_wr(wr)->page_list->page_list;
- u64 perm = MLX5_EN_RD | (writ ? MLX5_EN_WR : 0);
- int i;
-
- for (i = 0; i < fast_reg_wr(wr)->page_list_len; i++)
- mfrpl->mapped_page_list[i] = cpu_to_be64(page_list[i] | perm);
- dseg->addr = cpu_to_be64(mfrpl->map);
- dseg->byte_count = cpu_to_be32(ALIGN(sizeof(u64) * fast_reg_wr(wr)->page_list_len, 64));
- dseg->lkey = cpu_to_be32(pd->ibpd.local_dma_lkey);
-}
-
static __be32 send_ieth(struct ib_send_wr *wr)
{
switch (wr->opcode) {
@@ -2504,36 +2462,18 @@ static int set_reg_wr(struct mlx5_ib_qp *qp,
return 0;
}

-static int set_frwr_li_wr(void **seg, struct ib_send_wr *wr, int *size,
- struct mlx5_core_dev *mdev, struct mlx5_ib_pd *pd, struct mlx5_ib_qp *qp)
+static void set_linv_wr(struct mlx5_ib_qp *qp, void **seg, int *size)
{
- int writ = 0;
- int li;
-
- li = wr->opcode == IB_WR_LOCAL_INV ? 1 : 0;
- if (unlikely(wr->send_flags & IB_SEND_INLINE))
- return -EINVAL;
-
- set_frwr_umr_segment(*seg, wr, li);
+ set_linv_umr_seg(*seg);
*seg += sizeof(struct mlx5_wqe_umr_ctrl_seg);
*size += sizeof(struct mlx5_wqe_umr_ctrl_seg) / 16;
if (unlikely((*seg == qp->sq.qend)))
*seg = mlx5_get_send_wqe(qp, 0);
- set_mkey_segment(*seg, wr, li, &writ);
+ set_linv_mkey_seg(*seg);
*seg += sizeof(struct mlx5_mkey_seg);
*size += sizeof(struct mlx5_mkey_seg) / 16;
if (unlikely((*seg == qp->sq.qend)))
*seg = mlx5_get_send_wqe(qp, 0);
- if (!li) {
- if (unlikely(fast_reg_wr(wr)->page_list_len >
- fast_reg_wr(wr)->page_list->max_page_list_len))
- return -ENOMEM;
-
- set_frwr_pages(*seg, wr, mdev, pd, writ);
- *seg += sizeof(struct mlx5_wqe_data_seg);
- *size += (sizeof(struct mlx5_wqe_data_seg) / 16);
- }
- return 0;
}

static void dump_wqe(struct mlx5_ib_qp *qp, int idx, int size_16)
@@ -2649,7 +2589,6 @@ int mlx5_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
{
struct mlx5_wqe_ctrl_seg *ctrl = NULL; /* compiler warning */
struct mlx5_ib_dev *dev = to_mdev(ibqp->device);
- struct mlx5_core_dev *mdev = dev->mdev;
struct mlx5_ib_qp *qp = to_mqp(ibqp);
struct mlx5_ib_mr *mr;
struct mlx5_wqe_data_seg *dpseg;
@@ -2725,25 +2664,7 @@ int mlx5_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
next_fence = MLX5_FENCE_MODE_INITIATOR_SMALL;
qp->sq.wr_data[idx] = IB_WR_LOCAL_INV;
ctrl->imm = cpu_to_be32(wr->ex.invalidate_rkey);
- err = set_frwr_li_wr(&seg, wr, &size, mdev, to_mpd(ibqp->pd), qp);
- if (err) {
- mlx5_ib_warn(dev, "\n");
- *bad_wr = wr;
- goto out;
- }
- num_sge = 0;
- break;
-
- case IB_WR_FAST_REG_MR:
- next_fence = MLX5_FENCE_MODE_INITIATOR_SMALL;
- qp->sq.wr_data[idx] = IB_WR_FAST_REG_MR;
- ctrl->imm = cpu_to_be32(fast_reg_wr(wr)->rkey);
- err = set_frwr_li_wr(&seg, wr, &size, mdev, to_mpd(ibqp->pd), qp);
- if (err) {
- mlx5_ib_warn(dev, "\n");
- *bad_wr = wr;
- goto out;
- }
+ set_linv_wr(qp, &seg, &size);
num_sge = 0;
break;

--
1.8.4.3


2015-10-08 07:52:02

by Sagi Grimberg

[permalink] [raw]
Subject: [PATCH v3 14/26] RDS/IW: Convert to new memory registration API

Get rid of fast_reg page list and its construction.
Instead, just pass the RDS sg list to ib_map_mr_sg
and post the new ib_reg_wr.

This is done both for server IW RDMA_READ registration
and the client remote key registration.

Signed-off-by: Sagi Grimberg <[email protected]>
Acked-by: Christoph Hellwig <[email protected]>
Acked-by: Santosh Shilimkar <[email protected]>
---
net/rds/iw.h | 5 +--
net/rds/iw_rdma.c | 128 +++++++++++++++++++-----------------------------------
net/rds/iw_send.c | 57 ++++++++++++------------
3 files changed, 75 insertions(+), 115 deletions(-)

diff --git a/net/rds/iw.h b/net/rds/iw.h
index fe858e5dd8d1..5af01d1758b3 100644
--- a/net/rds/iw.h
+++ b/net/rds/iw.h
@@ -74,13 +74,12 @@ struct rds_iw_send_work {
struct rm_rdma_op *s_op;
struct rds_iw_mapping *s_mapping;
struct ib_mr *s_mr;
- struct ib_fast_reg_page_list *s_page_list;
unsigned char s_remap_count;

union {
struct ib_send_wr s_send_wr;
struct ib_rdma_wr s_rdma_wr;
- struct ib_fast_reg_wr s_fast_reg_wr;
+ struct ib_reg_wr s_reg_wr;
};
struct ib_sge s_sge[RDS_IW_MAX_SGE];
unsigned long s_queued;
@@ -199,7 +198,7 @@ struct rds_iw_device {

/* Magic WR_ID for ACKs */
#define RDS_IW_ACK_WR_ID ((u64)0xffffffffffffffffULL)
-#define RDS_IW_FAST_REG_WR_ID ((u64)0xefefefefefefefefULL)
+#define RDS_IW_REG_WR_ID ((u64)0xefefefefefefefefULL)
#define RDS_IW_LOCAL_INV_WR_ID ((u64)0xdfdfdfdfdfdfdfdfULL)

struct rds_iw_statistics {
diff --git a/net/rds/iw_rdma.c b/net/rds/iw_rdma.c
index f8a612cc69e6..47bd68451ff7 100644
--- a/net/rds/iw_rdma.c
+++ b/net/rds/iw_rdma.c
@@ -47,7 +47,6 @@ struct rds_iw_mr {
struct rdma_cm_id *cm_id;

struct ib_mr *mr;
- struct ib_fast_reg_page_list *page_list;

struct rds_iw_mapping mapping;
unsigned char remap_count;
@@ -77,8 +76,8 @@ struct rds_iw_mr_pool {

static int rds_iw_flush_mr_pool(struct rds_iw_mr_pool *pool, int free_all);
static void rds_iw_mr_pool_flush_worker(struct work_struct *work);
-static int rds_iw_init_fastreg(struct rds_iw_mr_pool *pool, struct rds_iw_mr *ibmr);
-static int rds_iw_map_fastreg(struct rds_iw_mr_pool *pool,
+static int rds_iw_init_reg(struct rds_iw_mr_pool *pool, struct rds_iw_mr *ibmr);
+static int rds_iw_map_reg(struct rds_iw_mr_pool *pool,
struct rds_iw_mr *ibmr,
struct scatterlist *sg, unsigned int nents);
static void rds_iw_free_fastreg(struct rds_iw_mr_pool *pool, struct rds_iw_mr *ibmr);
@@ -258,19 +257,18 @@ static void rds_iw_set_scatterlist(struct rds_iw_scatterlist *sg,
sg->bytes = 0;
}

-static u64 *rds_iw_map_scatterlist(struct rds_iw_device *rds_iwdev,
- struct rds_iw_scatterlist *sg)
+static int rds_iw_map_scatterlist(struct rds_iw_device *rds_iwdev,
+ struct rds_iw_scatterlist *sg)
{
struct ib_device *dev = rds_iwdev->dev;
- u64 *dma_pages = NULL;
- int i, j, ret;
+ int i, ret;

WARN_ON(sg->dma_len);

sg->dma_len = ib_dma_map_sg(dev, sg->list, sg->len, DMA_BIDIRECTIONAL);
if (unlikely(!sg->dma_len)) {
printk(KERN_WARNING "RDS/IW: dma_map_sg failed!\n");
- return ERR_PTR(-EBUSY);
+ return -EBUSY;
}

sg->bytes = 0;
@@ -303,31 +301,14 @@ static u64 *rds_iw_map_scatterlist(struct rds_iw_device *rds_iwdev,
if (sg->dma_npages > fastreg_message_size)
goto out_unmap;

- dma_pages = kmalloc(sizeof(u64) * sg->dma_npages, GFP_ATOMIC);
- if (!dma_pages) {
- ret = -ENOMEM;
- goto out_unmap;
- }

- for (i = j = 0; i < sg->dma_len; ++i) {
- unsigned int dma_len = ib_sg_dma_len(dev, &sg->list[i]);
- u64 dma_addr = ib_sg_dma_address(dev, &sg->list[i]);
- u64 end_addr;

- end_addr = dma_addr + dma_len;
- dma_addr &= ~PAGE_MASK;
- for (; dma_addr < end_addr; dma_addr += PAGE_SIZE)
- dma_pages[j++] = dma_addr;
- BUG_ON(j > sg->dma_npages);
- }
-
- return dma_pages;
+ return 0;

out_unmap:
ib_dma_unmap_sg(rds_iwdev->dev, sg->list, sg->len, DMA_BIDIRECTIONAL);
sg->dma_len = 0;
- kfree(dma_pages);
- return ERR_PTR(ret);
+ return ret;
}


@@ -440,7 +421,7 @@ static struct rds_iw_mr *rds_iw_alloc_mr(struct rds_iw_device *rds_iwdev)
INIT_LIST_HEAD(&ibmr->mapping.m_list);
ibmr->mapping.m_mr = ibmr;

- err = rds_iw_init_fastreg(pool, ibmr);
+ err = rds_iw_init_reg(pool, ibmr);
if (err)
goto out_no_cigar;

@@ -622,7 +603,7 @@ void *rds_iw_get_mr(struct scatterlist *sg, unsigned long nents,
ibmr->cm_id = cm_id;
ibmr->device = rds_iwdev;

- ret = rds_iw_map_fastreg(rds_iwdev->mr_pool, ibmr, sg, nents);
+ ret = rds_iw_map_reg(rds_iwdev->mr_pool, ibmr, sg, nents);
if (ret == 0)
*key_ret = ibmr->mr->rkey;
else
@@ -638,7 +619,7 @@ out:
}

/*
- * iWARP fastreg handling
+ * iWARP reg handling
*
* The life cycle of a fastreg registration is a bit different from
* FMRs.
@@ -650,7 +631,7 @@ out:
* This creates a bit of a problem for us, as we do not have the destination
* IP in GET_MR, so the connection must be setup prior to the GET_MR call for
* RDMA to be correctly setup. If a fastreg request is present, rds_iw_xmit
- * will try to queue a LOCAL_INV (if needed) and a FAST_REG_MR work request
+ * will try to queue a LOCAL_INV (if needed) and a REG_MR work request
* before queuing the SEND. When completions for these arrive, they are
* dispatched to the MR has a bit set showing that RDMa can be performed.
*
@@ -659,11 +640,10 @@ out:
* The expectation there is that this invalidation step includes ALL
* PREVIOUSLY FREED MRs.
*/
-static int rds_iw_init_fastreg(struct rds_iw_mr_pool *pool,
- struct rds_iw_mr *ibmr)
+static int rds_iw_init_reg(struct rds_iw_mr_pool *pool,
+ struct rds_iw_mr *ibmr)
{
struct rds_iw_device *rds_iwdev = pool->device;
- struct ib_fast_reg_page_list *page_list = NULL;
struct ib_mr *mr;
int err;

@@ -676,56 +656,44 @@ static int rds_iw_init_fastreg(struct rds_iw_mr_pool *pool,
return err;
}

- /* FIXME - this is overkill, but mapping->m_sg.dma_len/mapping->m_sg.dma_npages
- * is not filled in.
- */
- page_list = ib_alloc_fast_reg_page_list(rds_iwdev->dev, pool->max_message_size);
- if (IS_ERR(page_list)) {
- err = PTR_ERR(page_list);
-
- printk(KERN_WARNING "RDS/IW: ib_alloc_fast_reg_page_list failed (err=%d)\n", err);
- ib_dereg_mr(mr);
- return err;
- }
-
- ibmr->page_list = page_list;
ibmr->mr = mr;
return 0;
}

-static int rds_iw_rdma_build_fastreg(struct rds_iw_mapping *mapping)
+static int rds_iw_rdma_reg_mr(struct rds_iw_mapping *mapping)
{
struct rds_iw_mr *ibmr = mapping->m_mr;
- struct ib_fast_reg_wr f_wr;
+ struct rds_iw_scatterlist *m_sg = &mapping->m_sg;
+ struct ib_reg_wr reg_wr;
struct ib_send_wr *failed_wr;
- int ret;
+ int ret, n;
+
+ n = ib_map_mr_sg_zbva(ibmr->mr, m_sg->list, m_sg->len, PAGE_SIZE);
+ if (unlikely(n != m_sg->len))
+ return n < 0 ? n : -EINVAL;
+
+ reg_wr.wr.next = NULL;
+ reg_wr.wr.opcode = IB_WR_REG_MR;
+ reg_wr.wr.wr_id = RDS_IW_REG_WR_ID;
+ reg_wr.wr.num_sge = 0;
+ reg_wr.mr = ibmr->mr;
+ reg_wr.key = mapping->m_rkey;
+ reg_wr.access = IB_ACCESS_LOCAL_WRITE |
+ IB_ACCESS_REMOTE_READ |
+ IB_ACCESS_REMOTE_WRITE;

/*
- * Perform a WR for the fast_reg_mr. Each individual page
+ * Perform a WR for the reg_mr. Each individual page
* in the sg list is added to the fast reg page list and placed
- * inside the fast_reg_mr WR. The key used is a rolling 8bit
+ * inside the reg_mr WR. The key used is a rolling 8bit
* counter, which should guarantee uniqueness.
*/
ib_update_fast_reg_key(ibmr->mr, ibmr->remap_count++);
mapping->m_rkey = ibmr->mr->rkey;

- memset(&f_wr, 0, sizeof(f_wr));
- f_wr.wr.wr_id = RDS_IW_FAST_REG_WR_ID;
- f_wr.wr.opcode = IB_WR_FAST_REG_MR;
- f_wr.length = mapping->m_sg.bytes;
- f_wr.rkey = mapping->m_rkey;
- f_wr.page_list = ibmr->page_list;
- f_wr.page_list_len = mapping->m_sg.dma_len;
- f_wr.page_shift = PAGE_SHIFT;
- f_wr.access_flags = IB_ACCESS_LOCAL_WRITE |
- IB_ACCESS_REMOTE_READ |
- IB_ACCESS_REMOTE_WRITE;
- f_wr.iova_start = 0;
- f_wr.wr.send_flags = IB_SEND_SIGNALED;
-
- failed_wr = &f_wr.wr;
- ret = ib_post_send(ibmr->cm_id->qp, &f_wr.wr, &failed_wr);
- BUG_ON(failed_wr != &f_wr.wr);
+ failed_wr = &reg_wr.wr;
+ ret = ib_post_send(ibmr->cm_id->qp, &reg_wr.wr, &failed_wr);
+ BUG_ON(failed_wr != &reg_wr.wr);
if (ret)
printk_ratelimited(KERN_WARNING "RDS/IW: %s:%d ib_post_send returned %d\n",
__func__, __LINE__, ret);
@@ -757,21 +725,20 @@ out:
return ret;
}

-static int rds_iw_map_fastreg(struct rds_iw_mr_pool *pool,
- struct rds_iw_mr *ibmr,
- struct scatterlist *sg,
- unsigned int sg_len)
+static int rds_iw_map_reg(struct rds_iw_mr_pool *pool,
+ struct rds_iw_mr *ibmr,
+ struct scatterlist *sg,
+ unsigned int sg_len)
{
struct rds_iw_device *rds_iwdev = pool->device;
struct rds_iw_mapping *mapping = &ibmr->mapping;
u64 *dma_pages;
- int i, ret = 0;
+ int ret = 0;

rds_iw_set_scatterlist(&mapping->m_sg, sg, sg_len);

- dma_pages = rds_iw_map_scatterlist(rds_iwdev, &mapping->m_sg);
- if (IS_ERR(dma_pages)) {
- ret = PTR_ERR(dma_pages);
+ ret = rds_iw_map_scatterlist(rds_iwdev, &mapping->m_sg);
+ if (ret) {
dma_pages = NULL;
goto out;
}
@@ -781,10 +748,7 @@ static int rds_iw_map_fastreg(struct rds_iw_mr_pool *pool,
goto out;
}

- for (i = 0; i < mapping->m_sg.dma_npages; ++i)
- ibmr->page_list->page_list[i] = dma_pages[i];
-
- ret = rds_iw_rdma_build_fastreg(mapping);
+ ret = rds_iw_rdma_reg_mr(mapping);
if (ret)
goto out;

@@ -870,8 +834,6 @@ static unsigned int rds_iw_unmap_fastreg_list(struct rds_iw_mr_pool *pool,
static void rds_iw_destroy_fastreg(struct rds_iw_mr_pool *pool,
struct rds_iw_mr *ibmr)
{
- if (ibmr->page_list)
- ib_free_fast_reg_page_list(ibmr->page_list);
if (ibmr->mr)
ib_dereg_mr(ibmr->mr);
}
diff --git a/net/rds/iw_send.c b/net/rds/iw_send.c
index f6e23c515b44..e20bd503f4bd 100644
--- a/net/rds/iw_send.c
+++ b/net/rds/iw_send.c
@@ -159,13 +159,6 @@ void rds_iw_send_init_ring(struct rds_iw_connection *ic)
printk(KERN_WARNING "RDS/IW: ib_alloc_mr failed\n");
break;
}
-
- send->s_page_list = ib_alloc_fast_reg_page_list(
- ic->i_cm_id->device, fastreg_message_size);
- if (IS_ERR(send->s_page_list)) {
- printk(KERN_WARNING "RDS/IW: ib_alloc_fast_reg_page_list failed\n");
- break;
- }
}
}

@@ -177,8 +170,6 @@ void rds_iw_send_clear_ring(struct rds_iw_connection *ic)
for (i = 0, send = ic->i_sends; i < ic->i_send_ring.w_nr; i++, send++) {
BUG_ON(!send->s_mr);
ib_dereg_mr(send->s_mr);
- BUG_ON(!send->s_page_list);
- ib_free_fast_reg_page_list(send->s_page_list);
if (send->s_send_wr.opcode == 0xdead)
continue;
if (send->s_rm)
@@ -227,7 +218,7 @@ void rds_iw_send_cq_comp_handler(struct ib_cq *cq, void *context)
continue;
}

- if (wc.opcode == IB_WC_FAST_REG_MR && wc.wr_id == RDS_IW_FAST_REG_WR_ID) {
+ if (wc.opcode == IB_WC_REG_MR && wc.wr_id == RDS_IW_REG_WR_ID) {
ic->i_fastreg_posted = 1;
continue;
}
@@ -252,7 +243,7 @@ void rds_iw_send_cq_comp_handler(struct ib_cq *cq, void *context)
if (send->s_rm)
rds_iw_send_unmap_rm(ic, send, wc.status);
break;
- case IB_WR_FAST_REG_MR:
+ case IB_WR_REG_MR:
case IB_WR_RDMA_WRITE:
case IB_WR_RDMA_READ:
case IB_WR_RDMA_READ_WITH_INV:
@@ -770,24 +761,26 @@ out:
return ret;
}

-static void rds_iw_build_send_fastreg(struct rds_iw_device *rds_iwdev, struct rds_iw_connection *ic, struct rds_iw_send_work *send, int nent, int len, u64 sg_addr)
+static int rds_iw_build_send_reg(struct rds_iw_send_work *send,
+ struct scatterlist *sg,
+ int sg_nents)
{
- BUG_ON(nent > send->s_page_list->max_page_list_len);
- /*
- * Perform a WR for the fast_reg_mr. Each individual page
- * in the sg list is added to the fast reg page list and placed
- * inside the fast_reg_mr WR.
- */
- send->s_fast_reg_wr.wr.opcode = IB_WR_FAST_REG_MR;
- send->s_fast_reg_wr.length = len;
- send->s_fast_reg_wr.rkey = send->s_mr->rkey;
- send->s_fast_reg_wr.page_list = send->s_page_list;
- send->s_fast_reg_wr.page_list_len = nent;
- send->s_fast_reg_wr.page_shift = PAGE_SHIFT;
- send->s_fast_reg_wr.access_flags = IB_ACCESS_REMOTE_WRITE;
- send->s_fast_reg_wr.iova_start = sg_addr;
+ int n;
+
+ n = ib_map_mr_sg(send->s_mr, sg, sg_nents, PAGE_SIZE);
+ if (unlikely(n != sg_nents))
+ return n < 0 ? n : -EINVAL;
+
+ send->s_reg_wr.wr.opcode = IB_WR_REG_MR;
+ send->s_reg_wr.wr.wr_id = 0;
+ send->s_reg_wr.wr.num_sge = 0;
+ send->s_reg_wr.mr = send->s_mr;
+ send->s_reg_wr.key = send->s_mr->rkey;
+ send->s_reg_wr.access = IB_ACCESS_REMOTE_WRITE;

ib_update_fast_reg_key(send->s_mr, send->s_remap_count++);
+
+ return 0;
}

int rds_iw_xmit_rdma(struct rds_connection *conn, struct rm_rdma_op *op)
@@ -808,6 +801,7 @@ int rds_iw_xmit_rdma(struct rds_connection *conn, struct rm_rdma_op *op)
int sent;
int ret;
int num_sge;
+ int sg_nents;

rds_iwdev = ib_get_client_data(ic->i_cm_id->device, &rds_iw_client);

@@ -861,6 +855,7 @@ int rds_iw_xmit_rdma(struct rds_connection *conn, struct rm_rdma_op *op)
scat = &op->op_sg[0];
sent = 0;
num_sge = op->op_count;
+ sg_nents = 0;

for (i = 0; i < work_alloc && scat != &op->op_sg[op->op_count]; i++) {
send->s_rdma_wr.wr.send_flags = 0;
@@ -904,7 +899,7 @@ int rds_iw_xmit_rdma(struct rds_connection *conn, struct rm_rdma_op *op)
len = ib_sg_dma_len(ic->i_cm_id->device, scat);

if (send->s_rdma_wr.wr.opcode == IB_WR_RDMA_READ_WITH_INV)
- send->s_page_list->page_list[j] = ib_sg_dma_address(ic->i_cm_id->device, scat);
+ sg_nents++;
else {
send->s_sge[j].addr = ib_sg_dma_address(ic->i_cm_id->device, scat);
send->s_sge[j].length = len;
@@ -951,8 +946,12 @@ int rds_iw_xmit_rdma(struct rds_connection *conn, struct rm_rdma_op *op)
* fastreg_mr (or possibly a dma_mr)
*/
if (!op->op_write) {
- rds_iw_build_send_fastreg(rds_iwdev, ic, &ic->i_sends[fr_pos],
- op->op_count, sent, conn->c_xmit_rm->m_rs->rs_user_addr);
+ ret = rds_iw_build_send_reg(&ic->i_sends[fr_pos],
+ &op->op_sg[0], sg_nents);
+ if (ret) {
+ printk(KERN_WARNING "RDS/IW: failed to reg send mem\n");
+ goto out;
+ }
work_alloc++;
}

--
1.8.4.3


2015-10-08 07:52:19

by Sagi Grimberg

[permalink] [raw]
Subject: [PATCH v3 15/26] IB/srp: Split srp_map_sg

This is a preparation patch for the new registration API
conversion. It splits srp_map_sg per registration strategy
(srp_map_sg[fmr|fr|dma]. On its own it adds some code duplication,
but it makes the API switch easier to comprehend.

Signed-off-by: Sagi Grimberg <[email protected]>
Tested-by: Bart Van Assche <[email protected]>
---
drivers/infiniband/ulp/srp/ib_srp.c | 157 ++++++++++++++++++++++++------------
1 file changed, 106 insertions(+), 51 deletions(-)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 1390f99ca76b..3ec94c109e1b 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -1286,6 +1286,17 @@ static int srp_map_finish_fmr(struct srp_map_state *state,
if (state->fmr.next >= state->fmr.end)
return -ENOMEM;

+ WARN_ON_ONCE(!dev->use_fmr);
+
+ if (state->npages == 0)
+ return 0;
+
+ if (state->npages == 1 && target->global_mr) {
+ srp_map_desc(state, state->base_dma_addr, state->dma_len,
+ target->global_mr->rkey);
+ goto reset_state;
+ }
+
fmr = ib_fmr_pool_map_phys(ch->fmr_pool, state->pages,
state->npages, io_addr);
if (IS_ERR(fmr))
@@ -1297,6 +1308,10 @@ static int srp_map_finish_fmr(struct srp_map_state *state,
srp_map_desc(state, state->base_dma_addr & ~dev->mr_page_mask,
state->dma_len, fmr->fmr->rkey);

+reset_state:
+ state->npages = 0;
+ state->dma_len = 0;
+
return 0;
}

@@ -1309,10 +1324,22 @@ static int srp_map_finish_fr(struct srp_map_state *state,
struct ib_fast_reg_wr wr;
struct srp_fr_desc *desc;
u32 rkey;
+ int err;

if (state->fr.next >= state->fr.end)
return -ENOMEM;

+ WARN_ON_ONCE(!dev->use_fast_reg);
+
+ if (state->npages == 0)
+ return 0;
+
+ if (state->npages == 1 && target->global_mr) {
+ srp_map_desc(state, state->base_dma_addr, state->dma_len,
+ target->global_mr->rkey);
+ goto reset_state;
+ }
+
desc = srp_fr_pool_get(ch->fr_pool);
if (!desc)
return -ENOMEM;
@@ -1342,7 +1369,15 @@ static int srp_map_finish_fr(struct srp_map_state *state,
srp_map_desc(state, state->base_dma_addr, state->dma_len,
desc->mr->rkey);

- return ib_post_send(ch->qp, &wr.wr, &bad_wr);
+ err = ib_post_send(ch->qp, &wr.wr, &bad_wr);
+ if (err)
+ return err;
+
+reset_state:
+ state->npages = 0;
+ state->dma_len = 0;
+
+ return 0;
}

static int srp_finish_mapping(struct srp_map_state *state,
@@ -1350,26 +1385,9 @@ static int srp_finish_mapping(struct srp_map_state *state,
{
struct srp_target_port *target = ch->target;
struct srp_device *dev = target->srp_host->srp_dev;
- int ret = 0;

- WARN_ON_ONCE(!dev->use_fast_reg && !dev->use_fmr);
-
- if (state->npages == 0)
- return 0;
-
- if (state->npages == 1 && target->global_mr)
- srp_map_desc(state, state->base_dma_addr, state->dma_len,
- target->global_mr->rkey);
- else
- ret = dev->use_fast_reg ? srp_map_finish_fr(state, ch) :
- srp_map_finish_fmr(state, ch);
-
- if (ret == 0) {
- state->npages = 0;
- state->dma_len = 0;
- }
-
- return ret;
+ return dev->use_fast_reg ? srp_map_finish_fr(state, ch) :
+ srp_map_finish_fmr(state, ch);
}

static int srp_map_sg_entry(struct srp_map_state *state,
@@ -1415,47 +1433,79 @@ static int srp_map_sg_entry(struct srp_map_state *state,
return ret;
}

-static int srp_map_sg(struct srp_map_state *state, struct srp_rdma_ch *ch,
- struct srp_request *req, struct scatterlist *scat,
- int count)
+static int srp_map_sg_fmr(struct srp_map_state *state, struct srp_rdma_ch *ch,
+ struct srp_request *req, struct scatterlist *scat,
+ int count)
{
- struct srp_target_port *target = ch->target;
- struct srp_device *dev = target->srp_host->srp_dev;
struct scatterlist *sg;
int i, ret;

- state->desc = req->indirect_desc;
- state->pages = req->map_page;
- if (dev->use_fast_reg) {
- state->fr.next = req->fr_list;
- state->fr.end = req->fr_list + target->cmd_sg_cnt;
- } else if (dev->use_fmr) {
- state->fmr.next = req->fmr_list;
- state->fmr.end = req->fmr_list + target->cmd_sg_cnt;
+ state->desc = req->indirect_desc;
+ state->pages = req->map_page;
+ state->fmr.next = req->fmr_list;
+ state->fmr.end = req->fmr_list + ch->target->cmd_sg_cnt;
+
+ for_each_sg(scat, sg, count, i) {
+ ret = srp_map_sg_entry(state, ch, sg, i);
+ if (ret)
+ return ret;
}

- if (dev->use_fast_reg || dev->use_fmr) {
- for_each_sg(scat, sg, count, i) {
- ret = srp_map_sg_entry(state, ch, sg, i);
- if (ret)
- goto out;
- }
- ret = srp_finish_mapping(state, ch);
+ ret = srp_finish_mapping(state, ch);
+ if (ret)
+ return ret;
+
+ req->nmdesc = state->nmdesc;
+
+ return 0;
+}
+
+static int srp_map_sg_fr(struct srp_map_state *state, struct srp_rdma_ch *ch,
+ struct srp_request *req, struct scatterlist *scat,
+ int count)
+{
+ struct scatterlist *sg;
+ int i, ret;
+
+ state->desc = req->indirect_desc;
+ state->pages = req->map_page;
+ state->fmr.next = req->fmr_list;
+ state->fmr.end = req->fmr_list + ch->target->cmd_sg_cnt;
+
+ for_each_sg(scat, sg, count, i) {
+ ret = srp_map_sg_entry(state, ch, sg, i);
if (ret)
- goto out;
- } else {
- for_each_sg(scat, sg, count, i) {
- srp_map_desc(state, ib_sg_dma_address(dev->dev, sg),
- ib_sg_dma_len(dev->dev, sg),
- target->global_mr->rkey);
- }
+ return ret;
}

+ ret = srp_finish_mapping(state, ch);
+ if (ret)
+ return ret;
+
req->nmdesc = state->nmdesc;
- ret = 0;

-out:
- return ret;
+ return 0;
+}
+
+static int srp_map_sg_dma(struct srp_map_state *state, struct srp_rdma_ch *ch,
+ struct srp_request *req, struct scatterlist *scat,
+ int count)
+{
+ struct srp_target_port *target = ch->target;
+ struct srp_device *dev = target->srp_host->srp_dev;
+ struct scatterlist *sg;
+ int i;
+
+ state->desc = req->indirect_desc;
+ for_each_sg(scat, sg, count, i) {
+ srp_map_desc(state, ib_sg_dma_address(dev->dev, sg),
+ ib_sg_dma_len(dev->dev, sg),
+ target->global_mr->rkey);
+ }
+
+ req->nmdesc = state->nmdesc;
+
+ return 0;
}

/*
@@ -1563,7 +1613,12 @@ static int srp_map_data(struct scsi_cmnd *scmnd, struct srp_rdma_ch *ch,
target->indirect_size, DMA_TO_DEVICE);

memset(&state, 0, sizeof(state));
- srp_map_sg(&state, ch, req, scat, count);
+ if (dev->use_fast_reg)
+ srp_map_sg_fr(&state, ch, req, scat, count);
+ else if (dev->use_fmr)
+ srp_map_sg_fmr(&state, ch, req, scat, count);
+ else
+ srp_map_sg_dma(&state, ch, req, scat, count);

/* We've mapped the request, now pull as much of the indirect
* descriptor table as we can into the command buffer. If this
--
1.8.4.3


2015-10-08 07:52:20

by Sagi Grimberg

[permalink] [raw]
Subject: [PATCH v3 20/26] IB/mlx4: Remove old FRWR API support

No ULP uses it anymore, go ahead and remove it.

Signed-off-by: Sagi Grimberg <[email protected]>
Acked-by: Christoph Hellwig <[email protected]>
---
drivers/infiniband/hw/mlx4/cq.c | 3 +--
drivers/infiniband/hw/mlx4/main.c | 2 --
drivers/infiniband/hw/mlx4/mlx4_ib.h | 15 -----------
drivers/infiniband/hw/mlx4/mr.c | 48 ------------------------------------
drivers/infiniband/hw/mlx4/qp.c | 31 -----------------------
5 files changed, 1 insertion(+), 98 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/cq.c b/drivers/infiniband/hw/mlx4/cq.c
index 2ea4125b7903..b88fc8f5ab18 100644
--- a/drivers/infiniband/hw/mlx4/cq.c
+++ b/drivers/infiniband/hw/mlx4/cq.c
@@ -818,8 +818,7 @@ repoll:
wc->opcode = IB_WC_LSO;
break;
case MLX4_OPCODE_FMR:
- wc->opcode = IB_WC_FAST_REG_MR;
- /* TODO: wc->opcode = IB_WC_REG_MR; */
+ wc->opcode = IB_WC_REG_MR;
break;
case MLX4_OPCODE_LOCAL_INVAL:
wc->opcode = IB_WC_LOCAL_INV;
diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index 19191ac0783c..0d7698e11329 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -2250,8 +2250,6 @@ static void *mlx4_ib_add(struct mlx4_dev *dev)
ibdev->ib_dev.dereg_mr = mlx4_ib_dereg_mr;
ibdev->ib_dev.alloc_mr = mlx4_ib_alloc_mr;
ibdev->ib_dev.map_mr_sg = mlx4_ib_map_mr_sg;
- ibdev->ib_dev.alloc_fast_reg_page_list = mlx4_ib_alloc_fast_reg_page_list;
- ibdev->ib_dev.free_fast_reg_page_list = mlx4_ib_free_fast_reg_page_list;
ibdev->ib_dev.attach_mcast = mlx4_ib_mcg_attach;
ibdev->ib_dev.detach_mcast = mlx4_ib_mcg_detach;
ibdev->ib_dev.process_mad = mlx4_ib_process_mad;
diff --git a/drivers/infiniband/hw/mlx4/mlx4_ib.h b/drivers/infiniband/hw/mlx4/mlx4_ib.h
index d6214577ecf8..4c6924791771 100644
--- a/drivers/infiniband/hw/mlx4/mlx4_ib.h
+++ b/drivers/infiniband/hw/mlx4/mlx4_ib.h
@@ -147,12 +147,6 @@ struct mlx4_ib_mw {
struct mlx4_mw mmw;
};

-struct mlx4_ib_fast_reg_page_list {
- struct ib_fast_reg_page_list ibfrpl;
- __be64 *mapped_page_list;
- dma_addr_t map;
-};
-
struct mlx4_ib_fmr {
struct ib_fmr ibfmr;
struct mlx4_fmr mfmr;
@@ -645,11 +639,6 @@ static inline struct mlx4_ib_mw *to_mmw(struct ib_mw *ibmw)
return container_of(ibmw, struct mlx4_ib_mw, ibmw);
}

-static inline struct mlx4_ib_fast_reg_page_list *to_mfrpl(struct ib_fast_reg_page_list *ibfrpl)
-{
- return container_of(ibfrpl, struct mlx4_ib_fast_reg_page_list, ibfrpl);
-}
-
static inline struct mlx4_ib_fmr *to_mfmr(struct ib_fmr *ibfmr)
{
return container_of(ibfmr, struct mlx4_ib_fmr, ibfmr);
@@ -716,10 +705,6 @@ struct ib_mr *mlx4_ib_alloc_mr(struct ib_pd *pd,
int mlx4_ib_map_mr_sg(struct ib_mr *ibmr,
struct scatterlist *sg,
unsigned int sg_nents);
-struct ib_fast_reg_page_list *mlx4_ib_alloc_fast_reg_page_list(struct ib_device *ibdev,
- int page_list_len);
-void mlx4_ib_free_fast_reg_page_list(struct ib_fast_reg_page_list *page_list);
-
int mlx4_ib_modify_cq(struct ib_cq *cq, u16 cq_count, u16 cq_period);
int mlx4_ib_resize_cq(struct ib_cq *ibcq, int entries, struct ib_udata *udata);
struct ib_cq *mlx4_ib_create_cq(struct ib_device *ibdev,
diff --git a/drivers/infiniband/hw/mlx4/mr.c b/drivers/infiniband/hw/mlx4/mr.c
index 96fc7ed99fb8..33f7ffac7b19 100644
--- a/drivers/infiniband/hw/mlx4/mr.c
+++ b/drivers/infiniband/hw/mlx4/mr.c
@@ -443,54 +443,6 @@ err_free:
return ERR_PTR(err);
}

-struct ib_fast_reg_page_list *mlx4_ib_alloc_fast_reg_page_list(struct ib_device *ibdev,
- int page_list_len)
-{
- struct mlx4_ib_dev *dev = to_mdev(ibdev);
- struct mlx4_ib_fast_reg_page_list *mfrpl;
- int size = page_list_len * sizeof (u64);
-
- if (page_list_len > MLX4_MAX_FAST_REG_PAGES)
- return ERR_PTR(-EINVAL);
-
- mfrpl = kmalloc(sizeof *mfrpl, GFP_KERNEL);
- if (!mfrpl)
- return ERR_PTR(-ENOMEM);
-
- mfrpl->ibfrpl.page_list = kmalloc(size, GFP_KERNEL);
- if (!mfrpl->ibfrpl.page_list)
- goto err_free;
-
- mfrpl->mapped_page_list = dma_alloc_coherent(&dev->dev->persist->
- pdev->dev,
- size, &mfrpl->map,
- GFP_KERNEL);
- if (!mfrpl->mapped_page_list)
- goto err_free;
-
- WARN_ON(mfrpl->map & 0x3f);
-
- return &mfrpl->ibfrpl;
-
-err_free:
- kfree(mfrpl->ibfrpl.page_list);
- kfree(mfrpl);
- return ERR_PTR(-ENOMEM);
-}
-
-void mlx4_ib_free_fast_reg_page_list(struct ib_fast_reg_page_list *page_list)
-{
- struct mlx4_ib_dev *dev = to_mdev(page_list->device);
- struct mlx4_ib_fast_reg_page_list *mfrpl = to_mfrpl(page_list);
- int size = page_list->max_page_list_len * sizeof (u64);
-
- dma_free_coherent(&dev->dev->persist->pdev->dev, size,
- mfrpl->mapped_page_list,
- mfrpl->map);
- kfree(mfrpl->ibfrpl.page_list);
- kfree(mfrpl);
-}
-
struct ib_fmr *mlx4_ib_fmr_alloc(struct ib_pd *pd, int acc,
struct ib_fmr_attr *fmr_attr)
{
diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c
index 850d4e9c13c5..b88a2b78ee88 100644
--- a/drivers/infiniband/hw/mlx4/qp.c
+++ b/drivers/infiniband/hw/mlx4/qp.c
@@ -111,7 +111,6 @@ static const __be32 mlx4_ib_opcode[] = {
[IB_WR_ATOMIC_FETCH_AND_ADD] = cpu_to_be32(MLX4_OPCODE_ATOMIC_FA),
[IB_WR_SEND_WITH_INV] = cpu_to_be32(MLX4_OPCODE_SEND_INVAL),
[IB_WR_LOCAL_INV] = cpu_to_be32(MLX4_OPCODE_LOCAL_INVAL),
- [IB_WR_FAST_REG_MR] = cpu_to_be32(MLX4_OPCODE_FMR),
[IB_WR_REG_MR] = cpu_to_be32(MLX4_OPCODE_FMR),
[IB_WR_MASKED_ATOMIC_CMP_AND_SWP] = cpu_to_be32(MLX4_OPCODE_MASKED_ATOMIC_CS),
[IB_WR_MASKED_ATOMIC_FETCH_AND_ADD] = cpu_to_be32(MLX4_OPCODE_MASKED_ATOMIC_FA),
@@ -2422,28 +2421,6 @@ static void set_reg_seg(struct mlx4_wqe_fmr_seg *fseg,
fseg->reserved[1] = 0;
}

-static void set_fmr_seg(struct mlx4_wqe_fmr_seg *fseg,
- struct ib_fast_reg_wr *wr)
-{
- struct mlx4_ib_fast_reg_page_list *mfrpl = to_mfrpl(wr->page_list);
- int i;
-
- for (i = 0; i < wr->page_list_len; ++i)
- mfrpl->mapped_page_list[i] =
- cpu_to_be64(wr->page_list->page_list[i] |
- MLX4_MTT_FLAG_PRESENT);
-
- fseg->flags = convert_access(wr->access_flags);
- fseg->mem_key = cpu_to_be32(wr->rkey);
- fseg->buf_list = cpu_to_be64(mfrpl->map);
- fseg->start_addr = cpu_to_be64(wr->iova_start);
- fseg->reg_len = cpu_to_be64(wr->length);
- fseg->offset = 0; /* XXX -- is this just for ZBVA? */
- fseg->page_size = cpu_to_be32(wr->page_shift);
- fseg->reserved[0] = 0;
- fseg->reserved[1] = 0;
-}
-
static void set_bind_seg(struct mlx4_wqe_bind_seg *bseg,
struct ib_bind_mw_wr *wr)
{
@@ -2775,14 +2752,6 @@ int mlx4_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
size += sizeof (struct mlx4_wqe_local_inval_seg) / 16;
break;

- case IB_WR_FAST_REG_MR:
- ctrl->srcrb_flags |=
- cpu_to_be32(MLX4_WQE_CTRL_STRONG_ORDER);
- set_fmr_seg(wqe, fast_reg_wr(wr));
- wqe += sizeof (struct mlx4_wqe_fmr_seg);
- size += sizeof (struct mlx4_wqe_fmr_seg) / 16;
- break;
-
case IB_WR_REG_MR:
ctrl->srcrb_flags |=
cpu_to_be32(MLX4_WQE_CTRL_STRONG_ORDER);
--
1.8.4.3


2015-10-08 07:52:20

by Sagi Grimberg

[permalink] [raw]
Subject: [PATCH v3 21/26] RDMA/ocrdma: Remove old FRWR API

No ULP uses it anymore, go ahead and remove it.

Signed-off-by: Sagi Grimberg <[email protected]>
Acked-by: Christoph Hellwig <[email protected]>
---
drivers/infiniband/hw/ocrdma/ocrdma_main.c | 2 -
drivers/infiniband/hw/ocrdma/ocrdma_verbs.c | 104 +---------------------------
drivers/infiniband/hw/ocrdma/ocrdma_verbs.h | 4 --
3 files changed, 1 insertion(+), 109 deletions(-)

diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_main.c b/drivers/infiniband/hw/ocrdma/ocrdma_main.c
index 874beb4b07a1..9bf430ef8eb6 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_main.c
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_main.c
@@ -183,8 +183,6 @@ static int ocrdma_register_device(struct ocrdma_dev *dev)

dev->ibdev.alloc_mr = ocrdma_alloc_mr;
dev->ibdev.map_mr_sg = ocrdma_map_mr_sg;
- dev->ibdev.alloc_fast_reg_page_list = ocrdma_alloc_frmr_page_list;
- dev->ibdev.free_fast_reg_page_list = ocrdma_free_frmr_page_list;

/* mandatory to support user space verbs consumer. */
dev->ibdev.alloc_ucontext = ocrdma_alloc_ucontext;
diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
index 66cc15a78f63..0b8598c9d56b 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
@@ -2133,41 +2133,6 @@ static void ocrdma_build_read(struct ocrdma_qp *qp, struct ocrdma_hdr_wqe *hdr,
ext_rw->len = hdr->total_len;
}

-static void build_frmr_pbes(struct ib_fast_reg_wr *wr,
- struct ocrdma_pbl *pbl_tbl,
- struct ocrdma_hw_mr *hwmr)
-{
- int i;
- u64 buf_addr = 0;
- int num_pbes;
- struct ocrdma_pbe *pbe;
-
- pbe = (struct ocrdma_pbe *)pbl_tbl->va;
- num_pbes = 0;
-
- /* go through the OS phy regions & fill hw pbe entries into pbls. */
- for (i = 0; i < wr->page_list_len; i++) {
- /* number of pbes can be more for one OS buf, when
- * buffers are of different sizes.
- * split the ib_buf to one or more pbes.
- */
- buf_addr = wr->page_list->page_list[i];
- pbe->pa_lo = cpu_to_le32((u32) (buf_addr & PAGE_MASK));
- pbe->pa_hi = cpu_to_le32((u32) upper_32_bits(buf_addr));
- num_pbes += 1;
- pbe++;
-
- /* if the pbl is full storing the pbes,
- * move to next pbl.
- */
- if (num_pbes == (hwmr->pbl_size/sizeof(u64))) {
- pbl_tbl++;
- pbe = (struct ocrdma_pbe *)pbl_tbl->va;
- }
- }
- return;
-}
-
static int get_encoded_page_size(int pg_sz)
{
/* Max size is 256M 4096 << 16 */
@@ -2234,50 +2199,6 @@ static int ocrdma_build_reg(struct ocrdma_qp *qp,
return 0;
}

-static int ocrdma_build_fr(struct ocrdma_qp *qp, struct ocrdma_hdr_wqe *hdr,
- struct ib_send_wr *send_wr)
-{
- u64 fbo;
- struct ib_fast_reg_wr *wr = fast_reg_wr(send_wr);
- struct ocrdma_ewqe_fr *fast_reg = (struct ocrdma_ewqe_fr *)(hdr + 1);
- struct ocrdma_mr *mr;
- struct ocrdma_dev *dev = get_ocrdma_dev(qp->ibqp.device);
- u32 wqe_size = sizeof(*fast_reg) + sizeof(*hdr);
-
- wqe_size = roundup(wqe_size, OCRDMA_WQE_ALIGN_BYTES);
-
- if (wr->page_list_len > dev->attr.max_pages_per_frmr)
- return -EINVAL;
-
- hdr->cw |= (OCRDMA_FR_MR << OCRDMA_WQE_OPCODE_SHIFT);
- hdr->cw |= ((wqe_size / OCRDMA_WQE_STRIDE) << OCRDMA_WQE_SIZE_SHIFT);
-
- if (wr->page_list_len == 0)
- BUG();
- if (wr->access_flags & IB_ACCESS_LOCAL_WRITE)
- hdr->rsvd_lkey_flags |= OCRDMA_LKEY_FLAG_LOCAL_WR;
- if (wr->access_flags & IB_ACCESS_REMOTE_WRITE)
- hdr->rsvd_lkey_flags |= OCRDMA_LKEY_FLAG_REMOTE_WR;
- if (wr->access_flags & IB_ACCESS_REMOTE_READ)
- hdr->rsvd_lkey_flags |= OCRDMA_LKEY_FLAG_REMOTE_RD;
- hdr->lkey = wr->rkey;
- hdr->total_len = wr->length;
-
- fbo = wr->iova_start - (wr->page_list->page_list[0] & PAGE_MASK);
-
- fast_reg->va_hi = upper_32_bits(wr->iova_start);
- fast_reg->va_lo = (u32) (wr->iova_start & 0xffffffff);
- fast_reg->fbo_hi = upper_32_bits(fbo);
- fast_reg->fbo_lo = (u32) fbo & 0xffffffff;
- fast_reg->num_sges = wr->page_list_len;
- fast_reg->size_sge =
- get_encoded_page_size(1 << wr->page_shift);
- mr = (struct ocrdma_mr *) (unsigned long)
- dev->stag_arr[(hdr->lkey >> 8) & (OCRDMA_MAX_STAG - 1)];
- build_frmr_pbes(wr, mr->hwmr.pbl_table, &mr->hwmr);
- return 0;
-}
-
static void ocrdma_ring_sq_db(struct ocrdma_qp *qp)
{
u32 val = qp->sq.dbid | (1 << OCRDMA_DB_SQ_SHIFT);
@@ -2357,9 +2278,6 @@ int ocrdma_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
OCRDMA_WQE_STRIDE) << OCRDMA_WQE_SIZE_SHIFT;
hdr->lkey = wr->ex.invalidate_rkey;
break;
- case IB_WR_FAST_REG_MR:
- status = ocrdma_build_fr(qp, hdr, wr);
- break;
case IB_WR_REG_MR:
status = ocrdma_build_reg(qp, hdr, reg_wr(wr));
break;
@@ -2627,7 +2545,7 @@ static void ocrdma_update_wc(struct ocrdma_qp *qp, struct ib_wc *ibwc,
ibwc->opcode = IB_WC_SEND;
break;
case OCRDMA_FR_MR:
- ibwc->opcode = IB_WC_FAST_REG_MR;
+ ibwc->opcode = IB_WC_REG_MR;
break;
case OCRDMA_LKEY_INV:
ibwc->opcode = IB_WC_LOCAL_INV;
@@ -3153,26 +3071,6 @@ pl_err:
return ERR_PTR(-ENOMEM);
}

-struct ib_fast_reg_page_list *ocrdma_alloc_frmr_page_list(struct ib_device
- *ibdev,
- int page_list_len)
-{
- struct ib_fast_reg_page_list *frmr_list;
- int size;
-
- size = sizeof(*frmr_list) + (page_list_len * sizeof(u64));
- frmr_list = kzalloc(size, GFP_KERNEL);
- if (!frmr_list)
- return ERR_PTR(-ENOMEM);
- frmr_list->page_list = (u64 *)(frmr_list + 1);
- return frmr_list;
-}
-
-void ocrdma_free_frmr_page_list(struct ib_fast_reg_page_list *page_list)
-{
- kfree(page_list);
-}
-
#define MAX_KERNEL_PBE_SIZE 65536
static inline int count_kernel_pbes(struct ib_phys_buf *buf_list,
int buf_cnt, u32 *pbe_size)
diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h
index 4edf63f9c6c7..62d6608830cb 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h
@@ -128,9 +128,5 @@ struct ib_mr *ocrdma_alloc_mr(struct ib_pd *pd,
int ocrdma_map_mr_sg(struct ib_mr *ibmr,
struct scatterlist *sg,
unsigned int sg_nents);
-struct ib_fast_reg_page_list *ocrdma_alloc_frmr_page_list(struct ib_device
- *ibdev,
- int page_list_len);
-void ocrdma_free_frmr_page_list(struct ib_fast_reg_page_list *page_list);

#endif /* __OCRDMA_VERBS_H__ */
--
1.8.4.3


2015-10-08 07:52:25

by Sagi Grimberg

[permalink] [raw]
Subject: [PATCH v3 16/26] IB/srp: Convert to new registration API

Instead of constructing a page list, call ib_map_mr_sg
and post a new ib_reg_wr. srp_map_finish_fr now returns
the number of sg elements registered.

Remove srp_finish_mapping since no one is calling it.

Signed-off-by: Sagi Grimberg <[email protected]>
Tested-by: Bart Van Assche <[email protected]>
---
drivers/infiniband/ulp/srp/ib_srp.c | 125 ++++++++++++++++++------------------
drivers/infiniband/ulp/srp/ib_srp.h | 11 +++-
2 files changed, 71 insertions(+), 65 deletions(-)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 3ec94c109e1b..6d399378928d 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -340,8 +340,6 @@ static void srp_destroy_fr_pool(struct srp_fr_pool *pool)
return;

for (i = 0, d = &pool->desc[0]; i < pool->size; i++, d++) {
- if (d->frpl)
- ib_free_fast_reg_page_list(d->frpl);
if (d->mr)
ib_dereg_mr(d->mr);
}
@@ -362,7 +360,6 @@ static struct srp_fr_pool *srp_create_fr_pool(struct ib_device *device,
struct srp_fr_pool *pool;
struct srp_fr_desc *d;
struct ib_mr *mr;
- struct ib_fast_reg_page_list *frpl;
int i, ret = -EINVAL;

if (pool_size <= 0)
@@ -385,12 +382,6 @@ static struct srp_fr_pool *srp_create_fr_pool(struct ib_device *device,
goto destroy_pool;
}
d->mr = mr;
- frpl = ib_alloc_fast_reg_page_list(device, max_page_list_len);
- if (IS_ERR(frpl)) {
- ret = PTR_ERR(frpl);
- goto destroy_pool;
- }
- d->frpl = frpl;
list_add_tail(&d->entry, &pool->free_list);
}

@@ -1321,23 +1312,24 @@ static int srp_map_finish_fr(struct srp_map_state *state,
struct srp_target_port *target = ch->target;
struct srp_device *dev = target->srp_host->srp_dev;
struct ib_send_wr *bad_wr;
- struct ib_fast_reg_wr wr;
+ struct ib_reg_wr wr;
struct srp_fr_desc *desc;
u32 rkey;
- int err;
+ int n, err;

if (state->fr.next >= state->fr.end)
return -ENOMEM;

WARN_ON_ONCE(!dev->use_fast_reg);

- if (state->npages == 0)
+ if (state->sg_nents == 0)
return 0;

- if (state->npages == 1 && target->global_mr) {
- srp_map_desc(state, state->base_dma_addr, state->dma_len,
+ if (state->sg_nents == 1 && target->global_mr) {
+ srp_map_desc(state, sg_dma_address(state->sg),
+ sg_dma_len(state->sg),
target->global_mr->rkey);
- goto reset_state;
+ return 1;
}

desc = srp_fr_pool_get(ch->fr_pool);
@@ -1347,37 +1339,33 @@ static int srp_map_finish_fr(struct srp_map_state *state,
rkey = ib_inc_rkey(desc->mr->rkey);
ib_update_fast_reg_key(desc->mr, rkey);

- memcpy(desc->frpl->page_list, state->pages,
- sizeof(state->pages[0]) * state->npages);
+ n = ib_map_mr_sg(desc->mr, state->sg, state->sg_nents,
+ dev->mr_page_size);
+ if (unlikely(n < 0))
+ return n;

- memset(&wr, 0, sizeof(wr));
- wr.wr.opcode = IB_WR_FAST_REG_MR;
+ wr.wr.next = NULL;
+ wr.wr.opcode = IB_WR_REG_MR;
wr.wr.wr_id = FAST_REG_WR_ID_MASK;
- wr.iova_start = state->base_dma_addr;
- wr.page_list = desc->frpl;
- wr.page_list_len = state->npages;
- wr.page_shift = ilog2(dev->mr_page_size);
- wr.length = state->dma_len;
- wr.access_flags = (IB_ACCESS_LOCAL_WRITE |
- IB_ACCESS_REMOTE_READ |
- IB_ACCESS_REMOTE_WRITE);
- wr.rkey = desc->mr->lkey;
+ wr.wr.num_sge = 0;
+ wr.wr.send_flags = 0;
+ wr.mr = desc->mr;
+ wr.key = desc->mr->rkey;
+ wr.access = (IB_ACCESS_LOCAL_WRITE |
+ IB_ACCESS_REMOTE_READ |
+ IB_ACCESS_REMOTE_WRITE);

*state->fr.next++ = desc;
state->nmdesc++;

- srp_map_desc(state, state->base_dma_addr, state->dma_len,
- desc->mr->rkey);
+ srp_map_desc(state, desc->mr->iova,
+ desc->mr->length, desc->mr->rkey);

err = ib_post_send(ch->qp, &wr.wr, &bad_wr);
- if (err)
+ if (unlikely(err))
return err;

-reset_state:
- state->npages = 0;
- state->dma_len = 0;
-
- return 0;
+ return n;
}

static int srp_finish_mapping(struct srp_map_state *state,
@@ -1407,7 +1395,7 @@ static int srp_map_sg_entry(struct srp_map_state *state,
while (dma_len) {
unsigned offset = dma_addr & ~dev->mr_page_mask;
if (state->npages == dev->max_pages_per_mr || offset != 0) {
- ret = srp_finish_mapping(state, ch);
+ ret = srp_map_finish_fmr(state, ch);
if (ret)
return ret;
}
@@ -1429,7 +1417,7 @@ static int srp_map_sg_entry(struct srp_map_state *state,
*/
ret = 0;
if (len != dev->mr_page_size)
- ret = srp_finish_mapping(state, ch);
+ ret = srp_map_finish_fmr(state, ch);
return ret;
}

@@ -1451,7 +1439,7 @@ static int srp_map_sg_fmr(struct srp_map_state *state, struct srp_rdma_ch *ch,
return ret;
}

- ret = srp_finish_mapping(state, ch);
+ ret = srp_map_finish_fmr(state, ch);
if (ret)
return ret;

@@ -1464,23 +1452,23 @@ static int srp_map_sg_fr(struct srp_map_state *state, struct srp_rdma_ch *ch,
struct srp_request *req, struct scatterlist *scat,
int count)
{
- struct scatterlist *sg;
- int i, ret;
-
state->desc = req->indirect_desc;
- state->pages = req->map_page;
- state->fmr.next = req->fmr_list;
- state->fmr.end = req->fmr_list + ch->target->cmd_sg_cnt;
+ state->fr.next = req->fr_list;
+ state->fr.end = req->fr_list + ch->target->cmd_sg_cnt;
+ state->sg = scat;
+ state->sg_nents = scsi_sg_count(req->scmnd);

- for_each_sg(scat, sg, count, i) {
- ret = srp_map_sg_entry(state, ch, sg, i);
- if (ret)
- return ret;
- }
+ while (state->sg_nents) {
+ int i, n;

- ret = srp_finish_mapping(state, ch);
- if (ret)
- return ret;
+ n = srp_map_finish_fr(state, ch);
+ if (unlikely(n < 0))
+ return n;
+
+ state->sg_nents -= n;
+ for (i = 0; i < n; i++)
+ state->sg = sg_next(state->sg);
+ }

req->nmdesc = state->nmdesc;

@@ -1524,6 +1512,7 @@ static int srp_map_idb(struct srp_rdma_ch *ch, struct srp_request *req,
struct srp_map_state state;
struct srp_direct_buf idb_desc;
u64 idb_pages[1];
+ struct scatterlist idb_sg[1];
int ret;

memset(&state, 0, sizeof(state));
@@ -1531,20 +1520,32 @@ static int srp_map_idb(struct srp_rdma_ch *ch, struct srp_request *req,
state.gen.next = next_mr;
state.gen.end = end_mr;
state.desc = &idb_desc;
- state.pages = idb_pages;
- state.pages[0] = (req->indirect_dma_addr &
- dev->mr_page_mask);
- state.npages = 1;
state.base_dma_addr = req->indirect_dma_addr;
state.dma_len = idb_len;
- ret = srp_finish_mapping(&state, ch);
- if (ret < 0)
- goto out;
+
+ if (dev->use_fast_reg) {
+ state.sg = idb_sg;
+ state.sg_nents = 1;
+ sg_set_buf(idb_sg, req->indirect_desc, idb_len);
+ idb_sg->dma_address = req->indirect_dma_addr; /* hack! */
+ ret = srp_map_finish_fr(&state, ch);
+ if (ret < 0)
+ return ret;
+ } else if (dev->use_fmr) {
+ state.pages = idb_pages;
+ state.pages[0] = (req->indirect_dma_addr &
+ dev->mr_page_mask);
+ state.npages = 1;
+ ret = srp_map_finish_fmr(&state, ch);
+ if (ret < 0)
+ return ret;
+ } else {
+ return -EINVAL;
+ }

*idb_rkey = idb_desc.key;

-out:
- return ret;
+ return 0;
}

static int srp_map_data(struct scsi_cmnd *scmnd, struct srp_rdma_ch *ch,
diff --git a/drivers/infiniband/ulp/srp/ib_srp.h b/drivers/infiniband/ulp/srp/ib_srp.h
index 3608f2e4819c..a31a93716f3f 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.h
+++ b/drivers/infiniband/ulp/srp/ib_srp.h
@@ -242,7 +242,6 @@ struct srp_iu {
struct srp_fr_desc {
struct list_head entry;
struct ib_mr *mr;
- struct ib_fast_reg_page_list *frpl;
};

/**
@@ -294,11 +293,17 @@ struct srp_map_state {
} gen;
};
struct srp_direct_buf *desc;
- u64 *pages;
+ union {
+ u64 *pages;
+ struct scatterlist *sg;
+ };
dma_addr_t base_dma_addr;
u32 dma_len;
u32 total_len;
- unsigned int npages;
+ union {
+ unsigned int npages;
+ unsigned int sg_nents;
+ };
unsigned int nmdesc;
unsigned int ndesc;
};
--
1.8.4.3


2015-10-08 07:52:25

by Sagi Grimberg

[permalink] [raw]
Subject: [PATCH v3 22/26] RDMA/cxgb3: Remove old FRWR API

No ULP uses it anymore, go ahead and remove it.

Signed-off-by: Sagi Grimberg <[email protected]>
Acked-by: Christoph Hellwig <[email protected]>
---
drivers/infiniband/hw/cxgb3/iwch_cq.c | 2 +-
drivers/infiniband/hw/cxgb3/iwch_provider.c | 24 ---------------
drivers/infiniband/hw/cxgb3/iwch_qp.c | 47 -----------------------------
3 files changed, 1 insertion(+), 72 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb3/iwch_cq.c b/drivers/infiniband/hw/cxgb3/iwch_cq.c
index cf5474ae68ff..cfe404925a39 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_cq.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_cq.c
@@ -123,7 +123,7 @@ static int iwch_poll_cq_one(struct iwch_dev *rhp, struct iwch_cq *chp,
wc->opcode = IB_WC_LOCAL_INV;
break;
case T3_FAST_REGISTER:
- wc->opcode = IB_WC_FAST_REG_MR;
+ wc->opcode = IB_WC_REG_MR;
break;
default:
printk(KERN_ERR MOD "Unexpected opcode %d "
diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c b/drivers/infiniband/hw/cxgb3/iwch_provider.c
index ee3d5ca7de6c..99ae2ab14b9e 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_provider.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c
@@ -884,28 +884,6 @@ static int iwch_map_mr_sg(struct ib_mr *ibmr,
return ib_sg_to_pages(ibmr, sg, sg_nents, iwch_set_page);
}

-static struct ib_fast_reg_page_list *iwch_alloc_fastreg_pbl(
- struct ib_device *device,
- int page_list_len)
-{
- struct ib_fast_reg_page_list *page_list;
-
- page_list = kmalloc(sizeof *page_list + page_list_len * sizeof(u64),
- GFP_KERNEL);
- if (!page_list)
- return ERR_PTR(-ENOMEM);
-
- page_list->page_list = (u64 *)(page_list + 1);
- page_list->max_page_list_len = page_list_len;
-
- return page_list;
-}
-
-static void iwch_free_fastreg_pbl(struct ib_fast_reg_page_list *page_list)
-{
- kfree(page_list);
-}
-
static int iwch_destroy_qp(struct ib_qp *ib_qp)
{
struct iwch_dev *rhp;
@@ -1483,8 +1461,6 @@ int iwch_register_device(struct iwch_dev *dev)
dev->ibdev.dealloc_mw = iwch_dealloc_mw;
dev->ibdev.alloc_mr = iwch_alloc_mr;
dev->ibdev.map_mr_sg = iwch_map_mr_sg;
- dev->ibdev.alloc_fast_reg_page_list = iwch_alloc_fastreg_pbl;
- dev->ibdev.free_fast_reg_page_list = iwch_free_fastreg_pbl;
dev->ibdev.attach_mcast = iwch_multicast_attach;
dev->ibdev.detach_mcast = iwch_multicast_detach;
dev->ibdev.process_mad = iwch_process_mad;
diff --git a/drivers/infiniband/hw/cxgb3/iwch_qp.c b/drivers/infiniband/hw/cxgb3/iwch_qp.c
index a09ea538e990..d0548fc6395e 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_qp.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_qp.c
@@ -189,48 +189,6 @@ static int build_memreg(union t3_wr *wqe, struct ib_reg_wr *wr,
return 0;
}

-static int build_fastreg(union t3_wr *wqe, struct ib_send_wr *send_wr,
- u8 *flit_cnt, int *wr_cnt, struct t3_wq *wq)
-{
- struct ib_fast_reg_wr *wr = fast_reg_wr(send_wr);
- int i;
- __be64 *p;
-
- if (wr->page_list_len > T3_MAX_FASTREG_DEPTH)
- return -EINVAL;
- *wr_cnt = 1;
- wqe->fastreg.stag = cpu_to_be32(wr->rkey);
- wqe->fastreg.len = cpu_to_be32(wr->length);
- wqe->fastreg.va_base_hi = cpu_to_be32(wr->iova_start >> 32);
- wqe->fastreg.va_base_lo_fbo = cpu_to_be32(wr->iova_start & 0xffffffff);
- wqe->fastreg.page_type_perms = cpu_to_be32(
- V_FR_PAGE_COUNT(wr->page_list_len) |
- V_FR_PAGE_SIZE(wr->page_shift-12) |
- V_FR_TYPE(TPT_VATO) |
- V_FR_PERMS(iwch_ib_to_tpt_access(wr->access_flags)));
- p = &wqe->fastreg.pbl_addrs[0];
- for (i = 0; i < wr->page_list_len; i++, p++) {
-
- /* If we need a 2nd WR, then set it up */
- if (i == T3_MAX_FASTREG_FRAG) {
- *wr_cnt = 2;
- wqe = (union t3_wr *)(wq->queue +
- Q_PTR2IDX((wq->wptr+1), wq->size_log2));
- build_fw_riwrh((void *)wqe, T3_WR_FASTREG, 0,
- Q_GENBIT(wq->wptr + 1, wq->size_log2),
- 0, 1 + wr->page_list_len - T3_MAX_FASTREG_FRAG,
- T3_EOP);
-
- p = &wqe->pbl_frag.pbl_addrs[0];
- }
- *p = cpu_to_be64((u64)wr->page_list->page_list[i]);
- }
- *flit_cnt = 5 + wr->page_list_len;
- if (*flit_cnt > 15)
- *flit_cnt = 15;
- return 0;
-}
-
static int build_inv_stag(union t3_wr *wqe, struct ib_send_wr *wr,
u8 *flit_cnt)
{
@@ -457,11 +415,6 @@ int iwch_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
if (!qhp->wq.oldest_read)
qhp->wq.oldest_read = sqp;
break;
- case IB_WR_FAST_REG_MR:
- t3_wr_opcode = T3_WR_FASTREG;
- err = build_fastreg(wqe, wr, &t3_wr_flit_cnt,
- &wr_cnt, &qhp->wq);
- break;
case IB_WR_REG_MR:
t3_wr_opcode = T3_WR_FASTREG;
err = build_memreg(wqe, reg_wr(wr), &t3_wr_flit_cnt,
--
1.8.4.3


2015-10-08 07:52:25

by Sagi Grimberg

[permalink] [raw]
Subject: [PATCH v3 23/26] iw_cxgb4: Remove old FRWR API

No ULP uses it anymore, go ahead and remove it.

Signed-off-by: Sagi Grimberg <[email protected]>
Acked-by: Christoph Hellwig <[email protected]>
---
drivers/infiniband/hw/cxgb4/cq.c | 2 +-
drivers/infiniband/hw/cxgb4/iw_cxgb4.h | 18 --------
drivers/infiniband/hw/cxgb4/mem.c | 45 --------------------
drivers/infiniband/hw/cxgb4/provider.c | 2 -
drivers/infiniband/hw/cxgb4/qp.c | 77 ----------------------------------
5 files changed, 1 insertion(+), 143 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb4/cq.c b/drivers/infiniband/hw/cxgb4/cq.c
index 92d518382a9f..de9cd6901752 100644
--- a/drivers/infiniband/hw/cxgb4/cq.c
+++ b/drivers/infiniband/hw/cxgb4/cq.c
@@ -752,7 +752,7 @@ static int c4iw_poll_cq_one(struct c4iw_cq *chp, struct ib_wc *wc)
wc->opcode = IB_WC_LOCAL_INV;
break;
case FW_RI_FAST_REGISTER:
- wc->opcode = IB_WC_FAST_REG_MR;
+ wc->opcode = IB_WC_REG_MR;
break;
default:
printk(KERN_ERR MOD "Unexpected opcode %d "
diff --git a/drivers/infiniband/hw/cxgb4/iw_cxgb4.h b/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
index 032f90aa8ac9..699c52b875b1 100644
--- a/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
+++ b/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
@@ -409,20 +409,6 @@ static inline struct c4iw_mw *to_c4iw_mw(struct ib_mw *ibmw)
return container_of(ibmw, struct c4iw_mw, ibmw);
}

-struct c4iw_fr_page_list {
- struct ib_fast_reg_page_list ibpl;
- DEFINE_DMA_UNMAP_ADDR(mapping);
- dma_addr_t dma_addr;
- struct c4iw_dev *dev;
- int pll_len;
-};
-
-static inline struct c4iw_fr_page_list *to_c4iw_fr_page_list(
- struct ib_fast_reg_page_list *ibpl)
-{
- return container_of(ibpl, struct c4iw_fr_page_list, ibpl);
-}
-
struct c4iw_cq {
struct ib_cq ibcq;
struct c4iw_dev *rhp;
@@ -970,10 +956,6 @@ int c4iw_accept_cr(struct iw_cm_id *cm_id, struct iw_cm_conn_param *conn_param);
int c4iw_reject_cr(struct iw_cm_id *cm_id, const void *pdata, u8 pdata_len);
void c4iw_qp_add_ref(struct ib_qp *qp);
void c4iw_qp_rem_ref(struct ib_qp *qp);
-void c4iw_free_fastreg_pbl(struct ib_fast_reg_page_list *page_list);
-struct ib_fast_reg_page_list *c4iw_alloc_fastreg_pbl(
- struct ib_device *device,
- int page_list_len);
struct ib_mr *c4iw_alloc_mr(struct ib_pd *pd,
enum ib_mr_type mr_type,
u32 max_num_sg);
diff --git a/drivers/infiniband/hw/cxgb4/mem.c b/drivers/infiniband/hw/cxgb4/mem.c
index 86ec65721797..ada42ed425a0 100644
--- a/drivers/infiniband/hw/cxgb4/mem.c
+++ b/drivers/infiniband/hw/cxgb4/mem.c
@@ -945,51 +945,6 @@ int c4iw_map_mr_sg(struct ib_mr *ibmr,
return ib_sg_to_pages(ibmr, sg, sg_nents, c4iw_set_page);
}

-struct ib_fast_reg_page_list *c4iw_alloc_fastreg_pbl(struct ib_device *device,
- int page_list_len)
-{
- struct c4iw_fr_page_list *c4pl;
- struct c4iw_dev *dev = to_c4iw_dev(device);
- dma_addr_t dma_addr;
- int pll_len = roundup(page_list_len * sizeof(u64), 32);
-
- c4pl = kmalloc(sizeof(*c4pl), GFP_KERNEL);
- if (!c4pl)
- return ERR_PTR(-ENOMEM);
-
- c4pl->ibpl.page_list = dma_alloc_coherent(&dev->rdev.lldi.pdev->dev,
- pll_len, &dma_addr,
- GFP_KERNEL);
- if (!c4pl->ibpl.page_list) {
- kfree(c4pl);
- return ERR_PTR(-ENOMEM);
- }
- dma_unmap_addr_set(c4pl, mapping, dma_addr);
- c4pl->dma_addr = dma_addr;
- c4pl->dev = dev;
- c4pl->pll_len = pll_len;
-
- PDBG("%s c4pl %p pll_len %u page_list %p dma_addr %pad\n",
- __func__, c4pl, c4pl->pll_len, c4pl->ibpl.page_list,
- &c4pl->dma_addr);
-
- return &c4pl->ibpl;
-}
-
-void c4iw_free_fastreg_pbl(struct ib_fast_reg_page_list *ibpl)
-{
- struct c4iw_fr_page_list *c4pl = to_c4iw_fr_page_list(ibpl);
-
- PDBG("%s c4pl %p pll_len %u page_list %p dma_addr %pad\n",
- __func__, c4pl, c4pl->pll_len, c4pl->ibpl.page_list,
- &c4pl->dma_addr);
-
- dma_free_coherent(&c4pl->dev->rdev.lldi.pdev->dev,
- c4pl->pll_len,
- c4pl->ibpl.page_list, dma_unmap_addr(c4pl, mapping));
- kfree(c4pl);
-}
-
int c4iw_dereg_mr(struct ib_mr *ib_mr)
{
struct c4iw_dev *rhp;
diff --git a/drivers/infiniband/hw/cxgb4/provider.c b/drivers/infiniband/hw/cxgb4/provider.c
index 55dedadcffaa..8f115b405d76 100644
--- a/drivers/infiniband/hw/cxgb4/provider.c
+++ b/drivers/infiniband/hw/cxgb4/provider.c
@@ -558,8 +558,6 @@ int c4iw_register_device(struct c4iw_dev *dev)
dev->ibdev.dealloc_mw = c4iw_dealloc_mw;
dev->ibdev.alloc_mr = c4iw_alloc_mr;
dev->ibdev.map_mr_sg = c4iw_map_mr_sg;
- dev->ibdev.alloc_fast_reg_page_list = c4iw_alloc_fastreg_pbl;
- dev->ibdev.free_fast_reg_page_list = c4iw_free_fastreg_pbl;
dev->ibdev.attach_mcast = c4iw_multicast_attach;
dev->ibdev.detach_mcast = c4iw_multicast_detach;
dev->ibdev.process_mad = c4iw_process_mad;
diff --git a/drivers/infiniband/hw/cxgb4/qp.c b/drivers/infiniband/hw/cxgb4/qp.c
index 12a23bc935f3..c7d05a933e2d 100644
--- a/drivers/infiniband/hw/cxgb4/qp.c
+++ b/drivers/infiniband/hw/cxgb4/qp.c
@@ -670,75 +670,6 @@ static int build_memreg(struct t4_sq *sq, union t4_wr *wqe,
return 0;
}

-static int build_fastreg(struct t4_sq *sq, union t4_wr *wqe,
- struct ib_send_wr *send_wr, u8 *len16, u8 t5dev)
-{
- struct ib_fast_reg_wr *wr = fast_reg_wr(send_wr);
-
- struct fw_ri_immd *imdp;
- __be64 *p;
- int i;
- int pbllen = roundup(wr->page_list_len * sizeof(u64), 32);
- int rem;
-
- if (wr->page_list_len > t4_max_fr_depth(use_dsgl))
- return -EINVAL;
-
- wqe->fr.qpbinde_to_dcacpu = 0;
- wqe->fr.pgsz_shift = wr->page_shift - 12;
- wqe->fr.addr_type = FW_RI_VA_BASED_TO;
- wqe->fr.mem_perms = c4iw_ib_to_tpt_access(wr->access_flags);
- wqe->fr.len_hi = 0;
- wqe->fr.len_lo = cpu_to_be32(wr->length);
- wqe->fr.stag = cpu_to_be32(wr->rkey);
- wqe->fr.va_hi = cpu_to_be32(wr->iova_start >> 32);
- wqe->fr.va_lo_fbo = cpu_to_be32(wr->iova_start & 0xffffffff);
-
- if (t5dev && use_dsgl && (pbllen > max_fr_immd)) {
- struct c4iw_fr_page_list *c4pl =
- to_c4iw_fr_page_list(wr->page_list);
- struct fw_ri_dsgl *sglp;
-
- for (i = 0; i < wr->page_list_len; i++) {
- wr->page_list->page_list[i] = (__force u64)
- cpu_to_be64((u64)wr->page_list->page_list[i]);
- }
-
- sglp = (struct fw_ri_dsgl *)(&wqe->fr + 1);
- sglp->op = FW_RI_DATA_DSGL;
- sglp->r1 = 0;
- sglp->nsge = cpu_to_be16(1);
- sglp->addr0 = cpu_to_be64(c4pl->dma_addr);
- sglp->len0 = cpu_to_be32(pbllen);
-
- *len16 = DIV_ROUND_UP(sizeof(wqe->fr) + sizeof(*sglp), 16);
- } else {
- imdp = (struct fw_ri_immd *)(&wqe->fr + 1);
- imdp->op = FW_RI_DATA_IMMD;
- imdp->r1 = 0;
- imdp->r2 = 0;
- imdp->immdlen = cpu_to_be32(pbllen);
- p = (__be64 *)(imdp + 1);
- rem = pbllen;
- for (i = 0; i < wr->page_list_len; i++) {
- *p = cpu_to_be64((u64)wr->page_list->page_list[i]);
- rem -= sizeof(*p);
- if (++p == (__be64 *)&sq->queue[sq->size])
- p = (__be64 *)sq->queue;
- }
- BUG_ON(rem < 0);
- while (rem) {
- *p = 0;
- rem -= sizeof(*p);
- if (++p == (__be64 *)&sq->queue[sq->size])
- p = (__be64 *)sq->queue;
- }
- *len16 = DIV_ROUND_UP(sizeof(wqe->fr) + sizeof(*imdp)
- + pbllen, 16);
- }
- return 0;
-}
-
static int build_inv_stag(union t4_wr *wqe, struct ib_send_wr *wr,
u8 *len16)
{
@@ -875,14 +806,6 @@ int c4iw_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
if (!qhp->wq.sq.oldest_read)
qhp->wq.sq.oldest_read = swsqe;
break;
- case IB_WR_FAST_REG_MR:
- fw_opcode = FW_RI_FR_NSMR_WR;
- swsqe->opcode = FW_RI_FAST_REGISTER;
- err = build_fastreg(&qhp->wq.sq, wqe, wr, &len16,
- is_t5(
- qhp->rhp->rdev.lldi.adapter_type) ?
- 1 : 0);
- break;
case IB_WR_REG_MR:
fw_opcode = FW_RI_FR_NSMR_WR;
swsqe->opcode = FW_RI_FAST_REGISTER;
--
1.8.4.3


2015-10-08 08:01:03

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH v3 00/26] New fast registration API

On Thu, Oct 08, 2015 at 10:50:43AM +0300, Sagi Grimberg wrote:
> - Rebased against Doug's for-4.4 tree (4.3.0-rc1) + 4.3-rc fixes

Does this means its not on top of the send_wr changes now? I'm fine
with that as I think your series is even more important than the send_wr
split and I'm happy to rebase, but I'd also really like to see some
progress on them..

2015-10-08 08:22:03

by Sagi Grimberg

[permalink] [raw]
Subject: Re: [PATCH v3 00/26] New fast registration API

On 10/8/2015 11:01 AM, Christoph Hellwig wrote:
> On Thu, Oct 08, 2015 at 10:50:43AM +0300, Sagi Grimberg wrote:
>> - Rebased against Doug's for-4.4 tree (4.3.0-rc1) + 4.3-rc fixes
>
> Does this means its not on top of the send_wr changes now? I'm fine
> with that as I think your series is even more important than the send_wr
> split and I'm happy to rebase, but I'd also really like to see some
> progress on them..

Its on top of your wr_cleanup as well.

2015-10-08 15:24:33

by Anna Schumaker

[permalink] [raw]
Subject: Re: [PATCH v3 12/26] xprtrdma: Port to new memory registration API

Hi Sagi,

On 10/08/2015 03:50 AM, Sagi Grimberg wrote:
> Instead of maintaining a fastreg page list, keep an sg table
> and convert an array of pages to a sg list. Then call ib_map_mr_sg
> and construct ib_reg_wr.
>
> Note that the next step would be to have NFS work with sg lists
> as it maps well to sk_frags (see comment from hch
> http://marc.info/?l=linux-rdma&m=143677002622296&w=2).

Looks good to me!

Acked-by: Anna Schumaker <[email protected]>
>
> Signed-off-by: Sagi Grimberg <[email protected]>
> Acked-by: Christoph Hellwig <[email protected]>
> Tested-by: Steve Wise <[email protected]>
> Tested-by: Selvin Xavier <[email protected]>
> ---
> net/sunrpc/xprtrdma/frwr_ops.c | 112 +++++++++++++++++++++++-----------------
> net/sunrpc/xprtrdma/xprt_rdma.h | 3 +-
> 2 files changed, 67 insertions(+), 48 deletions(-)
>
> diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c
> index 0d2f46f600b6..3c89f8052786 100644
> --- a/net/sunrpc/xprtrdma/frwr_ops.c
> +++ b/net/sunrpc/xprtrdma/frwr_ops.c
> @@ -151,9 +151,13 @@ __frwr_init(struct rpcrdma_mw *r, struct ib_pd *pd, struct ib_device *device,
> f->fr_mr = ib_alloc_mr(pd, IB_MR_TYPE_MEM_REG, depth);
> if (IS_ERR(f->fr_mr))
> goto out_mr_err;
> - f->fr_pgl = ib_alloc_fast_reg_page_list(device, depth);
> - if (IS_ERR(f->fr_pgl))
> +
> + f->sg = kcalloc(depth, sizeof(*f->sg), GFP_KERNEL);
> + if (!f->sg)
> goto out_list_err;
> +
> + sg_init_table(f->sg, depth);
> +
> return 0;
>
> out_mr_err:
> @@ -163,7 +167,7 @@ out_mr_err:
> return rc;
>
> out_list_err:
> - rc = PTR_ERR(f->fr_pgl);
> + rc = -ENOMEM;
> dprintk("RPC: %s: ib_alloc_fast_reg_page_list status %i\n",
> __func__, rc);
> ib_dereg_mr(f->fr_mr);
> @@ -179,7 +183,7 @@ __frwr_release(struct rpcrdma_mw *r)
> if (rc)
> dprintk("RPC: %s: ib_dereg_mr status %i\n",
> __func__, rc);
> - ib_free_fast_reg_page_list(r->r.frmr.fr_pgl);
> + kfree(r->r.frmr.sg);
> }
>
> static int
> @@ -312,14 +316,11 @@ frwr_op_map(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg,
> struct rpcrdma_mw *mw;
> struct rpcrdma_frmr *frmr;
> struct ib_mr *mr;
> - struct ib_fast_reg_wr fastreg_wr;
> + struct ib_reg_wr reg_wr;
> struct ib_send_wr *bad_wr;
> + unsigned int dma_nents;
> u8 key;
> - int len, pageoff;
> - int i, rc;
> - int seg_len;
> - u64 pa;
> - int page_no;
> + int i, rc, len, n;
>
> mw = seg1->rl_mw;
> seg1->rl_mw = NULL;
> @@ -332,64 +333,81 @@ frwr_op_map(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg,
> } while (mw->r.frmr.fr_state != FRMR_IS_INVALID);
> frmr = &mw->r.frmr;
> frmr->fr_state = FRMR_IS_VALID;
> + mr = frmr->fr_mr;
>
> - pageoff = offset_in_page(seg1->mr_offset);
> - seg1->mr_offset -= pageoff; /* start of page */
> - seg1->mr_len += pageoff;
> - len = -pageoff;
> if (nsegs > ia->ri_max_frmr_depth)
> nsegs = ia->ri_max_frmr_depth;
>
> - for (page_no = i = 0; i < nsegs;) {
> - rpcrdma_map_one(device, seg, direction);
> - pa = seg->mr_dma;
> - for (seg_len = seg->mr_len; seg_len > 0; seg_len -= PAGE_SIZE) {
> - frmr->fr_pgl->page_list[page_no++] = pa;
> - pa += PAGE_SIZE;
> - }
> + for (len = 0, i = 0; i < nsegs;) {
> + if (seg->mr_page)
> + sg_set_page(&frmr->sg[i],
> + seg->mr_page,
> + seg->mr_len,
> + offset_in_page(seg->mr_offset));
> + else
> + sg_set_buf(&frmr->sg[i], seg->mr_offset,
> + seg->mr_len);
> +
> len += seg->mr_len;
> ++seg;
> ++i;
> +
> /* Check for holes */
> if ((i < nsegs && offset_in_page(seg->mr_offset)) ||
> offset_in_page((seg-1)->mr_offset + (seg-1)->mr_len))
> break;
> }
> + frmr->sg_nents = i;
> +
> + dma_nents = ib_dma_map_sg(device, frmr->sg, frmr->sg_nents, direction);
> + if (!dma_nents) {
> + pr_err("RPC: %s: failed to dma map sg %p sg_nents %d\n",
> + __func__, frmr->sg, frmr->sg_nents);
> + return -ENOMEM;
> + }
> +
> + n = ib_map_mr_sg(mr, frmr->sg, frmr->sg_nents, PAGE_SIZE);
> + if (unlikely(n != frmr->sg_nents)) {
> + pr_err("RPC: %s: failed to map mr %p (%d/%d)\n",
> + __func__, frmr->fr_mr, n, frmr->sg_nents);
> + rc = n < 0 ? n : -EINVAL;
> + goto out_senderr;
> + }
> +
> dprintk("RPC: %s: Using frmr %p to map %d segments (%d bytes)\n",
> - __func__, mw, i, len);
> -
> - memset(&fastreg_wr, 0, sizeof(fastreg_wr));
> - fastreg_wr.wr.wr_id = (unsigned long)(void *)mw;
> - fastreg_wr.wr.opcode = IB_WR_FAST_REG_MR;
> - fastreg_wr.iova_start = seg1->mr_dma + pageoff;
> - fastreg_wr.page_list = frmr->fr_pgl;
> - fastreg_wr.page_shift = PAGE_SHIFT;
> - fastreg_wr.page_list_len = page_no;
> - fastreg_wr.length = len;
> - fastreg_wr.access_flags = writing ?
> - IB_ACCESS_REMOTE_WRITE | IB_ACCESS_LOCAL_WRITE :
> - IB_ACCESS_REMOTE_READ;
> - mr = frmr->fr_mr;
> + __func__, mw, frmr->sg_nents, mr->length);
> +
> key = (u8)(mr->rkey & 0x000000FF);
> ib_update_fast_reg_key(mr, ++key);
> - fastreg_wr.rkey = mr->rkey;
> +
> + reg_wr.wr.next = NULL;
> + reg_wr.wr.opcode = IB_WR_REG_MR;
> + reg_wr.wr.wr_id = (uintptr_t)mw;
> + reg_wr.wr.num_sge = 0;
> + reg_wr.wr.send_flags = 0;
> + reg_wr.mr = mr;
> + reg_wr.key = mr->rkey;
> + reg_wr.access = writing ?
> + IB_ACCESS_REMOTE_WRITE | IB_ACCESS_LOCAL_WRITE :
> + IB_ACCESS_REMOTE_READ;
>
> DECR_CQCOUNT(&r_xprt->rx_ep);
> - rc = ib_post_send(ia->ri_id->qp, &fastreg_wr.wr, &bad_wr);
> + rc = ib_post_send(ia->ri_id->qp, &reg_wr.wr, &bad_wr);
> if (rc)
> goto out_senderr;
>
> + seg1->mr_dir = direction;
> seg1->rl_mw = mw;
> seg1->mr_rkey = mr->rkey;
> - seg1->mr_base = seg1->mr_dma + pageoff;
> - seg1->mr_nsegs = i;
> - seg1->mr_len = len;
> - return i;
> + seg1->mr_base = mr->iova;
> + seg1->mr_nsegs = frmr->sg_nents;
> + seg1->mr_len = mr->length;
> +
> + return frmr->sg_nents;
>
> out_senderr:
> dprintk("RPC: %s: ib_post_send status %i\n", __func__, rc);
> - while (i--)
> - rpcrdma_unmap_one(device, --seg);
> + ib_dma_unmap_sg(device, frmr->sg, frmr->sg_nents, direction);
> __frwr_queue_recovery(mw);
> return rc;
> }
> @@ -403,22 +421,22 @@ frwr_op_unmap(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg)
> struct rpcrdma_mr_seg *seg1 = seg;
> struct rpcrdma_ia *ia = &r_xprt->rx_ia;
> struct rpcrdma_mw *mw = seg1->rl_mw;
> + struct rpcrdma_frmr *frmr = &mw->r.frmr;
> struct ib_send_wr invalidate_wr, *bad_wr;
> int rc, nsegs = seg->mr_nsegs;
>
> dprintk("RPC: %s: FRMR %p\n", __func__, mw);
>
> seg1->rl_mw = NULL;
> - mw->r.frmr.fr_state = FRMR_IS_INVALID;
> + frmr->fr_state = FRMR_IS_INVALID;
>
> memset(&invalidate_wr, 0, sizeof(invalidate_wr));
> invalidate_wr.wr_id = (unsigned long)(void *)mw;
> invalidate_wr.opcode = IB_WR_LOCAL_INV;
> - invalidate_wr.ex.invalidate_rkey = mw->r.frmr.fr_mr->rkey;
> + invalidate_wr.ex.invalidate_rkey = frmr->fr_mr->rkey;
> DECR_CQCOUNT(&r_xprt->rx_ep);
>
> - while (seg1->mr_nsegs--)
> - rpcrdma_unmap_one(ia->ri_device, seg++);
> + ib_dma_unmap_sg(ia->ri_device, frmr->sg, frmr->sg_nents, seg1->mr_dir);
> read_lock(&ia->ri_qplock);
> rc = ib_post_send(ia->ri_id->qp, &invalidate_wr, &bad_wr);
> read_unlock(&ia->ri_qplock);
> diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
> index c09414e6f91b..426a9107c0ff 100644
> --- a/net/sunrpc/xprtrdma/xprt_rdma.h
> +++ b/net/sunrpc/xprtrdma/xprt_rdma.h
> @@ -193,7 +193,8 @@ enum rpcrdma_frmr_state {
> };
>
> struct rpcrdma_frmr {
> - struct ib_fast_reg_page_list *fr_pgl;
> + struct scatterlist *sg;
> + unsigned int sg_nents;
> struct ib_mr *fr_mr;
> enum rpcrdma_frmr_state fr_state;
> struct work_struct fr_work;
>


2015-10-08 15:56:43

by Chuck Lever III

[permalink] [raw]
Subject: Re: [PATCH v3 12/26] xprtrdma: Port to new memory registration API


> On Oct 8, 2015, at 3:50 AM, Sagi Grimberg <[email protected]> wrote:
>
> Instead of maintaining a fastreg page list, keep an sg table
> and convert an array of pages to a sg list. Then call ib_map_mr_sg
> and construct ib_reg_wr.
>
> Note that the next step would be to have NFS work with sg lists
> as it maps well to sk_frags (see comment from hch
> http://marc.info/?l=linux-rdma&m=143677002622296&w=2).
>
> Signed-off-by: Sagi Grimberg <[email protected]>
> Acked-by: Christoph Hellwig <[email protected]>
> Tested-by: Steve Wise <[email protected]>
> Tested-by: Selvin Xavier <[email protected]>

Comments below.


> ---
> net/sunrpc/xprtrdma/frwr_ops.c | 112 +++++++++++++++++++++++-----------------
> net/sunrpc/xprtrdma/xprt_rdma.h | 3 +-
> 2 files changed, 67 insertions(+), 48 deletions(-)
>
> diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c
> index 0d2f46f600b6..3c89f8052786 100644
> --- a/net/sunrpc/xprtrdma/frwr_ops.c
> +++ b/net/sunrpc/xprtrdma/frwr_ops.c
> @@ -151,9 +151,13 @@ __frwr_init(struct rpcrdma_mw *r, struct ib_pd *pd, struct ib_device *device,
> f->fr_mr = ib_alloc_mr(pd, IB_MR_TYPE_MEM_REG, depth);
> if (IS_ERR(f->fr_mr))
> goto out_mr_err;
> - f->fr_pgl = ib_alloc_fast_reg_page_list(device, depth);
> - if (IS_ERR(f->fr_pgl))
> +
> + f->sg = kcalloc(depth, sizeof(*f->sg), GFP_KERNEL);
> + if (!f->sg)
> goto out_list_err;
> +
> + sg_init_table(f->sg, depth);
> +
> return 0;
>
> out_mr_err:
> @@ -163,7 +167,7 @@ out_mr_err:
> return rc;
>
> out_list_err:
> - rc = PTR_ERR(f->fr_pgl);
> + rc = -ENOMEM;
> dprintk("RPC: %s: ib_alloc_fast_reg_page_list status %i\n”,

Should you update this debugging message?


> __func__, rc);
> ib_dereg_mr(f->fr_mr);
> @@ -179,7 +183,7 @@ __frwr_release(struct rpcrdma_mw *r)
> if (rc)
> dprintk("RPC: %s: ib_dereg_mr status %i\n",
> __func__, rc);
> - ib_free_fast_reg_page_list(r->r.frmr.fr_pgl);
> + kfree(r->r.frmr.sg);
> }
>
> static int
> @@ -312,14 +316,11 @@ frwr_op_map(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg,
> struct rpcrdma_mw *mw;
> struct rpcrdma_frmr *frmr;
> struct ib_mr *mr;
> - struct ib_fast_reg_wr fastreg_wr;
> + struct ib_reg_wr reg_wr;
> struct ib_send_wr *bad_wr;
> + unsigned int dma_nents;
> u8 key;
> - int len, pageoff;
> - int i, rc;
> - int seg_len;
> - u64 pa;
> - int page_no;
> + int i, rc, len, n;
>
> mw = seg1->rl_mw;
> seg1->rl_mw = NULL;
> @@ -332,64 +333,81 @@ frwr_op_map(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg,
> } while (mw->r.frmr.fr_state != FRMR_IS_INVALID);
> frmr = &mw->r.frmr;
> frmr->fr_state = FRMR_IS_VALID;
> + mr = frmr->fr_mr;
>
> - pageoff = offset_in_page(seg1->mr_offset);
> - seg1->mr_offset -= pageoff; /* start of page */
> - seg1->mr_len += pageoff;
> - len = -pageoff;
> if (nsegs > ia->ri_max_frmr_depth)
> nsegs = ia->ri_max_frmr_depth;
>
> - for (page_no = i = 0; i < nsegs;) {
> - rpcrdma_map_one(device, seg, direction);
> - pa = seg->mr_dma;
> - for (seg_len = seg->mr_len; seg_len > 0; seg_len -= PAGE_SIZE) {
> - frmr->fr_pgl->page_list[page_no++] = pa;
> - pa += PAGE_SIZE;
> - }
> + for (len = 0, i = 0; i < nsegs;) {
> + if (seg->mr_page)
> + sg_set_page(&frmr->sg[i],
> + seg->mr_page,
> + seg->mr_len,
> + offset_in_page(seg->mr_offset));
> + else
> + sg_set_buf(&frmr->sg[i], seg->mr_offset,
> + seg->mr_len);
> +
> len += seg->mr_len;
> ++seg;
> ++i;
> +
> /* Check for holes */
> if ((i < nsegs && offset_in_page(seg->mr_offset)) ||
> offset_in_page((seg-1)->mr_offset + (seg-1)->mr_len))
> break;
> }
> + frmr->sg_nents = i;
> +
> + dma_nents = ib_dma_map_sg(device, frmr->sg, frmr->sg_nents, direction);
> + if (!dma_nents) {
> + pr_err("RPC: %s: failed to dma map sg %p sg_nents %d\n",
> + __func__, frmr->sg, frmr->sg_nents);
> + return -ENOMEM;
> + }
> +
> + n = ib_map_mr_sg(mr, frmr->sg, frmr->sg_nents, PAGE_SIZE);
> + if (unlikely(n != frmr->sg_nents)) {

The basic logic looks OK. But signed v. unsigned comparison here?
Is there a guarantee that the maximum value sg_nents can have is
INT_MAX?

i and n are signed, sg_nents is unsigned. I’d prefer to have
the variable signage all agree. Since we’re counting stuff,
maybe unsigned all around is a better choice? (rc should
remain signed).

If ib_map_mr_sg can return a negative value, maybe “rc = ib_map_mr_sg”
is more clear. Is n used anywhere else?

Use %u formatter for displaying unsigned variables.


> + pr_err("RPC: %s: failed to map mr %p (%d/%d)\n",
> + __func__, frmr->fr_mr, n, frmr->sg_nents);
> + rc = n < 0 ? n : -EINVAL;
> + goto out_senderr;

Is frmr->sg_nents set correctly here for the ib_dma_unmap_sg()
call in the error exit? Maybe you want to use the value in
“n” instead (as long as it’s positive, of course).


> + }
> +
> dprintk("RPC: %s: Using frmr %p to map %d segments (%d bytes)\n",
> - __func__, mw, i, len);
> -
> - memset(&fastreg_wr, 0, sizeof(fastreg_wr));
> - fastreg_wr.wr.wr_id = (unsigned long)(void *)mw;
> - fastreg_wr.wr.opcode = IB_WR_FAST_REG_MR;
> - fastreg_wr.iova_start = seg1->mr_dma + pageoff;
> - fastreg_wr.page_list = frmr->fr_pgl;
> - fastreg_wr.page_shift = PAGE_SHIFT;
> - fastreg_wr.page_list_len = page_no;
> - fastreg_wr.length = len;
> - fastreg_wr.access_flags = writing ?
> - IB_ACCESS_REMOTE_WRITE | IB_ACCESS_LOCAL_WRITE :
> - IB_ACCESS_REMOTE_READ;
> - mr = frmr->fr_mr;
> + __func__, mw, frmr->sg_nents, mr->length);
> +
> key = (u8)(mr->rkey & 0x000000FF);
> ib_update_fast_reg_key(mr, ++key);
> - fastreg_wr.rkey = mr->rkey;
> +
> + reg_wr.wr.next = NULL;
> + reg_wr.wr.opcode = IB_WR_REG_MR;
> + reg_wr.wr.wr_id = (uintptr_t)mw;

uintptr_t: Nice cleanup.


> + reg_wr.wr.num_sge = 0;
> + reg_wr.wr.send_flags = 0;
> + reg_wr.mr = mr;
> + reg_wr.key = mr->rkey;
> + reg_wr.access = writing ?
> + IB_ACCESS_REMOTE_WRITE | IB_ACCESS_LOCAL_WRITE :
> + IB_ACCESS_REMOTE_READ;
>
> DECR_CQCOUNT(&r_xprt->rx_ep);
> - rc = ib_post_send(ia->ri_id->qp, &fastreg_wr.wr, &bad_wr);
> + rc = ib_post_send(ia->ri_id->qp, &reg_wr.wr, &bad_wr);
> if (rc)
> goto out_senderr;
>
> + seg1->mr_dir = direction;
> seg1->rl_mw = mw;
> seg1->mr_rkey = mr->rkey;
> - seg1->mr_base = seg1->mr_dma + pageoff;
> - seg1->mr_nsegs = i;
> - seg1->mr_len = len;
> - return i;
> + seg1->mr_base = mr->iova;
> + seg1->mr_nsegs = frmr->sg_nents;
> + seg1->mr_len = mr->length;
> +
> + return frmr->sg_nents;
>
> out_senderr:
> dprintk("RPC: %s: ib_post_send status %i\n", __func__, rc);
> - while (i--)
> - rpcrdma_unmap_one(device, --seg);
> + ib_dma_unmap_sg(device, frmr->sg, frmr->sg_nents, direction);
> __frwr_queue_recovery(mw);
> return rc;
> }
> @@ -403,22 +421,22 @@ frwr_op_unmap(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg)
> struct rpcrdma_mr_seg *seg1 = seg;
> struct rpcrdma_ia *ia = &r_xprt->rx_ia;
> struct rpcrdma_mw *mw = seg1->rl_mw;
> + struct rpcrdma_frmr *frmr = &mw->r.frmr;
> struct ib_send_wr invalidate_wr, *bad_wr;
> int rc, nsegs = seg->mr_nsegs;
>
> dprintk("RPC: %s: FRMR %p\n", __func__, mw);
>
> seg1->rl_mw = NULL;
> - mw->r.frmr.fr_state = FRMR_IS_INVALID;
> + frmr->fr_state = FRMR_IS_INVALID;
>
> memset(&invalidate_wr, 0, sizeof(invalidate_wr));
> invalidate_wr.wr_id = (unsigned long)(void *)mw;
> invalidate_wr.opcode = IB_WR_LOCAL_INV;
> - invalidate_wr.ex.invalidate_rkey = mw->r.frmr.fr_mr->rkey;
> + invalidate_wr.ex.invalidate_rkey = frmr->fr_mr->rkey;
> DECR_CQCOUNT(&r_xprt->rx_ep);
>
> - while (seg1->mr_nsegs--)
> - rpcrdma_unmap_one(ia->ri_device, seg++);
> + ib_dma_unmap_sg(ia->ri_device, frmr->sg, frmr->sg_nents, seg1->mr_dir);
> read_lock(&ia->ri_qplock);
> rc = ib_post_send(ia->ri_id->qp, &invalidate_wr, &bad_wr);
> read_unlock(&ia->ri_qplock);
> diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
> index c09414e6f91b..426a9107c0ff 100644
> --- a/net/sunrpc/xprtrdma/xprt_rdma.h
> +++ b/net/sunrpc/xprtrdma/xprt_rdma.h
> @@ -193,7 +193,8 @@ enum rpcrdma_frmr_state {
> };
>
> struct rpcrdma_frmr {
> - struct ib_fast_reg_page_list *fr_pgl;
> + struct scatterlist *sg;
> + unsigned int sg_nents;
> struct ib_mr *fr_mr;
> enum rpcrdma_frmr_state fr_state;
> struct work_struct fr_work;
> --
> 1.8.4.3
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html


Chuck Lever




2015-10-08 20:26:39

by Or Gerlitz

[permalink] [raw]
Subject: Re: [PATCH v3 12/26] xprtrdma: Port to new memory registration API

On Thu, Oct 8, 2015 at 10:50 AM, Sagi Grimberg <[email protected]> wrote:
> Instead of maintaining a fastreg page list, keep an sg table
> and convert an array of pages to a sg list. Then call ib_map_mr_sg
> and construct ib_reg_wr.
>
> Note that the next step would be to have NFS work with sg lists
> as it maps well to sk_frags (see comment from hch
> http://marc.info/?l=linux-rdma&m=143677002622296&w=2).


I don't see the point to keep the 2nd paragraph in the upstream git log
forever, it deserves to be after the --- line such that when the maintainer
uses git am this stays out, so apply this in the respin you'd be doing
to address Chuck's comments.

Or.

2015-10-10 01:06:33

by Sagi Grimberg

[permalink] [raw]
Subject: Re: [PATCH v3 12/26] xprtrdma: Port to new memory registration API


>> out_list_err:
>> - rc = PTR_ERR(f->fr_pgl);
>> + rc = -ENOMEM;
>> dprintk("RPC: %s: ib_alloc_fast_reg_page_list status %i\n”,
>
> Should you update this debugging message?

Yes I will.

>> + n = ib_map_mr_sg(mr, frmr->sg, frmr->sg_nents, PAGE_SIZE);
>> + if (unlikely(n != frmr->sg_nents)) {
>
> The basic logic looks OK. But signed v. unsigned comparison here?
> Is there a guarantee that the maximum value sg_nents can have is
> INT_MAX?
>
> i and n are signed, sg_nents is unsigned. I’d prefer to have
> the variable signage all agree. Since we’re counting stuff,
> maybe unsigned all around is a better choice? (rc should
> remain signed).

OK, I'll make i, n, len unsigned integers.

>
> If ib_map_mr_sg can return a negative value, maybe “rc = ib_map_mr_sg”
> is more clear. Is n used anywhere else?

Nop.

>
> Use %u formatter for displaying unsigned variables.
>
>
>> + pr_err("RPC: %s: failed to map mr %p (%d/%d)\n",
>> + __func__, frmr->fr_mr, n, frmr->sg_nents);
>> + rc = n < 0 ? n : -EINVAL;
>> + goto out_senderr;
>
> Is frmr->sg_nents set correctly here for the ib_dma_unmap_sg()
> call in the error exit? Maybe you want to use the value in
> “n” instead (as long as it’s positive, of course).

Umm, I think that frmr->sg_nents is assigned before the unmap is even
reachable (we assign before mapping because we won't take partial
maapings at least for now).



I'll be waiting for more comments before posting v4.

Cheers,
Sagi.