2015-09-17 09:42:52

by Sagi Grimberg

[permalink] [raw]
Subject: [PATCH v1 00/24] New fast registration API

Hi all,

As discussed on the linux-rdma list, there is plenty of room for
improvement in our memory registration APIs. We keep finding
ULPs that duplicate code, sometimes use the wrong strategies,
and misuse our current API.

As a first step, this patch set replaces the fast registration API
with one that accepts the common kernel struct scatterlist and takes
care of the page vector construction in the core layer, with hooks
for the drivers' HW-specific assignments. This removes the common
code that was duplicated in each and every ULP driver.

The changes from v0 (WIP) are:
- Rebased on top of 4.3-rc1 + Christoph's ib_send_wr conversion patches

- Allow the ULP to pass a page_size argument to ib_map_mr_sg in order
to have it work better in some specific workloads. This suggestion
came from Bart Van Assche, who pointed out that some applications
might use page sizes significantly smaller than the system PAGE_SIZE
of specific architectures.

- Fixed some logical bugs in ib_sg_to_pages

- Added a set_page function pointer for drivers to pass to ib_sg_to_pages
so that some drivers (e.g. mlx4, mlx5, nes) can avoid keeping a second
page vector and/or re-iterating over the page vector in order to perform
HW-specific assignments (big/little endian conversion, extra flags)

- Converted SRP initiator and RDS iwarp ULPs to the new API

- Removed fast registration code from hfi1 driver (as it isn't supported
anyway). I assume that the correct place to get the support back would
be in a shared SW library (hfi1, qib, rxe).

- Updated the change logs

So far my tests covered:
- ULPs:
* iser initiator
* iser target
* xprtrdma
* svcrdma
- Drivers:
* mlx4
* mlx5
* Steve Wise was kind enough to run NFS client/server over cxgb4 and I
have yet to receive any negative feedback from him.

I don't have access to other HW devices (qib, nes) or iWARP devices, so
RDS is compile-tested only.

I'm targeting this at 4.4, so I'd appreciate more feedback and broader
test coverage.

The code is available at: https://github.com/sagigrimberg/linux/tree/reg_api.3

Sagi Grimberg (24):
IB/core: Introduce new fast registration API
IB/mlx5: Remove dead fmr code
IB/mlx5: Support the new memory registration API
IB/mlx4: Support the new memory registration API
RDMA/ocrdma: Support the new memory registration API
RDMA/cxgb3: Support the new memory registration API
iw_cxgb4: Support the new memory registration API
IB/qib: Support the new memory registration API
RDMA/nes: Support the new memory registration API
IB/iser: Port to new fast registration API
iser-target: Port to new memory registration API
xprtrdma: Port to new memory registration API
svcrdma: Port to new memory registration API
RDS/IW: Convert to new memory registration API
IB/srp: Convert to new memory registration API
IB/mlx5: Remove old FRWR API support
IB/mlx4: Remove old FRWR API support
RDMA/ocrdma: Remove old FRWR API
RDMA/cxgb3: Remove old FRWR API
iw_cxgb4: Remove old FRWR API
IB/qib: Remove old FRWR API
RDMA/nes: Remove old FRWR API
IB/hfi1: Remove old fast registration API support
IB/core: Remove old fast registration API

drivers/infiniband/core/verbs.c | 132 ++++++++++++---
drivers/infiniband/hw/cxgb3/iwch_cq.c | 2 +-
drivers/infiniband/hw/cxgb3/iwch_provider.c | 39 +++--
drivers/infiniband/hw/cxgb3/iwch_provider.h | 2 +
drivers/infiniband/hw/cxgb3/iwch_qp.c | 37 +++--
drivers/infiniband/hw/cxgb4/cq.c | 2 +-
drivers/infiniband/hw/cxgb4/iw_cxgb4.h | 25 +--
drivers/infiniband/hw/cxgb4/mem.c | 61 +++----
drivers/infiniband/hw/cxgb4/provider.c | 3 +-
drivers/infiniband/hw/cxgb4/qp.c | 46 +++---
drivers/infiniband/hw/mlx4/cq.c | 2 +-
drivers/infiniband/hw/mlx4/main.c | 3 +-
drivers/infiniband/hw/mlx4/mlx4_ib.h | 22 +--
drivers/infiniband/hw/mlx4/mr.c | 120 ++++++++------
drivers/infiniband/hw/mlx4/qp.c | 30 ++--
drivers/infiniband/hw/mlx5/cq.c | 4 +-
drivers/infiniband/hw/mlx5/main.c | 3 +-
drivers/infiniband/hw/mlx5/mlx5_ib.h | 47 +-----
drivers/infiniband/hw/mlx5/mr.c | 107 +++++++-----
drivers/infiniband/hw/mlx5/qp.c | 140 ++++++++--------
drivers/infiniband/hw/nes/nes_hw.h | 6 -
drivers/infiniband/hw/nes/nes_verbs.c | 161 +++++++-----------
drivers/infiniband/hw/nes/nes_verbs.h | 4 +
drivers/infiniband/hw/ocrdma/ocrdma.h | 2 +
drivers/infiniband/hw/ocrdma/ocrdma_main.c | 3 +-
drivers/infiniband/hw/ocrdma/ocrdma_verbs.c | 151 ++++++++---------
drivers/infiniband/hw/ocrdma/ocrdma_verbs.h | 7 +-
drivers/infiniband/hw/qib/qib_keys.c | 40 ++---
drivers/infiniband/hw/qib/qib_mr.c | 46 +++---
drivers/infiniband/hw/qib/qib_verbs.c | 13 +-
drivers/infiniband/hw/qib/qib_verbs.h | 13 +-
drivers/infiniband/ulp/iser/iscsi_iser.h | 8 +-
drivers/infiniband/ulp/iser/iser_memory.c | 53 +++---
drivers/infiniband/ulp/iser/iser_verbs.c | 16 +-
drivers/infiniband/ulp/isert/ib_isert.c | 129 +++------------
drivers/infiniband/ulp/isert/ib_isert.h | 2 -
drivers/infiniband/ulp/srp/ib_srp.c | 248 +++++++++++++++++-----------
drivers/infiniband/ulp/srp/ib_srp.h | 11 +-
drivers/staging/hfi1/keys.c | 55 ------
drivers/staging/hfi1/mr.c | 32 +---
drivers/staging/hfi1/verbs.c | 9 +-
drivers/staging/hfi1/verbs.h | 8 -
include/linux/sunrpc/svc_rdma.h | 6 +-
include/rdma/ib_verbs.h | 70 +++-----
net/rds/iw.h | 5 +-
net/rds/iw_rdma.c | 118 +++++--------
net/rds/iw_send.c | 57 ++++---
net/sunrpc/xprtrdma/frwr_ops.c | 112 +++++++------
net/sunrpc/xprtrdma/svc_rdma_recvfrom.c | 76 +++++----
net/sunrpc/xprtrdma/svc_rdma_transport.c | 34 ++--
net/sunrpc/xprtrdma/xprt_rdma.h | 3 +-
51 files changed, 1067 insertions(+), 1258 deletions(-)

--
1.8.4.3



2015-09-17 09:42:56

by Sagi Grimberg

Subject: [PATCH v1 04/24] IB/mlx4: Support the new memory registration API

Support the new memory registration API by allocating a
private page list array in mlx4_ib_mr and populating it when
mlx4_ib_map_mr_sg is invoked. Also, support IB_WR_REG_MR
by building the exact same WQE as for IB_WR_FAST_REG_MR, just
taking the needed information from different places:
- page_size, iova, length, access flags (ib_mr)
- page array (mlx4_ib_mr)
- key (ib_reg_wr)

The IB_WR_FAST_REG_MR handlers will be removed later, once
all the ULPs have been converted.

Signed-off-by: Sagi Grimberg <[email protected]>
---
drivers/infiniband/hw/mlx4/cq.c | 1 +
drivers/infiniband/hw/mlx4/main.c | 1 +
drivers/infiniband/hw/mlx4/mlx4_ib.h | 7 ++++
drivers/infiniband/hw/mlx4/mr.c | 72 +++++++++++++++++++++++++++++++++---
drivers/infiniband/hw/mlx4/qp.c | 25 +++++++++++++
5 files changed, 100 insertions(+), 6 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/cq.c b/drivers/infiniband/hw/mlx4/cq.c
index 2f4259525bb1..b62236e24708 100644
--- a/drivers/infiniband/hw/mlx4/cq.c
+++ b/drivers/infiniband/hw/mlx4/cq.c
@@ -819,6 +819,7 @@ repoll:
break;
case MLX4_OPCODE_FMR:
wc->opcode = IB_WC_FAST_REG_MR;
+ /* TODO: wc->opcode = IB_WC_REG_MR; */
break;
case MLX4_OPCODE_LOCAL_INVAL:
wc->opcode = IB_WC_LOCAL_INV;
diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index efecdf0216d8..bb82f5fa1612 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -2247,6 +2247,7 @@ static void *mlx4_ib_add(struct mlx4_dev *dev)
ibdev->ib_dev.rereg_user_mr = mlx4_ib_rereg_user_mr;
ibdev->ib_dev.dereg_mr = mlx4_ib_dereg_mr;
ibdev->ib_dev.alloc_mr = mlx4_ib_alloc_mr;
+ ibdev->ib_dev.map_mr_sg = mlx4_ib_map_mr_sg;
ibdev->ib_dev.alloc_fast_reg_page_list = mlx4_ib_alloc_fast_reg_page_list;
ibdev->ib_dev.free_fast_reg_page_list = mlx4_ib_free_fast_reg_page_list;
ibdev->ib_dev.attach_mcast = mlx4_ib_mcg_attach;
diff --git a/drivers/infiniband/hw/mlx4/mlx4_ib.h b/drivers/infiniband/hw/mlx4/mlx4_ib.h
index 1e7b23bb2eb0..07fcf3a49256 100644
--- a/drivers/infiniband/hw/mlx4/mlx4_ib.h
+++ b/drivers/infiniband/hw/mlx4/mlx4_ib.h
@@ -131,6 +131,10 @@ struct mlx4_ib_cq {

struct mlx4_ib_mr {
struct ib_mr ibmr;
+ __be64 *pages;
+ dma_addr_t page_map;
+ u32 npages;
+ u32 max_pages;
struct mlx4_mr mmr;
struct ib_umem *umem;
};
@@ -706,6 +710,9 @@ int mlx4_ib_dealloc_mw(struct ib_mw *mw);
struct ib_mr *mlx4_ib_alloc_mr(struct ib_pd *pd,
enum ib_mr_type mr_type,
u32 max_num_sg);
+int mlx4_ib_map_mr_sg(struct ib_mr *ibmr,
+ struct scatterlist *sg,
+ unsigned int sg_nents);
struct ib_fast_reg_page_list *mlx4_ib_alloc_fast_reg_page_list(struct ib_device *ibdev,
int page_list_len);
void mlx4_ib_free_fast_reg_page_list(struct ib_fast_reg_page_list *page_list);
diff --git a/drivers/infiniband/hw/mlx4/mr.c b/drivers/infiniband/hw/mlx4/mr.c
index 5bba176e9dfa..6ed745798ad3 100644
--- a/drivers/infiniband/hw/mlx4/mr.c
+++ b/drivers/infiniband/hw/mlx4/mr.c
@@ -59,7 +59,7 @@ struct ib_mr *mlx4_ib_get_dma_mr(struct ib_pd *pd, int acc)
struct mlx4_ib_mr *mr;
int err;

- mr = kmalloc(sizeof *mr, GFP_KERNEL);
+ mr = kzalloc(sizeof *mr, GFP_KERNEL);
if (!mr)
return ERR_PTR(-ENOMEM);

@@ -140,7 +140,7 @@ struct ib_mr *mlx4_ib_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
int err;
int n;

- mr = kmalloc(sizeof *mr, GFP_KERNEL);
+ mr = kzalloc(sizeof *mr, GFP_KERNEL);
if (!mr)
return ERR_PTR(-ENOMEM);

@@ -271,11 +271,41 @@ release_mpt_entry:
return err;
}

+static int
+mlx4_alloc_priv_pages(struct ib_device *device,
+ struct mlx4_ib_mr *mr,
+ int max_pages)
+{
+ int size = max_pages * sizeof(u64);
+
+ mr->pages = dma_alloc_coherent(device->dma_device, size,
+ &mr->page_map, GFP_KERNEL);
+ if (!mr->pages)
+ return -ENOMEM;
+
+ return 0;
+}
+
+static void
+mlx4_free_priv_pages(struct mlx4_ib_mr *mr)
+{
+ struct ib_device *device = mr->ibmr.device;
+ int size = mr->max_pages * sizeof(u64);
+
+ if (mr->pages) {
+ dma_free_coherent(device->dma_device, size,
+ mr->pages, mr->page_map);
+ mr->pages = NULL;
+ }
+}
+
int mlx4_ib_dereg_mr(struct ib_mr *ibmr)
{
struct mlx4_ib_mr *mr = to_mmr(ibmr);
int ret;

+ mlx4_free_priv_pages(mr);
+
ret = mlx4_mr_free(to_mdev(ibmr->device)->dev, &mr->mmr);
if (ret)
return ret;
@@ -362,7 +392,7 @@ struct ib_mr *mlx4_ib_alloc_mr(struct ib_pd *pd,
max_num_sg > MLX4_MAX_FAST_REG_PAGES)
return ERR_PTR(-EINVAL);

- mr = kmalloc(sizeof *mr, GFP_KERNEL);
+ mr = kzalloc(sizeof *mr, GFP_KERNEL);
if (!mr)
return ERR_PTR(-ENOMEM);

@@ -371,18 +401,25 @@ struct ib_mr *mlx4_ib_alloc_mr(struct ib_pd *pd,
if (err)
goto err_free;

+ err = mlx4_alloc_priv_pages(pd->device, mr, max_num_sg);
+ if (err)
+ goto err_free_mr;
+
+ mr->max_pages = max_num_sg;
+
err = mlx4_mr_enable(dev->dev, &mr->mmr);
if (err)
- goto err_mr;
+ goto err_free_pl;

mr->ibmr.rkey = mr->ibmr.lkey = mr->mmr.key;
mr->umem = NULL;

return &mr->ibmr;

-err_mr:
+err_free_pl:
+ mlx4_free_priv_pages(mr);
+err_free_mr:
(void) mlx4_mr_free(dev->dev, &mr->mmr);
-
err_free:
kfree(mr);
return ERR_PTR(err);
@@ -528,3 +565,26 @@ int mlx4_ib_fmr_dealloc(struct ib_fmr *ibfmr)

return err;
}
+
+static int mlx4_set_page(struct ib_mr *ibmr, u64 addr)
+{
+ struct mlx4_ib_mr *mr = to_mmr(ibmr);
+
+ if (unlikely(mr->npages == mr->max_pages))
+ return -ENOMEM;
+
+ mr->pages[mr->npages++] = cpu_to_be64(addr | MLX4_MTT_FLAG_PRESENT);
+
+ return 0;
+}
+
+int mlx4_ib_map_mr_sg(struct ib_mr *ibmr,
+ struct scatterlist *sg,
+ unsigned int sg_nents)
+{
+ struct mlx4_ib_mr *mr = to_mmr(ibmr);
+
+ mr->npages = 0;
+
+ return ib_sg_to_pages(ibmr, sg, sg_nents, mlx4_set_page);
+}
diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c
index 3831cddb551f..75097151fc16 100644
--- a/drivers/infiniband/hw/mlx4/qp.c
+++ b/drivers/infiniband/hw/mlx4/qp.c
@@ -112,6 +112,7 @@ static const __be32 mlx4_ib_opcode[] = {
[IB_WR_SEND_WITH_INV] = cpu_to_be32(MLX4_OPCODE_SEND_INVAL),
[IB_WR_LOCAL_INV] = cpu_to_be32(MLX4_OPCODE_LOCAL_INVAL),
[IB_WR_FAST_REG_MR] = cpu_to_be32(MLX4_OPCODE_FMR),
+ [IB_WR_REG_MR] = cpu_to_be32(MLX4_OPCODE_FMR),
[IB_WR_MASKED_ATOMIC_CMP_AND_SWP] = cpu_to_be32(MLX4_OPCODE_MASKED_ATOMIC_CS),
[IB_WR_MASKED_ATOMIC_FETCH_AND_ADD] = cpu_to_be32(MLX4_OPCODE_MASKED_ATOMIC_FA),
[IB_WR_BIND_MW] = cpu_to_be32(MLX4_OPCODE_BIND_MW),
@@ -2405,6 +2406,22 @@ static __be32 convert_access(int acc)
cpu_to_be32(MLX4_WQE_FMR_PERM_LOCAL_READ);
}

+static void set_reg_seg(struct mlx4_wqe_fmr_seg *fseg,
+ struct ib_reg_wr *wr)
+{
+ struct mlx4_ib_mr *mr = to_mmr(wr->mr);
+
+ fseg->flags = convert_access(wr->access);
+ fseg->mem_key = cpu_to_be32(wr->key);
+ fseg->buf_list = cpu_to_be64(mr->page_map);
+ fseg->start_addr = cpu_to_be64(mr->ibmr.iova);
+ fseg->reg_len = cpu_to_be64(mr->ibmr.length);
+ fseg->offset = 0; /* XXX -- is this just for ZBVA? */
+ fseg->page_size = cpu_to_be32(ilog2(mr->ibmr.page_size));
+ fseg->reserved[0] = 0;
+ fseg->reserved[1] = 0;
+}
+
static void set_fmr_seg(struct mlx4_wqe_fmr_seg *fseg,
struct ib_fast_reg_wr *wr)
{
@@ -2766,6 +2783,14 @@ int mlx4_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
size += sizeof (struct mlx4_wqe_fmr_seg) / 16;
break;

+ case IB_WR_REG_MR:
+ ctrl->srcrb_flags |=
+ cpu_to_be32(MLX4_WQE_CTRL_STRONG_ORDER);
+ set_reg_seg(wqe, reg_wr(wr));
+ wqe += sizeof (struct mlx4_wqe_fmr_seg);
+ size += sizeof (struct mlx4_wqe_fmr_seg) / 16;
+ break;
+
case IB_WR_BIND_MW:
ctrl->srcrb_flags |=
cpu_to_be32(MLX4_WQE_CTRL_STRONG_ORDER);
--
1.8.4.3


2015-09-17 09:42:56

by Sagi Grimberg

Subject: [PATCH v1 02/24] IB/mlx5: Remove dead fmr code

Just unused function declarations and structs - no need for
those lying around. If for some reason someone wants
FMR support in mlx5, it should be easy enough to restore
the few structs.

Signed-off-by: Sagi Grimberg <[email protected]>
---
drivers/infiniband/hw/mlx5/mlx5_ib.h | 25 -------------------------
1 file changed, 25 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index ef4a47658f7a..210f99877b0b 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -364,20 +364,6 @@ enum {
MLX5_FMR_BUSY,
};

-struct mlx5_ib_fmr {
- struct ib_fmr ibfmr;
- struct mlx5_core_mr mr;
- int access_flags;
- int state;
- /* protect fmr state
- */
- spinlock_t lock;
- u64 wrid;
- struct ib_send_wr wr[2];
- u8 page_shift;
- struct ib_fast_reg_page_list page_list;
-};
-
struct mlx5_cache_ent {
struct list_head head;
/* sync access to the cahce entry
@@ -462,11 +448,6 @@ static inline struct mlx5_ib_dev *to_mdev(struct ib_device *ibdev)
return container_of(ibdev, struct mlx5_ib_dev, ib_dev);
}

-static inline struct mlx5_ib_fmr *to_mfmr(struct ib_fmr *ibfmr)
-{
- return container_of(ibfmr, struct mlx5_ib_fmr, ibfmr);
-}
-
static inline struct mlx5_ib_cq *to_mcq(struct ib_cq *ibcq)
{
return container_of(ibcq, struct mlx5_ib_cq, ibcq);
@@ -582,12 +563,6 @@ struct ib_mr *mlx5_ib_alloc_mr(struct ib_pd *pd,
struct ib_fast_reg_page_list *mlx5_ib_alloc_fast_reg_page_list(struct ib_device *ibdev,
int page_list_len);
void mlx5_ib_free_fast_reg_page_list(struct ib_fast_reg_page_list *page_list);
-struct ib_fmr *mlx5_ib_fmr_alloc(struct ib_pd *pd, int acc,
- struct ib_fmr_attr *fmr_attr);
-int mlx5_ib_map_phys_fmr(struct ib_fmr *ibfmr, u64 *page_list,
- int npages, u64 iova);
-int mlx5_ib_unmap_fmr(struct list_head *fmr_list);
-int mlx5_ib_fmr_dealloc(struct ib_fmr *ibfmr);
int mlx5_ib_process_mad(struct ib_device *ibdev, int mad_flags, u8 port_num,
const struct ib_wc *in_wc, const struct ib_grh *in_grh,
const struct ib_mad_hdr *in, size_t in_mad_size,
--
1.8.4.3


2015-09-17 09:42:56

by Sagi Grimberg

Subject: [PATCH v1 03/24] IB/mlx5: Support the new memory registration API

Support the new memory registration API by allocating a
private page list array in mlx5_ib_mr and populating it when
mlx5_ib_map_mr_sg is invoked. Also, support IB_WR_REG_MR
by building the exact same WQE as for IB_WR_FAST_REG_MR, just
taking the needed information from different places:
- page_size, iova, length, access flags (ib_mr)
- page array (mlx5_ib_mr)
- key (ib_reg_wr)

The IB_WR_FAST_REG_MR handlers will be removed later, once
all the ULPs have been converted.

Signed-off-by: Sagi Grimberg <[email protected]>
---
drivers/infiniband/hw/mlx5/cq.c | 3 ++
drivers/infiniband/hw/mlx5/main.c | 1 +
drivers/infiniband/hw/mlx5/mlx5_ib.h | 8 ++++
drivers/infiniband/hw/mlx5/mr.c | 65 ++++++++++++++++++++++++++++
drivers/infiniband/hw/mlx5/qp.c | 83 ++++++++++++++++++++++++++++++++++++
5 files changed, 160 insertions(+)

diff --git a/drivers/infiniband/hw/mlx5/cq.c b/drivers/infiniband/hw/mlx5/cq.c
index 5c9eeea62805..90daf791d51d 100644
--- a/drivers/infiniband/hw/mlx5/cq.c
+++ b/drivers/infiniband/hw/mlx5/cq.c
@@ -108,6 +108,9 @@ static enum ib_wc_opcode get_umr_comp(struct mlx5_ib_wq *wq, int idx)
case IB_WR_LOCAL_INV:
return IB_WC_LOCAL_INV;

+ case IB_WR_REG_MR:
+ return IB_WC_REG_MR;
+
case IB_WR_FAST_REG_MR:
return IB_WC_FAST_REG_MR;

diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 276d7824be8a..7ebce545daf1 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -1432,6 +1432,7 @@ static void *mlx5_ib_add(struct mlx5_core_dev *mdev)
dev->ib_dev.detach_mcast = mlx5_ib_mcg_detach;
dev->ib_dev.process_mad = mlx5_ib_process_mad;
dev->ib_dev.alloc_mr = mlx5_ib_alloc_mr;
+ dev->ib_dev.map_mr_sg = mlx5_ib_map_mr_sg;
dev->ib_dev.alloc_fast_reg_page_list = mlx5_ib_alloc_fast_reg_page_list;
dev->ib_dev.free_fast_reg_page_list = mlx5_ib_free_fast_reg_page_list;
dev->ib_dev.check_mr_status = mlx5_ib_check_mr_status;
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index 210f99877b0b..bc1853f8e67d 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -319,6 +319,11 @@ enum mlx5_ib_mtt_access_flags {

struct mlx5_ib_mr {
struct ib_mr ibmr;
+ void *descs;
+ dma_addr_t desc_map;
+ int ndescs;
+ int max_descs;
+ int desc_size;
struct mlx5_core_mr mmr;
struct ib_umem *umem;
struct mlx5_shared_mr_info *smr_info;
@@ -560,6 +565,9 @@ int mlx5_ib_dereg_mr(struct ib_mr *ibmr);
struct ib_mr *mlx5_ib_alloc_mr(struct ib_pd *pd,
enum ib_mr_type mr_type,
u32 max_num_sg);
+int mlx5_ib_map_mr_sg(struct ib_mr *ibmr,
+ struct scatterlist *sg,
+ unsigned int sg_nents);
struct ib_fast_reg_page_list *mlx5_ib_alloc_fast_reg_page_list(struct ib_device *ibdev,
int page_list_len);
void mlx5_ib_free_fast_reg_page_list(struct ib_fast_reg_page_list *page_list);
diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c
index 6d8aac0c1748..2f3b648719da 100644
--- a/drivers/infiniband/hw/mlx5/mr.c
+++ b/drivers/infiniband/hw/mlx5/mr.c
@@ -1165,6 +1165,35 @@ error:
return err;
}

+static int
+mlx5_alloc_priv_descs(struct ib_device *device,
+ struct mlx5_ib_mr *mr,
+ int ndescs,
+ int desc_size)
+{
+ int size = ndescs * desc_size;
+
+ mr->descs = dma_alloc_coherent(device->dma_device, size,
+ &mr->desc_map, GFP_KERNEL);
+ if (!mr->descs)
+ return -ENOMEM;
+
+ return 0;
+}
+
+static void
+mlx5_free_priv_descs(struct mlx5_ib_mr *mr)
+{
+ struct ib_device *device = mr->ibmr.device;
+ int size = mr->max_descs * mr->desc_size;
+
+ if (mr->descs) {
+ dma_free_coherent(device->dma_device, size,
+ mr->descs, mr->desc_map);
+ mr->descs = NULL;
+ }
+}
+
static int clean_mr(struct mlx5_ib_mr *mr)
{
struct mlx5_ib_dev *dev = to_mdev(mr->ibmr.device);
@@ -1184,6 +1213,8 @@ static int clean_mr(struct mlx5_ib_mr *mr)
mr->sig = NULL;
}

+ mlx5_free_priv_descs(mr);
+
if (!umred) {
err = destroy_mkey(dev, mr);
if (err) {
@@ -1273,6 +1304,14 @@ struct ib_mr *mlx5_ib_alloc_mr(struct ib_pd *pd,
if (mr_type == IB_MR_TYPE_MEM_REG) {
access_mode = MLX5_ACCESS_MODE_MTT;
in->seg.log2_page_size = PAGE_SHIFT;
+
+ err = mlx5_alloc_priv_descs(pd->device, mr,
+ ndescs, sizeof(u64));
+ if (err)
+ goto err_free_in;
+
+ mr->desc_size = sizeof(u64);
+ mr->max_descs = ndescs;
} else if (mr_type == IB_MR_TYPE_SIGNATURE) {
u32 psv_index[2];

@@ -1329,6 +1368,7 @@ err_destroy_psv:
mlx5_ib_warn(dev, "failed to destroy wire psv %d\n",
mr->sig->psv_wire.psv_idx);
}
+ mlx5_free_priv_descs(mr);
err_free_sig:
kfree(mr->sig);
err_free_in:
@@ -1420,3 +1460,28 @@ int mlx5_ib_check_mr_status(struct ib_mr *ibmr, u32 check_mask,
done:
return ret;
}
+
+static int mlx5_set_page(struct ib_mr *ibmr, u64 addr)
+{
+ struct mlx5_ib_mr *mr = to_mmr(ibmr);
+ __be64 *descs;
+
+ if (unlikely(mr->ndescs == mr->max_descs))
+ return -ENOMEM;
+
+ descs = mr->descs;
+ descs[mr->ndescs++] = cpu_to_be64(addr | MLX5_EN_RD | MLX5_EN_WR);
+
+ return 0;
+}
+
+int mlx5_ib_map_mr_sg(struct ib_mr *ibmr,
+ struct scatterlist *sg,
+ unsigned int sg_nents)
+{
+ struct mlx5_ib_mr *mr = to_mmr(ibmr);
+
+ mr->ndescs = 0;
+
+ return ib_sg_to_pages(ibmr, sg, sg_nents, mlx5_set_page);
+}
diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index dcd8d58f95e1..61d3aa9a6ca9 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -65,6 +65,7 @@ static const u32 mlx5_ib_opcode[] = {
[IB_WR_SEND_WITH_INV] = MLX5_OPCODE_SEND_INVAL,
[IB_WR_LOCAL_INV] = MLX5_OPCODE_UMR,
[IB_WR_FAST_REG_MR] = MLX5_OPCODE_UMR,
+ [IB_WR_REG_MR] = MLX5_OPCODE_UMR,
[IB_WR_MASKED_ATOMIC_CMP_AND_SWP] = MLX5_OPCODE_ATOMIC_MASKED_CS,
[IB_WR_MASKED_ATOMIC_FETCH_AND_ADD] = MLX5_OPCODE_ATOMIC_MASKED_FA,
[MLX5_IB_WR_UMR] = MLX5_OPCODE_UMR,
@@ -1901,6 +1902,17 @@ static __be64 sig_mkey_mask(void)
return cpu_to_be64(result);
}

+static void set_reg_umr_seg(struct mlx5_wqe_umr_ctrl_seg *umr,
+ struct mlx5_ib_mr *mr)
+{
+ int ndescs = mr->ndescs;
+
+ memset(umr, 0, sizeof(*umr));
+ umr->flags = MLX5_UMR_CHECK_NOT_FREE;
+ umr->klm_octowords = get_klm_octo(ndescs);
+ umr->mkey_mask = frwr_mkey_mask();
+}
+
static void set_frwr_umr_segment(struct mlx5_wqe_umr_ctrl_seg *umr,
struct ib_send_wr *wr, int li)
{
@@ -1992,6 +2004,22 @@ static u8 get_umr_flags(int acc)
MLX5_PERM_LOCAL_READ | MLX5_PERM_UMR_EN;
}

+static void set_reg_mkey_seg(struct mlx5_mkey_seg *seg,
+ struct mlx5_ib_mr *mr,
+ u32 key, int access)
+{
+ int ndescs = ALIGN(mr->ndescs, 8) >> 1;
+
+ memset(seg, 0, sizeof(*seg));
+ seg->flags = get_umr_flags(access) | MLX5_ACCESS_MODE_MTT;
+ seg->qpn_mkey7_0 = cpu_to_be32((key & 0xff) | 0xffffff00);
+ seg->flags_pd = cpu_to_be32(MLX5_MKEY_REMOTE_INVAL);
+ seg->start_addr = cpu_to_be64(mr->ibmr.iova);
+ seg->len = cpu_to_be64(mr->ibmr.length);
+ seg->xlt_oct_size = cpu_to_be32(ndescs);
+ seg->log2_page_size = ilog2(mr->ibmr.page_size);
+}
+
static void set_mkey_segment(struct mlx5_mkey_seg *seg, struct ib_send_wr *wr,
int li, int *writ)
{
@@ -2033,6 +2061,17 @@ static void set_reg_mkey_segment(struct mlx5_mkey_seg *seg, struct ib_send_wr *w
mlx5_mkey_variant(umrwr->mkey));
}

+static void set_reg_data_seg(struct mlx5_wqe_data_seg *dseg,
+ struct mlx5_ib_mr *mr,
+ struct mlx5_ib_pd *pd)
+{
+ int bcount = mr->desc_size * mr->ndescs;
+
+ dseg->addr = cpu_to_be64(mr->desc_map);
+ dseg->byte_count = cpu_to_be32(ALIGN(bcount, 64));
+ dseg->lkey = cpu_to_be32(pd->ibpd.local_dma_lkey);
+}
+
static void set_frwr_pages(struct mlx5_wqe_data_seg *dseg,
struct ib_send_wr *wr,
struct mlx5_core_dev *mdev,
@@ -2438,6 +2477,38 @@ static int set_psv_wr(struct ib_sig_domain *domain,
return 0;
}

+static int set_reg_wr(struct mlx5_ib_qp *qp,
+ struct ib_reg_wr *wr,
+ void **seg, int *size)
+{
+ struct mlx5_ib_mr *mr = to_mmr(wr->mr);
+ struct mlx5_ib_pd *pd = to_mpd(qp->ibqp.pd);
+
+ if (unlikely(wr->wr.send_flags & IB_SEND_INLINE)) {
+ mlx5_ib_warn(to_mdev(qp->ibqp.device),
+ "Invalid IB_SEND_INLINE send flag\n");
+ return -EINVAL;
+ }
+
+ set_reg_umr_seg(*seg, mr);
+ *seg += sizeof(struct mlx5_wqe_umr_ctrl_seg);
+ *size += sizeof(struct mlx5_wqe_umr_ctrl_seg) / 16;
+ if (unlikely((*seg == qp->sq.qend)))
+ *seg = mlx5_get_send_wqe(qp, 0);
+
+ set_reg_mkey_seg(*seg, mr, wr->key, wr->access);
+ *seg += sizeof(struct mlx5_mkey_seg);
+ *size += sizeof(struct mlx5_mkey_seg) / 16;
+ if (unlikely((*seg == qp->sq.qend)))
+ *seg = mlx5_get_send_wqe(qp, 0);
+
+ set_reg_data_seg(*seg, mr, pd);
+ *seg += sizeof(struct mlx5_wqe_data_seg);
+ *size += (sizeof(struct mlx5_wqe_data_seg) / 16);
+
+ return 0;
+}
+
static int set_frwr_li_wr(void **seg, struct ib_send_wr *wr, int *size,
struct mlx5_core_dev *mdev, struct mlx5_ib_pd *pd, struct mlx5_ib_qp *qp)
{
@@ -2680,6 +2751,18 @@ int mlx5_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
num_sge = 0;
break;

+ case IB_WR_REG_MR:
+ next_fence = MLX5_FENCE_MODE_INITIATOR_SMALL;
+ qp->sq.wr_data[idx] = IB_WR_REG_MR;
+ ctrl->imm = cpu_to_be32(reg_wr(wr)->key);
+ err = set_reg_wr(qp, reg_wr(wr), &seg, &size);
+ if (err) {
+ *bad_wr = wr;
+ goto out;
+ }
+ num_sge = 0;
+ break;
+
case IB_WR_REG_SIG_MR:
qp->sq.wr_data[idx] = IB_WR_REG_SIG_MR;
mr = to_mmr(sig_handover_wr(wr)->sig_mr);
--
1.8.4.3


2015-09-17 09:42:56

by Sagi Grimberg

Subject: [PATCH v1 01/24] IB/core: Introduce new fast registration API

The new fast registration verb ib_map_mr_sg receives a scatterlist
and converts it to a page list under the verbs API, thus hiding
the HW-specific mapping details away from the consumer.

The provider drivers are given a generic helper, ib_sg_to_pages,
that converts a scatterlist into a vector of page addresses. The
drivers can still perform any HW-specific page address setting
by passing a set_page function pointer, which will be invoked for
each page address. This allows drivers to avoid keeping a shadow
page vector and converting it to HW-specific translations with
extra copies.

This API will allow ULPs to remove the duplicated code of constructing
a page vector from a given sg list.

The registration send work request, ib_reg_wr, also shrinks, as in
addition to the generic part it carries only the mr, key and access
flags.

Signed-off-by: Sagi Grimberg <[email protected]>
---
drivers/infiniband/core/verbs.c | 107 ++++++++++++++++++++++++++++++++++++++++
include/rdma/ib_verbs.h | 30 +++++++++++
2 files changed, 137 insertions(+)

diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index e1f2c9887f3f..d99f57f1f737 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -1469,3 +1469,110 @@ int ib_check_mr_status(struct ib_mr *mr, u32 check_mask,
mr->device->check_mr_status(mr, check_mask, mr_status) : -ENOSYS;
}
EXPORT_SYMBOL(ib_check_mr_status);
+
+/**
+ * ib_map_mr_sg() - Map a memory region with the largest prefix of
+ * a dma mapped SG list
+ * @mr: memory region
+ * @sg: dma mapped scatterlist
+ * @sg_nents: number of entries in sg
+ * @page_size: page vector desired page size
+ *
+ * Constraints:
+ * - The first sg element is allowed to have an offset.
+ * - Each sg element must be aligned to page_size (or physically
+ * contiguous to the previous element). In case an sg element has a
+ * non contiguous offset, the mapping prefix will not include it.
+ * - The last sg element is allowed to have length less than page_size.
+ * - If the sg_nents total byte length exceeds the mr max_num_sg * page_size
+ * then only max_num_sg entries will be mapped.
+ *
+ * Returns the number of sg elements that were mapped to the memory region.
+ *
+ * After this completes successfully, the memory region
+ * is ready for registration.
+ */
+int ib_map_mr_sg(struct ib_mr *mr,
+ struct scatterlist *sg,
+ unsigned int sg_nents,
+ unsigned int page_size)
+{
+ if (unlikely(!mr->device->map_mr_sg))
+ return -ENOSYS;
+
+ mr->page_size = page_size;
+
+ return mr->device->map_mr_sg(mr, sg, sg_nents);
+}
+EXPORT_SYMBOL(ib_map_mr_sg);
+
+/**
+ * ib_sg_to_pages() - Convert the largest prefix of a sg list
+ * to a page vector
+ * @mr: memory region
+ * @sgl: dma mapped scatterlist
+ * @sg_nents: number of entries in sg
+ * @set_page: driver page assignment function pointer
+ *
+ * Core service helper for drivers to convert the largest
+ * prefix of a given sg list to a page vector. The sg list
+ * prefix converted is the prefix that meets the requirements
+ * of ib_map_mr_sg.
+ *
+ * Returns the number of sg elements that were assigned to
+ * a page vector.
+ */
+int ib_sg_to_pages(struct ib_mr *mr,
+ struct scatterlist *sgl,
+ unsigned int sg_nents,
+ int (*set_page)(struct ib_mr *, u64))
+{
+ struct scatterlist *sg;
+ u64 last_end_dma_addr = 0, last_page_addr = 0;
+ unsigned int last_page_off = 0;
+ u64 page_mask = ~((u64)mr->page_size - 1);
+ int i;
+
+ mr->iova = sg_dma_address(&sgl[0]);
+ mr->length = 0;
+
+ for_each_sg(sgl, sg, sg_nents, i) {
+ u64 dma_addr = sg_dma_address(sg);
+ unsigned int dma_len = sg_dma_len(sg);
+ u64 end_dma_addr = dma_addr + dma_len;
+ u64 page_addr = dma_addr & page_mask;
+
+ if (i && page_addr != dma_addr) {
+ if (last_end_dma_addr != dma_addr) {
+ /* gap */
+ goto done;
+
+ } else if (last_page_off + dma_len < mr->page_size) {
+ /* chunk this fragment with the last */
+ last_end_dma_addr += dma_len;
+ last_page_off += dma_len;
+ mr->length += dma_len;
+ continue;
+ } else {
+ /* map starting from the next page */
+ page_addr = last_page_addr + mr->page_size;
+ dma_len -= mr->page_size - last_page_off;
+ }
+ }
+
+ do {
+ if (unlikely(set_page(mr, page_addr)))
+ goto done;
+ page_addr += mr->page_size;
+ } while (page_addr < end_dma_addr);
+
+ mr->length += dma_len;
+ last_end_dma_addr = end_dma_addr;
+ last_page_addr = end_dma_addr & page_mask;
+ last_page_off = end_dma_addr & ~page_mask;
+ }
+
+done:
+ return i;
+}
+EXPORT_SYMBOL(ib_sg_to_pages);
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index edf02908a0fd..97c73359ade8 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -737,6 +737,7 @@ enum ib_wc_opcode {
IB_WC_LSO,
IB_WC_LOCAL_INV,
IB_WC_FAST_REG_MR,
+ IB_WC_REG_MR,
IB_WC_MASKED_COMP_SWAP,
IB_WC_MASKED_FETCH_ADD,
/*
@@ -1029,6 +1030,7 @@ enum ib_wr_opcode {
IB_WR_RDMA_READ_WITH_INV,
IB_WR_LOCAL_INV,
IB_WR_FAST_REG_MR,
+ IB_WR_REG_MR,
IB_WR_MASKED_ATOMIC_CMP_AND_SWP,
IB_WR_MASKED_ATOMIC_FETCH_AND_ADD,
IB_WR_BIND_MW,
@@ -1161,6 +1163,18 @@ static inline struct ib_fast_reg_wr *fast_reg_wr(struct ib_send_wr *wr)
return container_of(wr, struct ib_fast_reg_wr, wr);
}

+struct ib_reg_wr {
+ struct ib_send_wr wr;
+ struct ib_mr *mr;
+ u32 key;
+ int access;
+};
+
+static inline struct ib_reg_wr *reg_wr(struct ib_send_wr *wr)
+{
+ return container_of(wr, struct ib_reg_wr, wr);
+}
+
struct ib_bind_mw_wr {
struct ib_send_wr wr;
struct ib_mw *mw;
@@ -1373,6 +1387,9 @@ struct ib_mr {
struct ib_uobject *uobject;
u32 lkey;
u32 rkey;
+ u64 iova;
+ u32 length;
+ unsigned int page_size;
atomic_t usecnt; /* count number of MWs */
};

@@ -1757,6 +1774,9 @@ struct ib_device {
struct ib_mr * (*alloc_mr)(struct ib_pd *pd,
enum ib_mr_type mr_type,
u32 max_num_sg);
+ int (*map_mr_sg)(struct ib_mr *mr,
+ struct scatterlist *sg,
+ unsigned int sg_nents);
struct ib_fast_reg_page_list * (*alloc_fast_reg_page_list)(struct ib_device *device,
int page_list_len);
void (*free_fast_reg_page_list)(struct ib_fast_reg_page_list *page_list);
@@ -3062,4 +3082,14 @@ struct net_device *ib_get_net_dev_by_params(struct ib_device *dev, u8 port,
u16 pkey, const union ib_gid *gid,
const struct sockaddr *addr);

+int ib_map_mr_sg(struct ib_mr *mr,
+ struct scatterlist *sg,
+ unsigned int sg_nents,
+ unsigned int page_size);
+
+int ib_sg_to_pages(struct ib_mr *mr,
+ struct scatterlist *sgl,
+ unsigned int sg_nents,
+ int (*set_page)(struct ib_mr *, u64));
+
#endif /* IB_VERBS_H */
--
1.8.4.3


2015-09-17 09:42:56

by Sagi Grimberg

Subject: [PATCH v1 06/24] RDMA/cxgb3: Support the new memory registration API

Support the new memory registration API by allocating a
private page list array in iwch_mr and populating it when
iwch_map_mr_sg is invoked. Also, support IB_WR_REG_MR
by duplicating build_fastreg, just taking the needed
information from different places:
- page_size, iova, length (ib_mr)
- page array (iwch_mr)
- key, access flags (ib_reg_wr)

The IB_WR_FAST_REG_MR handlers will be removed later, once
all the ULPs have been converted.

Signed-off-by: Sagi Grimberg <[email protected]>
---
drivers/infiniband/hw/cxgb3/iwch_provider.c | 33 ++++++++++++++++++++
drivers/infiniband/hw/cxgb3/iwch_provider.h | 2 ++
drivers/infiniband/hw/cxgb3/iwch_qp.c | 48 +++++++++++++++++++++++++++++
3 files changed, 83 insertions(+)

diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c b/drivers/infiniband/hw/cxgb3/iwch_provider.c
index 93308c45f298..ee3d5ca7de6c 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_provider.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c
@@ -463,6 +463,7 @@ static int iwch_dereg_mr(struct ib_mr *ib_mr)
return -EINVAL;

mhp = to_iwch_mr(ib_mr);
+ kfree(mhp->pages);
rhp = mhp->rhp;
mmid = mhp->attr.stag >> 8;
cxio_dereg_mem(&rhp->rdev, mhp->attr.stag, mhp->attr.pbl_size,
@@ -821,6 +822,12 @@ static struct ib_mr *iwch_alloc_mr(struct ib_pd *pd,
if (!mhp)
goto err;

+ mhp->pages = kcalloc(max_num_sg, sizeof(u64), GFP_KERNEL);
+ if (!mhp->pages) {
+ ret = -ENOMEM;
+ goto pl_err;
+ }
+
mhp->rhp = rhp;
ret = iwch_alloc_pbl(mhp, max_num_sg);
if (ret)
@@ -847,11 +854,36 @@ err3:
err2:
iwch_free_pbl(mhp);
err1:
+ kfree(mhp->pages);
+pl_err:
kfree(mhp);
err:
return ERR_PTR(ret);
}

+static int iwch_set_page(struct ib_mr *ibmr, u64 addr)
+{
+ struct iwch_mr *mhp = to_iwch_mr(ibmr);
+
+ if (unlikely(mhp->npages == mhp->attr.pbl_size))
+ return -ENOMEM;
+
+ mhp->pages[mhp->npages++] = addr;
+
+ return 0;
+}
+
+static int iwch_map_mr_sg(struct ib_mr *ibmr,
+ struct scatterlist *sg,
+ unsigned int sg_nents)
+{
+ struct iwch_mr *mhp = to_iwch_mr(ibmr);
+
+ mhp->npages = 0;
+
+ return ib_sg_to_pages(ibmr, sg, sg_nents, iwch_set_page);
+}
+
static struct ib_fast_reg_page_list *iwch_alloc_fastreg_pbl(
struct ib_device *device,
int page_list_len)
@@ -1450,6 +1482,7 @@ int iwch_register_device(struct iwch_dev *dev)
dev->ibdev.bind_mw = iwch_bind_mw;
dev->ibdev.dealloc_mw = iwch_dealloc_mw;
dev->ibdev.alloc_mr = iwch_alloc_mr;
+ dev->ibdev.map_mr_sg = iwch_map_mr_sg;
dev->ibdev.alloc_fast_reg_page_list = iwch_alloc_fastreg_pbl;
dev->ibdev.free_fast_reg_page_list = iwch_free_fastreg_pbl;
dev->ibdev.attach_mcast = iwch_multicast_attach;
diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.h b/drivers/infiniband/hw/cxgb3/iwch_provider.h
index 87c14b0c5ac0..2ac85b86a680 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_provider.h
+++ b/drivers/infiniband/hw/cxgb3/iwch_provider.h
@@ -77,6 +77,8 @@ struct iwch_mr {
struct iwch_dev *rhp;
u64 kva;
struct tpt_attributes attr;
+ u64 *pages;
+ u32 npages;
};

typedef struct iwch_mw iwch_mw_handle;
diff --git a/drivers/infiniband/hw/cxgb3/iwch_qp.c b/drivers/infiniband/hw/cxgb3/iwch_qp.c
index bac0508fedd9..a09ea538e990 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_qp.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_qp.c
@@ -146,6 +146,49 @@ static int build_rdma_read(union t3_wr *wqe, struct ib_send_wr *wr,
return 0;
}

+static int build_memreg(union t3_wr *wqe, struct ib_reg_wr *wr,
+ u8 *flit_cnt, int *wr_cnt, struct t3_wq *wq)
+{
+ struct iwch_mr *mhp = to_iwch_mr(wr->mr);
+ int i;
+ __be64 *p;
+
+ if (mhp->npages > T3_MAX_FASTREG_DEPTH)
+ return -EINVAL;
+ *wr_cnt = 1;
+ wqe->fastreg.stag = cpu_to_be32(wr->key);
+ wqe->fastreg.len = cpu_to_be32(mhp->ibmr.length);
+ wqe->fastreg.va_base_hi = cpu_to_be32(mhp->ibmr.iova >> 32);
+ wqe->fastreg.va_base_lo_fbo =
+ cpu_to_be32(mhp->ibmr.iova & 0xffffffff);
+ wqe->fastreg.page_type_perms = cpu_to_be32(
+ V_FR_PAGE_COUNT(mhp->npages) |
+ V_FR_PAGE_SIZE(ilog2(wr->mr->page_size) - 12) |
+ V_FR_TYPE(TPT_VATO) |
+ V_FR_PERMS(iwch_ib_to_tpt_access(wr->access)));
+ p = &wqe->fastreg.pbl_addrs[0];
+ for (i = 0; i < mhp->npages; i++, p++) {
+
+ /* If we need a 2nd WR, then set it up */
+ if (i == T3_MAX_FASTREG_FRAG) {
+ *wr_cnt = 2;
+ wqe = (union t3_wr *)(wq->queue +
+ Q_PTR2IDX((wq->wptr+1), wq->size_log2));
+ build_fw_riwrh((void *)wqe, T3_WR_FASTREG, 0,
+ Q_GENBIT(wq->wptr + 1, wq->size_log2),
+ 0, 1 + mhp->npages - T3_MAX_FASTREG_FRAG,
+ T3_EOP);
+
+ p = &wqe->pbl_frag.pbl_addrs[0];
+ }
+ *p = cpu_to_be64((u64)mhp->pages[i]);
+ }
+ *flit_cnt = 5 + mhp->npages;
+ if (*flit_cnt > 15)
+ *flit_cnt = 15;
+ return 0;
+}
+
static int build_fastreg(union t3_wr *wqe, struct ib_send_wr *send_wr,
u8 *flit_cnt, int *wr_cnt, struct t3_wq *wq)
{
@@ -419,6 +462,11 @@ int iwch_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
err = build_fastreg(wqe, wr, &t3_wr_flit_cnt,
&wr_cnt, &qhp->wq);
break;
+ case IB_WR_REG_MR:
+ t3_wr_opcode = T3_WR_FASTREG;
+ err = build_memreg(wqe, reg_wr(wr), &t3_wr_flit_cnt,
+ &wr_cnt, &qhp->wq);
+ break;
case IB_WR_LOCAL_INV:
if (wr->send_flags & IB_SEND_FENCE)
t3_wr_flags |= T3_LOCAL_FENCE_FLAG;
--
1.8.4.3


2015-09-17 09:42:56

by Sagi Grimberg

Subject: [PATCH v1 05/24] RDMA/ocrdma: Support the new memory registration API

Support the new memory registration API by allocating a
private page list array in ocrdma_mr and populating it when
ocrdma_map_mr_sg is invoked. Also, support IB_WR_REG_MR
by duplicating IB_WR_FAST_REG_MR, but taking the needed
information from different places:
- page_size, iova, length, access flags (ib_mr)
- page array (ocrdma_mr)
- key (ib_reg_wr)

The IB_WR_FAST_REG_MR handlers will be removed later,
once all the ULPs have been converted.

Signed-off-by: Sagi Grimberg <[email protected]>
---
drivers/infiniband/hw/ocrdma/ocrdma.h | 2 +
drivers/infiniband/hw/ocrdma/ocrdma_main.c | 1 +
drivers/infiniband/hw/ocrdma/ocrdma_verbs.c | 89 +++++++++++++++++++++++++++++
drivers/infiniband/hw/ocrdma/ocrdma_verbs.h | 3 +
4 files changed, 95 insertions(+)

diff --git a/drivers/infiniband/hw/ocrdma/ocrdma.h b/drivers/infiniband/hw/ocrdma/ocrdma.h
index b4091ab48db0..c2f3af5d5194 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma.h
+++ b/drivers/infiniband/hw/ocrdma/ocrdma.h
@@ -193,6 +193,8 @@ struct ocrdma_mr {
struct ib_mr ibmr;
struct ib_umem *umem;
struct ocrdma_hw_mr hwmr;
+ u64 *pages;
+ u32 npages;
};

struct ocrdma_stats {
diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_main.c b/drivers/infiniband/hw/ocrdma/ocrdma_main.c
index 87aa55df7c82..874beb4b07a1 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_main.c
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_main.c
@@ -182,6 +182,7 @@ static int ocrdma_register_device(struct ocrdma_dev *dev)
dev->ibdev.reg_user_mr = ocrdma_reg_user_mr;

dev->ibdev.alloc_mr = ocrdma_alloc_mr;
+ dev->ibdev.map_mr_sg = ocrdma_map_mr_sg;
dev->ibdev.alloc_fast_reg_page_list = ocrdma_alloc_frmr_page_list;
dev->ibdev.free_fast_reg_page_list = ocrdma_free_frmr_page_list;

diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
index eb09e224acb9..853746e17d5c 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
@@ -1013,6 +1013,7 @@ int ocrdma_dereg_mr(struct ib_mr *ib_mr)

(void) ocrdma_mbx_dealloc_lkey(dev, mr->hwmr.fr_mr, mr->hwmr.lkey);

+ kfree(mr->pages);
ocrdma_free_mr_pbl_tbl(dev, &mr->hwmr);

/* it could be user registered memory. */
@@ -2177,6 +2178,60 @@ static int get_encoded_page_size(int pg_sz)
return i;
}

+static int ocrdma_build_reg(struct ocrdma_qp *qp,
+ struct ocrdma_hdr_wqe *hdr,
+ struct ib_reg_wr *wr)
+{
+ u64 fbo;
+ struct ocrdma_ewqe_fr *fast_reg = (struct ocrdma_ewqe_fr *)(hdr + 1);
+ struct ocrdma_mr *mr = get_ocrdma_mr(wr->mr);
+ struct ocrdma_pbl *pbl_tbl = mr->hwmr.pbl_table;
+ struct ocrdma_pbe *pbe;
+ u32 wqe_size = sizeof(*fast_reg) + sizeof(*hdr);
+ int num_pbes = 0, i;
+
+ wqe_size = roundup(wqe_size, OCRDMA_WQE_ALIGN_BYTES);
+
+ hdr->cw |= (OCRDMA_FR_MR << OCRDMA_WQE_OPCODE_SHIFT);
+ hdr->cw |= ((wqe_size / OCRDMA_WQE_STRIDE) << OCRDMA_WQE_SIZE_SHIFT);
+
+ if (wr->access & IB_ACCESS_LOCAL_WRITE)
+ hdr->rsvd_lkey_flags |= OCRDMA_LKEY_FLAG_LOCAL_WR;
+ if (wr->access & IB_ACCESS_REMOTE_WRITE)
+ hdr->rsvd_lkey_flags |= OCRDMA_LKEY_FLAG_REMOTE_WR;
+ if (wr->access & IB_ACCESS_REMOTE_READ)
+ hdr->rsvd_lkey_flags |= OCRDMA_LKEY_FLAG_REMOTE_RD;
+ hdr->lkey = wr->key;
+ hdr->total_len = mr->ibmr.length;
+
+ fbo = mr->ibmr.iova - mr->pages[0];
+
+ fast_reg->va_hi = upper_32_bits(mr->ibmr.iova);
+ fast_reg->va_lo = (u32) (mr->ibmr.iova & 0xffffffff);
+ fast_reg->fbo_hi = upper_32_bits(fbo);
+ fast_reg->fbo_lo = (u32) fbo & 0xffffffff;
+ fast_reg->num_sges = mr->npages;
+ fast_reg->size_sge = get_encoded_page_size(mr->ibmr.page_size);
+
+ pbe = pbl_tbl->va;
+ for (i = 0; i < mr->npages; i++) {
+ u64 buf_addr = mr->pages[i];
+ pbe->pa_lo = cpu_to_le32((u32) (buf_addr & PAGE_MASK));
+ pbe->pa_hi = cpu_to_le32((u32) upper_32_bits(buf_addr));
+ num_pbes += 1;
+ pbe++;
+
+ /* if the pbl is full storing the pbes,
+ * move to next pbl.
+ */
+ if (num_pbes == (mr->hwmr.pbl_size/sizeof(u64))) {
+ pbl_tbl++;
+ pbe = (struct ocrdma_pbe *)pbl_tbl->va;
+ }
+ }
+
+ return 0;
+}

static int ocrdma_build_fr(struct ocrdma_qp *qp, struct ocrdma_hdr_wqe *hdr,
struct ib_send_wr *send_wr)
@@ -2304,6 +2359,9 @@ int ocrdma_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
case IB_WR_FAST_REG_MR:
status = ocrdma_build_fr(qp, hdr, wr);
break;
+ case IB_WR_REG_MR:
+ status = ocrdma_build_reg(qp, hdr, reg_wr(wr));
+ break;
default:
status = -EINVAL;
break;
@@ -3059,6 +3117,12 @@ struct ib_mr *ocrdma_alloc_mr(struct ib_pd *ibpd,
if (!mr)
return ERR_PTR(-ENOMEM);

+ mr->pages = kcalloc(max_num_sg, sizeof(u64), GFP_KERNEL);
+ if (!mr->pages) {
+ status = -ENOMEM;
+ goto pl_err;
+ }
+
status = ocrdma_get_pbl_info(dev, mr, max_num_sg);
if (status)
goto pbl_err;
@@ -3082,6 +3146,8 @@ struct ib_mr *ocrdma_alloc_mr(struct ib_pd *ibpd,
mbx_err:
ocrdma_free_mr_pbl_tbl(dev, &mr->hwmr);
pbl_err:
+ kfree(mr->pages);
+pl_err:
kfree(mr);
return ERR_PTR(-ENOMEM);
}
@@ -3268,3 +3334,26 @@ pbl_err:
kfree(mr);
return ERR_PTR(status);
}
+
+static int ocrdma_set_page(struct ib_mr *ibmr, u64 addr)
+{
+ struct ocrdma_mr *mr = get_ocrdma_mr(ibmr);
+
+ if (unlikely(mr->npages == mr->hwmr.num_pbes))
+ return -ENOMEM;
+
+ mr->pages[mr->npages++] = addr;
+
+ return 0;
+}
+
+int ocrdma_map_mr_sg(struct ib_mr *ibmr,
+ struct scatterlist *sg,
+ unsigned int sg_nents)
+{
+ struct ocrdma_mr *mr = get_ocrdma_mr(ibmr);
+
+ mr->npages = 0;
+
+ return ib_sg_to_pages(ibmr, sg, sg_nents, ocrdma_set_page);
+}
diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h
index 308c16857a5d..4edf63f9c6c7 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h
@@ -125,6 +125,9 @@ struct ib_mr *ocrdma_reg_user_mr(struct ib_pd *, u64 start, u64 length,
struct ib_mr *ocrdma_alloc_mr(struct ib_pd *pd,
enum ib_mr_type mr_type,
u32 max_num_sg);
+int ocrdma_map_mr_sg(struct ib_mr *ibmr,
+ struct scatterlist *sg,
+ unsigned int sg_nents);
struct ib_fast_reg_page_list *ocrdma_alloc_frmr_page_list(struct ib_device
*ibdev,
int page_list_len);
--
1.8.4.3


2015-09-17 09:43:11

by Sagi Grimberg

Subject: [PATCH v1 11/24] iser-target: Port to new memory registration API

Remove the fastreg page list allocation as the page vector
is now private to the provider. Instead of constructing
the page list and fast_reg work request, call ib_map_mr_sg
and construct an ib_reg_wr.

Signed-off-by: Sagi Grimberg <[email protected]>
---
drivers/infiniband/ulp/isert/ib_isert.c | 129 +++++++-------------------------
drivers/infiniband/ulp/isert/ib_isert.h | 2 -
2 files changed, 27 insertions(+), 104 deletions(-)

diff --git a/drivers/infiniband/ulp/isert/ib_isert.c b/drivers/infiniband/ulp/isert/ib_isert.c
index dcb29d166211..67d56c3de3dd 100644
--- a/drivers/infiniband/ulp/isert/ib_isert.c
+++ b/drivers/infiniband/ulp/isert/ib_isert.c
@@ -475,10 +475,8 @@ isert_conn_free_fastreg_pool(struct isert_conn *isert_conn)
list_for_each_entry_safe(fr_desc, tmp,
&isert_conn->fr_pool, list) {
list_del(&fr_desc->list);
- ib_free_fast_reg_page_list(fr_desc->data_frpl);
ib_dereg_mr(fr_desc->data_mr);
if (fr_desc->pi_ctx) {
- ib_free_fast_reg_page_list(fr_desc->pi_ctx->prot_frpl);
ib_dereg_mr(fr_desc->pi_ctx->prot_mr);
ib_dereg_mr(fr_desc->pi_ctx->sig_mr);
kfree(fr_desc->pi_ctx);
@@ -506,22 +504,13 @@ isert_create_pi_ctx(struct fast_reg_descriptor *desc,
return -ENOMEM;
}

- pi_ctx->prot_frpl = ib_alloc_fast_reg_page_list(device,
- ISCSI_ISER_SG_TABLESIZE);
- if (IS_ERR(pi_ctx->prot_frpl)) {
- isert_err("Failed to allocate prot frpl err=%ld\n",
- PTR_ERR(pi_ctx->prot_frpl));
- ret = PTR_ERR(pi_ctx->prot_frpl);
- goto err_pi_ctx;
- }
-
pi_ctx->prot_mr = ib_alloc_mr(pd, IB_MR_TYPE_MEM_REG,
ISCSI_ISER_SG_TABLESIZE);
if (IS_ERR(pi_ctx->prot_mr)) {
isert_err("Failed to allocate prot frmr err=%ld\n",
PTR_ERR(pi_ctx->prot_mr));
ret = PTR_ERR(pi_ctx->prot_mr);
- goto err_prot_frpl;
+ goto err_pi_ctx;
}
desc->ind |= ISERT_PROT_KEY_VALID;

@@ -541,8 +530,6 @@ isert_create_pi_ctx(struct fast_reg_descriptor *desc,

err_prot_mr:
ib_dereg_mr(pi_ctx->prot_mr);
-err_prot_frpl:
- ib_free_fast_reg_page_list(pi_ctx->prot_frpl);
err_pi_ctx:
kfree(pi_ctx);

@@ -553,34 +540,18 @@ static int
isert_create_fr_desc(struct ib_device *ib_device, struct ib_pd *pd,
struct fast_reg_descriptor *fr_desc)
{
- int ret;
-
- fr_desc->data_frpl = ib_alloc_fast_reg_page_list(ib_device,
- ISCSI_ISER_SG_TABLESIZE);
- if (IS_ERR(fr_desc->data_frpl)) {
- isert_err("Failed to allocate data frpl err=%ld\n",
- PTR_ERR(fr_desc->data_frpl));
- return PTR_ERR(fr_desc->data_frpl);
- }
-
fr_desc->data_mr = ib_alloc_mr(pd, IB_MR_TYPE_MEM_REG,
ISCSI_ISER_SG_TABLESIZE);
if (IS_ERR(fr_desc->data_mr)) {
isert_err("Failed to allocate data frmr err=%ld\n",
PTR_ERR(fr_desc->data_mr));
- ret = PTR_ERR(fr_desc->data_mr);
- goto err_data_frpl;
+ return PTR_ERR(fr_desc->data_mr);
}
fr_desc->ind |= ISERT_DATA_KEY_VALID;

isert_dbg("Created fr_desc %p\n", fr_desc);

return 0;
-
-err_data_frpl:
- ib_free_fast_reg_page_list(fr_desc->data_frpl);
-
- return ret;
}

static int
@@ -2516,45 +2487,6 @@ unmap_cmd:
return ret;
}

-static int
-isert_map_fr_pagelist(struct ib_device *ib_dev,
- struct scatterlist *sg_start, int sg_nents, u64 *fr_pl)
-{
- u64 start_addr, end_addr, page, chunk_start = 0;
- struct scatterlist *tmp_sg;
- int i = 0, new_chunk, last_ent, n_pages;
-
- n_pages = 0;
- new_chunk = 1;
- last_ent = sg_nents - 1;
- for_each_sg(sg_start, tmp_sg, sg_nents, i) {
- start_addr = ib_sg_dma_address(ib_dev, tmp_sg);
- if (new_chunk)
- chunk_start = start_addr;
- end_addr = start_addr + ib_sg_dma_len(ib_dev, tmp_sg);
-
- isert_dbg("SGL[%d] dma_addr: 0x%llx len: %u\n",
- i, (unsigned long long)tmp_sg->dma_address,
- tmp_sg->length);
-
- if ((end_addr & ~PAGE_MASK) && i < last_ent) {
- new_chunk = 0;
- continue;
- }
- new_chunk = 1;
-
- page = chunk_start & PAGE_MASK;
- do {
- fr_pl[n_pages++] = page;
- isert_dbg("Mapped page_list[%d] page_addr: 0x%llx\n",
- n_pages - 1, page);
- page += PAGE_SIZE;
- } while (page < end_addr);
- }
-
- return n_pages;
-}
-
static inline void
isert_inv_rkey(struct ib_send_wr *inv_wr, struct ib_mr *mr)
{
@@ -2580,11 +2512,9 @@ isert_fast_reg_mr(struct isert_conn *isert_conn,
struct isert_device *device = isert_conn->device;
struct ib_device *ib_dev = device->ib_device;
struct ib_mr *mr;
- struct ib_fast_reg_page_list *frpl;
- struct ib_fast_reg_wr fr_wr;
+ struct ib_reg_wr reg_wr;
struct ib_send_wr inv_wr, *bad_wr, *wr = NULL;
- int ret, pagelist_len;
- u32 page_off;
+ int ret, n;

if (mem->dma_nents == 1) {
sge->lkey = device->pd->local_dma_lkey;
@@ -2595,45 +2525,40 @@ isert_fast_reg_mr(struct isert_conn *isert_conn,
return 0;
}

- if (ind == ISERT_DATA_KEY_VALID) {
+ if (ind == ISERT_DATA_KEY_VALID)
/* Registering data buffer */
mr = fr_desc->data_mr;
- frpl = fr_desc->data_frpl;
- } else {
+ else
/* Registering protection buffer */
mr = fr_desc->pi_ctx->prot_mr;
- frpl = fr_desc->pi_ctx->prot_frpl;
- }
-
- page_off = mem->offset % PAGE_SIZE;
-
- isert_dbg("Use fr_desc %p sg_nents %d offset %u\n",
- fr_desc, mem->nents, mem->offset);
-
- pagelist_len = isert_map_fr_pagelist(ib_dev, mem->sg, mem->nents,
- &frpl->page_list[0]);

if (!(fr_desc->ind & ind)) {
isert_inv_rkey(&inv_wr, mr);
wr = &inv_wr;
}

- /* Prepare FASTREG WR */
- memset(&fr_wr, 0, sizeof(fr_wr));
- fr_wr.wr.wr_id = ISER_FASTREG_LI_WRID;
- fr_wr.wr.opcode = IB_WR_FAST_REG_MR;
- fr_wr.iova_start = frpl->page_list[0] + page_off;
- fr_wr.page_list = frpl;
- fr_wr.page_list_len = pagelist_len;
- fr_wr.page_shift = PAGE_SHIFT;
- fr_wr.length = mem->len;
- fr_wr.rkey = mr->rkey;
- fr_wr.access_flags = IB_ACCESS_LOCAL_WRITE;
+ n = ib_map_mr_sg(mr, mem->sg, mem->nents, PAGE_SIZE);
+ if (unlikely(n != mem->nents)) {
+ isert_err("failed to map mr sg (%d/%d)\n",
+ n, mem->nents);
+ return n < 0 ? n : -EINVAL;
+ }
+
+ isert_dbg("Use fr_desc %p sg_nents %d offset %u\n",
+ fr_desc, mem->nents, mem->offset);
+
+ reg_wr.wr.opcode = IB_WR_REG_MR;
+ reg_wr.wr.wr_id = ISER_FASTREG_LI_WRID;
+ reg_wr.wr.send_flags = 0;
+ reg_wr.wr.num_sge = 0;
+ reg_wr.mr = mr;
+ reg_wr.key = mr->lkey;
+ reg_wr.access = IB_ACCESS_LOCAL_WRITE;

if (!wr)
- wr = &fr_wr.wr;
+ wr = &reg_wr.wr;
else
- wr->next = &fr_wr.wr;
+ wr->next = &reg_wr.wr;

ret = ib_post_send(isert_conn->qp, wr, &bad_wr);
if (ret) {
@@ -2643,8 +2568,8 @@ isert_fast_reg_mr(struct isert_conn *isert_conn,
fr_desc->ind &= ~ind;

sge->lkey = mr->lkey;
- sge->addr = frpl->page_list[0] + page_off;
- sge->length = mem->len;
+ sge->addr = mr->iova;
+ sge->length = mr->length;

isert_dbg("sge: addr: 0x%llx length: %u lkey: %x\n",
sge->addr, sge->length, sge->lkey);
diff --git a/drivers/infiniband/ulp/isert/ib_isert.h b/drivers/infiniband/ulp/isert/ib_isert.h
index 0a4a7861cce9..e87f9b096533 100644
--- a/drivers/infiniband/ulp/isert/ib_isert.h
+++ b/drivers/infiniband/ulp/isert/ib_isert.h
@@ -84,14 +84,12 @@ enum isert_indicator {

struct pi_context {
struct ib_mr *prot_mr;
- struct ib_fast_reg_page_list *prot_frpl;
struct ib_mr *sig_mr;
};

struct fast_reg_descriptor {
struct list_head list;
struct ib_mr *data_mr;
- struct ib_fast_reg_page_list *data_frpl;
u8 ind;
struct pi_context *pi_ctx;
};
--
1.8.4.3


2015-09-17 09:43:11

by Sagi Grimberg

Subject: [PATCH v1 07/24] iw_cxgb4: Support the new memory registration API

Support the new memory registration API by allocating a
private page list array in c4iw_mr and populating it when
c4iw_map_mr_sg is invoked. Also, support IB_WR_REG_MR
by duplicating build_fastreg, but taking the needed
information from different places:
- page_size, iova, length (ib_mr)
- page array (c4iw_mr)
- key, access flags (ib_reg_wr)

The IB_WR_FAST_REG_MR handlers will be removed later,
once all the ULPs have been converted.

Signed-off-by: Sagi Grimberg <[email protected]>
---
drivers/infiniband/hw/cxgb4/iw_cxgb4.h | 7 ++++
drivers/infiniband/hw/cxgb4/mem.c | 38 +++++++++++++++++
drivers/infiniband/hw/cxgb4/provider.c | 1 +
drivers/infiniband/hw/cxgb4/qp.c | 75 ++++++++++++++++++++++++++++++++++
4 files changed, 121 insertions(+)

diff --git a/drivers/infiniband/hw/cxgb4/iw_cxgb4.h b/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
index c7bb38c931a5..032f90aa8ac9 100644
--- a/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
+++ b/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
@@ -386,6 +386,10 @@ struct c4iw_mr {
struct c4iw_dev *rhp;
u64 kva;
struct tpt_attributes attr;
+ u64 *mpl;
+ dma_addr_t mpl_addr;
+ u32 max_mpl_len;
+ u32 mpl_len;
};

static inline struct c4iw_mr *to_c4iw_mr(struct ib_mr *ibmr)
@@ -973,6 +977,9 @@ struct ib_fast_reg_page_list *c4iw_alloc_fastreg_pbl(
struct ib_mr *c4iw_alloc_mr(struct ib_pd *pd,
enum ib_mr_type mr_type,
u32 max_num_sg);
+int c4iw_map_mr_sg(struct ib_mr *ibmr,
+ struct scatterlist *sg,
+ unsigned int sg_nents);
int c4iw_dealloc_mw(struct ib_mw *mw);
struct ib_mw *c4iw_alloc_mw(struct ib_pd *pd, enum ib_mw_type type);
struct ib_mr *c4iw_reg_user_mr(struct ib_pd *pd, u64 start,
diff --git a/drivers/infiniband/hw/cxgb4/mem.c b/drivers/infiniband/hw/cxgb4/mem.c
index 026b91ebd5e2..86ec65721797 100644
--- a/drivers/infiniband/hw/cxgb4/mem.c
+++ b/drivers/infiniband/hw/cxgb4/mem.c
@@ -863,6 +863,7 @@ struct ib_mr *c4iw_alloc_mr(struct ib_pd *pd,
u32 mmid;
u32 stag = 0;
int ret = 0;
+ int length = roundup(max_num_sg * sizeof(u64), 32);

if (mr_type != IB_MR_TYPE_MEM_REG ||
max_num_sg > t4_max_fr_depth(use_dsgl))
@@ -876,6 +877,14 @@ struct ib_mr *c4iw_alloc_mr(struct ib_pd *pd,
goto err;
}

+ mhp->mpl = dma_alloc_coherent(&rhp->rdev.lldi.pdev->dev,
+ length, &mhp->mpl_addr, GFP_KERNEL);
+ if (!mhp->mpl) {
+ ret = -ENOMEM;
+ goto err_mpl;
+ }
+ mhp->max_mpl_len = length;
+
mhp->rhp = rhp;
ret = alloc_pbl(mhp, max_num_sg);
if (ret)
@@ -905,11 +914,37 @@ err2:
c4iw_pblpool_free(&mhp->rhp->rdev, mhp->attr.pbl_addr,
mhp->attr.pbl_size << 3);
err1:
+ dma_free_coherent(&mhp->rhp->rdev.lldi.pdev->dev,
+ mhp->max_mpl_len, mhp->mpl, mhp->mpl_addr);
+err_mpl:
kfree(mhp);
err:
return ERR_PTR(ret);
}

+static int c4iw_set_page(struct ib_mr *ibmr, u64 addr)
+{
+ struct c4iw_mr *mhp = to_c4iw_mr(ibmr);
+
+ if (unlikely(mhp->mpl_len == mhp->max_mpl_len))
+ return -ENOMEM;
+
+ mhp->mpl[mhp->mpl_len++] = addr;
+
+ return 0;
+}
+
+int c4iw_map_mr_sg(struct ib_mr *ibmr,
+ struct scatterlist *sg,
+ unsigned int sg_nents)
+{
+ struct c4iw_mr *mhp = to_c4iw_mr(ibmr);
+
+ mhp->mpl_len = 0;
+
+ return ib_sg_to_pages(ibmr, sg, sg_nents, c4iw_set_page);
+}
+
struct ib_fast_reg_page_list *c4iw_alloc_fastreg_pbl(struct ib_device *device,
int page_list_len)
{
@@ -970,6 +1005,9 @@ int c4iw_dereg_mr(struct ib_mr *ib_mr)
rhp = mhp->rhp;
mmid = mhp->attr.stag >> 8;
remove_handle(rhp, &rhp->mmidr, mmid);
+ if (mhp->mpl)
+ dma_free_coherent(&mhp->rhp->rdev.lldi.pdev->dev,
+ mhp->max_mpl_len, mhp->mpl, mhp->mpl_addr);
dereg_mem(&rhp->rdev, mhp->attr.stag, mhp->attr.pbl_size,
mhp->attr.pbl_addr);
if (mhp->attr.pbl_size)
diff --git a/drivers/infiniband/hw/cxgb4/provider.c b/drivers/infiniband/hw/cxgb4/provider.c
index 7746113552e7..55dedadcffaa 100644
--- a/drivers/infiniband/hw/cxgb4/provider.c
+++ b/drivers/infiniband/hw/cxgb4/provider.c
@@ -557,6 +557,7 @@ int c4iw_register_device(struct c4iw_dev *dev)
dev->ibdev.bind_mw = c4iw_bind_mw;
dev->ibdev.dealloc_mw = c4iw_dealloc_mw;
dev->ibdev.alloc_mr = c4iw_alloc_mr;
+ dev->ibdev.map_mr_sg = c4iw_map_mr_sg;
dev->ibdev.alloc_fast_reg_page_list = c4iw_alloc_fastreg_pbl;
dev->ibdev.free_fast_reg_page_list = c4iw_free_fastreg_pbl;
dev->ibdev.attach_mcast = c4iw_multicast_attach;
diff --git a/drivers/infiniband/hw/cxgb4/qp.c b/drivers/infiniband/hw/cxgb4/qp.c
index b60498fff99a..fddbd2cc90b8 100644
--- a/drivers/infiniband/hw/cxgb4/qp.c
+++ b/drivers/infiniband/hw/cxgb4/qp.c
@@ -605,10 +605,77 @@ static int build_rdma_recv(struct c4iw_qp *qhp, union t4_recv_wr *wqe,
return 0;
}

+static int build_memreg(struct t4_sq *sq, union t4_wr *wqe,
+ struct ib_reg_wr *wr, u8 *len16, u8 t5dev)
+{
+ struct c4iw_mr *mhp = to_c4iw_mr(wr->mr);
+ struct fw_ri_immd *imdp;
+ __be64 *p;
+ int i;
+ int pbllen = roundup(mhp->mpl_len * sizeof(u64), 32);
+ int rem;
+
+ if (mhp->mpl_len > t4_max_fr_depth(use_dsgl))
+ return -EINVAL;
+
+ wqe->fr.qpbinde_to_dcacpu = 0;
+ wqe->fr.pgsz_shift = ilog2(wr->mr->page_size) - 12;
+ wqe->fr.addr_type = FW_RI_VA_BASED_TO;
+ wqe->fr.mem_perms = c4iw_ib_to_tpt_access(wr->access);
+ wqe->fr.len_hi = 0;
+ wqe->fr.len_lo = cpu_to_be32(mhp->ibmr.length);
+ wqe->fr.stag = cpu_to_be32(wr->key);
+ wqe->fr.va_hi = cpu_to_be32(mhp->ibmr.iova >> 32);
+ wqe->fr.va_lo_fbo = cpu_to_be32(mhp->ibmr.iova &
+ 0xffffffff);
+
+ if (t5dev && use_dsgl && (pbllen > max_fr_immd)) {
+ struct fw_ri_dsgl *sglp;
+
+ for (i = 0; i < mhp->mpl_len; i++) {
+ mhp->mpl[i] = (__force u64)cpu_to_be64((u64)mhp->mpl[i]);
+ }
+
+ sglp = (struct fw_ri_dsgl *)(&wqe->fr + 1);
+ sglp->op = FW_RI_DATA_DSGL;
+ sglp->r1 = 0;
+ sglp->nsge = cpu_to_be16(1);
+ sglp->addr0 = cpu_to_be64(mhp->mpl_addr);
+ sglp->len0 = cpu_to_be32(pbllen);
+
+ *len16 = DIV_ROUND_UP(sizeof(wqe->fr) + sizeof(*sglp), 16);
+ } else {
+ imdp = (struct fw_ri_immd *)(&wqe->fr + 1);
+ imdp->op = FW_RI_DATA_IMMD;
+ imdp->r1 = 0;
+ imdp->r2 = 0;
+ imdp->immdlen = cpu_to_be32(pbllen);
+ p = (__be64 *)(imdp + 1);
+ rem = pbllen;
+ for (i = 0; i < mhp->mpl_len; i++) {
+ *p = cpu_to_be64((u64)mhp->mpl[i]);
+ rem -= sizeof(*p);
+ if (++p == (__be64 *)&sq->queue[sq->size])
+ p = (__be64 *)sq->queue;
+ }
+ BUG_ON(rem < 0);
+ while (rem) {
+ *p = 0;
+ rem -= sizeof(*p);
+ if (++p == (__be64 *)&sq->queue[sq->size])
+ p = (__be64 *)sq->queue;
+ }
+ *len16 = DIV_ROUND_UP(sizeof(wqe->fr) + sizeof(*imdp)
+ + pbllen, 16);
+ }
+ return 0;
+}
+
static int build_fastreg(struct t4_sq *sq, union t4_wr *wqe,
struct ib_send_wr *send_wr, u8 *len16, u8 t5dev)
{
struct ib_fast_reg_wr *wr = fast_reg_wr(send_wr);
+
struct fw_ri_immd *imdp;
__be64 *p;
int i;
@@ -817,6 +884,14 @@ int c4iw_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
qhp->rhp->rdev.lldi.adapter_type) ?
1 : 0);
break;
+ case IB_WR_REG_MR:
+ fw_opcode = FW_RI_FR_NSMR_WR;
+ swsqe->opcode = FW_RI_FAST_REGISTER;
+ err = build_memreg(&qhp->wq.sq, wqe, reg_wr(wr), &len16,
+ is_t5(
+ qhp->rhp->rdev.lldi.adapter_type) ?
+ 1 : 0);
+ break;
case IB_WR_LOCAL_INV:
if (wr->send_flags & IB_SEND_FENCE)
fw_flags |= FW_RI_LOCAL_FENCE_FLAG;
--
1.8.4.3


2015-09-17 09:43:19

by Sagi Grimberg

Subject: [PATCH v1 12/24] xprtrdma: Port to new memory registration API

Instead of maintaining a fastreg page list, keep an sg table
and convert the array of pages to an sg list. Then call ib_map_mr_sg
and construct an ib_reg_wr.

Note that the next step would be to have NFS work with sg lists
as it maps well to sk_frags (see comment from hch
http://marc.info/?l=linux-rdma&m=143677002622296&w=2).

Signed-off-by: Sagi Grimberg <[email protected]>
---
net/sunrpc/xprtrdma/frwr_ops.c | 112 +++++++++++++++++++++++-----------------
net/sunrpc/xprtrdma/xprt_rdma.h | 3 +-
2 files changed, 67 insertions(+), 48 deletions(-)

diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c
index 0d2f46f600b6..4d0221ccb043 100644
--- a/net/sunrpc/xprtrdma/frwr_ops.c
+++ b/net/sunrpc/xprtrdma/frwr_ops.c
@@ -151,9 +151,13 @@ __frwr_init(struct rpcrdma_mw *r, struct ib_pd *pd, struct ib_device *device,
f->fr_mr = ib_alloc_mr(pd, IB_MR_TYPE_MEM_REG, depth);
if (IS_ERR(f->fr_mr))
goto out_mr_err;
- f->fr_pgl = ib_alloc_fast_reg_page_list(device, depth);
- if (IS_ERR(f->fr_pgl))
+
+ f->sg = kcalloc(depth, sizeof(*f->sg), GFP_KERNEL);
+ if (!f->sg)
goto out_list_err;
+
+ sg_init_table(f->sg, depth);
+
return 0;

out_mr_err:
@@ -163,7 +167,7 @@ out_mr_err:
return rc;

out_list_err:
- rc = PTR_ERR(f->fr_pgl);
+ rc = -ENOMEM;
dprintk("RPC: %s: ib_alloc_fast_reg_page_list status %i\n",
__func__, rc);
ib_dereg_mr(f->fr_mr);
@@ -179,7 +183,7 @@ __frwr_release(struct rpcrdma_mw *r)
if (rc)
dprintk("RPC: %s: ib_dereg_mr status %i\n",
__func__, rc);
- ib_free_fast_reg_page_list(r->r.frmr.fr_pgl);
+ kfree(r->r.frmr.sg);
}

static int
@@ -312,14 +316,11 @@ frwr_op_map(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg,
struct rpcrdma_mw *mw;
struct rpcrdma_frmr *frmr;
struct ib_mr *mr;
- struct ib_fast_reg_wr fastreg_wr;
+ struct ib_reg_wr reg_wr;
struct ib_send_wr *bad_wr;
+ unsigned int dma_nents;
u8 key;
- int len, pageoff;
- int i, rc;
- int seg_len;
- u64 pa;
- int page_no;
+ int i, rc, len, n;

mw = seg1->rl_mw;
seg1->rl_mw = NULL;
@@ -332,64 +333,80 @@ frwr_op_map(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg,
} while (mw->r.frmr.fr_state != FRMR_IS_INVALID);
frmr = &mw->r.frmr;
frmr->fr_state = FRMR_IS_VALID;
+ mr = frmr->fr_mr;

- pageoff = offset_in_page(seg1->mr_offset);
- seg1->mr_offset -= pageoff; /* start of page */
- seg1->mr_len += pageoff;
- len = -pageoff;
if (nsegs > ia->ri_max_frmr_depth)
nsegs = ia->ri_max_frmr_depth;

- for (page_no = i = 0; i < nsegs;) {
- rpcrdma_map_one(device, seg, direction);
- pa = seg->mr_dma;
- for (seg_len = seg->mr_len; seg_len > 0; seg_len -= PAGE_SIZE) {
- frmr->fr_pgl->page_list[page_no++] = pa;
- pa += PAGE_SIZE;
- }
+ for (len = 0, i = 0; i < nsegs;) {
+ if (seg->mr_page)
+ sg_set_page(&frmr->sg[i],
+ seg->mr_page,
+ seg->mr_len,
+ offset_in_page(seg->mr_offset));
+ else
+ sg_set_buf(&frmr->sg[i], seg->mr_offset,
+ seg->mr_len);
+
len += seg->mr_len;
++seg;
++i;
+
/* Check for holes */
if ((i < nsegs && offset_in_page(seg->mr_offset)) ||
offset_in_page((seg-1)->mr_offset + (seg-1)->mr_len))
break;
}
+ frmr->sg_nents = i;
+
+ dma_nents = ib_dma_map_sg(device, frmr->sg, frmr->sg_nents, direction);
+ if (!dma_nents) {
+ pr_err("RPC: %s: failed to dma map sg %p sg_nents %d\n",
+ __func__, frmr->sg, frmr->sg_nents);
+ return -ENOMEM;
+ }
+
+ n = ib_map_mr_sg(mr, frmr->sg, frmr->sg_nents, PAGE_SIZE);
+ if (unlikely(n != frmr->sg_nents)) {
+ pr_err("RPC: %s: failed to map mr %p (%d/%d)\n",
+ __func__, frmr->fr_mr, n, frmr->sg_nents);
+ rc = n < 0 ? n : -EINVAL;
+ goto out_senderr;
+ }
+
dprintk("RPC: %s: Using frmr %p to map %d segments (%d bytes)\n",
- __func__, mw, i, len);
-
- memset(&fastreg_wr, 0, sizeof(fastreg_wr));
- fastreg_wr.wr.wr_id = (unsigned long)(void *)mw;
- fastreg_wr.wr.opcode = IB_WR_FAST_REG_MR;
- fastreg_wr.iova_start = seg1->mr_dma + pageoff;
- fastreg_wr.page_list = frmr->fr_pgl;
- fastreg_wr.page_shift = PAGE_SHIFT;
- fastreg_wr.page_list_len = page_no;
- fastreg_wr.length = len;
- fastreg_wr.access_flags = writing ?
- IB_ACCESS_REMOTE_WRITE | IB_ACCESS_LOCAL_WRITE :
- IB_ACCESS_REMOTE_READ;
- mr = frmr->fr_mr;
+ __func__, mw, frmr->sg_nents, mr->length);
+
+ reg_wr.wr.opcode = IB_WR_REG_MR;
+ reg_wr.wr.wr_id = (uintptr_t)mw;
+ reg_wr.wr.num_sge = 0;
+ reg_wr.wr.send_flags = 0;
+ reg_wr.mr = mr;
+ reg_wr.key = mr->rkey;
+ reg_wr.access = writing ?
+ IB_ACCESS_REMOTE_WRITE | IB_ACCESS_LOCAL_WRITE :
+ IB_ACCESS_REMOTE_READ;
+
key = (u8)(mr->rkey & 0x000000FF);
ib_update_fast_reg_key(mr, ++key);
- fastreg_wr.rkey = mr->rkey;

DECR_CQCOUNT(&r_xprt->rx_ep);
- rc = ib_post_send(ia->ri_id->qp, &fastreg_wr.wr, &bad_wr);
+ rc = ib_post_send(ia->ri_id->qp, &reg_wr.wr, &bad_wr);
if (rc)
goto out_senderr;

+ seg1->mr_dir = direction;
seg1->rl_mw = mw;
seg1->mr_rkey = mr->rkey;
- seg1->mr_base = seg1->mr_dma + pageoff;
- seg1->mr_nsegs = i;
- seg1->mr_len = len;
- return i;
+ seg1->mr_base = mr->iova;
+ seg1->mr_nsegs = frmr->sg_nents;
+ seg1->mr_len = mr->length;
+
+ return frmr->sg_nents;

out_senderr:
dprintk("RPC: %s: ib_post_send status %i\n", __func__, rc);
- while (i--)
- rpcrdma_unmap_one(device, --seg);
+ ib_dma_unmap_sg(device, frmr->sg, frmr->sg_nents, direction);
__frwr_queue_recovery(mw);
return rc;
}
@@ -403,28 +420,29 @@ frwr_op_unmap(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg)
struct rpcrdma_mr_seg *seg1 = seg;
struct rpcrdma_ia *ia = &r_xprt->rx_ia;
struct rpcrdma_mw *mw = seg1->rl_mw;
+ struct rpcrdma_frmr *frmr = &mw->r.frmr;
struct ib_send_wr invalidate_wr, *bad_wr;
int rc, nsegs = seg->mr_nsegs;

dprintk("RPC: %s: FRMR %p\n", __func__, mw);

seg1->rl_mw = NULL;
- mw->r.frmr.fr_state = FRMR_IS_INVALID;
+ frmr->fr_state = FRMR_IS_INVALID;

memset(&invalidate_wr, 0, sizeof(invalidate_wr));
invalidate_wr.wr_id = (unsigned long)(void *)mw;
invalidate_wr.opcode = IB_WR_LOCAL_INV;
- invalidate_wr.ex.invalidate_rkey = mw->r.frmr.fr_mr->rkey;
+ invalidate_wr.ex.invalidate_rkey = frmr->fr_mr->rkey;
DECR_CQCOUNT(&r_xprt->rx_ep);

- while (seg1->mr_nsegs--)
- rpcrdma_unmap_one(ia->ri_device, seg++);
read_lock(&ia->ri_qplock);
rc = ib_post_send(ia->ri_id->qp, &invalidate_wr, &bad_wr);
read_unlock(&ia->ri_qplock);
if (rc)
goto out_err;

+ ib_dma_unmap_sg(ia->ri_device, frmr->sg, frmr->sg_nents, seg1->mr_dir);
+
rpcrdma_put_mw(r_xprt, mw);
return nsegs;

diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index d252457ff21a..00773636d17e 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -195,7 +195,8 @@ enum rpcrdma_frmr_state {
};

struct rpcrdma_frmr {
- struct ib_fast_reg_page_list *fr_pgl;
+ struct scatterlist *sg;
+ unsigned int sg_nents;
struct ib_mr *fr_mr;
enum rpcrdma_frmr_state fr_state;
struct work_struct fr_work;
--
1.8.4.3


2015-09-17 09:43:20

by Sagi Grimberg

Subject: [PATCH v1 13/24] svcrdma: Port to new memory registration API

Instead of maintaining a fastreg page list, keep an sg table
and convert the array of pages to an sg list. Then call ib_map_mr_sg
and construct an ib_reg_wr.

Signed-off-by: Sagi Grimberg <[email protected]>
---
include/linux/sunrpc/svc_rdma.h | 6 +--
net/sunrpc/xprtrdma/svc_rdma_recvfrom.c | 76 ++++++++++++++++++--------------
net/sunrpc/xprtrdma/svc_rdma_transport.c | 34 +++++---------
3 files changed, 55 insertions(+), 61 deletions(-)

diff --git a/include/linux/sunrpc/svc_rdma.h b/include/linux/sunrpc/svc_rdma.h
index 83211bc9219e..e240d102a911 100644
--- a/include/linux/sunrpc/svc_rdma.h
+++ b/include/linux/sunrpc/svc_rdma.h
@@ -105,11 +105,9 @@ struct svc_rdma_chunk_sge {
};
struct svc_rdma_fastreg_mr {
struct ib_mr *mr;
- void *kva;
- struct ib_fast_reg_page_list *page_list;
- int page_list_len;
+ struct scatterlist *sg;
+ unsigned int sg_nents;
unsigned long access_flags;
- unsigned long map_len;
enum dma_data_direction direction;
struct list_head frmr_list;
};
diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
index 7be42d0da19e..303f194970f9 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
@@ -220,12 +220,12 @@ int rdma_read_chunk_frmr(struct svcxprt_rdma *xprt,
{
struct ib_rdma_wr read_wr;
struct ib_send_wr inv_wr;
- struct ib_fast_reg_wr fastreg_wr;
+ struct ib_reg_wr reg_wr;
u8 key;
- int pages_needed = PAGE_ALIGN(*page_offset + rs_length) >> PAGE_SHIFT;
+ unsigned int nents = PAGE_ALIGN(*page_offset + rs_length) >> PAGE_SHIFT;
struct svc_rdma_op_ctxt *ctxt = svc_rdma_get_context(xprt);
struct svc_rdma_fastreg_mr *frmr = svc_rdma_get_frmr(xprt);
- int ret, read, pno;
+ int ret, read, pno, dma_nents, n;
u32 pg_off = *page_offset;
u32 pg_no = *page_no;

@@ -234,16 +234,14 @@ int rdma_read_chunk_frmr(struct svcxprt_rdma *xprt,

ctxt->direction = DMA_FROM_DEVICE;
ctxt->frmr = frmr;
- pages_needed = min_t(int, pages_needed, xprt->sc_frmr_pg_list_len);
- read = min_t(int, pages_needed << PAGE_SHIFT, rs_length);
+ nents = min_t(unsigned int, nents, xprt->sc_frmr_pg_list_len);
+ read = min_t(int, nents << PAGE_SHIFT, rs_length);

- frmr->kva = page_address(rqstp->rq_arg.pages[pg_no]);
frmr->direction = DMA_FROM_DEVICE;
frmr->access_flags = (IB_ACCESS_LOCAL_WRITE|IB_ACCESS_REMOTE_WRITE);
- frmr->map_len = pages_needed << PAGE_SHIFT;
- frmr->page_list_len = pages_needed;
+ frmr->sg_nents = nents;

- for (pno = 0; pno < pages_needed; pno++) {
+ for (pno = 0; pno < nents; pno++) {
int len = min_t(int, rs_length, PAGE_SIZE - pg_off);

head->arg.pages[pg_no] = rqstp->rq_arg.pages[pg_no];
@@ -251,17 +249,12 @@ int rdma_read_chunk_frmr(struct svcxprt_rdma *xprt,
head->arg.len += len;
if (!pg_off)
head->count++;
+
+ sg_set_page(&frmr->sg[pno], rqstp->rq_arg.pages[pg_no],
+ len, pg_off);
+
rqstp->rq_respages = &rqstp->rq_arg.pages[pg_no+1];
rqstp->rq_next_page = rqstp->rq_respages + 1;
- frmr->page_list->page_list[pno] =
- ib_dma_map_page(xprt->sc_cm_id->device,
- head->arg.pages[pg_no], 0,
- PAGE_SIZE, DMA_FROM_DEVICE);
- ret = ib_dma_mapping_error(xprt->sc_cm_id->device,
- frmr->page_list->page_list[pno]);
- if (ret)
- goto err;
- atomic_inc(&xprt->sc_dma_used);

/* adjust offset and wrap to next page if needed */
pg_off += len;
@@ -277,28 +270,42 @@ int rdma_read_chunk_frmr(struct svcxprt_rdma *xprt,
else
clear_bit(RDMACTXT_F_LAST_CTXT, &ctxt->flags);

+ dma_nents = ib_dma_map_sg(xprt->sc_cm_id->device,
+ frmr->sg, frmr->sg_nents,
+ frmr->direction);
+ if (!dma_nents) {
+ pr_err("svcrdma: failed to dma map sg %p\n",
+ frmr->sg);
+ return -ENOMEM;
+ }
+ atomic_inc(&xprt->sc_dma_used);
+
+ n = ib_map_mr_sg(frmr->mr, frmr->sg, frmr->sg_nents, PAGE_SIZE);
+ if (unlikely(n != frmr->sg_nents)) {
+ pr_err("svcrdma: failed to map mr %p (%d/%d elements)\n",
+ frmr->mr, n, frmr->sg_nents);
+ return n < 0 ? n : -EINVAL;
+ }
+
/* Bump the key */
key = (u8)(frmr->mr->lkey & 0x000000FF);
ib_update_fast_reg_key(frmr->mr, ++key);

- ctxt->sge[0].addr = (unsigned long)frmr->kva + *page_offset;
+ ctxt->sge[0].addr = frmr->mr->iova;
ctxt->sge[0].lkey = frmr->mr->lkey;
- ctxt->sge[0].length = read;
+ ctxt->sge[0].length = frmr->mr->length;
ctxt->count = 1;
ctxt->read_hdr = head;

- /* Prepare FASTREG WR */
- memset(&fastreg_wr, 0, sizeof(fastreg_wr));
- fastreg_wr.wr.opcode = IB_WR_FAST_REG_MR;
- fastreg_wr.wr.send_flags = IB_SEND_SIGNALED;
- fastreg_wr.iova_start = (unsigned long)frmr->kva;
- fastreg_wr.page_list = frmr->page_list;
- fastreg_wr.page_list_len = frmr->page_list_len;
- fastreg_wr.page_shift = PAGE_SHIFT;
- fastreg_wr.length = frmr->map_len;
- fastreg_wr.access_flags = frmr->access_flags;
- fastreg_wr.rkey = frmr->mr->lkey;
- fastreg_wr.wr.next = &read_wr.wr;
+ /* Prepare REG WR */
+ reg_wr.wr.opcode = IB_WR_REG_MR;
+ reg_wr.wr.wr_id = 0;
+ reg_wr.wr.send_flags = IB_SEND_SIGNALED;
+ reg_wr.wr.num_sge = 0;
+ reg_wr.mr = frmr->mr;
+ reg_wr.key = frmr->mr->lkey;
+ reg_wr.access = frmr->access_flags;
+ reg_wr.wr.next = &read_wr.wr;

/* Prepare RDMA_READ */
memset(&read_wr, 0, sizeof(read_wr));
@@ -324,7 +331,7 @@ int rdma_read_chunk_frmr(struct svcxprt_rdma *xprt,
ctxt->wr_op = read_wr.wr.opcode;

/* Post the chain */
- ret = svc_rdma_send(xprt, &fastreg_wr.wr);
+ ret = svc_rdma_send(xprt, &reg_wr.wr);
if (ret) {
pr_err("svcrdma: Error %d posting RDMA_READ\n", ret);
set_bit(XPT_CLOSE, &xprt->sc_xprt.xpt_flags);
@@ -338,7 +345,8 @@ int rdma_read_chunk_frmr(struct svcxprt_rdma *xprt,
atomic_inc(&rdma_stat_read);
return ret;
err:
- svc_rdma_unmap_dma(ctxt);
+ ib_dma_unmap_sg(xprt->sc_cm_id->device,
+ frmr->sg, frmr->sg_nents, frmr->direction);
svc_rdma_put_context(ctxt, 0);
svc_rdma_put_frmr(xprt, frmr);
return ret;
diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
index b9fe134bb416..13d26a7ddee4 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
@@ -731,7 +731,7 @@ static struct svc_xprt *svc_rdma_create(struct svc_serv *serv,
static struct svc_rdma_fastreg_mr *rdma_alloc_frmr(struct svcxprt_rdma *xprt)
{
struct ib_mr *mr;
- struct ib_fast_reg_page_list *pl;
+ struct scatterlist *sg;
struct svc_rdma_fastreg_mr *frmr;
u32 num_sg;

@@ -744,13 +744,14 @@ static struct svc_rdma_fastreg_mr *rdma_alloc_frmr(struct svcxprt_rdma *xprt)
if (IS_ERR(mr))
goto err_free_frmr;

- pl = ib_alloc_fast_reg_page_list(xprt->sc_cm_id->device,
- num_sg);
- if (IS_ERR(pl))
+ sg = kcalloc(RPCSVC_MAXPAGES, sizeof(*sg), GFP_KERNEL);
+ if (!sg)
goto err_free_mr;

+ sg_init_table(sg, RPCSVC_MAXPAGES);
+
frmr->mr = mr;
- frmr->page_list = pl;
+ frmr->sg = sg;
INIT_LIST_HEAD(&frmr->frmr_list);
return frmr;

@@ -770,8 +771,8 @@ static void rdma_dealloc_frmr_q(struct svcxprt_rdma *xprt)
frmr = list_entry(xprt->sc_frmr_q.next,
struct svc_rdma_fastreg_mr, frmr_list);
list_del_init(&frmr->frmr_list);
+ kfree(frmr->sg);
ib_dereg_mr(frmr->mr);
- ib_free_fast_reg_page_list(frmr->page_list);
kfree(frmr);
}
}
@@ -785,8 +786,7 @@ struct svc_rdma_fastreg_mr *svc_rdma_get_frmr(struct svcxprt_rdma *rdma)
frmr = list_entry(rdma->sc_frmr_q.next,
struct svc_rdma_fastreg_mr, frmr_list);
list_del_init(&frmr->frmr_list);
- frmr->map_len = 0;
- frmr->page_list_len = 0;
+ frmr->sg_nents = 0;
}
spin_unlock_bh(&rdma->sc_frmr_q_lock);
if (frmr)
@@ -795,25 +795,13 @@ struct svc_rdma_fastreg_mr *svc_rdma_get_frmr(struct svcxprt_rdma *rdma)
return rdma_alloc_frmr(rdma);
}

-static void frmr_unmap_dma(struct svcxprt_rdma *xprt,
- struct svc_rdma_fastreg_mr *frmr)
-{
- int page_no;
- for (page_no = 0; page_no < frmr->page_list_len; page_no++) {
- dma_addr_t addr = frmr->page_list->page_list[page_no];
- if (ib_dma_mapping_error(frmr->mr->device, addr))
- continue;
- atomic_dec(&xprt->sc_dma_used);
- ib_dma_unmap_page(frmr->mr->device, addr, PAGE_SIZE,
- frmr->direction);
- }
-}
-
void svc_rdma_put_frmr(struct svcxprt_rdma *rdma,
struct svc_rdma_fastreg_mr *frmr)
{
if (frmr) {
- frmr_unmap_dma(rdma, frmr);
+ ib_dma_unmap_sg(rdma->sc_cm_id->device,
+ frmr->sg, frmr->sg_nents, frmr->direction);
+ atomic_dec(&rdma->sc_dma_used);
spin_lock_bh(&rdma->sc_frmr_q_lock);
WARN_ON_ONCE(!list_empty(&frmr->frmr_list));
list_add(&frmr->frmr_list, &rdma->sc_frmr_q);
--
1.8.4.3


2015-09-17 09:43:22

by Sagi Grimberg

[permalink] [raw]
Subject: [PATCH v1 08/24] IB/qib: Support the new memory registration API

Support the new memory registration API by allocating a
private page list array in qib_mr and populating it when
qib_map_mr_sg is invoked. Also, support IB_WR_REG_MR
by duplicating qib_fastreg_mr, just taking the needed
information from different places:
- page_size, iova, length (ib_mr)
- page array (qib_mr)
- key, access flags (ib_reg_wr)

The IB_WR_FAST_REG_MR handlers will be removed later, once
all the ULPs are converted.
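[Editor's note: the driver-side half of the split is a tiny bounded set_page hook, as in qib_set_page below. A userspace sketch of that contract — reject the address once the per-MR vector is full — with invented `sim_*` names:]

```c
#include <assert.h>
#include <stdint.h>

#define SIM_ENOMEM 12	/* stand-in for the kernel's ENOMEM */

/* Toy analogue of qib_mr's private page vector (names invented). */
struct sim_mr {
	uint64_t *pages;
	uint32_t npages;
	uint32_t max_segs;
};

/* Mirrors qib_set_page(): append one page address, refuse when full. */
static int sim_qib_set_page(struct sim_mr *mr, uint64_t addr)
{
	if (mr->npages == mr->max_segs)
		return -SIM_ENOMEM;

	mr->pages[mr->npages++] = addr;
	return 0;
}
```

The map entry point then only needs to reset `npages` to 0 before handing the callback to the core walker, which is exactly what qib_map_mr_sg does.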

Signed-off-by: Sagi Grimberg <[email protected]>
---
drivers/infiniband/hw/qib/qib_keys.c | 56 +++++++++++++++++++++++++++++++++++
drivers/infiniband/hw/qib/qib_mr.c | 32 ++++++++++++++++++++
drivers/infiniband/hw/qib/qib_verbs.c | 9 +++++-
drivers/infiniband/hw/qib/qib_verbs.h | 8 +++++
4 files changed, 104 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/hw/qib/qib_keys.c b/drivers/infiniband/hw/qib/qib_keys.c
index eaf139a33b2e..a5057efc7faf 100644
--- a/drivers/infiniband/hw/qib/qib_keys.c
+++ b/drivers/infiniband/hw/qib/qib_keys.c
@@ -390,3 +390,59 @@ bail:
spin_unlock_irqrestore(&rkt->lock, flags);
return ret;
}
+
+/*
+ * Initialize the memory region specified by the work request.
+ */
+int qib_reg_mr(struct qib_qp *qp, struct ib_reg_wr *wr)
+{
+ struct qib_lkey_table *rkt = &to_idev(qp->ibqp.device)->lk_table;
+ struct qib_pd *pd = to_ipd(qp->ibqp.pd);
+ struct qib_mr *mr = to_imr(wr->mr);
+ struct qib_mregion *mrg;
+ u32 key = wr->key;
+ unsigned i, n, m;
+ int ret = -EINVAL;
+ unsigned long flags;
+ u64 *page_list;
+ size_t ps;
+
+ spin_lock_irqsave(&rkt->lock, flags);
+ if (pd->user || key == 0)
+ goto bail;
+
+ mrg = rcu_dereference_protected(
+ rkt->table[(key >> (32 - ib_qib_lkey_table_size))],
+ lockdep_is_held(&rkt->lock));
+ if (unlikely(mrg == NULL || qp->ibqp.pd != mrg->pd))
+ goto bail;
+
+ if (mr->npages > mrg->max_segs)
+ goto bail;
+
+ ps = mr->ibmr.page_size;
+ if (mr->ibmr.length > ps * mr->npages)
+ goto bail;
+
+ mrg->user_base = mr->ibmr.iova;
+ mrg->iova = mr->ibmr.iova;
+ mrg->lkey = key;
+ mrg->length = mr->ibmr.length;
+ mrg->access_flags = wr->access;
+ page_list = mr->pages;
+ m = 0;
+ n = 0;
+ for (i = 0; i < mr->npages; i++) {
+ mrg->map[m]->segs[n].vaddr = (void *) page_list[i];
+ mrg->map[m]->segs[n].length = ps;
+ if (++n == QIB_SEGSZ) {
+ m++;
+ n = 0;
+ }
+ }
+
+ ret = 0;
+bail:
+ spin_unlock_irqrestore(&rkt->lock, flags);
+ return ret;
+}
diff --git a/drivers/infiniband/hw/qib/qib_mr.c b/drivers/infiniband/hw/qib/qib_mr.c
index 19220dcb9a3b..0fa4b0de8074 100644
--- a/drivers/infiniband/hw/qib/qib_mr.c
+++ b/drivers/infiniband/hw/qib/qib_mr.c
@@ -303,6 +303,7 @@ int qib_dereg_mr(struct ib_mr *ibmr)
int ret = 0;
unsigned long timeout;

+ kfree(mr->pages);
qib_free_lkey(&mr->mr);

qib_put_mr(&mr->mr); /* will set completion if last */
@@ -340,7 +341,38 @@ struct ib_mr *qib_alloc_mr(struct ib_pd *pd,
if (IS_ERR(mr))
return (struct ib_mr *)mr;

+ mr->pages = kcalloc(max_num_sg, sizeof(u64), GFP_KERNEL);
+ if (!mr->pages)
+ goto err;
+
return &mr->ibmr;
+
+err:
+ qib_dereg_mr(&mr->ibmr);
+ return ERR_PTR(-ENOMEM);
+}
+
+static int qib_set_page(struct ib_mr *ibmr, u64 addr)
+{
+ struct qib_mr *mr = to_imr(ibmr);
+
+ if (unlikely(mr->npages == mr->mr.max_segs))
+ return -ENOMEM;
+
+ mr->pages[mr->npages++] = addr;
+
+ return 0;
+}
+
+int qib_map_mr_sg(struct ib_mr *ibmr,
+ struct scatterlist *sg,
+ unsigned int sg_nents)
+{
+ struct qib_mr *mr = to_imr(ibmr);
+
+ mr->npages = 0;
+
+ return ib_sg_to_pages(ibmr, sg, sg_nents, qib_set_page);
}

struct ib_fast_reg_page_list *
diff --git a/drivers/infiniband/hw/qib/qib_verbs.c b/drivers/infiniband/hw/qib/qib_verbs.c
index a6b0b098ff30..a1e53d7b662b 100644
--- a/drivers/infiniband/hw/qib/qib_verbs.c
+++ b/drivers/infiniband/hw/qib/qib_verbs.c
@@ -362,7 +362,10 @@ static int qib_post_one_send(struct qib_qp *qp, struct ib_send_wr *wr,
* undefined operations.
* Make sure buffer is large enough to hold the result for atomics.
*/
- if (wr->opcode == IB_WR_FAST_REG_MR) {
+ if (wr->opcode == IB_WR_REG_MR) {
+ if (qib_reg_mr(qp, reg_wr(wr)))
+ goto bail_inval;
+ } else if (wr->opcode == IB_WR_FAST_REG_MR) {
if (qib_fast_reg_mr(qp, wr))
goto bail_inval;
} else if (qp->ibqp.qp_type == IB_QPT_UC) {
@@ -401,6 +404,9 @@ static int qib_post_one_send(struct qib_qp *qp, struct ib_send_wr *wr,
if (qp->ibqp.qp_type != IB_QPT_UC &&
qp->ibqp.qp_type != IB_QPT_RC)
memcpy(&wqe->ud_wr, ud_wr(wr), sizeof(wqe->ud_wr));
+ else if (wr->opcode == IB_WR_REG_MR)
+ memcpy(&wqe->reg_wr, reg_wr(wr),
+ sizeof(wqe->reg_wr));
else if (wr->opcode == IB_WR_FAST_REG_MR)
memcpy(&wqe->fast_reg_wr, fast_reg_wr(wr),
sizeof(wqe->fast_reg_wr));
@@ -2260,6 +2266,7 @@ int qib_register_ib_device(struct qib_devdata *dd)
ibdev->reg_user_mr = qib_reg_user_mr;
ibdev->dereg_mr = qib_dereg_mr;
ibdev->alloc_mr = qib_alloc_mr;
+ ibdev->map_mr_sg = qib_map_mr_sg;
ibdev->alloc_fast_reg_page_list = qib_alloc_fast_reg_page_list;
ibdev->free_fast_reg_page_list = qib_free_fast_reg_page_list;
ibdev->alloc_fmr = qib_alloc_fmr;
diff --git a/drivers/infiniband/hw/qib/qib_verbs.h b/drivers/infiniband/hw/qib/qib_verbs.h
index 8aa16851a5e6..dbc81c5761e3 100644
--- a/drivers/infiniband/hw/qib/qib_verbs.h
+++ b/drivers/infiniband/hw/qib/qib_verbs.h
@@ -330,6 +330,8 @@ struct qib_mr {
struct ib_mr ibmr;
struct ib_umem *umem;
struct qib_mregion mr; /* must be last */
+ u64 *pages;
+ u32 npages;
};

/*
@@ -341,6 +343,7 @@ struct qib_swqe {
union {
struct ib_send_wr wr; /* don't use wr.sg_list */
struct ib_ud_wr ud_wr;
+ struct ib_reg_wr reg_wr;
struct ib_fast_reg_wr fast_reg_wr;
struct ib_rdma_wr rdma_wr;
struct ib_atomic_wr atomic_wr;
@@ -1044,12 +1047,17 @@ struct ib_mr *qib_alloc_mr(struct ib_pd *pd,
enum ib_mr_type mr_type,
u32 max_entries);

+int qib_map_mr_sg(struct ib_mr *ibmr,
+ struct scatterlist *sg,
+ unsigned int sg_nents);
+
struct ib_fast_reg_page_list *qib_alloc_fast_reg_page_list(
struct ib_device *ibdev, int page_list_len);

void qib_free_fast_reg_page_list(struct ib_fast_reg_page_list *pl);

int qib_fast_reg_mr(struct qib_qp *qp, struct ib_send_wr *wr);
+int qib_reg_mr(struct qib_qp *qp, struct ib_reg_wr *wr);

struct ib_fmr *qib_alloc_fmr(struct ib_pd *pd, int mr_access_flags,
struct ib_fmr_attr *fmr_attr);
--
1.8.4.3


2015-09-17 09:43:21

by Sagi Grimberg

[permalink] [raw]
Subject: [PATCH v1 14/24] RDS/IW: Convert to new memory registration API

Get rid of the fast_reg page list and its construction.
Instead, just pass the RDS sg list to ib_map_mr_sg
and post the new ib_reg_wr.

This is done both for server IW RDMA_READ registration
and the client remote key registration.
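[Editor's note: both registration paths above rely on the rolling 8-bit key trick around ib_update_fast_reg_key(): the low byte of an lkey/rkey is a consumer-owned counter, bumped on every re-registration so a stale remote key no longer matches. A sketch of that update, with an invented `sim_*` name:]

```c
#include <assert.h>
#include <stdint.h>

/*
 * Keep the HW-assigned upper 24 bits of the key and replace the
 * consumer-owned low byte, as ib_update_fast_reg_key() does.
 */
static uint32_t sim_update_key(uint32_t key, uint8_t newkey)
{
	return (key & 0xffffff00u) | newkey;
}
```

This is why the RDS code can simply pass `ibmr->remap_count++` as the new byte: wrap-around is fine, only inequality with the previous key matters.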

Signed-off-by: Sagi Grimberg <[email protected]>
---
net/rds/iw.h | 5 +--
net/rds/iw_rdma.c | 118 ++++++++++++++++++------------------------------------
net/rds/iw_send.c | 57 +++++++++++++-------------
3 files changed, 70 insertions(+), 110 deletions(-)

diff --git a/net/rds/iw.h b/net/rds/iw.h
index fe858e5dd8d1..5af01d1758b3 100644
--- a/net/rds/iw.h
+++ b/net/rds/iw.h
@@ -74,13 +74,12 @@ struct rds_iw_send_work {
struct rm_rdma_op *s_op;
struct rds_iw_mapping *s_mapping;
struct ib_mr *s_mr;
- struct ib_fast_reg_page_list *s_page_list;
unsigned char s_remap_count;

union {
struct ib_send_wr s_send_wr;
struct ib_rdma_wr s_rdma_wr;
- struct ib_fast_reg_wr s_fast_reg_wr;
+ struct ib_reg_wr s_reg_wr;
};
struct ib_sge s_sge[RDS_IW_MAX_SGE];
unsigned long s_queued;
@@ -199,7 +198,7 @@ struct rds_iw_device {

/* Magic WR_ID for ACKs */
#define RDS_IW_ACK_WR_ID ((u64)0xffffffffffffffffULL)
-#define RDS_IW_FAST_REG_WR_ID ((u64)0xefefefefefefefefULL)
+#define RDS_IW_REG_WR_ID ((u64)0xefefefefefefefefULL)
#define RDS_IW_LOCAL_INV_WR_ID ((u64)0xdfdfdfdfdfdfdfdfULL)

struct rds_iw_statistics {
diff --git a/net/rds/iw_rdma.c b/net/rds/iw_rdma.c
index f8a612cc69e6..8c4e42759514 100644
--- a/net/rds/iw_rdma.c
+++ b/net/rds/iw_rdma.c
@@ -47,7 +47,6 @@ struct rds_iw_mr {
struct rdma_cm_id *cm_id;

struct ib_mr *mr;
- struct ib_fast_reg_page_list *page_list;

struct rds_iw_mapping mapping;
unsigned char remap_count;
@@ -77,8 +76,8 @@ struct rds_iw_mr_pool {

static int rds_iw_flush_mr_pool(struct rds_iw_mr_pool *pool, int free_all);
static void rds_iw_mr_pool_flush_worker(struct work_struct *work);
-static int rds_iw_init_fastreg(struct rds_iw_mr_pool *pool, struct rds_iw_mr *ibmr);
-static int rds_iw_map_fastreg(struct rds_iw_mr_pool *pool,
+static int rds_iw_init_reg(struct rds_iw_mr_pool *pool, struct rds_iw_mr *ibmr);
+static int rds_iw_map_reg(struct rds_iw_mr_pool *pool,
struct rds_iw_mr *ibmr,
struct scatterlist *sg, unsigned int nents);
static void rds_iw_free_fastreg(struct rds_iw_mr_pool *pool, struct rds_iw_mr *ibmr);
@@ -258,19 +257,18 @@ static void rds_iw_set_scatterlist(struct rds_iw_scatterlist *sg,
sg->bytes = 0;
}

-static u64 *rds_iw_map_scatterlist(struct rds_iw_device *rds_iwdev,
+static int rds_iw_map_scatterlist(struct rds_iw_device *rds_iwdev,
struct rds_iw_scatterlist *sg)
{
struct ib_device *dev = rds_iwdev->dev;
- u64 *dma_pages = NULL;
- int i, j, ret;
+ int i, ret;

WARN_ON(sg->dma_len);

sg->dma_len = ib_dma_map_sg(dev, sg->list, sg->len, DMA_BIDIRECTIONAL);
if (unlikely(!sg->dma_len)) {
printk(KERN_WARNING "RDS/IW: dma_map_sg failed!\n");
- return ERR_PTR(-EBUSY);
+ return -EBUSY;
}

sg->bytes = 0;
@@ -303,31 +301,14 @@ static u64 *rds_iw_map_scatterlist(struct rds_iw_device *rds_iwdev,
if (sg->dma_npages > fastreg_message_size)
goto out_unmap;

- dma_pages = kmalloc(sizeof(u64) * sg->dma_npages, GFP_ATOMIC);
- if (!dma_pages) {
- ret = -ENOMEM;
- goto out_unmap;
- }
-
- for (i = j = 0; i < sg->dma_len; ++i) {
- unsigned int dma_len = ib_sg_dma_len(dev, &sg->list[i]);
- u64 dma_addr = ib_sg_dma_address(dev, &sg->list[i]);
- u64 end_addr;

- end_addr = dma_addr + dma_len;
- dma_addr &= ~PAGE_MASK;
- for (; dma_addr < end_addr; dma_addr += PAGE_SIZE)
- dma_pages[j++] = dma_addr;
- BUG_ON(j > sg->dma_npages);
- }

- return dma_pages;
+ return 0;

out_unmap:
ib_dma_unmap_sg(rds_iwdev->dev, sg->list, sg->len, DMA_BIDIRECTIONAL);
sg->dma_len = 0;
- kfree(dma_pages);
- return ERR_PTR(ret);
+ return ret;
}


@@ -440,7 +421,7 @@ static struct rds_iw_mr *rds_iw_alloc_mr(struct rds_iw_device *rds_iwdev)
INIT_LIST_HEAD(&ibmr->mapping.m_list);
ibmr->mapping.m_mr = ibmr;

- err = rds_iw_init_fastreg(pool, ibmr);
+ err = rds_iw_init_reg(pool, ibmr);
if (err)
goto out_no_cigar;

@@ -622,7 +603,7 @@ void *rds_iw_get_mr(struct scatterlist *sg, unsigned long nents,
ibmr->cm_id = cm_id;
ibmr->device = rds_iwdev;

- ret = rds_iw_map_fastreg(rds_iwdev->mr_pool, ibmr, sg, nents);
+ ret = rds_iw_map_reg(rds_iwdev->mr_pool, ibmr, sg, nents);
if (ret == 0)
*key_ret = ibmr->mr->rkey;
else
@@ -638,7 +619,7 @@ out:
}

/*
- * iWARP fastreg handling
+ * iWARP reg handling
*
* The life cycle of a fastreg registration is a bit different from
* FMRs.
@@ -650,7 +631,7 @@ out:
* This creates a bit of a problem for us, as we do not have the destination
* IP in GET_MR, so the connection must be setup prior to the GET_MR call for
* RDMA to be correctly setup. If a fastreg request is present, rds_iw_xmit
- * will try to queue a LOCAL_INV (if needed) and a FAST_REG_MR work request
+ * will try to queue a LOCAL_INV (if needed) and a REG_MR work request
* before queuing the SEND. When completions for these arrive, they are
* dispatched to the MR has a bit set showing that RDMa can be performed.
*
@@ -659,11 +640,10 @@ out:
* The expectation there is that this invalidation step includes ALL
* PREVIOUSLY FREED MRs.
*/
-static int rds_iw_init_fastreg(struct rds_iw_mr_pool *pool,
+static int rds_iw_init_reg(struct rds_iw_mr_pool *pool,
struct rds_iw_mr *ibmr)
{
struct rds_iw_device *rds_iwdev = pool->device;
- struct ib_fast_reg_page_list *page_list = NULL;
struct ib_mr *mr;
int err;

@@ -676,56 +656,44 @@ static int rds_iw_init_fastreg(struct rds_iw_mr_pool *pool,
return err;
}

- /* FIXME - this is overkill, but mapping->m_sg.dma_len/mapping->m_sg.dma_npages
- * is not filled in.
- */
- page_list = ib_alloc_fast_reg_page_list(rds_iwdev->dev, pool->max_message_size);
- if (IS_ERR(page_list)) {
- err = PTR_ERR(page_list);
-
- printk(KERN_WARNING "RDS/IW: ib_alloc_fast_reg_page_list failed (err=%d)\n", err);
- ib_dereg_mr(mr);
- return err;
- }
-
- ibmr->page_list = page_list;
ibmr->mr = mr;
return 0;
}

-static int rds_iw_rdma_build_fastreg(struct rds_iw_mapping *mapping)
+static int rds_iw_rdma_reg_mr(struct rds_iw_mapping *mapping)
{
struct rds_iw_mr *ibmr = mapping->m_mr;
- struct ib_fast_reg_wr f_wr;
+ struct rds_iw_scatterlist *m_sg = &mapping->m_sg;
+ struct ib_reg_wr reg_wr;
struct ib_send_wr *failed_wr;
- int ret;
+ int ret, n;
+
+
+ n = ib_map_mr_sg(ibmr->mr, m_sg->list, m_sg->len, PAGE_SIZE);
+ if (unlikely(n != m_sg->len))
+ return n < 0 ? n : -EINVAL;
+
+ reg_wr.wr.opcode = IB_WR_REG_MR;
+ reg_wr.wr.wr_id = RDS_IW_REG_WR_ID;
+ reg_wr.wr.num_sge = 0;
+ reg_wr.mr = ibmr->mr;
+ reg_wr.key = mapping->m_rkey;
+ reg_wr.access = IB_ACCESS_LOCAL_WRITE |
+ IB_ACCESS_REMOTE_READ |
+ IB_ACCESS_REMOTE_WRITE;

/*
- * Perform a WR for the fast_reg_mr. Each individual page
+ * Perform a WR for the reg_mr. Each individual page
* in the sg list is added to the fast reg page list and placed
- * inside the fast_reg_mr WR. The key used is a rolling 8bit
+ * inside the reg_mr WR. The key used is a rolling 8bit
* counter, which should guarantee uniqueness.
*/
ib_update_fast_reg_key(ibmr->mr, ibmr->remap_count++);
mapping->m_rkey = ibmr->mr->rkey;

- memset(&f_wr, 0, sizeof(f_wr));
- f_wr.wr.wr_id = RDS_IW_FAST_REG_WR_ID;
- f_wr.wr.opcode = IB_WR_FAST_REG_MR;
- f_wr.length = mapping->m_sg.bytes;
- f_wr.rkey = mapping->m_rkey;
- f_wr.page_list = ibmr->page_list;
- f_wr.page_list_len = mapping->m_sg.dma_len;
- f_wr.page_shift = PAGE_SHIFT;
- f_wr.access_flags = IB_ACCESS_LOCAL_WRITE |
- IB_ACCESS_REMOTE_READ |
- IB_ACCESS_REMOTE_WRITE;
- f_wr.iova_start = 0;
- f_wr.wr.send_flags = IB_SEND_SIGNALED;
-
- failed_wr = &f_wr.wr;
- ret = ib_post_send(ibmr->cm_id->qp, &f_wr.wr, &failed_wr);
- BUG_ON(failed_wr != &f_wr.wr);
+ failed_wr = &reg_wr.wr;
+ ret = ib_post_send(ibmr->cm_id->qp, &reg_wr.wr, &failed_wr);
+ BUG_ON(failed_wr != &reg_wr.wr);
if (ret)
printk_ratelimited(KERN_WARNING "RDS/IW: %s:%d ib_post_send returned %d\n",
__func__, __LINE__, ret);
@@ -757,7 +725,7 @@ out:
return ret;
}

-static int rds_iw_map_fastreg(struct rds_iw_mr_pool *pool,
+static int rds_iw_map_reg(struct rds_iw_mr_pool *pool,
struct rds_iw_mr *ibmr,
struct scatterlist *sg,
unsigned int sg_len)
@@ -765,13 +733,12 @@ static int rds_iw_map_fastreg(struct rds_iw_mr_pool *pool,
struct rds_iw_device *rds_iwdev = pool->device;
struct rds_iw_mapping *mapping = &ibmr->mapping;
u64 *dma_pages;
- int i, ret = 0;
+ int ret = 0;

rds_iw_set_scatterlist(&mapping->m_sg, sg, sg_len);

- dma_pages = rds_iw_map_scatterlist(rds_iwdev, &mapping->m_sg);
- if (IS_ERR(dma_pages)) {
- ret = PTR_ERR(dma_pages);
+ ret = rds_iw_map_scatterlist(rds_iwdev, &mapping->m_sg);
+ if (ret) {
dma_pages = NULL;
goto out;
}
@@ -781,10 +748,7 @@ static int rds_iw_map_fastreg(struct rds_iw_mr_pool *pool,
goto out;
}

- for (i = 0; i < mapping->m_sg.dma_npages; ++i)
- ibmr->page_list->page_list[i] = dma_pages[i];
-
- ret = rds_iw_rdma_build_fastreg(mapping);
+ ret = rds_iw_rdma_reg_mr(mapping);
if (ret)
goto out;

@@ -870,8 +834,6 @@ static unsigned int rds_iw_unmap_fastreg_list(struct rds_iw_mr_pool *pool,
static void rds_iw_destroy_fastreg(struct rds_iw_mr_pool *pool,
struct rds_iw_mr *ibmr)
{
- if (ibmr->page_list)
- ib_free_fast_reg_page_list(ibmr->page_list);
if (ibmr->mr)
ib_dereg_mr(ibmr->mr);
}
diff --git a/net/rds/iw_send.c b/net/rds/iw_send.c
index f6e23c515b44..e20bd503f4bd 100644
--- a/net/rds/iw_send.c
+++ b/net/rds/iw_send.c
@@ -159,13 +159,6 @@ void rds_iw_send_init_ring(struct rds_iw_connection *ic)
printk(KERN_WARNING "RDS/IW: ib_alloc_mr failed\n");
break;
}
-
- send->s_page_list = ib_alloc_fast_reg_page_list(
- ic->i_cm_id->device, fastreg_message_size);
- if (IS_ERR(send->s_page_list)) {
- printk(KERN_WARNING "RDS/IW: ib_alloc_fast_reg_page_list failed\n");
- break;
- }
}
}

@@ -177,8 +170,6 @@ void rds_iw_send_clear_ring(struct rds_iw_connection *ic)
for (i = 0, send = ic->i_sends; i < ic->i_send_ring.w_nr; i++, send++) {
BUG_ON(!send->s_mr);
ib_dereg_mr(send->s_mr);
- BUG_ON(!send->s_page_list);
- ib_free_fast_reg_page_list(send->s_page_list);
if (send->s_send_wr.opcode == 0xdead)
continue;
if (send->s_rm)
@@ -227,7 +218,7 @@ void rds_iw_send_cq_comp_handler(struct ib_cq *cq, void *context)
continue;
}

- if (wc.opcode == IB_WC_FAST_REG_MR && wc.wr_id == RDS_IW_FAST_REG_WR_ID) {
+ if (wc.opcode == IB_WC_REG_MR && wc.wr_id == RDS_IW_REG_WR_ID) {
ic->i_fastreg_posted = 1;
continue;
}
@@ -252,7 +243,7 @@ void rds_iw_send_cq_comp_handler(struct ib_cq *cq, void *context)
if (send->s_rm)
rds_iw_send_unmap_rm(ic, send, wc.status);
break;
- case IB_WR_FAST_REG_MR:
+ case IB_WR_REG_MR:
case IB_WR_RDMA_WRITE:
case IB_WR_RDMA_READ:
case IB_WR_RDMA_READ_WITH_INV:
@@ -770,24 +761,26 @@ out:
return ret;
}

-static void rds_iw_build_send_fastreg(struct rds_iw_device *rds_iwdev, struct rds_iw_connection *ic, struct rds_iw_send_work *send, int nent, int len, u64 sg_addr)
+static int rds_iw_build_send_reg(struct rds_iw_send_work *send,
+ struct scatterlist *sg,
+ int sg_nents)
{
- BUG_ON(nent > send->s_page_list->max_page_list_len);
- /*
- * Perform a WR for the fast_reg_mr. Each individual page
- * in the sg list is added to the fast reg page list and placed
- * inside the fast_reg_mr WR.
- */
- send->s_fast_reg_wr.wr.opcode = IB_WR_FAST_REG_MR;
- send->s_fast_reg_wr.length = len;
- send->s_fast_reg_wr.rkey = send->s_mr->rkey;
- send->s_fast_reg_wr.page_list = send->s_page_list;
- send->s_fast_reg_wr.page_list_len = nent;
- send->s_fast_reg_wr.page_shift = PAGE_SHIFT;
- send->s_fast_reg_wr.access_flags = IB_ACCESS_REMOTE_WRITE;
- send->s_fast_reg_wr.iova_start = sg_addr;
+ int n;
+
+ n = ib_map_mr_sg(send->s_mr, sg, sg_nents, PAGE_SIZE);
+ if (unlikely(n != sg_nents))
+ return n < 0 ? n : -EINVAL;
+
+ send->s_reg_wr.wr.opcode = IB_WR_REG_MR;
+ send->s_reg_wr.wr.wr_id = 0;
+ send->s_reg_wr.wr.num_sge = 0;
+ send->s_reg_wr.mr = send->s_mr;
+ send->s_reg_wr.key = send->s_mr->rkey;
+ send->s_reg_wr.access = IB_ACCESS_REMOTE_WRITE;

ib_update_fast_reg_key(send->s_mr, send->s_remap_count++);
+
+ return 0;
}

int rds_iw_xmit_rdma(struct rds_connection *conn, struct rm_rdma_op *op)
@@ -808,6 +801,7 @@ int rds_iw_xmit_rdma(struct rds_connection *conn, struct rm_rdma_op *op)
int sent;
int ret;
int num_sge;
+ int sg_nents;

rds_iwdev = ib_get_client_data(ic->i_cm_id->device, &rds_iw_client);

@@ -861,6 +855,7 @@ int rds_iw_xmit_rdma(struct rds_connection *conn, struct rm_rdma_op *op)
scat = &op->op_sg[0];
sent = 0;
num_sge = op->op_count;
+ sg_nents = 0;

for (i = 0; i < work_alloc && scat != &op->op_sg[op->op_count]; i++) {
send->s_rdma_wr.wr.send_flags = 0;
@@ -904,7 +899,7 @@ int rds_iw_xmit_rdma(struct rds_connection *conn, struct rm_rdma_op *op)
len = ib_sg_dma_len(ic->i_cm_id->device, scat);

if (send->s_rdma_wr.wr.opcode == IB_WR_RDMA_READ_WITH_INV)
- send->s_page_list->page_list[j] = ib_sg_dma_address(ic->i_cm_id->device, scat);
+ sg_nents++;
else {
send->s_sge[j].addr = ib_sg_dma_address(ic->i_cm_id->device, scat);
send->s_sge[j].length = len;
@@ -951,8 +946,12 @@ int rds_iw_xmit_rdma(struct rds_connection *conn, struct rm_rdma_op *op)
* fastreg_mr (or possibly a dma_mr)
*/
if (!op->op_write) {
- rds_iw_build_send_fastreg(rds_iwdev, ic, &ic->i_sends[fr_pos],
- op->op_count, sent, conn->c_xmit_rm->m_rs->rs_user_addr);
+ ret = rds_iw_build_send_reg(&ic->i_sends[fr_pos],
+ &op->op_sg[0], sg_nents);
+ if (ret) {
+ printk(KERN_WARNING "RDS/IW: failed to reg send mem\n");
+ goto out;
+ }
work_alloc++;
}

--
1.8.4.3


2015-09-17 09:43:28

by Sagi Grimberg

[permalink] [raw]
Subject: [PATCH v1 09/24] RDMA/nes: Support the new memory registration API

Support the new memory registration API by allocating a
private page list array in nes_mr and populating it when
nes_map_mr_sg is invoked. Also, support IB_WR_REG_MR
by duplicating the IB_WR_FAST_REG_MR handling and taking
the needed information from different places:
- page_size, iova, length (ib_mr)
- page array (nes_mr)
- key, access flags (ib_reg_wr)

The IB_WR_FAST_REG_MR handlers will be removed later, once
all the ULPs are converted.
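[Editor's note: nes is a good example of why the set_page hook exists at all — its HW-visible PBL wants little-endian addresses, so nes_set_page stores `cpu_to_le64(addr)` rather than the raw value. A portable byte-wise sketch of that store, with invented `sim_*` names standing in for the kernel helper:]

```c
#include <assert.h>
#include <stdint.h>

/*
 * Byte-wise little-endian store, standing in for cpu_to_le64(): on any
 * host endianness, byte 0 ends up holding the least significant byte,
 * which is the layout the device's page list expects.
 */
static void sim_put_le64(uint8_t *dst, uint64_t v)
{
	int i;

	for (i = 0; i < 8; i++)
		dst[i] = (uint8_t)(v >> (8 * i));
}
```

Drivers with no such per-entry conversion (or extra per-page flags) can use a plain append, which is why the conversion lives in the driver callback and not in the shared ib_sg_to_pages walk.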

Signed-off-by: Sagi Grimberg <[email protected]>
---
drivers/infiniband/hw/nes/nes_verbs.c | 115 ++++++++++++++++++++++++++++++++++
drivers/infiniband/hw/nes/nes_verbs.h | 4 ++
2 files changed, 119 insertions(+)

diff --git a/drivers/infiniband/hw/nes/nes_verbs.c b/drivers/infiniband/hw/nes/nes_verbs.c
index f71b37b75f82..ba069ec2ebf9 100644
--- a/drivers/infiniband/hw/nes/nes_verbs.c
+++ b/drivers/infiniband/hw/nes/nes_verbs.c
@@ -51,6 +51,7 @@ atomic_t qps_created;
atomic_t sw_qps_destroyed;

static void nes_unregister_ofa_device(struct nes_ib_device *nesibdev);
+static int nes_dereg_mr(struct ib_mr *ib_mr);

/**
* nes_alloc_mw
@@ -445,7 +446,44 @@ static struct ib_mr *nes_alloc_mr(struct ib_pd *ibpd,
nes_free_resource(nesadapter, nesadapter->allocated_mrs, stag_index);
ibmr = ERR_PTR(-ENOMEM);
}
+
+ nesmr->pages = pci_alloc_consistent(nesdev->pcidev,
+ max_num_sg * sizeof(u64),
+ &nesmr->paddr);
+ if (!nesmr->paddr)
+ goto err;
+
+ nesmr->max_pages = max_num_sg;
+
return ibmr;
+
+err:
+ nes_dereg_mr(ibmr);
+
+ return ERR_PTR(-ENOMEM);
+}
+
+static int nes_set_page(struct ib_mr *ibmr, u64 addr)
+{
+ struct nes_mr *nesmr = to_nesmr(ibmr);
+
+ if (unlikely(nesmr->npages == nesmr->max_pages))
+ return -ENOMEM;
+
+ nesmr->pages[nesmr->npages++] = cpu_to_le64(addr);
+
+ return 0;
+}
+
+static int nes_map_mr_sg(struct ib_mr *ibmr,
+ struct scatterlist *sg,
+ unsigned int sg_nents)
+{
+ struct nes_mr *nesmr = to_nesmr(ibmr);
+
+ nesmr->npages = 0;
+
+ return ib_sg_to_pages(ibmr, sg, sg_nents, nes_set_page);
}

/*
@@ -2683,6 +2721,13 @@ static int nes_dereg_mr(struct ib_mr *ib_mr)
u16 major_code;
u16 minor_code;

+
+ if (nesmr->pages)
+ pci_free_consistent(nesdev->pcidev,
+ nesmr->max_pages * sizeof(u64),
+ nesmr->pages,
+ nesmr->paddr);
+
if (nesmr->region) {
ib_umem_release(nesmr->region);
}
@@ -3513,6 +3558,75 @@ static int nes_post_send(struct ib_qp *ibqp, struct ib_send_wr *ib_wr,
wqe_misc);
break;
}
+ case IB_WR_REG_MR:
+ {
+ struct nes_mr *mr = to_nesmr(reg_wr(ib_wr)->mr);
+ int page_shift = ilog2(reg_wr(ib_wr)->mr->page_size);
+ int flags = reg_wr(ib_wr)->access;
+
+ if (mr->npages > (NES_4K_PBL_CHUNK_SIZE / sizeof(u64))) {
+ nes_debug(NES_DBG_IW_TX, "SQ_FMR: bad page_list_len\n");
+ err = -EINVAL;
+ break;
+ }
+ wqe_misc = NES_IWARP_SQ_OP_FAST_REG;
+ set_wqe_64bit_value(wqe->wqe_words,
+ NES_IWARP_SQ_FMR_WQE_VA_FBO_LOW_IDX,
+ mr->ibmr.iova);
+ set_wqe_32bit_value(wqe->wqe_words,
+ NES_IWARP_SQ_FMR_WQE_LENGTH_LOW_IDX,
+ mr->ibmr.length);
+ set_wqe_32bit_value(wqe->wqe_words,
+ NES_IWARP_SQ_FMR_WQE_LENGTH_HIGH_IDX, 0);
+ set_wqe_32bit_value(wqe->wqe_words,
+ NES_IWARP_SQ_FMR_WQE_MR_STAG_IDX,
+ reg_wr(ib_wr)->key);
+
+ if (page_shift == 12) {
+ wqe_misc |= NES_IWARP_SQ_FMR_WQE_PAGE_SIZE_4K;
+ } else if (page_shift == 21) {
+ wqe_misc |= NES_IWARP_SQ_FMR_WQE_PAGE_SIZE_2M;
+ } else {
+ nes_debug(NES_DBG_IW_TX, "Invalid page shift,"
+ " ib_wr=%u, max=1\n", ib_wr->num_sge);
+ err = -EINVAL;
+ break;
+ }
+
+ /* Set access_flags */
+ wqe_misc |= NES_IWARP_SQ_FMR_WQE_RIGHTS_ENABLE_LOCAL_READ;
+ if (flags & IB_ACCESS_LOCAL_WRITE)
+ wqe_misc |= NES_IWARP_SQ_FMR_WQE_RIGHTS_ENABLE_LOCAL_WRITE;
+
+ if (flags & IB_ACCESS_REMOTE_WRITE)
+ wqe_misc |= NES_IWARP_SQ_FMR_WQE_RIGHTS_ENABLE_REMOTE_WRITE;
+
+ if (flags & IB_ACCESS_REMOTE_READ)
+ wqe_misc |= NES_IWARP_SQ_FMR_WQE_RIGHTS_ENABLE_REMOTE_READ;
+
+ if (flags & IB_ACCESS_MW_BIND)
+ wqe_misc |= NES_IWARP_SQ_FMR_WQE_RIGHTS_ENABLE_WINDOW_BIND;
+
+ /* Fill in PBL info: */
+ set_wqe_64bit_value(wqe->wqe_words,
+ NES_IWARP_SQ_FMR_WQE_PBL_ADDR_LOW_IDX,
+ mr->paddr);
+
+ set_wqe_32bit_value(wqe->wqe_words,
+ NES_IWARP_SQ_FMR_WQE_PBL_LENGTH_IDX,
+ mr->npages * 8);
+
+ nes_debug(NES_DBG_IW_TX, "SQ_REG_MR: iova_start: %llx, "
+ "length: %d, rkey: %0x, pgl_paddr: %llx, "
+ "page_list_len: %u, wqe_misc: %x\n",
+ (unsigned long long) mr->ibmr.iova,
+ mr->ibmr.length,
+ reg_wr(ib_wr)->key,
+ (unsigned long long) mr->paddr,
+ mr->npages,
+ wqe_misc);
+ break;
+ }
default:
/* error */
err = -EINVAL;
@@ -3940,6 +4054,7 @@ struct nes_ib_device *nes_init_ofa_device(struct net_device *netdev)
nesibdev->ibdev.bind_mw = nes_bind_mw;

nesibdev->ibdev.alloc_mr = nes_alloc_mr;
+ nesibdev->ibdev.map_mr_sg = nes_map_mr_sg;
nesibdev->ibdev.alloc_fast_reg_page_list = nes_alloc_fast_reg_page_list;
nesibdev->ibdev.free_fast_reg_page_list = nes_free_fast_reg_page_list;

diff --git a/drivers/infiniband/hw/nes/nes_verbs.h b/drivers/infiniband/hw/nes/nes_verbs.h
index 309b31c31ae1..a204b677af22 100644
--- a/drivers/infiniband/hw/nes/nes_verbs.h
+++ b/drivers/infiniband/hw/nes/nes_verbs.h
@@ -79,6 +79,10 @@ struct nes_mr {
u16 pbls_used;
u8 mode;
u8 pbl_4k;
+ __le64 *pages;
+ dma_addr_t paddr;
+ u32 max_pages;
+ u32 npages;
};

struct nes_hw_pb {
--
1.8.4.3


2015-09-17 09:43:26

by Sagi Grimberg

Subject: [PATCH v1 15/24] IB/srp: Convert to new memory registration API

Since SRP supports both FMR and FRWR, the new API conversion
includes splitting the sg list mapping routines in srp_map_data
into srp_map_sg_fr, which works with the new memory registration
API; srp_map_sg_fmr, which constructs a page vector and calls
ib_fmr_pool_map_phys; and srp_map_sg_dma, which is used only
if neither FRWR nor FMR is supported (which I'm not sure is a
valid use-case anymore).

The SRP protocol is able to pass multiple descriptors to the
target for remote access, so the mapping routine registers
multiple partial sg lists until the entire sg list is mapped
and registered (each registration maps a prefix of the
remaining sg list).

Note that now the per request page vector is allocated only when FMR
mode is used as it is not needed for the new registration API.
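
For reference, the per-MR prefix loop in srp_map_sg_fr() can be modeled
in plain userspace C. This is only a sketch of the control flow, not the
kernel API: map_prefix() stands in for ib_map_mr_sg(), which maps up to
max_ents leading entries and returns how many it consumed.

```c
#include <assert.h>

/* Hypothetical model of one registration: consume up to max_ents
 * leading entries of the remaining sg list, as ib_map_mr_sg() does
 * when handed a prefix of the list. */
static int map_prefix(int remaining, int max_ents)
{
	return remaining < max_ents ? remaining : max_ents;
}

/* Model of the srp_map_sg_fr() loop: keep registering prefixes
 * until the whole list is consumed; returns the number of
 * descriptors (registrations) produced. */
static int count_descs(int sg_nents, int max_ents)
{
	int ndesc = 0;

	while (sg_nents) {
		int n = map_prefix(sg_nents, max_ents);

		sg_nents -= n;
		ndesc++;
	}
	return ndesc;
}
```

So a 10-entry list with a 4-entry MR limit yields three descriptors,
matching the "multiple partial sg lists" behavior described above.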

Signed-off-by: Sagi Grimberg <[email protected]>
---
drivers/infiniband/ulp/srp/ib_srp.c | 248 +++++++++++++++++++++---------------
drivers/infiniband/ulp/srp/ib_srp.h | 11 +-
2 files changed, 156 insertions(+), 103 deletions(-)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index f8b9c18da03d..35cddbb120ea 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -340,8 +340,6 @@ static void srp_destroy_fr_pool(struct srp_fr_pool *pool)
return;

for (i = 0, d = &pool->desc[0]; i < pool->size; i++, d++) {
- if (d->frpl)
- ib_free_fast_reg_page_list(d->frpl);
if (d->mr)
ib_dereg_mr(d->mr);
}
@@ -362,7 +360,6 @@ static struct srp_fr_pool *srp_create_fr_pool(struct ib_device *device,
struct srp_fr_pool *pool;
struct srp_fr_desc *d;
struct ib_mr *mr;
- struct ib_fast_reg_page_list *frpl;
int i, ret = -EINVAL;

if (pool_size <= 0)
@@ -385,12 +382,6 @@ static struct srp_fr_pool *srp_create_fr_pool(struct ib_device *device,
goto destroy_pool;
}
d->mr = mr;
- frpl = ib_alloc_fast_reg_page_list(device, max_page_list_len);
- if (IS_ERR(frpl)) {
- ret = PTR_ERR(frpl);
- goto destroy_pool;
- }
- d->frpl = frpl;
list_add_tail(&d->entry, &pool->free_list);
}

@@ -887,14 +878,16 @@ static int srp_alloc_req_data(struct srp_rdma_ch *ch)
GFP_KERNEL);
if (!mr_list)
goto out;
- if (srp_dev->use_fast_reg)
+ if (srp_dev->use_fast_reg) {
req->fr_list = mr_list;
- else
+ } else {
req->fmr_list = mr_list;
- req->map_page = kmalloc(srp_dev->max_pages_per_mr *
- sizeof(void *), GFP_KERNEL);
- if (!req->map_page)
- goto out;
+ req->map_page = kmalloc(srp_dev->max_pages_per_mr *
+ sizeof(void *), GFP_KERNEL);
+ if (!req->map_page)
+ goto out;
+ }
+
req->indirect_desc = kmalloc(target->indirect_size, GFP_KERNEL);
if (!req->indirect_desc)
goto out;
@@ -1283,6 +1276,15 @@ static int srp_map_finish_fmr(struct srp_map_state *state,
struct ib_pool_fmr *fmr;
u64 io_addr = 0;

+ if (state->npages == 0)
+ return 0;
+
+ if (state->npages == 1 && target->global_mr) {
+ srp_map_desc(state, state->base_dma_addr, state->dma_len,
+ target->global_mr->rkey);
+ return 0;
+ }
+
if (WARN_ON_ONCE(state->fmr.next >= state->fmr.end))
return -ENOMEM;

@@ -1297,6 +1299,9 @@ static int srp_map_finish_fmr(struct srp_map_state *state,
srp_map_desc(state, state->base_dma_addr & ~dev->mr_page_mask,
state->dma_len, fmr->fmr->rkey);

+ state->npages = 0;
+ state->dma_len = 0;
+
return 0;
}

@@ -1306,9 +1311,17 @@ static int srp_map_finish_fr(struct srp_map_state *state,
struct srp_target_port *target = ch->target;
struct srp_device *dev = target->srp_host->srp_dev;
struct ib_send_wr *bad_wr;
- struct ib_fast_reg_wr wr;
+ struct ib_reg_wr wr;
struct srp_fr_desc *desc;
u32 rkey;
+ int n, err;
+
+ if (state->sg_nents == 1 && target->global_mr) {
+ srp_map_desc(state, sg_dma_address(state->sg),
+ sg_dma_len(state->sg),
+ target->global_mr->rkey);
+ return 1;
+ }

if (WARN_ON_ONCE(state->fr.next >= state->fr.end))
return -ENOMEM;
@@ -1320,56 +1333,32 @@ static int srp_map_finish_fr(struct srp_map_state *state,
rkey = ib_inc_rkey(desc->mr->rkey);
ib_update_fast_reg_key(desc->mr, rkey);

- memcpy(desc->frpl->page_list, state->pages,
- sizeof(state->pages[0]) * state->npages);
+ n = ib_map_mr_sg(desc->mr, state->sg, state->sg_nents,
+ dev->mr_page_size);
+ if (unlikely(n < 0))
+ return n;

- memset(&wr, 0, sizeof(wr));
- wr.wr.opcode = IB_WR_FAST_REG_MR;
+ wr.wr.opcode = IB_WR_REG_MR;
wr.wr.wr_id = FAST_REG_WR_ID_MASK;
- wr.iova_start = state->base_dma_addr;
- wr.page_list = desc->frpl;
- wr.page_list_len = state->npages;
- wr.page_shift = ilog2(dev->mr_page_size);
- wr.length = state->dma_len;
- wr.access_flags = (IB_ACCESS_LOCAL_WRITE |
- IB_ACCESS_REMOTE_READ |
- IB_ACCESS_REMOTE_WRITE);
- wr.rkey = desc->mr->lkey;
+ wr.wr.num_sge = 0;
+ wr.wr.send_flags = 0;
+ wr.mr = desc->mr;
+ wr.key = desc->mr->rkey;
+ wr.access = (IB_ACCESS_LOCAL_WRITE |
+ IB_ACCESS_REMOTE_READ |
+ IB_ACCESS_REMOTE_WRITE);

*state->fr.next++ = desc;
state->nmdesc++;

- srp_map_desc(state, state->base_dma_addr, state->dma_len,
- desc->mr->rkey);
-
- return ib_post_send(ch->qp, &wr.wr, &bad_wr);
-}
-
-static int srp_finish_mapping(struct srp_map_state *state,
- struct srp_rdma_ch *ch)
-{
- struct srp_target_port *target = ch->target;
- struct srp_device *dev = target->srp_host->srp_dev;
- int ret = 0;
-
- WARN_ON_ONCE(!dev->use_fast_reg && !dev->use_fmr);
+ srp_map_desc(state, desc->mr->iova,
+ desc->mr->length, desc->mr->rkey);

- if (state->npages == 0)
- return 0;
+ err = ib_post_send(ch->qp, &wr.wr, &bad_wr);
+ if (unlikely(err))
+ return err;

- if (state->npages == 1 && target->global_mr)
- srp_map_desc(state, state->base_dma_addr, state->dma_len,
- target->global_mr->rkey);
- else
- ret = dev->use_fast_reg ? srp_map_finish_fr(state, ch) :
- srp_map_finish_fmr(state, ch);
-
- if (ret == 0) {
- state->npages = 0;
- state->dma_len = 0;
- }
-
- return ret;
+ return n;
}

static int srp_map_sg_entry(struct srp_map_state *state,
@@ -1389,7 +1378,7 @@ static int srp_map_sg_entry(struct srp_map_state *state,
while (dma_len) {
unsigned offset = dma_addr & ~dev->mr_page_mask;
if (state->npages == dev->max_pages_per_mr || offset != 0) {
- ret = srp_finish_mapping(state, ch);
+ ret = srp_map_finish_fmr(state, ch);
if (ret)
return ret;
}
@@ -1411,51 +1400,91 @@ static int srp_map_sg_entry(struct srp_map_state *state,
*/
ret = 0;
if (len != dev->mr_page_size)
- ret = srp_finish_mapping(state, ch);
+ ret = srp_map_finish_fmr(state, ch);
+
return ret;
}

-static int srp_map_sg(struct srp_map_state *state, struct srp_rdma_ch *ch,
- struct srp_request *req, struct scatterlist *scat,
- int count)
+static int srp_map_sg_fmr(struct srp_map_state *state,
+ struct srp_rdma_ch *ch,
+ struct srp_request *req,
+ struct scatterlist *scat,
+ int count)
{
- struct srp_target_port *target = ch->target;
- struct srp_device *dev = target->srp_host->srp_dev;
struct scatterlist *sg;
int i, ret;

- state->desc = req->indirect_desc;
- state->pages = req->map_page;
- if (dev->use_fast_reg) {
- state->fr.next = req->fr_list;
- state->fr.end = req->fr_list + target->cmd_sg_cnt;
- } else if (dev->use_fmr) {
- state->fmr.next = req->fmr_list;
- state->fmr.end = req->fmr_list + target->cmd_sg_cnt;
- }
+ state->desc = req->indirect_desc;
+ state->pages = req->map_page;
+ state->fmr.next = req->fmr_list;
+ state->fmr.end = req->fmr_list + ch->target->cmd_sg_cnt;

- if (dev->use_fast_reg || dev->use_fmr) {
- for_each_sg(scat, sg, count, i) {
- ret = srp_map_sg_entry(state, ch, sg, i);
- if (ret)
- goto out;
- }
- ret = srp_finish_mapping(state, ch);
+ for_each_sg(scat, sg, count, i) {
+ ret = srp_map_sg_entry(state, ch, sg, i);
if (ret)
- goto out;
- } else {
- for_each_sg(scat, sg, count, i) {
- srp_map_desc(state, ib_sg_dma_address(dev->dev, sg),
- ib_sg_dma_len(dev->dev, sg),
- target->global_mr->rkey);
- }
+ return ret;
}

+ ret = srp_map_finish_fmr(state, ch);
+ if (ret)
+ return ret;
+
req->nmdesc = state->nmdesc;
- ret = 0;

-out:
- return ret;
+ return 0;
+}
+
+static int srp_map_sg_fr(struct srp_map_state *state,
+ struct srp_rdma_ch *ch,
+ struct srp_request *req,
+ struct scatterlist *scat,
+ int count)
+{
+
+ state->desc = req->indirect_desc;
+ state->fr.next = req->fr_list;
+ state->fr.end = req->fr_list + ch->target->cmd_sg_cnt;
+ state->sg = scat;
+ state->sg_nents = scsi_sg_count(req->scmnd);
+
+ while (state->sg_nents) {
+ int i, n;
+
+ n = srp_map_finish_fr(state, ch);
+ if (unlikely(n < 0))
+ return n;
+
+ state->sg_nents -= n;
+ for (i = 0; i < n; i++)
+ state->sg = sg_next(state->sg);
+ }
+
+ req->nmdesc = state->nmdesc;
+
+ return 0;
+}
+
+static int srp_map_sg_dma(struct srp_map_state *state,
+ struct srp_rdma_ch *ch,
+ struct srp_request *req,
+ struct scatterlist *scat,
+ int count)
+{
+ struct srp_target_port *target = ch->target;
+ struct srp_device *dev = target->srp_host->srp_dev;
+ struct scatterlist *sg;
+ int i;
+
+ state->desc = req->indirect_desc;
+ for_each_sg(scat, sg, count, i) {
+ srp_map_desc(state, ib_sg_dma_address(dev->dev, sg),
+ ib_sg_dma_len(dev->dev, sg),
+ target->global_mr->rkey);
+ }
+
+ req->nmdesc = state->nmdesc;
+
+ return 0;
}

/*
@@ -1474,6 +1503,7 @@ static int srp_map_idb(struct srp_rdma_ch *ch, struct srp_request *req,
struct srp_map_state state;
struct srp_direct_buf idb_desc;
u64 idb_pages[1];
+ struct scatterlist idb_sg[1];
int ret;

memset(&state, 0, sizeof(state));
@@ -1481,19 +1511,31 @@ static int srp_map_idb(struct srp_rdma_ch *ch, struct srp_request *req,
state.gen.next = next_mr;
state.gen.end = end_mr;
state.desc = &idb_desc;
- state.pages = idb_pages;
- state.pages[0] = (req->indirect_dma_addr &
- dev->mr_page_mask);
- state.npages = 1;
state.base_dma_addr = req->indirect_dma_addr;
state.dma_len = idb_len;
- ret = srp_finish_mapping(&state, ch);
- if (ret < 0)
- goto out;
+
+ if (dev->use_fast_reg) {
+ state.sg = idb_sg;
+ state.sg_nents = 1;
+ sg_set_buf(idb_sg, req->indirect_desc, idb_len);
+ idb_sg->dma_address = req->indirect_dma_addr; /* hack! */
+ ret = srp_map_finish_fr(&state, ch);
+ if (ret < 0)
+ return ret;
+ } else if (dev->use_fmr) {
+ state.pages = idb_pages;
+ state.pages[0] = (req->indirect_dma_addr &
+ dev->mr_page_mask);
+ state.npages = 1;
+ ret = srp_map_finish_fmr(&state, ch);
+ if (ret < 0)
+ return ret;
+ } else {
+ return -EINVAL;
+ }

*idb_rkey = idb_desc.key;

-out:
return ret;
}

@@ -1563,7 +1605,13 @@ static int srp_map_data(struct scsi_cmnd *scmnd, struct srp_rdma_ch *ch,
target->indirect_size, DMA_TO_DEVICE);

memset(&state, 0, sizeof(state));
- srp_map_sg(&state, ch, req, scat, count);
+
+ if (dev->use_fast_reg)
+ srp_map_sg_fr(&state, ch, req, scat, count);
+ else if (dev->use_fmr)
+ srp_map_sg_fmr(&state, ch, req, scat, count);
+ else
+ srp_map_sg_dma(&state, ch, req, scat, count);

/* We've mapped the request, now pull as much of the indirect
* descriptor table as we can into the command buffer. If this
diff --git a/drivers/infiniband/ulp/srp/ib_srp.h b/drivers/infiniband/ulp/srp/ib_srp.h
index 3608f2e4819c..a31a93716f3f 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.h
+++ b/drivers/infiniband/ulp/srp/ib_srp.h
@@ -242,7 +242,6 @@ struct srp_iu {
struct srp_fr_desc {
struct list_head entry;
struct ib_mr *mr;
- struct ib_fast_reg_page_list *frpl;
};

/**
@@ -294,11 +293,17 @@ struct srp_map_state {
} gen;
};
struct srp_direct_buf *desc;
- u64 *pages;
+ union {
+ u64 *pages;
+ struct scatterlist *sg;
+ };
dma_addr_t base_dma_addr;
u32 dma_len;
u32 total_len;
- unsigned int npages;
+ union {
+ unsigned int npages;
+ unsigned int sg_nents;
+ };
unsigned int nmdesc;
unsigned int ndesc;
};
--
1.8.4.3


2015-09-17 09:43:33

by Sagi Grimberg

Subject: [PATCH v1 10/24] IB/iser: Port to new fast registration API

Remove the fastreg page list allocation as the page vector
is now private to the provider. Instead of constructing the
page list and a fast_reg work request, call ib_map_mr_sg
and construct an ib_reg_wr.
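
The page-vector construction that moves into the core here (the
ib_sg_to_pages() helper with a per-driver set_page callback, per the
cover letter) can be modeled in userspace C roughly as below. Names
and the segment struct are hypothetical; the real helper works on
struct scatterlist and handles offsets/coalescing in more detail.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical dma segment: base address and length in bytes. */
struct seg { uint64_t addr; uint32_t len; };

/* Model page vector filled via a set_page-style callback. */
struct pv { uint64_t pages[32]; int npages; };

static void model_set_page(struct pv *pv, uint64_t addr)
{
	/* a driver's set_page would do HW-specific conversion here,
	 * e.g. endian swap or extra flag bits */
	pv->pages[pv->npages++] = addr;
}

/* Walk the segments and emit each page-aligned address covered,
 * mirroring the core-layer page vector construction. */
static int model_sg_to_pages(struct pv *pv, const struct seg *sg,
			     int nents, uint32_t page_size)
{
	uint64_t mask = ~((uint64_t)page_size - 1);
	int i;

	for (i = 0; i < nents; i++) {
		uint64_t first = sg[i].addr & mask;
		uint64_t last = (sg[i].addr + sg[i].len - 1) & mask;
		uint64_t p;

		for (p = first; p <= last; p += page_size)
			model_set_page(pv, p);
	}
	return pv->npages;
}
```

With this shape, drivers like nes only supply the callback and avoid
keeping a second page vector, which is what the iser conversion below
relies on.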

Signed-off-by: Sagi Grimberg <[email protected]>
---
drivers/infiniband/ulp/iser/iscsi_iser.h | 8 ++---
drivers/infiniband/ulp/iser/iser_memory.c | 53 ++++++++++++++-----------------
drivers/infiniband/ulp/iser/iser_verbs.c | 16 +---------
3 files changed, 26 insertions(+), 51 deletions(-)

diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.h b/drivers/infiniband/ulp/iser/iscsi_iser.h
index 2484bee993ec..271aa71e827c 100644
--- a/drivers/infiniband/ulp/iser/iscsi_iser.h
+++ b/drivers/infiniband/ulp/iser/iscsi_iser.h
@@ -297,7 +297,7 @@ struct iser_tx_desc {
u8 wr_idx;
union iser_wr {
struct ib_send_wr send;
- struct ib_fast_reg_wr fast_reg;
+ struct ib_reg_wr fast_reg;
struct ib_sig_handover_wr sig;
} wrs[ISER_MAX_WRS];
struct iser_mem_reg data_reg;
@@ -412,7 +412,6 @@ struct iser_device {
*
* @mr: memory region
* @fmr_pool: pool of fmrs
- * @frpl: fast reg page list used by frwrs
* @page_vec: fast reg page list used by fmr pool
* @mr_valid: is mr valid indicator
*/
@@ -421,10 +420,7 @@ struct iser_reg_resources {
struct ib_mr *mr;
struct ib_fmr_pool *fmr_pool;
};
- union {
- struct ib_fast_reg_page_list *frpl;
- struct iser_page_vec *page_vec;
- };
+ struct iser_page_vec *page_vec;
u8 mr_valid:1;
};

diff --git a/drivers/infiniband/ulp/iser/iser_memory.c b/drivers/infiniband/ulp/iser/iser_memory.c
index b29fda3e8e74..d78eafb159b4 100644
--- a/drivers/infiniband/ulp/iser/iser_memory.c
+++ b/drivers/infiniband/ulp/iser/iser_memory.c
@@ -472,7 +472,7 @@ iser_reg_sig_mr(struct iscsi_iser_task *iser_task,
sig_reg->sge.addr = 0;
sig_reg->sge.length = scsi_transfer_length(iser_task->sc);

- iser_dbg("sig reg: lkey: 0x%x, rkey: 0x%x, addr: 0x%llx, length: %u\n",
+ iser_dbg("lkey=0x%x rkey=0x%x addr=0x%llx length=%u\n",
sig_reg->sge.lkey, sig_reg->rkey, sig_reg->sge.addr,
sig_reg->sge.length);
err:
@@ -484,47 +484,40 @@ static int iser_fast_reg_mr(struct iscsi_iser_task *iser_task,
struct iser_reg_resources *rsc,
struct iser_mem_reg *reg)
{
- struct ib_conn *ib_conn = &iser_task->iser_conn->ib_conn;
- struct iser_device *device = ib_conn->device;
- struct ib_mr *mr = rsc->mr;
- struct ib_fast_reg_page_list *frpl = rsc->frpl;
struct iser_tx_desc *tx_desc = &iser_task->desc;
- struct ib_fast_reg_wr *wr;
- int offset, size, plen;
-
- plen = iser_sg_to_page_vec(mem, device->ib_device, frpl->page_list,
- &offset, &size);
- if (plen * SIZE_4K < size) {
- iser_err("fast reg page_list too short to hold this SG\n");
- return -EINVAL;
- }
+ struct ib_mr *mr = rsc->mr;
+ struct ib_reg_wr *wr;
+ int n;

if (!rsc->mr_valid)
iser_inv_rkey(iser_tx_next_wr(tx_desc), mr);

- wr = fast_reg_wr(iser_tx_next_wr(tx_desc));
- wr->wr.opcode = IB_WR_FAST_REG_MR;
+ n = ib_map_mr_sg(mr, mem->sg, mem->size, SIZE_4K);
+ if (unlikely(n != mem->size)) {
+ iser_err("failed to map sg (%d/%d)\n",
+ n, mem->size);
+ return n < 0 ? n : -EINVAL;
+ }
+
+ wr = reg_wr(iser_tx_next_wr(tx_desc));
+ wr->wr.opcode = IB_WR_REG_MR;
wr->wr.wr_id = ISER_FASTREG_LI_WRID;
wr->wr.send_flags = 0;
- wr->iova_start = frpl->page_list[0] + offset;
- wr->page_list = frpl;
- wr->page_list_len = plen;
- wr->page_shift = SHIFT_4K;
- wr->length = size;
- wr->rkey = mr->rkey;
- wr->access_flags = (IB_ACCESS_LOCAL_WRITE |
- IB_ACCESS_REMOTE_WRITE |
- IB_ACCESS_REMOTE_READ);
+ wr->mr = mr;
+ wr->key = mr->rkey;
+ wr->access = IB_ACCESS_LOCAL_WRITE |
+ IB_ACCESS_REMOTE_WRITE |
+ IB_ACCESS_REMOTE_READ;
+
rsc->mr_valid = 0;

reg->sge.lkey = mr->lkey;
reg->rkey = mr->rkey;
- reg->sge.addr = frpl->page_list[0] + offset;
- reg->sge.length = size;
+ reg->sge.addr = mr->iova;
+ reg->sge.length = mr->length;

- iser_dbg("fast reg: lkey=0x%x, rkey=0x%x, addr=0x%llx,"
- " length=0x%x\n", reg->sge.lkey, reg->rkey,
- reg->sge.addr, reg->sge.length);
+ iser_dbg("lkey=0x%x rkey=0x%x addr=0x%llx length=0x%x\n",
+ reg->sge.lkey, reg->rkey, reg->sge.addr, reg->sge.length);

return 0;
}
diff --git a/drivers/infiniband/ulp/iser/iser_verbs.c b/drivers/infiniband/ulp/iser/iser_verbs.c
index b26022e30af1..a66b9dea96d8 100644
--- a/drivers/infiniband/ulp/iser/iser_verbs.c
+++ b/drivers/infiniband/ulp/iser/iser_verbs.c
@@ -293,35 +293,21 @@ iser_alloc_reg_res(struct ib_device *ib_device,
{
int ret;

- res->frpl = ib_alloc_fast_reg_page_list(ib_device, size);
- if (IS_ERR(res->frpl)) {
- ret = PTR_ERR(res->frpl);
- iser_err("Failed to allocate ib_fast_reg_page_list err=%d\n",
- ret);
- return PTR_ERR(res->frpl);
- }
-
res->mr = ib_alloc_mr(pd, IB_MR_TYPE_MEM_REG, size);
if (IS_ERR(res->mr)) {
ret = PTR_ERR(res->mr);
iser_err("Failed to allocate ib_fast_reg_mr err=%d\n", ret);
- goto fast_reg_mr_failure;
+ return ret;
}
res->mr_valid = 1;

return 0;
-
-fast_reg_mr_failure:
- ib_free_fast_reg_page_list(res->frpl);
-
- return ret;
}

static void
iser_free_reg_res(struct iser_reg_resources *rsc)
{
ib_dereg_mr(rsc->mr);
- ib_free_fast_reg_page_list(rsc->frpl);
}

static int
--
1.8.4.3


2015-09-17 09:43:33

by Sagi Grimberg

Subject: [PATCH v1 19/24] RDMA/cxgb3: Remove old FRWR API

No ULP uses it anymore, go ahead and remove it.

Signed-off-by: Sagi Grimberg <[email protected]>
---
drivers/infiniband/hw/cxgb3/iwch_cq.c | 2 +-
drivers/infiniband/hw/cxgb3/iwch_provider.c | 24 ---------------
drivers/infiniband/hw/cxgb3/iwch_qp.c | 47 -----------------------------
3 files changed, 1 insertion(+), 72 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb3/iwch_cq.c b/drivers/infiniband/hw/cxgb3/iwch_cq.c
index cf5474ae68ff..cfe404925a39 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_cq.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_cq.c
@@ -123,7 +123,7 @@ static int iwch_poll_cq_one(struct iwch_dev *rhp, struct iwch_cq *chp,
wc->opcode = IB_WC_LOCAL_INV;
break;
case T3_FAST_REGISTER:
- wc->opcode = IB_WC_FAST_REG_MR;
+ wc->opcode = IB_WC_REG_MR;
break;
default:
printk(KERN_ERR MOD "Unexpected opcode %d "
diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c b/drivers/infiniband/hw/cxgb3/iwch_provider.c
index ee3d5ca7de6c..99ae2ab14b9e 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_provider.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c
@@ -884,28 +884,6 @@ static int iwch_map_mr_sg(struct ib_mr *ibmr,
return ib_sg_to_pages(ibmr, sg, sg_nents, iwch_set_page);
}

-static struct ib_fast_reg_page_list *iwch_alloc_fastreg_pbl(
- struct ib_device *device,
- int page_list_len)
-{
- struct ib_fast_reg_page_list *page_list;
-
- page_list = kmalloc(sizeof *page_list + page_list_len * sizeof(u64),
- GFP_KERNEL);
- if (!page_list)
- return ERR_PTR(-ENOMEM);
-
- page_list->page_list = (u64 *)(page_list + 1);
- page_list->max_page_list_len = page_list_len;
-
- return page_list;
-}
-
-static void iwch_free_fastreg_pbl(struct ib_fast_reg_page_list *page_list)
-{
- kfree(page_list);
-}
-
static int iwch_destroy_qp(struct ib_qp *ib_qp)
{
struct iwch_dev *rhp;
@@ -1483,8 +1461,6 @@ int iwch_register_device(struct iwch_dev *dev)
dev->ibdev.dealloc_mw = iwch_dealloc_mw;
dev->ibdev.alloc_mr = iwch_alloc_mr;
dev->ibdev.map_mr_sg = iwch_map_mr_sg;
- dev->ibdev.alloc_fast_reg_page_list = iwch_alloc_fastreg_pbl;
- dev->ibdev.free_fast_reg_page_list = iwch_free_fastreg_pbl;
dev->ibdev.attach_mcast = iwch_multicast_attach;
dev->ibdev.detach_mcast = iwch_multicast_detach;
dev->ibdev.process_mad = iwch_process_mad;
diff --git a/drivers/infiniband/hw/cxgb3/iwch_qp.c b/drivers/infiniband/hw/cxgb3/iwch_qp.c
index a09ea538e990..d0548fc6395e 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_qp.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_qp.c
@@ -189,48 +189,6 @@ static int build_memreg(union t3_wr *wqe, struct ib_reg_wr *wr,
return 0;
}

-static int build_fastreg(union t3_wr *wqe, struct ib_send_wr *send_wr,
- u8 *flit_cnt, int *wr_cnt, struct t3_wq *wq)
-{
- struct ib_fast_reg_wr *wr = fast_reg_wr(send_wr);
- int i;
- __be64 *p;
-
- if (wr->page_list_len > T3_MAX_FASTREG_DEPTH)
- return -EINVAL;
- *wr_cnt = 1;
- wqe->fastreg.stag = cpu_to_be32(wr->rkey);
- wqe->fastreg.len = cpu_to_be32(wr->length);
- wqe->fastreg.va_base_hi = cpu_to_be32(wr->iova_start >> 32);
- wqe->fastreg.va_base_lo_fbo = cpu_to_be32(wr->iova_start & 0xffffffff);
- wqe->fastreg.page_type_perms = cpu_to_be32(
- V_FR_PAGE_COUNT(wr->page_list_len) |
- V_FR_PAGE_SIZE(wr->page_shift-12) |
- V_FR_TYPE(TPT_VATO) |
- V_FR_PERMS(iwch_ib_to_tpt_access(wr->access_flags)));
- p = &wqe->fastreg.pbl_addrs[0];
- for (i = 0; i < wr->page_list_len; i++, p++) {
-
- /* If we need a 2nd WR, then set it up */
- if (i == T3_MAX_FASTREG_FRAG) {
- *wr_cnt = 2;
- wqe = (union t3_wr *)(wq->queue +
- Q_PTR2IDX((wq->wptr+1), wq->size_log2));
- build_fw_riwrh((void *)wqe, T3_WR_FASTREG, 0,
- Q_GENBIT(wq->wptr + 1, wq->size_log2),
- 0, 1 + wr->page_list_len - T3_MAX_FASTREG_FRAG,
- T3_EOP);
-
- p = &wqe->pbl_frag.pbl_addrs[0];
- }
- *p = cpu_to_be64((u64)wr->page_list->page_list[i]);
- }
- *flit_cnt = 5 + wr->page_list_len;
- if (*flit_cnt > 15)
- *flit_cnt = 15;
- return 0;
-}
-
static int build_inv_stag(union t3_wr *wqe, struct ib_send_wr *wr,
u8 *flit_cnt)
{
@@ -457,11 +415,6 @@ int iwch_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
if (!qhp->wq.oldest_read)
qhp->wq.oldest_read = sqp;
break;
- case IB_WR_FAST_REG_MR:
- t3_wr_opcode = T3_WR_FASTREG;
- err = build_fastreg(wqe, wr, &t3_wr_flit_cnt,
- &wr_cnt, &qhp->wq);
- break;
case IB_WR_REG_MR:
t3_wr_opcode = T3_WR_FASTREG;
err = build_memreg(wqe, reg_wr(wr), &t3_wr_flit_cnt,
--
1.8.4.3


2015-09-17 09:43:36

by Sagi Grimberg

Subject: [PATCH v1 20/24] iw_cxgb4: Remove old FRWR API

No ULP uses it anymore, go ahead and remove it.

Signed-off-by: Sagi Grimberg <[email protected]>
---
drivers/infiniband/hw/cxgb4/cq.c | 2 +-
drivers/infiniband/hw/cxgb4/iw_cxgb4.h | 18 --------
drivers/infiniband/hw/cxgb4/mem.c | 45 --------------------
drivers/infiniband/hw/cxgb4/provider.c | 2 -
drivers/infiniband/hw/cxgb4/qp.c | 77 ----------------------------------
5 files changed, 1 insertion(+), 143 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb4/cq.c b/drivers/infiniband/hw/cxgb4/cq.c
index c7aab48f07cd..4f8c3ff3da5e 100644
--- a/drivers/infiniband/hw/cxgb4/cq.c
+++ b/drivers/infiniband/hw/cxgb4/cq.c
@@ -752,7 +752,7 @@ static int c4iw_poll_cq_one(struct c4iw_cq *chp, struct ib_wc *wc)
wc->opcode = IB_WC_LOCAL_INV;
break;
case FW_RI_FAST_REGISTER:
- wc->opcode = IB_WC_FAST_REG_MR;
+ wc->opcode = IB_WC_REG_MR;
break;
default:
printk(KERN_ERR MOD "Unexpected opcode %d "
diff --git a/drivers/infiniband/hw/cxgb4/iw_cxgb4.h b/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
index 032f90aa8ac9..699c52b875b1 100644
--- a/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
+++ b/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
@@ -409,20 +409,6 @@ static inline struct c4iw_mw *to_c4iw_mw(struct ib_mw *ibmw)
return container_of(ibmw, struct c4iw_mw, ibmw);
}

-struct c4iw_fr_page_list {
- struct ib_fast_reg_page_list ibpl;
- DEFINE_DMA_UNMAP_ADDR(mapping);
- dma_addr_t dma_addr;
- struct c4iw_dev *dev;
- int pll_len;
-};
-
-static inline struct c4iw_fr_page_list *to_c4iw_fr_page_list(
- struct ib_fast_reg_page_list *ibpl)
-{
- return container_of(ibpl, struct c4iw_fr_page_list, ibpl);
-}
-
struct c4iw_cq {
struct ib_cq ibcq;
struct c4iw_dev *rhp;
@@ -970,10 +956,6 @@ int c4iw_accept_cr(struct iw_cm_id *cm_id, struct iw_cm_conn_param *conn_param);
int c4iw_reject_cr(struct iw_cm_id *cm_id, const void *pdata, u8 pdata_len);
void c4iw_qp_add_ref(struct ib_qp *qp);
void c4iw_qp_rem_ref(struct ib_qp *qp);
-void c4iw_free_fastreg_pbl(struct ib_fast_reg_page_list *page_list);
-struct ib_fast_reg_page_list *c4iw_alloc_fastreg_pbl(
- struct ib_device *device,
- int page_list_len);
struct ib_mr *c4iw_alloc_mr(struct ib_pd *pd,
enum ib_mr_type mr_type,
u32 max_num_sg);
diff --git a/drivers/infiniband/hw/cxgb4/mem.c b/drivers/infiniband/hw/cxgb4/mem.c
index 86ec65721797..ada42ed425a0 100644
--- a/drivers/infiniband/hw/cxgb4/mem.c
+++ b/drivers/infiniband/hw/cxgb4/mem.c
@@ -945,51 +945,6 @@ int c4iw_map_mr_sg(struct ib_mr *ibmr,
return ib_sg_to_pages(ibmr, sg, sg_nents, c4iw_set_page);
}

-struct ib_fast_reg_page_list *c4iw_alloc_fastreg_pbl(struct ib_device *device,
- int page_list_len)
-{
- struct c4iw_fr_page_list *c4pl;
- struct c4iw_dev *dev = to_c4iw_dev(device);
- dma_addr_t dma_addr;
- int pll_len = roundup(page_list_len * sizeof(u64), 32);
-
- c4pl = kmalloc(sizeof(*c4pl), GFP_KERNEL);
- if (!c4pl)
- return ERR_PTR(-ENOMEM);
-
- c4pl->ibpl.page_list = dma_alloc_coherent(&dev->rdev.lldi.pdev->dev,
- pll_len, &dma_addr,
- GFP_KERNEL);
- if (!c4pl->ibpl.page_list) {
- kfree(c4pl);
- return ERR_PTR(-ENOMEM);
- }
- dma_unmap_addr_set(c4pl, mapping, dma_addr);
- c4pl->dma_addr = dma_addr;
- c4pl->dev = dev;
- c4pl->pll_len = pll_len;
-
- PDBG("%s c4pl %p pll_len %u page_list %p dma_addr %pad\n",
- __func__, c4pl, c4pl->pll_len, c4pl->ibpl.page_list,
- &c4pl->dma_addr);
-
- return &c4pl->ibpl;
-}
-
-void c4iw_free_fastreg_pbl(struct ib_fast_reg_page_list *ibpl)
-{
- struct c4iw_fr_page_list *c4pl = to_c4iw_fr_page_list(ibpl);
-
- PDBG("%s c4pl %p pll_len %u page_list %p dma_addr %pad\n",
- __func__, c4pl, c4pl->pll_len, c4pl->ibpl.page_list,
- &c4pl->dma_addr);
-
- dma_free_coherent(&c4pl->dev->rdev.lldi.pdev->dev,
- c4pl->pll_len,
- c4pl->ibpl.page_list, dma_unmap_addr(c4pl, mapping));
- kfree(c4pl);
-}
-
int c4iw_dereg_mr(struct ib_mr *ib_mr)
{
struct c4iw_dev *rhp;
diff --git a/drivers/infiniband/hw/cxgb4/provider.c b/drivers/infiniband/hw/cxgb4/provider.c
index 55dedadcffaa..8f115b405d76 100644
--- a/drivers/infiniband/hw/cxgb4/provider.c
+++ b/drivers/infiniband/hw/cxgb4/provider.c
@@ -558,8 +558,6 @@ int c4iw_register_device(struct c4iw_dev *dev)
dev->ibdev.dealloc_mw = c4iw_dealloc_mw;
dev->ibdev.alloc_mr = c4iw_alloc_mr;
dev->ibdev.map_mr_sg = c4iw_map_mr_sg;
- dev->ibdev.alloc_fast_reg_page_list = c4iw_alloc_fastreg_pbl;
- dev->ibdev.free_fast_reg_page_list = c4iw_free_fastreg_pbl;
dev->ibdev.attach_mcast = c4iw_multicast_attach;
dev->ibdev.detach_mcast = c4iw_multicast_detach;
dev->ibdev.process_mad = c4iw_process_mad;
diff --git a/drivers/infiniband/hw/cxgb4/qp.c b/drivers/infiniband/hw/cxgb4/qp.c
index fddbd2cc90b8..d2d9b54355b4 100644
--- a/drivers/infiniband/hw/cxgb4/qp.c
+++ b/drivers/infiniband/hw/cxgb4/qp.c
@@ -671,75 +671,6 @@ static int build_memreg(struct t4_sq *sq, union t4_wr *wqe,
return 0;
}

-static int build_fastreg(struct t4_sq *sq, union t4_wr *wqe,
- struct ib_send_wr *send_wr, u8 *len16, u8 t5dev)
-{
- struct ib_fast_reg_wr *wr = fast_reg_wr(send_wr);
-
- struct fw_ri_immd *imdp;
- __be64 *p;
- int i;
- int pbllen = roundup(wr->page_list_len * sizeof(u64), 32);
- int rem;
-
- if (wr->page_list_len > t4_max_fr_depth(use_dsgl))
- return -EINVAL;
-
- wqe->fr.qpbinde_to_dcacpu = 0;
- wqe->fr.pgsz_shift = wr->page_shift - 12;
- wqe->fr.addr_type = FW_RI_VA_BASED_TO;
- wqe->fr.mem_perms = c4iw_ib_to_tpt_access(wr->access_flags);
- wqe->fr.len_hi = 0;
- wqe->fr.len_lo = cpu_to_be32(wr->length);
- wqe->fr.stag = cpu_to_be32(wr->rkey);
- wqe->fr.va_hi = cpu_to_be32(wr->iova_start >> 32);
- wqe->fr.va_lo_fbo = cpu_to_be32(wr->iova_start & 0xffffffff);
-
- if (t5dev && use_dsgl && (pbllen > max_fr_immd)) {
- struct c4iw_fr_page_list *c4pl =
- to_c4iw_fr_page_list(wr->page_list);
- struct fw_ri_dsgl *sglp;
-
- for (i = 0; i < wr->page_list_len; i++) {
- wr->page_list->page_list[i] = (__force u64)
- cpu_to_be64((u64)wr->page_list->page_list[i]);
- }
-
- sglp = (struct fw_ri_dsgl *)(&wqe->fr + 1);
- sglp->op = FW_RI_DATA_DSGL;
- sglp->r1 = 0;
- sglp->nsge = cpu_to_be16(1);
- sglp->addr0 = cpu_to_be64(c4pl->dma_addr);
- sglp->len0 = cpu_to_be32(pbllen);
-
- *len16 = DIV_ROUND_UP(sizeof(wqe->fr) + sizeof(*sglp), 16);
- } else {
- imdp = (struct fw_ri_immd *)(&wqe->fr + 1);
- imdp->op = FW_RI_DATA_IMMD;
- imdp->r1 = 0;
- imdp->r2 = 0;
- imdp->immdlen = cpu_to_be32(pbllen);
- p = (__be64 *)(imdp + 1);
- rem = pbllen;
- for (i = 0; i < wr->page_list_len; i++) {
- *p = cpu_to_be64((u64)wr->page_list->page_list[i]);
- rem -= sizeof(*p);
- if (++p == (__be64 *)&sq->queue[sq->size])
- p = (__be64 *)sq->queue;
- }
- BUG_ON(rem < 0);
- while (rem) {
- *p = 0;
- rem -= sizeof(*p);
- if (++p == (__be64 *)&sq->queue[sq->size])
- p = (__be64 *)sq->queue;
- }
- *len16 = DIV_ROUND_UP(sizeof(wqe->fr) + sizeof(*imdp)
- + pbllen, 16);
- }
- return 0;
-}
-
static int build_inv_stag(union t4_wr *wqe, struct ib_send_wr *wr,
u8 *len16)
{
@@ -876,14 +807,6 @@ int c4iw_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
if (!qhp->wq.sq.oldest_read)
qhp->wq.sq.oldest_read = swsqe;
break;
- case IB_WR_FAST_REG_MR:
- fw_opcode = FW_RI_FR_NSMR_WR;
- swsqe->opcode = FW_RI_FAST_REGISTER;
- err = build_fastreg(&qhp->wq.sq, wqe, wr, &len16,
- is_t5(
- qhp->rhp->rdev.lldi.adapter_type) ?
- 1 : 0);
- break;
case IB_WR_REG_MR:
fw_opcode = FW_RI_FR_NSMR_WR;
swsqe->opcode = FW_RI_FAST_REGISTER;
--
1.8.4.3


2015-09-17 09:43:38

by Sagi Grimberg

Subject: [PATCH v1 22/24] RDMA/nes: Remove old FRWR API

No ULP uses it anymore, go ahead and remove it.

Signed-off-by: Sagi Grimberg <[email protected]>
---
drivers/infiniband/hw/nes/nes_hw.h | 6 --
drivers/infiniband/hw/nes/nes_verbs.c | 162 +---------------------------------
2 files changed, 1 insertion(+), 167 deletions(-)

diff --git a/drivers/infiniband/hw/nes/nes_hw.h b/drivers/infiniband/hw/nes/nes_hw.h
index d748e4b31b8d..c9080208aad2 100644
--- a/drivers/infiniband/hw/nes/nes_hw.h
+++ b/drivers/infiniband/hw/nes/nes_hw.h
@@ -1200,12 +1200,6 @@ struct nes_fast_mr_wqe_pbl {
dma_addr_t paddr;
};

-struct nes_ib_fast_reg_page_list {
- struct ib_fast_reg_page_list ibfrpl;
- struct nes_fast_mr_wqe_pbl nes_wqe_pbl;
- u64 pbl;
-};
-
struct nes_listener {
struct work_struct work;
struct workqueue_struct *wq;
diff --git a/drivers/infiniband/hw/nes/nes_verbs.c b/drivers/infiniband/hw/nes/nes_verbs.c
index ba069ec2ebf9..51a0a9cedcf4 100644
--- a/drivers/infiniband/hw/nes/nes_verbs.c
+++ b/drivers/infiniband/hw/nes/nes_verbs.c
@@ -486,76 +486,6 @@ static int nes_map_mr_sg(struct ib_mr *ibmr,
return ib_sg_to_pages(ibmr, sg, sg_nents, nes_set_page);
}

-/*
- * nes_alloc_fast_reg_page_list
- */
-static struct ib_fast_reg_page_list *nes_alloc_fast_reg_page_list(
- struct ib_device *ibdev,
- int page_list_len)
-{
- struct nes_vnic *nesvnic = to_nesvnic(ibdev);
- struct nes_device *nesdev = nesvnic->nesdev;
- struct ib_fast_reg_page_list *pifrpl;
- struct nes_ib_fast_reg_page_list *pnesfrpl;
-
- if (page_list_len > (NES_4K_PBL_CHUNK_SIZE / sizeof(u64)))
- return ERR_PTR(-E2BIG);
- /*
- * Allocate the ib_fast_reg_page_list structure, the
- * nes_fast_bpl structure, and the PLB table.
- */
- pnesfrpl = kmalloc(sizeof(struct nes_ib_fast_reg_page_list) +
- page_list_len * sizeof(u64), GFP_KERNEL);
-
- if (!pnesfrpl)
- return ERR_PTR(-ENOMEM);
-
- pifrpl = &pnesfrpl->ibfrpl;
- pifrpl->page_list = &pnesfrpl->pbl;
- pifrpl->max_page_list_len = page_list_len;
- /*
- * Allocate the WQE PBL
- */
- pnesfrpl->nes_wqe_pbl.kva = pci_alloc_consistent(nesdev->pcidev,
- page_list_len * sizeof(u64),
- &pnesfrpl->nes_wqe_pbl.paddr);
-
- if (!pnesfrpl->nes_wqe_pbl.kva) {
- kfree(pnesfrpl);
- return ERR_PTR(-ENOMEM);
- }
- nes_debug(NES_DBG_MR, "nes_alloc_fast_reg_pbl: nes_frpl = %p, "
- "ibfrpl = %p, ibfrpl.page_list = %p, pbl.kva = %p, "
- "pbl.paddr = %llx\n", pnesfrpl, &pnesfrpl->ibfrpl,
- pnesfrpl->ibfrpl.page_list, pnesfrpl->nes_wqe_pbl.kva,
- (unsigned long long) pnesfrpl->nes_wqe_pbl.paddr);
-
- return pifrpl;
-}
-
-/*
- * nes_free_fast_reg_page_list
- */
-static void nes_free_fast_reg_page_list(struct ib_fast_reg_page_list *pifrpl)
-{
- struct nes_vnic *nesvnic = to_nesvnic(pifrpl->device);
- struct nes_device *nesdev = nesvnic->nesdev;
- struct nes_ib_fast_reg_page_list *pnesfrpl;
-
- pnesfrpl = container_of(pifrpl, struct nes_ib_fast_reg_page_list, ibfrpl);
- /*
- * Free the WQE PBL.
- */
- pci_free_consistent(nesdev->pcidev,
- pifrpl->max_page_list_len * sizeof(u64),
- pnesfrpl->nes_wqe_pbl.kva,
- pnesfrpl->nes_wqe_pbl.paddr);
- /*
- * Free the PBL structure
- */
- kfree(pnesfrpl);
-}
-
/**
* nes_query_device
*/
@@ -3470,94 +3400,6 @@ static int nes_post_send(struct ib_qp *ibqp, struct ib_send_wr *ib_wr,
NES_IWARP_SQ_LOCINV_WQE_INV_STAG_IDX,
ib_wr->ex.invalidate_rkey);
break;
- case IB_WR_FAST_REG_MR:
- {
- int i;
- struct ib_fast_reg_wr *fwr = fast_reg_wr(ib_wr);
- int flags = fwr->access_flags;
- struct nes_ib_fast_reg_page_list *pnesfrpl =
- container_of(fwr->page_list,
- struct nes_ib_fast_reg_page_list,
- ibfrpl);
- u64 *src_page_list = pnesfrpl->ibfrpl.page_list;
- u64 *dst_page_list = pnesfrpl->nes_wqe_pbl.kva;
-
- if (fwr->page_list_len >
- (NES_4K_PBL_CHUNK_SIZE / sizeof(u64))) {
- nes_debug(NES_DBG_IW_TX, "SQ_FMR: bad page_list_len\n");
- err = -EINVAL;
- break;
- }
- wqe_misc = NES_IWARP_SQ_OP_FAST_REG;
- set_wqe_64bit_value(wqe->wqe_words,
- NES_IWARP_SQ_FMR_WQE_VA_FBO_LOW_IDX,
- fwr->iova_start);
- set_wqe_32bit_value(wqe->wqe_words,
- NES_IWARP_SQ_FMR_WQE_LENGTH_LOW_IDX,
- fwr->length);
- set_wqe_32bit_value(wqe->wqe_words,
- NES_IWARP_SQ_FMR_WQE_LENGTH_HIGH_IDX, 0);
- set_wqe_32bit_value(wqe->wqe_words,
- NES_IWARP_SQ_FMR_WQE_MR_STAG_IDX,
- fwr->rkey);
- /* Set page size: */
- if (fwr->page_shift == 12) {
- wqe_misc |= NES_IWARP_SQ_FMR_WQE_PAGE_SIZE_4K;
- } else if (fwr->page_shift == 21) {
- wqe_misc |= NES_IWARP_SQ_FMR_WQE_PAGE_SIZE_2M;
- } else {
- nes_debug(NES_DBG_IW_TX, "Invalid page shift,"
- " ib_wr=%u, max=1\n", ib_wr->num_sge);
- err = -EINVAL;
- break;
- }
- /* Set access_flags */
- wqe_misc |= NES_IWARP_SQ_FMR_WQE_RIGHTS_ENABLE_LOCAL_READ;
- if (flags & IB_ACCESS_LOCAL_WRITE)
- wqe_misc |= NES_IWARP_SQ_FMR_WQE_RIGHTS_ENABLE_LOCAL_WRITE;
-
- if (flags & IB_ACCESS_REMOTE_WRITE)
- wqe_misc |= NES_IWARP_SQ_FMR_WQE_RIGHTS_ENABLE_REMOTE_WRITE;
-
- if (flags & IB_ACCESS_REMOTE_READ)
- wqe_misc |= NES_IWARP_SQ_FMR_WQE_RIGHTS_ENABLE_REMOTE_READ;
-
- if (flags & IB_ACCESS_MW_BIND)
- wqe_misc |= NES_IWARP_SQ_FMR_WQE_RIGHTS_ENABLE_WINDOW_BIND;
-
- /* Fill in PBL info: */
- if (fwr->page_list_len >
- pnesfrpl->ibfrpl.max_page_list_len) {
- nes_debug(NES_DBG_IW_TX, "Invalid page list length,"
- " ib_wr=%p, value=%u, max=%u\n",
- ib_wr, fwr->page_list_len,
- pnesfrpl->ibfrpl.max_page_list_len);
- err = -EINVAL;
- break;
- }
-
- set_wqe_64bit_value(wqe->wqe_words,
- NES_IWARP_SQ_FMR_WQE_PBL_ADDR_LOW_IDX,
- pnesfrpl->nes_wqe_pbl.paddr);
-
- set_wqe_32bit_value(wqe->wqe_words,
- NES_IWARP_SQ_FMR_WQE_PBL_LENGTH_IDX,
- fwr->page_list_len * 8);
-
- for (i = 0; i < fwr->page_list_len; i++)
- dst_page_list[i] = cpu_to_le64(src_page_list[i]);
-
- nes_debug(NES_DBG_IW_TX, "SQ_FMR: iova_start: %llx, "
- "length: %d, rkey: %0x, pgl_paddr: %llx, "
- "page_list_len: %u, wqe_misc: %x\n",
- (unsigned long long) fwr->iova_start,
- fwr->length,
- fwr->rkey,
- (unsigned long long) pnesfrpl->nes_wqe_pbl.paddr,
- fwr->page_list_len,
- wqe_misc);
- break;
- }
case IB_WR_REG_MR:
{
struct nes_mr *mr = to_nesmr(reg_wr(ib_wr)->mr);
@@ -3866,7 +3708,7 @@ static int nes_poll_cq(struct ib_cq *ibcq, int num_entries, struct ib_wc *entry)
entry->opcode = IB_WC_LOCAL_INV;
break;
case NES_IWARP_SQ_OP_FAST_REG:
- entry->opcode = IB_WC_FAST_REG_MR;
+ entry->opcode = IB_WC_REG_MR;
break;
}

@@ -4055,8 +3897,6 @@ struct nes_ib_device *nes_init_ofa_device(struct net_device *netdev)

nesibdev->ibdev.alloc_mr = nes_alloc_mr;
nesibdev->ibdev.map_mr_sg = nes_map_mr_sg;
- nesibdev->ibdev.alloc_fast_reg_page_list = nes_alloc_fast_reg_page_list;
- nesibdev->ibdev.free_fast_reg_page_list = nes_free_fast_reg_page_list;

nesibdev->ibdev.attach_mcast = nes_multicast_attach;
nesibdev->ibdev.detach_mcast = nes_multicast_detach;
--
1.8.4.3


2015-09-17 09:43:44

by Sagi Grimberg

Subject: [PATCH v1 16/24] IB/mlx5: Remove old FRWR API support

No ULP uses it anymore, go ahead and remove it.
Keep only the local invalidate part of the handlers.

Signed-off-by: Sagi Grimberg <[email protected]>
---
drivers/infiniband/hw/mlx5/cq.c | 3 --
drivers/infiniband/hw/mlx5/main.c | 2 -
drivers/infiniband/hw/mlx5/mlx5_ib.h | 14 ------
drivers/infiniband/hw/mlx5/mr.c | 42 ----------------
drivers/infiniband/hw/mlx5/qp.c | 97 ++++--------------------------------
5 files changed, 9 insertions(+), 149 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/cq.c b/drivers/infiniband/hw/mlx5/cq.c
index 90daf791d51d..640c54ef5eed 100644
--- a/drivers/infiniband/hw/mlx5/cq.c
+++ b/drivers/infiniband/hw/mlx5/cq.c
@@ -111,9 +111,6 @@ static enum ib_wc_opcode get_umr_comp(struct mlx5_ib_wq *wq, int idx)
case IB_WR_REG_MR:
return IB_WC_REG_MR;

- case IB_WR_FAST_REG_MR:
- return IB_WC_FAST_REG_MR;
-
default:
pr_warn("unknown completion status\n");
return 0;
diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 7ebce545daf1..32f20d0fd632 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -1433,8 +1433,6 @@ static void *mlx5_ib_add(struct mlx5_core_dev *mdev)
dev->ib_dev.process_mad = mlx5_ib_process_mad;
dev->ib_dev.alloc_mr = mlx5_ib_alloc_mr;
dev->ib_dev.map_mr_sg = mlx5_ib_map_mr_sg;
- dev->ib_dev.alloc_fast_reg_page_list = mlx5_ib_alloc_fast_reg_page_list;
- dev->ib_dev.free_fast_reg_page_list = mlx5_ib_free_fast_reg_page_list;
dev->ib_dev.check_mr_status = mlx5_ib_check_mr_status;
dev->ib_dev.get_port_immutable = mlx5_port_immutable;

diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index bc1853f8e67d..91062d648125 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -337,12 +337,6 @@ struct mlx5_ib_mr {
int live;
};

-struct mlx5_ib_fast_reg_page_list {
- struct ib_fast_reg_page_list ibfrpl;
- __be64 *mapped_page_list;
- dma_addr_t map;
-};
-
struct mlx5_ib_umr_context {
enum ib_wc_status status;
struct completion done;
@@ -493,11 +487,6 @@ static inline struct mlx5_ib_mr *to_mmr(struct ib_mr *ibmr)
return container_of(ibmr, struct mlx5_ib_mr, ibmr);
}

-static inline struct mlx5_ib_fast_reg_page_list *to_mfrpl(struct ib_fast_reg_page_list *ibfrpl)
-{
- return container_of(ibfrpl, struct mlx5_ib_fast_reg_page_list, ibfrpl);
-}
-
struct mlx5_ib_ah {
struct ib_ah ibah;
struct mlx5_av av;
@@ -568,9 +557,6 @@ struct ib_mr *mlx5_ib_alloc_mr(struct ib_pd *pd,
int mlx5_ib_map_mr_sg(struct ib_mr *ibmr,
struct scatterlist *sg,
unsigned int sg_nents);
-struct ib_fast_reg_page_list *mlx5_ib_alloc_fast_reg_page_list(struct ib_device *ibdev,
- int page_list_len);
-void mlx5_ib_free_fast_reg_page_list(struct ib_fast_reg_page_list *page_list);
int mlx5_ib_process_mad(struct ib_device *ibdev, int mad_flags, u8 port_num,
const struct ib_wc *in_wc, const struct ib_grh *in_grh,
const struct ib_mad_hdr *in, size_t in_mad_size,
diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c
index 2f3b648719da..9f662d48606d 100644
--- a/drivers/infiniband/hw/mlx5/mr.c
+++ b/drivers/infiniband/hw/mlx5/mr.c
@@ -1378,48 +1378,6 @@ err_free:
return ERR_PTR(err);
}

-struct ib_fast_reg_page_list *mlx5_ib_alloc_fast_reg_page_list(struct ib_device *ibdev,
- int page_list_len)
-{
- struct mlx5_ib_fast_reg_page_list *mfrpl;
- int size = page_list_len * sizeof(u64);
-
- mfrpl = kmalloc(sizeof(*mfrpl), GFP_KERNEL);
- if (!mfrpl)
- return ERR_PTR(-ENOMEM);
-
- mfrpl->ibfrpl.page_list = kmalloc(size, GFP_KERNEL);
- if (!mfrpl->ibfrpl.page_list)
- goto err_free;
-
- mfrpl->mapped_page_list = dma_alloc_coherent(ibdev->dma_device,
- size, &mfrpl->map,
- GFP_KERNEL);
- if (!mfrpl->mapped_page_list)
- goto err_free;
-
- WARN_ON(mfrpl->map & 0x3f);
-
- return &mfrpl->ibfrpl;
-
-err_free:
- kfree(mfrpl->ibfrpl.page_list);
- kfree(mfrpl);
- return ERR_PTR(-ENOMEM);
-}
-
-void mlx5_ib_free_fast_reg_page_list(struct ib_fast_reg_page_list *page_list)
-{
- struct mlx5_ib_fast_reg_page_list *mfrpl = to_mfrpl(page_list);
- struct mlx5_ib_dev *dev = to_mdev(page_list->device);
- int size = page_list->max_page_list_len * sizeof(u64);
-
- dma_free_coherent(&dev->mdev->pdev->dev, size, mfrpl->mapped_page_list,
- mfrpl->map);
- kfree(mfrpl->ibfrpl.page_list);
- kfree(mfrpl);
-}
-
int mlx5_ib_check_mr_status(struct ib_mr *ibmr, u32 check_mask,
struct ib_mr_status *mr_status)
{
diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index 61d3aa9a6ca9..96adc6f13dcb 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -64,7 +64,6 @@ static const u32 mlx5_ib_opcode[] = {
[IB_WR_ATOMIC_FETCH_AND_ADD] = MLX5_OPCODE_ATOMIC_FA,
[IB_WR_SEND_WITH_INV] = MLX5_OPCODE_SEND_INVAL,
[IB_WR_LOCAL_INV] = MLX5_OPCODE_UMR,
- [IB_WR_FAST_REG_MR] = MLX5_OPCODE_UMR,
[IB_WR_REG_MR] = MLX5_OPCODE_UMR,
[IB_WR_MASKED_ATOMIC_CMP_AND_SWP] = MLX5_OPCODE_ATOMIC_MASKED_CS,
[IB_WR_MASKED_ATOMIC_FETCH_AND_ADD] = MLX5_OPCODE_ATOMIC_MASKED_FA,
@@ -1913,20 +1912,11 @@ static void set_reg_umr_seg(struct mlx5_wqe_umr_ctrl_seg *umr,
umr->mkey_mask = frwr_mkey_mask();
}

-static void set_frwr_umr_segment(struct mlx5_wqe_umr_ctrl_seg *umr,
- struct ib_send_wr *wr, int li)
+static void set_linv_umr_seg(struct mlx5_wqe_umr_ctrl_seg *umr)
{
memset(umr, 0, sizeof(*umr));
-
- if (li) {
- umr->mkey_mask = cpu_to_be64(MLX5_MKEY_MASK_FREE);
- umr->flags = 1 << 7;
- return;
- }
-
- umr->flags = (1 << 5); /* fail if not free */
- umr->klm_octowords = get_klm_octo(fast_reg_wr(wr)->page_list_len);
- umr->mkey_mask = frwr_mkey_mask();
+ umr->mkey_mask = cpu_to_be64(MLX5_MKEY_MASK_FREE);
+ umr->flags = 1 << 7;
}

static __be64 get_umr_reg_mr_mask(void)
@@ -2020,24 +2010,10 @@ static void set_reg_mkey_seg(struct mlx5_mkey_seg *seg,
seg->log2_page_size = ilog2(mr->ibmr.page_size);
}

-static void set_mkey_segment(struct mlx5_mkey_seg *seg, struct ib_send_wr *wr,
- int li, int *writ)
+static void set_linv_mkey_seg(struct mlx5_mkey_seg *seg)
{
memset(seg, 0, sizeof(*seg));
- if (li) {
- seg->status = MLX5_MKEY_STATUS_FREE;
- return;
- }
-
- seg->flags = get_umr_flags(fast_reg_wr(wr)->access_flags) |
- MLX5_ACCESS_MODE_MTT;
- *writ = seg->flags & (MLX5_PERM_LOCAL_WRITE | IB_ACCESS_REMOTE_WRITE);
- seg->qpn_mkey7_0 = cpu_to_be32((fast_reg_wr(wr)->rkey & 0xff) | 0xffffff00);
- seg->flags_pd = cpu_to_be32(MLX5_MKEY_REMOTE_INVAL);
- seg->start_addr = cpu_to_be64(fast_reg_wr(wr)->iova_start);
- seg->len = cpu_to_be64(fast_reg_wr(wr)->length);
- seg->xlt_oct_size = cpu_to_be32((fast_reg_wr(wr)->page_list_len + 1) / 2);
- seg->log2_page_size = fast_reg_wr(wr)->page_shift;
+ seg->status = MLX5_MKEY_STATUS_FREE;
}

static void set_reg_mkey_segment(struct mlx5_mkey_seg *seg, struct ib_send_wr *wr)
@@ -2072,24 +2048,6 @@ static void set_reg_data_seg(struct mlx5_wqe_data_seg *dseg,
dseg->lkey = cpu_to_be32(pd->ibpd.local_dma_lkey);
}

-static void set_frwr_pages(struct mlx5_wqe_data_seg *dseg,
- struct ib_send_wr *wr,
- struct mlx5_core_dev *mdev,
- struct mlx5_ib_pd *pd,
- int writ)
-{
- struct mlx5_ib_fast_reg_page_list *mfrpl = to_mfrpl(fast_reg_wr(wr)->page_list);
- u64 *page_list = fast_reg_wr(wr)->page_list->page_list;
- u64 perm = MLX5_EN_RD | (writ ? MLX5_EN_WR : 0);
- int i;
-
- for (i = 0; i < fast_reg_wr(wr)->page_list_len; i++)
- mfrpl->mapped_page_list[i] = cpu_to_be64(page_list[i] | perm);
- dseg->addr = cpu_to_be64(mfrpl->map);
- dseg->byte_count = cpu_to_be32(ALIGN(sizeof(u64) * fast_reg_wr(wr)->page_list_len, 64));
- dseg->lkey = cpu_to_be32(pd->ibpd.local_dma_lkey);
-}
-
static __be32 send_ieth(struct ib_send_wr *wr)
{
switch (wr->opcode) {
@@ -2509,36 +2467,18 @@ static int set_reg_wr(struct mlx5_ib_qp *qp,
return 0;
}

-static int set_frwr_li_wr(void **seg, struct ib_send_wr *wr, int *size,
- struct mlx5_core_dev *mdev, struct mlx5_ib_pd *pd, struct mlx5_ib_qp *qp)
+static void set_linv_wr(struct mlx5_ib_qp *qp, void **seg, int *size)
{
- int writ = 0;
- int li;
-
- li = wr->opcode == IB_WR_LOCAL_INV ? 1 : 0;
- if (unlikely(wr->send_flags & IB_SEND_INLINE))
- return -EINVAL;
-
- set_frwr_umr_segment(*seg, wr, li);
+ set_linv_umr_seg(*seg);
*seg += sizeof(struct mlx5_wqe_umr_ctrl_seg);
*size += sizeof(struct mlx5_wqe_umr_ctrl_seg) / 16;
if (unlikely((*seg == qp->sq.qend)))
*seg = mlx5_get_send_wqe(qp, 0);
- set_mkey_segment(*seg, wr, li, &writ);
+ set_linv_mkey_seg(*seg);
*seg += sizeof(struct mlx5_mkey_seg);
*size += sizeof(struct mlx5_mkey_seg) / 16;
if (unlikely((*seg == qp->sq.qend)))
*seg = mlx5_get_send_wqe(qp, 0);
- if (!li) {
- if (unlikely(fast_reg_wr(wr)->page_list_len >
- fast_reg_wr(wr)->page_list->max_page_list_len))
- return -ENOMEM;
-
- set_frwr_pages(*seg, wr, mdev, pd, writ);
- *seg += sizeof(struct mlx5_wqe_data_seg);
- *size += (sizeof(struct mlx5_wqe_data_seg) / 16);
- }
- return 0;
}

static void dump_wqe(struct mlx5_ib_qp *qp, int idx, int size_16)
@@ -2654,7 +2594,6 @@ int mlx5_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
{
struct mlx5_wqe_ctrl_seg *ctrl = NULL; /* compiler warning */
struct mlx5_ib_dev *dev = to_mdev(ibqp->device);
- struct mlx5_core_dev *mdev = dev->mdev;
struct mlx5_ib_qp *qp = to_mqp(ibqp);
struct mlx5_ib_mr *mr;
struct mlx5_wqe_data_seg *dpseg;
@@ -2729,25 +2668,7 @@ int mlx5_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
next_fence = MLX5_FENCE_MODE_INITIATOR_SMALL;
qp->sq.wr_data[idx] = IB_WR_LOCAL_INV;
ctrl->imm = cpu_to_be32(wr->ex.invalidate_rkey);
- err = set_frwr_li_wr(&seg, wr, &size, mdev, to_mpd(ibqp->pd), qp);
- if (err) {
- mlx5_ib_warn(dev, "\n");
- *bad_wr = wr;
- goto out;
- }
- num_sge = 0;
- break;
-
- case IB_WR_FAST_REG_MR:
- next_fence = MLX5_FENCE_MODE_INITIATOR_SMALL;
- qp->sq.wr_data[idx] = IB_WR_FAST_REG_MR;
- ctrl->imm = cpu_to_be32(fast_reg_wr(wr)->rkey);
- err = set_frwr_li_wr(&seg, wr, &size, mdev, to_mpd(ibqp->pd), qp);
- if (err) {
- mlx5_ib_warn(dev, "\n");
- *bad_wr = wr;
- goto out;
- }
+ set_linv_wr(qp, &seg, &size);
num_sge = 0;
break;

--
1.8.4.3


2015-09-17 09:43:44

by Sagi Grimberg

Subject: [PATCH v1 21/24] IB/qib: Remove old FRWR API

No ULP uses it anymore, go ahead and remove it.

Signed-off-by: Sagi Grimberg <[email protected]>
---
drivers/infiniband/hw/qib/qib_keys.c | 56 -----------------------------------
drivers/infiniband/hw/qib/qib_mr.c | 32 +-------------------
drivers/infiniband/hw/qib/qib_verbs.c | 8 -----
drivers/infiniband/hw/qib/qib_verbs.h | 7 -----
4 files changed, 1 insertion(+), 102 deletions(-)

diff --git a/drivers/infiniband/hw/qib/qib_keys.c b/drivers/infiniband/hw/qib/qib_keys.c
index a5057efc7faf..3f61bda77e0e 100644
--- a/drivers/infiniband/hw/qib/qib_keys.c
+++ b/drivers/infiniband/hw/qib/qib_keys.c
@@ -338,62 +338,6 @@ bail:
/*
* Initialize the memory region specified by the work request.
*/
-int qib_fast_reg_mr(struct qib_qp *qp, struct ib_send_wr *send_wr)
-{
- struct ib_fast_reg_wr *wr = fast_reg_wr(send_wr);
- struct qib_lkey_table *rkt = &to_idev(qp->ibqp.device)->lk_table;
- struct qib_pd *pd = to_ipd(qp->ibqp.pd);
- struct qib_mregion *mr;
- u32 rkey = wr->rkey;
- unsigned i, n, m;
- int ret = -EINVAL;
- unsigned long flags;
- u64 *page_list;
- size_t ps;
-
- spin_lock_irqsave(&rkt->lock, flags);
- if (pd->user || rkey == 0)
- goto bail;
-
- mr = rcu_dereference_protected(
- rkt->table[(rkey >> (32 - ib_qib_lkey_table_size))],
- lockdep_is_held(&rkt->lock));
- if (unlikely(mr == NULL || qp->ibqp.pd != mr->pd))
- goto bail;
-
- if (wr->page_list_len > mr->max_segs)
- goto bail;
-
- ps = 1UL << wr->page_shift;
- if (wr->length > ps * wr->page_list_len)
- goto bail;
-
- mr->user_base = wr->iova_start;
- mr->iova = wr->iova_start;
- mr->lkey = rkey;
- mr->length = wr->length;
- mr->access_flags = wr->access_flags;
- page_list = wr->page_list->page_list;
- m = 0;
- n = 0;
- for (i = 0; i < wr->page_list_len; i++) {
- mr->map[m]->segs[n].vaddr = (void *) page_list[i];
- mr->map[m]->segs[n].length = ps;
- if (++n == QIB_SEGSZ) {
- m++;
- n = 0;
- }
- }
-
- ret = 0;
-bail:
- spin_unlock_irqrestore(&rkt->lock, flags);
- return ret;
-}
-
-/*
- * Initialize the memory region specified by the work request.
- */
int qib_reg_mr(struct qib_qp *qp, struct ib_reg_wr *wr)
{
struct qib_lkey_table *rkt = &to_idev(qp->ibqp.device)->lk_table;
diff --git a/drivers/infiniband/hw/qib/qib_mr.c b/drivers/infiniband/hw/qib/qib_mr.c
index 0fa4b0de8074..73f78c0f9522 100644
--- a/drivers/infiniband/hw/qib/qib_mr.c
+++ b/drivers/infiniband/hw/qib/qib_mr.c
@@ -324,7 +324,7 @@ out:

/*
* Allocate a memory region usable with the
- * IB_WR_FAST_REG_MR send work request.
+ * IB_WR_REG_MR send work request.
*
* Return the memory region on success, otherwise return an errno.
*/
@@ -375,36 +375,6 @@ int qib_map_mr_sg(struct ib_mr *ibmr,
return ib_sg_to_pages(ibmr, sg, sg_nents, qib_set_page);
}

-struct ib_fast_reg_page_list *
-qib_alloc_fast_reg_page_list(struct ib_device *ibdev, int page_list_len)
-{
- unsigned size = page_list_len * sizeof(u64);
- struct ib_fast_reg_page_list *pl;
-
- if (size > PAGE_SIZE)
- return ERR_PTR(-EINVAL);
-
- pl = kzalloc(sizeof(*pl), GFP_KERNEL);
- if (!pl)
- return ERR_PTR(-ENOMEM);
-
- pl->page_list = kzalloc(size, GFP_KERNEL);
- if (!pl->page_list)
- goto err_free;
-
- return pl;
-
-err_free:
- kfree(pl);
- return ERR_PTR(-ENOMEM);
-}
-
-void qib_free_fast_reg_page_list(struct ib_fast_reg_page_list *pl)
-{
- kfree(pl->page_list);
- kfree(pl);
-}
-
/**
* qib_alloc_fmr - allocate a fast memory region
* @pd: the protection domain for this memory region
diff --git a/drivers/infiniband/hw/qib/qib_verbs.c b/drivers/infiniband/hw/qib/qib_verbs.c
index a1e53d7b662b..de6cb6fcda8d 100644
--- a/drivers/infiniband/hw/qib/qib_verbs.c
+++ b/drivers/infiniband/hw/qib/qib_verbs.c
@@ -365,9 +365,6 @@ static int qib_post_one_send(struct qib_qp *qp, struct ib_send_wr *wr,
if (wr->opcode == IB_WR_REG_MR) {
if (qib_reg_mr(qp, reg_wr(wr)))
goto bail_inval;
- } else if (wr->opcode == IB_WR_FAST_REG_MR) {
- if (qib_fast_reg_mr(qp, wr))
- goto bail_inval;
} else if (qp->ibqp.qp_type == IB_QPT_UC) {
if ((unsigned) wr->opcode >= IB_WR_RDMA_READ)
goto bail_inval;
@@ -407,9 +404,6 @@ static int qib_post_one_send(struct qib_qp *qp, struct ib_send_wr *wr,
else if (wr->opcode == IB_WR_REG_MR)
memcpy(&wqe->reg_wr, reg_wr(wr),
sizeof(wqe->reg_wr));
- else if (wr->opcode == IB_WR_FAST_REG_MR)
- memcpy(&wqe->fast_reg_wr, fast_reg_wr(wr),
- sizeof(wqe->fast_reg_wr));
else if (wr->opcode == IB_WR_RDMA_WRITE_WITH_IMM ||
wr->opcode == IB_WR_RDMA_WRITE ||
wr->opcode == IB_WR_RDMA_READ)
@@ -2267,8 +2261,6 @@ int qib_register_ib_device(struct qib_devdata *dd)
ibdev->dereg_mr = qib_dereg_mr;
ibdev->alloc_mr = qib_alloc_mr;
ibdev->map_mr_sg = qib_map_mr_sg;
- ibdev->alloc_fast_reg_page_list = qib_alloc_fast_reg_page_list;
- ibdev->free_fast_reg_page_list = qib_free_fast_reg_page_list;
ibdev->alloc_fmr = qib_alloc_fmr;
ibdev->map_phys_fmr = qib_map_phys_fmr;
ibdev->unmap_fmr = qib_unmap_fmr;
diff --git a/drivers/infiniband/hw/qib/qib_verbs.h b/drivers/infiniband/hw/qib/qib_verbs.h
index dbc81c5761e3..bb11fe56ac6e 100644
--- a/drivers/infiniband/hw/qib/qib_verbs.h
+++ b/drivers/infiniband/hw/qib/qib_verbs.h
@@ -344,7 +344,6 @@ struct qib_swqe {
struct ib_send_wr wr; /* don't use wr.sg_list */
struct ib_ud_wr ud_wr;
struct ib_reg_wr reg_wr;
- struct ib_fast_reg_wr fast_reg_wr;
struct ib_rdma_wr rdma_wr;
struct ib_atomic_wr atomic_wr;
};
@@ -1051,12 +1050,6 @@ int qib_map_mr_sg(struct ib_mr *ibmr,
struct scatterlist *sg,
unsigned int sg_nents);

-struct ib_fast_reg_page_list *qib_alloc_fast_reg_page_list(
- struct ib_device *ibdev, int page_list_len);
-
-void qib_free_fast_reg_page_list(struct ib_fast_reg_page_list *pl);
-
-int qib_fast_reg_mr(struct qib_qp *qp, struct ib_send_wr *wr);
int qib_reg_mr(struct qib_qp *qp, struct ib_reg_wr *wr);

struct ib_fmr *qib_alloc_fmr(struct ib_pd *pd, int mr_access_flags,
--
1.8.4.3
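
The removed qib_fast_reg_mr() scattered the flat page list into fixed-size map segments, bumping the segment index and resetting the entry index every QIB_SEGSZ entries. A sketch of that two-level fill with a toy segment size (TOY_SEGSZ and struct toy_seg are illustrative, not the qib definitions):

```c
#include <stdint.h>
#include <stddef.h>

#define TOY_SEGSZ 4	/* stands in for QIB_SEGSZ */

struct toy_seg {
	uint64_t vaddr;
	size_t   length;
};

/* Mimics the removed qib_fast_reg_mr() loop: copy a flat page list
 * into map[m][n], advancing m and resetting n every TOY_SEGSZ. */
static void toy_fill_segs(struct toy_seg map[][TOY_SEGSZ],
			  const uint64_t *page_list, size_t npages,
			  size_t page_size)
{
	size_t i, m = 0, n = 0;

	for (i = 0; i < npages; i++) {
		map[m][n].vaddr = page_list[i];
		map[m][n].length = page_size;
		if (++n == TOY_SEGSZ) {
			m++;
			n = 0;
		}
	}
}
```

With the new API this loop disappears from the WR path: the core builds the page vector once via ib_sg_to_pages() and the driver's set_page callback.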


2015-09-17 09:43:50

by Sagi Grimberg

[permalink] [raw]
Subject: [PATCH v1 17/24] IB/mlx4: Remove old FRWR API support

No ULP uses it anymore, go ahead and remove it.

Signed-off-by: Sagi Grimberg <[email protected]>
---
drivers/infiniband/hw/mlx4/cq.c | 3 +--
drivers/infiniband/hw/mlx4/main.c | 2 --
drivers/infiniband/hw/mlx4/mlx4_ib.h | 15 -----------
drivers/infiniband/hw/mlx4/mr.c | 48 ------------------------------------
drivers/infiniband/hw/mlx4/qp.c | 31 -----------------------
5 files changed, 1 insertion(+), 98 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/cq.c b/drivers/infiniband/hw/mlx4/cq.c
index b62236e24708..84ff03618e31 100644
--- a/drivers/infiniband/hw/mlx4/cq.c
+++ b/drivers/infiniband/hw/mlx4/cq.c
@@ -818,8 +818,7 @@ repoll:
wc->opcode = IB_WC_LSO;
break;
case MLX4_OPCODE_FMR:
- wc->opcode = IB_WC_FAST_REG_MR;
- /* TODO: wc->opcode = IB_WC_REG_MR; */
+ wc->opcode = IB_WC_REG_MR;
break;
case MLX4_OPCODE_LOCAL_INVAL:
wc->opcode = IB_WC_LOCAL_INV;
diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index bb82f5fa1612..a25048bf9913 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -2248,8 +2248,6 @@ static void *mlx4_ib_add(struct mlx4_dev *dev)
ibdev->ib_dev.dereg_mr = mlx4_ib_dereg_mr;
ibdev->ib_dev.alloc_mr = mlx4_ib_alloc_mr;
ibdev->ib_dev.map_mr_sg = mlx4_ib_map_mr_sg;
- ibdev->ib_dev.alloc_fast_reg_page_list = mlx4_ib_alloc_fast_reg_page_list;
- ibdev->ib_dev.free_fast_reg_page_list = mlx4_ib_free_fast_reg_page_list;
ibdev->ib_dev.attach_mcast = mlx4_ib_mcg_attach;
ibdev->ib_dev.detach_mcast = mlx4_ib_mcg_detach;
ibdev->ib_dev.process_mad = mlx4_ib_process_mad;
diff --git a/drivers/infiniband/hw/mlx4/mlx4_ib.h b/drivers/infiniband/hw/mlx4/mlx4_ib.h
index 07fcf3a49256..de6eab38b024 100644
--- a/drivers/infiniband/hw/mlx4/mlx4_ib.h
+++ b/drivers/infiniband/hw/mlx4/mlx4_ib.h
@@ -144,12 +144,6 @@ struct mlx4_ib_mw {
struct mlx4_mw mmw;
};

-struct mlx4_ib_fast_reg_page_list {
- struct ib_fast_reg_page_list ibfrpl;
- __be64 *mapped_page_list;
- dma_addr_t map;
-};
-
struct mlx4_ib_fmr {
struct ib_fmr ibfmr;
struct mlx4_fmr mfmr;
@@ -642,11 +636,6 @@ static inline struct mlx4_ib_mw *to_mmw(struct ib_mw *ibmw)
return container_of(ibmw, struct mlx4_ib_mw, ibmw);
}

-static inline struct mlx4_ib_fast_reg_page_list *to_mfrpl(struct ib_fast_reg_page_list *ibfrpl)
-{
- return container_of(ibfrpl, struct mlx4_ib_fast_reg_page_list, ibfrpl);
-}
-
static inline struct mlx4_ib_fmr *to_mfmr(struct ib_fmr *ibfmr)
{
return container_of(ibfmr, struct mlx4_ib_fmr, ibfmr);
@@ -713,10 +702,6 @@ struct ib_mr *mlx4_ib_alloc_mr(struct ib_pd *pd,
int mlx4_ib_map_mr_sg(struct ib_mr *ibmr,
struct scatterlist *sg,
unsigned int sg_nents);
-struct ib_fast_reg_page_list *mlx4_ib_alloc_fast_reg_page_list(struct ib_device *ibdev,
- int page_list_len);
-void mlx4_ib_free_fast_reg_page_list(struct ib_fast_reg_page_list *page_list);
-
int mlx4_ib_modify_cq(struct ib_cq *cq, u16 cq_count, u16 cq_period);
int mlx4_ib_resize_cq(struct ib_cq *ibcq, int entries, struct ib_udata *udata);
struct ib_cq *mlx4_ib_create_cq(struct ib_device *ibdev,
diff --git a/drivers/infiniband/hw/mlx4/mr.c b/drivers/infiniband/hw/mlx4/mr.c
index 6ed745798ad3..dc255dc4548d 100644
--- a/drivers/infiniband/hw/mlx4/mr.c
+++ b/drivers/infiniband/hw/mlx4/mr.c
@@ -425,54 +425,6 @@ err_free:
return ERR_PTR(err);
}

-struct ib_fast_reg_page_list *mlx4_ib_alloc_fast_reg_page_list(struct ib_device *ibdev,
- int page_list_len)
-{
- struct mlx4_ib_dev *dev = to_mdev(ibdev);
- struct mlx4_ib_fast_reg_page_list *mfrpl;
- int size = page_list_len * sizeof (u64);
-
- if (page_list_len > MLX4_MAX_FAST_REG_PAGES)
- return ERR_PTR(-EINVAL);
-
- mfrpl = kmalloc(sizeof *mfrpl, GFP_KERNEL);
- if (!mfrpl)
- return ERR_PTR(-ENOMEM);
-
- mfrpl->ibfrpl.page_list = kmalloc(size, GFP_KERNEL);
- if (!mfrpl->ibfrpl.page_list)
- goto err_free;
-
- mfrpl->mapped_page_list = dma_alloc_coherent(&dev->dev->persist->
- pdev->dev,
- size, &mfrpl->map,
- GFP_KERNEL);
- if (!mfrpl->mapped_page_list)
- goto err_free;
-
- WARN_ON(mfrpl->map & 0x3f);
-
- return &mfrpl->ibfrpl;
-
-err_free:
- kfree(mfrpl->ibfrpl.page_list);
- kfree(mfrpl);
- return ERR_PTR(-ENOMEM);
-}
-
-void mlx4_ib_free_fast_reg_page_list(struct ib_fast_reg_page_list *page_list)
-{
- struct mlx4_ib_dev *dev = to_mdev(page_list->device);
- struct mlx4_ib_fast_reg_page_list *mfrpl = to_mfrpl(page_list);
- int size = page_list->max_page_list_len * sizeof (u64);
-
- dma_free_coherent(&dev->dev->persist->pdev->dev, size,
- mfrpl->mapped_page_list,
- mfrpl->map);
- kfree(mfrpl->ibfrpl.page_list);
- kfree(mfrpl);
-}
-
struct ib_fmr *mlx4_ib_fmr_alloc(struct ib_pd *pd, int acc,
struct ib_fmr_attr *fmr_attr)
{
diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c
index 75097151fc16..5124e8b99678 100644
--- a/drivers/infiniband/hw/mlx4/qp.c
+++ b/drivers/infiniband/hw/mlx4/qp.c
@@ -111,7 +111,6 @@ static const __be32 mlx4_ib_opcode[] = {
[IB_WR_ATOMIC_FETCH_AND_ADD] = cpu_to_be32(MLX4_OPCODE_ATOMIC_FA),
[IB_WR_SEND_WITH_INV] = cpu_to_be32(MLX4_OPCODE_SEND_INVAL),
[IB_WR_LOCAL_INV] = cpu_to_be32(MLX4_OPCODE_LOCAL_INVAL),
- [IB_WR_FAST_REG_MR] = cpu_to_be32(MLX4_OPCODE_FMR),
[IB_WR_REG_MR] = cpu_to_be32(MLX4_OPCODE_FMR),
[IB_WR_MASKED_ATOMIC_CMP_AND_SWP] = cpu_to_be32(MLX4_OPCODE_MASKED_ATOMIC_CS),
[IB_WR_MASKED_ATOMIC_FETCH_AND_ADD] = cpu_to_be32(MLX4_OPCODE_MASKED_ATOMIC_FA),
@@ -2422,28 +2421,6 @@ static void set_reg_seg(struct mlx4_wqe_fmr_seg *fseg,
fseg->reserved[1] = 0;
}

-static void set_fmr_seg(struct mlx4_wqe_fmr_seg *fseg,
- struct ib_fast_reg_wr *wr)
-{
- struct mlx4_ib_fast_reg_page_list *mfrpl = to_mfrpl(wr->page_list);
- int i;
-
- for (i = 0; i < wr->page_list_len; ++i)
- mfrpl->mapped_page_list[i] =
- cpu_to_be64(wr->page_list->page_list[i] |
- MLX4_MTT_FLAG_PRESENT);
-
- fseg->flags = convert_access(wr->access_flags);
- fseg->mem_key = cpu_to_be32(wr->rkey);
- fseg->buf_list = cpu_to_be64(mfrpl->map);
- fseg->start_addr = cpu_to_be64(wr->iova_start);
- fseg->reg_len = cpu_to_be64(wr->length);
- fseg->offset = 0; /* XXX -- is this just for ZBVA? */
- fseg->page_size = cpu_to_be32(wr->page_shift);
- fseg->reserved[0] = 0;
- fseg->reserved[1] = 0;
-}
-
static void set_bind_seg(struct mlx4_wqe_bind_seg *bseg,
struct ib_bind_mw_wr *wr)
{
@@ -2775,14 +2752,6 @@ int mlx4_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
size += sizeof (struct mlx4_wqe_local_inval_seg) / 16;
break;

- case IB_WR_FAST_REG_MR:
- ctrl->srcrb_flags |=
- cpu_to_be32(MLX4_WQE_CTRL_STRONG_ORDER);
- set_fmr_seg(wqe, fast_reg_wr(wr));
- wqe += sizeof (struct mlx4_wqe_fmr_seg);
- size += sizeof (struct mlx4_wqe_fmr_seg) / 16;
- break;
-
case IB_WR_REG_MR:
ctrl->srcrb_flags |=
cpu_to_be32(MLX4_WQE_CTRL_STRONG_ORDER);
--
1.8.4.3


2015-09-17 09:43:56

by Sagi Grimberg

Subject: [PATCH v1 23/24] IB/hfi1: Remove old fast registration API support

Fast registration was never actually supported here
(post_send returned -EINVAL for IB_WR_FAST_REG_MR).
Adoption of the new API can presumably be done jointly
for all the IB SW implementations.

Signed-off-by: Sagi Grimberg <[email protected]>
---
drivers/staging/hfi1/keys.c | 55 --------------------------------------------
drivers/staging/hfi1/mr.c | 32 +-------------------------
drivers/staging/hfi1/verbs.c | 9 +-------
drivers/staging/hfi1/verbs.h | 8 -------
4 files changed, 2 insertions(+), 102 deletions(-)

diff --git a/drivers/staging/hfi1/keys.c b/drivers/staging/hfi1/keys.c
index 82c21b1c0263..cb4e6087dfdb 100644
--- a/drivers/staging/hfi1/keys.c
+++ b/drivers/staging/hfi1/keys.c
@@ -354,58 +354,3 @@ bail:
rcu_read_unlock();
return 0;
}
-
-/*
- * Initialize the memory region specified by the work request.
- */
-int hfi1_fast_reg_mr(struct hfi1_qp *qp, struct ib_fast_reg_wr *wr)
-{
- struct hfi1_lkey_table *rkt = &to_idev(qp->ibqp.device)->lk_table;
- struct hfi1_pd *pd = to_ipd(qp->ibqp.pd);
- struct hfi1_mregion *mr;
- u32 rkey = wr->rkey;
- unsigned i, n, m;
- int ret = -EINVAL;
- unsigned long flags;
- u64 *page_list;
- size_t ps;
-
- spin_lock_irqsave(&rkt->lock, flags);
- if (pd->user || rkey == 0)
- goto bail;
-
- mr = rcu_dereference_protected(
- rkt->table[(rkey >> (32 - hfi1_lkey_table_size))],
- lockdep_is_held(&rkt->lock));
- if (unlikely(mr == NULL || qp->ibqp.pd != mr->pd))
- goto bail;
-
- if (wr->page_list_len > mr->max_segs)
- goto bail;
-
- ps = 1UL << wr->page_shift;
- if (wr->length > ps * wr->page_list_len)
- goto bail;
-
- mr->user_base = wr->iova_start;
- mr->iova = wr->iova_start;
- mr->lkey = rkey;
- mr->length = wr->length;
- mr->access_flags = wr->access_flags;
- page_list = wr->page_list->page_list;
- m = 0;
- n = 0;
- for (i = 0; i < wr->page_list_len; i++) {
- mr->map[m]->segs[n].vaddr = (void *) page_list[i];
- mr->map[m]->segs[n].length = ps;
- if (++n == HFI1_SEGSZ) {
- m++;
- n = 0;
- }
- }
-
- ret = 0;
-bail:
- spin_unlock_irqrestore(&rkt->lock, flags);
- return ret;
-}
diff --git a/drivers/staging/hfi1/mr.c b/drivers/staging/hfi1/mr.c
index bd64e4f986f9..3f5623add3df 100644
--- a/drivers/staging/hfi1/mr.c
+++ b/drivers/staging/hfi1/mr.c
@@ -344,7 +344,7 @@ out:

/*
* Allocate a memory region usable with the
- * IB_WR_FAST_REG_MR send work request.
+ * IB_WR_REG_MR send work request.
*
* Return the memory region on success, otherwise return an errno.
*/
@@ -364,36 +364,6 @@ struct ib_mr *hfi1_alloc_mr(struct ib_pd *pd,
return &mr->ibmr;
}

-struct ib_fast_reg_page_list *
-hfi1_alloc_fast_reg_page_list(struct ib_device *ibdev, int page_list_len)
-{
- unsigned size = page_list_len * sizeof(u64);
- struct ib_fast_reg_page_list *pl;
-
- if (size > PAGE_SIZE)
- return ERR_PTR(-EINVAL);
-
- pl = kzalloc(sizeof(*pl), GFP_KERNEL);
- if (!pl)
- return ERR_PTR(-ENOMEM);
-
- pl->page_list = kzalloc(size, GFP_KERNEL);
- if (!pl->page_list)
- goto err_free;
-
- return pl;
-
-err_free:
- kfree(pl);
- return ERR_PTR(-ENOMEM);
-}
-
-void hfi1_free_fast_reg_page_list(struct ib_fast_reg_page_list *pl)
-{
- kfree(pl->page_list);
- kfree(pl);
-}
-
/**
* hfi1_alloc_fmr - allocate a fast memory region
* @pd: the protection domain for this memory region
diff --git a/drivers/staging/hfi1/verbs.c b/drivers/staging/hfi1/verbs.c
index 542ad803bfce..a6c9bf88a4c4 100644
--- a/drivers/staging/hfi1/verbs.c
+++ b/drivers/staging/hfi1/verbs.c
@@ -380,9 +380,7 @@ static int post_one_send(struct hfi1_qp *qp, struct ib_send_wr *wr)
* undefined operations.
* Make sure buffer is large enough to hold the result for atomics.
*/
- if (wr->opcode == IB_WR_FAST_REG_MR) {
- return -EINVAL;
- } else if (qp->ibqp.qp_type == IB_QPT_UC) {
+ if (qp->ibqp.qp_type == IB_QPT_UC) {
if ((unsigned) wr->opcode >= IB_WR_RDMA_READ)
return -EINVAL;
} else if (qp->ibqp.qp_type != IB_QPT_RC) {
@@ -417,9 +415,6 @@ static int post_one_send(struct hfi1_qp *qp, struct ib_send_wr *wr)
if (qp->ibqp.qp_type != IB_QPT_UC &&
qp->ibqp.qp_type != IB_QPT_RC)
memcpy(&wqe->ud_wr, ud_wr(wr), sizeof(wqe->ud_wr));
- else if (wr->opcode == IB_WR_FAST_REG_MR)
- memcpy(&wqe->fast_reg_wr, fast_reg_wr(wr),
- sizeof(wqe->fast_reg_wr));
else if (wr->opcode == IB_WR_RDMA_WRITE_WITH_IMM ||
wr->opcode == IB_WR_RDMA_WRITE ||
wr->opcode == IB_WR_RDMA_READ)
@@ -2063,8 +2058,6 @@ int hfi1_register_ib_device(struct hfi1_devdata *dd)
ibdev->reg_user_mr = hfi1_reg_user_mr;
ibdev->dereg_mr = hfi1_dereg_mr;
ibdev->alloc_mr = hfi1_alloc_mr;
- ibdev->alloc_fast_reg_page_list = hfi1_alloc_fast_reg_page_list;
- ibdev->free_fast_reg_page_list = hfi1_free_fast_reg_page_list;
ibdev->alloc_fmr = hfi1_alloc_fmr;
ibdev->map_phys_fmr = hfi1_map_phys_fmr;
ibdev->unmap_fmr = hfi1_unmap_fmr;
diff --git a/drivers/staging/hfi1/verbs.h b/drivers/staging/hfi1/verbs.h
index cf5a3c956284..159ec08bfcd8 100644
--- a/drivers/staging/hfi1/verbs.h
+++ b/drivers/staging/hfi1/verbs.h
@@ -353,7 +353,6 @@ struct hfi1_swqe {
struct ib_rdma_wr rdma_wr;
struct ib_atomic_wr atomic_wr;
struct ib_ud_wr ud_wr;
- struct ib_fast_reg_wr fast_reg_wr;
};
u32 psn; /* first packet sequence number */
u32 lpsn; /* last packet sequence number */
@@ -1026,13 +1025,6 @@ struct ib_mr *hfi1_alloc_mr(struct ib_pd *pd,
enum ib_mr_type mr_type,
u32 max_entries);

-struct ib_fast_reg_page_list *hfi1_alloc_fast_reg_page_list(
- struct ib_device *ibdev, int page_list_len);
-
-void hfi1_free_fast_reg_page_list(struct ib_fast_reg_page_list *pl);
-
-int hfi1_fast_reg_mr(struct hfi1_qp *qp, struct ib_fast_reg_wr *wr);
-
struct ib_fmr *hfi1_alloc_fmr(struct ib_pd *pd, int mr_access_flags,
struct ib_fmr_attr *fmr_attr);

--
1.8.4.3


2015-09-17 09:43:59

by Sagi Grimberg

Subject: [PATCH v1 24/24] IB/core: Remove old fast registration API

No callers and no providers left, go ahead and remove it.

Signed-off-by: Sagi Grimberg <[email protected]>
---
drivers/infiniband/core/verbs.c | 25 --------------------
include/rdma/ib_verbs.h | 52 -----------------------------------------
2 files changed, 77 deletions(-)

diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index d99f57f1f737..bbbfd597f060 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -1253,31 +1253,6 @@ struct ib_mr *ib_alloc_mr(struct ib_pd *pd,
}
EXPORT_SYMBOL(ib_alloc_mr);

-struct ib_fast_reg_page_list *ib_alloc_fast_reg_page_list(struct ib_device *device,
- int max_page_list_len)
-{
- struct ib_fast_reg_page_list *page_list;
-
- if (!device->alloc_fast_reg_page_list)
- return ERR_PTR(-ENOSYS);
-
- page_list = device->alloc_fast_reg_page_list(device, max_page_list_len);
-
- if (!IS_ERR(page_list)) {
- page_list->device = device;
- page_list->max_page_list_len = max_page_list_len;
- }
-
- return page_list;
-}
-EXPORT_SYMBOL(ib_alloc_fast_reg_page_list);
-
-void ib_free_fast_reg_page_list(struct ib_fast_reg_page_list *page_list)
-{
- page_list->device->free_fast_reg_page_list(page_list);
-}
-EXPORT_SYMBOL(ib_free_fast_reg_page_list);
-
/* Memory windows */

struct ib_mw *ib_alloc_mw(struct ib_pd *pd, enum ib_mw_type type)
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 97c73359ade8..ed3f181407ff 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1068,12 +1068,6 @@ struct ib_sge {
u32 lkey;
};

-struct ib_fast_reg_page_list {
- struct ib_device *device;
- u64 *page_list;
- unsigned int max_page_list_len;
-};
-
/**
* struct ib_mw_bind_info - Parameters for a memory window bind operation.
* @mr: A memory region to bind the memory window to.
@@ -1147,22 +1141,6 @@ static inline struct ib_ud_wr *ud_wr(struct ib_send_wr *wr)
return container_of(wr, struct ib_ud_wr, wr);
}

-struct ib_fast_reg_wr {
- struct ib_send_wr wr;
- u64 iova_start;
- struct ib_fast_reg_page_list *page_list;
- unsigned int page_shift;
- unsigned int page_list_len;
- u32 length;
- int access_flags;
- u32 rkey;
-};
-
-static inline struct ib_fast_reg_wr *fast_reg_wr(struct ib_send_wr *wr)
-{
- return container_of(wr, struct ib_fast_reg_wr, wr);
-}
-
struct ib_reg_wr {
struct ib_send_wr wr;
struct ib_mr *mr;
@@ -1777,9 +1755,6 @@ struct ib_device {
int (*map_mr_sg)(struct ib_mr *mr,
struct scatterlist *sg,
unsigned int sg_nents);
- struct ib_fast_reg_page_list * (*alloc_fast_reg_page_list)(struct ib_device *device,
- int page_list_len);
- void (*free_fast_reg_page_list)(struct ib_fast_reg_page_list *page_list);
int (*rereg_phys_mr)(struct ib_mr *mr,
int mr_rereg_mask,
struct ib_pd *pd,
@@ -2888,33 +2863,6 @@ struct ib_mr *ib_alloc_mr(struct ib_pd *pd,
u32 max_num_sg);

/**
- * ib_alloc_fast_reg_page_list - Allocates a page list array
- * @device - ib device pointer.
- * @page_list_len - size of the page list array to be allocated.
- *
- * This allocates and returns a struct ib_fast_reg_page_list * and a
- * page_list array that is at least page_list_len in size. The actual
- * size is returned in max_page_list_len. The caller is responsible
- * for initializing the contents of the page_list array before posting
- * a send work request with the IB_WC_FAST_REG_MR opcode.
- *
- * The page_list array entries must be translated using one of the
- * ib_dma_*() functions just like the addresses passed to
- * ib_map_phys_fmr(). Once the ib_post_send() is issued, the struct
- * ib_fast_reg_page_list must not be modified by the caller until the
- * IB_WC_FAST_REG_MR work request completes.
- */
-struct ib_fast_reg_page_list *ib_alloc_fast_reg_page_list(
- struct ib_device *device, int page_list_len);
-
-/**
- * ib_free_fast_reg_page_list - Deallocates a previously allocated
- * page list array.
- * @page_list - struct ib_fast_reg_page_list pointer to be deallocated.
- */
-void ib_free_fast_reg_page_list(struct ib_fast_reg_page_list *page_list);
-
-/**
* ib_update_fast_reg_key - updates the key portion of the fast_reg MR
* R_Key and L_Key.
* @mr - struct ib_mr pointer to be updated.
--
1.8.4.3


2015-09-17 09:43:55

by Sagi Grimberg

Subject: [PATCH v1 18/24] RDMA/ocrdma: Remove old FRWR API

No ULP uses it anymore, go ahead and remove it.

Signed-off-by: Sagi Grimberg <[email protected]>
---
drivers/infiniband/hw/ocrdma/ocrdma_main.c | 2 -
drivers/infiniband/hw/ocrdma/ocrdma_verbs.c | 102 ----------------------------
drivers/infiniband/hw/ocrdma/ocrdma_verbs.h | 4 --
3 files changed, 108 deletions(-)

diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_main.c b/drivers/infiniband/hw/ocrdma/ocrdma_main.c
index 874beb4b07a1..9bf430ef8eb6 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_main.c
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_main.c
@@ -183,8 +183,6 @@ static int ocrdma_register_device(struct ocrdma_dev *dev)

dev->ibdev.alloc_mr = ocrdma_alloc_mr;
dev->ibdev.map_mr_sg = ocrdma_map_mr_sg;
- dev->ibdev.alloc_fast_reg_page_list = ocrdma_alloc_frmr_page_list;
- dev->ibdev.free_fast_reg_page_list = ocrdma_free_frmr_page_list;

/* mandatory to support user space verbs consumer. */
dev->ibdev.alloc_ucontext = ocrdma_alloc_ucontext;
diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
index 853746e17d5c..2deaa2ac4a1c 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
@@ -2133,41 +2133,6 @@ static void ocrdma_build_read(struct ocrdma_qp *qp, struct ocrdma_hdr_wqe *hdr,
ext_rw->len = hdr->total_len;
}

-static void build_frmr_pbes(struct ib_fast_reg_wr *wr,
- struct ocrdma_pbl *pbl_tbl,
- struct ocrdma_hw_mr *hwmr)
-{
- int i;
- u64 buf_addr = 0;
- int num_pbes;
- struct ocrdma_pbe *pbe;
-
- pbe = (struct ocrdma_pbe *)pbl_tbl->va;
- num_pbes = 0;
-
- /* go through the OS phy regions & fill hw pbe entries into pbls. */
- for (i = 0; i < wr->page_list_len; i++) {
- /* number of pbes can be more for one OS buf, when
- * buffers are of different sizes.
- * split the ib_buf to one or more pbes.
- */
- buf_addr = wr->page_list->page_list[i];
- pbe->pa_lo = cpu_to_le32((u32) (buf_addr & PAGE_MASK));
- pbe->pa_hi = cpu_to_le32((u32) upper_32_bits(buf_addr));
- num_pbes += 1;
- pbe++;
-
- /* if the pbl is full storing the pbes,
- * move to next pbl.
- */
- if (num_pbes == (hwmr->pbl_size/sizeof(u64))) {
- pbl_tbl++;
- pbe = (struct ocrdma_pbe *)pbl_tbl->va;
- }
- }
- return;
-}
-
static int get_encoded_page_size(int pg_sz)
{
/* Max size is 256M 4096 << 16 */
@@ -2233,50 +2198,6 @@ static int ocrdma_build_reg(struct ocrdma_qp *qp,
return 0;
}

-static int ocrdma_build_fr(struct ocrdma_qp *qp, struct ocrdma_hdr_wqe *hdr,
- struct ib_send_wr *send_wr)
-{
- u64 fbo;
- struct ib_fast_reg_wr *wr = fast_reg_wr(send_wr);
- struct ocrdma_ewqe_fr *fast_reg = (struct ocrdma_ewqe_fr *)(hdr + 1);
- struct ocrdma_mr *mr;
- struct ocrdma_dev *dev = get_ocrdma_dev(qp->ibqp.device);
- u32 wqe_size = sizeof(*fast_reg) + sizeof(*hdr);
-
- wqe_size = roundup(wqe_size, OCRDMA_WQE_ALIGN_BYTES);
-
- if (wr->page_list_len > dev->attr.max_pages_per_frmr)
- return -EINVAL;
-
- hdr->cw |= (OCRDMA_FR_MR << OCRDMA_WQE_OPCODE_SHIFT);
- hdr->cw |= ((wqe_size / OCRDMA_WQE_STRIDE) << OCRDMA_WQE_SIZE_SHIFT);
-
- if (wr->page_list_len == 0)
- BUG();
- if (wr->access_flags & IB_ACCESS_LOCAL_WRITE)
- hdr->rsvd_lkey_flags |= OCRDMA_LKEY_FLAG_LOCAL_WR;
- if (wr->access_flags & IB_ACCESS_REMOTE_WRITE)
- hdr->rsvd_lkey_flags |= OCRDMA_LKEY_FLAG_REMOTE_WR;
- if (wr->access_flags & IB_ACCESS_REMOTE_READ)
- hdr->rsvd_lkey_flags |= OCRDMA_LKEY_FLAG_REMOTE_RD;
- hdr->lkey = wr->rkey;
- hdr->total_len = wr->length;
-
- fbo = wr->iova_start - (wr->page_list->page_list[0] & PAGE_MASK);
-
- fast_reg->va_hi = upper_32_bits(wr->iova_start);
- fast_reg->va_lo = (u32) (wr->iova_start & 0xffffffff);
- fast_reg->fbo_hi = upper_32_bits(fbo);
- fast_reg->fbo_lo = (u32) fbo & 0xffffffff;
- fast_reg->num_sges = wr->page_list_len;
- fast_reg->size_sge =
- get_encoded_page_size(1 << wr->page_shift);
- mr = (struct ocrdma_mr *) (unsigned long)
- dev->stag_arr[(hdr->lkey >> 8) & (OCRDMA_MAX_STAG - 1)];
- build_frmr_pbes(wr, mr->hwmr.pbl_table, &mr->hwmr);
- return 0;
-}
-
static void ocrdma_ring_sq_db(struct ocrdma_qp *qp)
{
u32 val = qp->sq.dbid | (1 << OCRDMA_DB_SQ_SHIFT);
@@ -2356,9 +2277,6 @@ int ocrdma_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
OCRDMA_WQE_STRIDE) << OCRDMA_WQE_SIZE_SHIFT;
hdr->lkey = wr->ex.invalidate_rkey;
break;
- case IB_WR_FAST_REG_MR:
- status = ocrdma_build_fr(qp, hdr, wr);
- break;
case IB_WR_REG_MR:
status = ocrdma_build_reg(qp, hdr, reg_wr(wr));
break;
@@ -3152,26 +3070,6 @@ pl_err:
return ERR_PTR(-ENOMEM);
}

-struct ib_fast_reg_page_list *ocrdma_alloc_frmr_page_list(struct ib_device
- *ibdev,
- int page_list_len)
-{
- struct ib_fast_reg_page_list *frmr_list;
- int size;
-
- size = sizeof(*frmr_list) + (page_list_len * sizeof(u64));
- frmr_list = kzalloc(size, GFP_KERNEL);
- if (!frmr_list)
- return ERR_PTR(-ENOMEM);
- frmr_list->page_list = (u64 *)(frmr_list + 1);
- return frmr_list;
-}
-
-void ocrdma_free_frmr_page_list(struct ib_fast_reg_page_list *page_list)
-{
- kfree(page_list);
-}
-
#define MAX_KERNEL_PBE_SIZE 65536
static inline int count_kernel_pbes(struct ib_phys_buf *buf_list,
int buf_cnt, u32 *pbe_size)
diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h
index 4edf63f9c6c7..62d6608830cb 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h
@@ -128,9 +128,5 @@ struct ib_mr *ocrdma_alloc_mr(struct ib_pd *pd,
int ocrdma_map_mr_sg(struct ib_mr *ibmr,
struct scatterlist *sg,
unsigned int sg_nents);
-struct ib_fast_reg_page_list *ocrdma_alloc_frmr_page_list(struct ib_device
- *ibdev,
- int page_list_len);
-void ocrdma_free_frmr_page_list(struct ib_fast_reg_page_list *page_list);

#endif /* __OCRDMA_VERBS_H__ */
--
1.8.4.3


2015-09-19 22:45:20

by Christoph Hellwig

Subject: Re: [PATCH v1 00/24] New fast registration API

Hi Sagi,

I've converted the driver I'm developing to your API and it works
great. I think this is an important step towards making RDMA
more usable!

2015-09-19 23:21:09

by Santosh Shilimkar

Subject: Re: [PATCH v1 00/24] New fast registration API

On 9/17/15 5:42 AM, Sagi Grimberg wrote:
> Hi all,
>
> As discussed on the linux-rdma list, there is plenty of room for
> improvement in our memory registration APIs. We keep finding
> ULPs that are duplicating code, sometimes use wrong strategies
> and mis-use our current API.
>
> As a first step, this patch set replaces the fast registration API
> to accept a kernel common struct scatterlist and takes care of
> the page vector construction in the core layer with hooks for the
> drivers HW specific assignments. This allows to remove a common
> code duplication as it was done in each and every ULP driver.
>
> The changes from v0 (WIP) are:
> - Rebased on top of 4.3-rc1 + Christoph's ib_send_wr conversion patches
>
> - Allow the ULP to pass page_size argument to ib_map_mr_sg in order
> to have it work better in some specific workloads. This suggestion
> came from Bart Van Assache which pointed out that some applications
> might use page sizes significantly smaller than the system PAGE_SIZE
> of specific architectures
>
> - Fixed some logical bugs in ib_sg_to_pages
>
> - Added a set_page function pointer for drivers to pass to ib_sg_to_pages
> so some drivers (e.g mlx4, mlx5, nes) can avoid keeping a second page
> vector and/or re-iterate on the page vector in order to perform HW specific
> assignments (big/little endian conversion, extra flags)
>
> - Converted SRP initiator and RDS iwarp ULPs to the new API
>
> - Removed fast registration code from hfi1 driver (as it isn't supported
> anyway). I assume that the correct place to get the support back would
> be in a shared SW library (hfi1, qib, rxe).
>
> - Updated the change logs
>
> So far my tests covered:
> - ULPs:
> * iser initiator
> * iser target
> * xprtrdma
> * svcrdma
> - Drivers:
> * mlx4
> * mlx5
> * Steve Wise was kind enough to run NFS client/server over cxgb4 and I
> have yet to receive any negative feedback from him.
>
> I don't have access to other HW devices (qib, nes) nor iwarp devices so RDS is
> compile tested only.
>
Nice to see this consolidation happening. I too don't have access to
iWARP hardware for RDS testing, but I will use this series, convert our WIP
IB fastreg code, and see how it goes.

Regards,
Santosh



2015-09-20 09:36:16

by Sagi Grimberg

Subject: Re: [PATCH v1 00/24] New fast registration API

Hi Santosh,

> Nice to see this consolidation happening. I too don't have access to
> iWARP hardware for RDS testing, but I will use this series, convert our WIP
> IB fastreg code, and see how it goes.

I'm very pleased to hear about this WIP. Please feel free to share
anything you have (code and questions/dilemmas) with the list. Also, if
you have more suggestions on how we can do better from your PoV we'd
love to hear about it.

Cheers,
Sagi.

2015-09-21 23:28:21

by Santosh Shilimkar

Subject: Re: [PATCH v1 00/24] New fast registration API

Hi Sagi,

On 9/20/15 2:36 AM, Sagi Grimberg wrote:
> Hi Santosh,
>
>> Nice to see this consolidation happening. I too don't have access to
>> iWARP hardware for RDS testing, but I will use this series, convert our WIP
>> IB fastreg code, and see how it goes.
>
> I'm very pleased to hear about this WIP. Please feel free to share
> anything you have (code and questions/dilemmas) with the list. Also, if
> you have more suggestions on how we can do better from your PoV we'd
> love to hear about it.
>
So as promised, I tried to test your series. Your github branch [1]
'reg_api.3', though it mostly has 4.3-rc1 contents, isn't based on
4.3-rc1, so I just cherry-picked the patches and created
'rdma/sagi/reg_api.3_cherrypick' [2]. I had a conflict with the iser
patch so I just dropped that one.

As mentioned earlier, I have a WIP RDS fastreg branch [3]
which is functional (at least I can RDMA messages across
nodes ;-)). So, merging [2] and [3], I created [4] and applied
a delta change based on your other patches. I saw an ib_post_send()
failure, with my HCA driver returning '-EINVAL'. I didn't
debug it further, but at least opcode and num_sge were set
correctly, so I shouldn't have seen it. So I did a memset()
on reg_wr, which seems to have fixed the ib_post_send()
failure.

But I ran into remote access errors, which tells me that I
have messed up the setup (rkey, sge setup, or access flags)
or am missing some other patch(es) in my test tree [4]. The delta
patch is the top commit on [4].

Please let me know if you spot something which I missed.

Regards,
Santosh

[1] https://github.com/sagigrimberg/linux/tree/reg_api.3
[2]
https://git.kernel.org/cgit/linux/kernel/git/ssantosh/linux.git/log/?h=rdma/sagi/reg_api.3_cherrypick
[3]
https://git.kernel.org/cgit/linux/kernel/git/ssantosh/linux.git/log/?h=net/rds/4.3-fr-wip
[4]https://git.kernel.org/cgit/linux/kernel/git/ssantosh/linux.git/commit/?h=test/reg_api.3/rds

2015-09-22 07:32:36

by Sagi Grimberg

Subject: Re: [PATCH v1 00/24] New fast registration API

>
> As mentioned earlier, I have a WIP RDS fastreg branch [3]
> which is functional (at least I can RDMA messages across
> nodes ;-)).

Nice!

> So merging [2] and [3], I created [4] and applied
> a delta change based on your other patches. I saw ib_post_send
> failure with my HCA driver returning '-EINVAL'. I didn't
> debug it further but at least opcode and num_sge were set
> correctly so I shouldn't have seen it. So I did memset()
> on reg_wr which seems to have helped to fix the ib_post_send()
> failure.

Yep - that was my fault. When converting the ULPs I optimized by
removing the memset, but I forgot to set reg_wr.wr.next = NULL where
the ULP needed it. This caused the driver to read a second, bogus work
request. Steve just reported this as well, so I'll fix it in v2.

>
> But I got into remote access errors which tells me that I
> have messed up setup(rkey, sge setup or access flags)

One thing that pops out is that in the old API the MR was registered
with iova_start = 0 (which is probably what was sent to the peer),
but in the new API the iova is implicitly sg_dma_address(&sg[0]).

The registered MR holds these attributes in:
mr->rkey
mr->iova
mr->length

These should be passed to a peer to perform rdma.

Hope this helps,
Sagi.

2015-09-22 07:55:37

by Sagi Grimberg

Subject: Re: [PATCH v1 00/24] New fast registration API

On 9/22/2015 10:19 AM, Sagi Grimberg wrote:
>>
>> As mentioned earlier, I have a WIP RDS fastreg branch [3]
>> which is functional (at least I can RDMA messages across
>> nodes ;-)).
>
> Nice!
>
>> So merging [2] and [3], I created [4] and applied
>> a delta change based on your other patches. I saw ib_post_send
>> failure with my HCA driver returning '-EINVAL'. I didn't
>> debug it further but at least opcode and num_sge were set
>> correctly so I shouldn't have seen it. So I did memset()
>> on reg_wr which seems to have helped to fix the ib_post_send()
>> failure.
>
> Yep - that was my fault. When converting the ULPs I optimized by
> removing the memset, but I forgot to set reg_wr.wr.next = NULL where
> the ULP needed it. This caused the driver to read a second, bogus work
> request. Steve just reported this as well, so I'll fix it in v2.
>
>>
>> But I got into remote access errors which tells me that I
>> have messed up setup(rkey, sge setup or access flags)
>
> One thing that pops out is that in the old API the MR was registered
> with iova_start = 0 (which is probably what was sent to the peer),
> but in the new API the iova is implicitly sg_dma_address(&sg[0]).
>
> The registered MR holds these attributes in:
> mr->rkey
> mr->iova
> mr->length
>
> These should be passed to a peer to perform rdma.
>
> Hope this helps,
> Sagi.

ohh, I just read the RDS 3.1 specification (for the first time...) and I
noticed that the RDS 3.1 header extension contains only a 32-bit offset
parameter. Why is that, anyway? Why not 64 bits, so it can be a valid
mapped address? Also, the code doesn't use it at all and always passes 0
(which is buggy if sg[0] has an offset from a page).

This won't work with the proposed API, as the iova is 64-bit (all other
existing RDMA protocols use 64-bit addresses).

In any event, I'd much rather add ib_map_mr_sg_zbva() just for RDS
to use instead of polluting the API with an iova argument, but I think
the RDS spec can be updated to use 64-bit offsets and align with all
other RDMA protocols (it has enough space in h_exthdr, which is 128 bits).

I was thinking of:
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index e7e0251..61fcab4 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -3033,6 +3033,21 @@ int ib_map_mr_sg(struct ib_mr *mr,
unsigned int sg_nents,
unsigned int page_size);

+static inline int
+ib_map_mr_sg_zbva(struct ib_mr *mr,
+ struct scatterlist *sg,
+ unsigned int sg_nents,
+ unsigned int page_size)
+{
+ int rc;
+
+ rc = ib_map_mr_sg(mr, sg, sg_nents, page_size);
+ if (likely(!rc))
+ mr->iova &= ((u64)page_size - 1);
+
+ return rc;
+}
+
int ib_sg_to_pages(struct ib_mr *mr,
struct scatterlist *sgl,
unsigned int sg_nents,
--

Thoughts?

Santosh, can you use that one instead and let us know if
it resolves your issue?

I think you should make sure to correctly construct the
h_exthdr with: rds_rdma_make_cookie(mr->rkey, (u32)mr->iova)

2015-09-22 18:23:42

by Santosh Shilimkar

Subject: Re: [PATCH v1 00/24] New fast registration API

On 9/22/2015 12:56 AM, Sagi Grimberg wrote:
> On 9/22/2015 10:19 AM, Sagi Grimberg wrote:
>>>
>>> As mentioned earlier, I have a WIP RDS fastreg branch [3]
>>> which is functional (at least I can RDMA messages across
>>> nodes ;-)).
>>
>> Nice!
>>
>>> So merging [2] and [3], I created [4] and applied
>>> a delta change based on your other patches. I saw ib_post_send
>>> failure with my HCA driver returning '-EINVAL'. I didn't
>>> debug it further but at least opcode and num_sge were set
>>> correctly so I shouldn't have seen it. So I did memset()
>>> on reg_wr which seems to have helped to fix the ib_post_send()
>>> failure.
>>
>> Yep - that was my fault. When converting the ULPs I optimized by
>> removing the memset, but I forgot to set reg_wr.wr.next = NULL where
>> the ULP needed it. This caused the driver to read a second, bogus work
>> request. Steve just reported this as well, so I'll fix it in v2.
>>
Ahh, right. There can be a chain of wrs.

>>>
>>> But I got into remote access errors which tells me that I
>>> have messed up setup(rkey, sge setup or access flags)
>>
>> One thing that pops out is that in the old API the MR was registered
>> with iova_start = 0 (which is probably what was sent to the peer),
>> but in the new API the iova is implicitly sg_dma_address(&sg[0]).
>>
>> The registered MR holds these attributes in:
>> mr->rkey
>> mr->iova
>> mr->length
>>
>> These should be passed to a peer to perform rdma.
>>
right.

> ohh, I just read the RDS 3.1 specification (for the first time...) and I
> noticed that the RDS 3.1 header extension contains only a 32-bit offset
> parameter. Why is that, anyway? Why not 64 bits, so it can be a valid
> mapped address? Also, the code doesn't use it at all and always passes 0
> (which is buggy if sg[0] has an offset from a page).
>
> This won't work with the proposed API as the iova is 64bit (as all other
> existing RDMA protocols use 64bit addresses).
>
> In any event, I'd much rather add ib_map_mr_sg_zbva() just for RDS
> to use instead of polluting the API with an iova argument, but I think
> the RDS spec can be updated to use 64-bit offsets and align with all
> other RDMA protocols (it has enough space in h_exthdr, which is 128 bits).
>
RDS assumes it's an offset, and hence it has been used as 32 bits. I need
to look through this carefully, though, because all the existing
applications use this header format. There is also RDMA read/write
byte information sent as part of the header (not in upstream code yet),
so the space might be less. But point taken. Will look into it.

> I was thinking of:
> diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
> index e7e0251..61fcab4 100644
> --- a/include/rdma/ib_verbs.h
> +++ b/include/rdma/ib_verbs.h
> @@ -3033,6 +3033,21 @@ int ib_map_mr_sg(struct ib_mr *mr,
> unsigned int sg_nents,
> unsigned int page_size);
>
> +static inline int
> +ib_map_mr_sg_zbva(struct ib_mr *mr,
> + struct scatterlist *sg,
> + unsigned int sg_nents,
> + unsigned int page_size)
> +{
> + int rc;
> +
> + rc = ib_map_mr_sg(mr, sg, sg_nents, page_size);
> + if (likely(!rc))
> + mr->iova &= ((u64)page_size - 1);
> +
> + return rc;
> +}
> +
> int ib_sg_to_pages(struct ib_mr *mr,
> struct scatterlist *sgl,
> unsigned int sg_nents,
> --
>
> Thoughts?
>
> Santosh, can you use that one instead and let us know if
> it resolves your issue?
>
Unfortunately this change still doesn't fix the issue.

> I think you should make sure to correctly construct the
> h_exthdr with: rds_rdma_make_cookie(mr->rkey, (32)mr->iova)

Will look into it. Thanks for suggestion.

Regards,
Santosh

2015-09-22 21:22:39

by Bart Van Assche

Subject: Re: [PATCH v1 00/24] New fast registration API

On 09/17/2015 02:42 AM, Sagi Grimberg wrote:
> came from Bart Van Assache which pointed out that some applications

Most people appreciate it if their name is spelled correctly :-)

Bart.

2015-09-22 21:42:35

by Bart Van Assche

Subject: Re: [PATCH v1 03/24] IB/mlx5: Support the new memory registration API

On 09/17/2015 02:42 AM, Sagi Grimberg wrote:
> +static int
> +mlx5_alloc_priv_descs(struct ib_device *device,
> + struct mlx5_ib_mr *mr,
> + int ndescs,
> + int desc_size)
> +{
> + int size = ndescs * desc_size;
> +
> + mr->descs = dma_alloc_coherent(device->dma_device, size,
> + &mr->desc_map, GFP_KERNEL);
> + if (!mr->descs)
> + return -ENOMEM;
> +
> + return 0;
> +}

Would it be possible to clarify the choice of coherent memory? Would
performance be better if non-coherent memory were used and the
memory were synced after initialization of desc_map has finished?

Bart.

2015-09-22 21:55:49

by Bart Van Assche

Subject: Re: [PATCH v1 01/24] IB/core: Introduce new fast registration API

On 09/17/2015 02:42 AM, Sagi Grimberg wrote:
> The new fast registration verg ib_map_mr_sg receives a scatterlist

^ verb ?

> +/**
> + * ib_map_mr_sg() - Map a memory region with the largest prefix of
> + * a dma mapped SG list

This description could be made more clear. How about the following:

Register the largest possible prefix of a DMA-mapped SG-list

> + } else if (last_page_off + dma_len < mr->page_size) {
> + /* chunk this fragment with the last */
> + last_end_dma_addr += dma_len;
> + last_page_off += dma_len;
> + mr->length += dma_len;
> + continue;

Shouldn't this code update last_page_addr ?


2015-09-22 21:57:48

by Bart Van Assche

[permalink] [raw]
Subject: Re: [PATCH v1 02/24] IB/mlx5: Remove dead fmr code

On 09/17/2015 02:42 AM, Sagi Grimberg wrote:
Just function declarations - no need for those
lying around. If for some reason someone wants
FMR support in mlx5, it should be easy enough to restore
a few structs.
>
> Signed-off-by: Sagi Grimberg <[email protected]>

Reviewed-by: Bart Van Assche <[email protected]>

2015-09-22 22:14:06

by Bart Van Assche

Subject: Re: [PATCH v1 15/24] IB/srp: Convert to new memory registration API

On 09/17/2015 02:42 AM, Sagi Grimberg wrote:
> Since SRP supports both FMRs and FRWR, the new API conversion
> includes splitting the sg list mapping routines in srp_map_data to
> srp_map_sg_fr that works with the new memory registration API,
> srp_map_sg_fmr which constructs a page vector and calls
> ib_fmr_pool_map_phys, and srp_map_sg_dma which is used only
> if no FRWR nor FMR are supported (which I'm not sure is a valid
> use-case anymore).
>
> The srp protocol is able to pass to the target multiple descriptors
> for remote access, so it basically calls registers muleitple sg list
> partials the entire sg list is mapped and registered (each time maps
> a prefix of an sg list).

This sentence is hard to comprehend, and the spelling of "multiple"
should be addressed.

Another comment about this patch is that there are multiple changes in
this patch. It should be split into multiple patches, e.g. one patch
that converts srp_map_sg() into srp_map_sg_fmr() and srp_map_sg_fr() and
another patch that introduces the ib_map_mr_sg() call.

Bart.

2015-09-24 06:53:27

by Sagi Grimberg

Subject: Re: [PATCH v1 00/24] New fast registration API

On 9/20/2015 1:45 AM, Christoph Hellwig wrote:
> Hi Sagi,
>
> I've converted the driver I'm developing to your API and it works
> great. I think this is an important step towards making RDMA
> more usable!

Thanks Christoph,

should I take it as your "Tested-by: " on ib_core + mlx4 changes?

2015-09-24 07:37:41

by Sagi Grimberg

Subject: Re: [PATCH v1 01/24] IB/core: Introduce new fast registration API

On 9/23/2015 12:21 AM, Bart Van Assche wrote:
> On 09/17/2015 02:42 AM, Sagi Grimberg wrote:
>> The new fast registration verg ib_map_mr_sg receives a scatterlist
>
> ^ verb ?

Will fix. Thanks.

>
>> +/**
>> + * ib_map_mr_sg() - Map a memory region with the largest prefix of
>> + * a dma mapped SG list
>
> This description could be made more clear. How about the following:
>
> Register the largest possible prefix of a DMA-mapped SG-list
>
>> + } else if (last_page_off + dma_len < mr->page_size) {
>> + /* chunk this fragment with the last */
>> + last_end_dma_addr += dma_len;
>> + last_page_off += dma_len;
>> + mr->length += dma_len;
>> + continue;
>
> Shouldn't this code update last_page_addr ?

Actually I think it doesn't, since it is only relevant for the else
statement where we are crossing the page_size boundary.

2015-09-24 07:39:43

by Sagi Grimberg

Subject: Re: [PATCH v1 03/24] IB/mlx5: Support the new memory registration API

On 9/23/2015 12:27 AM, Bart Van Assche wrote:
> On 09/17/2015 02:42 AM, Sagi Grimberg wrote:
>> +static int
>> +mlx5_alloc_priv_descs(struct ib_device *device,
>> + struct mlx5_ib_mr *mr,
>> + int ndescs,
>> + int desc_size)
>> +{
>> + int size = ndescs * desc_size;
>> +
>> + mr->descs = dma_alloc_coherent(device->dma_device, size,
>> + &mr->desc_map, GFP_KERNEL);
>> + if (!mr->descs)
>> + return -ENOMEM;
>> +
>> + return 0;
>> +}
>
> Would it be possible to clarify the choice of coherent memory?

No specific reason. I was just copying alloc_fastreg_page_list logic.

> Would performance be better if non-coherent memory were used and the memory
> were synced after initialization of desc_map has finished?

It probably would, I assume. I'll change that.

Thanks!

2015-09-24 07:40:22

by Sagi Grimberg

Subject: Re: [PATCH v1 00/24] New fast registration API

On 9/23/2015 12:22 AM, Bart Van Assche wrote:
> On 09/17/2015 02:42 AM, Sagi Grimberg wrote:
>> came from Bart Van Assache which pointed out that some applications
>
> Most people appreciate it if their name is spelled correctly :-)

Sorry about that :)

2015-09-24 09:07:05

by Sagi Grimberg

Subject: Re: [PATCH v1 15/24] IB/srp: Convert to new memory registration API

On 9/23/2015 12:58 AM, Bart Van Assche wrote:
> On 09/17/2015 02:42 AM, Sagi Grimberg wrote:
>> Since SRP supports both FMRs and FRWR, the new API conversion
>> includes splitting the sg list mapping routines in srp_map_data to
>> srp_map_sg_fr that works with the new memory registration API,
>> srp_map_sg_fmr which constructs a page vector and calls
>> ib_fmr_pool_map_phys, and srp_map_sg_dma which is used only
>> if no FRWR nor FMR are supported (which I'm not sure is a valid
>> use-case anymore).
>>
>> The srp protocol is able to pass to the target multiple descriptors
>> for remote access, so it basically calls registers muleitple sg list
>> partials the entire sg list is mapped and registered (each time maps
>> a prefix of an sg list).
>
> This sentence is hard to comprehend, and the spelling of "multiple"
> should be addressed.
>
> Another comment about this patch is that there are multiple changes in
> this patch. It should be split into multiple patches, e.g. one patch
> that converts srp_map_sg() into srp_map_sg_fmr() and srp_map_sg_fr() and
> another patch that introduces the ib_map_mr_sg() call.

OK Bart, I'll split the patches.

Thanks,

Sagi.

2015-09-24 13:40:00

by Christoph Hellwig

Subject: Re: [PATCH v1 00/24] New fast registration API

On Thu, Sep 24, 2015 at 09:53:29AM +0300, Sagi Grimberg wrote:
> Thanks Christoph,
>
> should I take it as your "Tested-by: " on ib_core + mlx4 changes?

Yes. And an Acked-by: for the whole series.

2015-09-28 20:57:59

by Bart Van Assche

[permalink] [raw]
Subject: Re: [PATCH v1 01/24] IB/core: Introduce new fast registration API

On 09/24/2015 12:37 AM, Sagi Grimberg wrote:
> On 9/23/2015 12:21 AM, Bart Van Assche wrote:
>> On 09/17/2015 02:42 AM, Sagi Grimberg wrote:
>>> + } else if (last_page_off + dma_len < mr->page_size) {
>>> + /* chunk this fragment with the last */
>>> + last_end_dma_addr += dma_len;
>>> + last_page_off += dma_len;
>>> + mr->length += dma_len;
>>> + continue;
>>
>> Shouldn't this code update last_page_addr ?
>
> Actually I think it doesn't since it is only relevant for the else
> statement where we are passing the page_size boundary.

Hello Sagi,

Suppose that the following sg-list is passed to this function as {
offset, length } pairs and that this list has not been coalesced by the
DMA mapping code: [ { 0, page_size / 4 }, { page_size / 4, page_size / 4
}, { 2 * page_size / 4, page_size / 2 } ]. I think the algorithm in
patch 01/24 will map the above sample sg-list onto two pages. Shouldn't
that sg-list be mapped onto one page instead ?

Thanks,

Bart.

2015-09-29 05:59:15

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH v1 01/24] IB/core: Introduce new fast registration API

On Mon, Sep 28, 2015 at 01:57:52PM -0700, Bart Van Assche wrote:
> >Actually I think it doesn't since it is only relevant for the else
> >statement where we are passing the page_size boundary.
>
> Hello Sagi,
>
> Suppose that the following sg-list is passed to this function as { offset,
> length } pairs and that this list has not been coalesced by the DMA mapping
> code: [ { 0, page_size / 4 }, { page_size / 4, page_size / 4 }, { 2 *
> page_size / 4, page_size / 2 } ]. I think the algorithm in patch 01/24 will
> map the above sample sg-list onto two pages. Shouldn't that sg-list be
> mapped onto one page instead ?

Shouldn't higher layers take care of this? Trying to implement the same
coalescing algorithm at various layers isn't very optimal, although we
need to decide and document which one is responsible.

2015-09-29 06:49:31

by Sagi Grimberg

[permalink] [raw]
Subject: Re: [PATCH v1 01/24] IB/core: Introduce new fast registration API

On 9/29/2015 9:47 AM, Sagi Grimberg wrote:
>> Shouldn't higher layers take care of this? Trying to implement the same
>> coalescing algorithm at various layers isn't very optimal, although we
>> need to decide and document which one is responsible.
>
> The block layer can take care of it, but I'm not sure about NFS/RDS at
> the moment (IIRC Steve specifically asked if this API would take care
> of chunking contig sg elements) so I'd rather keep it in until we are
> absolutely sure we don't need it.
>
> I can add a documentation statement for it.

Actually it's documented:

* Constraints:
* - The first sg element is allowed to have an offset.
* - Each sg element must be aligned to page_size (or physically
* contiguous to the previous element). In case an sg element has a
* non contiguous offset, the mapping prefix will not include it.
* - The last sg element is allowed to have length less than page_size.


2015-09-29 06:59:03

by Sagi Grimberg

[permalink] [raw]
Subject: Re: [PATCH v1 01/24] IB/core: Introduce new fast registration API

On 9/28/2015 11:57 PM, Bart Van Assche wrote:
> On 09/24/2015 12:37 AM, Sagi Grimberg wrote:
>> On 9/23/2015 12:21 AM, Bart Van Assche wrote:
>>> On 09/17/2015 02:42 AM, Sagi Grimberg wrote:
>>>> + } else if (last_page_off + dma_len < mr->page_size) {
>>>> + /* chunk this fragment with the last */
>>>> + last_end_dma_addr += dma_len;
>>>> + last_page_off += dma_len;
>>>> + mr->length += dma_len;
>>>> + continue;
>>>
>>> Shouldn't this code update last_page_addr ?
>>
>> Actually I think it doesn't since it is only relevant for the else
>> statement where we are passing the page_size boundary.
>
> Hello Sagi,
>
> Suppose that the following sg-list is passed to this function as {
> offset, length } pairs and that this list has not been coalesced by the
> DMA mapping code: [ { 0, page_size / 4 }, { page_size / 4, page_size / 4
> }, { 2 * page_size / 4, page_size / 2 } ]. I think the algorithm in
> patch 01/24 will map the above sample sg-list onto two pages. Shouldn't
> that sg-list be mapped onto one page instead ?

It seems to... In order to get that correct we'd need to change the
condition to (last_page_off + dma_len <= mr->page_size)

I'll change for v2.

Thanks.

Sagi.

2015-09-29 07:04:06

by Sagi Grimberg

[permalink] [raw]
Subject: Re: [PATCH v1 01/24] IB/core: Introduce new fast registration API

> Shouldn't higher layers take care of this? Trying to implement the same
> coalescing algorithm at various layers isn't very optimal, although we
> need to decide and document which one is responsible.

The block layer can take care of it, but I'm not sure about NFS/RDS at
the moment (IIRC Steve specifically asked if this API would take care
of chunking contig sg elements) so I'd rather keep it in until we are
absolutely sure we don't need it.

I can add a documentation statement for it.

2015-09-29 19:04:00

by Bart Van Assche

[permalink] [raw]
Subject: Re: [PATCH v1 00/24] New fast registration API

On 09/17/2015 02:42 AM, Sagi Grimberg wrote:
> - Converted SRP initiator and RDS iwarp ULPs to the new API

Hello Sagi,

How has the converted SRP initiator driver been tested ? With the kernel
tree that is available on branch reg_api.4
(427def03e9fa9801efbb27f6c3c6bf7fc0d012e1) I see on the initiator system
that login fails and that the following message is logged:

Sep 29 12:01:05 ion-dev-ib-ini kernel: scsi host72: ib_srp: failed
receive status WR flushed (5) for iu ffff88045bb80930

Thanks,

Bart.

2015-09-29 20:58:29

by Sagi Grimberg

[permalink] [raw]
Subject: Re: [PATCH v1 00/24] New fast registration API

On 9/29/2015 10:03 PM, Bart Van Assche wrote:
> On 09/17/2015 02:42 AM, Sagi Grimberg wrote:
>> - Converted SRP initiator and RDS iwarp ULPs to the new API
>
> Hello Sagi,
>

Hi Bart,

> How has the converted SRP initiator driver been tested ? With the kernel
> tree that is available on branch reg_api.4

That's odd. Although I haven't formally submitted reg_api.4 yet, I did
test ib_srp initiator against upstream srpt over CX3 (mlx4) and CX4
(mlx5). I ran connect, disconnect, stress IO of all block sizes and
some unaligned block-IO and SG_IO test utilities. It all seems to pass
for me.

Just this morning (my morning) I tested the v2 set on iser, srp, nfs. I
placed that in branch reg_api.5. Would you mind running reg_api.5 and
see if this issue persists (I would be surprised because I haven't seen
any sign of it)?

Thanks,
Sagi.

2015-09-30 06:47:19

by Sagi Grimberg

[permalink] [raw]
Subject: Re: [PATCH v1 00/24] New fast registration API

Hi Bart,

>> How has the converted SRP initiator driver been tested ? With the kernel
>> tree that is available on branch reg_api.4
>
> That's odd. Although I haven't formally submitted reg_api.4 yet, I did
> test ib_srp initiator against upstream srpt over CX3 (mlx4) and CX4
> (mlx5). I ran connect, disconnect, stress IO of all block sizes and
> some unaligned block-IO and SG_IO test utilities. It all seems to pass
> for me.
>
> Just this morning (my morning) I tested the v2 set on iser, srp, nfs. I
> placed that in branch reg_api.5. Would you mind running reg_api.5 and
> see if this issue persist (I would be surprised because I haven't seen
> any sign of it)?

I'm waiting for your input before submitting v2 of this series.

Thanks,
Sagi.

2015-09-30 18:59:12

by Bart Van Assche

[permalink] [raw]
Subject: Re: [PATCH v1 00/24] New fast registration API

On 09/29/2015 01:58 PM, Sagi Grimberg wrote:
> On 9/29/2015 10:03 PM, Bart Van Assche wrote:
>> On 09/17/2015 02:42 AM, Sagi Grimberg wrote:
>>> - Converted SRP initiator and RDS iwarp ULPs to the new API
>> How has the converted SRP initiator driver been tested ? With the kernel
>> tree that is available on branch reg_api.4
>
> That's odd. Although I haven't formally submitted reg_api.4 yet, I did
> test ib_srp initiator against upstream srpt over CX3 (mlx4) and CX4
> (mlx5). I ran connect, disconnect, stress IO of all block sizes and
> some unaligned block-IO and SG_IO test utilities. It all seems to pass
> for me.
>
> Just this morning (my morning) I tested the v2 set on iser, srp, nfs. I
> placed that in branch reg_api.5. Would you mind running reg_api.5 and
> see if this issue persist (I would be surprised because I haven't seen
> any sign of it)?

Sorry but I still see these messages with the reg_api.5 branch.
[root@ib-ini linux-kernel]# git show HEAD | grep ^commit
commit 3b5b34777d3cd606433f0aca51e3885323648e07
[root@ib-ini linux-kernel]# uname -a
Linux ib-ini 4.2.0-rc6-debug+ #1 SMP Wed Sep 30 11:38:36 PDT 2015 x86_64
x86_64 x86_64 GNU/Linux

I will try to run a bisect.

Bart.

2015-09-30 20:29:50

by Bart Van Assche

[permalink] [raw]
Subject: Re: [PATCH v1 00/24] New fast registration API

On 09/30/2015 11:59 AM, Bart Van Assche wrote:
> On 09/29/2015 01:58 PM, Sagi Grimberg wrote:
>> On 9/29/2015 10:03 PM, Bart Van Assche wrote:
>>> On 09/17/2015 02:42 AM, Sagi Grimberg wrote:
>>>> - Converted SRP initiator and RDS iwarp ULPs to the new API
>>> How has the converted SRP initiator driver been tested ? With the kernel
>>> tree that is available on branch reg_api.4
>>
>> That's odd. Although I haven't formally submitted reg_api.4 yet, I did
>> test ib_srp initiator against upstream srpt over CX3 (mlx4) and CX4
>> (mlx5). I ran connect, disconnect, stress IO of all block sizes and
>> some unaligned block-IO and SG_IO test utilities. It all seems to pass
>> for me.
>>
>> Just this morning (my morning) I tested the v2 set on iser, srp, nfs. I
>> placed that in branch reg_api.5. Would you mind running reg_api.5 and
>> see if this issue persist (I would be surprised because I haven't seen
>> any sign of it)?
>
> Sorry but I still see these messages with the reg_api.5 branch.
> [root@ib-ini linux-kernel]# git show HEAD | grep ^commit
> commit 3b5b34777d3cd606433f0aca51e3885323648e07
> [root@ib-ini linux-kernel]# uname -a
> Linux ib-ini 4.2.0-rc6-debug+ #1 SMP Wed Sep 30 11:38:36 PDT 2015 x86_64
> x86_64 x86_64 GNU/Linux
>
> I will try to run a bisect.

(replying to my own e-mail)

Apparently this behavior got introduced through the patch "IB/srp:
Convert to new registration API" (commit
ad66cbace5ca8c60673bedf35e5027868b0dd2d7). Without that patch SRP I/O
works fine. With that patch I see receive failures being reported. The
SRP initiator was loaded on my setup with the following kernel driver
options:

# cat /etc/modprobe.d/ib_srp.conf
options ib_srp cmd_sg_entries=255 prefer_fr=1 register_always=1

Bart.

2015-10-01 07:16:15

by Sagi Grimberg

[permalink] [raw]
Subject: Re: [PATCH v1 00/24] New fast registration API


>>> Just this morning (my morning) I tested the v2 set on iser, srp, nfs. I
>>> placed that in branch reg_api.5. Would you mind running reg_api.5 and
>>> see if this issue persist (I would be surprised because I haven't seen
>>> any sign of it)?
>>
>> Sorry but I still see these messages with the reg_api.5 branch.
>> [root@ib-ini linux-kernel]# git show HEAD | grep ^commit
>> commit 3b5b34777d3cd606433f0aca51e3885323648e07
>> [root@ib-ini linux-kernel]# uname -a
>> Linux ib-ini 4.2.0-rc6-debug+ #1 SMP Wed Sep 30 11:38:36 PDT 2015 x86_64
>> x86_64 x86_64 GNU/Linux
>>
>> I will try to run a bisect.
>
> (replying to my own e-mail)
>
> Apparently this behavior got introduced through the patch "IB/srp:
> Convert to new registration API" (commit
> ad66cbace5ca8c60673bedf35e5027868b0dd2d7). Without that patch SRP I/O
> works fine. With that patch I see receive failures being reported. The
> SRP initiator was loaded on my setup with the following kernel driver
> options:
>
> # cat /etc/modprobe.d/ib_srp.conf
> options ib_srp cmd_sg_entries=255 prefer_fr=1 register_always=1

Strange. I don't see that.

options ib_srp prefer_fr=1 register_always=1 are set by default.

When I try to connect srp initiator against upstream srpt with
cmd_sg_entries=255 I get CM reject on iu max size:
kernel: scsi host17: ib_srp: REJ received
kernel: scsi host17: ib_srp: SRP_LOGIN_REJ: requested max_it_iu_len too
large
kernel: scsi host17: ib_srp: Connection 0/8 failed
kernel: scsi host17: ib_srp: Sending CM DREQ failed

When I connect with cmd_sg_entries=128 I successfully connect:
kernel: scsi host18: SRP.T10:F452140300117400
kernel: scsi 18:0:0:0: Direct-Access LIO-ORG RAMDISK-MCP 4.0
PQ: 0 ANSI: 5
kernel: scsi host18: ib_srp: new target: id_ext f452140300117400
ioc_guid f452140300117400 pkey ffff service_id f452140300117400 sgid
fe80:0000:0000:0000:f452:1403:0011:7411 dgid
fe80:0000:0000:0000:f452:1403:0011:7401
kernel: sd 18:0:0:0: [sdy] 20480 512-byte logical blocks: (10.4 MB/10.0 MiB)
kernel: sd 18:0:0:0: [sdy] Write Protect is off
kernel: sd 18:0:0:0: [sdy] Mode Sense: 43 00 00 08
kernel: sd 18:0:0:0: [sdy] Write cache: disabled, read cache: enabled,
doesn't support DPO or FUA
kernel: sd 18:0:0:0: [sdy] Attached SCSI disk

I wonder what the difference between our test environments is. I can't
look into this if I'm not able to reproduce.

Sagi.


2015-10-01 17:53:24

by Bart Van Assche

[permalink] [raw]
Subject: Re: [PATCH v1 00/24] New fast registration API

On 10/01/2015 12:16 AM, Sagi Grimberg wrote:
>
>>>> Just this morning (my morning) I tested the v2 set on iser, srp, nfs. I
>>>> placed that in branch reg_api.5. Would you mind running reg_api.5 and
>>>> see if this issue persist (I would be surprised because I haven't seen
>>>> any sign of it)?
>>>
>>> Sorry but I still see these messages with the reg_api.5 branch.
>>> [root@ib-ini linux-kernel]# git show HEAD | grep ^commit
>>> commit 3b5b34777d3cd606433f0aca51e3885323648e07
>>> [root@ib-ini linux-kernel]# uname -a
>>> Linux ib-ini 4.2.0-rc6-debug+ #1 SMP Wed Sep 30 11:38:36 PDT 2015 x86_64
>>> x86_64 x86_64 GNU/Linux
>>>
>>> I will try to run a bisect.
>>
>> (replying to my own e-mail)
>>
>> Apparently this behavior got introduced through the patch "IB/srp:
>> Convert to new registration API" (commit
>> ad66cbace5ca8c60673bedf35e5027868b0dd2d7). Without that patch SRP I/O
>> works fine. With that patch I see receive failures being reported. The
>> SRP initiator was loaded on my setup with the following kernel driver
>> options:
>>
>> # cat /etc/modprobe.d/ib_srp.conf
>> options ib_srp cmd_sg_entries=255 prefer_fr=1 register_always=1
>
> Strange. I don't see that.
>
> options ib_srp prefer_fr=1 register_always=1 are set by default.
>
> When I try to connect srp initiator against upstream srpt with
> cmd_sg_entries=255 I get CM reject on iu max size:
> kernel: scsi host17: ib_srp: REJ received
> kernel: scsi host17: ib_srp: SRP_LOGIN_REJ: requested max_it_iu_len too
> large
> kernel: scsi host17: ib_srp: Connection 0/8 failed
> kernel: scsi host17: ib_srp: Sending CM DREQ failed
>
> When I connect with cmd_sg_entries=128 I successfully connect:
> kernel: scsi host18: SRP.T10:F452140300117400
> kernel: scsi 18:0:0:0: Direct-Access LIO-ORG RAMDISK-MCP 4.0
> PQ: 0 ANSI: 5
> kernel: scsi host18: ib_srp: new target: id_ext f452140300117400
> ioc_guid f452140300117400 pkey ffff service_id f452140300117400 sgid
> fe80:0000:0000:0000:f452:1403:0011:7411 dgid
> fe80:0000:0000:0000:f452:1403:0011:7401
> kernel: sd 18:0:0:0: [sdy] 20480 512-byte logical blocks: (10.4 MB/10.0 MiB)
> kernel: sd 18:0:0:0: [sdy] Write Protect is off
> kernel: sd 18:0:0:0: [sdy] Mode Sense: 43 00 00 08
> kernel: sd 18:0:0:0: [sdy] Write cache: disabled, read cache: enabled,
> doesn't support DPO or FUA
> kernel: sd 18:0:0:0: [sdy] Attached SCSI disk
>
> I wander what is the difference between our test environments? I can't
> look into this if I'm not able to reproduce.

Hello Sagi,

At the target side I see "Sep 30 12:56:06 ibdev1 kernel: [178664.300296]
ib_srpt: RDMA t 5 for idx 0 failed with status 10." (status 10
corresponds to IB_WC_REM_ACCESS_ERR). I will try to determine the root
cause.

Bart.

2015-10-01 20:58:24

by Bart Van Assche

[permalink] [raw]
Subject: Re: [PATCH v1 00/24] New fast registration API

On 10/01/2015 10:53 AM, Bart Van Assche wrote:
> On 10/01/2015 12:16 AM, Sagi Grimberg wrote:
>> I wander what is the difference between our test environments? I can't
>> look into this if I'm not able to reproduce.
>
> Hello Sagi,
>
> At the target side I see "Sep 30 12:56:06 ibdev1 kernel: [178664.300296]
> ib_srpt: RDMA t 5 for idx 0 failed with status 10." (status 10
> corresponds to IB_WC_REM_ACCESS_ERR). I will try to determine the root
> cause.

(replying to my own e-mail)

Hello Sagi,

To determine which side is causing this issue I captured the traffic
between initiator and target with the MLNX_OFED ibdump tool (the dump
has been attached to this e-mail). As one can see in that capture the
target driver used exactly the same virtual address and length that were
specified in the SRP_CMD request. To me this means that v1 of this patch
series introduces a regression at the initiator side - either in the SRP
initiator driver or in the mlx4 driver.

The only difference between our test setups that could be relevant is
that in my tests several kernel debugging options were enabled at the
initiator side (including SLUB_DEBUG_ON=y). As one can see in the
attached capture the buffer allocated at the initiator side for the SCSI
INQUIRY request was not aligned on a page boundary.

Bart.


Attachments:
sniffer.pcap (55.02 kB)

2015-10-02 15:37:11

by Bart Van Assche

[permalink] [raw]
Subject: Re: [PATCH v1 00/24] New fast registration API

On 10/01/2015 11:14 PM, Sagi Grimberg wrote:
> Would you mind sending me your .config?

Hello Sagi,

I just sent this .config file to you off-list.

Bart.


2015-10-06 08:37:46

by Sagi Grimberg

[permalink] [raw]
Subject: Re: [PATCH v1 00/24] New fast registration API

On 10/2/2015 6:37 PM, Bart Van Assche wrote:
> On 10/01/2015 11:14 PM, Sagi Grimberg wrote:
>> Would you mind sending me your .config?
>
> Hello Sagi,

Hi Bart,

>
> I just sent this .config file to you off-list.

I see now the error you are referring to.

The issue is that the device requires the MR page array to have
an alignment (0x40 for mlx4 and 0x400 for mlx5). When I modified the
page array allocation to be non-coherent I didn't take care of
alignment.

Taking care of this alignment may result in a higher order allocation
as we'd need to add (alignment - 1) to the allocation size.

e.g. a 512-entry page list on mlx4 becomes:
512 * 8 + 0x40 - 1 = 4159 bytes

I'm leaning towards this approach. Any preference?

I think this patch should take care of mlx4:
diff --git a/drivers/infiniband/hw/mlx4/mlx4_ib.h b/drivers/infiniband/hw/mlx4/mlx4_ib.h
index de6eab3..4c69247 100644
--- a/drivers/infiniband/hw/mlx4/mlx4_ib.h
+++ b/drivers/infiniband/hw/mlx4/mlx4_ib.h
@@ -129,6 +129,8 @@ struct mlx4_ib_cq {
 	struct list_head	recv_qp_list;
 };
 
+#define MLX4_MR_PAGES_ALIGN 0x40
+
 struct mlx4_ib_mr {
 	struct ib_mr		ibmr;
 	__be64			*pages;
@@ -137,6 +139,7 @@ struct mlx4_ib_mr {
 	u32			max_pages;
 	struct mlx4_mr		mmr;
 	struct ib_umem		*umem;
+	void			*pages_alloc;
 };
 
 struct mlx4_ib_mw {
diff --git a/drivers/infiniband/hw/mlx4/mr.c b/drivers/infiniband/hw/mlx4/mr.c
index fa01f75..d3f8175 100644
--- a/drivers/infiniband/hw/mlx4/mr.c
+++ b/drivers/infiniband/hw/mlx4/mr.c
@@ -279,10 +279,14 @@ mlx4_alloc_priv_pages(struct ib_device *device,
 	int size = max_pages * sizeof(u64);
 	int ret;
 
-	mr->pages = kzalloc(size, GFP_KERNEL);
-	if (!mr->pages)
+	size += max_t(int, MLX4_MR_PAGES_ALIGN - ARCH_KMALLOC_MINALIGN, 0);
+
+	mr->pages_alloc = kzalloc(size, GFP_KERNEL);
+	if (!mr->pages_alloc)
 		return -ENOMEM;
 
+	mr->pages = PTR_ALIGN(mr->pages_alloc, MLX4_MR_PAGES_ALIGN);
+
 	mr->page_map = dma_map_single(device->dma_device, mr->pages,
 				      size, DMA_TO_DEVICE);
 
@@ -293,20 +297,22 @@ mlx4_alloc_priv_pages(struct ib_device *device,
 
 	return 0;
 err:
-	kfree(mr->pages);
+	kfree(mr->pages_alloc);
 	return ret;
 }
 
 static void
 mlx4_free_priv_pages(struct mlx4_ib_mr *mr)
 {
-	struct ib_device *device = mr->ibmr.device;
-	int size = mr->max_pages * sizeof(u64);
-
 	if (mr->pages) {
+		struct ib_device *device = mr->ibmr.device;
+		int size = mr->max_pages * sizeof(u64);
+
+		size += max_t(int, MLX4_MR_PAGES_ALIGN - ARCH_KMALLOC_MINALIGN, 0);
+
 		dma_unmap_single(device->dma_device, mr->page_map,
 				 size, DMA_TO_DEVICE);
-		kfree(mr->pages);
+		kfree(mr->pages_alloc);
 		mr->pages = NULL;
 	}
 }
--

Sagi.

2015-10-06 18:49:40

by Bart Van Assche

[permalink] [raw]
Subject: Re: [PATCH v1 00/24] New fast registration API

On 10/06/2015 01:37 AM, Sagi Grimberg wrote:
> I see now the error you are referring to.
>
> The issue is that the device requires the MR page array to have
> an alignment (0x40 for mlx4 and 0x400 for mlx5). When I modified the
> page array allocation to be non-coherent I didn't take care of
> alignment.
>
> Taking care of this alignment may result in a higher order allocation
> as we'd need to add (alignment - 1) to the allocation size.
>
> e.g. a 512 pages on mlx4 will become:
> 512 * 8 + 0x40 - 1 = 4159
>
> I'm leaning towards this approach. Any preference?
>
> I think this patch should take care of mlx4:
> [ ... ]

Hello Sagi,

Thanks for the patch. But since the patch included in the previous
e-mail mapped a memory range that could be outside the bounds of the
allocated memory, I have been testing the patch below:

---
drivers/infiniband/hw/mlx4/mlx4_ib.h | 3 +++
drivers/infiniband/hw/mlx4/mr.c | 19 ++++++++++++-------
2 files changed, 15 insertions(+), 7 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/mlx4_ib.h b/drivers/infiniband/hw/mlx4/mlx4_ib.h
index de6eab3..864d595 100644
--- a/drivers/infiniband/hw/mlx4/mlx4_ib.h
+++ b/drivers/infiniband/hw/mlx4/mlx4_ib.h
@@ -129,6 +129,8 @@ struct mlx4_ib_cq {
 	struct list_head	recv_qp_list;
 };
 
+#define MLX4_MR_PAGES_ALIGN 0x40
+
 struct mlx4_ib_mr {
 	struct ib_mr		ibmr;
 	__be64			*pages;
@@ -137,6 +139,7 @@ struct mlx4_ib_mr {
 	u32			max_pages;
 	struct mlx4_mr		mmr;
 	struct ib_umem		*umem;
+	void			*pages_alloc;
 };
 
 struct mlx4_ib_mw {
diff --git a/drivers/infiniband/hw/mlx4/mr.c b/drivers/infiniband/hw/mlx4/mr.c
index fa01f75..8121c1c 100644
--- a/drivers/infiniband/hw/mlx4/mr.c
+++ b/drivers/infiniband/hw/mlx4/mr.c
@@ -277,12 +277,17 @@ mlx4_alloc_priv_pages(struct ib_device *device,
 		      int max_pages)
 {
 	int size = max_pages * sizeof(u64);
+	int add_size;
 	int ret;
 
-	mr->pages = kzalloc(size, GFP_KERNEL);
-	if (!mr->pages)
+	add_size = max_t(int, MLX4_MR_PAGES_ALIGN - ARCH_KMALLOC_MINALIGN, 0);
+
+	mr->pages_alloc = kzalloc(size + add_size, GFP_KERNEL);
+	if (!mr->pages_alloc)
 		return -ENOMEM;
 
+	mr->pages = PTR_ALIGN(mr->pages_alloc, MLX4_MR_PAGES_ALIGN);
+
 	mr->page_map = dma_map_single(device->dma_device, mr->pages,
 				      size, DMA_TO_DEVICE);
 
@@ -293,20 +298,20 @@ mlx4_alloc_priv_pages(struct ib_device *device,
 
 	return 0;
 err:
-	kfree(mr->pages);
+	kfree(mr->pages_alloc);
 	return ret;
 }
 
 static void
 mlx4_free_priv_pages(struct mlx4_ib_mr *mr)
 {
-	struct ib_device *device = mr->ibmr.device;
-	int size = mr->max_pages * sizeof(u64);
-
 	if (mr->pages) {
+		struct ib_device *device = mr->ibmr.device;
+		int size = mr->max_pages * sizeof(u64);
+
 		dma_unmap_single(device->dma_device, mr->page_map,
 				 size, DMA_TO_DEVICE);
-		kfree(mr->pages);
+		kfree(mr->pages_alloc);
 		mr->pages = NULL;
 	}
 }
--
2.1.4



2015-10-07 06:42:29

by Sagi Grimberg

[permalink] [raw]
Subject: Re: [PATCH v1 00/24] New fast registration API

On 10/6/2015 9:49 PM, Bart Van Assche wrote:
> On 10/06/2015 01:37 AM, Sagi Grimberg wrote:
>> I see now the error you are referring to.
>>
>> The issue is that the device requires the MR page array to have
>> an alignment (0x40 for mlx4 and 0x400 for mlx5). When I modified the
>> page array allocation to be non-coherent I didn't take care of
>> alignment.
>>
>> Taking care of this alignment may result in a higher order allocation
>> as we'd need to add (alignment - 1) to the allocation size.
>>
>> e.g. a 512 pages on mlx4 will become:
>> 512 * 8 + 0x40 - 1 = 4159
>>
>> I'm leaning towards this approach. Any preference?
>>
>> I think this patch should take care of mlx4:
>> [ ... ]
>
> Hello Sagi,
>
> Thanks for the patch. But since the patch included in the previous
> e-mail mapped a memory range that could be outside the bounds of the
> allocated memory I have been testing the patch below:

Thanks! I'll correct the patches.

Can I take it as your Tested-by on srp?

Cheers,
Sagi.



2015-10-07 09:20:27

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH v1 00/24] New fast registration API

On Tue, Oct 06, 2015 at 11:37:40AM +0300, Sagi Grimberg wrote:
> The issue is that the device requires the MR page array to have
> an alignment (0x40 for mlx4 and 0x400 for mlx5). When I modified the
> page array allocation to be non-coherent I didn't take care of
> alignment.

Just curious: why did you switch away from the coherent dma allocations
anyway? Seems like the page lists are mapped as long as they are
allocated so the coherent allocator would seem like a nice fit.

2015-10-07 09:25:31

by Sagi Grimberg

[permalink] [raw]
Subject: Re: [PATCH v1 00/24] New fast registration API

On 10/7/2015 12:20 PM, Christoph Hellwig wrote:
> On Tue, Oct 06, 2015 at 11:37:40AM +0300, Sagi Grimberg wrote:
>> The issue is that the device requires the MR page array to have
>> an alignment (0x40 for mlx4 and 0x400 for mlx5). When I modified the
>> page array allocation to be non-coherent I didn't take care of
>> alignment.
>
> Just curious: why did you switch away from the coheret dma allocations
> anyway? Seems like the page lists are mapped as long as they are
> allocated so the coherent allocator would seem like a nice fit.
>

Bart suggested that having to sync once for the entire page list might
perform better than coherent memory. I'm fine either way; using
non-coherent memory might cause higher-order allocations due to
alignment, so it's not free of charge.

Sagi.

2015-10-07 09:36:48

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH v1 00/24] New fast registration API

On Wed, Oct 07, 2015 at 12:25:25PM +0300, Sagi Grimberg wrote:
> Bart suggested that having to sync once for the entire page list might
> perform better than coherent memory. I'll settle either way since using
> non-coherent memory might cause higher-order allocations due to
> alignment, so it's not free-of-charge.

I don't really care either way, it just seemed like an odd change hiding
in here that I missed when reviewing earlier.

2015-10-07 10:00:25

by Sagi Grimberg

[permalink] [raw]
Subject: Re: [PATCH v1 00/24] New fast registration API

> I don't really care either way, it just seemed like an odd change hiding
> in here that I missed when reviewing earlier.

OK, so I'm sticking with it until someone suggests otherwise.

Sagi.

2015-10-07 15:48:59

by Sagi Grimberg

[permalink] [raw]
Subject: Re: [PATCH v1 00/24] New fast registration API

On 10/7/2015 6:46 PM, Bart Van Assche wrote:
> On 10/06/2015 11:42 PM, Sagi Grimberg wrote:
>> On 10/6/2015 9:49 PM, Bart Van Assche wrote:
>>> On 10/06/2015 01:37 AM, Sagi Grimberg wrote:
>>>> I see now the error you are referring to.
>>>>
>>>> The issue is that the device requires the MR page array to have
>>>> an alignment (0x40 for mlx4 and 0x400 for mlx5). When I modified the
>>>> page array allocation to be non-coherent I didn't take care of
>>>> alignment.
>>>>
>>>> Taking care of this alignment may result in a higher order allocation
>>>> as we'd need to add (alignment - 1) to the allocation size.
>>>>
>>>> e.g. a 512 pages on mlx4 will become:
>>>> 512 * 8 + 0x40 - 1 = 4159
>>>>
>>>> I'm leaning towards this approach. Any preference?
>>>>
>>>> I think this patch should take care of mlx4:
>>>> [ ... ]
>>>
>>> Hello Sagi,
>>>
>>> Thanks for the patch. But since the patch included in the previous
>>> e-mail mapped a memory range that could be outside the bounds of the
>>> allocated memory I have been testing the patch below:
>>
>> Thanks! I correct the patches.
>>
>> Can I take it as your Tested-by on srp?
>
> Sure :-) But please keep in mind that I currently only have access to
> ConnectX-3 HCA's for testing RDMA software and not to any other RDMA HCA
> model.

Thanks Bart. For what it's worth, I've tested srp (and iser + nfs) on
both CX3 and CX4 with your config file.

Cheers,
Sagi.

2015-10-07 16:01:25

by Bart Van Assche

[permalink] [raw]
Subject: Re: [PATCH v1 00/24] New fast registration API

On 10/06/2015 11:42 PM, Sagi Grimberg wrote:
> On 10/6/2015 9:49 PM, Bart Van Assche wrote:
>> On 10/06/2015 01:37 AM, Sagi Grimberg wrote:
>>> I see now the error you are referring to.
>>>
>>> The issue is that the device requires the MR page array to have
>>> an alignment (0x40 for mlx4 and 0x400 for mlx5). When I modified the
>>> page array allocation to be non-coherent I didn't take care of
>>> alignment.
>>>
>>> Taking care of this alignment may result in a higher order allocation
>>> as we'd need to add (alignment - 1) to the allocation size.
>>>
>>> e.g. a 512 pages on mlx4 will become:
>>> 512 * 8 + 0x40 - 1 = 4159
>>>
>>> I'm leaning towards this approach. Any preference?
>>>
>>> I think this patch should take care of mlx4:
>>> [ ... ]
>>
>> Hello Sagi,
>>
>> Thanks for the patch. But since the patch included in the previous
>> e-mail mapped a memory range that could be outside the bounds of the
>> allocated memory I have been testing the patch below:
>
> Thanks! I correct the patches.
>
> Can I take it as your Tested-by on srp?

Sure :-) But please keep in mind that I currently only have access to
ConnectX-3 HCA's for testing RDMA software and not to any other RDMA HCA
model.

Bart.

2015-10-07 16:45:40

by Bart Van Assche

[permalink] [raw]
Subject: Re: [PATCH v1 00/24] New fast registration API

On 10/07/2015 02:20 AM, Christoph Hellwig wrote:
> On Tue, Oct 06, 2015 at 11:37:40AM +0300, Sagi Grimberg wrote:
>> The issue is that the device requires the MR page array to have
>> an alignment (0x40 for mlx4 and 0x400 for mlx5). When I modified the
>> page array allocation to be non-coherent I didn't take care of
>> alignment.
>
> Just curious: why did you switch away from the coheret dma allocations
> anyway? Seems like the page lists are mapped as long as they are
> allocated so the coherent allocator would seem like a nice fit.

Hello Christoph,

My concern is that caching and/or write combining might be disabled for
DMA coherent memory regions. This is why I assume that calling
dma_map_single() and dma_unmap_single() will be faster for registering
multiple pages as a single memory region instead of using DMA coherent
memory.

Bart.