2015-07-24 16:18:16

by Steve Wise

Subject: [PATCH V6 0/9] iSER support for iWARP

The following series implements support for iWARP transports in the iSER
initiator and target. This is based on v4.2-rc3.

I know we're in the middle of some API changes that will affect the isert
patches in this series, but I wanted to get these out for another round
of review. I can merge the isert changes on top of the new API once
it solidifies and we have a staging tree for them. My goal is to get
this series in for 4.3, so if the API changes don't gel fairly soon,
I ask that we consider pulling this series in first. But I'm definitely
willing to help get this -and- the API stuff in for 4.3.

Changes since V5:

The big change in V6 is to introduce new register/unregister functions in isert
that use FRMRs + the local dma lkey for devices that support them yet do not
support DIF/PI. So iWARP devices do not require a DMA MR.

Added an svcrdma patch to use max_sge_rd.

Reordered the series:
patch 1 is the small iser changes to support iWARP
patches 2-5 fix drivers to set max_sge_rd and ULPs to use it
patches 6-9 enhance isert to support iWARP

Changes since V4:

iser: fixed compiler warning

isert: back to setting REMOTE_WRITE only for iWARP devices

Changes since V3:

Fixed commit messages based on feedback.

iser: adjust max_sectors

isert: split into 2 patches

isert: always set REMOTE_WRITE for dma mrs

Changes since V2:

The transport-independent work is removed from this series and will
be submitted in a subsequent series. This V3 series now enables iWARP
using existing core services.

Changes since V1:

Introduce and use transport-independent RDMA core services for allocating
DMA MRs and computing fast register access flags.

Correctly set the device max_sge_rd capability in several rdma device
drivers.

isert: use device capability max_sge_rd for the read sge depth.

isert: change max_sge to max_write_sge in struct isert_conn.

---

Sagi Grimberg (1):
mlx4, mlx5, mthca: Expose max_sge_rd correctly

Steve Wise (8):
isert: Support iWARP transports using FRMRs
isert: Use local_dma_lkey whenever possible.
isert: Use the device's max fastreg page list depth
isert: Rename IO functions to more descriptive names
RDMA/isert: Limit read depth based on the device max_sge_rd capability
svcrdma: Use max_sge_rd for destination read depths
ipath,qib: Expose max_sge_rd correctly
RDMA/iser: Limit sg tablesize and max_sectors to device fastreg max depth


drivers/infiniband/hw/ipath/ipath_verbs.c | 1
drivers/infiniband/hw/mlx4/main.c | 1
drivers/infiniband/hw/mlx5/main.c | 1
drivers/infiniband/hw/mthca/mthca_provider.c | 1
drivers/infiniband/hw/qib/qib_verbs.c | 1
drivers/infiniband/ulp/iser/iscsi_iser.c | 9 +
drivers/infiniband/ulp/isert/ib_isert.c | 345 ++++++++++++++++++++++----
drivers/infiniband/ulp/isert/ib_isert.h | 5
include/linux/sunrpc/svc_rdma.h | 1
net/sunrpc/xprtrdma/svc_rdma_recvfrom.c | 12 -
net/sunrpc/xprtrdma/svc_rdma_transport.c | 4
11 files changed, 320 insertions(+), 61 deletions(-)

--
Steve


2015-07-24 16:18:32

by Steve Wise

Subject: [PATCH V6 3/9] ipath,qib: Expose max_sge_rd correctly

Applications must not assume that max_sge and max_sge_rd are the same;
hence, expose max_sge_rd correctly as well.

Signed-off-by: Steve Wise <[email protected]>
Acked-by: Mike Marciniszyn <[email protected]>
---

drivers/infiniband/hw/ipath/ipath_verbs.c | 1 +
drivers/infiniband/hw/qib/qib_verbs.c | 1 +
2 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/hw/ipath/ipath_verbs.c b/drivers/infiniband/hw/ipath/ipath_verbs.c
index 30ba49c..ed2bbc2 100644
--- a/drivers/infiniband/hw/ipath/ipath_verbs.c
+++ b/drivers/infiniband/hw/ipath/ipath_verbs.c
@@ -1521,6 +1521,7 @@ static int ipath_query_device(struct ib_device *ibdev, struct ib_device_attr *pr
props->max_qp = ib_ipath_max_qps;
props->max_qp_wr = ib_ipath_max_qp_wrs;
props->max_sge = ib_ipath_max_sges;
+ props->max_sge_rd = ib_ipath_max_sges;
props->max_cq = ib_ipath_max_cqs;
props->max_ah = ib_ipath_max_ahs;
props->max_cqe = ib_ipath_max_cqes;
diff --git a/drivers/infiniband/hw/qib/qib_verbs.c b/drivers/infiniband/hw/qib/qib_verbs.c
index a05d1a3..bc723b5 100644
--- a/drivers/infiniband/hw/qib/qib_verbs.c
+++ b/drivers/infiniband/hw/qib/qib_verbs.c
@@ -1574,6 +1574,7 @@ static int qib_query_device(struct ib_device *ibdev, struct ib_device_attr *prop
props->max_qp = ib_qib_max_qps;
props->max_qp_wr = ib_qib_max_qp_wrs;
props->max_sge = ib_qib_max_sges;
+ props->max_sge_rd = ib_qib_max_sges;
props->max_cq = ib_qib_max_cqs;
props->max_ah = ib_qib_max_ahs;
props->max_cqe = ib_qib_max_cqes;


2015-07-24 16:18:38

by Steve Wise

Subject: [PATCH V6 4/9] svcrdma: Use max_sge_rd for destination read depths

Signed-off-by: Steve Wise <[email protected]>
---

include/linux/sunrpc/svc_rdma.h | 1 +
net/sunrpc/xprtrdma/svc_rdma_recvfrom.c | 12 +-----------
net/sunrpc/xprtrdma/svc_rdma_transport.c | 4 ++++
3 files changed, 6 insertions(+), 11 deletions(-)

diff --git a/include/linux/sunrpc/svc_rdma.h b/include/linux/sunrpc/svc_rdma.h
index cb94ee4..83211bc 100644
--- a/include/linux/sunrpc/svc_rdma.h
+++ b/include/linux/sunrpc/svc_rdma.h
@@ -132,6 +132,7 @@ struct svcxprt_rdma {
struct list_head sc_accept_q; /* Conn. waiting accept */
int sc_ord; /* RDMA read limit */
int sc_max_sge;
+ int sc_max_sge_rd; /* max sge for read target */

int sc_sq_depth; /* Depth of SQ */
atomic_t sc_sq_count; /* Number of SQ WR on queue */
diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
index 2e1348b..cb51742 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
@@ -115,15 +115,6 @@ static void rdma_build_arg_xdr(struct svc_rqst *rqstp,
rqstp->rq_arg.tail[0].iov_len = 0;
}

-static int rdma_read_max_sge(struct svcxprt_rdma *xprt, int sge_count)
-{
- if (!rdma_cap_read_multi_sge(xprt->sc_cm_id->device,
- xprt->sc_cm_id->port_num))
- return 1;
- else
- return min_t(int, sge_count, xprt->sc_max_sge);
-}
-
/* Issue an RDMA_READ using the local lkey to map the data sink */
int rdma_read_chunk_lcl(struct svcxprt_rdma *xprt,
struct svc_rqst *rqstp,
@@ -144,8 +135,7 @@ int rdma_read_chunk_lcl(struct svcxprt_rdma *xprt,

ctxt->direction = DMA_FROM_DEVICE;
ctxt->read_hdr = head;
- pages_needed =
- min_t(int, pages_needed, rdma_read_max_sge(xprt, pages_needed));
+ pages_needed = min_t(int, pages_needed, xprt->sc_max_sge_rd);
read = min_t(int, pages_needed << PAGE_SHIFT, rs_length);

for (pno = 0; pno < pages_needed; pno++) {
diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
index 6b36279..fdc850f 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
@@ -872,6 +872,8 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
* capabilities of this particular device */
newxprt->sc_max_sge = min((size_t)devattr.max_sge,
(size_t)RPCSVC_MAXPAGES);
+ newxprt->sc_max_sge_rd = min_t(size_t, devattr.max_sge_rd,
+ RPCSVC_MAXPAGES);
newxprt->sc_max_requests = min((size_t)devattr.max_qp_wr,
(size_t)svcrdma_max_requests);
newxprt->sc_sq_depth = RPCRDMA_SQ_DEPTH_MULT * newxprt->sc_max_requests;
@@ -1046,6 +1048,7 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
" remote_ip : %pI4\n"
" remote_port : %d\n"
" max_sge : %d\n"
+ " max_sge_rd : %d\n"
" sq_depth : %d\n"
" max_requests : %d\n"
" ord : %d\n",
@@ -1059,6 +1062,7 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
ntohs(((struct sockaddr_in *)&newxprt->sc_cm_id->
route.addr.dst_addr)->sin_port),
newxprt->sc_max_sge,
+ newxprt->sc_max_sge_rd,
newxprt->sc_sq_depth,
newxprt->sc_max_requests,
newxprt->sc_ord);


2015-07-24 16:18:43

by Steve Wise

Subject: [PATCH V6 5/9] RDMA/isert: Limit read depth based on the device max_sge_rd capability

Use the device's max_sge_rd capability to compute the target's read sge
depth. Save both the read and write max_sge values in the isert_conn
struct, and use these when creating RDMA_WRITE/READ work requests.

Signed-off-by: Steve Wise <[email protected]>
---

drivers/infiniband/ulp/isert/ib_isert.c | 24 ++++++++++++++++--------
drivers/infiniband/ulp/isert/ib_isert.h | 3 ++-
2 files changed, 18 insertions(+), 9 deletions(-)

diff --git a/drivers/infiniband/ulp/isert/ib_isert.c b/drivers/infiniband/ulp/isert/ib_isert.c
index 7717009..6af3dd4 100644
--- a/drivers/infiniband/ulp/isert/ib_isert.c
+++ b/drivers/infiniband/ulp/isert/ib_isert.c
@@ -165,7 +165,9 @@ isert_create_qp(struct isert_conn *isert_conn,
* outgoing control PDU responses.
*/
attr.cap.max_send_sge = max(2, device->dev_attr.max_sge - 2);
- isert_conn->max_sge = attr.cap.max_send_sge;
+ isert_conn->max_write_sge = attr.cap.max_send_sge;
+ isert_conn->max_read_sge = min_t(u32, device->dev_attr.max_sge_rd,
+ attr.cap.max_send_sge);

attr.cap.max_recv_sge = 1;
attr.sq_sig_type = IB_SIGNAL_REQ_WR;
@@ -2392,7 +2394,7 @@ isert_put_text_rsp(struct iscsi_cmd *cmd, struct iscsi_conn *conn)
static int
isert_build_rdma_wr(struct isert_conn *isert_conn, struct isert_cmd *isert_cmd,
struct ib_sge *ib_sge, struct ib_send_wr *send_wr,
- u32 data_left, u32 offset)
+ u32 data_left, u32 offset, u32 max_sge)
{
struct iscsi_cmd *cmd = isert_cmd->iscsi_cmd;
struct scatterlist *sg_start, *tmp_sg;
@@ -2403,7 +2405,7 @@ isert_build_rdma_wr(struct isert_conn *isert_conn, struct isert_cmd *isert_cmd,

sg_off = offset / PAGE_SIZE;
sg_start = &cmd->se_cmd.t_data_sg[sg_off];
- sg_nents = min(cmd->se_cmd.t_data_nents - sg_off, isert_conn->max_sge);
+ sg_nents = min(cmd->se_cmd.t_data_nents - sg_off, max_sge);
page_off = offset % PAGE_SIZE;

send_wr->sg_list = ib_sge;
@@ -2449,8 +2451,9 @@ isert_map_rdma(struct iscsi_conn *conn, struct iscsi_cmd *cmd,
struct isert_data_buf *data = &wr->data;
struct ib_send_wr *send_wr;
struct ib_sge *ib_sge;
- u32 offset, data_len, data_left, rdma_write_max, va_offset = 0;
+ u32 offset, data_len, data_left, rdma_max_len, va_offset = 0;
int ret = 0, i, ib_sge_cnt;
+ u32 max_sge;

isert_cmd->tx_desc.isert_cmd = isert_cmd;

@@ -2472,7 +2475,12 @@ isert_map_rdma(struct iscsi_conn *conn, struct iscsi_cmd *cmd,
}
wr->ib_sge = ib_sge;

- wr->send_wr_num = DIV_ROUND_UP(data->nents, isert_conn->max_sge);
+ if (wr->iser_ib_op == ISER_IB_RDMA_WRITE)
+ max_sge = isert_conn->max_write_sge;
+ else
+ max_sge = isert_conn->max_read_sge;
+
+ wr->send_wr_num = DIV_ROUND_UP(data->nents, max_sge);
wr->send_wr = kzalloc(sizeof(struct ib_send_wr) * wr->send_wr_num,
GFP_KERNEL);
if (!wr->send_wr) {
@@ -2482,11 +2490,11 @@ isert_map_rdma(struct iscsi_conn *conn, struct iscsi_cmd *cmd,
}

wr->isert_cmd = isert_cmd;
- rdma_write_max = isert_conn->max_sge * PAGE_SIZE;
+ rdma_max_len = max_sge * PAGE_SIZE;

for (i = 0; i < wr->send_wr_num; i++) {
send_wr = &isert_cmd->rdma_wr.send_wr[i];
- data_len = min(data_left, rdma_write_max);
+ data_len = min(data_left, rdma_max_len);

send_wr->send_flags = 0;
if (wr->iser_ib_op == ISER_IB_RDMA_WRITE) {
@@ -2508,7 +2516,7 @@ isert_map_rdma(struct iscsi_conn *conn, struct iscsi_cmd *cmd,
}

ib_sge_cnt = isert_build_rdma_wr(isert_conn, isert_cmd, ib_sge,
- send_wr, data_len, offset);
+ send_wr, data_len, offset, max_sge);
ib_sge += ib_sge_cnt;

offset += data_len;
diff --git a/drivers/infiniband/ulp/isert/ib_isert.h b/drivers/infiniband/ulp/isert/ib_isert.h
index 9ec23a7..29fde27 100644
--- a/drivers/infiniband/ulp/isert/ib_isert.h
+++ b/drivers/infiniband/ulp/isert/ib_isert.h
@@ -152,7 +152,8 @@ struct isert_conn {
u32 responder_resources;
u32 initiator_depth;
bool pi_support;
- u32 max_sge;
+ u32 max_write_sge;
+ u32 max_read_sge;
char *login_buf;
char *login_req_buf;
char *login_rsp_buf;


2015-07-24 16:18:49

by Steve Wise

Subject: [PATCH V6 6/9] isert: Rename IO functions to more descriptive names

This is in preparation for adding new FRMR-only IO handlers
for devices that support FRMR and not PI.

Signed-off-by: Steve Wise <[email protected]>
---

drivers/infiniband/ulp/isert/ib_isert.c | 28 ++++++++++++++--------------
1 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/drivers/infiniband/ulp/isert/ib_isert.c b/drivers/infiniband/ulp/isert/ib_isert.c
index 6af3dd4..dcd3c55 100644
--- a/drivers/infiniband/ulp/isert/ib_isert.c
+++ b/drivers/infiniband/ulp/isert/ib_isert.c
@@ -48,15 +48,15 @@ static struct workqueue_struct *isert_comp_wq;
static struct workqueue_struct *isert_release_wq;

static void
-isert_unmap_cmd(struct isert_cmd *isert_cmd, struct isert_conn *isert_conn);
+isert_unmap_lkey(struct isert_cmd *isert_cmd, struct isert_conn *isert_conn);
static int
-isert_map_rdma(struct iscsi_conn *conn, struct iscsi_cmd *cmd,
+isert_map_lkey(struct iscsi_conn *conn, struct iscsi_cmd *cmd,
struct isert_rdma_wr *wr);
static void
-isert_unreg_rdma(struct isert_cmd *isert_cmd, struct isert_conn *isert_conn);
+isert_unreg_frmr_pi(struct isert_cmd *isert_cmd, struct isert_conn *isert_conn);
static int
-isert_reg_rdma(struct iscsi_conn *conn, struct iscsi_cmd *cmd,
- struct isert_rdma_wr *wr);
+isert_reg_frmr_pi(struct iscsi_conn *conn, struct iscsi_cmd *cmd,
+ struct isert_rdma_wr *wr);
static int
isert_put_response(struct iscsi_conn *conn, struct iscsi_cmd *cmd);
static int
@@ -367,12 +367,12 @@ isert_create_device_ib_res(struct isert_device *device)
if (dev_attr->device_cap_flags & IB_DEVICE_MEM_MGT_EXTENSIONS &&
dev_attr->device_cap_flags & IB_DEVICE_SIGNATURE_HANDOVER) {
device->use_fastreg = 1;
- device->reg_rdma_mem = isert_reg_rdma;
- device->unreg_rdma_mem = isert_unreg_rdma;
+ device->reg_rdma_mem = isert_reg_frmr_pi;
+ device->unreg_rdma_mem = isert_unreg_frmr_pi;
} else {
device->use_fastreg = 0;
- device->reg_rdma_mem = isert_map_rdma;
- device->unreg_rdma_mem = isert_unmap_cmd;
+ device->reg_rdma_mem = isert_map_lkey;
+ device->unreg_rdma_mem = isert_unmap_lkey;
}

ret = isert_alloc_comps(device, dev_attr);
@@ -1701,7 +1701,7 @@ isert_unmap_data_buf(struct isert_conn *isert_conn, struct isert_data_buf *data)


static void
-isert_unmap_cmd(struct isert_cmd *isert_cmd, struct isert_conn *isert_conn)
+isert_unmap_lkey(struct isert_cmd *isert_cmd, struct isert_conn *isert_conn)
{
struct isert_rdma_wr *wr = &isert_cmd->rdma_wr;

@@ -1726,7 +1726,7 @@ isert_unmap_cmd(struct isert_cmd *isert_cmd, struct isert_conn *isert_conn)
}

static void
-isert_unreg_rdma(struct isert_cmd *isert_cmd, struct isert_conn *isert_conn)
+isert_unreg_frmr_pi(struct isert_cmd *isert_cmd, struct isert_conn *isert_conn)
{
struct isert_rdma_wr *wr = &isert_cmd->rdma_wr;

@@ -2442,7 +2442,7 @@ isert_build_rdma_wr(struct isert_conn *isert_conn, struct isert_cmd *isert_cmd,
}

static int
-isert_map_rdma(struct iscsi_conn *conn, struct iscsi_cmd *cmd,
+isert_map_lkey(struct iscsi_conn *conn, struct iscsi_cmd *cmd,
struct isert_rdma_wr *wr)
{
struct se_cmd *se_cmd = &cmd->se_cmd;
@@ -2848,8 +2848,8 @@ unmap_prot_cmd:
}

static int
-isert_reg_rdma(struct iscsi_conn *conn, struct iscsi_cmd *cmd,
- struct isert_rdma_wr *wr)
+isert_reg_frmr_pi(struct iscsi_conn *conn, struct iscsi_cmd *cmd,
+ struct isert_rdma_wr *wr)
{
struct se_cmd *se_cmd = &cmd->se_cmd;
struct isert_cmd *isert_cmd = iscsit_priv_cmd(cmd);


2015-07-24 16:18:21

by Steve Wise

Subject: [PATCH V6 1/9] RDMA/iser: Limit sg tablesize and max_sectors to device fastreg max depth

Currently the sg tablesize, which dictates fast register page list
depth to use, does not take into account the limits of the rdma device.
So adjust it once we discover the device fastreg max depth limit. Also
adjust the max_sectors based on the resulting sg tablesize.

Signed-off-by: Steve Wise <[email protected]>
---

drivers/infiniband/ulp/iser/iscsi_iser.c | 9 +++++++++
1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.c b/drivers/infiniband/ulp/iser/iscsi_iser.c
index 6a594aa..de8730d 100644
--- a/drivers/infiniband/ulp/iser/iscsi_iser.c
+++ b/drivers/infiniband/ulp/iser/iscsi_iser.c
@@ -640,6 +640,15 @@ iscsi_iser_session_create(struct iscsi_endpoint *ep,
SHOST_DIX_GUARD_CRC);
}

+ /*
+ * Limit the sg_tablesize and max_sectors based on the device
+ * max fastreg page list length.
+ */
+ shost->sg_tablesize = min_t(unsigned short, shost->sg_tablesize,
+ ib_conn->device->dev_attr.max_fast_reg_page_list_len);
+ shost->max_sectors = min_t(unsigned int,
+ 1024, (shost->sg_tablesize * PAGE_SIZE) >> 9);
+
if (iscsi_host_add(shost,
ib_conn->device->ib_device->dma_device)) {
mutex_unlock(&iser_conn->state_mutex);


2015-07-24 16:18:54

by Steve Wise

Subject: [PATCH V6 7/9] isert: Use the device's max fastreg page list depth

Use the device's max_fast_reg_page_list_len attr to size the SQ
and FRMR pool.

Signed-off-by: Steve Wise <[email protected]>
---

drivers/infiniband/ulp/isert/ib_isert.c | 28 ++++++++++++++++++++++++----
drivers/infiniband/ulp/isert/ib_isert.h | 1 +
2 files changed, 25 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/ulp/isert/ib_isert.c b/drivers/infiniband/ulp/isert/ib_isert.c
index dcd3c55..8ae9208 100644
--- a/drivers/infiniband/ulp/isert/ib_isert.c
+++ b/drivers/infiniband/ulp/isert/ib_isert.c
@@ -141,6 +141,14 @@ isert_comp_put(struct isert_comp *comp)
mutex_unlock(&device_list_mutex);
}

+static int isert_max_send_wrs(struct isert_conn *isert_conn)
+{
+ if (isert_conn->max_frpl_len >= ISCSI_ISER_SG_TABLESIZE)
+ return ISERT_QP_MAX_REQ_DTOS;
+ return ISERT_QP_MAX_REQ_DTOS *
+ DIV_ROUND_UP(ISCSI_ISER_SG_TABLESIZE, isert_conn->max_frpl_len);
+}
+
static struct ib_qp *
isert_create_qp(struct isert_conn *isert_conn,
struct isert_comp *comp,
@@ -155,8 +163,6 @@ isert_create_qp(struct isert_conn *isert_conn,
attr.qp_context = isert_conn;
attr.send_cq = comp->cq;
attr.recv_cq = comp->cq;
- attr.cap.max_send_wr = ISERT_QP_MAX_REQ_DTOS;
- attr.cap.max_recv_wr = ISERT_QP_MAX_RECV_DTOS + 1;
/*
* FIXME: Use devattr.max_sge - 2 for max_send_sge as
* work-around for RDMA_READs with ConnectX-2.
@@ -168,7 +174,13 @@ isert_create_qp(struct isert_conn *isert_conn,
isert_conn->max_write_sge = attr.cap.max_send_sge;
isert_conn->max_read_sge = min_t(u32, device->dev_attr.max_sge_rd,
attr.cap.max_send_sge);
+ isert_conn->max_frpl_len = device->dev_attr.max_fast_reg_page_list_len;
+ isert_info("max_write_sge %d max_read_sge %d max_frpl_len %d\n",
+ isert_conn->max_write_sge, isert_conn->max_read_sge,
+ isert_conn->max_frpl_len);

+ attr.cap.max_send_wr = isert_max_send_wrs(isert_conn);
+ attr.cap.max_recv_wr = ISERT_QP_MAX_RECV_DTOS + 1;
attr.cap.max_recv_sge = 1;
attr.sq_sig_type = IB_SIGNAL_REQ_WR;
attr.qp_type = IB_QPT_RC;
@@ -606,12 +618,20 @@ isert_conn_create_fastreg_pool(struct isert_conn *isert_conn)
struct se_session *se_sess = isert_conn->conn->sess->se_sess;
struct se_node_acl *se_nacl = se_sess->se_node_acl;
int i, ret, tag_num;
+
/*
* Setup the number of FRMRs based upon the number of tags
- * available to session in iscsi_target_locate_portal().
+ * available to session in iscsi_target_locate_portal(),
+ * whether pi is supported, and how many FRMRs are needed to
+ * map the largest possible IO given the rdma device limits.
*/
tag_num = max_t(u32, ISCSIT_MIN_TAGS, se_nacl->queue_depth);
- tag_num = (tag_num * 2) + ISCSIT_EXTRA_TAGS;
+ if (isert_conn->pi_support)
+ tag_num *= 2;
+ if (isert_conn->max_frpl_len < ISCSI_ISER_SG_TABLESIZE)
+ tag_num *= DIV_ROUND_UP(ISCSI_ISER_SG_TABLESIZE,
+ isert_conn->max_frpl_len);
+ tag_num += ISCSIT_EXTRA_TAGS;

isert_conn->fr_pool_size = 0;
for (i = 0; i < tag_num; i++) {
diff --git a/drivers/infiniband/ulp/isert/ib_isert.h b/drivers/infiniband/ulp/isert/ib_isert.h
index 29fde27..11cd662 100644
--- a/drivers/infiniband/ulp/isert/ib_isert.h
+++ b/drivers/infiniband/ulp/isert/ib_isert.h
@@ -154,6 +154,7 @@ struct isert_conn {
bool pi_support;
u32 max_write_sge;
u32 max_read_sge;
+ u32 max_frpl_len;
char *login_buf;
char *login_req_buf;
char *login_rsp_buf;


2015-07-24 16:18:27

by Steve Wise

Subject: [PATCH V6 2/9] mlx4, mlx5, mthca: Expose max_sge_rd correctly

From: Sagi Grimberg <[email protected]>

Applications must not assume that max_sge and max_sge_rd are the same;
hence, expose max_sge_rd correctly as well.

Reported-by: Steve Wise <[email protected]>
Signed-off-by: Sagi Grimberg <[email protected]>
---

drivers/infiniband/hw/mlx4/main.c | 1 +
drivers/infiniband/hw/mlx5/main.c | 1 +
drivers/infiniband/hw/mthca/mthca_provider.c | 1 +
3 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index 8be6db8..05166b7 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -229,6 +229,7 @@ static int mlx4_ib_query_device(struct ib_device *ibdev,
props->max_qp_wr = dev->dev->caps.max_wqes - MLX4_IB_SQ_MAX_SPARE;
props->max_sge = min(dev->dev->caps.max_sq_sg,
dev->dev->caps.max_rq_sg);
+ props->max_sge_rd = props->max_sge;
props->max_cq = dev->dev->quotas.cq;
props->max_cqe = dev->dev->caps.max_cqes;
props->max_mr = dev->dev->quotas.mpt;
diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 085c24b..bf27f21 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -273,6 +273,7 @@ static int mlx5_ib_query_device(struct ib_device *ibdev,
sizeof(struct mlx5_wqe_ctrl_seg)) /
sizeof(struct mlx5_wqe_data_seg);
props->max_sge = min(max_rq_sg, max_sq_sg);
+ props->max_sge_rd = props->max_sge;
props->max_cq = 1 << MLX5_CAP_GEN(mdev, log_max_cq);
props->max_cqe = (1 << MLX5_CAP_GEN(mdev, log_max_eq_sz)) - 1;
props->max_mr = 1 << MLX5_CAP_GEN(mdev, log_max_mkey);
diff --git a/drivers/infiniband/hw/mthca/mthca_provider.c b/drivers/infiniband/hw/mthca/mthca_provider.c
index 93ae51d..dc2d48c 100644
--- a/drivers/infiniband/hw/mthca/mthca_provider.c
+++ b/drivers/infiniband/hw/mthca/mthca_provider.c
@@ -97,6 +97,7 @@ static int mthca_query_device(struct ib_device *ibdev, struct ib_device_attr *pr
props->max_qp = mdev->limits.num_qps - mdev->limits.reserved_qps;
props->max_qp_wr = mdev->limits.max_wqes;
props->max_sge = mdev->limits.max_sg;
+ props->max_sge_rd = props->max_sge;
props->max_cq = mdev->limits.num_cqs - mdev->limits.reserved_cqs;
props->max_cqe = mdev->limits.max_cqes;
props->max_mr = mdev->limits.num_mpts - mdev->limits.reserved_mrws;


2015-07-24 16:19:06

by Steve Wise

Subject: [PATCH V6 9/9] isert: Support iWARP transports using FRMRs

Add new register and unregister functions to be used with devices that
support FRMRs, provide a local dma lkey, yet do not support DIF/PI.

isert_reg_frmr() only needs to use FRMRs for RDMA READ since RDMA WRITE
can be handled entirely with the local dma lkey. So for RDMA READ,
it calls isert_reg_read_frmr(). Otherwise it uses the lkey map service
isert_map_lkey() for RDMA WRITEs.

isert_reg_read_frmr() will create a linked list of WR triplets of the
form: INV->FRWR->READ. The number of these triplets is dependent on
the device's fast reg page list length limit.

Signed-off-by: Steve Wise <[email protected]>
---

drivers/infiniband/ulp/isert/ib_isert.c | 224 ++++++++++++++++++++++++++++++-
1 files changed, 218 insertions(+), 6 deletions(-)

diff --git a/drivers/infiniband/ulp/isert/ib_isert.c b/drivers/infiniband/ulp/isert/ib_isert.c
index 47bb790..34b3151 100644
--- a/drivers/infiniband/ulp/isert/ib_isert.c
+++ b/drivers/infiniband/ulp/isert/ib_isert.c
@@ -57,6 +57,11 @@ isert_unreg_frmr_pi(struct isert_cmd *isert_cmd, struct isert_conn *isert_conn);
static int
isert_reg_frmr_pi(struct iscsi_conn *conn, struct iscsi_cmd *cmd,
struct isert_rdma_wr *wr);
+static void
+isert_unreg_frmr(struct isert_cmd *isert_cmd, struct isert_conn *isert_conn);
+static int
+isert_reg_frmr(struct iscsi_conn *conn, struct iscsi_cmd *cmd,
+ struct isert_rdma_wr *wr);
static int
isert_put_response(struct iscsi_conn *conn, struct iscsi_cmd *cmd);
static int
@@ -368,6 +373,7 @@ static int
isert_create_device_ib_res(struct isert_device *device)
{
struct ib_device_attr *dev_attr;
+ int cap_flags;
int ret;

dev_attr = &device->dev_attr;
@@ -376,12 +382,18 @@ isert_create_device_ib_res(struct isert_device *device)
return ret;

/* assign function handlers */
- if (dev_attr->device_cap_flags & IB_DEVICE_MEM_MGT_EXTENSIONS &&
- dev_attr->device_cap_flags & IB_DEVICE_LOCAL_DMA_LKEY &&
- dev_attr->device_cap_flags & IB_DEVICE_SIGNATURE_HANDOVER) {
- device->use_fastreg = 1;
- device->reg_rdma_mem = isert_reg_frmr_pi;
- device->unreg_rdma_mem = isert_unreg_frmr_pi;
+ cap_flags = dev_attr->device_cap_flags;
+ if (cap_flags & IB_DEVICE_MEM_MGT_EXTENSIONS &&
+ cap_flags & IB_DEVICE_LOCAL_DMA_LKEY) {
+ if (cap_flags & IB_DEVICE_SIGNATURE_HANDOVER) {
+ device->use_fastreg = 1;
+ device->reg_rdma_mem = isert_reg_frmr_pi;
+ device->unreg_rdma_mem = isert_unreg_frmr_pi;
+ } else {
+ device->use_fastreg = 1;
+ device->reg_rdma_mem = isert_reg_frmr;
+ device->unreg_rdma_mem = isert_unreg_frmr;
+ }
} else {
device->use_fastreg = 0;
device->reg_rdma_mem = isert_map_lkey;
@@ -1782,6 +1794,50 @@ isert_unreg_frmr_pi(struct isert_cmd *isert_cmd, struct isert_conn *isert_conn)
}

static void
+isert_unreg_frmr(struct isert_cmd *isert_cmd, struct isert_conn *isert_conn)
+{
+ struct isert_rdma_wr *wr = &isert_cmd->rdma_wr;
+
+ if (wr->data.sg) {
+ isert_dbg("Cmd %p unmap_sg op\n", isert_cmd);
+ isert_unmap_data_buf(isert_conn, &wr->data);
+ }
+
+ if (wr->send_wr) {
+ int i;
+
+ isert_dbg("Cmd %p free send_wr wr_num %d\n", isert_cmd,
+ wr->send_wr_num);
+ if (wr->iser_ib_op == ISER_IB_RDMA_READ) {
+ spin_lock_bh(&isert_conn->pool_lock);
+ for (i = 0; i < wr->send_wr_num; i += 3) {
+ struct ib_send_wr *fr_wr;
+ struct fast_reg_descriptor *fr_desc;
+
+ fr_wr = &wr->send_wr[i + 1];
+ fr_desc = (struct fast_reg_descriptor *)
+ (uintptr_t) fr_wr->wr_id;
+ isert_dbg("Cmd %p free fr_desc %p\n", isert_cmd,
+ fr_desc);
+ list_add_tail(&fr_desc->list,
+ &isert_conn->fr_pool);
+ }
+ spin_unlock_bh(&isert_conn->pool_lock);
+ }
+ isert_dbg("Cmd %p free wr->send_wr\n", isert_cmd);
+ kfree(wr->send_wr);
+ wr->send_wr = NULL;
+ wr->send_wr_num = 0;
+ }
+
+ if (wr->ib_sge) {
+ isert_dbg("Cmd %p free ib_sge\n", isert_cmd);
+ kfree(wr->ib_sge);
+ wr->ib_sge = NULL;
+ }
+}
+
+static void
isert_put_cmd(struct isert_cmd *isert_cmd, bool comp_err)
{
struct iscsi_cmd *cmd = isert_cmd->iscsi_cmd;
@@ -2958,6 +3014,162 @@ unmap_cmd:
return ret;
}

+static void
+isert_build_inv_fr_wrs(struct isert_conn *isert_conn,
+ struct isert_cmd *isert_cmd, struct ib_sge *ib_sge,
+ struct ib_send_wr *inv_wr, u32 data_len, u32 offset)
+{
+ struct iscsi_cmd *cmd = isert_cmd->iscsi_cmd;
+ struct scatterlist *sg_start;
+ struct isert_device *device = isert_conn->device;
+ struct ib_device *ib_dev = device->ib_device;
+ struct ib_fast_reg_page_list *frpl;
+ struct fast_reg_descriptor *fr_desc;
+ struct ib_send_wr *fr_wr;
+ u32 sg_off, page_off;
+ int sg_nents;
+ int frpl_len;
+ unsigned long flags;
+
+ sg_off = offset / PAGE_SIZE;
+ sg_start = &cmd->se_cmd.t_data_sg[sg_off];
+ sg_nents = min(cmd->se_cmd.t_data_nents - sg_off,
+ isert_conn->max_frpl_len);
+ page_off = offset % PAGE_SIZE;
+
+ spin_lock_irqsave(&isert_conn->pool_lock, flags);
+ fr_desc = list_first_entry(&isert_conn->fr_pool,
+ struct fast_reg_descriptor, list);
+ list_del(&fr_desc->list);
+ spin_unlock_irqrestore(&isert_conn->pool_lock, flags);
+ fr_desc->ind &= ~ISERT_DATA_KEY_VALID;
+
+ /* Build the INV WR */
+ isert_inv_rkey(inv_wr, fr_desc->data_mr);
+
+ /* Build the FR WR */
+ frpl = fr_desc->data_frpl;
+ frpl_len = isert_map_fr_pagelist(ib_dev, sg_start, sg_nents,
+ &frpl->page_list[0]);
+ fr_wr = inv_wr + 1;
+ memset(fr_wr, 0, sizeof(fr_wr));
+ fr_wr->wr_id = (uintptr_t)fr_desc;
+ fr_wr->opcode = IB_WR_FAST_REG_MR;
+ fr_wr->wr.fast_reg.iova_start = frpl->page_list[0] + page_off;
+ fr_wr->wr.fast_reg.page_list = frpl;
+ fr_wr->wr.fast_reg.page_list_len = frpl_len;
+ fr_wr->wr.fast_reg.page_shift = PAGE_SHIFT;
+ fr_wr->wr.fast_reg.length = data_len;
+ fr_wr->wr.fast_reg.rkey = fr_desc->data_mr->rkey;
+ fr_wr->wr.fast_reg.access_flags = IB_ACCESS_REMOTE_WRITE |
+ IB_ACCESS_LOCAL_WRITE;
+
+ ib_sge->addr = frpl->page_list[0] + page_off;
+ ib_sge->length = data_len;
+ ib_sge->lkey = fr_desc->data_mr->rkey;
+
+ /* Link up the wrs: inv->fr->read */
+ inv_wr->next = fr_wr;
+ fr_wr->next = fr_wr + 1;
+}
+
+static int
+isert_reg_read_frmr(struct iscsi_conn *conn, struct iscsi_cmd *cmd,
+ struct isert_rdma_wr *wr)
+{
+ struct se_cmd *se_cmd = &cmd->se_cmd;
+ struct isert_cmd *isert_cmd = iscsit_priv_cmd(cmd);
+ struct isert_conn *isert_conn = conn->context;
+ struct isert_data_buf *data = &wr->data;
+ struct ib_send_wr *send_wr = NULL;
+ struct ib_sge *ib_sge;
+ u32 offset, data_len, data_left, rdma_max_len, va_offset = 0;
+ int ret = 0, i;
+ u32 num_frwrs;
+
+ isert_cmd->tx_desc.isert_cmd = isert_cmd;
+
+ offset = cmd->write_data_done;
+ ret = isert_map_data_buf(isert_conn, isert_cmd, se_cmd->t_data_sg,
+ se_cmd->t_data_nents, se_cmd->data_length,
+ offset, wr->iser_ib_op, &wr->data);
+ if (ret)
+ return ret;
+
+ data_left = data->len;
+ offset = data->offset;
+ num_frwrs = DIV_ROUND_UP(data->nents, isert_conn->max_frpl_len);
+
+ ib_sge = kcalloc(num_frwrs, sizeof(struct ib_sge), GFP_KERNEL);
+ if (!ib_sge) {
+ ret = -ENOMEM;
+ goto unmap_cmd;
+ }
+ wr->ib_sge = ib_sge;
+
+ /* always do INV + FR + READ for now */
+ wr->send_wr_num = 3 * num_frwrs;
+ wr->send_wr = kcalloc(wr->send_wr_num, sizeof(struct ib_send_wr),
+ GFP_KERNEL);
+ if (!wr->send_wr) {
+ ret = -ENOMEM;
+ goto unmap_cmd;
+ }
+ isert_info("num_frwrs %d send_wr_num %d\n", num_frwrs, wr->send_wr_num);
+
+ wr->isert_cmd = isert_cmd;
+ rdma_max_len = isert_conn->max_frpl_len * PAGE_SIZE;
+
+ for (i = 0; i < wr->send_wr_num; i += 3) {
+
+ /*
+ * i = INV, i+1 = FR, i+2 = READ
+ */
+ send_wr = &isert_cmd->rdma_wr.send_wr[i+2];
+ data_len = min(data_left, rdma_max_len);
+ isert_build_inv_fr_wrs(isert_conn, isert_cmd, ib_sge,
+ send_wr - 2, data_len, offset);
+ send_wr->opcode = IB_WR_RDMA_READ;
+ send_wr->wr.rdma.remote_addr = isert_cmd->write_va + va_offset;
+ send_wr->wr.rdma.rkey = isert_cmd->write_stag;
+ send_wr->sg_list = ib_sge;
+ send_wr->num_sge = 1;
+ if (i + 3 == wr->send_wr_num) {
+ send_wr->wr_id = (uintptr_t)&isert_cmd->tx_desc;
+ send_wr->send_flags = IB_SEND_SIGNALED;
+ } else {
+ send_wr->next = send_wr + 1;
+ }
+
+ ib_sge++;
+ offset += data_len;
+ va_offset += data_len;
+ data_left -= data_len;
+ }
+ isert_info("send_wr->send_flags 0x%x\n", send_wr->send_flags);
+
+ return 0;
+unmap_cmd:
+ isert_unmap_data_buf(isert_conn, data);
+
+ return ret;
+}
+
+static int
+isert_reg_frmr(struct iscsi_conn *conn, struct iscsi_cmd *cmd,
+ struct isert_rdma_wr *wr)
+{
+
+ /*
+ * Use the local dma lkey for Writes since it only requires local
+ * access flags. For reads, build up inv+fr+read
+ * wr chains as needed.
+ */
+ if (wr->iser_ib_op == ISER_IB_RDMA_WRITE)
+ return isert_map_lkey(conn, cmd, wr);
+ return isert_reg_read_frmr(conn, cmd, wr);
+}
+
static int
isert_put_datain(struct iscsi_conn *conn, struct iscsi_cmd *cmd)
{


2015-07-24 16:19:00

by Steve Wise

Subject: [PATCH V6 8/9] isert: Use local_dma_lkey whenever possible.

No need to allocate a dma_mr if the device provides a local_dma_lkey.

Signed-off-by: Steve Wise <[email protected]>
---

drivers/infiniband/ulp/isert/ib_isert.c | 47 ++++++++++++++++++-------------
drivers/infiniband/ulp/isert/ib_isert.h | 1 +
2 files changed, 28 insertions(+), 20 deletions(-)

diff --git a/drivers/infiniband/ulp/isert/ib_isert.c b/drivers/infiniband/ulp/isert/ib_isert.c
index 8ae9208..47bb790 100644
--- a/drivers/infiniband/ulp/isert/ib_isert.c
+++ b/drivers/infiniband/ulp/isert/ib_isert.c
@@ -249,7 +249,7 @@ isert_alloc_rx_descriptors(struct isert_conn *isert_conn)
rx_sg = &rx_desc->rx_sg;
rx_sg->addr = rx_desc->dma_addr;
rx_sg->length = ISER_RX_PAYLOAD_SIZE;
- rx_sg->lkey = device->mr->lkey;
+ rx_sg->lkey = device->local_dma_lkey;
}

isert_conn->rx_desc_head = 0;
@@ -375,8 +375,9 @@ isert_create_device_ib_res(struct isert_device *device)
if (ret)
return ret;

- /* asign function handlers */
+ /* assign function handlers */
if (dev_attr->device_cap_flags & IB_DEVICE_MEM_MGT_EXTENSIONS &&
+ dev_attr->device_cap_flags & IB_DEVICE_LOCAL_DMA_LKEY &&
dev_attr->device_cap_flags & IB_DEVICE_SIGNATURE_HANDOVER) {
device->use_fastreg = 1;
device->reg_rdma_mem = isert_reg_frmr_pi;
@@ -399,12 +400,17 @@ isert_create_device_ib_res(struct isert_device *device)
goto out_cq;
}

- device->mr = ib_get_dma_mr(device->pd, IB_ACCESS_LOCAL_WRITE);
- if (IS_ERR(device->mr)) {
- ret = PTR_ERR(device->mr);
- isert_err("failed to create dma mr, device %p, ret=%d\n",
- device, ret);
- goto out_mr;
+ if (device->use_fastreg)
+ device->local_dma_lkey = device->ib_device->local_dma_lkey;
+ else {
+ device->mr = ib_get_dma_mr(device->pd, IB_ACCESS_LOCAL_WRITE);
+ if (IS_ERR(device->mr)) {
+ ret = PTR_ERR(device->mr);
+ isert_err("failed to create dma mr, device %p, ret=%d\n",
+ device, ret);
+ goto out_mr;
+ }
+ device->local_dma_lkey = device->mr->lkey;
}

/* Check signature cap */
@@ -425,7 +431,8 @@ isert_free_device_ib_res(struct isert_device *device)
{
isert_info("device %p\n", device);

- ib_dereg_mr(device->mr);
+ if (!device->use_fastreg)
+ ib_dereg_mr(device->mr);
ib_dealloc_pd(device->pd);
isert_free_comps(device);
}
@@ -1108,8 +1115,8 @@ isert_create_send_desc(struct isert_conn *isert_conn,
tx_desc->num_sge = 1;
tx_desc->isert_cmd = isert_cmd;

- if (tx_desc->tx_sg[0].lkey != device->mr->lkey) {
- tx_desc->tx_sg[0].lkey = device->mr->lkey;
+ if (tx_desc->tx_sg[0].lkey != device->local_dma_lkey) {
+ tx_desc->tx_sg[0].lkey = device->local_dma_lkey;
isert_dbg("tx_desc %p lkey mismatch, fixing\n", tx_desc);
}
}
@@ -1132,7 +1139,7 @@ isert_init_tx_hdrs(struct isert_conn *isert_conn,
tx_desc->dma_addr = dma_addr;
tx_desc->tx_sg[0].addr = tx_desc->dma_addr;
tx_desc->tx_sg[0].length = ISER_HEADERS_LEN;
- tx_desc->tx_sg[0].lkey = device->mr->lkey;
+ tx_desc->tx_sg[0].lkey = device->local_dma_lkey;

isert_dbg("Setup tx_sg[0].addr: 0x%llx length: %u lkey: 0x%x\n",
tx_desc->tx_sg[0].addr, tx_desc->tx_sg[0].length,
@@ -1165,7 +1172,7 @@ isert_rdma_post_recvl(struct isert_conn *isert_conn)
memset(&sge, 0, sizeof(struct ib_sge));
sge.addr = isert_conn->login_req_dma;
sge.length = ISER_RX_LOGIN_SIZE;
- sge.lkey = isert_conn->device->mr->lkey;
+ sge.lkey = isert_conn->device->local_dma_lkey;

isert_dbg("Setup sge: addr: %llx length: %d 0x%08x\n",
sge.addr, sge.length, sge.lkey);
@@ -1215,7 +1222,7 @@ isert_put_login_tx(struct iscsi_conn *conn, struct iscsi_login *login,

tx_dsg->addr = isert_conn->login_rsp_dma;
tx_dsg->length = length;
- tx_dsg->lkey = isert_conn->device->mr->lkey;
+ tx_dsg->lkey = isert_conn->device->local_dma_lkey;
tx_desc->num_sge = 2;
}
if (!login->login_failed) {
@@ -1734,6 +1741,7 @@ isert_unmap_lkey(struct isert_cmd *isert_cmd, struct isert_conn *isert_conn)

if (wr->send_wr) {
isert_dbg("Cmd %p free send_wr\n", isert_cmd);
+ wr->send_wr_num = 0;
kfree(wr->send_wr);
wr->send_wr = NULL;
}
@@ -1967,7 +1975,6 @@ isert_completion_rdma_read(struct iser_tx_desc *tx_desc,
iscsit_stop_dataout_timer(cmd);
device->unreg_rdma_mem(isert_cmd, isert_conn);
cmd->write_data_done = wr->data.len;
- wr->send_wr_num = 0;

isert_dbg("Cmd: %p RDMA_READ comp calling execute_cmd\n", isert_cmd);
spin_lock_bh(&cmd->istate_lock);
@@ -2232,7 +2239,7 @@ isert_put_response(struct iscsi_conn *conn, struct iscsi_cmd *cmd)
isert_cmd->pdu_buf_len = pdu_len;
tx_dsg->addr = isert_cmd->pdu_buf_dma;
tx_dsg->length = pdu_len;
- tx_dsg->lkey = device->mr->lkey;
+ tx_dsg->lkey = device->local_dma_lkey;
isert_cmd->tx_desc.num_sge = 2;
}

@@ -2360,7 +2367,7 @@ isert_put_reject(struct iscsi_cmd *cmd, struct iscsi_conn *conn)
isert_cmd->pdu_buf_len = ISCSI_HDR_LEN;
tx_dsg->addr = isert_cmd->pdu_buf_dma;
tx_dsg->length = ISCSI_HDR_LEN;
- tx_dsg->lkey = device->mr->lkey;
+ tx_dsg->lkey = device->local_dma_lkey;
isert_cmd->tx_desc.num_sge = 2;

isert_init_send_wr(isert_conn, isert_cmd, send_wr);
@@ -2401,7 +2408,7 @@ isert_put_text_rsp(struct iscsi_cmd *cmd, struct iscsi_conn *conn)
isert_cmd->pdu_buf_len = txt_rsp_len;
tx_dsg->addr = isert_cmd->pdu_buf_dma;
tx_dsg->length = txt_rsp_len;
- tx_dsg->lkey = device->mr->lkey;
+ tx_dsg->lkey = device->local_dma_lkey;
isert_cmd->tx_desc.num_sge = 2;
}
isert_init_send_wr(isert_conn, isert_cmd, send_wr);
@@ -2442,7 +2449,7 @@ isert_build_rdma_wr(struct isert_conn *isert_conn, struct isert_cmd *isert_cmd,
ib_sge->addr = ib_sg_dma_address(ib_dev, tmp_sg) + page_off;
ib_sge->length = min_t(u32, data_left,
ib_sg_dma_len(ib_dev, tmp_sg) - page_off);
- ib_sge->lkey = device->mr->lkey;
+ ib_sge->lkey = device->local_dma_lkey;

isert_dbg("RDMA ib_sge: addr: 0x%llx length: %u lkey: %x\n",
ib_sge->addr, ib_sge->length, ib_sge->lkey);
@@ -2622,7 +2629,7 @@ isert_fast_reg_mr(struct isert_conn *isert_conn,
u32 page_off;

if (mem->dma_nents == 1) {
- sge->lkey = device->mr->lkey;
+ sge->lkey = device->local_dma_lkey;
sge->addr = ib_sg_dma_address(ib_dev, &mem->sg[0]);
sge->length = ib_sg_dma_len(ib_dev, &mem->sg[0]);
isert_dbg("sge: addr: 0x%llx length: %u lkey: %x\n",
diff --git a/drivers/infiniband/ulp/isert/ib_isert.h b/drivers/infiniband/ulp/isert/ib_isert.h
index 11cd662..25a0946 100644
--- a/drivers/infiniband/ulp/isert/ib_isert.h
+++ b/drivers/infiniband/ulp/isert/ib_isert.h
@@ -212,6 +212,7 @@ struct isert_device {
struct ib_device *ib_device;
struct ib_pd *pd;
struct ib_mr *mr;
+ u32 local_dma_lkey;
struct isert_comp *comps;
int comps_used;
struct list_head dev_node;


2015-07-24 16:41:33

by Jason Gunthorpe

Subject: Re: [PATCH V6 1/9] RDMA/iser: Limit sg tablesize and max_sectors to device fastreg max depth

On Fri, Jul 24, 2015 at 11:18:21AM -0500, Steve Wise wrote:
> Currently the sg tablesize, which dictates fast register page list
> depth to use, does not take into account the limits of the rdma device.
> So adjust it once we discover the device fastreg max depth limit. Also
> adjust the max_sectors based on the resulting sg tablesize.

Huh. How does this relate to the max_page_list_len argument:

struct ib_mr *ib_alloc_fast_reg_mr(struct ib_pd *pd, int max_page_list_len)

Shouldn't max_fast_reg_page_list_len be checked during the above?

Ie does this still make sense:

drivers/infiniband/ulp/iser/iser_verbs.c: desc->data_mr = ib_alloc_fast_reg_mr(pd, ISCSI_ISER_SG_TABLESIZE + 1);

?

The only ULP that checks this is SRP, so basically, all our ULPs are
probably quietly broken? cxgb3 has a limit of 10 (!?!?!!)

Jason

2015-07-24 16:49:15

by Jason Gunthorpe

Subject: Re: [PATCH V6 8/9] isert: Use local_dma_lkey whenever possible.

On Fri, Jul 24, 2015 at 11:18:59AM -0500, Steve Wise wrote:
> No need to allocate a dma_mr if the device provides a local_dma_lkey.

It is probably safe to put your series on top of mine, which
incorporates this patch already.

https://github.com/jgunthorpe/linux/commits/remove-ib_get_dma_mr

Jason

2015-07-24 16:57:27

by Jason Gunthorpe

Subject: Re: [PATCH V6 9/9] isert: Support iWARP transports using FRMRs

On Fri, Jul 24, 2015 at 11:19:05AM -0500, Steve Wise wrote:
> Add new register and unregister functions to be used with devices that
> support FRMRs, provide a local dma lkey, yet do not support DIF/PI.
>
> isert_reg_frmr() only needs to use FRMRs for RDMA READ since RDMA WRITE
> can be handled entirely with the local dma lkey. So for RDMA READ,
> it calls isert_reg_read_frmr(). Otherwise is uses the lkey map service
> isert_map_lkey() for RDMA WRITEs.
>
> isert_reg_read_frmr() will create a linked list of WR triplets of the
> form: INV->FRWR->READ. The number of these triplets is dependent on
> the devices fast reg page list length limit.

That ordering seems strange, surely it should be

FRWR->READ->INV

And use IB_WR_RDMA_READ_WITH_INV if possible?

ACCESS_REMOTE rkeys should not be left open across the FROM_DEVICE
DMA flush.
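
For illustration, a minimal sketch of that ordering, reusing the old-style
ib_send_wr layout and the fr_desc/ib_sge/isert_cmd names from patch 9/9;
the field usage here is an assumption for discussion, not tested code:

	struct ib_send_wr fr_wr, read_wr;

	/* FRWR registers the local read sink through the pool MR's rkey */
	memset(&fr_wr, 0, sizeof(fr_wr));
	fr_wr.opcode = IB_WR_FAST_REG_MR;
	fr_wr.wr.fast_reg.rkey = fr_desc->data_mr->rkey;
	/* iova_start, page_list, page_list_len, length, access_flags as in 9/9 */

	/* READ_WITH_INV invalidates that rkey when the read completes */
	memset(&read_wr, 0, sizeof(read_wr));
	read_wr.opcode = IB_WR_RDMA_READ_WITH_INV;
	read_wr.ex.invalidate_rkey = fr_desc->data_mr->rkey;
	read_wr.wr.rdma.remote_addr = isert_cmd->write_va;
	read_wr.wr.rdma.rkey = isert_cmd->write_stag;
	read_wr.sg_list = ib_sge;	/* single SGE mapped through the FRMR */
	read_wr.num_sge = 1;

	fr_wr.next = &read_wr;		/* FRWR -> READ_WITH_INV, no separate INV WR */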

> /* assign function handlers */
> - if (dev_attr->device_cap_flags & IB_DEVICE_MEM_MGT_EXTENSIONS &&
> - dev_attr->device_cap_flags & IB_DEVICE_LOCAL_DMA_LKEY &&
> - dev_attr->device_cap_flags & IB_DEVICE_SIGNATURE_HANDOVER) {
> - device->use_fastreg = 1;
> - device->reg_rdma_mem = isert_reg_frmr_pi;
> - device->unreg_rdma_mem = isert_unreg_frmr_pi;
> + cap_flags = dev_attr->device_cap_flags;
> + if (cap_flags & IB_DEVICE_MEM_MGT_EXTENSIONS &&
> + cap_flags & IB_DEVICE_LOCAL_DMA_LKEY) {
> + if (cap_flags & IB_DEVICE_SIGNATURE_HANDOVER) {
> + device->use_fastreg = 1;
> + device->reg_rdma_mem = isert_reg_frmr_pi;
> + device->unreg_rdma_mem = isert_unreg_frmr_pi;
> + } else {
> + device->use_fastreg = 1;
> + device->reg_rdma_mem = isert_reg_frmr;
> + device->unreg_rdma_mem = isert_unreg_frmr;
> + }

The use of FRWR for RDMA READ should be iWarp specific, IB shouldn't
pay that overhead. I am expecting to see a cap_rdma_read_rkey or
something in here ?
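
One possible shape for such a helper, purely as a sketch (the name is a
placeholder from this discussion, nothing like it exists in the tree yet, and
keying it off rdma_protocol_iwarp() is an assumption):

	/* Hypothetical per-port capability, mirroring the existing
	 * rdma_cap_*() helpers; illustrative only. */
	static inline bool rdma_cap_rdma_read_rkey(struct ib_device *dev, u8 port_num)
	{
		/* iWARP requires the read sink to be mapped via an rkey MR
		 * with REMOTE_WRITE; IB does not. */
		return rdma_protocol_iwarp(dev, port_num);
	}

isert's handler selection could then branch on the transport rather than on
device_cap_flags alone, so IB devices keep the cheaper lkey path.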

Jason

2015-07-24 18:40:11

by Steve Wise

Subject: RE: [PATCH V6 1/9] RDMA/iser: Limit sg tablesize and max_sectors to device fastreg max depth



> -----Original Message-----
> From: Jason Gunthorpe [mailto:[email protected]]
> Sent: Friday, July 24, 2015 11:41 AM
> To: Steve Wise
> Cc: [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; linux-
> [email protected]; [email protected]; [email protected]; [email protected]; [email protected]
> Subject: Re: [PATCH V6 1/9] RDMA/iser: Limit sg tablesize and max_sectors to device fastreg max depth
>
> On Fri, Jul 24, 2015 at 11:18:21AM -0500, Steve Wise wrote:
> > Currently the sg tablesize, which dictates fast register page list
> > depth to use, does not take into account the limits of the rdma device.
> > So adjust it once we discover the device fastreg max depth limit. Also
> > adjust the max_sectors based on the resulting sg tablesize.
>
> Huh. How does this relate to the max_page_list_len argument:
>
> struct ib_mr *ib_alloc_fast_reg_mr(struct ib_pd *pd, int max_page_list_len)
>
> Shouldn't max_fast_reg_page_list_len be checked during the above?
>
> Ie does this still make sense:
>
> drivers/infiniband/ulp/iser/iser_verbs.c: desc->data_mr = ib_alloc_fast_reg_mr(pd, ISCSI_ISER_SG_TABLESIZE + 1);
>
> ?
>
> The only ULP that checks this is SRP, so basically, all our ULPs are
> probably quietly broken? cxgb3 has a limit of 10 (!?!?!!)
>

Yeah, it seems like some drivers need to enforce this in ib_alloc_fast_reg_mr() as well as ib_alloc_fast_reg_page_list(), and ULPs
need to not exceed the device max.

I will fix iser to limit the mr and page_list allocation based on the device max.
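
Roughly, that would mean clamping the fastreg descriptor allocation to the
device limit, something like the sketch below (reusing desc, pd, ib_device,
device->dev_attr and ISCSI_ISER_SG_TABLESIZE from the iser_verbs.c line quoted
above; untested, for illustration only):

	unsigned int max_pages = min_t(unsigned int, ISCSI_ISER_SG_TABLESIZE + 1,
				       device->dev_attr.max_fast_reg_page_list_len);

	/* Never ask the device for a deeper page list than it advertises */
	desc->data_frpl = ib_alloc_fast_reg_page_list(ib_device, max_pages);
	if (IS_ERR(desc->data_frpl))
		return PTR_ERR(desc->data_frpl);

	desc->data_mr = ib_alloc_fast_reg_mr(pd, max_pages);
	if (IS_ERR(desc->data_mr)) {
		ib_free_fast_reg_page_list(desc->data_frpl);
		return PTR_ERR(desc->data_mr);
	}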

Steve.


2015-07-24 18:41:26

by Steve Wise

Subject: RE: [PATCH V6 8/9] isert: Use local_dma_lkey whenever possible.



> -----Original Message-----
> From: Jason Gunthorpe [mailto:[email protected]]
> Sent: Friday, July 24, 2015 11:49 AM
> To: Steve Wise
> Cc: [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; linux-
> [email protected]; [email protected]; [email protected]; [email protected]; [email protected]
> Subject: Re: [PATCH V6 8/9] isert: Use local_dma_lkey whenever possible.
>
> On Fri, Jul 24, 2015 at 11:18:59AM -0500, Steve Wise wrote:
> > No need to allocate a dma_mr if the device provides a local_dma_lkey.
>
> It is probably safe to put your series on top of mine, which
> incorporates this patch already.
>
> https://github.com/jgunthorpe/linux/commits/remove-ib_get_dma_mr
>
> Jason

I will rebase on this.




2015-07-24 18:48:03

by Steve Wise

Subject: RE: [PATCH V6 9/9] isert: Support iWARP transports using FRMRs



> -----Original Message-----
> From: Jason Gunthorpe [mailto:[email protected]]
> Sent: Friday, July 24, 2015 11:57 AM
> To: Steve Wise
> Cc: [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; linux-
> [email protected]; [email protected]; [email protected]; [email protected]; [email protected]
> Subject: Re: [PATCH V6 9/9] isert: Support iWARP transports using FRMRs
>
> On Fri, Jul 24, 2015 at 11:19:05AM -0500, Steve Wise wrote:
> > Add new register and unregister functions to be used with devices that
> > support FRMRs, provide a local dma lkey, yet do not support DIF/PI.
> >
> > isert_reg_frmr() only needs to use FRMRs for RDMA READ since RDMA WRITE
> > can be handled entirely with the local dma lkey. So for RDMA READ,
> > it calls isert_reg_read_frmr(). Otherwise is uses the lkey map service
> > isert_map_lkey() for RDMA WRITEs.
> >
> > isert_reg_read_frmr() will create a linked list of WR triplets of the
> > form: INV->FRWR->READ. The number of these triplets is dependent on
> > the devices fast reg page list length limit.
>
> That ordering seems strange, surely it should be
>
> FRWR->READ->INV
>
> And use IB_WR_RDMA_READ_WITH_INV if possible?
>
> ACCESS_REMOTE rkey's should not be left open across the FROM_DEVICE
> DMA flush.
>

You're correct. I was trying to simplify the IO by always invalidating before re-registering, but it does leave the FRMR
registered and exposes a security hole.

I'll have to rework this.

> > /* assign function handlers */
> > - if (dev_attr->device_cap_flags & IB_DEVICE_MEM_MGT_EXTENSIONS &&
> > - dev_attr->device_cap_flags & IB_DEVICE_LOCAL_DMA_LKEY &&
> > - dev_attr->device_cap_flags & IB_DEVICE_SIGNATURE_HANDOVER) {
> > - device->use_fastreg = 1;
> > - device->reg_rdma_mem = isert_reg_frmr_pi;
> > - device->unreg_rdma_mem = isert_unreg_frmr_pi;
> > + cap_flags = dev_attr->device_cap_flags;
> > + if (cap_flags & IB_DEVICE_MEM_MGT_EXTENSIONS &&
> > + cap_flags & IB_DEVICE_LOCAL_DMA_LKEY) {
> > + if (cap_flags & IB_DEVICE_SIGNATURE_HANDOVER) {
> > + device->use_fastreg = 1;
> > + device->reg_rdma_mem = isert_reg_frmr_pi;
> > + device->unreg_rdma_mem = isert_unreg_frmr_pi;
> > + } else {
> > + device->use_fastreg = 1;
> > + device->reg_rdma_mem = isert_reg_frmr;
> > + device->unreg_rdma_mem = isert_unreg_frmr;
> > + }
>
> The use of FRWR for RDMA READ should be iWarp specific, IB shouldn't
> pay that overhead. I am expecting to see a cap_rdma_read_rkey or
> something in here ?
>

Ok. But cap_rdma_read_rkey() doesn't really describe the requirement. The requirement is rkey + REMOTE_WRITE. So it is more like
rdma_cap_read_requires_remote_write() which is ugly and too long (but descriptive)...





2015-07-24 18:51:27

by Steve Wise

Subject: RE: [PATCH V6 4/9] svcrdma: Use max_sge_rd for destination read depths



> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of Steve Wise
> Sent: Friday, July 24, 2015 11:19 AM
> To: [email protected]
> Cc: [email protected]; [email protected]; [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected]; [email protected]
> Subject: [PATCH V6 4/9] svcrdma: Use max_sge_rd for destination read depths
>
> Signed-off-by: Steve Wise <[email protected]>
> ---
>
> include/linux/sunrpc/svc_rdma.h | 1 +
> net/sunrpc/xprtrdma/svc_rdma_recvfrom.c | 12 +-----------
> net/sunrpc/xprtrdma/svc_rdma_transport.c | 4 ++++
> 3 files changed, 6 insertions(+), 11 deletions(-)
>
> diff --git a/include/linux/sunrpc/svc_rdma.h b/include/linux/sunrpc/svc_rdma.h
> index cb94ee4..83211bc 100644
> --- a/include/linux/sunrpc/svc_rdma.h
> +++ b/include/linux/sunrpc/svc_rdma.h
> @@ -132,6 +132,7 @@ struct svcxprt_rdma {
> struct list_head sc_accept_q; /* Conn. waiting accept */
> int sc_ord; /* RDMA read limit */
> int sc_max_sge;
> + int sc_max_sge_rd; /* max sge for read target */
>
> int sc_sq_depth; /* Depth of SQ */
> atomic_t sc_sq_count; /* Number of SQ WR on queue */
> diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> index 2e1348b..cb51742 100644
> --- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> +++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> @@ -115,15 +115,6 @@ static void rdma_build_arg_xdr(struct svc_rqst *rqstp,
> rqstp->rq_arg.tail[0].iov_len = 0;
> }
>
> -static int rdma_read_max_sge(struct svcxprt_rdma *xprt, int sge_count)
> -{
> - if (!rdma_cap_read_multi_sge(xprt->sc_cm_id->device,
> - xprt->sc_cm_id->port_num))
> - return 1;
> - else
> - return min_t(int, sge_count, xprt->sc_max_sge);
> -}
> -
> /* Issue an RDMA_READ using the local lkey to map the data sink */
> int rdma_read_chunk_lcl(struct svcxprt_rdma *xprt,
> struct svc_rqst *rqstp,
> @@ -144,8 +135,7 @@ int rdma_read_chunk_lcl(struct svcxprt_rdma *xprt,
>
> ctxt->direction = DMA_FROM_DEVICE;
> ctxt->read_hdr = head;
> - pages_needed =
> - min_t(int, pages_needed, rdma_read_max_sge(xprt, pages_needed));
> + pages_needed = min_t(int, pages_needed, xprt->sc_max_sge_rd);
> read = min_t(int, pages_needed << PAGE_SHIFT, rs_length);
>
> for (pno = 0; pno < pages_needed; pno++) {
> diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
> index 6b36279..fdc850f 100644
> --- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
> +++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
> @@ -872,6 +872,8 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
> * capabilities of this particular device */
> newxprt->sc_max_sge = min((size_t)devattr.max_sge,
> (size_t)RPCSVC_MAXPAGES);
> + newxprt->sc_max_sge_rd = min_t(size_t, devattr.max_sge_rd,
> + RPCSVC_MAXPAGES);
> newxprt->sc_max_requests = min((size_t)devattr.max_qp_wr,
> (size_t)svcrdma_max_requests);
> newxprt->sc_sq_depth = RPCRDMA_SQ_DEPTH_MULT * newxprt->sc_max_requests;
> @@ -1046,6 +1048,7 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
> " remote_ip : %pI4\n"
> " remote_port : %d\n"
> " max_sge : %d\n"
> + " max_sge_rd : %d\n"
> " sq_depth : %d\n"
> " max_requests : %d\n"
> " ord : %d\n",
> @@ -1059,6 +1062,7 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
> ntohs(((struct sockaddr_in *)&newxprt->sc_cm_id->
> route.addr.dst_addr)->sin_port),
> newxprt->sc_max_sge,
> + newxprt->sc_max_sge_rd,
> newxprt->sc_sq_depth,
> newxprt->sc_max_requests,
> newxprt->sc_ord);
>

With the above patch change, we have no more users of the recently created rdma_cap_read_multi_sge(). Should I add a patch to remove it?


2015-07-24 19:14:08

by Jason Gunthorpe

Subject: Re: [PATCH V6 1/9] RDMA/iser: Limit sg tablesize and max_sectors to device fastreg max depth

On Fri, Jul 24, 2015 at 01:40:17PM -0500, Steve Wise wrote:
> > Huh. How does this relate to the max_page_list_len argument:
> >
> > struct ib_mr *ib_alloc_fast_reg_mr(struct ib_pd *pd, int max_page_list_len)
> >
> > Shouldn't max_fast_reg_page_list_len be checked during the above?
> >
> > Ie does this still make sense:
> >
> > drivers/infiniband/ulp/iser/iser_verbs.c: desc->data_mr = ib_alloc_fast_reg_mr(pd, ISCSI_ISER_SG_TABLESIZE + 1);
> >
> > ?
> >
> > The only ULP that checks this is SRP, so basically, all our ULPs are
> > probably quietly broken? cxgb3 has a limit of 10 (!?!?!!)
> >
>
> Yea seems like some drivers need to enforce this in ib_alloc_fast_reg_mr() as well as ib_alloc_fast_reg_page_list(), and ULPs need
> to not exceed the device max.

Great, Sagi, can you incorporate that in your series so that
ib_alloc_mr's max_entries is checked against
max_fast_reg_page_list_len and EINVALs if it is too great?

Also, after your series can we drop ib_alloc_fast_reg_page_list, and
then also the associated WR stuff?

Thanks,
Jason

2015-07-24 19:24:17

by Jason Gunthorpe

Subject: Re: [PATCH V6 9/9] isert: Support iWARP transports using FRMRs

On Fri, Jul 24, 2015 at 01:48:09PM -0500, Steve Wise wrote:
> > The use of FRWR for RDMA READ should be iWarp specific, IB shouldn't
> > pay that overhead. I am expecting to see a cap_rdma_read_rkey or
> > something in here ?
>
> Ok. But cap_rdma_read_rkey() doesn't really describe the
> requirement. The requirement is rkey + REMOTE_WRITE. So it is more
> like rdma_cap_read_requires_remote_write() which is ugly and too
> long (but descriptive)...

I don't care much what name you pick, just jam something like this in
the description

If set then RDMA_READ must be performed by mapping the local
buffers through a rkey MR with ACCESS_REMOTE_WRITE enabled.
The rkey of this MR should be passed in as the sg_list's lkey for
IB_WR_RDMA_READ_WITH_INV.

FRWR should be used to register the buffer in the send queue,
and the read should be issued using IB_WR_RDMA_READ_WITH_INV (xx
can we just implicitly rely on this? Are there any iWarp cards that
support FRWR but not WITH_INV?)

Finally, only a single SGE can be used with RDMA_READ; all scattering
must be accomplished with the MR.

This quite dramatically changes what is an allowed scatter for the
transfer: IB can support arbitrary unaligned S/G lists, while this is
now forced into gapless page-aligned elements.

Your patch takes care of this? And only impacts IB?

Jason

2015-07-24 19:57:38

by Steve Wise

Subject: RE: [PATCH V6 9/9] isert: Support iWARP transports using FRMRs



> -----Original Message-----
> From: Jason Gunthorpe [mailto:[email protected]]
> Sent: Friday, July 24, 2015 2:24 PM
> To: Steve Wise
> Cc: [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; linux-
> [email protected]; [email protected]; [email protected]; [email protected]; [email protected]
> Subject: Re: [PATCH V6 9/9] isert: Support iWARP transports using FRMRs
>
> On Fri, Jul 24, 2015 at 01:48:09PM -0500, Steve Wise wrote:
> > > The use of FRWR for RDMA READ should be iWarp specific, IB shouldn't
> > > pay that overhead. I am expecting to see a cap_rdma_read_rkey or
> > > something in here ?
> >
> > Ok. But cap_rdma_read_rkey() doesn't really describe the
> > requirement. The requirement is rkey + REMOTE_WRITE. So it is more
> > like rdma_cap_read_requires_remote_write() which is ugly and too
> > long (but descriptive)...
>
> I don't care much what name you pick, just jam something like this in
> the description
>
> If set then RDMA_READ must be performed by mapping the local
> buffers through a rkey MR with ACCESS_REMOTE_WRITE enabled.
> The rkey of this MR should be passed in as the sg_lists's lkey for
> IB_WR_RDMA_READ_WITH_INV.
>
> FRWR should be used to register the buffer in the send queue,
> and the read should be issued using IB_WR_RDMA_READ_WITH_INV (xx
> can we just implicitly rely on this? Are there any iWarp cards that
> support FRWR but not WITH_INV?)
>

No. And iWARP devices must support READ_WITH_INV from my reading of the iWARP verbs spec.

I will add all these comments and make use of READ_WITH_INV.

> Finally, only a single SGE can be used with RDMA_READ, all scattering
> must be accomplished with the MR.
>
> This quite dramatically changes what is an allowed scatter for the
> transfer, IB can support arbitary unaligned S/G lists, while this is
> now forced into gapless page aligned elements.
>
> Your patch takes care of this? And only impacts IB?

Did you mean "only impacts iWARP?"

Yes, the patch takes care of sg->fr page list packing. It uses the existing isert_map_fr_pagelist() to pack the sg into the
fastreg page list. This same routine is used by the IB FRMR/PI reg/unreg routines as well.

None of this impacts isert over IB, assuming the suggested change above to only use the new reg/unreg functions for devices that
need the iWARP style of read. I.e., IB devices will either use the existing FRMR/PI methods if they support FRMR/LOCAL_DMA_LKEY/PI
(mlx5) or they will use the existing DMA_MR methods (all the rest of the IB devices). iWARP devices will get my new reg/unreg
methods...

Steve.




2015-07-24 22:10:58

by Steve Wise

Subject: RE: [PATCH V6 9/9] isert: Support iWARP transports using FRMRs



> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of Steve Wise
> Sent: Friday, July 24, 2015 2:58 PM
> To: 'Jason Gunthorpe'
> Cc: [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; linux-
> [email protected]; [email protected]; [email protected]; [email protected]; [email protected]
> Subject: RE: [PATCH V6 9/9] isert: Support iWARP transports using FRMRs
>
>
>
> > -----Original Message-----
> > From: Jason Gunthorpe [mailto:[email protected]]
> > Sent: Friday, July 24, 2015 2:24 PM
> > To: Steve Wise
> > Cc: [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; linux-
> > [email protected]; [email protected]; [email protected]; [email protected]; [email protected]
> > Subject: Re: [PATCH V6 9/9] isert: Support iWARP transports using FRMRs
> >
> > On Fri, Jul 24, 2015 at 01:48:09PM -0500, Steve Wise wrote:
> > > > The use of FRWR for RDMA READ should be iWarp specific, IB shouldn't
> > > > pay that overhead. I am expecting to see a cap_rdma_read_rkey or
> > > > something in here ?
> > >
> > > Ok. But cap_rdma_read_rkey() doesn't really describe the
> > > requirement. The requirement is rkey + REMOTE_WRITE. So it is more
> > > like rdma_cap_read_requires_remote_write() which is ugly and too
> > > long (but descriptive)...
> >
> > I don't care much what name you pick, just jam something like this in
> > the description
> >
> > If set then RDMA_READ must be performed by mapping the local
> > buffers through a rkey MR with ACCESS_REMOTE_WRITE enabled.
> > The rkey of this MR should be passed in as the sg_lists's lkey for
> > IB_WR_RDMA_READ_WITH_INV.
> >
> > FRWR should be used to register the buffer in the send queue,
> > and the read should be issued using IB_WR_RDMA_READ_WITH_INV (xx
> > can we just implicitly rely on this? Are there any iWarp cards that
> > support FRWR but not WITH_INV?)
> >
>
> No. And iWARP devices must support READ_WITH_INV from my reading of the iWARP verbs spec.
>
> I will add all these comments and make use of READ_WITH_INV.
>
> > Finally, only a single SGE can be used with RDMA_READ, all scattering
> > must be accomplished with the MR.
> >
> > This quite dramatically changes what is an allowed scatter for the
> > transfer, IB can support arbitary unaligned S/G lists, while this is
> > now forced into gapless page aligned elements.
> >
> > Your patch takes care of this? And only impacts IB?
>
> Did you mean "only impacts iWARP?"
>
> Yes the patch takes care of sg->fr page list packing. And it uses the existing isert_map_fr_pagelist() to pack the sg into the
> fastreg page list. This same routine is used by the IB FRMR/PI reg/unreg routines as well.
>

By the way, just to be clear: If you use a FRWR, you by definition only have one SGE entry as the result of the registration. So
regardless of what a device/protocol can do with the destination SGE of an RDMA READ operation, if you use FRWR to register the
destination region, you need only 1 SGE in the RDMA READ WR.

Stevo.


2015-07-24 22:38:43

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH V6 9/9] isert: Support iWARP transports using FRMRs

On Fri, Jul 24, 2015 at 05:11:04PM -0500, Steve Wise wrote:

> By the way, just to be clear: If you use a FRWR, you by definition
> only have one SGE entry as the result of the registration. So
> regardless of what a device/protocol can do with the destination SGE
> of an RDMA READ operation, if you use FRWR to register the
> destination region, you need only 1 SGE in the RDMA READ WR.

FRWRs have horrible restrictions on the scatterlist. Realizing
arbitrary scatterlists requires multiple MRs, and thus multiple SGE
entries.

The warning is for a reader who might be familiar with IB and think
iWarp simply needs an MR wrapped around the memory, assuming
gaps/etc. can be supported through an s/g list the way IB can.

Jason

2015-07-26 09:57:44

by Sagi Grimberg

[permalink] [raw]
Subject: Re: [PATCH V6 1/9] RDMA/iser: Limit sg tablesize and max_sectors to device fastreg max depth

On 7/24/2015 9:40 PM, Steve Wise wrote:
>
>
>> -----Original Message-----
>> From: Jason Gunthorpe [mailto:[email protected]]
>> Sent: Friday, July 24, 2015 11:41 AM
>> To: Steve Wise
>> Cc: [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; linux-
>> [email protected]; [email protected]; [email protected]; [email protected]; [email protected]
>> Subject: Re: [PATCH V6 1/9] RDMA/iser: Limit sg tablesize and max_sectors to device fastreg max depth
>>
>> On Fri, Jul 24, 2015 at 11:18:21AM -0500, Steve Wise wrote:
>>> Currently the sg tablesize, which dictates fast register page list
>>> depth to use, does not take into account the limits of the rdma device.
>>> So adjust it once we discover the device fastreg max depth limit. Also
>>> adjust the max_sectors based on the resulting sg tablesize.
>>
>> Huh. How does this relate to the max_page_list_len argument:
>>
>> struct ib_mr *ib_alloc_fast_reg_mr(struct ib_pd *pd, int max_page_list_len)
>>
>> Shouldn't max_fast_reg_page_list_len be checked during the above?
>>
>> Ie does this still make sense:
>>
>> drivers/infiniband/ulp/iser/iser_verbs.c: desc->data_mr = ib_alloc_fast_reg_mr(pd, ISCSI_ISER_SG_TABLESIZE + 1);
>>
>> ?
>>
>> The only ULP that checks this is SRP, so basically, all our ULPs are
>> probably quietly broken? cxgb3 has a limit of 10 (!?!?!!)
>>
>
> Yea seems like some drivers need to enforce this in ib_alloc_fast_reg_mr() as well as ib_alloc_fast_reg_page_list(), and ULPs need
> to not exceed the device max.
>
> I will fix iser to limit the mr and page_list allocation based on the device max.

Steve, I have a patch in the pipe that addresses this.
The patch adds support for transfer sizes of up to 8MB, targeted for 4.3 (hopefully).
So there is no need for you to tackle this.
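
For reference, the adjustment described in the patch amounts to roughly
the following (sketch only; ib_dev and shost are placeholders and error
handling is omitted):

struct ib_device_attr attr;
unsigned int sg_tablesize;

ib_query_device(ib_dev, &attr);

/* Never advertise more S/G entries than the device can fast-register. */
sg_tablesize = min_t(unsigned int, ISCSI_ISER_SG_TABLESIZE,
		     attr.max_fast_reg_page_list_len);
shost->sg_tablesize = sg_tablesize;
/* One page per page-list entry; max_sectors is in 512-byte sectors. */
shost->max_sectors = sg_tablesize * (PAGE_SIZE >> 9);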

2015-07-26 09:58:23

by Sagi Grimberg

[permalink] [raw]
Subject: Re: [PATCH V6 1/9] RDMA/iser: Limit sg tablesize and max_sectors to device fastreg max depth

On 7/24/2015 10:14 PM, Jason Gunthorpe wrote:
> On Fri, Jul 24, 2015 at 01:40:17PM -0500, Steve Wise wrote:
>>> Huh. How does this relate to the max_page_list_len argument:
>>>
>>> struct ib_mr *ib_alloc_fast_reg_mr(struct ib_pd *pd, int max_page_list_len)
>>>
>>> Shouldn't max_fast_reg_page_list_len be checked during the above?
>>>
>>> Ie does this still make sense:
>>>
>>> drivers/infiniband/ulp/iser/iser_verbs.c: desc->data_mr = ib_alloc_fast_reg_mr(pd, ISCSI_ISER_SG_TABLESIZE + 1);
>>>
>>> ?
>>>
>>> The only ULP that checks this is SRP, so basically, all our ULPs are
>>> probably quietly broken? cxgb3 has a limit of 10 (!?!?!!)
>>>
>>
>> Yea seems like some drivers need to enforce this in ib_alloc_fast_reg_mr() as well as ib_alloc_fast_reg_page_list(), and ULPs need
>> to not exceed the device max.
>
> Great, Sagi, can you incorporate that in your series so that
> ib_alloc_mr's max_entires is checked against
> max_fast_reg_page_list_len and EINVAL's if it is too great?

Yes. I'll take care of that.
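
Something along these lines, presumably (sketch only;
checked_alloc_fast_reg_mr() is a hypothetical wrapper, and where the
check ultimately lives, core or provider, is up to the patch):

/* Reject fast-reg MR allocations that exceed the device limit. */
struct ib_mr *checked_alloc_fast_reg_mr(struct ib_pd *pd,
					int max_page_list_len)
{
	struct ib_device_attr attr;
	int ret;

	ret = ib_query_device(pd->device, &attr);
	if (ret)
		return ERR_PTR(ret);

	if (max_page_list_len > attr.max_fast_reg_page_list_len)
		return ERR_PTR(-EINVAL);

	return ib_alloc_fast_reg_mr(pd, max_page_list_len);
}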

2015-07-26 09:58:57

by Sagi Grimberg

[permalink] [raw]
Subject: Re: [PATCH V6 4/9] svcrdma: Use max_sge_rd for destination read depths


>> @@ -1059,6 +1062,7 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
>> ntohs(((struct sockaddr_in *)&newxprt->sc_cm_id->
>> route.addr.dst_addr)->sin_port),
>> newxprt->sc_max_sge,
>> + newxprt->sc_max_sge_rd,
>> newxprt->sc_sq_depth,
>> newxprt->sc_max_requests,
>> newxprt->sc_ord);
>>
>
> With the above patch change, we have no more users of the recently created rdma_cap_read_multi_sge(). Should I add a patch to remove it?

Yes please.

2015-07-26 10:08:14

by Sagi Grimberg

[permalink] [raw]
Subject: Re: [PATCH V6 6/9] isert: Rename IO functions to more descriptive names

On 7/24/2015 7:18 PM, Steve Wise wrote:
> This is in preparation for adding new FRMR-only IO handlers
> for devices that support FRMR and not PI.

Steve,

I've given this some thought and I think we should avoid splitting
the logic between PI and iWARP. The reason (other than code duplication) is
that currently the iser target supports only up to 1MB IOs. I have some
code (not done yet) to support larger IOs by using multiple
registrations per IO (with or without PI).
With a little tweaking I think we can get iWARP to fit in too...

So, do you mind if I take a crack at it?

2015-07-26 10:23:10

by Sagi Grimberg

[permalink] [raw]
Subject: Re: [PATCH V6 9/9] isert: Support iWARP transports using FRMRs

On 7/24/2015 10:24 PM, Jason Gunthorpe wrote:
> On Fri, Jul 24, 2015 at 01:48:09PM -0500, Steve Wise wrote:
>>> The use of FRWR for RDMA READ should be iWarp specific, IB shouldn't
>>> pay that overhead. I am expecting to see a cap_rdma_read_rkey or
>>> something in here ?
>>
>> Ok. But cap_rdma_read_rkey() doesn't really describe the
>> requirement. The requirement is rkey + REMOTE_WRITE. So it is more
>> like rdma_cap_read_requires_remote_write() which is ugly and too
>> long (but descriptive)...
>
> I don't care much what name you pick, just jam something like this in
> the description

I think we can just do if (signature || iwarp) use fastreg else
use local_dma_lkey.
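
Roughly (sketch only; pi_support/ib_dev/port_num are placeholders, the
reg_rdma_mem/unreg_rdma_mem hooks are the existing isert_device ones,
and the exact predicate and hook point are up to the eventual patch):

if (pi_support || rdma_protocol_iwarp(ib_dev, port_num)) {
	/* fastreg path */
	device->reg_rdma_mem   = isert_reg_rdma;
	device->unreg_rdma_mem = isert_unreg_rdma;
} else {
	/* local_dma_lkey / dma-mr path */
	device->reg_rdma_mem   = isert_map_rdma;
	device->unreg_rdma_mem = isert_unmap_cmd;
}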

>
> If set then RDMA_READ must be performed by mapping the local
> buffers through a rkey MR with ACCESS_REMOTE_WRITE enabled.
> The rkey of this MR should be passed in as the sg_lists's lkey for
> IB_WR_RDMA_READ_WITH_INV.

I think this would be an incremental patch rather than part of the
iWARP support.

Question though, wouldn't it be better to do a single RDMA_READ to say
4 registered keys rather than RDMA_READ_WITH_INV for each?

2015-07-26 10:42:21

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH V6 4/9] svcrdma: Use max_sge_rd for destination read depths

On Sun, Jul 26, 2015 at 12:58:59PM +0300, Sagi Grimberg wrote:
> >With the above patch change, we have no more users of the recently created rdma_cap_read_multi_sge(). Should I add a patch to remove it?
>
> Yes please.

And in the long run this is another argument for killing the system-wide
REMOTE_WRITE phys MR and requiring memory registrations for iWarp.

2015-07-26 10:43:30

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH V6 6/9] isert: Rename IO functions to more descriptive names

On Sun, Jul 26, 2015 at 01:08:16PM +0300, Sagi Grimberg wrote:
> I've given this some thought and I think we should avoid splitting
> logic from PI and iWARP. The reason (other than code duplication) is
> that currently the iser target support only up to 1MB IOs. I have some
> code (not done yet) to support larger IOs by using multiple
> registrations per IO (with or without PI).

Just curious: How is this going to work with iSER only having a single
rkey/offset/len field?

2015-07-26 11:01:09

by Sagi Grimberg

[permalink] [raw]
Subject: Re: [PATCH V6 6/9] isert: Rename IO functions to more descriptive names

On 7/26/2015 1:43 PM, Christoph Hellwig wrote:
> On Sun, Jul 26, 2015 at 01:08:16PM +0300, Sagi Grimberg wrote:
>> I've given this some thought and I think we should avoid splitting
>> logic from PI and iWARP. The reason (other than code duplication) is
>> that currently the iser target support only up to 1MB IOs. I have some
>> code (not done yet) to support larger IOs by using multiple
>> registrations per IO (with or without PI).
>
> Just curious: How is this going to work with iSER only having a single
> rkey/offset/len field?
>

Good question,

On the wire iser sends a single rkey, but the target is allowed to
transfer the data however it wants to.

Say that the local target HCA supports registrations of only 32 pages
(128K bytes with 4K pages) and the initiator sent:
rkey=0x1234
address=0xffffaaaa
length=512K

The target would allocate a 512K buffer and:
register offset 0-128K to lkey=0x1
register offset 128K-256K to lkey=0x2
register offset 256K-384K to lkey=0x3
register offset 384K-512K to lkey=0x4

then constructs sg_list as:
sg_list[0] = {addr=buf, length=128K, lkey=0x1}
sg_list[1] = {addr=buf+128K, length=128K, lkey=0x2}
sg_list[2] = {addr=buf+256K, length=128K, lkey=0x3}
sg_list[3] = {addr=buf+384K, length=128K, lkey=0x4}

Then set rdma_read wr with:
rdma_r_wr.sg_list=&sg_list
rdma_r_wr.rdma.addr=0xffffaaaa
rdma_r_wr.rdma.rkey=0x1234

post_send(rdma_r_wr);

Ideally, the post contains a chain of all 4 registrations and the
rdma_read (and an opportunistic good scsi response).
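
In verbs terms that would be roughly (sketch only; buf, lkeys[] and qp
are placeholders, and this assumes a device whose max_sge_rd allows 4
read SGEs):

struct ib_sge sg_list[4];
struct ib_send_wr read_wr, *bad_wr;
int i;

for (i = 0; i < 4; i++) {
	sg_list[i].addr   = (u64)(uintptr_t)buf + i * SZ_128K;
	sg_list[i].length = SZ_128K;
	sg_list[i].lkey   = lkeys[i];	/* 0x1..0x4 from the registrations above */
}

memset(&read_wr, 0, sizeof(read_wr));
read_wr.opcode = IB_WR_RDMA_READ;
read_wr.sg_list = sg_list;
read_wr.num_sge = 4;
read_wr.wr.rdma.remote_addr = 0xffffaaaa;	/* address sent by the initiator */
read_wr.wr.rdma.rkey = 0x1234;			/* rkey sent by the initiator */

ib_post_send(qp, &read_wr, &bad_wr);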

2015-07-26 15:53:11

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH V6 6/9] isert: Rename IO functions to more descriptive names

On Sun, Jul 26, 2015 at 02:00:51PM +0300, Sagi Grimberg wrote:
> On the wire iser sends a single rkey, but the target is allowed to
> transfer the data however it wants to.

So you're trying to get above the limit of a single RDMA READ, not
above the limit for memory registration in the initiator? In that
case your explanation makes sense, that's just not what I expected
to be the limiting factor.

2015-07-26 16:45:08

by Sagi Grimberg

[permalink] [raw]
Subject: Re: [PATCH V6 6/9] isert: Rename IO functions to more descriptive names

On 7/26/2015 6:53 PM, Christoph Hellwig wrote:
> On Sun, Jul 26, 2015 at 02:00:51PM +0300, Sagi Grimberg wrote:
>> On the wire iser sends a single rkey, but the target is allowed to
>> transfer the data however it wants to.
>
> So you're trying to get above the limit of a single RDMA READ, not
> above the limit for memory registration in the initiator?

Correct.

> In that case your explanation makes sense, that's just not what I expected
> to be the limiting factor.
>

In the initiator case, there is no way to support a transfer size that
exceeds the device registration length capabilities (unless we start
using higher-order atomic allocations, which we won't).

2015-07-26 17:40:09

by Sagi Grimberg

[permalink] [raw]
Subject: Re: [PATCH V6 6/9] isert: Rename IO functions to more descriptive names


>> Ideally, the post contains a chain of all 4 registrations and the
>> rdma_read (and an opportunistic good scsi response).
>
> Just to be clear: This example is for IB only, correct? IW would
> require rkeys with REMOTE_WRITE and 4 read wrs.

My assumption is that it would depend on max_sge_rd.

IB only? iWARP by definition isn't capable of doing an rdma_read into
more than one scatter entry? Anyway, we'll need to calculate the number
of RDMA_READs.

> And you're ignoring invalidation wrs (or read-with-inv) in the example...

Yes, didn't want to inflate the example too much...

2015-07-26 18:02:59

by Steve Wise

[permalink] [raw]
Subject: Re: [PATCH V6 6/9] isert: Rename IO functions to more descriptive names

On 7/26/2015 6:00 AM, Sagi Grimberg wrote:
> On 7/26/2015 1:43 PM, Christoph Hellwig wrote:
>> On Sun, Jul 26, 2015 at 01:08:16PM +0300, Sagi Grimberg wrote:
>>> I've given this some thought and I think we should avoid splitting
>>> logic from PI and iWARP. The reason (other than code duplication) is
>>> that currently the iser target support only up to 1MB IOs. I have some
>>> code (not done yet) to support larger IOs by using multiple
>>> registrations per IO (with or without PI).
>>
>> Just curious: How is this going to work with iSER only having a single
>> rkey/offset/len field?
>>
>
> Good question,
>
> On the wire iser sends a single rkey, but the target is allowed to
> transfer the data however it wants to.
>
> Say that the local target HCA supports only 32 pages (128K bytes for 4K
> pages) registration and the initiator sent:
> rkey=0x1234
> address=0xffffaaaa
> length=512K
>
> The target would allocate a 512K buffer and:
> register offset 0-128K to lkey=0x1
> register offset 128K-256K to lkey=0x2
> register offset 256K-384K to lkey=0x3
> register offset 384K-512K to lkey=0x4
>
> then constructs sg_list as:
> sg_list[0] = {addr=buf, length=128K, lkey=0x1}
> sg_list[1] = {addr=buf+128K, length=128K, lkey=0x2}
> sg_list[2] = {addr=buf+256K, length=128K, lkey=0x3}
> sg_list[3] = {addr=buf+384K, length=128K, lkey=0x4}
>
> Then set rdma_read wr with:
> rdma_r_wr.sg_list=&sg_list
> rdma_r_wr.rdma.addr=0xffffaaaa
> rdma_r_wr.rdma.rkey=0x1234
>
> post_send(rdma_r_wr);
>
> Ideally, the post contains a chain of all 4 registrations and the
> rdma_read (and an opportunistic good scsi response).

Just to be clear: This example is for IB only, correct? IW would
require rkeys with REMOTE_WRITE and 4 read wrs. And you're ignoring
invalidation wrs (or read-with-inv) in the example...


2015-07-26 20:15:38

by Steve Wise

[permalink] [raw]
Subject: Re: [PATCH V6 6/9] isert: Rename IO functions to more descriptive names

On 7/26/2015 12:40 PM, Sagi Grimberg wrote:
>
>>> Ideally, the post contains a chain of all 4 registrations and the
>>> rdma_read (and an opportunistic good scsi response).
>>
>> Just to be clear: This example is for IB only, correct? IW would
>> require rkeys with REMOTE_WRITE and 4 read wrs.
>
> My assumption is that it would depend on max_sge_rd.
>

yea.

> IB only? iWARP by definition isn't capable of doing rdma_read to
> more than one scatter? Anyway, we'll need to calculate the number
> of RDMA_READs.
>

The wire protocol limits the destination to a single stg/to/len (aka
rkey/addr/len). Devices/fw/sw could implement some magic to support a
single stg/to/len that maps to a scatter gather list of stags/tos/lens.

>> And you're ignoring invalidation wrs (or read-with-inv) in the
>> example...
>
> Yes, didn't want to inflate the example too much...


2015-07-26 20:17:22

by Steve Wise

[permalink] [raw]
Subject: Re: [PATCH V6 6/9] isert: Rename IO functions to more descriptive names

On 7/26/2015 5:08 AM, Sagi Grimberg wrote:
> On 7/24/2015 7:18 PM, Steve Wise wrote:
>> This is in preparation for adding new FRMR-only IO handlers
>> for devices that support FRMR and not PI.
>
> Steve,
>
> I've given this some thought and I think we should avoid splitting
> logic from PI and iWARP. The reason (other than code duplication) is
> that currently the iser target support only up to 1MB IOs. I have some
> code (not done yet) to support larger IOs by using multiple
> registrations per IO (with or without PI).
> With a little tweaking I think we can get iwarp to fit in too...
>
> So, do you mind if I take a crack at it?

Sure, go ahead. Let me know how I can help. Certainly I can test it
for you. I'm very keen to get this in for 4.3 if possible...



2015-07-27 17:07:58

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH V6 9/9] isert: Support iWARP transports using FRMRs

On Sun, Jul 26, 2015 at 01:23:12PM +0300, Sagi Grimberg wrote:

> Question though, wouldn't it be better to do a single RDMA_READ to say
> 4 registered keys rather than RDMA_READ_WITH_INV for each?

RDMA READ is limited to 1 sg in iWarp.

RDMA_READ_WITH_INV and 1 sg is really the only correct way to drive
iWarp.

Jason

2015-07-27 21:45:13

by Steve Wise

[permalink] [raw]
Subject: RE: [PATCH V6 6/9] isert: Rename IO functions to more descriptive names



> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of Steve Wise
> Sent: Sunday, July 26, 2015 3:17 PM
> To: Sagi Grimberg; [email protected]
> Cc: [email protected]; [email protected]; [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected]; [email protected]
> Subject: Re: [PATCH V6 6/9] isert: Rename IO functions to more descriptive names
>
> On 7/26/2015 5:08 AM, Sagi Grimberg wrote:
> > On 7/24/2015 7:18 PM, Steve Wise wrote:
> >> This is in preparation for adding new FRMR-only IO handlers
> >> for devices that support FRMR and not PI.
> >
> > Steve,
> >
> > I've given this some thought and I think we should avoid splitting
> > logic from PI and iWARP. The reason (other than code duplication) is
> > that currently the iser target support only up to 1MB IOs. I have some
> > code (not done yet) to support larger IOs by using multiple
> > registrations per IO (with or without PI).
> > With a little tweaking I think we can get iwarp to fit in too...
> >
> > So, do you mind if I take a crack at it?
>
> Sure, go ahead. Let me know how I can help. Certainly I can test it
> for you. I'm very keen to get this in for 4.3 if possible...
>
>

Since Sagi is going to work on isert iWARP support as part of his current isert large-IO work, I'll drop the isert patches. I also want to split the max_sge_rd patches into their own submission. So I will send out 2 new series for a final submission:

1) iser support for iwarp

2) use max_sge_rd and remove rdma_cap_read_multi_sge().

Steve.



2015-08-03 19:32:35

by Steve Wise

[permalink] [raw]
Subject: RE: [PATCH V6 6/9] isert: Rename IO functions to more descriptive names


> > Steve,
> >
> > I've given this some thought and I think we should avoid splitting
> > logic from PI and iWARP. The reason (other than code duplication) is
> > that currently the iser target support only up to 1MB IOs. I have some
> > code (not done yet) to support larger IOs by using multiple
> > registrations per IO (with or without PI).
> > With a little tweaking I think we can get iwarp to fit in too...
> >
> > So, do you mind if I take a crack at it?
>
> Sure, go ahead. Let me know how I can help. Certainly I can test it
> for you. I'm very keen to get this in for 4.3 if possible...
>

Hey Sagi, how is this coming along? How can I help?

Steve.


2015-08-04 17:26:20

by Sagi Grimberg

[permalink] [raw]
Subject: Re: [PATCH V6 6/9] isert: Rename IO functions to more descriptive names

> Hey Sagi, how is this coming along? How can I help?
>

Hi Steve,

This is taking longer than I expected; the changes needed seem
pretty extensive throughout the IO path. I don't think it will be ready
for 4.3.

I'll try to send you a preliminary version to play with soon.
Acceptable?



2015-08-04 17:44:38

by Steve Wise

[permalink] [raw]
Subject: RE: [PATCH V6 6/9] isert: Rename IO functions to more descriptive names



> -----Original Message-----
> From: Sagi Grimberg [mailto:[email protected]]
> Sent: Tuesday, August 04, 2015 12:26 PM
> To: Steve Wise; [email protected]
> Cc: [email protected]; [email protected]; [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected]; [email protected]
> Subject: Re: [PATCH V6 6/9] isert: Rename IO functions to more descriptive names
>
> > Hey Sagi, how is this coming along? How can I help?
> >
>
> Hi Steve,
>
> This is taking longer than I expected, the changes needed seem
> pretty extensive throughout the IO path. I don't think it will be ready
> for 4.3
>

Perhaps then we should go with my version that adds iwarp-only FRMR IO handlers for 4.3. Then they can be phased out as the rework matures. Thoughts?

> I'll try to send you soon a preliminary version to play with.
> Acceptable?

I can test the iwarp parts once you think the code is ready to try.

Steve.


2015-08-05 21:23:24

by Steve Wise

[permalink] [raw]
Subject: RE: [PATCH V6 6/9] isert: Rename IO functions to more descriptive names


> >
> > > Hey Sagi, how is this coming along? How can I help?
> > >
> >
> > Hi Steve,
> >
> > This is taking longer than I expected, the changes needed seem
> > pretty extensive throughout the IO path. I don't think it will be ready
> > for 4.3
> >
>
> Perhaps then we should go with my version that adds iwarp-only FRMR IO handlers for 4.3. Then they can be phased out as the rework
> matures. Thoughts?
>

Shall I send out the series again for merging into 4.3?

Thanks,

Steve.


2015-08-06 15:37:44

by Sagi Grimberg

[permalink] [raw]
Subject: Re: [PATCH V6 6/9] isert: Rename IO functions to more descriptive names

On 8/6/2015 12:23 AM, Steve Wise wrote:
>
>>>
>>>> Hey Sagi, how is this coming along? How can I help?
>>>>
>>>
>>> Hi Steve,
>>>
>>> This is taking longer than I expected, the changes needed seem
>>> pretty extensive throughout the IO path. I don't think it will be ready
>>> for 4.3
>>>
>>
>> Perhaps then we should go with my version that adds iwarp-only FRMR IO handlers for 4.3. Then they can be phased out as the rework
>> matures. Thoughts?
>>
>
> Shall I send out the series again for merging into 4.3?

Hi Steve,

Not sure about that. I'm a bit reluctant to add code that
branches the isert code even more than it already is.

Nic, WDYT?

2015-08-12 22:15:41

by Steve Wise

[permalink] [raw]
Subject: RE: [PATCH V6 6/9] isert: Rename IO functions to more descriptive names

> >>>> Hey Sagi, how is this coming along? How can I help?
> >>>>
> >>>
> >>> Hi Steve,
> >>>
> >>> This is taking longer than I expected, the changes needed seem
> >>> pretty extensive throughout the IO path. I don't think it will be ready
> >>> for 4.3
> >>>
> >>
> >> Perhaps then we should go with my version that adds iwarp-only FRMR IO handlers for 4.3. Then they can be phased out as the rework
> >> matures. Thoughts?
> >>
> >
> > Shall I send out the series again for merging into 4.3?
>
> Hi Steve,
>
> Not sure about that.. I'm a bit reluctant in adding a code that
> branches the isert code even more than it already is.
>
> Nic, WDYT?

Nic is silent...

Sagi, do you have an ETA on when you can have the recode ready for detailed review and test? If we can't make linux-4.3, can we be early in staging it for linux-4.4?

Thanks,

Steve.


2015-08-13 13:09:26

by Sagi Grimberg

[permalink] [raw]
Subject: Re: [PATCH V6 6/9] isert: Rename IO functions to more descriptive names

>
> Nic is silent...
>
> Sagi, do you have an ETA on when you can have the recode ready for detailed review and test? If we can't make linux-4.3, can we be early in staging it for linux-4.4?

Hi Steve,

I have something, but it's not remotely close to being submission-ready.
This ended up being a rewrite of the registration path, which is pretty
convoluted at the moment. My aim is mostly to simplify it in a way that
makes iWARP support (almost) straightforward...

I can send you my WIP to test.

Sagi.