2015-03-24 20:30:32

by Chuck Lever III

Subject: [PATCH v2 00/15] NFS/RDMA patches proposed for 4.1

This is a series of client-side patches for NFS/RDMA. In preparation
for increasing the transport credit limit and maximum rsize/wsize,
I've re-factored the memory registration logic into separate files,
invoked via a method API.
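
For reference, the method API ends up looking roughly like this once
the series is applied (assembled here from the individual patches
below; the destroy and open methods added later in the series are
omitted):

	struct rpcrdma_memreg_ops {
		int		(*ro_map)(struct rpcrdma_xprt *,
					  struct rpcrdma_mr_seg *, int, bool);
		int		(*ro_unmap)(struct rpcrdma_xprt *,
					    struct rpcrdma_mr_seg *);
		size_t		(*ro_maxpages)(struct rpcrdma_xprt *);
		int		(*ro_init)(struct rpcrdma_xprt *);
		void		(*ro_reset)(struct rpcrdma_xprt *);
		const char	*ro_displayname;
	};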

The two main optimizations in v1 of this series have been dropped.
Sagi Grimberg didn't like the complexity of the solution, and there
isn't enough time to rework it, test the new version, and get it
reviewed before the 4.1 merge window opens. I'm going to prepare
these for 4.2.

Fixes suggested by reviewers are ordered before the refactoring
patches so that they are easier to backport to previous kernels.

The series is available in the nfs-rdma-for-4.1 topic branch at

git://linux-nfs.org/projects/cel/cel-2.6.git

Changes since v1:
- Rebased on 4.0-rc5
- Main optimizations postponed to 4.2
- Addressed review comments from Anna, Sagi, and Devesh

---

Chuck Lever (15):
SUNRPC: Introduce missing well-known netids
xprtrdma: Display IPv6 addresses and port numbers correctly
xprtrdma: Perform a full marshal on retransmit
xprtrdma: Byte-align FRWR registration
xprtrdma: Prevent infinite loop in rpcrdma_ep_create()
xprtrdma: Add vector of ops for each memory registration strategy
xprtrdma: Add a "max_payload" op for each memreg mode
xprtrdma: Add a "register_external" op for each memreg mode
xprtrdma: Add a "deregister_external" op for each memreg mode
xprtrdma: Add "init MRs" memreg op
xprtrdma: Add "reset MRs" memreg op
xprtrdma: Add "destroy MRs" memreg op
xprtrdma: Add "open" memreg op
xprtrdma: Handle non-SEND completions via a callout
xprtrdma: Make rpcrdma_{un}map_one() into inline functions


 include/linux/sunrpc/msg_prot.h    |    8
 net/sunrpc/xprtrdma/Makefile       |    3
 net/sunrpc/xprtrdma/fmr_ops.c      |  208 +++++++++++
 net/sunrpc/xprtrdma/frwr_ops.c     |  353 ++++++++++++++++++
 net/sunrpc/xprtrdma/physical_ops.c |   94 +++++
 net/sunrpc/xprtrdma/rpc_rdma.c     |   87 ++--
 net/sunrpc/xprtrdma/transport.c    |   61 ++-
 net/sunrpc/xprtrdma/verbs.c        |  699 +++---------------------------------
 net/sunrpc/xprtrdma/xprt_rdma.h    |   90 ++++-
 9 files changed, 882 insertions(+), 721 deletions(-)
create mode 100644 net/sunrpc/xprtrdma/fmr_ops.c
create mode 100644 net/sunrpc/xprtrdma/frwr_ops.c
create mode 100644 net/sunrpc/xprtrdma/physical_ops.c

--
Chuck Lever


2015-03-24 20:30:41

by Chuck Lever III

Subject: [PATCH v2 01/15] SUNRPC: Introduce missing well-known netids

Signed-off-by: Chuck Lever <[email protected]>
---
include/linux/sunrpc/msg_prot.h | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/include/linux/sunrpc/msg_prot.h b/include/linux/sunrpc/msg_prot.h
index aadc6a0..8073713 100644
--- a/include/linux/sunrpc/msg_prot.h
+++ b/include/linux/sunrpc/msg_prot.h
@@ -142,12 +142,18 @@ typedef __be32 rpc_fraghdr;
(RPC_REPHDRSIZE + (2 + RPC_MAX_AUTH_SIZE/4))

/*
- * RFC1833/RFC3530 rpcbind (v3+) well-known netid's.
+ * Well-known netids. See:
+ *
+ * http://www.iana.org/assignments/rpc-netids/rpc-netids.xhtml
*/
#define RPCBIND_NETID_UDP "udp"
#define RPCBIND_NETID_TCP "tcp"
+#define RPCBIND_NETID_RDMA "rdma"
+#define RPCBIND_NETID_SCTP "sctp"
#define RPCBIND_NETID_UDP6 "udp6"
#define RPCBIND_NETID_TCP6 "tcp6"
+#define RPCBIND_NETID_RDMA6 "rdma6"
+#define RPCBIND_NETID_SCTP6 "sctp6"
#define RPCBIND_NETID_LOCAL "local"

/*

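For context, a netid string names the transport protocol in rpcbind
registrations and in the RPC client's display strings. A minimal
sketch of how the new RDMA netids are consumed (mirroring patch 2 of
this series):

	/* sketch: select a netid by address family */
	switch (sap->sa_family) {
	case AF_INET:
		xprt->address_strings[RPC_DISPLAY_NETID] = RPCBIND_NETID_RDMA;
		break;
	case AF_INET6:
		xprt->address_strings[RPC_DISPLAY_NETID] = RPCBIND_NETID_RDMA6;
		break;
	}
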

2015-03-24 20:30:50

by Chuck Lever III

Subject: [PATCH v2 02/15] xprtrdma: Display IPv6 addresses and port numbers correctly

Signed-off-by: Chuck Lever <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
---
 net/sunrpc/xprtrdma/transport.c |   47 ++++++++++++++++++++++++++++++++-------
 net/sunrpc/xprtrdma/verbs.c     |   21 +++++++----------
 2 files changed, 47 insertions(+), 21 deletions(-)

diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c
index 2e192ba..9be7f97 100644
--- a/net/sunrpc/xprtrdma/transport.c
+++ b/net/sunrpc/xprtrdma/transport.c
@@ -157,12 +157,47 @@ static struct ctl_table sunrpc_table[] = {
static struct rpc_xprt_ops xprt_rdma_procs; /* forward reference */

static void
+xprt_rdma_format_addresses4(struct rpc_xprt *xprt, struct sockaddr *sap)
+{
+ struct sockaddr_in *sin = (struct sockaddr_in *)sap;
+ char buf[20];
+
+ snprintf(buf, sizeof(buf), "%08x", ntohl(sin->sin_addr.s_addr));
+ xprt->address_strings[RPC_DISPLAY_HEX_ADDR] = kstrdup(buf, GFP_KERNEL);
+
+ xprt->address_strings[RPC_DISPLAY_NETID] = RPCBIND_NETID_RDMA;
+}
+
+static void
+xprt_rdma_format_addresses6(struct rpc_xprt *xprt, struct sockaddr *sap)
+{
+ struct sockaddr_in6 *sin6 = (struct sockaddr_in6 *)sap;
+ char buf[40];
+
+ snprintf(buf, sizeof(buf), "%pi6", &sin6->sin6_addr);
+ xprt->address_strings[RPC_DISPLAY_HEX_ADDR] = kstrdup(buf, GFP_KERNEL);
+
+ xprt->address_strings[RPC_DISPLAY_NETID] = RPCBIND_NETID_RDMA6;
+}
+
+static void
xprt_rdma_format_addresses(struct rpc_xprt *xprt)
{
struct sockaddr *sap = (struct sockaddr *)
&rpcx_to_rdmad(xprt).addr;
- struct sockaddr_in *sin = (struct sockaddr_in *)sap;
- char buf[64];
+ char buf[128];
+
+ switch (sap->sa_family) {
+ case AF_INET:
+ xprt_rdma_format_addresses4(xprt, sap);
+ break;
+ case AF_INET6:
+ xprt_rdma_format_addresses6(xprt, sap);
+ break;
+ default:
+ pr_err("rpcrdma: Unrecognized address family\n");
+ return;
+ }

(void)rpc_ntop(sap, buf, sizeof(buf));
xprt->address_strings[RPC_DISPLAY_ADDR] = kstrdup(buf, GFP_KERNEL);
@@ -170,16 +205,10 @@ xprt_rdma_format_addresses(struct rpc_xprt *xprt)
snprintf(buf, sizeof(buf), "%u", rpc_get_port(sap));
xprt->address_strings[RPC_DISPLAY_PORT] = kstrdup(buf, GFP_KERNEL);

- xprt->address_strings[RPC_DISPLAY_PROTO] = "rdma";
-
- snprintf(buf, sizeof(buf), "%08x", ntohl(sin->sin_addr.s_addr));
- xprt->address_strings[RPC_DISPLAY_HEX_ADDR] = kstrdup(buf, GFP_KERNEL);
-
snprintf(buf, sizeof(buf), "%4hx", rpc_get_port(sap));
xprt->address_strings[RPC_DISPLAY_HEX_PORT] = kstrdup(buf, GFP_KERNEL);

- /* netid */
- xprt->address_strings[RPC_DISPLAY_NETID] = "rdma";
+ xprt->address_strings[RPC_DISPLAY_PROTO] = "rdma";
}

static void
diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index 124676c..1aa55b7 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -50,6 +50,7 @@
#include <linux/interrupt.h>
#include <linux/slab.h>
#include <linux/prefetch.h>
+#include <linux/sunrpc/addr.h>
#include <asm/bitops.h>

#include "xprt_rdma.h"
@@ -424,7 +425,7 @@ rpcrdma_conn_upcall(struct rdma_cm_id *id, struct rdma_cm_event *event)
struct rpcrdma_ia *ia = &xprt->rx_ia;
struct rpcrdma_ep *ep = &xprt->rx_ep;
#if IS_ENABLED(CONFIG_SUNRPC_DEBUG)
- struct sockaddr_in *addr = (struct sockaddr_in *) &ep->rep_remote_addr;
+ struct sockaddr *sap = (struct sockaddr *)&ep->rep_remote_addr;
#endif
struct ib_qp_attr *attr = &ia->ri_qp_attr;
struct ib_qp_init_attr *iattr = &ia->ri_qp_init_attr;
@@ -480,9 +481,8 @@ connected:
wake_up_all(&ep->rep_connect_wait);
/*FALLTHROUGH*/
default:
- dprintk("RPC: %s: %pI4:%u (ep 0x%p): %s\n",
- __func__, &addr->sin_addr.s_addr,
- ntohs(addr->sin_port), ep,
+ dprintk("RPC: %s: %pIS:%u (ep 0x%p): %s\n",
+ __func__, sap, rpc_get_port(sap), ep,
CONNECTION_MSG(event->event));
break;
}
@@ -491,19 +491,16 @@ connected:
if (connstate == 1) {
int ird = attr->max_dest_rd_atomic;
int tird = ep->rep_remote_cma.responder_resources;
- printk(KERN_INFO "rpcrdma: connection to %pI4:%u "
- "on %s, memreg %d slots %d ird %d%s\n",
- &addr->sin_addr.s_addr,
- ntohs(addr->sin_port),
+
+ pr_info("rpcrdma: connection to %pIS:%u on %s, memreg %d slots %d ird %d%s\n",
+ sap, rpc_get_port(sap),
ia->ri_id->device->name,
ia->ri_memreg_strategy,
xprt->rx_buf.rb_max_requests,
ird, ird < 4 && ird < tird / 2 ? " (low!)" : "");
} else if (connstate < 0) {
- printk(KERN_INFO "rpcrdma: connection to %pI4:%u closed (%d)\n",
- &addr->sin_addr.s_addr,
- ntohs(addr->sin_port),
- connstate);
+ pr_info("rpcrdma: connection to %pIS:%u closed (%d)\n",
+ sap, rpc_get_port(sap), connstate);
}
#endif

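To illustrate with hypothetical values: for a server at IPv6 address
fe80::1 on port 20049, the transport now displays "fe80::1" for
RPC_DISPLAY_ADDR, "20049" for RPC_DISPLAY_PORT, and "4e51" for
RPC_DISPLAY_HEX_PORT, with netid "rdma6". The old code cast the
sockaddr to sockaddr_in and logged with %pI4 regardless of address
family, garbling IPv6 peers.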


2015-03-24 20:30:59

by Chuck Lever III

Subject: [PATCH v2 03/15] xprtrdma: Perform a full marshal on retransmit

Commit 6ab59945f292 ("xprtrdma: Update rkeys after transport
reconnect") added logic in the ->send_request path to update the
chunk list when an RPC/RDMA request is retransmitted.

Note that rpc_xdr_encode() resets and re-encodes the entire RPC
send buffer for each retransmit; the send buffer is not preserved
from the previous transmission of an RPC.

Revert 6ab59945f292, and instead, just force each request to be
fully marshaled every time through ->send_request. This should
preserve the fix from 6ab59945f292, while also performing pullup
during retransmits.
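
Condensed from the transport.c hunk below, the ->send_request change
amounts to:

	/* before: full marshal only on first transmission */
	if (req->rl_niovs == 0)
		rc = rpcrdma_marshal_req(rqst);
	else if (r_xprt->rx_ia.ri_memreg_strategy != RPCRDMA_ALLPHYSICAL)
		rc = rpcrdma_marshal_chunks(rqst, 0);

	/* after: full marshal every time */
	rc = rpcrdma_marshal_req(rqst);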

Signed-off-by: Chuck Lever <[email protected]>
Acked-by: Sagi Grimberg <[email protected]>
---
 net/sunrpc/xprtrdma/rpc_rdma.c  |   71 ++++++++++++++++++---------------------
 net/sunrpc/xprtrdma/transport.c |    5 +--
 net/sunrpc/xprtrdma/xprt_rdma.h |   10 -----
 3 files changed, 34 insertions(+), 52 deletions(-)

diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c
index 91ffde8..41456d9 100644
--- a/net/sunrpc/xprtrdma/rpc_rdma.c
+++ b/net/sunrpc/xprtrdma/rpc_rdma.c
@@ -53,6 +53,14 @@
# define RPCDBG_FACILITY RPCDBG_TRANS
#endif

+enum rpcrdma_chunktype {
+ rpcrdma_noch = 0,
+ rpcrdma_readch,
+ rpcrdma_areadch,
+ rpcrdma_writech,
+ rpcrdma_replych
+};
+
#if IS_ENABLED(CONFIG_SUNRPC_DEBUG)
static const char transfertypes[][12] = {
"pure inline", /* no chunks */
@@ -284,28 +292,6 @@ out:
}

/*
- * Marshal chunks. This routine returns the header length
- * consumed by marshaling.
- *
- * Returns positive RPC/RDMA header size, or negative errno.
- */
-
-ssize_t
-rpcrdma_marshal_chunks(struct rpc_rqst *rqst, ssize_t result)
-{
- struct rpcrdma_req *req = rpcr_to_rdmar(rqst);
- struct rpcrdma_msg *headerp = rdmab_to_msg(req->rl_rdmabuf);
-
- if (req->rl_rtype != rpcrdma_noch)
- result = rpcrdma_create_chunks(rqst, &rqst->rq_snd_buf,
- headerp, req->rl_rtype);
- else if (req->rl_wtype != rpcrdma_noch)
- result = rpcrdma_create_chunks(rqst, &rqst->rq_rcv_buf,
- headerp, req->rl_wtype);
- return result;
-}
-
-/*
* Copy write data inline.
* This function is used for "small" requests. Data which is passed
* to RPC via iovecs (or page list) is copied directly into the
@@ -397,6 +383,7 @@ rpcrdma_marshal_req(struct rpc_rqst *rqst)
char *base;
size_t rpclen, padlen;
ssize_t hdrlen;
+ enum rpcrdma_chunktype rtype, wtype;
struct rpcrdma_msg *headerp;

/*
@@ -433,13 +420,13 @@ rpcrdma_marshal_req(struct rpc_rqst *rqst)
* into pages; otherwise use reply chunks.
*/
if (rqst->rq_rcv_buf.buflen <= RPCRDMA_INLINE_READ_THRESHOLD(rqst))
- req->rl_wtype = rpcrdma_noch;
+ wtype = rpcrdma_noch;
else if (rqst->rq_rcv_buf.page_len == 0)
- req->rl_wtype = rpcrdma_replych;
+ wtype = rpcrdma_replych;
else if (rqst->rq_rcv_buf.flags & XDRBUF_READ)
- req->rl_wtype = rpcrdma_writech;
+ wtype = rpcrdma_writech;
else
- req->rl_wtype = rpcrdma_replych;
+ wtype = rpcrdma_replych;

/*
* Chunks needed for arguments?
@@ -456,16 +443,16 @@ rpcrdma_marshal_req(struct rpc_rqst *rqst)
* TBD check NFSv4 setacl
*/
if (rqst->rq_snd_buf.len <= RPCRDMA_INLINE_WRITE_THRESHOLD(rqst))
- req->rl_rtype = rpcrdma_noch;
+ rtype = rpcrdma_noch;
else if (rqst->rq_snd_buf.page_len == 0)
- req->rl_rtype = rpcrdma_areadch;
+ rtype = rpcrdma_areadch;
else
- req->rl_rtype = rpcrdma_readch;
+ rtype = rpcrdma_readch;

/* The following simplification is not true forever */
- if (req->rl_rtype != rpcrdma_noch && req->rl_wtype == rpcrdma_replych)
- req->rl_wtype = rpcrdma_noch;
- if (req->rl_rtype != rpcrdma_noch && req->rl_wtype != rpcrdma_noch) {
+ if (rtype != rpcrdma_noch && wtype == rpcrdma_replych)
+ wtype = rpcrdma_noch;
+ if (rtype != rpcrdma_noch && wtype != rpcrdma_noch) {
dprintk("RPC: %s: cannot marshal multiple chunk lists\n",
__func__);
return -EIO;
@@ -479,7 +466,7 @@ rpcrdma_marshal_req(struct rpc_rqst *rqst)
* When padding is in use and applies to the transfer, insert
* it and change the message type.
*/
- if (req->rl_rtype == rpcrdma_noch) {
+ if (rtype == rpcrdma_noch) {

padlen = rpcrdma_inline_pullup(rqst,
RPCRDMA_INLINE_PAD_VALUE(rqst));
@@ -494,7 +481,7 @@ rpcrdma_marshal_req(struct rpc_rqst *rqst)
headerp->rm_body.rm_padded.rm_pempty[1] = xdr_zero;
headerp->rm_body.rm_padded.rm_pempty[2] = xdr_zero;
hdrlen += 2 * sizeof(u32); /* extra words in padhdr */
- if (req->rl_wtype != rpcrdma_noch) {
+ if (wtype != rpcrdma_noch) {
dprintk("RPC: %s: invalid chunk list\n",
__func__);
return -EIO;
@@ -515,18 +502,26 @@ rpcrdma_marshal_req(struct rpc_rqst *rqst)
* on receive. Therefore, we request a reply chunk
* for non-writes wherever feasible and efficient.
*/
- if (req->rl_wtype == rpcrdma_noch)
- req->rl_wtype = rpcrdma_replych;
+ if (wtype == rpcrdma_noch)
+ wtype = rpcrdma_replych;
}
}

- hdrlen = rpcrdma_marshal_chunks(rqst, hdrlen);
+ if (rtype != rpcrdma_noch) {
+ hdrlen = rpcrdma_create_chunks(rqst, &rqst->rq_snd_buf,
+ headerp, rtype);
+ wtype = rtype; /* simplify dprintk */
+
+ } else if (wtype != rpcrdma_noch) {
+ hdrlen = rpcrdma_create_chunks(rqst, &rqst->rq_rcv_buf,
+ headerp, wtype);
+ }
if (hdrlen < 0)
return hdrlen;

dprintk("RPC: %s: %s: hdrlen %zd rpclen %zd padlen %zd"
" headerp 0x%p base 0x%p lkey 0x%x\n",
- __func__, transfertypes[req->rl_wtype], hdrlen, rpclen, padlen,
+ __func__, transfertypes[wtype], hdrlen, rpclen, padlen,
headerp, base, rdmab_lkey(req->rl_rdmabuf));

/*
diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c
index 9be7f97..97f6562 100644
--- a/net/sunrpc/xprtrdma/transport.c
+++ b/net/sunrpc/xprtrdma/transport.c
@@ -608,10 +608,7 @@ xprt_rdma_send_request(struct rpc_task *task)
struct rpcrdma_xprt *r_xprt = rpcx_to_rdmax(xprt);
int rc = 0;

- if (req->rl_niovs == 0)
- rc = rpcrdma_marshal_req(rqst);
- else if (r_xprt->rx_ia.ri_memreg_strategy != RPCRDMA_ALLPHYSICAL)
- rc = rpcrdma_marshal_chunks(rqst, 0);
+ rc = rpcrdma_marshal_req(rqst);
if (rc < 0)
goto failed_marshal;

diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index 0a16fb6..c8afd83 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -143,14 +143,6 @@ rdmab_to_msg(struct rpcrdma_regbuf *rb)
return (struct rpcrdma_msg *)rb->rg_base;
}

-enum rpcrdma_chunktype {
- rpcrdma_noch = 0,
- rpcrdma_readch,
- rpcrdma_areadch,
- rpcrdma_writech,
- rpcrdma_replych
-};
-
/*
* struct rpcrdma_rep -- this structure encapsulates state required to recv
* and complete a reply, asychronously. It needs several pieces of
@@ -258,7 +250,6 @@ struct rpcrdma_req {
unsigned int rl_niovs; /* 0, 2 or 4 */
unsigned int rl_nchunks; /* non-zero if chunks */
unsigned int rl_connect_cookie; /* retry detection */
- enum rpcrdma_chunktype rl_rtype, rl_wtype;
struct rpcrdma_buffer *rl_buffer; /* home base for this structure */
struct rpcrdma_rep *rl_reply;/* holder for reply buffer */
struct ib_sge rl_send_iov[4]; /* for active requests */
@@ -418,7 +409,6 @@ void rpcrdma_reply_handler(struct rpcrdma_rep *);
/*
* RPC/RDMA protocol calls - xprtrdma/rpc_rdma.c
*/
-ssize_t rpcrdma_marshal_chunks(struct rpc_rqst *, ssize_t);
int rpcrdma_marshal_req(struct rpc_rqst *);
size_t rpcrdma_max_payload(struct rpcrdma_xprt *);



2015-03-24 20:31:09

by Chuck Lever III

Subject: [PATCH v2 04/15] xprtrdma: Byte-align FRWR registration

The RPC/RDMA transport's FRWR registration logic registers whole
pages. This means areas in the first and last pages that are not
involved in the RDMA I/O are needlessly exposed to the server.

Buffered I/O is typically page-aligned, so not a problem there. But
for direct I/O, which can be byte-aligned, and for reply chunks,
which are nearly always smaller than a page, the transport could
expose memory outside the I/O buffer.

FRWR allows byte-aligned memory registration, so let's use it as
it was intended.
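
As a worked example with hypothetical numbers: registering a 100-byte
buffer that starts 0x10 bytes into a page used to produce a region
with iova_start at the page base and a length of 4096 bytes, exposing
the rest of the page; with this patch the region becomes iova_start =
page base + 0x10, length = 100.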

Reported-by: Sagi Grimberg <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>
---
net/sunrpc/xprtrdma/verbs.c | 12 ++++--------
1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index 1aa55b7..60f3317 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -1924,23 +1924,19 @@ rpcrdma_register_frmr_external(struct rpcrdma_mr_seg *seg,
offset_in_page((seg-1)->mr_offset + (seg-1)->mr_len))
break;
}
- dprintk("RPC: %s: Using frmr %p to map %d segments\n",
- __func__, mw, i);
+ dprintk("RPC: %s: Using frmr %p to map %d segments (%d bytes)\n",
+ __func__, mw, i, len);

frmr->fr_state = FRMR_IS_VALID;

memset(&fastreg_wr, 0, sizeof(fastreg_wr));
fastreg_wr.wr_id = (unsigned long)(void *)mw;
fastreg_wr.opcode = IB_WR_FAST_REG_MR;
- fastreg_wr.wr.fast_reg.iova_start = seg1->mr_dma;
+ fastreg_wr.wr.fast_reg.iova_start = seg1->mr_dma + pageoff;
fastreg_wr.wr.fast_reg.page_list = frmr->fr_pgl;
fastreg_wr.wr.fast_reg.page_list_len = page_no;
fastreg_wr.wr.fast_reg.page_shift = PAGE_SHIFT;
- fastreg_wr.wr.fast_reg.length = page_no << PAGE_SHIFT;
- if (fastreg_wr.wr.fast_reg.length < len) {
- rc = -EIO;
- goto out_err;
- }
+ fastreg_wr.wr.fast_reg.length = len;

/* Bump the key */
key = (u8)(mr->rkey & 0x000000FF);


2015-03-24 20:31:18

by Chuck Lever III

Subject: [PATCH v2 05/15] xprtrdma: Prevent infinite loop in rpcrdma_ep_create()

If a provider advertises a zero max_fast_reg_page_list_len, FRWR
depth detection loops forever. Instead of just failing the mount,
try other memory registration modes.
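
The loop in question is the FRMR depth calculation in
rpcrdma_ep_create(), paraphrased here. ri_max_frmr_depth is derived
from the device's max_fast_reg_page_list_len, so a zero value means
delta never decreases:

	int delta = RPCRDMA_MAX_DATA_SEGS - ia->ri_max_frmr_depth;

	do {
		depth += 2;			/* FRMR reg + invalidate */
		delta -= ia->ri_max_frmr_depth;	/* subtracts zero */
	} while (delta > 0);			/* never terminates */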

Fixes: 0fc6c4e7bb28 ("xprtrdma: mind the device's max fast . . .")
Reported-by: Devesh Sharma <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>
---
net/sunrpc/xprtrdma/verbs.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index 60f3317..99752b5 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -618,9 +618,10 @@ rpcrdma_ia_open(struct rpcrdma_xprt *xprt, struct sockaddr *addr, int memreg)

if (memreg == RPCRDMA_FRMR) {
/* Requires both frmr reg and local dma lkey */
- if ((devattr->device_cap_flags &
+ if (((devattr->device_cap_flags &
(IB_DEVICE_MEM_MGT_EXTENSIONS|IB_DEVICE_LOCAL_DMA_LKEY)) !=
- (IB_DEVICE_MEM_MGT_EXTENSIONS|IB_DEVICE_LOCAL_DMA_LKEY)) {
+ (IB_DEVICE_MEM_MGT_EXTENSIONS|IB_DEVICE_LOCAL_DMA_LKEY)) ||
+ (devattr->max_fast_reg_page_list_len == 0)) {
dprintk("RPC: %s: FRMR registration "
"not supported by HCA\n", __func__);
memreg = RPCRDMA_MTHCAFMR;


2015-03-24 20:31:27

by Chuck Lever III

Subject: [PATCH v2 06/15] xprtrdma: Add vector of ops for each memory registration strategy

Instead of employing switch() statements, let's use the typical
Linux kernel idiom for handling behavioral variation: virtual
functions.

Start by defining a vector of operations for each supported memory
registration mode, and by adding a source file for each mode.
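
Each transport then selects its vector once at setup time (for
example, ia->ri_ops = &rpcrdma_frwr_memreg_ops in the verbs.c hunk
below), and former switch () sites become indirect calls such as
ia->ri_ops->ro_displayname.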

Signed-off-by: Chuck Lever <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
---
 net/sunrpc/xprtrdma/Makefile       |    3 ++-
 net/sunrpc/xprtrdma/fmr_ops.c      |   22 ++++++++++++++++++++++
 net/sunrpc/xprtrdma/frwr_ops.c     |   22 ++++++++++++++++++++++
 net/sunrpc/xprtrdma/physical_ops.c |   24 ++++++++++++++++++++++++
 net/sunrpc/xprtrdma/verbs.c        |   11 +++++++----
 net/sunrpc/xprtrdma/xprt_rdma.h    |   12 ++++++++++++
 6 files changed, 89 insertions(+), 5 deletions(-)
create mode 100644 net/sunrpc/xprtrdma/fmr_ops.c
create mode 100644 net/sunrpc/xprtrdma/frwr_ops.c
create mode 100644 net/sunrpc/xprtrdma/physical_ops.c

diff --git a/net/sunrpc/xprtrdma/Makefile b/net/sunrpc/xprtrdma/Makefile
index da5136f..579f72b 100644
--- a/net/sunrpc/xprtrdma/Makefile
+++ b/net/sunrpc/xprtrdma/Makefile
@@ -1,6 +1,7 @@
obj-$(CONFIG_SUNRPC_XPRT_RDMA_CLIENT) += xprtrdma.o

-xprtrdma-y := transport.o rpc_rdma.o verbs.o
+xprtrdma-y := transport.o rpc_rdma.o verbs.o \
+ fmr_ops.o frwr_ops.o physical_ops.o

obj-$(CONFIG_SUNRPC_XPRT_RDMA_SERVER) += svcrdma.o

diff --git a/net/sunrpc/xprtrdma/fmr_ops.c b/net/sunrpc/xprtrdma/fmr_ops.c
new file mode 100644
index 0000000..ffb7d93
--- /dev/null
+++ b/net/sunrpc/xprtrdma/fmr_ops.c
@@ -0,0 +1,22 @@
+/*
+ * Copyright (c) 2015 Oracle. All rights reserved.
+ * Copyright (c) 2003-2007 Network Appliance, Inc. All rights reserved.
+ */
+
+/* Lightweight memory registration using Fast Memory Regions (FMR).
+ * Referred to sometimes as MTHCAFMR mode.
+ *
+ * FMR uses synchronous memory registration and deregistration.
+ * FMR registration is known to be fast, but FMR deregistration
+ * can take tens of usecs to complete.
+ */
+
+#include "xprt_rdma.h"
+
+#if IS_ENABLED(CONFIG_SUNRPC_DEBUG)
+# define RPCDBG_FACILITY RPCDBG_TRANS
+#endif
+
+const struct rpcrdma_memreg_ops rpcrdma_fmr_memreg_ops = {
+ .ro_displayname = "fmr",
+};
diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c
new file mode 100644
index 0000000..79173f9
--- /dev/null
+++ b/net/sunrpc/xprtrdma/frwr_ops.c
@@ -0,0 +1,22 @@
+/*
+ * Copyright (c) 2015 Oracle. All rights reserved.
+ * Copyright (c) 2003-2007 Network Appliance, Inc. All rights reserved.
+ */
+
+/* Lightweight memory registration using Fast Registration Work
+ * Requests (FRWR). Also referred to sometimes as FRMR mode.
+ *
+ * FRWR features ordered asynchronous registration and deregistration
+ * of arbitrarily sized memory regions. This is the fastest and safest
+ * but most complex memory registration mode.
+ */
+
+#include "xprt_rdma.h"
+
+#if IS_ENABLED(CONFIG_SUNRPC_DEBUG)
+# define RPCDBG_FACILITY RPCDBG_TRANS
+#endif
+
+const struct rpcrdma_memreg_ops rpcrdma_frwr_memreg_ops = {
+ .ro_displayname = "frwr",
+};
diff --git a/net/sunrpc/xprtrdma/physical_ops.c b/net/sunrpc/xprtrdma/physical_ops.c
new file mode 100644
index 0000000..b0922ac
--- /dev/null
+++ b/net/sunrpc/xprtrdma/physical_ops.c
@@ -0,0 +1,24 @@
+/*
+ * Copyright (c) 2015 Oracle. All rights reserved.
+ * Copyright (c) 2003-2007 Network Appliance, Inc. All rights reserved.
+ */
+
+/* No-op chunk preparation. All client memory is pre-registered.
+ * Sometimes referred to as ALLPHYSICAL mode.
+ *
+ * Physical registration is simple because all client memory is
+ * pre-registered and never deregistered. This mode is good for
+ * adapter bring up, but is considered not safe: the server is
+ * trusted not to abuse its access to client memory not involved
+ * in RDMA I/O.
+ */
+
+#include "xprt_rdma.h"
+
+#if IS_ENABLED(CONFIG_SUNRPC_DEBUG)
+# define RPCDBG_FACILITY RPCDBG_TRANS
+#endif
+
+const struct rpcrdma_memreg_ops rpcrdma_physical_memreg_ops = {
+ .ro_displayname = "physical",
+};
diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index 99752b5..c3319e1 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -492,10 +492,10 @@ connected:
int ird = attr->max_dest_rd_atomic;
int tird = ep->rep_remote_cma.responder_resources;

- pr_info("rpcrdma: connection to %pIS:%u on %s, memreg %d slots %d ird %d%s\n",
+ pr_info("rpcrdma: connection to %pIS:%u on %s, memreg '%s', %d credits, %d responders%s\n",
sap, rpc_get_port(sap),
ia->ri_id->device->name,
- ia->ri_memreg_strategy,
+ ia->ri_ops->ro_displayname,
xprt->rx_buf.rb_max_requests,
ird, ird < 4 && ird < tird / 2 ? " (low!)" : "");
} else if (connstate < 0) {
@@ -650,13 +650,16 @@ rpcrdma_ia_open(struct rpcrdma_xprt *xprt, struct sockaddr *addr, int memreg)
*/
switch (memreg) {
case RPCRDMA_FRMR:
+ ia->ri_ops = &rpcrdma_frwr_memreg_ops;
break;
case RPCRDMA_ALLPHYSICAL:
+ ia->ri_ops = &rpcrdma_physical_memreg_ops;
mem_priv = IB_ACCESS_LOCAL_WRITE |
IB_ACCESS_REMOTE_WRITE |
IB_ACCESS_REMOTE_READ;
goto register_setup;
case RPCRDMA_MTHCAFMR:
+ ia->ri_ops = &rpcrdma_fmr_memreg_ops;
if (ia->ri_have_dma_lkey)
break;
mem_priv = IB_ACCESS_LOCAL_WRITE;
@@ -676,8 +679,8 @@ rpcrdma_ia_open(struct rpcrdma_xprt *xprt, struct sockaddr *addr, int memreg)
rc = -ENOMEM;
goto out3;
}
- dprintk("RPC: %s: memory registration strategy is %d\n",
- __func__, memreg);
+ dprintk("RPC: %s: memory registration strategy is '%s'\n",
+ __func__, ia->ri_ops->ro_displayname);

/* Else will do memory reg/dereg for each chunk */
ia->ri_memreg_strategy = memreg;
diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index c8afd83..ef3cf4a 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -60,6 +60,7 @@
* Interface Adapter -- one per transport instance
*/
struct rpcrdma_ia {
+ const struct rpcrdma_memreg_ops *ri_ops;
rwlock_t ri_qplock;
struct rdma_cm_id *ri_id;
struct ib_pd *ri_pd;
@@ -331,6 +332,17 @@ struct rpcrdma_stats {
};

/*
+ * Per-registration mode operations
+ */
+struct rpcrdma_memreg_ops {
+ const char *ro_displayname;
+};
+
+extern const struct rpcrdma_memreg_ops rpcrdma_fmr_memreg_ops;
+extern const struct rpcrdma_memreg_ops rpcrdma_frwr_memreg_ops;
+extern const struct rpcrdma_memreg_ops rpcrdma_physical_memreg_ops;
+
+/*
* RPCRDMA transport -- encapsulates the structures above for
* integration with RPC.
*


2015-03-24 20:31:36

by Chuck Lever III

Subject: [PATCH v2 07/15] xprtrdma: Add a "max_payload" op for each memreg mode

The max_payload computation is generalized to ensure that the
payload maximum is the lesser of RPCRDMA_MAX_DATA_SEGS and the number of
data segments that can be transmitted in an inline buffer.
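
A worked example, assuming the hypothetical defaults of 1024-byte
inline thresholds, a 28-byte RPCRDMA_HDRLEN_MIN, a 16-byte struct
rpcrdma_segment, and RPCRDMA_MAX_DATA_SEGS of 64:

	/* bytes    = min(1024, 1024) - 28     = 996
	 * segments = 1 << (fls(996 / 16) - 1) = 32
	 *
	 * fmr:      min(64, 32 * 64) = 64 pages -> 256 KB max payload
	 * physical: min(64, 32)      = 32 pages -> 128 KB max payload
	 * frwr:     as fmr, but scaled by the device's page list depth
	 */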

Signed-off-by: Chuck Lever <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
---
 net/sunrpc/xprtrdma/fmr_ops.c      |   13 ++++++++++
 net/sunrpc/xprtrdma/frwr_ops.c     |   13 ++++++++++
 net/sunrpc/xprtrdma/physical_ops.c |   10 +++++++
 net/sunrpc/xprtrdma/transport.c    |    5 +++-
 net/sunrpc/xprtrdma/verbs.c        |   49 +++++++++++-------------------
 net/sunrpc/xprtrdma/xprt_rdma.h    |    5 +++-
 6 files changed, 59 insertions(+), 36 deletions(-)

diff --git a/net/sunrpc/xprtrdma/fmr_ops.c b/net/sunrpc/xprtrdma/fmr_ops.c
index ffb7d93..eec2660 100644
--- a/net/sunrpc/xprtrdma/fmr_ops.c
+++ b/net/sunrpc/xprtrdma/fmr_ops.c
@@ -17,6 +17,19 @@
# define RPCDBG_FACILITY RPCDBG_TRANS
#endif

+/* Maximum scatter/gather per FMR */
+#define RPCRDMA_MAX_FMR_SGES (64)
+
+/* FMR mode conveys up to 64 pages of payload per chunk segment.
+ */
+static size_t
+fmr_op_maxpages(struct rpcrdma_xprt *r_xprt)
+{
+ return min_t(unsigned int, RPCRDMA_MAX_DATA_SEGS,
+ rpcrdma_max_segments(r_xprt) * RPCRDMA_MAX_FMR_SGES);
+}
+
const struct rpcrdma_memreg_ops rpcrdma_fmr_memreg_ops = {
+ .ro_maxpages = fmr_op_maxpages,
.ro_displayname = "fmr",
};
diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c
index 79173f9..73a5ac8 100644
--- a/net/sunrpc/xprtrdma/frwr_ops.c
+++ b/net/sunrpc/xprtrdma/frwr_ops.c
@@ -17,6 +17,19 @@
# define RPCDBG_FACILITY RPCDBG_TRANS
#endif

+/* FRWR mode conveys a list of pages per chunk segment. The
+ * maximum length of that list is the FRWR page list depth.
+ */
+static size_t
+frwr_op_maxpages(struct rpcrdma_xprt *r_xprt)
+{
+ struct rpcrdma_ia *ia = &r_xprt->rx_ia;
+
+ return min_t(unsigned int, RPCRDMA_MAX_DATA_SEGS,
+ rpcrdma_max_segments(r_xprt) * ia->ri_max_frmr_depth);
+}
+
const struct rpcrdma_memreg_ops rpcrdma_frwr_memreg_ops = {
+ .ro_maxpages = frwr_op_maxpages,
.ro_displayname = "frwr",
};
diff --git a/net/sunrpc/xprtrdma/physical_ops.c b/net/sunrpc/xprtrdma/physical_ops.c
index b0922ac..28ade19 100644
--- a/net/sunrpc/xprtrdma/physical_ops.c
+++ b/net/sunrpc/xprtrdma/physical_ops.c
@@ -19,6 +19,16 @@
# define RPCDBG_FACILITY RPCDBG_TRANS
#endif

+/* PHYSICAL memory registration conveys one page per chunk segment.
+ */
+static size_t
+physical_op_maxpages(struct rpcrdma_xprt *r_xprt)
+{
+ return min_t(unsigned int, RPCRDMA_MAX_DATA_SEGS,
+ rpcrdma_max_segments(r_xprt));
+}
+
const struct rpcrdma_memreg_ops rpcrdma_physical_memreg_ops = {
+ .ro_maxpages = physical_op_maxpages,
.ro_displayname = "physical",
};
diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c
index 97f6562..da71a24 100644
--- a/net/sunrpc/xprtrdma/transport.c
+++ b/net/sunrpc/xprtrdma/transport.c
@@ -406,7 +406,10 @@ xprt_setup_rdma(struct xprt_create *args)
xprt_rdma_connect_worker);

xprt_rdma_format_addresses(xprt);
- xprt->max_payload = rpcrdma_max_payload(new_xprt);
+ xprt->max_payload = new_xprt->rx_ia.ri_ops->ro_maxpages(new_xprt);
+ if (xprt->max_payload == 0)
+ goto out4;
+ xprt->max_payload <<= PAGE_SHIFT;
dprintk("RPC: %s: transport data payload maximum: %zu bytes\n",
__func__, xprt->max_payload);

diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index c3319e1..da55cda 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -2212,43 +2212,24 @@ rpcrdma_ep_post_recv(struct rpcrdma_ia *ia,
return rc;
}

-/* Physical mapping means one Read/Write list entry per-page.
- * All list entries must fit within an inline buffer
- *
- * NB: The server must return a Write list for NFS READ,
- * which has the same constraint. Factor in the inline
- * rsize as well.
+/* How many chunk list items fit within our inline buffers?
*/
-static size_t
-rpcrdma_physical_max_payload(struct rpcrdma_xprt *r_xprt)
+unsigned int
+rpcrdma_max_segments(struct rpcrdma_xprt *r_xprt)
{
struct rpcrdma_create_data_internal *cdata = &r_xprt->rx_data;
- unsigned int inline_size, pages;
-
- inline_size = min_t(unsigned int,
- cdata->inline_wsize, cdata->inline_rsize);
- inline_size -= RPCRDMA_HDRLEN_MIN;
- pages = inline_size / sizeof(struct rpcrdma_segment);
- return pages << PAGE_SHIFT;
-}
+ int bytes, segments;

-static size_t
-rpcrdma_mr_max_payload(struct rpcrdma_xprt *r_xprt)
-{
- return RPCRDMA_MAX_DATA_SEGS << PAGE_SHIFT;
-}
-
-size_t
-rpcrdma_max_payload(struct rpcrdma_xprt *r_xprt)
-{
- size_t result;
-
- switch (r_xprt->rx_ia.ri_memreg_strategy) {
- case RPCRDMA_ALLPHYSICAL:
- result = rpcrdma_physical_max_payload(r_xprt);
- break;
- default:
- result = rpcrdma_mr_max_payload(r_xprt);
+ bytes = min_t(unsigned int, cdata->inline_wsize, cdata->inline_rsize);
+ bytes -= RPCRDMA_HDRLEN_MIN;
+ if (bytes < sizeof(struct rpcrdma_segment) * 2) {
+ pr_warn("RPC: %s: inline threshold too small\n",
+ __func__);
+ return 0;
}
- return result;
+
+ segments = 1 << (fls(bytes / sizeof(struct rpcrdma_segment)) - 1);
+ dprintk("RPC: %s: max chunk list size = %d segments\n",
+ __func__, segments);
+ return segments;
}
diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index ef3cf4a..59e627e 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -334,7 +334,9 @@ struct rpcrdma_stats {
/*
* Per-registration mode operations
*/
+struct rpcrdma_xprt;
struct rpcrdma_memreg_ops {
+ size_t (*ro_maxpages)(struct rpcrdma_xprt *);
const char *ro_displayname;
};

@@ -411,6 +413,8 @@ struct rpcrdma_regbuf *rpcrdma_alloc_regbuf(struct rpcrdma_ia *,
void rpcrdma_free_regbuf(struct rpcrdma_ia *,
struct rpcrdma_regbuf *);

+unsigned int rpcrdma_max_segments(struct rpcrdma_xprt *);
+
/*
* RPC/RDMA connection management calls - xprtrdma/rpc_rdma.c
*/
@@ -422,7 +426,6 @@ void rpcrdma_reply_handler(struct rpcrdma_rep *);
* RPC/RDMA protocol calls - xprtrdma/rpc_rdma.c
*/
int rpcrdma_marshal_req(struct rpc_rqst *);
-size_t rpcrdma_max_payload(struct rpcrdma_xprt *);

/* Temporary NFS request map cache. Created in svc_rdma.c */
extern struct kmem_cache *svc_rdma_map_cachep;


2015-03-24 20:31:46

by Chuck Lever III

Subject: [PATCH v2 08/15] xprtrdma: Add a "register_external" op for each memreg mode

There is very little common processing among the different external
memory registration functions. Have rpcrdma_create_chunks() call
the registration method directly. This removes a stack frame and a
switch statement from the external registration path.
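
The call site in rpcrdma_create_chunks() reduces to an indirect call
(from the rpc_rdma.c hunk below):

	map = r_xprt->rx_ia.ri_ops->ro_map;
	n = map(r_xprt, seg, nsegs, cur_wchunk != NULL);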

Signed-off-by: Chuck Lever <[email protected]>
---
 net/sunrpc/xprtrdma/fmr_ops.c      |   51 +++++++++++
 net/sunrpc/xprtrdma/frwr_ops.c     |   82 ++++++++++++++++++
 net/sunrpc/xprtrdma/physical_ops.c |   17 ++++
 net/sunrpc/xprtrdma/rpc_rdma.c     |    5 +
 net/sunrpc/xprtrdma/verbs.c        |  168 +-----------------------------------
 net/sunrpc/xprtrdma/xprt_rdma.h    |    6 +
 6 files changed, 160 insertions(+), 169 deletions(-)

diff --git a/net/sunrpc/xprtrdma/fmr_ops.c b/net/sunrpc/xprtrdma/fmr_ops.c
index eec2660..45fb646 100644
--- a/net/sunrpc/xprtrdma/fmr_ops.c
+++ b/net/sunrpc/xprtrdma/fmr_ops.c
@@ -29,7 +29,58 @@ fmr_op_maxpages(struct rpcrdma_xprt *r_xprt)
rpcrdma_max_segments(r_xprt) * RPCRDMA_MAX_FMR_SGES);
}

+/* Use the ib_map_phys_fmr() verb to register a memory region
+ * for remote access via RDMA READ or RDMA WRITE.
+ */
+static int
+fmr_op_map(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg,
+ int nsegs, bool writing)
+{
+ struct rpcrdma_ia *ia = &r_xprt->rx_ia;
+ struct rpcrdma_mr_seg *seg1 = seg;
+ struct rpcrdma_mw *mw = seg1->rl_mw;
+ u64 physaddrs[RPCRDMA_MAX_DATA_SEGS];
+ int len, pageoff, i, rc;
+
+ pageoff = offset_in_page(seg1->mr_offset);
+ seg1->mr_offset -= pageoff; /* start of page */
+ seg1->mr_len += pageoff;
+ len = -pageoff;
+ if (nsegs > RPCRDMA_MAX_FMR_SGES)
+ nsegs = RPCRDMA_MAX_FMR_SGES;
+ for (i = 0; i < nsegs;) {
+ rpcrdma_map_one(ia, seg, writing);
+ physaddrs[i] = seg->mr_dma;
+ len += seg->mr_len;
+ ++seg;
+ ++i;
+ /* Check for holes */
+ if ((i < nsegs && offset_in_page(seg->mr_offset)) ||
+ offset_in_page((seg-1)->mr_offset + (seg-1)->mr_len))
+ break;
+ }
+
+ rc = ib_map_phys_fmr(mw->r.fmr, physaddrs, i, seg1->mr_dma);
+ if (rc)
+ goto out_maperr;
+
+ seg1->mr_rkey = mw->r.fmr->rkey;
+ seg1->mr_base = seg1->mr_dma + pageoff;
+ seg1->mr_nsegs = i;
+ seg1->mr_len = len;
+ return i;
+
+out_maperr:
+ dprintk("RPC: %s: ib_map_phys_fmr %u@0x%llx+%i (%d) status %i\n",
+ __func__, len, (unsigned long long)seg1->mr_dma,
+ pageoff, i, rc);
+ while (i--)
+ rpcrdma_unmap_one(ia, --seg);
+ return rc;
+}
+
const struct rpcrdma_memreg_ops rpcrdma_fmr_memreg_ops = {
+ .ro_map = fmr_op_map,
.ro_maxpages = fmr_op_maxpages,
.ro_displayname = "fmr",
};
diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c
index 73a5ac8..23e4d99 100644
--- a/net/sunrpc/xprtrdma/frwr_ops.c
+++ b/net/sunrpc/xprtrdma/frwr_ops.c
@@ -29,7 +29,89 @@ frwr_op_maxpages(struct rpcrdma_xprt *r_xprt)
rpcrdma_max_segments(r_xprt) * ia->ri_max_frmr_depth);
}

+/* Post a FAST_REG Work Request to register a memory region
+ * for remote access via RDMA READ or RDMA WRITE.
+ */
+static int
+frwr_op_map(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg,
+ int nsegs, bool writing)
+{
+ struct rpcrdma_ia *ia = &r_xprt->rx_ia;
+ struct rpcrdma_mr_seg *seg1 = seg;
+ struct rpcrdma_mw *mw = seg1->rl_mw;
+ struct rpcrdma_frmr *frmr = &mw->r.frmr;
+ struct ib_mr *mr = frmr->fr_mr;
+ struct ib_send_wr fastreg_wr, *bad_wr;
+ u8 key;
+ int len, pageoff;
+ int i, rc;
+ int seg_len;
+ u64 pa;
+ int page_no;
+
+ pageoff = offset_in_page(seg1->mr_offset);
+ seg1->mr_offset -= pageoff; /* start of page */
+ seg1->mr_len += pageoff;
+ len = -pageoff;
+ if (nsegs > ia->ri_max_frmr_depth)
+ nsegs = ia->ri_max_frmr_depth;
+ for (page_no = i = 0; i < nsegs;) {
+ rpcrdma_map_one(ia, seg, writing);
+ pa = seg->mr_dma;
+ for (seg_len = seg->mr_len; seg_len > 0; seg_len -= PAGE_SIZE) {
+ frmr->fr_pgl->page_list[page_no++] = pa;
+ pa += PAGE_SIZE;
+ }
+ len += seg->mr_len;
+ ++seg;
+ ++i;
+ /* Check for holes */
+ if ((i < nsegs && offset_in_page(seg->mr_offset)) ||
+ offset_in_page((seg-1)->mr_offset + (seg-1)->mr_len))
+ break;
+ }
+ dprintk("RPC: %s: Using frmr %p to map %d segments (%d bytes)\n",
+ __func__, mw, i, len);
+
+ frmr->fr_state = FRMR_IS_VALID;
+
+ memset(&fastreg_wr, 0, sizeof(fastreg_wr));
+ fastreg_wr.wr_id = (unsigned long)(void *)mw;
+ fastreg_wr.opcode = IB_WR_FAST_REG_MR;
+ fastreg_wr.wr.fast_reg.iova_start = seg1->mr_dma + pageoff;
+ fastreg_wr.wr.fast_reg.page_list = frmr->fr_pgl;
+ fastreg_wr.wr.fast_reg.page_shift = PAGE_SHIFT;
+ fastreg_wr.wr.fast_reg.page_list_len = page_no;
+ fastreg_wr.wr.fast_reg.length = len;
+ fastreg_wr.wr.fast_reg.access_flags = writing ?
+ IB_ACCESS_REMOTE_WRITE | IB_ACCESS_LOCAL_WRITE :
+ IB_ACCESS_REMOTE_READ;
+ key = (u8)(mr->rkey & 0x000000FF);
+ ib_update_fast_reg_key(mr, ++key);
+ fastreg_wr.wr.fast_reg.rkey = mr->rkey;
+
+ DECR_CQCOUNT(&r_xprt->rx_ep);
+ rc = ib_post_send(ia->ri_id->qp, &fastreg_wr, &bad_wr);
+ if (rc)
+ goto out_senderr;
+
+ seg1->mr_rkey = mr->rkey;
+ seg1->mr_base = seg1->mr_dma + pageoff;
+ seg1->mr_nsegs = i;
+ seg1->mr_len = len;
+ return i;
+
+out_senderr:
+ dprintk("RPC: %s: ib_post_send status %i\n", __func__, rc);
+ ib_update_fast_reg_key(mr, --key);
+ frmr->fr_state = FRMR_IS_INVALID;
+ while (i--)
+ rpcrdma_unmap_one(ia, --seg);
+ return rc;
+}
+
const struct rpcrdma_memreg_ops rpcrdma_frwr_memreg_ops = {
+ .ro_map = frwr_op_map,
.ro_maxpages = frwr_op_maxpages,
.ro_displayname = "frwr",
};
diff --git a/net/sunrpc/xprtrdma/physical_ops.c b/net/sunrpc/xprtrdma/physical_ops.c
index 28ade19..5a284ee 100644
--- a/net/sunrpc/xprtrdma/physical_ops.c
+++ b/net/sunrpc/xprtrdma/physical_ops.c
@@ -28,7 +28,24 @@ physical_op_maxpages(struct rpcrdma_xprt *r_xprt)
rpcrdma_max_segments(r_xprt));
}

+/* The client's physical memory is already exposed for
+ * remote access via RDMA READ or RDMA WRITE.
+ */
+static int
+physical_op_map(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg,
+ int nsegs, bool writing)
+{
+ struct rpcrdma_ia *ia = &r_xprt->rx_ia;
+
+ rpcrdma_map_one(ia, seg, writing);
+ seg->mr_rkey = ia->ri_bind_mem->rkey;
+ seg->mr_base = seg->mr_dma;
+ seg->mr_nsegs = 1;
+ return 1;
+}
+
const struct rpcrdma_memreg_ops rpcrdma_physical_memreg_ops = {
+ .ro_map = physical_op_map,
.ro_maxpages = physical_op_maxpages,
.ro_displayname = "physical",
};
diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c
index 41456d9..6ab1d03 100644
--- a/net/sunrpc/xprtrdma/rpc_rdma.c
+++ b/net/sunrpc/xprtrdma/rpc_rdma.c
@@ -187,6 +187,7 @@ rpcrdma_create_chunks(struct rpc_rqst *rqst, struct xdr_buf *target,
struct rpcrdma_write_array *warray = NULL;
struct rpcrdma_write_chunk *cur_wchunk = NULL;
__be32 *iptr = headerp->rm_body.rm_chunks;
+ int (*map)(struct rpcrdma_xprt *, struct rpcrdma_mr_seg *, int, bool);

if (type == rpcrdma_readch || type == rpcrdma_areadch) {
/* a read chunk - server will RDMA Read our memory */
@@ -209,9 +210,9 @@ rpcrdma_create_chunks(struct rpc_rqst *rqst, struct xdr_buf *target,
if (nsegs < 0)
return nsegs;

+ map = r_xprt->rx_ia.ri_ops->ro_map;
do {
- n = rpcrdma_register_external(seg, nsegs,
- cur_wchunk != NULL, r_xprt);
+ n = map(r_xprt, seg, nsegs, cur_wchunk != NULL);
if (n <= 0)
goto out;
if (cur_rchunk) { /* read */
diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index da55cda..4318c04 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -1858,8 +1858,8 @@ rpcrdma_free_regbuf(struct rpcrdma_ia *ia, struct rpcrdma_regbuf *rb)
* Wrappers for chunk registration, shared by read/write chunk code.
*/

-static void
-rpcrdma_map_one(struct rpcrdma_ia *ia, struct rpcrdma_mr_seg *seg, int writing)
+void
+rpcrdma_map_one(struct rpcrdma_ia *ia, struct rpcrdma_mr_seg *seg, bool writing)
{
seg->mr_dir = writing ? DMA_FROM_DEVICE : DMA_TO_DEVICE;
seg->mr_dmalen = seg->mr_len;
@@ -1879,7 +1879,7 @@ rpcrdma_map_one(struct rpcrdma_ia *ia, struct rpcrdma_mr_seg *seg, int writing)
}
}

-static void
+void
rpcrdma_unmap_one(struct rpcrdma_ia *ia, struct rpcrdma_mr_seg *seg)
{
if (seg->mr_page)
@@ -1891,89 +1891,6 @@ rpcrdma_unmap_one(struct rpcrdma_ia *ia, struct rpcrdma_mr_seg *seg)
}

static int
-rpcrdma_register_frmr_external(struct rpcrdma_mr_seg *seg,
- int *nsegs, int writing, struct rpcrdma_ia *ia,
- struct rpcrdma_xprt *r_xprt)
-{
- struct rpcrdma_mr_seg *seg1 = seg;
- struct rpcrdma_mw *mw = seg1->rl_mw;
- struct rpcrdma_frmr *frmr = &mw->r.frmr;
- struct ib_mr *mr = frmr->fr_mr;
- struct ib_send_wr fastreg_wr, *bad_wr;
- u8 key;
- int len, pageoff;
- int i, rc;
- int seg_len;
- u64 pa;
- int page_no;
-
- pageoff = offset_in_page(seg1->mr_offset);
- seg1->mr_offset -= pageoff; /* start of page */
- seg1->mr_len += pageoff;
- len = -pageoff;
- if (*nsegs > ia->ri_max_frmr_depth)
- *nsegs = ia->ri_max_frmr_depth;
- for (page_no = i = 0; i < *nsegs;) {
- rpcrdma_map_one(ia, seg, writing);
- pa = seg->mr_dma;
- for (seg_len = seg->mr_len; seg_len > 0; seg_len -= PAGE_SIZE) {
- frmr->fr_pgl->page_list[page_no++] = pa;
- pa += PAGE_SIZE;
- }
- len += seg->mr_len;
- ++seg;
- ++i;
- /* Check for holes */
- if ((i < *nsegs && offset_in_page(seg->mr_offset)) ||
- offset_in_page((seg-1)->mr_offset + (seg-1)->mr_len))
- break;
- }
- dprintk("RPC: %s: Using frmr %p to map %d segments (%d bytes)\n",
- __func__, mw, i, len);
-
- frmr->fr_state = FRMR_IS_VALID;
-
- memset(&fastreg_wr, 0, sizeof(fastreg_wr));
- fastreg_wr.wr_id = (unsigned long)(void *)mw;
- fastreg_wr.opcode = IB_WR_FAST_REG_MR;
- fastreg_wr.wr.fast_reg.iova_start = seg1->mr_dma + pageoff;
- fastreg_wr.wr.fast_reg.page_list = frmr->fr_pgl;
- fastreg_wr.wr.fast_reg.page_list_len = page_no;
- fastreg_wr.wr.fast_reg.page_shift = PAGE_SHIFT;
- fastreg_wr.wr.fast_reg.length = len;
-
- /* Bump the key */
- key = (u8)(mr->rkey & 0x000000FF);
- ib_update_fast_reg_key(mr, ++key);
-
- fastreg_wr.wr.fast_reg.access_flags = (writing ?
- IB_ACCESS_REMOTE_WRITE | IB_ACCESS_LOCAL_WRITE :
- IB_ACCESS_REMOTE_READ);
- fastreg_wr.wr.fast_reg.rkey = mr->rkey;
- DECR_CQCOUNT(&r_xprt->rx_ep);
-
- rc = ib_post_send(ia->ri_id->qp, &fastreg_wr, &bad_wr);
- if (rc) {
- dprintk("RPC: %s: failed ib_post_send for register,"
- " status %i\n", __func__, rc);
- ib_update_fast_reg_key(mr, --key);
- goto out_err;
- } else {
- seg1->mr_rkey = mr->rkey;
- seg1->mr_base = seg1->mr_dma + pageoff;
- seg1->mr_nsegs = i;
- seg1->mr_len = len;
- }
- *nsegs = i;
- return 0;
-out_err:
- frmr->fr_state = FRMR_IS_INVALID;
- while (i--)
- rpcrdma_unmap_one(ia, --seg);
- return rc;
-}
-
-static int
rpcrdma_deregister_frmr_external(struct rpcrdma_mr_seg *seg,
struct rpcrdma_ia *ia, struct rpcrdma_xprt *r_xprt)
{
@@ -2004,49 +1921,6 @@ rpcrdma_deregister_frmr_external(struct rpcrdma_mr_seg *seg,
}

static int
-rpcrdma_register_fmr_external(struct rpcrdma_mr_seg *seg,
- int *nsegs, int writing, struct rpcrdma_ia *ia)
-{
- struct rpcrdma_mr_seg *seg1 = seg;
- u64 physaddrs[RPCRDMA_MAX_DATA_SEGS];
- int len, pageoff, i, rc;
-
- pageoff = offset_in_page(seg1->mr_offset);
- seg1->mr_offset -= pageoff; /* start of page */
- seg1->mr_len += pageoff;
- len = -pageoff;
- if (*nsegs > RPCRDMA_MAX_DATA_SEGS)
- *nsegs = RPCRDMA_MAX_DATA_SEGS;
- for (i = 0; i < *nsegs;) {
- rpcrdma_map_one(ia, seg, writing);
- physaddrs[i] = seg->mr_dma;
- len += seg->mr_len;
- ++seg;
- ++i;
- /* Check for holes */
- if ((i < *nsegs && offset_in_page(seg->mr_offset)) ||
- offset_in_page((seg-1)->mr_offset + (seg-1)->mr_len))
- break;
- }
- rc = ib_map_phys_fmr(seg1->rl_mw->r.fmr, physaddrs, i, seg1->mr_dma);
- if (rc) {
- dprintk("RPC: %s: failed ib_map_phys_fmr "
- "%u@0x%llx+%i (%d)... status %i\n", __func__,
- len, (unsigned long long)seg1->mr_dma,
- pageoff, i, rc);
- while (i--)
- rpcrdma_unmap_one(ia, --seg);
- } else {
- seg1->mr_rkey = seg1->rl_mw->r.fmr->rkey;
- seg1->mr_base = seg1->mr_dma + pageoff;
- seg1->mr_nsegs = i;
- seg1->mr_len = len;
- }
- *nsegs = i;
- return rc;
-}
-
-static int
rpcrdma_deregister_fmr_external(struct rpcrdma_mr_seg *seg,
struct rpcrdma_ia *ia)
{
@@ -2067,42 +1941,6 @@ rpcrdma_deregister_fmr_external(struct rpcrdma_mr_seg *seg,
}

int
-rpcrdma_register_external(struct rpcrdma_mr_seg *seg,
- int nsegs, int writing, struct rpcrdma_xprt *r_xprt)
-{
- struct rpcrdma_ia *ia = &r_xprt->rx_ia;
- int rc = 0;
-
- switch (ia->ri_memreg_strategy) {
-
- case RPCRDMA_ALLPHYSICAL:
- rpcrdma_map_one(ia, seg, writing);
- seg->mr_rkey = ia->ri_bind_mem->rkey;
- seg->mr_base = seg->mr_dma;
- seg->mr_nsegs = 1;
- nsegs = 1;
- break;
-
- /* Registration using frmr registration */
- case RPCRDMA_FRMR:
- rc = rpcrdma_register_frmr_external(seg, &nsegs, writing, ia, r_xprt);
- break;
-
- /* Registration using fmr memory registration */
- case RPCRDMA_MTHCAFMR:
- rc = rpcrdma_register_fmr_external(seg, &nsegs, writing, ia);
- break;
-
- default:
- return -EIO;
- }
- if (rc)
- return rc;
-
- return nsegs;
-}
-
-int
rpcrdma_deregister_external(struct rpcrdma_mr_seg *seg,
struct rpcrdma_xprt *r_xprt)
{
diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index 59e627e..7bf077b 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -336,6 +336,8 @@ struct rpcrdma_stats {
*/
struct rpcrdma_xprt;
struct rpcrdma_memreg_ops {
+ int (*ro_map)(struct rpcrdma_xprt *,
+ struct rpcrdma_mr_seg *, int, bool);
size_t (*ro_maxpages)(struct rpcrdma_xprt *);
const char *ro_displayname;
};
@@ -403,8 +405,6 @@ void rpcrdma_buffer_put(struct rpcrdma_req *);
void rpcrdma_recv_buffer_get(struct rpcrdma_req *);
void rpcrdma_recv_buffer_put(struct rpcrdma_rep *);

-int rpcrdma_register_external(struct rpcrdma_mr_seg *,
- int, int, struct rpcrdma_xprt *);
int rpcrdma_deregister_external(struct rpcrdma_mr_seg *,
struct rpcrdma_xprt *);

@@ -414,6 +414,8 @@ void rpcrdma_free_regbuf(struct rpcrdma_ia *,
struct rpcrdma_regbuf *);

unsigned int rpcrdma_max_segments(struct rpcrdma_xprt *);
+void rpcrdma_map_one(struct rpcrdma_ia *, struct rpcrdma_mr_seg *, bool);
+void rpcrdma_unmap_one(struct rpcrdma_ia *, struct rpcrdma_mr_seg *);

/*
* RPC/RDMA connection management calls - xprtrdma/rpc_rdma.c


2015-03-24 20:32:05

by Chuck Lever III

Subject: [PATCH v2 10/15] xprtrdma: Add "init MRs" memreg op

This method is used when setting up a new transport instance to
create a pool of Memory Region objects that will be used to register
memory during operation.

Memory Regions are not needed for "physical" registration: all
client memory is pre-registered, so there is no MR state for that
mode's init method to set up.
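
For the two modes that do need them, init pre-allocates a generous
pool up front (the figures assume the hypothetical defaults of a
32-entry slot table and an RPCRDMA_MAX_SEGS of 66):

	i = (buf->rb_max_requests + 1) * RPCRDMA_MAX_SEGS;
	/* (32 + 1) * 66 = 2178 MWs per transport */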

Signed-off-by: Chuck Lever <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
---
 net/sunrpc/xprtrdma/fmr_ops.c      |   42 +++++++++++++++
 net/sunrpc/xprtrdma/frwr_ops.c     |   66 +++++++++++++++++++++++
 net/sunrpc/xprtrdma/physical_ops.c |    7 ++
 net/sunrpc/xprtrdma/verbs.c        |  104 +-----------------------------------
 net/sunrpc/xprtrdma/xprt_rdma.h    |    1
 5 files changed, 119 insertions(+), 101 deletions(-)

diff --git a/net/sunrpc/xprtrdma/fmr_ops.c b/net/sunrpc/xprtrdma/fmr_ops.c
index 888aa10..825ce96 100644
--- a/net/sunrpc/xprtrdma/fmr_ops.c
+++ b/net/sunrpc/xprtrdma/fmr_ops.c
@@ -29,6 +29,47 @@ fmr_op_maxpages(struct rpcrdma_xprt *r_xprt)
rpcrdma_max_segments(r_xprt) * RPCRDMA_MAX_FMR_SGES);
}

+static int
+fmr_op_init(struct rpcrdma_xprt *r_xprt)
+{
+ struct rpcrdma_buffer *buf = &r_xprt->rx_buf;
+ int mr_access_flags = IB_ACCESS_REMOTE_WRITE | IB_ACCESS_REMOTE_READ;
+ struct ib_fmr_attr fmr_attr = {
+ .max_pages = RPCRDMA_MAX_FMR_SGES,
+ .max_maps = 1,
+ .page_shift = PAGE_SHIFT
+ };
+ struct ib_pd *pd = r_xprt->rx_ia.ri_pd;
+ struct rpcrdma_mw *r;
+ int i, rc;
+
+ INIT_LIST_HEAD(&buf->rb_mws);
+ INIT_LIST_HEAD(&buf->rb_all);
+
+ i = (buf->rb_max_requests + 1) * RPCRDMA_MAX_SEGS;
+ dprintk("RPC: %s: initalizing %d FMRs\n", __func__, i);
+
+ while (i--) {
+ r = kzalloc(sizeof(*r), GFP_KERNEL);
+ if (!r)
+ return -ENOMEM;
+
+ r->r.fmr = ib_alloc_fmr(pd, mr_access_flags, &fmr_attr);
+ if (IS_ERR(r->r.fmr))
+ goto out_fmr_err;
+
+ list_add(&r->mw_list, &buf->rb_mws);
+ list_add(&r->mw_all, &buf->rb_all);
+ }
+ return 0;
+
+out_fmr_err:
+ rc = PTR_ERR(r->r.fmr);
+ dprintk("RPC: %s: ib_alloc_fmr status %i\n", __func__, rc);
+ kfree(r);
+ return rc;
+}
+
/* Use the ib_map_phys_fmr() verb to register a memory region
* for remote access via RDMA READ or RDMA WRITE.
*/
@@ -109,5 +150,6 @@ const struct rpcrdma_memreg_ops rpcrdma_fmr_memreg_ops = {
.ro_map = fmr_op_map,
.ro_unmap = fmr_op_unmap,
.ro_maxpages = fmr_op_maxpages,
+ .ro_init = fmr_op_init,
.ro_displayname = "fmr",
};
diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c
index 35b725b..9168c15 100644
--- a/net/sunrpc/xprtrdma/frwr_ops.c
+++ b/net/sunrpc/xprtrdma/frwr_ops.c
@@ -17,6 +17,35 @@
# define RPCDBG_FACILITY RPCDBG_TRANS
#endif

+static int
+__frwr_init(struct rpcrdma_mw *r, struct ib_pd *pd, struct ib_device *device,
+ unsigned int depth)
+{
+ struct rpcrdma_frmr *f = &r->r.frmr;
+ int rc;
+
+ f->fr_mr = ib_alloc_fast_reg_mr(pd, depth);
+ if (IS_ERR(f->fr_mr))
+ goto out_mr_err;
+ f->fr_pgl = ib_alloc_fast_reg_page_list(device, depth);
+ if (IS_ERR(f->fr_pgl))
+ goto out_list_err;
+ return 0;
+
+out_mr_err:
+ rc = PTR_ERR(f->fr_mr);
+ dprintk("RPC: %s: ib_alloc_fast_reg_mr status %i\n",
+ __func__, rc);
+ return rc;
+
+out_list_err:
+ rc = PTR_ERR(f->fr_pgl);
+ dprintk("RPC: %s: ib_alloc_fast_reg_page_list status %i\n",
+ __func__, rc);
+ ib_dereg_mr(f->fr_mr);
+ return rc;
+}
+
/* FRWR mode conveys a list of pages per chunk segment. The
* maximum length of that list is the FRWR page list depth.
*/
@@ -29,6 +58,42 @@ frwr_op_maxpages(struct rpcrdma_xprt *r_xprt)
rpcrdma_max_segments(r_xprt) * ia->ri_max_frmr_depth);
}

+static int
+frwr_op_init(struct rpcrdma_xprt *r_xprt)
+{
+ struct rpcrdma_buffer *buf = &r_xprt->rx_buf;
+ struct ib_device *device = r_xprt->rx_ia.ri_id->device;
+ unsigned int depth = r_xprt->rx_ia.ri_max_frmr_depth;
+ struct ib_pd *pd = r_xprt->rx_ia.ri_pd;
+ int i;
+
+ INIT_LIST_HEAD(&buf->rb_mws);
+ INIT_LIST_HEAD(&buf->rb_all);
+
+ i = (buf->rb_max_requests + 1) * RPCRDMA_MAX_SEGS;
+ dprintk("RPC: %s: initalizing %d FRMRs\n", __func__, i);
+
+ while (i--) {
+ struct rpcrdma_mw *r;
+ int rc;
+
+ r = kzalloc(sizeof(*r), GFP_KERNEL);
+ if (!r)
+ return -ENOMEM;
+
+ rc = __frwr_init(r, pd, device, depth);
+ if (rc) {
+ kfree(r);
+ return rc;
+ }
+
+ list_add(&r->mw_list, &buf->rb_mws);
+ list_add(&r->mw_all, &buf->rb_all);
+ }
+
+ return 0;
+}
+
/* Post a FAST_REG Work Request to register a memory region
* for remote access via RDMA READ or RDMA WRITE.
*/
@@ -149,5 +214,6 @@ const struct rpcrdma_memreg_ops rpcrdma_frwr_memreg_ops = {
.ro_map = frwr_op_map,
.ro_unmap = frwr_op_unmap,
.ro_maxpages = frwr_op_maxpages,
+ .ro_init = frwr_op_init,
.ro_displayname = "frwr",
};
diff --git a/net/sunrpc/xprtrdma/physical_ops.c b/net/sunrpc/xprtrdma/physical_ops.c
index 5b5a63a..c372051 100644
--- a/net/sunrpc/xprtrdma/physical_ops.c
+++ b/net/sunrpc/xprtrdma/physical_ops.c
@@ -28,6 +28,12 @@ physical_op_maxpages(struct rpcrdma_xprt *r_xprt)
rpcrdma_max_segments(r_xprt));
}

+static int
+physical_op_init(struct rpcrdma_xprt *r_xprt)
+{
+ return 0;
+}
+
/* The client's physical memory is already exposed for
* remote access via RDMA READ or RDMA WRITE.
*/
@@ -57,5 +63,6 @@ const struct rpcrdma_memreg_ops rpcrdma_physical_memreg_ops = {
.ro_map = physical_op_map,
.ro_unmap = physical_op_unmap,
.ro_maxpages = physical_op_maxpages,
+ .ro_init = physical_op_init,
.ro_displayname = "physical",
};
diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index b167c99..e89a57d 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -1124,91 +1124,6 @@ out:
return ERR_PTR(rc);
}

-static int
-rpcrdma_init_fmrs(struct rpcrdma_ia *ia, struct rpcrdma_buffer *buf)
-{
- int mr_access_flags = IB_ACCESS_REMOTE_WRITE | IB_ACCESS_REMOTE_READ;
- struct ib_fmr_attr fmr_attr = {
- .max_pages = RPCRDMA_MAX_DATA_SEGS,
- .max_maps = 1,
- .page_shift = PAGE_SHIFT
- };
- struct rpcrdma_mw *r;
- int i, rc;
-
- i = (buf->rb_max_requests + 1) * RPCRDMA_MAX_SEGS;
- dprintk("RPC: %s: initalizing %d FMRs\n", __func__, i);
-
- while (i--) {
- r = kzalloc(sizeof(*r), GFP_KERNEL);
- if (r == NULL)
- return -ENOMEM;
-
- r->r.fmr = ib_alloc_fmr(ia->ri_pd, mr_access_flags, &fmr_attr);
- if (IS_ERR(r->r.fmr)) {
- rc = PTR_ERR(r->r.fmr);
- dprintk("RPC: %s: ib_alloc_fmr failed %i\n",
- __func__, rc);
- goto out_free;
- }
-
- list_add(&r->mw_list, &buf->rb_mws);
- list_add(&r->mw_all, &buf->rb_all);
- }
- return 0;
-
-out_free:
- kfree(r);
- return rc;
-}
-
-static int
-rpcrdma_init_frmrs(struct rpcrdma_ia *ia, struct rpcrdma_buffer *buf)
-{
- struct rpcrdma_frmr *f;
- struct rpcrdma_mw *r;
- int i, rc;
-
- i = (buf->rb_max_requests + 1) * RPCRDMA_MAX_SEGS;
- dprintk("RPC: %s: initalizing %d FRMRs\n", __func__, i);
-
- while (i--) {
- r = kzalloc(sizeof(*r), GFP_KERNEL);
- if (r == NULL)
- return -ENOMEM;
- f = &r->r.frmr;
-
- f->fr_mr = ib_alloc_fast_reg_mr(ia->ri_pd,
- ia->ri_max_frmr_depth);
- if (IS_ERR(f->fr_mr)) {
- rc = PTR_ERR(f->fr_mr);
- dprintk("RPC: %s: ib_alloc_fast_reg_mr "
- "failed %i\n", __func__, rc);
- goto out_free;
- }
-
- f->fr_pgl = ib_alloc_fast_reg_page_list(ia->ri_id->device,
- ia->ri_max_frmr_depth);
- if (IS_ERR(f->fr_pgl)) {
- rc = PTR_ERR(f->fr_pgl);
- dprintk("RPC: %s: ib_alloc_fast_reg_page_list "
- "failed %i\n", __func__, rc);
-
- ib_dereg_mr(f->fr_mr);
- goto out_free;
- }
-
- list_add(&r->mw_list, &buf->rb_mws);
- list_add(&r->mw_all, &buf->rb_all);
- }
-
- return 0;
-
-out_free:
- kfree(r);
- return rc;
-}
-
int
rpcrdma_buffer_create(struct rpcrdma_xprt *r_xprt)
{
@@ -1245,22 +1160,9 @@ rpcrdma_buffer_create(struct rpcrdma_xprt *r_xprt)
buf->rb_recv_bufs = (struct rpcrdma_rep **) p;
p = (char *) &buf->rb_recv_bufs[buf->rb_max_requests];

- INIT_LIST_HEAD(&buf->rb_mws);
- INIT_LIST_HEAD(&buf->rb_all);
- switch (ia->ri_memreg_strategy) {
- case RPCRDMA_FRMR:
- rc = rpcrdma_init_frmrs(ia, buf);
- if (rc)
- goto out;
- break;
- case RPCRDMA_MTHCAFMR:
- rc = rpcrdma_init_fmrs(ia, buf);
- if (rc)
- goto out;
- break;
- default:
- break;
- }
+ rc = ia->ri_ops->ro_init(r_xprt);
+ if (rc)
+ goto out;

for (i = 0; i < buf->rb_max_requests; i++) {
struct rpcrdma_req *req;
diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index 9a727f9..90b60fe 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -341,6 +341,7 @@ struct rpcrdma_memreg_ops {
int (*ro_unmap)(struct rpcrdma_xprt *,
struct rpcrdma_mr_seg *);
size_t (*ro_maxpages)(struct rpcrdma_xprt *);
+ int (*ro_init)(struct rpcrdma_xprt *);
const char *ro_displayname;
};



2015-03-24 20:32:13

by Chuck Lever III

Subject: [PATCH v2 11/15] xprtrdma: Add "reset MRs" memreg op

This method is invoked when a transport instance is about to be
reconnected. Each Memory Region object is reset to its initial
state.
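
The only caller is the connect worker path, via the verbs.c hunk
below:

	xprt = container_of(ia, struct rpcrdma_xprt, rx_ia);
	ia->ri_ops->ro_reset(xprt);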

Signed-off-by: Chuck Lever <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
---
 net/sunrpc/xprtrdma/fmr_ops.c      |   23 ++++++++
 net/sunrpc/xprtrdma/frwr_ops.c     |   51 ++++++++++++++++++
 net/sunrpc/xprtrdma/physical_ops.c |    6 ++
 net/sunrpc/xprtrdma/verbs.c        |  103 +-----------------------------------
 net/sunrpc/xprtrdma/xprt_rdma.h    |    1
 5 files changed, 83 insertions(+), 101 deletions(-)

diff --git a/net/sunrpc/xprtrdma/fmr_ops.c b/net/sunrpc/xprtrdma/fmr_ops.c
index 825ce96..93261b0 100644
--- a/net/sunrpc/xprtrdma/fmr_ops.c
+++ b/net/sunrpc/xprtrdma/fmr_ops.c
@@ -146,10 +146,33 @@ out_err:
return nsegs;
}

+/* After a disconnect, unmap all FMRs.
+ *
+ * This is invoked only in the transport connect worker in order
+ * to serialize with rpcrdma_register_fmr_external().
+ */
+static void
+fmr_op_reset(struct rpcrdma_xprt *r_xprt)
+{
+ struct rpcrdma_buffer *buf = &r_xprt->rx_buf;
+ struct rpcrdma_mw *r;
+ LIST_HEAD(list);
+ int rc;
+
+ list_for_each_entry(r, &buf->rb_all, mw_all)
+ list_add(&r->r.fmr->list, &list);
+
+ rc = ib_unmap_fmr(&list);
+ if (rc)
+ dprintk("RPC: %s: ib_unmap_fmr failed %i\n",
+ __func__, rc);
+}
+
const struct rpcrdma_memreg_ops rpcrdma_fmr_memreg_ops = {
.ro_map = fmr_op_map,
.ro_unmap = fmr_op_unmap,
.ro_maxpages = fmr_op_maxpages,
.ro_init = fmr_op_init,
+ .ro_reset = fmr_op_reset,
.ro_displayname = "fmr",
};
diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c
index 9168c15..c2bb29d 100644
--- a/net/sunrpc/xprtrdma/frwr_ops.c
+++ b/net/sunrpc/xprtrdma/frwr_ops.c
@@ -46,6 +46,18 @@ out_list_err:
return rc;
}

+static void
+__frwr_release(struct rpcrdma_mw *r)
+{
+ int rc;
+
+ rc = ib_dereg_mr(r->r.frmr.fr_mr);
+ if (rc)
+ dprintk("RPC: %s: ib_dereg_mr status %i\n",
+ __func__, rc);
+ ib_free_fast_reg_page_list(r->r.frmr.fr_pgl);
+}
+
/* FRWR mode conveys a list of pages per chunk segment. The
* maximum length of that list is the FRWR page list depth.
*/
@@ -210,10 +222,49 @@ out_err:
return nsegs;
}

+/* After a disconnect, a flushed FAST_REG_MR can leave an FRMR in
+ * an unusable state. Find FRMRs in this state and dereg / reg
+ * each. FRMRs that are VALID and attached to an rpcrdma_req are
+ * also torn down.
+ *
+ * This gives all in-use FRMRs a fresh rkey and leaves them INVALID.
+ *
+ * This is invoked only in the transport connect worker in order
+ * to serialize with rpcrdma_register_frmr_external().
+ */
+static void
+frwr_op_reset(struct rpcrdma_xprt *r_xprt)
+{
+ struct rpcrdma_buffer *buf = &r_xprt->rx_buf;
+ struct ib_device *device = r_xprt->rx_ia.ri_id->device;
+ unsigned int depth = r_xprt->rx_ia.ri_max_frmr_depth;
+ struct ib_pd *pd = r_xprt->rx_ia.ri_pd;
+ struct rpcrdma_mw *r;
+ int rc;
+
+ list_for_each_entry(r, &buf->rb_all, mw_all) {
+ if (r->r.frmr.fr_state == FRMR_IS_INVALID)
+ continue;
+
+ __frwr_release(r);
+ rc = __frwr_init(r, pd, device, depth);
+ if (rc) {
+ dprintk("RPC: %s: mw %p left %s\n",
+ __func__, r,
+ (r->r.frmr.fr_state == FRMR_IS_STALE ?
+ "stale" : "valid"));
+ continue;
+ }
+
+ r->r.frmr.fr_state = FRMR_IS_INVALID;
+ }
+}
+
const struct rpcrdma_memreg_ops rpcrdma_frwr_memreg_ops = {
.ro_map = frwr_op_map,
.ro_unmap = frwr_op_unmap,
.ro_maxpages = frwr_op_maxpages,
.ro_init = frwr_op_init,
+ .ro_reset = frwr_op_reset,
.ro_displayname = "frwr",
};
diff --git a/net/sunrpc/xprtrdma/physical_ops.c b/net/sunrpc/xprtrdma/physical_ops.c
index c372051..e060713 100644
--- a/net/sunrpc/xprtrdma/physical_ops.c
+++ b/net/sunrpc/xprtrdma/physical_ops.c
@@ -59,10 +59,16 @@ physical_op_unmap(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg)
return 1;
}

+static void
+physical_op_reset(struct rpcrdma_xprt *r_xprt)
+{
+}
+
const struct rpcrdma_memreg_ops rpcrdma_physical_memreg_ops = {
.ro_map = physical_op_map,
.ro_unmap = physical_op_unmap,
.ro_maxpages = physical_op_maxpages,
.ro_init = physical_op_init,
+ .ro_reset = physical_op_reset,
.ro_displayname = "physical",
};
diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index e89a57d..1b2c1f4 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -63,9 +63,6 @@
# define RPCDBG_FACILITY RPCDBG_TRANS
#endif

-static void rpcrdma_reset_frmrs(struct rpcrdma_ia *);
-static void rpcrdma_reset_fmrs(struct rpcrdma_ia *);
-
/*
* internal functions
*/
@@ -945,21 +942,9 @@ retry:
rpcrdma_ep_disconnect(ep, ia);
rpcrdma_flush_cqs(ep);

- switch (ia->ri_memreg_strategy) {
- case RPCRDMA_FRMR:
- rpcrdma_reset_frmrs(ia);
- break;
- case RPCRDMA_MTHCAFMR:
- rpcrdma_reset_fmrs(ia);
- break;
- case RPCRDMA_ALLPHYSICAL:
- break;
- default:
- rc = -EIO;
- goto out;
- }
-
xprt = container_of(ia, struct rpcrdma_xprt, rx_ia);
+ ia->ri_ops->ro_reset(xprt);
+
id = rpcrdma_create_id(xprt, ia,
(struct sockaddr *)&xprt->rx_data.addr);
if (IS_ERR(id)) {
@@ -1289,90 +1274,6 @@ rpcrdma_buffer_destroy(struct rpcrdma_buffer *buf)
kfree(buf->rb_pool);
}

-/* After a disconnect, unmap all FMRs.
- *
- * This is invoked only in the transport connect worker in order
- * to serialize with rpcrdma_register_fmr_external().
- */
-static void
-rpcrdma_reset_fmrs(struct rpcrdma_ia *ia)
-{
- struct rpcrdma_xprt *r_xprt =
- container_of(ia, struct rpcrdma_xprt, rx_ia);
- struct rpcrdma_buffer *buf = &r_xprt->rx_buf;
- struct list_head *pos;
- struct rpcrdma_mw *r;
- LIST_HEAD(l);
- int rc;
-
- list_for_each(pos, &buf->rb_all) {
- r = list_entry(pos, struct rpcrdma_mw, mw_all);
-
- INIT_LIST_HEAD(&l);
- list_add(&r->r.fmr->list, &l);
- rc = ib_unmap_fmr(&l);
- if (rc)
- dprintk("RPC: %s: ib_unmap_fmr failed %i\n",
- __func__, rc);
- }
-}
-
-/* After a disconnect, a flushed FAST_REG_MR can leave an FRMR in
- * an unusable state. Find FRMRs in this state and dereg / reg
- * each. FRMRs that are VALID and attached to an rpcrdma_req are
- * also torn down.
- *
- * This gives all in-use FRMRs a fresh rkey and leaves them INVALID.
- *
- * This is invoked only in the transport connect worker in order
- * to serialize with rpcrdma_register_frmr_external().
- */
-static void
-rpcrdma_reset_frmrs(struct rpcrdma_ia *ia)
-{
- struct rpcrdma_xprt *r_xprt =
- container_of(ia, struct rpcrdma_xprt, rx_ia);
- struct rpcrdma_buffer *buf = &r_xprt->rx_buf;
- struct list_head *pos;
- struct rpcrdma_mw *r;
- int rc;
-
- list_for_each(pos, &buf->rb_all) {
- r = list_entry(pos, struct rpcrdma_mw, mw_all);
-
- if (r->r.frmr.fr_state == FRMR_IS_INVALID)
- continue;
-
- rc = ib_dereg_mr(r->r.frmr.fr_mr);
- if (rc)
- dprintk("RPC: %s: ib_dereg_mr failed %i\n",
- __func__, rc);
- ib_free_fast_reg_page_list(r->r.frmr.fr_pgl);
-
- r->r.frmr.fr_mr = ib_alloc_fast_reg_mr(ia->ri_pd,
- ia->ri_max_frmr_depth);
- if (IS_ERR(r->r.frmr.fr_mr)) {
- rc = PTR_ERR(r->r.frmr.fr_mr);
- dprintk("RPC: %s: ib_alloc_fast_reg_mr"
- " failed %i\n", __func__, rc);
- continue;
- }
- r->r.frmr.fr_pgl = ib_alloc_fast_reg_page_list(
- ia->ri_id->device,
- ia->ri_max_frmr_depth);
- if (IS_ERR(r->r.frmr.fr_pgl)) {
- rc = PTR_ERR(r->r.frmr.fr_pgl);
- dprintk("RPC: %s: "
- "ib_alloc_fast_reg_page_list "
- "failed %i\n", __func__, rc);
-
- ib_dereg_mr(r->r.frmr.fr_mr);
- continue;
- }
- r->r.frmr.fr_state = FRMR_IS_INVALID;
- }
-}
-
/* "*mw" can be NULL when rpcrdma_buffer_get_mrs() fails, leaving
* some req segments uninitialized.
*/
diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index 90b60fe..0680239 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -342,6 +342,7 @@ struct rpcrdma_memreg_ops {
struct rpcrdma_mr_seg *);
size_t (*ro_maxpages)(struct rpcrdma_xprt *);
int (*ro_init)(struct rpcrdma_xprt *);
+ void (*ro_reset)(struct rpcrdma_xprt *);
const char *ro_displayname;
};



2015-03-24 20:32:22

by Chuck Lever III

[permalink] [raw]
Subject: [PATCH v2 12/15] xprtrdma: Add "destroy MRs" memreg op

Memory Region objects associated with a transport instance are
destroyed before the instance is shut down and destroyed.
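
A minimal sketch of the teardown loop the new ro_destroy methods
share, using a plain singly-linked list in place of the kernel's
list_head machinery (all names here are illustrative):

#include <stdio.h>
#include <stdlib.h>

struct mw {
	struct mw *next;	/* stand-in for the mw_all linkage */
	int id;
};

/* Drain the "all MWs" list: unlink each entry, release its HCA
 * resources (ib_dereg_mr() or ib_dealloc_fmr() in the real code),
 * then free the tracking structure itself. */
static void destroy_all(struct mw **head)
{
	while (*head != NULL) {
		struct mw *r = *head;

		*head = r->next;	/* list_del() analogue */
		printf("releasing mw %d\n", r->id);
		free(r);
	}
}

int main(void)
{
	struct mw *head = NULL;
	int i;

	for (i = 0; i < 3; i++) {
		struct mw *r = malloc(sizeof(*r));

		r->id = i;
		r->next = head;
		head = r;
	}
	destroy_all(&head);
	return 0;
}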

Signed-off-by: Chuck Lever <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
---
net/sunrpc/xprtrdma/fmr_ops.c | 18 ++++++++++++
net/sunrpc/xprtrdma/frwr_ops.c | 14 ++++++++++
net/sunrpc/xprtrdma/physical_ops.c | 6 ++++
net/sunrpc/xprtrdma/verbs.c | 52 +-----------------------------------
net/sunrpc/xprtrdma/xprt_rdma.h | 1 +
5 files changed, 40 insertions(+), 51 deletions(-)

diff --git a/net/sunrpc/xprtrdma/fmr_ops.c b/net/sunrpc/xprtrdma/fmr_ops.c
index 93261b0..e9ca594 100644
--- a/net/sunrpc/xprtrdma/fmr_ops.c
+++ b/net/sunrpc/xprtrdma/fmr_ops.c
@@ -168,11 +168,29 @@ fmr_op_reset(struct rpcrdma_xprt *r_xprt)
__func__, rc);
}

+static void
+fmr_op_destroy(struct rpcrdma_buffer *buf)
+{
+ struct rpcrdma_mw *r;
+ int rc;
+
+ while (!list_empty(&buf->rb_all)) {
+ r = list_entry(buf->rb_all.next, struct rpcrdma_mw, mw_all);
+ list_del(&r->mw_all);
+ rc = ib_dealloc_fmr(r->r.fmr);
+ if (rc)
+ dprintk("RPC: %s: ib_dealloc_fmr failed %i\n",
+ __func__, rc);
+ kfree(r);
+ }
+}
+
const struct rpcrdma_memreg_ops rpcrdma_fmr_memreg_ops = {
.ro_map = fmr_op_map,
.ro_unmap = fmr_op_unmap,
.ro_maxpages = fmr_op_maxpages,
.ro_init = fmr_op_init,
.ro_reset = fmr_op_reset,
+ .ro_destroy = fmr_op_destroy,
.ro_displayname = "fmr",
};
diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c
index c2bb29d..121e400 100644
--- a/net/sunrpc/xprtrdma/frwr_ops.c
+++ b/net/sunrpc/xprtrdma/frwr_ops.c
@@ -260,11 +260,25 @@ frwr_op_reset(struct rpcrdma_xprt *r_xprt)
}
}

+static void
+frwr_op_destroy(struct rpcrdma_buffer *buf)
+{
+ struct rpcrdma_mw *r;
+
+ while (!list_empty(&buf->rb_all)) {
+ r = list_entry(buf->rb_all.next, struct rpcrdma_mw, mw_all);
+ list_del(&r->mw_all);
+ __frwr_release(r);
+ kfree(r);
+ }
+}
+
const struct rpcrdma_memreg_ops rpcrdma_frwr_memreg_ops = {
.ro_map = frwr_op_map,
.ro_unmap = frwr_op_unmap,
.ro_maxpages = frwr_op_maxpages,
.ro_init = frwr_op_init,
.ro_reset = frwr_op_reset,
+ .ro_destroy = frwr_op_destroy,
.ro_displayname = "frwr",
};
diff --git a/net/sunrpc/xprtrdma/physical_ops.c b/net/sunrpc/xprtrdma/physical_ops.c
index e060713..eb39011 100644
--- a/net/sunrpc/xprtrdma/physical_ops.c
+++ b/net/sunrpc/xprtrdma/physical_ops.c
@@ -64,11 +64,17 @@ physical_op_reset(struct rpcrdma_xprt *r_xprt)
{
}

+static void
+physical_op_destroy(struct rpcrdma_buffer *buf)
+{
+}
+
const struct rpcrdma_memreg_ops rpcrdma_physical_memreg_ops = {
.ro_map = physical_op_map,
.ro_unmap = physical_op_unmap,
.ro_maxpages = physical_op_maxpages,
.ro_init = physical_op_init,
.ro_reset = physical_op_reset,
+ .ro_destroy = physical_op_destroy,
.ro_displayname = "physical",
};
diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index 1b2c1f4..a7fb314 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -1199,47 +1199,6 @@ rpcrdma_destroy_req(struct rpcrdma_ia *ia, struct rpcrdma_req *req)
kfree(req);
}

-static void
-rpcrdma_destroy_fmrs(struct rpcrdma_buffer *buf)
-{
- struct rpcrdma_mw *r;
- int rc;
-
- while (!list_empty(&buf->rb_all)) {
- r = list_entry(buf->rb_all.next, struct rpcrdma_mw, mw_all);
- list_del(&r->mw_all);
- list_del(&r->mw_list);
-
- rc = ib_dealloc_fmr(r->r.fmr);
- if (rc)
- dprintk("RPC: %s: ib_dealloc_fmr failed %i\n",
- __func__, rc);
-
- kfree(r);
- }
-}
-
-static void
-rpcrdma_destroy_frmrs(struct rpcrdma_buffer *buf)
-{
- struct rpcrdma_mw *r;
- int rc;
-
- while (!list_empty(&buf->rb_all)) {
- r = list_entry(buf->rb_all.next, struct rpcrdma_mw, mw_all);
- list_del(&r->mw_all);
- list_del(&r->mw_list);
-
- rc = ib_dereg_mr(r->r.frmr.fr_mr);
- if (rc)
- dprintk("RPC: %s: ib_dereg_mr failed %i\n",
- __func__, rc);
- ib_free_fast_reg_page_list(r->r.frmr.fr_pgl);
-
- kfree(r);
- }
-}
-
void
rpcrdma_buffer_destroy(struct rpcrdma_buffer *buf)
{
@@ -1260,16 +1219,7 @@ rpcrdma_buffer_destroy(struct rpcrdma_buffer *buf)
rpcrdma_destroy_req(ia, buf->rb_send_bufs[i]);
}

- switch (ia->ri_memreg_strategy) {
- case RPCRDMA_FRMR:
- rpcrdma_destroy_frmrs(buf);
- break;
- case RPCRDMA_MTHCAFMR:
- rpcrdma_destroy_fmrs(buf);
- break;
- default:
- break;
- }
+ ia->ri_ops->ro_destroy(buf);

kfree(buf->rb_pool);
}
diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index 0680239..b95e223 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -343,6 +343,7 @@ struct rpcrdma_memreg_ops {
size_t (*ro_maxpages)(struct rpcrdma_xprt *);
int (*ro_init)(struct rpcrdma_xprt *);
void (*ro_reset)(struct rpcrdma_xprt *);
+ void (*ro_destroy)(struct rpcrdma_buffer *);
const char *ro_displayname;
};



2015-03-24 20:31:55

by Chuck Lever III

[permalink] [raw]
Subject: [PATCH v2 09/15] xprtrdma: Add a "deregister_external" op for each memreg mode

There is very little common processing among the different external
memory deregistration functions.
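
One shared convention worth noting: each ro_unmap call consumes
seg->mr_nsegs segments and returns that count, so callers advance by
the return value. A simplified sketch of that contract (the types
below are stand-ins, not kernel definitions):

#include <stdio.h>

struct mr_seg { int mr_nsegs; };

/* Unmap every segment covered by one registration; return the count
 * so the caller can step its index to the next chunk. */
static int op_unmap(struct mr_seg *seg)
{
	struct mr_seg *seg1 = seg;
	int nsegs = seg->mr_nsegs;

	while (seg1->mr_nsegs--)
		seg++;		/* "unmap one" per iteration */
	return nsegs;
}

int main(void)
{
	struct mr_seg segs[5] = { {2}, {0}, {3}, {0}, {0} };
	int i = 0, nchunks = 2;

	/* the same shape as the caller loops in xprt_rdma_free() and
	 * rpcrdma_create_chunks() */
	while (nchunks--)
		i += op_unmap(&segs[i]);
	printf("consumed %d segments\n", i);	/* prints 5 */
	return 0;
}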

Signed-off-by: Chuck Lever <[email protected]>
---
net/sunrpc/xprtrdma/fmr_ops.c | 27 ++++++++++++
net/sunrpc/xprtrdma/frwr_ops.c | 36 ++++++++++++++++
net/sunrpc/xprtrdma/physical_ops.c | 10 ++++
net/sunrpc/xprtrdma/rpc_rdma.c | 11 +++--
net/sunrpc/xprtrdma/transport.c | 4 +-
net/sunrpc/xprtrdma/verbs.c | 81 ------------------------------------
net/sunrpc/xprtrdma/xprt_rdma.h | 5 +-
7 files changed, 84 insertions(+), 90 deletions(-)

diff --git a/net/sunrpc/xprtrdma/fmr_ops.c b/net/sunrpc/xprtrdma/fmr_ops.c
index 45fb646..888aa10 100644
--- a/net/sunrpc/xprtrdma/fmr_ops.c
+++ b/net/sunrpc/xprtrdma/fmr_ops.c
@@ -79,8 +79,35 @@ out_maperr:
return rc;
}

+/* Use the ib_unmap_fmr() verb to prevent further remote
+ * access via RDMA READ or RDMA WRITE.
+ */
+static int
+fmr_op_unmap(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg)
+{
+ struct rpcrdma_ia *ia = &r_xprt->rx_ia;
+ struct rpcrdma_mr_seg *seg1 = seg;
+ int rc, nsegs = seg->mr_nsegs;
+ LIST_HEAD(l);
+
+ list_add(&seg1->rl_mw->r.fmr->list, &l);
+ rc = ib_unmap_fmr(&l);
+ read_lock(&ia->ri_qplock);
+ while (seg1->mr_nsegs--)
+ rpcrdma_unmap_one(ia, seg++);
+ read_unlock(&ia->ri_qplock);
+ if (rc)
+ goto out_err;
+ return nsegs;
+
+out_err:
+ dprintk("RPC: %s: ib_unmap_fmr status %i\n", __func__, rc);
+ return nsegs;
+}
+
const struct rpcrdma_memreg_ops rpcrdma_fmr_memreg_ops = {
.ro_map = fmr_op_map,
+ .ro_unmap = fmr_op_unmap,
.ro_maxpages = fmr_op_maxpages,
.ro_displayname = "fmr",
};
diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c
index 23e4d99..35b725b 100644
--- a/net/sunrpc/xprtrdma/frwr_ops.c
+++ b/net/sunrpc/xprtrdma/frwr_ops.c
@@ -110,8 +110,44 @@ out_senderr:
return rc;
}

+/* Post a LOCAL_INV Work Request to prevent further remote access
+ * via RDMA READ or RDMA WRITE.
+ */
+static int
+frwr_op_unmap(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg)
+{
+ struct rpcrdma_mr_seg *seg1 = seg;
+ struct rpcrdma_ia *ia = &r_xprt->rx_ia;
+ struct ib_send_wr invalidate_wr, *bad_wr;
+ int rc, nsegs = seg->mr_nsegs;
+
+ seg1->rl_mw->r.frmr.fr_state = FRMR_IS_INVALID;
+
+ memset(&invalidate_wr, 0, sizeof(invalidate_wr));
+ invalidate_wr.wr_id = (unsigned long)(void *)seg1->rl_mw;
+ invalidate_wr.opcode = IB_WR_LOCAL_INV;
+ invalidate_wr.ex.invalidate_rkey = seg1->rl_mw->r.frmr.fr_mr->rkey;
+ DECR_CQCOUNT(&r_xprt->rx_ep);
+
+ read_lock(&ia->ri_qplock);
+ while (seg1->mr_nsegs--)
+ rpcrdma_unmap_one(ia, seg++);
+ rc = ib_post_send(ia->ri_id->qp, &invalidate_wr, &bad_wr);
+ read_unlock(&ia->ri_qplock);
+ if (rc)
+ goto out_err;
+ return nsegs;
+
+out_err:
+ /* Force rpcrdma_buffer_get() to retry */
+ seg1->rl_mw->r.frmr.fr_state = FRMR_IS_STALE;
+ dprintk("RPC: %s: ib_post_send status %i\n", __func__, rc);
+ return nsegs;
+}
+
const struct rpcrdma_memreg_ops rpcrdma_frwr_memreg_ops = {
.ro_map = frwr_op_map,
+ .ro_unmap = frwr_op_unmap,
.ro_maxpages = frwr_op_maxpages,
.ro_displayname = "frwr",
};
diff --git a/net/sunrpc/xprtrdma/physical_ops.c b/net/sunrpc/xprtrdma/physical_ops.c
index 5a284ee..5b5a63a 100644
--- a/net/sunrpc/xprtrdma/physical_ops.c
+++ b/net/sunrpc/xprtrdma/physical_ops.c
@@ -44,8 +44,18 @@ physical_op_map(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg,
return 1;
}

+/* Unmap a memory region, but leave it registered.
+ */
+static int
+physical_op_unmap(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg)
+{
+ rpcrdma_unmap_one(&r_xprt->rx_ia, seg);
+ return 1;
+}
+
const struct rpcrdma_memreg_ops rpcrdma_physical_memreg_ops = {
.ro_map = physical_op_map,
+ .ro_unmap = physical_op_unmap,
.ro_maxpages = physical_op_maxpages,
.ro_displayname = "physical",
};
diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c
index 6ab1d03..2c53ea9 100644
--- a/net/sunrpc/xprtrdma/rpc_rdma.c
+++ b/net/sunrpc/xprtrdma/rpc_rdma.c
@@ -284,11 +284,12 @@ rpcrdma_create_chunks(struct rpc_rqst *rqst, struct xdr_buf *target,
return (unsigned char *)iptr - (unsigned char *)headerp;

out:
- if (r_xprt->rx_ia.ri_memreg_strategy != RPCRDMA_FRMR) {
- for (pos = 0; nchunks--;)
- pos += rpcrdma_deregister_external(
- &req->rl_segments[pos], r_xprt);
- }
+ if (r_xprt->rx_ia.ri_memreg_strategy == RPCRDMA_FRMR)
+ return n;
+
+ for (pos = 0; nchunks--;)
+ pos += r_xprt->rx_ia.ri_ops->ro_unmap(r_xprt,
+ &req->rl_segments[pos]);
return n;
}

diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c
index da71a24..54f23b1 100644
--- a/net/sunrpc/xprtrdma/transport.c
+++ b/net/sunrpc/xprtrdma/transport.c
@@ -584,8 +584,8 @@ xprt_rdma_free(void *buffer)

for (i = 0; req->rl_nchunks;) {
--req->rl_nchunks;
- i += rpcrdma_deregister_external(
- &req->rl_segments[i], r_xprt);
+ i += r_xprt->rx_ia.ri_ops->ro_unmap(r_xprt,
+ &req->rl_segments[i]);
}

rpcrdma_buffer_put(req);
diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index 4318c04..b167c99 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -1510,7 +1510,7 @@ rpcrdma_buffer_put_sendbuf(struct rpcrdma_req *req, struct rpcrdma_buffer *buf)
}
}

-/* rpcrdma_unmap_one() was already done by rpcrdma_deregister_frmr_external().
+/* rpcrdma_unmap_one() was already done during deregistration.
* Redo only the ib_post_send().
*/
static void
@@ -1890,85 +1890,6 @@ rpcrdma_unmap_one(struct rpcrdma_ia *ia, struct rpcrdma_mr_seg *seg)
seg->mr_dma, seg->mr_dmalen, seg->mr_dir);
}

-static int
-rpcrdma_deregister_frmr_external(struct rpcrdma_mr_seg *seg,
- struct rpcrdma_ia *ia, struct rpcrdma_xprt *r_xprt)
-{
- struct rpcrdma_mr_seg *seg1 = seg;
- struct ib_send_wr invalidate_wr, *bad_wr;
- int rc;
-
- seg1->rl_mw->r.frmr.fr_state = FRMR_IS_INVALID;
-
- memset(&invalidate_wr, 0, sizeof invalidate_wr);
- invalidate_wr.wr_id = (unsigned long)(void *)seg1->rl_mw;
- invalidate_wr.opcode = IB_WR_LOCAL_INV;
- invalidate_wr.ex.invalidate_rkey = seg1->rl_mw->r.frmr.fr_mr->rkey;
- DECR_CQCOUNT(&r_xprt->rx_ep);
-
- read_lock(&ia->ri_qplock);
- while (seg1->mr_nsegs--)
- rpcrdma_unmap_one(ia, seg++);
- rc = ib_post_send(ia->ri_id->qp, &invalidate_wr, &bad_wr);
- read_unlock(&ia->ri_qplock);
- if (rc) {
- /* Force rpcrdma_buffer_get() to retry */
- seg1->rl_mw->r.frmr.fr_state = FRMR_IS_STALE;
- dprintk("RPC: %s: failed ib_post_send for invalidate,"
- " status %i\n", __func__, rc);
- }
- return rc;
-}
-
-static int
-rpcrdma_deregister_fmr_external(struct rpcrdma_mr_seg *seg,
- struct rpcrdma_ia *ia)
-{
- struct rpcrdma_mr_seg *seg1 = seg;
- LIST_HEAD(l);
- int rc;
-
- list_add(&seg1->rl_mw->r.fmr->list, &l);
- rc = ib_unmap_fmr(&l);
- read_lock(&ia->ri_qplock);
- while (seg1->mr_nsegs--)
- rpcrdma_unmap_one(ia, seg++);
- read_unlock(&ia->ri_qplock);
- if (rc)
- dprintk("RPC: %s: failed ib_unmap_fmr,"
- " status %i\n", __func__, rc);
- return rc;
-}
-
-int
-rpcrdma_deregister_external(struct rpcrdma_mr_seg *seg,
- struct rpcrdma_xprt *r_xprt)
-{
- struct rpcrdma_ia *ia = &r_xprt->rx_ia;
- int nsegs = seg->mr_nsegs, rc;
-
- switch (ia->ri_memreg_strategy) {
-
- case RPCRDMA_ALLPHYSICAL:
- read_lock(&ia->ri_qplock);
- rpcrdma_unmap_one(ia, seg);
- read_unlock(&ia->ri_qplock);
- break;
-
- case RPCRDMA_FRMR:
- rc = rpcrdma_deregister_frmr_external(seg, ia, r_xprt);
- break;
-
- case RPCRDMA_MTHCAFMR:
- rc = rpcrdma_deregister_fmr_external(seg, ia);
- break;
-
- default:
- break;
- }
- return nsegs;
-}
-
/*
* Prepost any receive buffer, then post send.
*
diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index 7bf077b..9a727f9 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -338,6 +338,8 @@ struct rpcrdma_xprt;
struct rpcrdma_memreg_ops {
int (*ro_map)(struct rpcrdma_xprt *,
struct rpcrdma_mr_seg *, int, bool);
+ int (*ro_unmap)(struct rpcrdma_xprt *,
+ struct rpcrdma_mr_seg *);
size_t (*ro_maxpages)(struct rpcrdma_xprt *);
const char *ro_displayname;
};
@@ -405,9 +407,6 @@ void rpcrdma_buffer_put(struct rpcrdma_req *);
void rpcrdma_recv_buffer_get(struct rpcrdma_req *);
void rpcrdma_recv_buffer_put(struct rpcrdma_rep *);

-int rpcrdma_deregister_external(struct rpcrdma_mr_seg *,
- struct rpcrdma_xprt *);
-
struct rpcrdma_regbuf *rpcrdma_alloc_regbuf(struct rpcrdma_ia *,
size_t, gfp_t);
void rpcrdma_free_regbuf(struct rpcrdma_ia *,


2015-03-24 20:32:32

by Chuck Lever III

[permalink] [raw]
Subject: [PATCH v2 13/15] xprtrdma: Add "open" memreg op

The open op determines the size of various transport data structures
based on device capabilities and memory registration mode.
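
For illustration, the FRWR send-queue sizing this op performs, as a
stand-alone sketch with assumed example values (the device page-list
limit of 16 below is hypothetical, not taken from any particular
HCA):

#include <stdio.h>

int main(void)
{
	int max_data_segs = 64;		/* RPCRDMA_MAX_DATA_SEGS */
	int max_frmr_depth = 16;	/* assumed device limit */
	int depth = 7;	/* head/tail reg+inv pairs, pagelist pair, SEND */

	/* Each time the pagelist overflows the device's FRMR depth,
	 * another reg + invalidate WR pair is needed. */
	if (max_frmr_depth < max_data_segs) {
		int delta = max_data_segs - max_frmr_depth;

		do {
			depth += 2;
			delta -= max_frmr_depth;
		} while (delta > 0);
	}

	/* 64 segments at depth 16: delta 48 -> 3 extra pairs -> 13 */
	printf("send WRs per RPC: %d\n", depth);
	return 0;
}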

Signed-off-by: Chuck Lever <[email protected]>
---
net/sunrpc/xprtrdma/fmr_ops.c | 8 ++++++
net/sunrpc/xprtrdma/frwr_ops.c | 48 +++++++++++++++++++++++++++++++++++
net/sunrpc/xprtrdma/physical_ops.c | 8 ++++++
net/sunrpc/xprtrdma/verbs.c | 49 ++----------------------------------
net/sunrpc/xprtrdma/xprt_rdma.h | 3 ++
5 files changed, 70 insertions(+), 46 deletions(-)

diff --git a/net/sunrpc/xprtrdma/fmr_ops.c b/net/sunrpc/xprtrdma/fmr_ops.c
index e9ca594..e8a9837 100644
--- a/net/sunrpc/xprtrdma/fmr_ops.c
+++ b/net/sunrpc/xprtrdma/fmr_ops.c
@@ -20,6 +20,13 @@
/* Maximum scatter/gather per FMR */
#define RPCRDMA_MAX_FMR_SGES (64)

+static int
+fmr_op_open(struct rpcrdma_ia *ia, struct rpcrdma_ep *ep,
+ struct rpcrdma_create_data_internal *cdata)
+{
+ return 0;
+}
+
/* FMR mode conveys up to 64 pages of payload per chunk segment.
*/
static size_t
@@ -188,6 +195,7 @@ fmr_op_destroy(struct rpcrdma_buffer *buf)
const struct rpcrdma_memreg_ops rpcrdma_fmr_memreg_ops = {
.ro_map = fmr_op_map,
.ro_unmap = fmr_op_unmap,
+ .ro_open = fmr_op_open,
.ro_maxpages = fmr_op_maxpages,
.ro_init = fmr_op_init,
.ro_reset = fmr_op_reset,
diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c
index 121e400..e17d54d 100644
--- a/net/sunrpc/xprtrdma/frwr_ops.c
+++ b/net/sunrpc/xprtrdma/frwr_ops.c
@@ -58,6 +58,53 @@ __frwr_release(struct rpcrdma_mw *r)
ib_free_fast_reg_page_list(r->r.frmr.fr_pgl);
}

+static int
+frwr_op_open(struct rpcrdma_ia *ia, struct rpcrdma_ep *ep,
+ struct rpcrdma_create_data_internal *cdata)
+{
+ struct ib_device_attr *devattr = &ia->ri_devattr;
+ int depth, delta;
+
+ ia->ri_max_frmr_depth =
+ min_t(unsigned int, RPCRDMA_MAX_DATA_SEGS,
+ devattr->max_fast_reg_page_list_len);
+ dprintk("RPC: %s: device's max FR page list len = %u\n",
+ __func__, ia->ri_max_frmr_depth);
+
+ /* Add room for frmr register and invalidate WRs.
+ * 1. FRMR reg WR for head
+ * 2. FRMR invalidate WR for head
+ * 3. N FRMR reg WRs for pagelist
+ * 4. N FRMR invalidate WRs for pagelist
+ * 5. FRMR reg WR for tail
+ * 6. FRMR invalidate WR for tail
+ * 7. The RDMA_SEND WR
+ */
+ depth = 7;
+
+ /* Calculate N if the device max FRMR depth is smaller than
+ * RPCRDMA_MAX_DATA_SEGS.
+ */
+ if (ia->ri_max_frmr_depth < RPCRDMA_MAX_DATA_SEGS) {
+ delta = RPCRDMA_MAX_DATA_SEGS - ia->ri_max_frmr_depth;
+ do {
+ depth += 2; /* FRMR reg + invalidate */
+ delta -= ia->ri_max_frmr_depth;
+ } while (delta > 0);
+ }
+
+ ep->rep_attr.cap.max_send_wr *= depth;
+ if (ep->rep_attr.cap.max_send_wr > devattr->max_qp_wr) {
+ cdata->max_requests = devattr->max_qp_wr / depth;
+ if (!cdata->max_requests)
+ return -EINVAL;
+ ep->rep_attr.cap.max_send_wr = cdata->max_requests *
+ depth;
+ }
+
+ return 0;
+}
+
/* FRWR mode conveys a list of pages per chunk segment. The
* maximum length of that list is the FRWR page list depth.
*/
@@ -276,6 +323,7 @@ frwr_op_destroy(struct rpcrdma_buffer *buf)
const struct rpcrdma_memreg_ops rpcrdma_frwr_memreg_ops = {
.ro_map = frwr_op_map,
.ro_unmap = frwr_op_unmap,
+ .ro_open = frwr_op_open,
.ro_maxpages = frwr_op_maxpages,
.ro_init = frwr_op_init,
.ro_reset = frwr_op_reset,
diff --git a/net/sunrpc/xprtrdma/physical_ops.c b/net/sunrpc/xprtrdma/physical_ops.c
index eb39011..0ba130b 100644
--- a/net/sunrpc/xprtrdma/physical_ops.c
+++ b/net/sunrpc/xprtrdma/physical_ops.c
@@ -19,6 +19,13 @@
# define RPCDBG_FACILITY RPCDBG_TRANS
#endif

+static int
+physical_op_open(struct rpcrdma_ia *ia, struct rpcrdma_ep *ep,
+ struct rpcrdma_create_data_internal *cdata)
+{
+ return 0;
+}
+
/* PHYSICAL memory registration conveys one page per chunk segment.
*/
static size_t
@@ -72,6 +79,7 @@ physical_op_destroy(struct rpcrdma_buffer *buf)
const struct rpcrdma_memreg_ops rpcrdma_physical_memreg_ops = {
.ro_map = physical_op_map,
.ro_unmap = physical_op_unmap,
+ .ro_open = physical_op_open,
.ro_maxpages = physical_op_maxpages,
.ro_init = physical_op_init,
.ro_reset = physical_op_reset,
diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index a7fb314..b697b3e 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -622,11 +622,6 @@ rpcrdma_ia_open(struct rpcrdma_xprt *xprt, struct sockaddr *addr, int memreg)
dprintk("RPC: %s: FRMR registration "
"not supported by HCA\n", __func__);
memreg = RPCRDMA_MTHCAFMR;
- } else {
- /* Mind the ia limit on FRMR page list depth */
- ia->ri_max_frmr_depth = min_t(unsigned int,
- RPCRDMA_MAX_DATA_SEGS,
- devattr->max_fast_reg_page_list_len);
}
}
if (memreg == RPCRDMA_MTHCAFMR) {
@@ -741,49 +736,11 @@ rpcrdma_ep_create(struct rpcrdma_ep *ep, struct rpcrdma_ia *ia,

ep->rep_attr.event_handler = rpcrdma_qp_async_error_upcall;
ep->rep_attr.qp_context = ep;
- /* send_cq and recv_cq initialized below */
ep->rep_attr.srq = NULL;
ep->rep_attr.cap.max_send_wr = cdata->max_requests;
- switch (ia->ri_memreg_strategy) {
- case RPCRDMA_FRMR: {
- int depth = 7;
-
- /* Add room for frmr register and invalidate WRs.
- * 1. FRMR reg WR for head
- * 2. FRMR invalidate WR for head
- * 3. N FRMR reg WRs for pagelist
- * 4. N FRMR invalidate WRs for pagelist
- * 5. FRMR reg WR for tail
- * 6. FRMR invalidate WR for tail
- * 7. The RDMA_SEND WR
- */
-
- /* Calculate N if the device max FRMR depth is smaller than
- * RPCRDMA_MAX_DATA_SEGS.
- */
- if (ia->ri_max_frmr_depth < RPCRDMA_MAX_DATA_SEGS) {
- int delta = RPCRDMA_MAX_DATA_SEGS -
- ia->ri_max_frmr_depth;
-
- do {
- depth += 2; /* FRMR reg + invalidate */
- delta -= ia->ri_max_frmr_depth;
- } while (delta > 0);
-
- }
- ep->rep_attr.cap.max_send_wr *= depth;
- if (ep->rep_attr.cap.max_send_wr > devattr->max_qp_wr) {
- cdata->max_requests = devattr->max_qp_wr / depth;
- if (!cdata->max_requests)
- return -EINVAL;
- ep->rep_attr.cap.max_send_wr = cdata->max_requests *
- depth;
- }
- break;
- }
- default:
- break;
- }
+ rc = ia->ri_ops->ro_open(ia, ep, cdata);
+ if (rc)
+ return rc;
ep->rep_attr.cap.max_recv_wr = cdata->max_requests;
ep->rep_attr.cap.max_send_sge = (cdata->padding ? 4 : 2);
ep->rep_attr.cap.max_recv_sge = 1;
diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index b95e223..9036fb4 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -340,6 +340,9 @@ struct rpcrdma_memreg_ops {
struct rpcrdma_mr_seg *, int, bool);
int (*ro_unmap)(struct rpcrdma_xprt *,
struct rpcrdma_mr_seg *);
+ int (*ro_open)(struct rpcrdma_ia *,
+ struct rpcrdma_ep *,
+ struct rpcrdma_create_data_internal *);
size_t (*ro_maxpages)(struct rpcrdma_xprt *);
int (*ro_init)(struct rpcrdma_xprt *);
void (*ro_reset)(struct rpcrdma_xprt *);


2015-03-24 20:32:40

by Chuck Lever III

[permalink] [raw]
Subject: [PATCH v2 14/15] xprtrdma: Handle non-SEND completions via a callout

Allow each memory registration mode to plug in a callout that handles
the completion of a memory registration operation.
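
A rough sketch of the resulting dispatch, with simplified stand-ins
for ib_wc and rpcrdma_mw (the field names are illustrative, and the
wr_id round-trip assumes pointers fit in unsigned long, as the
kernel code does):

#include <stdio.h>

struct wc {
	int status;		/* 0 means success */
	unsigned long wr_id;	/* 0 = ignore; else a struct mw pointer */
};

struct mw {
	void (*sendcompletion)(struct wc *);
};

/* Per-mode callout: FRWR marks the MR stale on any failure. */
static void frwr_sendcompletion(struct wc *wc)
{
	if (wc->status == 0)
		return;
	printf("frmr flushed, status %d\n", wc->status);
}

/* The generic handler no longer knows about FRMR state; it simply
 * invokes whatever callout the MW registered at init time. */
static void process_wc(struct wc *wc)
{
	if (wc->wr_id == 0) {	/* RPCRDMA_IGNORE_COMPLETION analogue */
		if (wc->status != 0)
			printf("SEND error %d\n", wc->status);
		return;
	}
	((struct mw *)wc->wr_id)->sendcompletion(wc);
}

int main(void)
{
	struct mw r = { frwr_sendcompletion };
	struct wc flushed = { 5, (unsigned long)&r };

	process_wc(&flushed);
	return 0;
}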

Signed-off-by: Chuck Lever <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
---
net/sunrpc/xprtrdma/frwr_ops.c | 17 +++++++++++++++++
net/sunrpc/xprtrdma/verbs.c | 16 ++++++----------
net/sunrpc/xprtrdma/xprt_rdma.h | 5 +++++
3 files changed, 28 insertions(+), 10 deletions(-)

diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c
index e17d54d..ea59c1b 100644
--- a/net/sunrpc/xprtrdma/frwr_ops.c
+++ b/net/sunrpc/xprtrdma/frwr_ops.c
@@ -117,6 +117,22 @@ frwr_op_maxpages(struct rpcrdma_xprt *r_xprt)
rpcrdma_max_segments(r_xprt) * ia->ri_max_frmr_depth);
}

+/* If FAST_REG or LOCAL_INV failed, indicate the frmr needs to be reset. */
+static void
+frwr_sendcompletion(struct ib_wc *wc)
+{
+ struct rpcrdma_mw *r;
+
+ if (likely(wc->status == IB_WC_SUCCESS))
+ return;
+
+ /* WARNING: Only wr_id and status are reliable at this point */
+ r = (struct rpcrdma_mw *)(unsigned long)wc->wr_id;
+ dprintk("RPC: %s: frmr %p (stale), status %d\n",
+ __func__, r, wc->status);
+ r->r.frmr.fr_state = FRMR_IS_STALE;
+}
+
static int
frwr_op_init(struct rpcrdma_xprt *r_xprt)
{
@@ -148,6 +164,7 @@ frwr_op_init(struct rpcrdma_xprt *r_xprt)

list_add(&r->mw_list, &buf->rb_mws);
list_add(&r->mw_all, &buf->rb_all);
+ r->mw_sendcompletion = frwr_sendcompletion;
}

return 0;
diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index b697b3e..cac06f2 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -186,7 +186,7 @@ static const char * const wc_status[] = {
"remote access error",
"remote operation error",
"transport retry counter exceeded",
- "RNR retrycounter exceeded",
+ "RNR retry counter exceeded",
"local RDD violation error",
"remove invalid RD request",
"operation aborted",
@@ -204,21 +204,17 @@ static const char * const wc_status[] = {
static void
rpcrdma_sendcq_process_wc(struct ib_wc *wc)
{
- if (likely(wc->status == IB_WC_SUCCESS))
- return;
-
/* WARNING: Only wr_id and status are reliable at this point */
- if (wc->wr_id == 0ULL) {
- if (wc->status != IB_WC_WR_FLUSH_ERR)
+ if (wc->wr_id == RPCRDMA_IGNORE_COMPLETION) {
+ if (wc->status != IB_WC_SUCCESS &&
+ wc->status != IB_WC_WR_FLUSH_ERR)
pr_err("RPC: %s: SEND: %s\n",
__func__, COMPLETION_MSG(wc->status));
} else {
struct rpcrdma_mw *r;

r = (struct rpcrdma_mw *)(unsigned long)wc->wr_id;
- r->r.frmr.fr_state = FRMR_IS_STALE;
- pr_err("RPC: %s: frmr %p (stale): %s\n",
- __func__, r, COMPLETION_MSG(wc->status));
+ r->mw_sendcompletion(wc);
}
}

@@ -1622,7 +1618,7 @@ rpcrdma_ep_post(struct rpcrdma_ia *ia,
}

send_wr.next = NULL;
- send_wr.wr_id = 0ULL; /* no send cookie */
+ send_wr.wr_id = RPCRDMA_IGNORE_COMPLETION;
send_wr.sg_list = req->rl_send_iov;
send_wr.num_sge = req->rl_niovs;
send_wr.opcode = IB_WR_SEND;
diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index 9036fb4..54bcbe4 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -106,6 +106,10 @@ struct rpcrdma_ep {
#define INIT_CQCOUNT(ep) atomic_set(&(ep)->rep_cqcount, (ep)->rep_cqinit)
#define DECR_CQCOUNT(ep) atomic_sub_return(1, &(ep)->rep_cqcount)

+/* Force completion handler to ignore the signal
+ */
+#define RPCRDMA_IGNORE_COMPLETION (0ULL)
+
/* Registered buffer -- registered kmalloc'd memory for RDMA SEND/RECV
*
* The below structure appears at the front of a large region of kmalloc'd
@@ -206,6 +210,7 @@ struct rpcrdma_mw {
struct ib_fmr *fmr;
struct rpcrdma_frmr frmr;
} r;
+ void (*mw_sendcompletion)(struct ib_wc *);
struct list_head mw_list;
struct list_head mw_all;
};


2015-03-24 20:32:50

by Chuck Lever III

[permalink] [raw]
Subject: [PATCH v2 15/15] xprtrdma: Make rpcrdma_{un}map_one() into inline functions

These functions are called in a loop for each page transferred via
RDMA READ or WRITE. Extract loop invariants and inline them to
reduce CPU overhead.
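
To illustrate the hoisting, a simplified sketch: the DMA direction
(and, in the real patch, the device handle) is computed once per
chunk rather than once per page. All names below are stand-ins:

#include <stddef.h>

enum dma_dir { DMA_TO_DEV, DMA_FROM_DEV };

struct seg { enum dma_dir dir; };

/* Inline helper: the direction depends only on 'writing', so callers
 * evaluate it a single time before their mapping loop. */
static inline enum dma_dir data_dir(int writing)
{
	return writing ? DMA_FROM_DEV : DMA_TO_DEV;
}

static inline void map_one(struct seg *seg, enum dma_dir dir)
{
	seg->dir = dir;	/* the real helper also performs the DMA mapping */
}

/* Hoisted form: one data_dir() call covers the whole chunk. */
static void map_chunk(struct seg *segs, size_t n, int writing)
{
	enum dma_dir dir = data_dir(writing);	/* loop invariant */
	size_t i;

	for (i = 0; i < n; i++)
		map_one(&segs[i], dir);
}

int main(void)
{
	struct seg segs[4];

	map_chunk(segs, 4, 1);
	return segs[0].dir == DMA_FROM_DEV ? 0 : 1;
}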

Signed-off-by: Chuck Lever <[email protected]>
---
net/sunrpc/xprtrdma/fmr_ops.c | 10 ++++++--
net/sunrpc/xprtrdma/frwr_ops.c | 10 ++++++--
net/sunrpc/xprtrdma/physical_ops.c | 10 ++++++--
net/sunrpc/xprtrdma/verbs.c | 44 ++++++-----------------------------
net/sunrpc/xprtrdma/xprt_rdma.h | 45 ++++++++++++++++++++++++++++++++++--
5 files changed, 73 insertions(+), 46 deletions(-)

diff --git a/net/sunrpc/xprtrdma/fmr_ops.c b/net/sunrpc/xprtrdma/fmr_ops.c
index e8a9837..a91ba2c 100644
--- a/net/sunrpc/xprtrdma/fmr_ops.c
+++ b/net/sunrpc/xprtrdma/fmr_ops.c
@@ -85,6 +85,8 @@ fmr_op_map(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg,
int nsegs, bool writing)
{
struct rpcrdma_ia *ia = &r_xprt->rx_ia;
+ struct ib_device *device = ia->ri_id->device;
+ enum dma_data_direction direction = rpcrdma_data_dir(writing);
struct rpcrdma_mr_seg *seg1 = seg;
struct rpcrdma_mw *mw = seg1->rl_mw;
u64 physaddrs[RPCRDMA_MAX_DATA_SEGS];
@@ -97,7 +99,7 @@ fmr_op_map(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg,
if (nsegs > RPCRDMA_MAX_FMR_SGES)
nsegs = RPCRDMA_MAX_FMR_SGES;
for (i = 0; i < nsegs;) {
- rpcrdma_map_one(ia, seg, writing);
+ rpcrdma_map_one(device, seg, direction);
physaddrs[i] = seg->mr_dma;
len += seg->mr_len;
++seg;
@@ -123,7 +125,7 @@ out_maperr:
__func__, len, (unsigned long long)seg1->mr_dma,
pageoff, i, rc);
while (i--)
- rpcrdma_unmap_one(ia, --seg);
+ rpcrdma_unmap_one(device, --seg);
return rc;
}

@@ -135,14 +137,16 @@ fmr_op_unmap(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg)
{
struct rpcrdma_ia *ia = &r_xprt->rx_ia;
struct rpcrdma_mr_seg *seg1 = seg;
+ struct ib_device *device;
int rc, nsegs = seg->mr_nsegs;
LIST_HEAD(l);

list_add(&seg1->rl_mw->r.fmr->list, &l);
rc = ib_unmap_fmr(&l);
read_lock(&ia->ri_qplock);
+ device = ia->ri_id->device;
while (seg1->mr_nsegs--)
- rpcrdma_unmap_one(ia, seg++);
+ rpcrdma_unmap_one(device, seg++);
read_unlock(&ia->ri_qplock);
if (rc)
goto out_err;
diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c
index ea59c1b..0a7b9df 100644
--- a/net/sunrpc/xprtrdma/frwr_ops.c
+++ b/net/sunrpc/xprtrdma/frwr_ops.c
@@ -178,6 +178,8 @@ frwr_op_map(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg,
int nsegs, bool writing)
{
struct rpcrdma_ia *ia = &r_xprt->rx_ia;
+ struct ib_device *device = ia->ri_id->device;
+ enum dma_data_direction direction = rpcrdma_data_dir(writing);
struct rpcrdma_mr_seg *seg1 = seg;
struct rpcrdma_mw *mw = seg1->rl_mw;
struct rpcrdma_frmr *frmr = &mw->r.frmr;
@@ -197,7 +199,7 @@ frwr_op_map(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg,
if (nsegs > ia->ri_max_frmr_depth)
nsegs = ia->ri_max_frmr_depth;
for (page_no = i = 0; i < nsegs;) {
- rpcrdma_map_one(ia, seg, writing);
+ rpcrdma_map_one(device, seg, direction);
pa = seg->mr_dma;
for (seg_len = seg->mr_len; seg_len > 0; seg_len -= PAGE_SIZE) {
frmr->fr_pgl->page_list[page_no++] = pa;
@@ -247,7 +249,7 @@ out_senderr:
ib_update_fast_reg_key(mr, --key);
frmr->fr_state = FRMR_IS_INVALID;
while (i--)
- rpcrdma_unmap_one(ia, --seg);
+ rpcrdma_unmap_one(device, --seg);
return rc;
}

@@ -261,6 +263,7 @@ frwr_op_unmap(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg)
struct rpcrdma_ia *ia = &r_xprt->rx_ia;
struct ib_send_wr invalidate_wr, *bad_wr;
int rc, nsegs = seg->mr_nsegs;
+ struct ib_device *device;

seg1->rl_mw->r.frmr.fr_state = FRMR_IS_INVALID;

@@ -271,8 +274,9 @@ frwr_op_unmap(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg)
DECR_CQCOUNT(&r_xprt->rx_ep);

read_lock(&ia->ri_qplock);
+ device = ia->ri_id->device;
while (seg1->mr_nsegs--)
- rpcrdma_unmap_one(ia, seg++);
+ rpcrdma_unmap_one(device, seg++);
rc = ib_post_send(ia->ri_id->qp, &invalidate_wr, &bad_wr);
read_unlock(&ia->ri_qplock);
if (rc)
diff --git a/net/sunrpc/xprtrdma/physical_ops.c b/net/sunrpc/xprtrdma/physical_ops.c
index 0ba130b..ba518af 100644
--- a/net/sunrpc/xprtrdma/physical_ops.c
+++ b/net/sunrpc/xprtrdma/physical_ops.c
@@ -50,7 +50,8 @@ physical_op_map(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg,
{
struct rpcrdma_ia *ia = &r_xprt->rx_ia;

- rpcrdma_map_one(ia, seg, writing);
+ rpcrdma_map_one(ia->ri_id->device, seg,
+ rpcrdma_data_dir(writing));
seg->mr_rkey = ia->ri_bind_mem->rkey;
seg->mr_base = seg->mr_dma;
seg->mr_nsegs = 1;
@@ -62,7 +63,12 @@ physical_op_map(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg,
static int
physical_op_unmap(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg)
{
- rpcrdma_unmap_one(&r_xprt->rx_ia, seg);
+ struct rpcrdma_ia *ia = &r_xprt->rx_ia;
+
+ read_lock(&ia->ri_qplock);
+ rpcrdma_unmap_one(ia->ri_id->device, seg);
+ read_unlock(&ia->ri_qplock);
+
return 1;
}

diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index cac06f2..4870d27 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -1436,6 +1436,14 @@ rpcrdma_recv_buffer_put(struct rpcrdma_rep *rep)
* Wrappers for internal-use kmalloc memory registration, used by buffer code.
*/

+void
+rpcrdma_mapping_error(struct rpcrdma_mr_seg *seg)
+{
+ dprintk("RPC: map_one: offset %p iova %llx len %zu\n",
+ seg->mr_offset,
+ (unsigned long long)seg->mr_dma, seg->mr_dmalen);
+}
+
static int
rpcrdma_register_internal(struct rpcrdma_ia *ia, void *va, int len,
struct ib_mr **mrp, struct ib_sge *iov)
@@ -1561,42 +1569,6 @@ rpcrdma_free_regbuf(struct rpcrdma_ia *ia, struct rpcrdma_regbuf *rb)
}

/*
- * Wrappers for chunk registration, shared by read/write chunk code.
- */
-
-void
-rpcrdma_map_one(struct rpcrdma_ia *ia, struct rpcrdma_mr_seg *seg, bool writing)
-{
- seg->mr_dir = writing ? DMA_FROM_DEVICE : DMA_TO_DEVICE;
- seg->mr_dmalen = seg->mr_len;
- if (seg->mr_page)
- seg->mr_dma = ib_dma_map_page(ia->ri_id->device,
- seg->mr_page, offset_in_page(seg->mr_offset),
- seg->mr_dmalen, seg->mr_dir);
- else
- seg->mr_dma = ib_dma_map_single(ia->ri_id->device,
- seg->mr_offset,
- seg->mr_dmalen, seg->mr_dir);
- if (ib_dma_mapping_error(ia->ri_id->device, seg->mr_dma)) {
- dprintk("RPC: %s: mr_dma %llx mr_offset %p mr_dma_len %zu\n",
- __func__,
- (unsigned long long)seg->mr_dma,
- seg->mr_offset, seg->mr_dmalen);
- }
-}
-
-void
-rpcrdma_unmap_one(struct rpcrdma_ia *ia, struct rpcrdma_mr_seg *seg)
-{
- if (seg->mr_page)
- ib_dma_unmap_page(ia->ri_id->device,
- seg->mr_dma, seg->mr_dmalen, seg->mr_dir);
- else
- ib_dma_unmap_single(ia->ri_id->device,
- seg->mr_dma, seg->mr_dmalen, seg->mr_dir);
-}
-
-/*
* Prepost any receive buffer, then post send.
*
* Receive buffer is donated to hardware, reclaimed upon recv completion.
diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index 54bcbe4..78e0b8b 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -424,8 +424,49 @@ void rpcrdma_free_regbuf(struct rpcrdma_ia *,
struct rpcrdma_regbuf *);

unsigned int rpcrdma_max_segments(struct rpcrdma_xprt *);
-void rpcrdma_map_one(struct rpcrdma_ia *, struct rpcrdma_mr_seg *, bool);
-void rpcrdma_unmap_one(struct rpcrdma_ia *, struct rpcrdma_mr_seg *);
+
+/*
+ * Wrappers for chunk registration, shared by read/write chunk code.
+ */
+
+void rpcrdma_mapping_error(struct rpcrdma_mr_seg *);
+
+static inline enum dma_data_direction
+rpcrdma_data_dir(bool writing)
+{
+ return writing ? DMA_FROM_DEVICE : DMA_TO_DEVICE;
+}
+
+static inline void
+rpcrdma_map_one(struct ib_device *device, struct rpcrdma_mr_seg *seg,
+ enum dma_data_direction direction)
+{
+ seg->mr_dir = direction;
+ seg->mr_dmalen = seg->mr_len;
+
+ if (seg->mr_page)
+ seg->mr_dma = ib_dma_map_page(device,
+ seg->mr_page, offset_in_page(seg->mr_offset),
+ seg->mr_dmalen, seg->mr_dir);
+ else
+ seg->mr_dma = ib_dma_map_single(device,
+ seg->mr_offset,
+ seg->mr_dmalen, seg->mr_dir);
+
+ if (ib_dma_mapping_error(device, seg->mr_dma))
+ rpcrdma_mapping_error(seg);
+}
+
+static inline void
+rpcrdma_unmap_one(struct ib_device *device, struct rpcrdma_mr_seg *seg)
+{
+ if (seg->mr_page)
+ ib_dma_unmap_page(device,
+ seg->mr_dma, seg->mr_dmalen, seg->mr_dir);
+ else
+ ib_dma_unmap_single(device,
+ seg->mr_dma, seg->mr_dmalen, seg->mr_dir);
+}

/*
* RPC/RDMA connection management calls - xprtrdma/rpc_rdma.c


2015-03-26 18:39:50

by Anna Schumaker

[permalink] [raw]
Subject: Re: [PATCH v2 00/15] NFS/RDMA patches proposed for 4.1

Hey Chuck,

I didn't see anything that needs to be fixed up in these patches. Are they ready for me?

Anna

On 03/24/2015 04:30 PM, Chuck Lever wrote:
> This is a series of client-side patches for NFS/RDMA. In preparation
> for increasing the transport credit limit and maximum rsize/wsize,
> I've re-factored the memory registration logic into separate files,
> invoked via a method API.
>
> The two main optimizations in v1 of this series have been dropped.
> Sagi Grimberg didn't like the complexity of the solution, and there
> isn't enough time to rework it, test the new version, and get it
> reviewed before the 4.1 merge window opens. I'm going to prepare
> these for 4.2.
>
> Fixes suggested by reviewers have been included before the
> refactoring patches to make it easier to backport them to previous
> kernels.
>
> The series is available in the nfs-rdma-for-4.1 topic branch at
>
> git://linux-nfs.org/projects/cel/cel-2.6.git
>
> Changes since v1:
> - Rebased on 4.0-rc5
> - Main optimizations postponed to 4.2
> - Addressed review comments from Anna, Sagi, and Devesh
>
> ---
>
> Chuck Lever (15):
> SUNRPC: Introduce missing well-known netids
> xprtrdma: Display IPv6 addresses and port numbers correctly
> xprtrdma: Perform a full marshal on retransmit
> xprtrdma: Byte-align FRWR registration
> xprtrdma: Prevent infinite loop in rpcrdma_ep_create()
> xprtrdma: Add vector of ops for each memory registration strategy
> xprtrdma: Add a "max_payload" op for each memreg mode
> xprtrdma: Add a "register_external" op for each memreg mode
> xprtrdma: Add a "deregister_external" op for each memreg mode
> xprtrdma: Add "init MRs" memreg op
> xprtrdma: Add "reset MRs" memreg op
> xprtrdma: Add "destroy MRs" memreg op
> xprtrdma: Add "open" memreg op
> xprtrdma: Handle non-SEND completions via a callout
> xprtrdma: Make rpcrdma_{un}map_one() into inline functions
>
>
> include/linux/sunrpc/msg_prot.h | 8
> net/sunrpc/xprtrdma/Makefile | 3
> net/sunrpc/xprtrdma/fmr_ops.c | 208 +++++++++++
> net/sunrpc/xprtrdma/frwr_ops.c | 353 ++++++++++++++++++
> net/sunrpc/xprtrdma/physical_ops.c | 94 +++++
> net/sunrpc/xprtrdma/rpc_rdma.c | 87 ++--
> net/sunrpc/xprtrdma/transport.c | 61 ++-
> net/sunrpc/xprtrdma/verbs.c | 699 +++---------------------------------
> net/sunrpc/xprtrdma/xprt_rdma.h | 90 ++++-
> 9 files changed, 882 insertions(+), 721 deletions(-)
> create mode 100644 net/sunrpc/xprtrdma/fmr_ops.c
> create mode 100644 net/sunrpc/xprtrdma/frwr_ops.c
> create mode 100644 net/sunrpc/xprtrdma/physical_ops.c
>
> --
> Chuck Lever
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>


2015-03-26 18:43:41

by Chuck Lever III

[permalink] [raw]
Subject: Re: [PATCH v2 00/15] NFS/RDMA patches proposed for 4.1


On Mar 26, 2015, at 1:39 PM, Anna Schumaker <[email protected]> wrote:

> Hey Chuck,
>
> I didn't see anything that needs to be fixed up in these patches. Are they ready for me?

Thanks for the review. IMO we can go one of two routes:

- Wait for HCA vendors to test this latest version of the series, or

- Merge it now, and simply apply any needed fixes on top before the 4.1
window opens.

What do you prefer? Is it possible to get this series in front of the
zero-day test folks before you merge?


> Anna
>
> On 03/24/2015 04:30 PM, Chuck Lever wrote:
>> This is a series of client-side patches for NFS/RDMA. In preparation
>> for increasing the transport credit limit and maximum rsize/wsize,
>> I've re-factored the memory registration logic into separate files,
>> invoked via a method API.
>>
>> The two main optimizations in v1 of this series have been dropped.
>> Sagi Grimberg didn't like the complexity of the solution, and there
>> isn't enough time to rework it, test the new version, and get it
>> reviewed before the 4.1 merge window opens. I'm going to prepare
>> these for 4.2.
>>
>> Fixes suggested by reviewers have been included before the
>> refactoring patches to make it easier to backport them to previous
>> kernels.
>>
>> The series is available in the nfs-rdma-for-4.1 topic branch at
>>
>> git://linux-nfs.org/projects/cel/cel-2.6.git
>>
>> Changes since v1:
>> - Rebased on 4.0-rc5
>> - Main optimizations postponed to 4.2
>> - Addressed review comments from Anna, Sagi, and Devesh
>>
>> ---
>>
>> Chuck Lever (15):
>> SUNRPC: Introduce missing well-known netids
>> xprtrdma: Display IPv6 addresses and port numbers correctly
>> xprtrdma: Perform a full marshal on retransmit
>> xprtrdma: Byte-align FRWR registration
>> xprtrdma: Prevent infinite loop in rpcrdma_ep_create()
>> xprtrdma: Add vector of ops for each memory registration strategy
>> xprtrdma: Add a "max_payload" op for each memreg mode
>> xprtrdma: Add a "register_external" op for each memreg mode
>> xprtrdma: Add a "deregister_external" op for each memreg mode
>> xprtrdma: Add "init MRs" memreg op
>> xprtrdma: Add "reset MRs" memreg op
>> xprtrdma: Add "destroy MRs" memreg op
>> xprtrdma: Add "open" memreg op
>> xprtrdma: Handle non-SEND completions via a callout
>> xprtrdma: Make rpcrdma_{un}map_one() into inline functions
>>
>>
>> include/linux/sunrpc/msg_prot.h | 8
>> net/sunrpc/xprtrdma/Makefile | 3
>> net/sunrpc/xprtrdma/fmr_ops.c | 208 +++++++++++
>> net/sunrpc/xprtrdma/frwr_ops.c | 353 ++++++++++++++++++
>> net/sunrpc/xprtrdma/physical_ops.c | 94 +++++
>> net/sunrpc/xprtrdma/rpc_rdma.c | 87 ++--
>> net/sunrpc/xprtrdma/transport.c | 61 ++-
>> net/sunrpc/xprtrdma/verbs.c | 699 +++---------------------------------
>> net/sunrpc/xprtrdma/xprt_rdma.h | 90 ++++-
>> 9 files changed, 882 insertions(+), 721 deletions(-)
>> create mode 100644 net/sunrpc/xprtrdma/fmr_ops.c
>> create mode 100644 net/sunrpc/xprtrdma/frwr_ops.c
>> create mode 100644 net/sunrpc/xprtrdma/physical_ops.c
>>
>> --
>> Chuck Lever
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
>> the body of a message to [email protected]
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com




2015-03-27 05:43:08

by Devesh Sharma

[permalink] [raw]
Subject: RE: [PATCH v2 00/15] NFS/RDMA patches proposed for 4.1

Hi Chuck,

I have validated these set of patches with ocrdma device, iozone passes with
these.

-Regards
Devesh

> -----Original Message-----
> From: [email protected] [mailto:linux-rdma-
> [email protected]] On Behalf Of Anna Schumaker
> Sent: Friday, March 27, 2015 12:10 AM
> To: Chuck Lever; [email protected]; [email protected]
> Subject: Re: [PATCH v2 00/15] NFS/RDMA patches proposed for 4.1
>
> Hey Chuck,
>
> I didn't see anything that needs to be fixed up in these patches.  Are they ready
> for me?
>
> Anna

2015-03-27 05:44:56

by Devesh Sharma

[permalink] [raw]
Subject: RE: [PATCH v2 00/15] NFS/RDMA patches proposed for 4.1

> -----Original Message-----
> From: [email protected] [mailto:linux-rdma-
> [email protected]] On Behalf Of Devesh Sharma
> Sent: Friday, March 27, 2015 11:13 AM
> To: Anna Schumaker; Chuck Lever; [email protected]; linux-
> [email protected]
> Subject: RE: [PATCH v2 00/15] NFS/RDMA patches proposed for 4.1
>
> Hi Chuck,
>
> I have validated these set of patches with ocrdma device, iozone passes with
> these.


Thanks to Meghna.

>
> -Regards
> Devesh

2015-03-27 14:17:47

by Chuck Lever III

[permalink] [raw]
Subject: Re: [PATCH v2 00/15] NFS/RDMA patches proposed for 4.1


On Mar 27, 2015, at 12:44 AM, Devesh Sharma <[email protected]> wrote:

>> -----Original Message-----
>> From: [email protected] [mailto:linux-rdma-
>> [email protected]] On Behalf Of Devesh Sharma
>> Sent: Friday, March 27, 2015 11:13 AM
>> To: Anna Schumaker; Chuck Lever; [email protected]; linux-
>> [email protected]
>> Subject: RE: [PATCH v2 00/15] NFS/RDMA patches proposed for 4.1
>>
>> Hi Chuck,
>>
>> I have validated these set of patches with ocrdma device, iozone passes with
>> these.
>
>
> Thanks to Meghna.

Hi Devesh-

Is there a Tested-by tag that Anna can add to these patches?


>>
>> -Regards
>> Devesh
>>
>>> -----Original Message-----
>>> From: [email protected] [mailto:linux-rdma-
>>> [email protected]] On Behalf Of Anna Schumaker
>>> Sent: Friday, March 27, 2015 12:10 AM
>>> To: Chuck Lever; [email protected]; [email protected]
>>> Subject: Re: [PATCH v2 00/15] NFS/RDMA patches proposed for 4.1
>>>
>>> Hey Chuck,
>>>
>>> I didn't see anything that needs to be fixed up in these patches. Are
>>> they ready for me?
>>>
>>> Anna
>>>
>>> On 03/24/2015 04:30 PM, Chuck Lever wrote:
>>>> This is a series of client-side patches for NFS/RDMA. In preparation
>>>> for increasing the transport credit limit and maximum rsize/wsize,
>>>> I've re-factored the memory registration logic into separate files,
>>>> invoked via a method API.
>>>>
>>>> The two main optimizations in v1 of this series have been dropped.
>>>> Sagi Grimberg didn't like the complexity of the solution, and there
>>>> isn't enough time to rework it, test the new version, and get it
>>>> reviewed before the 4.1 merge window opens. I'm going to prepare
>>>> these for 4.2.
>>>>
>>>> Fixes suggested by reviewers have been included before the
>>>> refactoring patches to make it easier to backport them to previous kernels.
>>>>
>>>> The series is available in the nfs-rdma-for-4.1 topic branch at
>>>>
>>>> git://linux-nfs.org/projects/cel/cel-2.6.git
>>>>
>>>> Changes since v1:
>>>> - Rebased on 4.0-rc5
>>>> - Main optimizations postponed to 4.2
>>>> - Addressed review comments from Anna, Sagi, and Devesh
>>>>
>>>> ---
>>>>
>>>> Chuck Lever (15):
>>>> SUNRPC: Introduce missing well-known netids
>>>> xprtrdma: Display IPv6 addresses and port numbers correctly
>>>> xprtrdma: Perform a full marshal on retransmit
>>>> xprtrdma: Byte-align FRWR registration
>>>> xprtrdma: Prevent infinite loop in rpcrdma_ep_create()
>>>> xprtrdma: Add vector of ops for each memory registration strategy
>>>> xprtrdma: Add a "max_payload" op for each memreg mode
>>>> xprtrdma: Add a "register_external" op for each memreg mode
>>>> xprtrdma: Add a "deregister_external" op for each memreg mode
>>>> xprtrdma: Add "init MRs" memreg op
>>>> xprtrdma: Add "reset MRs" memreg op
>>>> xprtrdma: Add "destroy MRs" memreg op
>>>> xprtrdma: Add "open" memreg op
>>>> xprtrdma: Handle non-SEND completions via a callout
>>>> xprtrdma: Make rpcrdma_{un}map_one() into inline functions
>>>>
>>>>
>>>> include/linux/sunrpc/msg_prot.h | 8
>>>> net/sunrpc/xprtrdma/Makefile | 3
>>>> net/sunrpc/xprtrdma/fmr_ops.c | 208 +++++++++++
>>>> net/sunrpc/xprtrdma/frwr_ops.c | 353 ++++++++++++++++++
>>>> net/sunrpc/xprtrdma/physical_ops.c | 94 +++++
>>>> net/sunrpc/xprtrdma/rpc_rdma.c | 87 ++--
>>>> net/sunrpc/xprtrdma/transport.c | 61 ++-
>>>> net/sunrpc/xprtrdma/verbs.c | 699 +++---------------------------------
>>>> net/sunrpc/xprtrdma/xprt_rdma.h | 90 ++++-
>>>> 9 files changed, 882 insertions(+), 721 deletions(-) create mode
>>>> 100644 net/sunrpc/xprtrdma/fmr_ops.c create mode 100644
>>>> net/sunrpc/xprtrdma/frwr_ops.c create mode 100644
>>>> net/sunrpc/xprtrdma/physical_ops.c
>>>>
>>>> --
>>>> Chuck Lever

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com




2015-03-27 16:02:48

by Devesh Sharma

[permalink] [raw]
Subject: RE: [PATCH v2 00/15] NFS/RDMA patches proposed for 4.1

Yes, you can add my name and Meghna's in a Tested-by tag.

-Thanks

> -----Original Message-----
> From: Chuck Lever [mailto:[email protected]]
> Sent: Friday, March 27, 2015 7:48 PM
> To: Devesh Sharma
> Cc: Anna Schumaker; [email protected]; Linux NFS Mailing List;
> Meghana Cheripady
> Subject: Re: [PATCH v2 00/15] NFS/RDMA patches proposed for 4.1
>
> Hi Devesh-
>
> Is there a Tested-by tag that Anna can add to these patches?
>
> --
> Chuck Lever
> chuck[dot]lever[at]oracle[dot]com

2015-03-30 14:18:35

by Steve Wise

[permalink] [raw]
Subject: Re: [PATCH v2 00/15] NFS/RDMA patches proposed for 4.1

Hey Chuck,

Chelsio's QA regression tested this series on iw_cxgb4. Tests out good.

Tests run: spew, ffsb, xdd, fio, dbench, and cthon, with both NFSv3 and NFSv4.

Thanks,

Steve.

2015-03-30 17:53:22

by Chuck Lever III

[permalink] [raw]
Subject: Re: [PATCH v2 00/15] NFS/RDMA patches proposed for 4.1


On Mar 30, 2015, at 10:18 AM, Steve Wise <[email protected]> wrote:

> Hey Chuck,
>
> Chelsio's QA regression tested this series on iw_cxgb4. Tests out good.
>
> Tests ran: spew, ffsb, xdd, fio, dbench, and cthon with both v3 and v4.

Thanks, Steve. Who should I credit in the Tested-by tag?

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com




2015-03-30 18:01:19

by Steve Wise

[permalink] [raw]
Subject: RE: [PATCH v2 00/15] NFS/RDMA patches proposed for 4.1

Veeresh U. Kokatnur <[email protected]>


> -----Original Message-----
> From: Chuck Lever [mailto:[email protected]]
> Sent: Monday, March 30, 2015 12:53 PM
> To: Steve Wise
> Cc: linux-rdma; Linux NFS Mailing List
> Subject: Re: [PATCH v2 00/15] NFS/RDMA patches proposed for 4.1
>
>
> On Mar 30, 2015, at 10:18 AM, Steve Wise <[email protected]> wrote:
>
> > Hey Chuck,
> >
> > Chelsio's QA regression tested this series on iw_cxgb4. Tests out good.
> >
> > Tests ran: spew, ffsb, xdd, fio, dbench, and cthon with both v3 and v4.
>
> Thanks, Steve. Who should I credit in the Tested-by tag?
>
> --
> Chuck Lever
> chuck[dot]lever[at]oracle[dot]com
>
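
For reference, the tester credits offered in this thread would be recorded as
trailer lines in each patch's changelog, next to the Signed-off-by line. A
minimal sketch of how those trailers might look once the series is applied
(names taken from this thread; addresses redacted as elsewhere in this
archive; exact wording assumed rather than taken from the applied commits):

    Tested-by: Devesh Sharma <[email protected]>
    Tested-by: Meghana Cheripady <[email protected]>
    Tested-by: Veeresh U. Kokatnur <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>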