2015-03-30 18:33:26

by Chuck Lever III

[permalink] [raw]
Subject: [PATCH v3 00/15] NFS/RDMA patches proposed for 4.1

This is a series of client-side patches for NFS/RDMA. In preparation
for increasing the transport credit limit and maximum rsize/wsize,
I've re-factored the memory registration logic into separate files,
invoked via a method API.

The series is available in the nfs-rdma-for-4.1 topic branch at

git://linux-nfs.org/projects/cel/cel-2.6.git

Changes since v2:
- Rebased on 4.0-rc6
- One minor fix squashed into 01/15
- Tested-by tags added

Changes since v1:
- Rebased on 4.0-rc5
- Main optimizations postponed to 4.2
- Addressed review comments from Anna, Sagi, and Devesh

---

Chuck Lever (15):
SUNRPC: Introduce missing well-known netids
xprtrdma: Display IPv6 addresses and port numbers correctly
xprtrdma: Perform a full marshal on retransmit
xprtrdma: Byte-align FRWR registration
xprtrdma: Prevent infinite loop in rpcrdma_ep_create()
xprtrdma: Add vector of ops for each memory registration strategy
xprtrdma: Add a "max_payload" op for each memreg mode
xprtrdma: Add a "register_external" op for each memreg mode
xprtrdma: Add a "deregister_external" op for each memreg mode
xprtrdma: Add "init MRs" memreg op
xprtrdma: Add "reset MRs" memreg op
xprtrdma: Add "destroy MRs" memreg op
xprtrdma: Add "open" memreg op
xprtrdma: Handle non-SEND completions via a callout
xprtrdma: Make rpcrdma_{un}map_one() into inline functions


include/linux/sunrpc/msg_prot.h | 8
include/linux/sunrpc/xprtrdma.h | 5
net/sunrpc/xprtrdma/Makefile | 3
net/sunrpc/xprtrdma/fmr_ops.c | 208 +++++++++++
net/sunrpc/xprtrdma/frwr_ops.c | 353 ++++++++++++++++++
net/sunrpc/xprtrdma/physical_ops.c | 94 +++++
net/sunrpc/xprtrdma/rpc_rdma.c | 87 ++--
net/sunrpc/xprtrdma/transport.c | 61 ++-
net/sunrpc/xprtrdma/verbs.c | 699 +++---------------------------------
net/sunrpc/xprtrdma/xprt_rdma.h | 90 ++++-
10 files changed, 882 insertions(+), 726 deletions(-)
create mode 100644 net/sunrpc/xprtrdma/fmr_ops.c
create mode 100644 net/sunrpc/xprtrdma/frwr_ops.c
create mode 100644 net/sunrpc/xprtrdma/physical_ops.c

--
Chuck Lever


2015-03-30 18:33:37

by Chuck Lever III

[permalink] [raw]
Subject: [PATCH v3 01/15] SUNRPC: Introduce missing well-known netids

Signed-off-by: Chuck Lever <[email protected]>
---
include/linux/sunrpc/msg_prot.h | 8 +++++++-
include/linux/sunrpc/xprtrdma.h | 5 -----
2 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/include/linux/sunrpc/msg_prot.h b/include/linux/sunrpc/msg_prot.h
index aadc6a0..8073713 100644
--- a/include/linux/sunrpc/msg_prot.h
+++ b/include/linux/sunrpc/msg_prot.h
@@ -142,12 +142,18 @@ typedef __be32 rpc_fraghdr;
(RPC_REPHDRSIZE + (2 + RPC_MAX_AUTH_SIZE/4))

/*
- * RFC1833/RFC3530 rpcbind (v3+) well-known netid's.
+ * Well-known netids. See:
+ *
+ * http://www.iana.org/assignments/rpc-netids/rpc-netids.xhtml
*/
#define RPCBIND_NETID_UDP "udp"
#define RPCBIND_NETID_TCP "tcp"
+#define RPCBIND_NETID_RDMA "rdma"
+#define RPCBIND_NETID_SCTP "sctp"
#define RPCBIND_NETID_UDP6 "udp6"
#define RPCBIND_NETID_TCP6 "tcp6"
+#define RPCBIND_NETID_RDMA6 "rdma6"
+#define RPCBIND_NETID_SCTP6 "sctp6"
#define RPCBIND_NETID_LOCAL "local"

/*
diff --git a/include/linux/sunrpc/xprtrdma.h b/include/linux/sunrpc/xprtrdma.h
index 64a0a0a..c984c85 100644
--- a/include/linux/sunrpc/xprtrdma.h
+++ b/include/linux/sunrpc/xprtrdma.h
@@ -41,11 +41,6 @@
#define _LINUX_SUNRPC_XPRTRDMA_H

/*
- * rpcbind (v3+) RDMA netid.
- */
-#define RPCBIND_NETID_RDMA "rdma"
-
-/*
* Constants. Max RPC/NFS header is big enough to account for
* additional marshaling buffers passed down by Linux client.
*


2015-03-30 18:33:47

by Chuck Lever III

[permalink] [raw]
Subject: [PATCH v3 02/15] xprtrdma: Display IPv6 addresses and port numbers correctly

Signed-off-by: Chuck Lever <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
Tested-by: Devesh Sharma <[email protected]>
Tested-by: Meghana Cheripady <[email protected]>
Tested-by: Veeresh U. Kokatnur <[email protected]>
---
net/sunrpc/xprtrdma/transport.c | 47 ++++++++++++++++++++++++++++++++-------
net/sunrpc/xprtrdma/verbs.c | 21 +++++++----------
2 files changed, 47 insertions(+), 21 deletions(-)

diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c
index 2e192ba..9be7f97 100644
--- a/net/sunrpc/xprtrdma/transport.c
+++ b/net/sunrpc/xprtrdma/transport.c
@@ -157,12 +157,47 @@ static struct ctl_table sunrpc_table[] = {
static struct rpc_xprt_ops xprt_rdma_procs; /* forward reference */

static void
+xprt_rdma_format_addresses4(struct rpc_xprt *xprt, struct sockaddr *sap)
+{
+ struct sockaddr_in *sin = (struct sockaddr_in *)sap;
+ char buf[20];
+
+ snprintf(buf, sizeof(buf), "%08x", ntohl(sin->sin_addr.s_addr));
+ xprt->address_strings[RPC_DISPLAY_HEX_ADDR] = kstrdup(buf, GFP_KERNEL);
+
+ xprt->address_strings[RPC_DISPLAY_NETID] = RPCBIND_NETID_RDMA;
+}
+
+static void
+xprt_rdma_format_addresses6(struct rpc_xprt *xprt, struct sockaddr *sap)
+{
+ struct sockaddr_in6 *sin6 = (struct sockaddr_in6 *)sap;
+ char buf[40];
+
+ snprintf(buf, sizeof(buf), "%pi6", &sin6->sin6_addr);
+ xprt->address_strings[RPC_DISPLAY_HEX_ADDR] = kstrdup(buf, GFP_KERNEL);
+
+ xprt->address_strings[RPC_DISPLAY_NETID] = RPCBIND_NETID_RDMA6;
+}
+
+static void
xprt_rdma_format_addresses(struct rpc_xprt *xprt)
{
struct sockaddr *sap = (struct sockaddr *)
&rpcx_to_rdmad(xprt).addr;
- struct sockaddr_in *sin = (struct sockaddr_in *)sap;
- char buf[64];
+ char buf[128];
+
+ switch (sap->sa_family) {
+ case AF_INET:
+ xprt_rdma_format_addresses4(xprt, sap);
+ break;
+ case AF_INET6:
+ xprt_rdma_format_addresses6(xprt, sap);
+ break;
+ default:
+ pr_err("rpcrdma: Unrecognized address family\n");
+ return;
+ }

(void)rpc_ntop(sap, buf, sizeof(buf));
xprt->address_strings[RPC_DISPLAY_ADDR] = kstrdup(buf, GFP_KERNEL);
@@ -170,16 +205,10 @@ xprt_rdma_format_addresses(struct rpc_xprt *xprt)
snprintf(buf, sizeof(buf), "%u", rpc_get_port(sap));
xprt->address_strings[RPC_DISPLAY_PORT] = kstrdup(buf, GFP_KERNEL);

- xprt->address_strings[RPC_DISPLAY_PROTO] = "rdma";
-
- snprintf(buf, sizeof(buf), "%08x", ntohl(sin->sin_addr.s_addr));
- xprt->address_strings[RPC_DISPLAY_HEX_ADDR] = kstrdup(buf, GFP_KERNEL);
-
snprintf(buf, sizeof(buf), "%4hx", rpc_get_port(sap));
xprt->address_strings[RPC_DISPLAY_HEX_PORT] = kstrdup(buf, GFP_KERNEL);

- /* netid */
- xprt->address_strings[RPC_DISPLAY_NETID] = "rdma";
+ xprt->address_strings[RPC_DISPLAY_PROTO] = "rdma";
}

static void
diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index 124676c..1aa55b7 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -50,6 +50,7 @@
#include <linux/interrupt.h>
#include <linux/slab.h>
#include <linux/prefetch.h>
+#include <linux/sunrpc/addr.h>
#include <asm/bitops.h>

#include "xprt_rdma.h"
@@ -424,7 +425,7 @@ rpcrdma_conn_upcall(struct rdma_cm_id *id, struct rdma_cm_event *event)
struct rpcrdma_ia *ia = &xprt->rx_ia;
struct rpcrdma_ep *ep = &xprt->rx_ep;
#if IS_ENABLED(CONFIG_SUNRPC_DEBUG)
- struct sockaddr_in *addr = (struct sockaddr_in *) &ep->rep_remote_addr;
+ struct sockaddr *sap = (struct sockaddr *)&ep->rep_remote_addr;
#endif
struct ib_qp_attr *attr = &ia->ri_qp_attr;
struct ib_qp_init_attr *iattr = &ia->ri_qp_init_attr;
@@ -480,9 +481,8 @@ connected:
wake_up_all(&ep->rep_connect_wait);
/*FALLTHROUGH*/
default:
- dprintk("RPC: %s: %pI4:%u (ep 0x%p): %s\n",
- __func__, &addr->sin_addr.s_addr,
- ntohs(addr->sin_port), ep,
+ dprintk("RPC: %s: %pIS:%u (ep 0x%p): %s\n",
+ __func__, sap, rpc_get_port(sap), ep,
CONNECTION_MSG(event->event));
break;
}
@@ -491,19 +491,16 @@ connected:
if (connstate == 1) {
int ird = attr->max_dest_rd_atomic;
int tird = ep->rep_remote_cma.responder_resources;
- printk(KERN_INFO "rpcrdma: connection to %pI4:%u "
- "on %s, memreg %d slots %d ird %d%s\n",
- &addr->sin_addr.s_addr,
- ntohs(addr->sin_port),
+
+ pr_info("rpcrdma: connection to %pIS:%u on %s, memreg %d slots %d ird %d%s\n",
+ sap, rpc_get_port(sap),
ia->ri_id->device->name,
ia->ri_memreg_strategy,
xprt->rx_buf.rb_max_requests,
ird, ird < 4 && ird < tird / 2 ? " (low!)" : "");
} else if (connstate < 0) {
- printk(KERN_INFO "rpcrdma: connection to %pI4:%u closed (%d)\n",
- &addr->sin_addr.s_addr,
- ntohs(addr->sin_port),
- connstate);
+ pr_info("rpcrdma: connection to %pIS:%u closed (%d)\n",
+ sap, rpc_get_port(sap), connstate);
}
#endif



2015-03-30 18:33:56

by Chuck Lever III

[permalink] [raw]
Subject: [PATCH v3 03/15] xprtrdma: Perform a full marshal on retransmit

Commit 6ab59945f292 ("xprtrdma: Update rkeys after transport
reconnect" added logic in the ->send_request path to update the
chunk list when an RPC/RDMA request is retransmitted.

Note that rpc_xdr_encode() resets and re-encodes the entire RPC
send buffer for each retransmit of an RPC. The RPC send buffer
is not preserved from the previous transmission of an RPC.

Revert 6ab59945f292, and instead, just force each request to be
fully marshaled every time through ->send_request. This should
preserve the fix from 6ab59945f292, while also performing pullup
during retransmits.

Signed-off-by: Chuck Lever <[email protected]>
Acked-by: Sagi Grimberg <[email protected]>
Tested-by: Devesh Sharma <[email protected]>
Tested-by: Meghana Cheripady <[email protected]>
Tested-by: Veeresh U. Kokatnur <[email protected]>
---
net/sunrpc/xprtrdma/rpc_rdma.c | 71 ++++++++++++++++++---------------------
net/sunrpc/xprtrdma/transport.c | 5 +--
net/sunrpc/xprtrdma/xprt_rdma.h | 10 -----
3 files changed, 34 insertions(+), 52 deletions(-)

diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c
index 91ffde8..41456d9 100644
--- a/net/sunrpc/xprtrdma/rpc_rdma.c
+++ b/net/sunrpc/xprtrdma/rpc_rdma.c
@@ -53,6 +53,14 @@
# define RPCDBG_FACILITY RPCDBG_TRANS
#endif

+enum rpcrdma_chunktype {
+ rpcrdma_noch = 0,
+ rpcrdma_readch,
+ rpcrdma_areadch,
+ rpcrdma_writech,
+ rpcrdma_replych
+};
+
#if IS_ENABLED(CONFIG_SUNRPC_DEBUG)
static const char transfertypes[][12] = {
"pure inline", /* no chunks */
@@ -284,28 +292,6 @@ out:
}

/*
- * Marshal chunks. This routine returns the header length
- * consumed by marshaling.
- *
- * Returns positive RPC/RDMA header size, or negative errno.
- */
-
-ssize_t
-rpcrdma_marshal_chunks(struct rpc_rqst *rqst, ssize_t result)
-{
- struct rpcrdma_req *req = rpcr_to_rdmar(rqst);
- struct rpcrdma_msg *headerp = rdmab_to_msg(req->rl_rdmabuf);
-
- if (req->rl_rtype != rpcrdma_noch)
- result = rpcrdma_create_chunks(rqst, &rqst->rq_snd_buf,
- headerp, req->rl_rtype);
- else if (req->rl_wtype != rpcrdma_noch)
- result = rpcrdma_create_chunks(rqst, &rqst->rq_rcv_buf,
- headerp, req->rl_wtype);
- return result;
-}
-
-/*
* Copy write data inline.
* This function is used for "small" requests. Data which is passed
* to RPC via iovecs (or page list) is copied directly into the
@@ -397,6 +383,7 @@ rpcrdma_marshal_req(struct rpc_rqst *rqst)
char *base;
size_t rpclen, padlen;
ssize_t hdrlen;
+ enum rpcrdma_chunktype rtype, wtype;
struct rpcrdma_msg *headerp;

/*
@@ -433,13 +420,13 @@ rpcrdma_marshal_req(struct rpc_rqst *rqst)
* into pages; otherwise use reply chunks.
*/
if (rqst->rq_rcv_buf.buflen <= RPCRDMA_INLINE_READ_THRESHOLD(rqst))
- req->rl_wtype = rpcrdma_noch;
+ wtype = rpcrdma_noch;
else if (rqst->rq_rcv_buf.page_len == 0)
- req->rl_wtype = rpcrdma_replych;
+ wtype = rpcrdma_replych;
else if (rqst->rq_rcv_buf.flags & XDRBUF_READ)
- req->rl_wtype = rpcrdma_writech;
+ wtype = rpcrdma_writech;
else
- req->rl_wtype = rpcrdma_replych;
+ wtype = rpcrdma_replych;

/*
* Chunks needed for arguments?
@@ -456,16 +443,16 @@ rpcrdma_marshal_req(struct rpc_rqst *rqst)
* TBD check NFSv4 setacl
*/
if (rqst->rq_snd_buf.len <= RPCRDMA_INLINE_WRITE_THRESHOLD(rqst))
- req->rl_rtype = rpcrdma_noch;
+ rtype = rpcrdma_noch;
else if (rqst->rq_snd_buf.page_len == 0)
- req->rl_rtype = rpcrdma_areadch;
+ rtype = rpcrdma_areadch;
else
- req->rl_rtype = rpcrdma_readch;
+ rtype = rpcrdma_readch;

/* The following simplification is not true forever */
- if (req->rl_rtype != rpcrdma_noch && req->rl_wtype == rpcrdma_replych)
- req->rl_wtype = rpcrdma_noch;
- if (req->rl_rtype != rpcrdma_noch && req->rl_wtype != rpcrdma_noch) {
+ if (rtype != rpcrdma_noch && wtype == rpcrdma_replych)
+ wtype = rpcrdma_noch;
+ if (rtype != rpcrdma_noch && wtype != rpcrdma_noch) {
dprintk("RPC: %s: cannot marshal multiple chunk lists\n",
__func__);
return -EIO;
@@ -479,7 +466,7 @@ rpcrdma_marshal_req(struct rpc_rqst *rqst)
* When padding is in use and applies to the transfer, insert
* it and change the message type.
*/
- if (req->rl_rtype == rpcrdma_noch) {
+ if (rtype == rpcrdma_noch) {

padlen = rpcrdma_inline_pullup(rqst,
RPCRDMA_INLINE_PAD_VALUE(rqst));
@@ -494,7 +481,7 @@ rpcrdma_marshal_req(struct rpc_rqst *rqst)
headerp->rm_body.rm_padded.rm_pempty[1] = xdr_zero;
headerp->rm_body.rm_padded.rm_pempty[2] = xdr_zero;
hdrlen += 2 * sizeof(u32); /* extra words in padhdr */
- if (req->rl_wtype != rpcrdma_noch) {
+ if (wtype != rpcrdma_noch) {
dprintk("RPC: %s: invalid chunk list\n",
__func__);
return -EIO;
@@ -515,18 +502,26 @@ rpcrdma_marshal_req(struct rpc_rqst *rqst)
* on receive. Therefore, we request a reply chunk
* for non-writes wherever feasible and efficient.
*/
- if (req->rl_wtype == rpcrdma_noch)
- req->rl_wtype = rpcrdma_replych;
+ if (wtype == rpcrdma_noch)
+ wtype = rpcrdma_replych;
}
}

- hdrlen = rpcrdma_marshal_chunks(rqst, hdrlen);
+ if (rtype != rpcrdma_noch) {
+ hdrlen = rpcrdma_create_chunks(rqst, &rqst->rq_snd_buf,
+ headerp, rtype);
+ wtype = rtype; /* simplify dprintk */
+
+ } else if (wtype != rpcrdma_noch) {
+ hdrlen = rpcrdma_create_chunks(rqst, &rqst->rq_rcv_buf,
+ headerp, wtype);
+ }
if (hdrlen < 0)
return hdrlen;

dprintk("RPC: %s: %s: hdrlen %zd rpclen %zd padlen %zd"
" headerp 0x%p base 0x%p lkey 0x%x\n",
- __func__, transfertypes[req->rl_wtype], hdrlen, rpclen, padlen,
+ __func__, transfertypes[wtype], hdrlen, rpclen, padlen,
headerp, base, rdmab_lkey(req->rl_rdmabuf));

/*
diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c
index 9be7f97..97f6562 100644
--- a/net/sunrpc/xprtrdma/transport.c
+++ b/net/sunrpc/xprtrdma/transport.c
@@ -608,10 +608,7 @@ xprt_rdma_send_request(struct rpc_task *task)
struct rpcrdma_xprt *r_xprt = rpcx_to_rdmax(xprt);
int rc = 0;

- if (req->rl_niovs == 0)
- rc = rpcrdma_marshal_req(rqst);
- else if (r_xprt->rx_ia.ri_memreg_strategy != RPCRDMA_ALLPHYSICAL)
- rc = rpcrdma_marshal_chunks(rqst, 0);
+ rc = rpcrdma_marshal_req(rqst);
if (rc < 0)
goto failed_marshal;

diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index 0a16fb6..c8afd83 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -143,14 +143,6 @@ rdmab_to_msg(struct rpcrdma_regbuf *rb)
return (struct rpcrdma_msg *)rb->rg_base;
}

-enum rpcrdma_chunktype {
- rpcrdma_noch = 0,
- rpcrdma_readch,
- rpcrdma_areadch,
- rpcrdma_writech,
- rpcrdma_replych
-};
-
/*
* struct rpcrdma_rep -- this structure encapsulates state required to recv
* and complete a reply, asychronously. It needs several pieces of
@@ -258,7 +250,6 @@ struct rpcrdma_req {
unsigned int rl_niovs; /* 0, 2 or 4 */
unsigned int rl_nchunks; /* non-zero if chunks */
unsigned int rl_connect_cookie; /* retry detection */
- enum rpcrdma_chunktype rl_rtype, rl_wtype;
struct rpcrdma_buffer *rl_buffer; /* home base for this structure */
struct rpcrdma_rep *rl_reply;/* holder for reply buffer */
struct ib_sge rl_send_iov[4]; /* for active requests */
@@ -418,7 +409,6 @@ void rpcrdma_reply_handler(struct rpcrdma_rep *);
/*
* RPC/RDMA protocol calls - xprtrdma/rpc_rdma.c
*/
-ssize_t rpcrdma_marshal_chunks(struct rpc_rqst *, ssize_t);
int rpcrdma_marshal_req(struct rpc_rqst *);
size_t rpcrdma_max_payload(struct rpcrdma_xprt *);



2015-03-30 18:34:05

by Chuck Lever III

[permalink] [raw]
Subject: [PATCH v3 04/15] xprtrdma: Byte-align FRWR registration

The RPC/RDMA transport's FRWR registration logic registers whole
pages. This means areas in the first and last pages that are not
involved in the RDMA I/O are needlessly exposed to the server.

Buffered I/O is typically page-aligned, so not a problem there. But
for direct I/O, which can be byte-aligned, and for reply chunks,
which are nearly always smaller than a page, the transport could
expose memory outside the I/O buffer.

FRWR allows byte-aligned memory registration, so let's use it as
it was intended.

Reported-by: Sagi Grimberg <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>
Tested-by: Devesh Sharma <[email protected]>
Tested-by: Meghana Cheripady <[email protected]>
Tested-by: Veeresh U. Kokatnur <[email protected]>
---
net/sunrpc/xprtrdma/verbs.c | 12 ++++--------
1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index 1aa55b7..60f3317 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -1924,23 +1924,19 @@ rpcrdma_register_frmr_external(struct rpcrdma_mr_seg *seg,
offset_in_page((seg-1)->mr_offset + (seg-1)->mr_len))
break;
}
- dprintk("RPC: %s: Using frmr %p to map %d segments\n",
- __func__, mw, i);
+ dprintk("RPC: %s: Using frmr %p to map %d segments (%d bytes)\n",
+ __func__, mw, i, len);

frmr->fr_state = FRMR_IS_VALID;

memset(&fastreg_wr, 0, sizeof(fastreg_wr));
fastreg_wr.wr_id = (unsigned long)(void *)mw;
fastreg_wr.opcode = IB_WR_FAST_REG_MR;
- fastreg_wr.wr.fast_reg.iova_start = seg1->mr_dma;
+ fastreg_wr.wr.fast_reg.iova_start = seg1->mr_dma + pageoff;
fastreg_wr.wr.fast_reg.page_list = frmr->fr_pgl;
fastreg_wr.wr.fast_reg.page_list_len = page_no;
fastreg_wr.wr.fast_reg.page_shift = PAGE_SHIFT;
- fastreg_wr.wr.fast_reg.length = page_no << PAGE_SHIFT;
- if (fastreg_wr.wr.fast_reg.length < len) {
- rc = -EIO;
- goto out_err;
- }
+ fastreg_wr.wr.fast_reg.length = len;

/* Bump the key */
key = (u8)(mr->rkey & 0x000000FF);


2015-03-30 18:34:14

by Chuck Lever III

[permalink] [raw]
Subject: [PATCH v3 05/15] xprtrdma: Prevent infinite loop in rpcrdma_ep_create()

If a provider advertizes a zero max_fast_reg_page_list_len, FRWR
depth detection loops forever. Instead of just failing the mount,
try other memory registration modes.

Fixes: 0fc6c4e7bb28 ("xprtrdma: mind the device's max fast . . .")
Reported-by: Devesh Sharma <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>
Tested-by: Devesh Sharma <[email protected]>
Tested-by: Meghana Cheripady <[email protected]>
Tested-by: Veeresh U. Kokatnur <[email protected]>
---
net/sunrpc/xprtrdma/verbs.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index 60f3317..99752b5 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -618,9 +618,10 @@ rpcrdma_ia_open(struct rpcrdma_xprt *xprt, struct sockaddr *addr, int memreg)

if (memreg == RPCRDMA_FRMR) {
/* Requires both frmr reg and local dma lkey */
- if ((devattr->device_cap_flags &
+ if (((devattr->device_cap_flags &
(IB_DEVICE_MEM_MGT_EXTENSIONS|IB_DEVICE_LOCAL_DMA_LKEY)) !=
- (IB_DEVICE_MEM_MGT_EXTENSIONS|IB_DEVICE_LOCAL_DMA_LKEY)) {
+ (IB_DEVICE_MEM_MGT_EXTENSIONS|IB_DEVICE_LOCAL_DMA_LKEY)) ||
+ (devattr->max_fast_reg_page_list_len == 0)) {
dprintk("RPC: %s: FRMR registration "
"not supported by HCA\n", __func__);
memreg = RPCRDMA_MTHCAFMR;


2015-03-30 18:34:24

by Chuck Lever III

[permalink] [raw]
Subject: [PATCH v3 06/15] xprtrdma: Add vector of ops for each memory registration strategy

Instead of employing switch() statements, let's use the typical
Linux kernel idiom for handling behavioral variation: virtual
functions.

Start by defining a vector of operations for each supported memory
registration mode, and by adding a source file for each mode.

Signed-off-by: Chuck Lever <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
Tested-by: Devesh Sharma <[email protected]>
Tested-by: Meghana Cheripady <[email protected]>
Tested-by: Veeresh U. Kokatnur <[email protected]>
---
net/sunrpc/xprtrdma/Makefile | 3 ++-
net/sunrpc/xprtrdma/fmr_ops.c | 22 ++++++++++++++++++++++
net/sunrpc/xprtrdma/frwr_ops.c | 22 ++++++++++++++++++++++
net/sunrpc/xprtrdma/physical_ops.c | 24 ++++++++++++++++++++++++
net/sunrpc/xprtrdma/verbs.c | 11 +++++++----
net/sunrpc/xprtrdma/xprt_rdma.h | 12 ++++++++++++
6 files changed, 89 insertions(+), 5 deletions(-)
create mode 100644 net/sunrpc/xprtrdma/fmr_ops.c
create mode 100644 net/sunrpc/xprtrdma/frwr_ops.c
create mode 100644 net/sunrpc/xprtrdma/physical_ops.c

diff --git a/net/sunrpc/xprtrdma/Makefile b/net/sunrpc/xprtrdma/Makefile
index da5136f..579f72b 100644
--- a/net/sunrpc/xprtrdma/Makefile
+++ b/net/sunrpc/xprtrdma/Makefile
@@ -1,6 +1,7 @@
obj-$(CONFIG_SUNRPC_XPRT_RDMA_CLIENT) += xprtrdma.o

-xprtrdma-y := transport.o rpc_rdma.o verbs.o
+xprtrdma-y := transport.o rpc_rdma.o verbs.o \
+ fmr_ops.o frwr_ops.o physical_ops.o

obj-$(CONFIG_SUNRPC_XPRT_RDMA_SERVER) += svcrdma.o

diff --git a/net/sunrpc/xprtrdma/fmr_ops.c b/net/sunrpc/xprtrdma/fmr_ops.c
new file mode 100644
index 0000000..ffb7d93
--- /dev/null
+++ b/net/sunrpc/xprtrdma/fmr_ops.c
@@ -0,0 +1,22 @@
+/*
+ * Copyright (c) 2015 Oracle. All rights reserved.
+ * Copyright (c) 2003-2007 Network Appliance, Inc. All rights reserved.
+ */
+
+/* Lightweight memory registration using Fast Memory Regions (FMR).
+ * Referred to sometimes as MTHCAFMR mode.
+ *
+ * FMR uses synchronous memory registration and deregistration.
+ * FMR registration is known to be fast, but FMR deregistration
+ * can take tens of usecs to complete.
+ */
+
+#include "xprt_rdma.h"
+
+#if IS_ENABLED(CONFIG_SUNRPC_DEBUG)
+# define RPCDBG_FACILITY RPCDBG_TRANS
+#endif
+
+const struct rpcrdma_memreg_ops rpcrdma_fmr_memreg_ops = {
+ .ro_displayname = "fmr",
+};
diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c
new file mode 100644
index 0000000..79173f9
--- /dev/null
+++ b/net/sunrpc/xprtrdma/frwr_ops.c
@@ -0,0 +1,22 @@
+/*
+ * Copyright (c) 2015 Oracle. All rights reserved.
+ * Copyright (c) 2003-2007 Network Appliance, Inc. All rights reserved.
+ */
+
+/* Lightweight memory registration using Fast Registration Work
+ * Requests (FRWR). Also referred to sometimes as FRMR mode.
+ *
+ * FRWR features ordered asynchronous registration and deregistration
+ * of arbitrarily sized memory regions. This is the fastest and safest
+ * but most complex memory registration mode.
+ */
+
+#include "xprt_rdma.h"
+
+#if IS_ENABLED(CONFIG_SUNRPC_DEBUG)
+# define RPCDBG_FACILITY RPCDBG_TRANS
+#endif
+
+const struct rpcrdma_memreg_ops rpcrdma_frwr_memreg_ops = {
+ .ro_displayname = "frwr",
+};
diff --git a/net/sunrpc/xprtrdma/physical_ops.c b/net/sunrpc/xprtrdma/physical_ops.c
new file mode 100644
index 0000000..b0922ac
--- /dev/null
+++ b/net/sunrpc/xprtrdma/physical_ops.c
@@ -0,0 +1,24 @@
+/*
+ * Copyright (c) 2015 Oracle. All rights reserved.
+ * Copyright (c) 2003-2007 Network Appliance, Inc. All rights reserved.
+ */
+
+/* No-op chunk preparation. All client memory is pre-registered.
+ * Sometimes referred to as ALLPHYSICAL mode.
+ *
+ * Physical registration is simple because all client memory is
+ * pre-registered and never deregistered. This mode is good for
+ * adapter bring up, but is considered not safe: the server is
+ * trusted not to abuse its access to client memory not involved
+ * in RDMA I/O.
+ */
+
+#include "xprt_rdma.h"
+
+#if IS_ENABLED(CONFIG_SUNRPC_DEBUG)
+# define RPCDBG_FACILITY RPCDBG_TRANS
+#endif
+
+const struct rpcrdma_memreg_ops rpcrdma_physical_memreg_ops = {
+ .ro_displayname = "physical",
+};
diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index 99752b5..c3319e1 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -492,10 +492,10 @@ connected:
int ird = attr->max_dest_rd_atomic;
int tird = ep->rep_remote_cma.responder_resources;

- pr_info("rpcrdma: connection to %pIS:%u on %s, memreg %d slots %d ird %d%s\n",
+ pr_info("rpcrdma: connection to %pIS:%u on %s, memreg '%s', %d credits, %d responders%s\n",
sap, rpc_get_port(sap),
ia->ri_id->device->name,
- ia->ri_memreg_strategy,
+ ia->ri_ops->ro_displayname,
xprt->rx_buf.rb_max_requests,
ird, ird < 4 && ird < tird / 2 ? " (low!)" : "");
} else if (connstate < 0) {
@@ -650,13 +650,16 @@ rpcrdma_ia_open(struct rpcrdma_xprt *xprt, struct sockaddr *addr, int memreg)
*/
switch (memreg) {
case RPCRDMA_FRMR:
+ ia->ri_ops = &rpcrdma_frwr_memreg_ops;
break;
case RPCRDMA_ALLPHYSICAL:
+ ia->ri_ops = &rpcrdma_physical_memreg_ops;
mem_priv = IB_ACCESS_LOCAL_WRITE |
IB_ACCESS_REMOTE_WRITE |
IB_ACCESS_REMOTE_READ;
goto register_setup;
case RPCRDMA_MTHCAFMR:
+ ia->ri_ops = &rpcrdma_fmr_memreg_ops;
if (ia->ri_have_dma_lkey)
break;
mem_priv = IB_ACCESS_LOCAL_WRITE;
@@ -676,8 +679,8 @@ rpcrdma_ia_open(struct rpcrdma_xprt *xprt, struct sockaddr *addr, int memreg)
rc = -ENOMEM;
goto out3;
}
- dprintk("RPC: %s: memory registration strategy is %d\n",
- __func__, memreg);
+ dprintk("RPC: %s: memory registration strategy is '%s'\n",
+ __func__, ia->ri_ops->ro_displayname);

/* Else will do memory reg/dereg for each chunk */
ia->ri_memreg_strategy = memreg;
diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index c8afd83..ef3cf4a 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -60,6 +60,7 @@
* Interface Adapter -- one per transport instance
*/
struct rpcrdma_ia {
+ const struct rpcrdma_memreg_ops *ri_ops;
rwlock_t ri_qplock;
struct rdma_cm_id *ri_id;
struct ib_pd *ri_pd;
@@ -331,6 +332,17 @@ struct rpcrdma_stats {
};

/*
+ * Per-registration mode operations
+ */
+struct rpcrdma_memreg_ops {
+ const char *ro_displayname;
+};
+
+extern const struct rpcrdma_memreg_ops rpcrdma_fmr_memreg_ops;
+extern const struct rpcrdma_memreg_ops rpcrdma_frwr_memreg_ops;
+extern const struct rpcrdma_memreg_ops rpcrdma_physical_memreg_ops;
+
+/*
* RPCRDMA transport -- encapsulates the structures above for
* integration with RPC.
*


2015-03-30 18:34:33

by Chuck Lever III

[permalink] [raw]
Subject: [PATCH v3 07/15] xprtrdma: Add a "max_payload" op for each memreg mode

The max_payload computation is generalized to ensure that the
payload maximum is the lesser of RPC_MAX_DATA_SEGS and the number of
data segments that can be transmitted in an inline buffer.

Signed-off-by: Chuck Lever <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
Tested-by: Devesh Sharma <[email protected]>
Tested-by: Meghana Cheripady <[email protected]>
Tested-by: Veeresh U. Kokatnur <[email protected]>
---
net/sunrpc/xprtrdma/fmr_ops.c | 13 ++++++++++
net/sunrpc/xprtrdma/frwr_ops.c | 13 ++++++++++
net/sunrpc/xprtrdma/physical_ops.c | 10 +++++++
net/sunrpc/xprtrdma/transport.c | 5 +++-
net/sunrpc/xprtrdma/verbs.c | 49 +++++++++++-------------------------
net/sunrpc/xprtrdma/xprt_rdma.h | 5 +++-
6 files changed, 59 insertions(+), 36 deletions(-)

diff --git a/net/sunrpc/xprtrdma/fmr_ops.c b/net/sunrpc/xprtrdma/fmr_ops.c
index ffb7d93..eec2660 100644
--- a/net/sunrpc/xprtrdma/fmr_ops.c
+++ b/net/sunrpc/xprtrdma/fmr_ops.c
@@ -17,6 +17,19 @@
# define RPCDBG_FACILITY RPCDBG_TRANS
#endif

+/* Maximum scatter/gather per FMR */
+#define RPCRDMA_MAX_FMR_SGES (64)
+
+/* FMR mode conveys up to 64 pages of payload per chunk segment.
+ */
+static size_t
+fmr_op_maxpages(struct rpcrdma_xprt *r_xprt)
+{
+ return min_t(unsigned int, RPCRDMA_MAX_DATA_SEGS,
+ rpcrdma_max_segments(r_xprt) * RPCRDMA_MAX_FMR_SGES);
+}
+
const struct rpcrdma_memreg_ops rpcrdma_fmr_memreg_ops = {
+ .ro_maxpages = fmr_op_maxpages,
.ro_displayname = "fmr",
};
diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c
index 79173f9..73a5ac8 100644
--- a/net/sunrpc/xprtrdma/frwr_ops.c
+++ b/net/sunrpc/xprtrdma/frwr_ops.c
@@ -17,6 +17,19 @@
# define RPCDBG_FACILITY RPCDBG_TRANS
#endif

+/* FRWR mode conveys a list of pages per chunk segment. The
+ * maximum length of that list is the FRWR page list depth.
+ */
+static size_t
+frwr_op_maxpages(struct rpcrdma_xprt *r_xprt)
+{
+ struct rpcrdma_ia *ia = &r_xprt->rx_ia;
+
+ return min_t(unsigned int, RPCRDMA_MAX_DATA_SEGS,
+ rpcrdma_max_segments(r_xprt) * ia->ri_max_frmr_depth);
+}
+
const struct rpcrdma_memreg_ops rpcrdma_frwr_memreg_ops = {
+ .ro_maxpages = frwr_op_maxpages,
.ro_displayname = "frwr",
};
diff --git a/net/sunrpc/xprtrdma/physical_ops.c b/net/sunrpc/xprtrdma/physical_ops.c
index b0922ac..28ade19 100644
--- a/net/sunrpc/xprtrdma/physical_ops.c
+++ b/net/sunrpc/xprtrdma/physical_ops.c
@@ -19,6 +19,16 @@
# define RPCDBG_FACILITY RPCDBG_TRANS
#endif

+/* PHYSICAL memory registration conveys one page per chunk segment.
+ */
+static size_t
+physical_op_maxpages(struct rpcrdma_xprt *r_xprt)
+{
+ return min_t(unsigned int, RPCRDMA_MAX_DATA_SEGS,
+ rpcrdma_max_segments(r_xprt));
+}
+
const struct rpcrdma_memreg_ops rpcrdma_physical_memreg_ops = {
+ .ro_maxpages = physical_op_maxpages,
.ro_displayname = "physical",
};
diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c
index 97f6562..da71a24 100644
--- a/net/sunrpc/xprtrdma/transport.c
+++ b/net/sunrpc/xprtrdma/transport.c
@@ -406,7 +406,10 @@ xprt_setup_rdma(struct xprt_create *args)
xprt_rdma_connect_worker);

xprt_rdma_format_addresses(xprt);
- xprt->max_payload = rpcrdma_max_payload(new_xprt);
+ xprt->max_payload = new_xprt->rx_ia.ri_ops->ro_maxpages(new_xprt);
+ if (xprt->max_payload == 0)
+ goto out4;
+ xprt->max_payload <<= PAGE_SHIFT;
dprintk("RPC: %s: transport data payload maximum: %zu bytes\n",
__func__, xprt->max_payload);

diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index c3319e1..da55cda 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -2212,43 +2212,24 @@ rpcrdma_ep_post_recv(struct rpcrdma_ia *ia,
return rc;
}

-/* Physical mapping means one Read/Write list entry per-page.
- * All list entries must fit within an inline buffer
- *
- * NB: The server must return a Write list for NFS READ,
- * which has the same constraint. Factor in the inline
- * rsize as well.
+/* How many chunk list items fit within our inline buffers?
*/
-static size_t
-rpcrdma_physical_max_payload(struct rpcrdma_xprt *r_xprt)
+unsigned int
+rpcrdma_max_segments(struct rpcrdma_xprt *r_xprt)
{
struct rpcrdma_create_data_internal *cdata = &r_xprt->rx_data;
- unsigned int inline_size, pages;
-
- inline_size = min_t(unsigned int,
- cdata->inline_wsize, cdata->inline_rsize);
- inline_size -= RPCRDMA_HDRLEN_MIN;
- pages = inline_size / sizeof(struct rpcrdma_segment);
- return pages << PAGE_SHIFT;
-}
+ int bytes, segments;

-static size_t
-rpcrdma_mr_max_payload(struct rpcrdma_xprt *r_xprt)
-{
- return RPCRDMA_MAX_DATA_SEGS << PAGE_SHIFT;
-}
-
-size_t
-rpcrdma_max_payload(struct rpcrdma_xprt *r_xprt)
-{
- size_t result;
-
- switch (r_xprt->rx_ia.ri_memreg_strategy) {
- case RPCRDMA_ALLPHYSICAL:
- result = rpcrdma_physical_max_payload(r_xprt);
- break;
- default:
- result = rpcrdma_mr_max_payload(r_xprt);
+ bytes = min_t(unsigned int, cdata->inline_wsize, cdata->inline_rsize);
+ bytes -= RPCRDMA_HDRLEN_MIN;
+ if (bytes < sizeof(struct rpcrdma_segment) * 2) {
+ pr_warn("RPC: %s: inline threshold too small\n",
+ __func__);
+ return 0;
}
- return result;
+
+ segments = 1 << (fls(bytes / sizeof(struct rpcrdma_segment)) - 1);
+ dprintk("RPC: %s: max chunk list size = %d segments\n",
+ __func__, segments);
+ return segments;
}
diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index ef3cf4a..59e627e 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -334,7 +334,9 @@ struct rpcrdma_stats {
/*
* Per-registration mode operations
*/
+struct rpcrdma_xprt;
struct rpcrdma_memreg_ops {
+ size_t (*ro_maxpages)(struct rpcrdma_xprt *);
const char *ro_displayname;
};

@@ -411,6 +413,8 @@ struct rpcrdma_regbuf *rpcrdma_alloc_regbuf(struct rpcrdma_ia *,
void rpcrdma_free_regbuf(struct rpcrdma_ia *,
struct rpcrdma_regbuf *);

+unsigned int rpcrdma_max_segments(struct rpcrdma_xprt *);
+
/*
* RPC/RDMA connection management calls - xprtrdma/rpc_rdma.c
*/
@@ -422,7 +426,6 @@ void rpcrdma_reply_handler(struct rpcrdma_rep *);
* RPC/RDMA protocol calls - xprtrdma/rpc_rdma.c
*/
int rpcrdma_marshal_req(struct rpc_rqst *);
-size_t rpcrdma_max_payload(struct rpcrdma_xprt *);

/* Temporary NFS request map cache. Created in svc_rdma.c */
extern struct kmem_cache *svc_rdma_map_cachep;


2015-03-30 18:34:42

by Chuck Lever III

[permalink] [raw]
Subject: [PATCH v3 08/15] xprtrdma: Add a "register_external" op for each memreg mode

There is very little common processing among the different external
memory registration functions. Have rpcrdma_create_chunks() call
the registration method directly. This removes a stack frame and a
switch statement from the external registration path.

Signed-off-by: Chuck Lever <[email protected]>
Tested-by: Devesh Sharma <[email protected]>
Tested-by: Meghana Cheripady <[email protected]>
Tested-by: Veeresh U. Kokatnur <[email protected]>
---
net/sunrpc/xprtrdma/fmr_ops.c | 51 +++++++++++
net/sunrpc/xprtrdma/frwr_ops.c | 82 ++++++++++++++++++
net/sunrpc/xprtrdma/physical_ops.c | 17 ++++
net/sunrpc/xprtrdma/rpc_rdma.c | 5 +
net/sunrpc/xprtrdma/verbs.c | 168 +-----------------------------------
net/sunrpc/xprtrdma/xprt_rdma.h | 6 +
6 files changed, 160 insertions(+), 169 deletions(-)

diff --git a/net/sunrpc/xprtrdma/fmr_ops.c b/net/sunrpc/xprtrdma/fmr_ops.c
index eec2660..45fb646 100644
--- a/net/sunrpc/xprtrdma/fmr_ops.c
+++ b/net/sunrpc/xprtrdma/fmr_ops.c
@@ -29,7 +29,58 @@ fmr_op_maxpages(struct rpcrdma_xprt *r_xprt)
rpcrdma_max_segments(r_xprt) * RPCRDMA_MAX_FMR_SGES);
}

+/* Use the ib_map_phys_fmr() verb to register a memory region
+ * for remote access via RDMA READ or RDMA WRITE.
+ */
+static int
+fmr_op_map(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg,
+ int nsegs, bool writing)
+{
+ struct rpcrdma_ia *ia = &r_xprt->rx_ia;
+ struct rpcrdma_mr_seg *seg1 = seg;
+ struct rpcrdma_mw *mw = seg1->rl_mw;
+ u64 physaddrs[RPCRDMA_MAX_DATA_SEGS];
+ int len, pageoff, i, rc;
+
+ pageoff = offset_in_page(seg1->mr_offset);
+ seg1->mr_offset -= pageoff; /* start of page */
+ seg1->mr_len += pageoff;
+ len = -pageoff;
+ if (nsegs > RPCRDMA_MAX_FMR_SGES)
+ nsegs = RPCRDMA_MAX_FMR_SGES;
+ for (i = 0; i < nsegs;) {
+ rpcrdma_map_one(ia, seg, writing);
+ physaddrs[i] = seg->mr_dma;
+ len += seg->mr_len;
+ ++seg;
+ ++i;
+ /* Check for holes */
+ if ((i < nsegs && offset_in_page(seg->mr_offset)) ||
+ offset_in_page((seg-1)->mr_offset + (seg-1)->mr_len))
+ break;
+ }
+
+ rc = ib_map_phys_fmr(mw->r.fmr, physaddrs, i, seg1->mr_dma);
+ if (rc)
+ goto out_maperr;
+
+ seg1->mr_rkey = mw->r.fmr->rkey;
+ seg1->mr_base = seg1->mr_dma + pageoff;
+ seg1->mr_nsegs = i;
+ seg1->mr_len = len;
+ return i;
+
+out_maperr:
+ dprintk("RPC: %s: ib_map_phys_fmr %u@0x%llx+%i (%d) status %i\n",
+ __func__, len, (unsigned long long)seg1->mr_dma,
+ pageoff, i, rc);
+ while (i--)
+ rpcrdma_unmap_one(ia, --seg);
+ return rc;
+}
+
const struct rpcrdma_memreg_ops rpcrdma_fmr_memreg_ops = {
+ .ro_map = fmr_op_map,
.ro_maxpages = fmr_op_maxpages,
.ro_displayname = "fmr",
};
diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c
index 73a5ac8..23e4d99 100644
--- a/net/sunrpc/xprtrdma/frwr_ops.c
+++ b/net/sunrpc/xprtrdma/frwr_ops.c
@@ -29,7 +29,89 @@ frwr_op_maxpages(struct rpcrdma_xprt *r_xprt)
rpcrdma_max_segments(r_xprt) * ia->ri_max_frmr_depth);
}

+/* Post a FAST_REG Work Request to register a memory region
+ * for remote access via RDMA READ or RDMA WRITE.
+ */
+static int
+frwr_op_map(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg,
+ int nsegs, bool writing)
+{
+ struct rpcrdma_ia *ia = &r_xprt->rx_ia;
+ struct rpcrdma_mr_seg *seg1 = seg;
+ struct rpcrdma_mw *mw = seg1->rl_mw;
+ struct rpcrdma_frmr *frmr = &mw->r.frmr;
+ struct ib_mr *mr = frmr->fr_mr;
+ struct ib_send_wr fastreg_wr, *bad_wr;
+ u8 key;
+ int len, pageoff;
+ int i, rc;
+ int seg_len;
+ u64 pa;
+ int page_no;
+
+ pageoff = offset_in_page(seg1->mr_offset);
+ seg1->mr_offset -= pageoff; /* start of page */
+ seg1->mr_len += pageoff;
+ len = -pageoff;
+ if (nsegs > ia->ri_max_frmr_depth)
+ nsegs = ia->ri_max_frmr_depth;
+ for (page_no = i = 0; i < nsegs;) {
+ rpcrdma_map_one(ia, seg, writing);
+ pa = seg->mr_dma;
+ for (seg_len = seg->mr_len; seg_len > 0; seg_len -= PAGE_SIZE) {
+ frmr->fr_pgl->page_list[page_no++] = pa;
+ pa += PAGE_SIZE;
+ }
+ len += seg->mr_len;
+ ++seg;
+ ++i;
+ /* Check for holes */
+ if ((i < nsegs && offset_in_page(seg->mr_offset)) ||
+ offset_in_page((seg-1)->mr_offset + (seg-1)->mr_len))
+ break;
+ }
+ dprintk("RPC: %s: Using frmr %p to map %d segments (%d bytes)\n",
+ __func__, mw, i, len);
+
+ frmr->fr_state = FRMR_IS_VALID;
+
+ memset(&fastreg_wr, 0, sizeof(fastreg_wr));
+ fastreg_wr.wr_id = (unsigned long)(void *)mw;
+ fastreg_wr.opcode = IB_WR_FAST_REG_MR;
+ fastreg_wr.wr.fast_reg.iova_start = seg1->mr_dma + pageoff;
+ fastreg_wr.wr.fast_reg.page_list = frmr->fr_pgl;
+ fastreg_wr.wr.fast_reg.page_shift = PAGE_SHIFT;
+ fastreg_wr.wr.fast_reg.page_list_len = page_no;
+ fastreg_wr.wr.fast_reg.length = len;
+ fastreg_wr.wr.fast_reg.access_flags = writing ?
+ IB_ACCESS_REMOTE_WRITE | IB_ACCESS_LOCAL_WRITE :
+ IB_ACCESS_REMOTE_READ;
+ key = (u8)(mr->rkey & 0x000000FF);
+ ib_update_fast_reg_key(mr, ++key);
+ fastreg_wr.wr.fast_reg.rkey = mr->rkey;
+
+ DECR_CQCOUNT(&r_xprt->rx_ep);
+ rc = ib_post_send(ia->ri_id->qp, &fastreg_wr, &bad_wr);
+ if (rc)
+ goto out_senderr;
+
+ seg1->mr_rkey = mr->rkey;
+ seg1->mr_base = seg1->mr_dma + pageoff;
+ seg1->mr_nsegs = i;
+ seg1->mr_len = len;
+ return i;
+
+out_senderr:
+ dprintk("RPC: %s: ib_post_send status %i\n", __func__, rc);
+ ib_update_fast_reg_key(mr, --key);
+ frmr->fr_state = FRMR_IS_INVALID;
+ while (i--)
+ rpcrdma_unmap_one(ia, --seg);
+ return rc;
+}
+
const struct rpcrdma_memreg_ops rpcrdma_frwr_memreg_ops = {
+ .ro_map = frwr_op_map,
.ro_maxpages = frwr_op_maxpages,
.ro_displayname = "frwr",
};
diff --git a/net/sunrpc/xprtrdma/physical_ops.c b/net/sunrpc/xprtrdma/physical_ops.c
index 28ade19..5a284ee 100644
--- a/net/sunrpc/xprtrdma/physical_ops.c
+++ b/net/sunrpc/xprtrdma/physical_ops.c
@@ -28,7 +28,24 @@ physical_op_maxpages(struct rpcrdma_xprt *r_xprt)
rpcrdma_max_segments(r_xprt));
}

+/* The client's physical memory is already exposed for
+ * remote access via RDMA READ or RDMA WRITE.
+ */
+static int
+physical_op_map(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg,
+ int nsegs, bool writing)
+{
+ struct rpcrdma_ia *ia = &r_xprt->rx_ia;
+
+ rpcrdma_map_one(ia, seg, writing);
+ seg->mr_rkey = ia->ri_bind_mem->rkey;
+ seg->mr_base = seg->mr_dma;
+ seg->mr_nsegs = 1;
+ return 1;
+}
+
const struct rpcrdma_memreg_ops rpcrdma_physical_memreg_ops = {
+ .ro_map = physical_op_map,
.ro_maxpages = physical_op_maxpages,
.ro_displayname = "physical",
};
diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c
index 41456d9..6ab1d03 100644
--- a/net/sunrpc/xprtrdma/rpc_rdma.c
+++ b/net/sunrpc/xprtrdma/rpc_rdma.c
@@ -187,6 +187,7 @@ rpcrdma_create_chunks(struct rpc_rqst *rqst, struct xdr_buf *target,
struct rpcrdma_write_array *warray = NULL;
struct rpcrdma_write_chunk *cur_wchunk = NULL;
__be32 *iptr = headerp->rm_body.rm_chunks;
+ int (*map)(struct rpcrdma_xprt *, struct rpcrdma_mr_seg *, int, bool);

if (type == rpcrdma_readch || type == rpcrdma_areadch) {
/* a read chunk - server will RDMA Read our memory */
@@ -209,9 +210,9 @@ rpcrdma_create_chunks(struct rpc_rqst *rqst, struct xdr_buf *target,
if (nsegs < 0)
return nsegs;

+ map = r_xprt->rx_ia.ri_ops->ro_map;
do {
- n = rpcrdma_register_external(seg, nsegs,
- cur_wchunk != NULL, r_xprt);
+ n = map(r_xprt, seg, nsegs, cur_wchunk != NULL);
if (n <= 0)
goto out;
if (cur_rchunk) { /* read */
diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index da55cda..4318c04 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -1858,8 +1858,8 @@ rpcrdma_free_regbuf(struct rpcrdma_ia *ia, struct rpcrdma_regbuf *rb)
* Wrappers for chunk registration, shared by read/write chunk code.
*/

-static void
-rpcrdma_map_one(struct rpcrdma_ia *ia, struct rpcrdma_mr_seg *seg, int writing)
+void
+rpcrdma_map_one(struct rpcrdma_ia *ia, struct rpcrdma_mr_seg *seg, bool writing)
{
seg->mr_dir = writing ? DMA_FROM_DEVICE : DMA_TO_DEVICE;
seg->mr_dmalen = seg->mr_len;
@@ -1879,7 +1879,7 @@ rpcrdma_map_one(struct rpcrdma_ia *ia, struct rpcrdma_mr_seg *seg, int writing)
}
}

-static void
+void
rpcrdma_unmap_one(struct rpcrdma_ia *ia, struct rpcrdma_mr_seg *seg)
{
if (seg->mr_page)
@@ -1891,89 +1891,6 @@ rpcrdma_unmap_one(struct rpcrdma_ia *ia, struct rpcrdma_mr_seg *seg)
}

static int
-rpcrdma_register_frmr_external(struct rpcrdma_mr_seg *seg,
- int *nsegs, int writing, struct rpcrdma_ia *ia,
- struct rpcrdma_xprt *r_xprt)
-{
- struct rpcrdma_mr_seg *seg1 = seg;
- struct rpcrdma_mw *mw = seg1->rl_mw;
- struct rpcrdma_frmr *frmr = &mw->r.frmr;
- struct ib_mr *mr = frmr->fr_mr;
- struct ib_send_wr fastreg_wr, *bad_wr;
- u8 key;
- int len, pageoff;
- int i, rc;
- int seg_len;
- u64 pa;
- int page_no;
-
- pageoff = offset_in_page(seg1->mr_offset);
- seg1->mr_offset -= pageoff; /* start of page */
- seg1->mr_len += pageoff;
- len = -pageoff;
- if (*nsegs > ia->ri_max_frmr_depth)
- *nsegs = ia->ri_max_frmr_depth;
- for (page_no = i = 0; i < *nsegs;) {
- rpcrdma_map_one(ia, seg, writing);
- pa = seg->mr_dma;
- for (seg_len = seg->mr_len; seg_len > 0; seg_len -= PAGE_SIZE) {
- frmr->fr_pgl->page_list[page_no++] = pa;
- pa += PAGE_SIZE;
- }
- len += seg->mr_len;
- ++seg;
- ++i;
- /* Check for holes */
- if ((i < *nsegs && offset_in_page(seg->mr_offset)) ||
- offset_in_page((seg-1)->mr_offset + (seg-1)->mr_len))
- break;
- }
- dprintk("RPC: %s: Using frmr %p to map %d segments (%d bytes)\n",
- __func__, mw, i, len);
-
- frmr->fr_state = FRMR_IS_VALID;
-
- memset(&fastreg_wr, 0, sizeof(fastreg_wr));
- fastreg_wr.wr_id = (unsigned long)(void *)mw;
- fastreg_wr.opcode = IB_WR_FAST_REG_MR;
- fastreg_wr.wr.fast_reg.iova_start = seg1->mr_dma + pageoff;
- fastreg_wr.wr.fast_reg.page_list = frmr->fr_pgl;
- fastreg_wr.wr.fast_reg.page_list_len = page_no;
- fastreg_wr.wr.fast_reg.page_shift = PAGE_SHIFT;
- fastreg_wr.wr.fast_reg.length = len;
-
- /* Bump the key */
- key = (u8)(mr->rkey & 0x000000FF);
- ib_update_fast_reg_key(mr, ++key);
-
- fastreg_wr.wr.fast_reg.access_flags = (writing ?
- IB_ACCESS_REMOTE_WRITE | IB_ACCESS_LOCAL_WRITE :
- IB_ACCESS_REMOTE_READ);
- fastreg_wr.wr.fast_reg.rkey = mr->rkey;
- DECR_CQCOUNT(&r_xprt->rx_ep);
-
- rc = ib_post_send(ia->ri_id->qp, &fastreg_wr, &bad_wr);
- if (rc) {
- dprintk("RPC: %s: failed ib_post_send for register,"
- " status %i\n", __func__, rc);
- ib_update_fast_reg_key(mr, --key);
- goto out_err;
- } else {
- seg1->mr_rkey = mr->rkey;
- seg1->mr_base = seg1->mr_dma + pageoff;
- seg1->mr_nsegs = i;
- seg1->mr_len = len;
- }
- *nsegs = i;
- return 0;
-out_err:
- frmr->fr_state = FRMR_IS_INVALID;
- while (i--)
- rpcrdma_unmap_one(ia, --seg);
- return rc;
-}
-
-static int
rpcrdma_deregister_frmr_external(struct rpcrdma_mr_seg *seg,
struct rpcrdma_ia *ia, struct rpcrdma_xprt *r_xprt)
{
@@ -2004,49 +1921,6 @@ rpcrdma_deregister_frmr_external(struct rpcrdma_mr_seg *seg,
}

static int
-rpcrdma_register_fmr_external(struct rpcrdma_mr_seg *seg,
- int *nsegs, int writing, struct rpcrdma_ia *ia)
-{
- struct rpcrdma_mr_seg *seg1 = seg;
- u64 physaddrs[RPCRDMA_MAX_DATA_SEGS];
- int len, pageoff, i, rc;
-
- pageoff = offset_in_page(seg1->mr_offset);
- seg1->mr_offset -= pageoff; /* start of page */
- seg1->mr_len += pageoff;
- len = -pageoff;
- if (*nsegs > RPCRDMA_MAX_DATA_SEGS)
- *nsegs = RPCRDMA_MAX_DATA_SEGS;
- for (i = 0; i < *nsegs;) {
- rpcrdma_map_one(ia, seg, writing);
- physaddrs[i] = seg->mr_dma;
- len += seg->mr_len;
- ++seg;
- ++i;
- /* Check for holes */
- if ((i < *nsegs && offset_in_page(seg->mr_offset)) ||
- offset_in_page((seg-1)->mr_offset + (seg-1)->mr_len))
- break;
- }
- rc = ib_map_phys_fmr(seg1->rl_mw->r.fmr, physaddrs, i, seg1->mr_dma);
- if (rc) {
- dprintk("RPC: %s: failed ib_map_phys_fmr "
- "%u@0x%llx+%i (%d)... status %i\n", __func__,
- len, (unsigned long long)seg1->mr_dma,
- pageoff, i, rc);
- while (i--)
- rpcrdma_unmap_one(ia, --seg);
- } else {
- seg1->mr_rkey = seg1->rl_mw->r.fmr->rkey;
- seg1->mr_base = seg1->mr_dma + pageoff;
- seg1->mr_nsegs = i;
- seg1->mr_len = len;
- }
- *nsegs = i;
- return rc;
-}
-
-static int
rpcrdma_deregister_fmr_external(struct rpcrdma_mr_seg *seg,
struct rpcrdma_ia *ia)
{
@@ -2067,42 +1941,6 @@ rpcrdma_deregister_fmr_external(struct rpcrdma_mr_seg *seg,
}

int
-rpcrdma_register_external(struct rpcrdma_mr_seg *seg,
- int nsegs, int writing, struct rpcrdma_xprt *r_xprt)
-{
- struct rpcrdma_ia *ia = &r_xprt->rx_ia;
- int rc = 0;
-
- switch (ia->ri_memreg_strategy) {
-
- case RPCRDMA_ALLPHYSICAL:
- rpcrdma_map_one(ia, seg, writing);
- seg->mr_rkey = ia->ri_bind_mem->rkey;
- seg->mr_base = seg->mr_dma;
- seg->mr_nsegs = 1;
- nsegs = 1;
- break;
-
- /* Registration using frmr registration */
- case RPCRDMA_FRMR:
- rc = rpcrdma_register_frmr_external(seg, &nsegs, writing, ia, r_xprt);
- break;
-
- /* Registration using fmr memory registration */
- case RPCRDMA_MTHCAFMR:
- rc = rpcrdma_register_fmr_external(seg, &nsegs, writing, ia);
- break;
-
- default:
- return -EIO;
- }
- if (rc)
- return rc;
-
- return nsegs;
-}
-
-int
rpcrdma_deregister_external(struct rpcrdma_mr_seg *seg,
struct rpcrdma_xprt *r_xprt)
{
diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index 59e627e..7bf077b 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -336,6 +336,8 @@ struct rpcrdma_stats {
*/
struct rpcrdma_xprt;
struct rpcrdma_memreg_ops {
+ int (*ro_map)(struct rpcrdma_xprt *,
+ struct rpcrdma_mr_seg *, int, bool);
size_t (*ro_maxpages)(struct rpcrdma_xprt *);
const char *ro_displayname;
};
@@ -403,8 +405,6 @@ void rpcrdma_buffer_put(struct rpcrdma_req *);
void rpcrdma_recv_buffer_get(struct rpcrdma_req *);
void rpcrdma_recv_buffer_put(struct rpcrdma_rep *);

-int rpcrdma_register_external(struct rpcrdma_mr_seg *,
- int, int, struct rpcrdma_xprt *);
int rpcrdma_deregister_external(struct rpcrdma_mr_seg *,
struct rpcrdma_xprt *);

@@ -414,6 +414,8 @@ void rpcrdma_free_regbuf(struct rpcrdma_ia *,
struct rpcrdma_regbuf *);

unsigned int rpcrdma_max_segments(struct rpcrdma_xprt *);
+void rpcrdma_map_one(struct rpcrdma_ia *, struct rpcrdma_mr_seg *, bool);
+void rpcrdma_unmap_one(struct rpcrdma_ia *, struct rpcrdma_mr_seg *);

/*
* RPC/RDMA connection management calls - xprtrdma/rpc_rdma.c


2015-03-30 18:34:51

by Chuck Lever III

[permalink] [raw]
Subject: [PATCH v3 09/15] xprtrdma: Add a "deregister_external" op for each memreg mode

There is very little common processing among the different external
memory deregistration functions.

Signed-off-by: Chuck Lever <[email protected]>
Tested-by: Devesh Sharma <[email protected]>
Tested-by: Meghana Cheripady <[email protected]>
Tested-by: Veeresh U. Kokatnur <[email protected]>
---
net/sunrpc/xprtrdma/fmr_ops.c | 27 ++++++++++++
net/sunrpc/xprtrdma/frwr_ops.c | 36 ++++++++++++++++
net/sunrpc/xprtrdma/physical_ops.c | 10 ++++
net/sunrpc/xprtrdma/rpc_rdma.c | 11 +++--
net/sunrpc/xprtrdma/transport.c | 4 +-
net/sunrpc/xprtrdma/verbs.c | 81 ------------------------------------
net/sunrpc/xprtrdma/xprt_rdma.h | 5 +-
7 files changed, 84 insertions(+), 90 deletions(-)

diff --git a/net/sunrpc/xprtrdma/fmr_ops.c b/net/sunrpc/xprtrdma/fmr_ops.c
index 45fb646..888aa10 100644
--- a/net/sunrpc/xprtrdma/fmr_ops.c
+++ b/net/sunrpc/xprtrdma/fmr_ops.c
@@ -79,8 +79,35 @@ out_maperr:
return rc;
}

+/* Use the ib_unmap_fmr() verb to prevent further remote
+ * access via RDMA READ or RDMA WRITE.
+ */
+static int
+fmr_op_unmap(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg)
+{
+ struct rpcrdma_ia *ia = &r_xprt->rx_ia;
+ struct rpcrdma_mr_seg *seg1 = seg;
+ int rc, nsegs = seg->mr_nsegs;
+ LIST_HEAD(l);
+
+ list_add(&seg1->rl_mw->r.fmr->list, &l);
+ rc = ib_unmap_fmr(&l);
+ read_lock(&ia->ri_qplock);
+ while (seg1->mr_nsegs--)
+ rpcrdma_unmap_one(ia, seg++);
+ read_unlock(&ia->ri_qplock);
+ if (rc)
+ goto out_err;
+ return nsegs;
+
+out_err:
+ dprintk("RPC: %s: ib_unmap_fmr status %i\n", __func__, rc);
+ return nsegs;
+}
+
const struct rpcrdma_memreg_ops rpcrdma_fmr_memreg_ops = {
.ro_map = fmr_op_map,
+ .ro_unmap = fmr_op_unmap,
.ro_maxpages = fmr_op_maxpages,
.ro_displayname = "fmr",
};
diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c
index 23e4d99..35b725b 100644
--- a/net/sunrpc/xprtrdma/frwr_ops.c
+++ b/net/sunrpc/xprtrdma/frwr_ops.c
@@ -110,8 +110,44 @@ out_senderr:
return rc;
}

+/* Post a LOCAL_INV Work Request to prevent further remote access
+ * via RDMA READ or RDMA WRITE.
+ */
+static int
+frwr_op_unmap(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg)
+{
+ struct rpcrdma_mr_seg *seg1 = seg;
+ struct rpcrdma_ia *ia = &r_xprt->rx_ia;
+ struct ib_send_wr invalidate_wr, *bad_wr;
+ int rc, nsegs = seg->mr_nsegs;
+
+ seg1->rl_mw->r.frmr.fr_state = FRMR_IS_INVALID;
+
+ memset(&invalidate_wr, 0, sizeof(invalidate_wr));
+ invalidate_wr.wr_id = (unsigned long)(void *)seg1->rl_mw;
+ invalidate_wr.opcode = IB_WR_LOCAL_INV;
+ invalidate_wr.ex.invalidate_rkey = seg1->rl_mw->r.frmr.fr_mr->rkey;
+ DECR_CQCOUNT(&r_xprt->rx_ep);
+
+ read_lock(&ia->ri_qplock);
+ while (seg1->mr_nsegs--)
+ rpcrdma_unmap_one(ia, seg++);
+ rc = ib_post_send(ia->ri_id->qp, &invalidate_wr, &bad_wr);
+ read_unlock(&ia->ri_qplock);
+ if (rc)
+ goto out_err;
+ return nsegs;
+
+out_err:
+ /* Force rpcrdma_buffer_get() to retry */
+ seg1->rl_mw->r.frmr.fr_state = FRMR_IS_STALE;
+ dprintk("RPC: %s: ib_post_send status %i\n", __func__, rc);
+ return nsegs;
+}
+
const struct rpcrdma_memreg_ops rpcrdma_frwr_memreg_ops = {
.ro_map = frwr_op_map,
+ .ro_unmap = frwr_op_unmap,
.ro_maxpages = frwr_op_maxpages,
.ro_displayname = "frwr",
};
diff --git a/net/sunrpc/xprtrdma/physical_ops.c b/net/sunrpc/xprtrdma/physical_ops.c
index 5a284ee..5b5a63a 100644
--- a/net/sunrpc/xprtrdma/physical_ops.c
+++ b/net/sunrpc/xprtrdma/physical_ops.c
@@ -44,8 +44,18 @@ physical_op_map(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg,
return 1;
}

+/* Unmap a memory region, but leave it registered.
+ */
+static int
+physical_op_unmap(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg)
+{
+ rpcrdma_unmap_one(&r_xprt->rx_ia, seg);
+ return 1;
+}
+
const struct rpcrdma_memreg_ops rpcrdma_physical_memreg_ops = {
.ro_map = physical_op_map,
+ .ro_unmap = physical_op_unmap,
.ro_maxpages = physical_op_maxpages,
.ro_displayname = "physical",
};
diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c
index 6ab1d03..2c53ea9 100644
--- a/net/sunrpc/xprtrdma/rpc_rdma.c
+++ b/net/sunrpc/xprtrdma/rpc_rdma.c
@@ -284,11 +284,12 @@ rpcrdma_create_chunks(struct rpc_rqst *rqst, struct xdr_buf *target,
return (unsigned char *)iptr - (unsigned char *)headerp;

out:
- if (r_xprt->rx_ia.ri_memreg_strategy != RPCRDMA_FRMR) {
- for (pos = 0; nchunks--;)
- pos += rpcrdma_deregister_external(
- &req->rl_segments[pos], r_xprt);
- }
+ if (r_xprt->rx_ia.ri_memreg_strategy == RPCRDMA_FRMR)
+ return n;
+
+ for (pos = 0; nchunks--;)
+ pos += r_xprt->rx_ia.ri_ops->ro_unmap(r_xprt,
+ &req->rl_segments[pos]);
return n;
}

diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c
index da71a24..54f23b1 100644
--- a/net/sunrpc/xprtrdma/transport.c
+++ b/net/sunrpc/xprtrdma/transport.c
@@ -584,8 +584,8 @@ xprt_rdma_free(void *buffer)

for (i = 0; req->rl_nchunks;) {
--req->rl_nchunks;
- i += rpcrdma_deregister_external(
- &req->rl_segments[i], r_xprt);
+ i += r_xprt->rx_ia.ri_ops->ro_unmap(r_xprt,
+ &req->rl_segments[i]);
}

rpcrdma_buffer_put(req);
diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index 4318c04..b167c99 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -1510,7 +1510,7 @@ rpcrdma_buffer_put_sendbuf(struct rpcrdma_req *req, struct rpcrdma_buffer *buf)
}
}

-/* rpcrdma_unmap_one() was already done by rpcrdma_deregister_frmr_external().
+/* rpcrdma_unmap_one() was already done during deregistration.
* Redo only the ib_post_send().
*/
static void
@@ -1890,85 +1890,6 @@ rpcrdma_unmap_one(struct rpcrdma_ia *ia, struct rpcrdma_mr_seg *seg)
seg->mr_dma, seg->mr_dmalen, seg->mr_dir);
}

-static int
-rpcrdma_deregister_frmr_external(struct rpcrdma_mr_seg *seg,
- struct rpcrdma_ia *ia, struct rpcrdma_xprt *r_xprt)
-{
- struct rpcrdma_mr_seg *seg1 = seg;
- struct ib_send_wr invalidate_wr, *bad_wr;
- int rc;
-
- seg1->rl_mw->r.frmr.fr_state = FRMR_IS_INVALID;
-
- memset(&invalidate_wr, 0, sizeof invalidate_wr);
- invalidate_wr.wr_id = (unsigned long)(void *)seg1->rl_mw;
- invalidate_wr.opcode = IB_WR_LOCAL_INV;
- invalidate_wr.ex.invalidate_rkey = seg1->rl_mw->r.frmr.fr_mr->rkey;
- DECR_CQCOUNT(&r_xprt->rx_ep);
-
- read_lock(&ia->ri_qplock);
- while (seg1->mr_nsegs--)
- rpcrdma_unmap_one(ia, seg++);
- rc = ib_post_send(ia->ri_id->qp, &invalidate_wr, &bad_wr);
- read_unlock(&ia->ri_qplock);
- if (rc) {
- /* Force rpcrdma_buffer_get() to retry */
- seg1->rl_mw->r.frmr.fr_state = FRMR_IS_STALE;
- dprintk("RPC: %s: failed ib_post_send for invalidate,"
- " status %i\n", __func__, rc);
- }
- return rc;
-}
-
-static int
-rpcrdma_deregister_fmr_external(struct rpcrdma_mr_seg *seg,
- struct rpcrdma_ia *ia)
-{
- struct rpcrdma_mr_seg *seg1 = seg;
- LIST_HEAD(l);
- int rc;
-
- list_add(&seg1->rl_mw->r.fmr->list, &l);
- rc = ib_unmap_fmr(&l);
- read_lock(&ia->ri_qplock);
- while (seg1->mr_nsegs--)
- rpcrdma_unmap_one(ia, seg++);
- read_unlock(&ia->ri_qplock);
- if (rc)
- dprintk("RPC: %s: failed ib_unmap_fmr,"
- " status %i\n", __func__, rc);
- return rc;
-}
-
-int
-rpcrdma_deregister_external(struct rpcrdma_mr_seg *seg,
- struct rpcrdma_xprt *r_xprt)
-{
- struct rpcrdma_ia *ia = &r_xprt->rx_ia;
- int nsegs = seg->mr_nsegs, rc;
-
- switch (ia->ri_memreg_strategy) {
-
- case RPCRDMA_ALLPHYSICAL:
- read_lock(&ia->ri_qplock);
- rpcrdma_unmap_one(ia, seg);
- read_unlock(&ia->ri_qplock);
- break;
-
- case RPCRDMA_FRMR:
- rc = rpcrdma_deregister_frmr_external(seg, ia, r_xprt);
- break;
-
- case RPCRDMA_MTHCAFMR:
- rc = rpcrdma_deregister_fmr_external(seg, ia);
- break;
-
- default:
- break;
- }
- return nsegs;
-}
-
/*
* Prepost any receive buffer, then post send.
*
diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index 7bf077b..9a727f9 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -338,6 +338,8 @@ struct rpcrdma_xprt;
struct rpcrdma_memreg_ops {
int (*ro_map)(struct rpcrdma_xprt *,
struct rpcrdma_mr_seg *, int, bool);
+ int (*ro_unmap)(struct rpcrdma_xprt *,
+ struct rpcrdma_mr_seg *);
size_t (*ro_maxpages)(struct rpcrdma_xprt *);
const char *ro_displayname;
};
@@ -405,9 +407,6 @@ void rpcrdma_buffer_put(struct rpcrdma_req *);
void rpcrdma_recv_buffer_get(struct rpcrdma_req *);
void rpcrdma_recv_buffer_put(struct rpcrdma_rep *);

-int rpcrdma_deregister_external(struct rpcrdma_mr_seg *,
- struct rpcrdma_xprt *);
-
struct rpcrdma_regbuf *rpcrdma_alloc_regbuf(struct rpcrdma_ia *,
size_t, gfp_t);
void rpcrdma_free_regbuf(struct rpcrdma_ia *,


2015-03-30 18:35:01

by Chuck Lever III

[permalink] [raw]
Subject: [PATCH v3 10/15] xprtrdma: Add "init MRs" memreg op

This method is used when setting up a new transport instance to
create a pool of Memory Region objects that will be used to register
memory during operation.

Memory Regions are not needed for "physical" registration, since
->prepare and ->release are no-ops for that mode.

Signed-off-by: Chuck Lever <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
Tested-by: Devesh Sharma <[email protected]>
Tested-by: Meghana Cheripady <[email protected]>
Tested-by: Veeresh U. Kokatnur <[email protected]>
---
net/sunrpc/xprtrdma/fmr_ops.c | 42 +++++++++++++++
net/sunrpc/xprtrdma/frwr_ops.c | 66 +++++++++++++++++++++++
net/sunrpc/xprtrdma/physical_ops.c | 7 ++
net/sunrpc/xprtrdma/verbs.c | 104 +-----------------------------------
net/sunrpc/xprtrdma/xprt_rdma.h | 1
5 files changed, 119 insertions(+), 101 deletions(-)

diff --git a/net/sunrpc/xprtrdma/fmr_ops.c b/net/sunrpc/xprtrdma/fmr_ops.c
index 888aa10..825ce96 100644
--- a/net/sunrpc/xprtrdma/fmr_ops.c
+++ b/net/sunrpc/xprtrdma/fmr_ops.c
@@ -29,6 +29,47 @@ fmr_op_maxpages(struct rpcrdma_xprt *r_xprt)
rpcrdma_max_segments(r_xprt) * RPCRDMA_MAX_FMR_SGES);
}

+static int
+fmr_op_init(struct rpcrdma_xprt *r_xprt)
+{
+ struct rpcrdma_buffer *buf = &r_xprt->rx_buf;
+ int mr_access_flags = IB_ACCESS_REMOTE_WRITE | IB_ACCESS_REMOTE_READ;
+ struct ib_fmr_attr fmr_attr = {
+ .max_pages = RPCRDMA_MAX_FMR_SGES,
+ .max_maps = 1,
+ .page_shift = PAGE_SHIFT
+ };
+ struct ib_pd *pd = r_xprt->rx_ia.ri_pd;
+ struct rpcrdma_mw *r;
+ int i, rc;
+
+ INIT_LIST_HEAD(&buf->rb_mws);
+ INIT_LIST_HEAD(&buf->rb_all);
+
+ i = (buf->rb_max_requests + 1) * RPCRDMA_MAX_SEGS;
+ dprintk("RPC: %s: initalizing %d FMRs\n", __func__, i);
+
+ while (i--) {
+ r = kzalloc(sizeof(*r), GFP_KERNEL);
+ if (!r)
+ return -ENOMEM;
+
+ r->r.fmr = ib_alloc_fmr(pd, mr_access_flags, &fmr_attr);
+ if (IS_ERR(r->r.fmr))
+ goto out_fmr_err;
+
+ list_add(&r->mw_list, &buf->rb_mws);
+ list_add(&r->mw_all, &buf->rb_all);
+ }
+ return 0;
+
+out_fmr_err:
+ rc = PTR_ERR(r->r.fmr);
+ dprintk("RPC: %s: ib_alloc_fmr status %i\n", __func__, rc);
+ kfree(r);
+ return rc;
+}
+
/* Use the ib_map_phys_fmr() verb to register a memory region
* for remote access via RDMA READ or RDMA WRITE.
*/
@@ -109,5 +150,6 @@ const struct rpcrdma_memreg_ops rpcrdma_fmr_memreg_ops = {
.ro_map = fmr_op_map,
.ro_unmap = fmr_op_unmap,
.ro_maxpages = fmr_op_maxpages,
+ .ro_init = fmr_op_init,
.ro_displayname = "fmr",
};
diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c
index 35b725b..9168c15 100644
--- a/net/sunrpc/xprtrdma/frwr_ops.c
+++ b/net/sunrpc/xprtrdma/frwr_ops.c
@@ -17,6 +17,35 @@
# define RPCDBG_FACILITY RPCDBG_TRANS
#endif

+static int
+__frwr_init(struct rpcrdma_mw *r, struct ib_pd *pd, struct ib_device *device,
+ unsigned int depth)
+{
+ struct rpcrdma_frmr *f = &r->r.frmr;
+ int rc;
+
+ f->fr_mr = ib_alloc_fast_reg_mr(pd, depth);
+ if (IS_ERR(f->fr_mr))
+ goto out_mr_err;
+ f->fr_pgl = ib_alloc_fast_reg_page_list(device, depth);
+ if (IS_ERR(f->fr_pgl))
+ goto out_list_err;
+ return 0;
+
+out_mr_err:
+ rc = PTR_ERR(f->fr_mr);
+ dprintk("RPC: %s: ib_alloc_fast_reg_mr status %i\n",
+ __func__, rc);
+ return rc;
+
+out_list_err:
+ rc = PTR_ERR(f->fr_pgl);
+ dprintk("RPC: %s: ib_alloc_fast_reg_page_list status %i\n",
+ __func__, rc);
+ ib_dereg_mr(f->fr_mr);
+ return rc;
+}
+
/* FRWR mode conveys a list of pages per chunk segment. The
* maximum length of that list is the FRWR page list depth.
*/
@@ -29,6 +58,42 @@ frwr_op_maxpages(struct rpcrdma_xprt *r_xprt)
rpcrdma_max_segments(r_xprt) * ia->ri_max_frmr_depth);
}

+static int
+frwr_op_init(struct rpcrdma_xprt *r_xprt)
+{
+ struct rpcrdma_buffer *buf = &r_xprt->rx_buf;
+ struct ib_device *device = r_xprt->rx_ia.ri_id->device;
+ unsigned int depth = r_xprt->rx_ia.ri_max_frmr_depth;
+ struct ib_pd *pd = r_xprt->rx_ia.ri_pd;
+ int i;
+
+ INIT_LIST_HEAD(&buf->rb_mws);
+ INIT_LIST_HEAD(&buf->rb_all);
+
+ i = (buf->rb_max_requests + 1) * RPCRDMA_MAX_SEGS;
+ dprintk("RPC: %s: initalizing %d FRMRs\n", __func__, i);
+
+ while (i--) {
+ struct rpcrdma_mw *r;
+ int rc;
+
+ r = kzalloc(sizeof(*r), GFP_KERNEL);
+ if (!r)
+ return -ENOMEM;
+
+ rc = __frwr_init(r, pd, device, depth);
+ if (rc) {
+ kfree(r);
+ return rc;
+ }
+
+ list_add(&r->mw_list, &buf->rb_mws);
+ list_add(&r->mw_all, &buf->rb_all);
+ }
+
+ return 0;
+}
+
/* Post a FAST_REG Work Request to register a memory region
* for remote access via RDMA READ or RDMA WRITE.
*/
@@ -149,5 +214,6 @@ const struct rpcrdma_memreg_ops rpcrdma_frwr_memreg_ops = {
.ro_map = frwr_op_map,
.ro_unmap = frwr_op_unmap,
.ro_maxpages = frwr_op_maxpages,
+ .ro_init = frwr_op_init,
.ro_displayname = "frwr",
};
diff --git a/net/sunrpc/xprtrdma/physical_ops.c b/net/sunrpc/xprtrdma/physical_ops.c
index 5b5a63a..c372051 100644
--- a/net/sunrpc/xprtrdma/physical_ops.c
+++ b/net/sunrpc/xprtrdma/physical_ops.c
@@ -28,6 +28,12 @@ physical_op_maxpages(struct rpcrdma_xprt *r_xprt)
rpcrdma_max_segments(r_xprt));
}

+static int
+physical_op_init(struct rpcrdma_xprt *r_xprt)
+{
+ return 0;
+}
+
/* The client's physical memory is already exposed for
* remote access via RDMA READ or RDMA WRITE.
*/
@@ -57,5 +63,6 @@ const struct rpcrdma_memreg_ops rpcrdma_physical_memreg_ops = {
.ro_map = physical_op_map,
.ro_unmap = physical_op_unmap,
.ro_maxpages = physical_op_maxpages,
+ .ro_init = physical_op_init,
.ro_displayname = "physical",
};
diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index b167c99..e89a57d 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -1124,91 +1124,6 @@ out:
return ERR_PTR(rc);
}

-static int
-rpcrdma_init_fmrs(struct rpcrdma_ia *ia, struct rpcrdma_buffer *buf)
-{
- int mr_access_flags = IB_ACCESS_REMOTE_WRITE | IB_ACCESS_REMOTE_READ;
- struct ib_fmr_attr fmr_attr = {
- .max_pages = RPCRDMA_MAX_DATA_SEGS,
- .max_maps = 1,
- .page_shift = PAGE_SHIFT
- };
- struct rpcrdma_mw *r;
- int i, rc;
-
- i = (buf->rb_max_requests + 1) * RPCRDMA_MAX_SEGS;
- dprintk("RPC: %s: initalizing %d FMRs\n", __func__, i);
-
- while (i--) {
- r = kzalloc(sizeof(*r), GFP_KERNEL);
- if (r == NULL)
- return -ENOMEM;
-
- r->r.fmr = ib_alloc_fmr(ia->ri_pd, mr_access_flags, &fmr_attr);
- if (IS_ERR(r->r.fmr)) {
- rc = PTR_ERR(r->r.fmr);
- dprintk("RPC: %s: ib_alloc_fmr failed %i\n",
- __func__, rc);
- goto out_free;
- }
-
- list_add(&r->mw_list, &buf->rb_mws);
- list_add(&r->mw_all, &buf->rb_all);
- }
- return 0;
-
-out_free:
- kfree(r);
- return rc;
-}
-
-static int
-rpcrdma_init_frmrs(struct rpcrdma_ia *ia, struct rpcrdma_buffer *buf)
-{
- struct rpcrdma_frmr *f;
- struct rpcrdma_mw *r;
- int i, rc;
-
- i = (buf->rb_max_requests + 1) * RPCRDMA_MAX_SEGS;
- dprintk("RPC: %s: initalizing %d FRMRs\n", __func__, i);
-
- while (i--) {
- r = kzalloc(sizeof(*r), GFP_KERNEL);
- if (r == NULL)
- return -ENOMEM;
- f = &r->r.frmr;
-
- f->fr_mr = ib_alloc_fast_reg_mr(ia->ri_pd,
- ia->ri_max_frmr_depth);
- if (IS_ERR(f->fr_mr)) {
- rc = PTR_ERR(f->fr_mr);
- dprintk("RPC: %s: ib_alloc_fast_reg_mr "
- "failed %i\n", __func__, rc);
- goto out_free;
- }
-
- f->fr_pgl = ib_alloc_fast_reg_page_list(ia->ri_id->device,
- ia->ri_max_frmr_depth);
- if (IS_ERR(f->fr_pgl)) {
- rc = PTR_ERR(f->fr_pgl);
- dprintk("RPC: %s: ib_alloc_fast_reg_page_list "
- "failed %i\n", __func__, rc);
-
- ib_dereg_mr(f->fr_mr);
- goto out_free;
- }
-
- list_add(&r->mw_list, &buf->rb_mws);
- list_add(&r->mw_all, &buf->rb_all);
- }
-
- return 0;
-
-out_free:
- kfree(r);
- return rc;
-}
-
int
rpcrdma_buffer_create(struct rpcrdma_xprt *r_xprt)
{
@@ -1245,22 +1160,9 @@ rpcrdma_buffer_create(struct rpcrdma_xprt *r_xprt)
buf->rb_recv_bufs = (struct rpcrdma_rep **) p;
p = (char *) &buf->rb_recv_bufs[buf->rb_max_requests];

- INIT_LIST_HEAD(&buf->rb_mws);
- INIT_LIST_HEAD(&buf->rb_all);
- switch (ia->ri_memreg_strategy) {
- case RPCRDMA_FRMR:
- rc = rpcrdma_init_frmrs(ia, buf);
- if (rc)
- goto out;
- break;
- case RPCRDMA_MTHCAFMR:
- rc = rpcrdma_init_fmrs(ia, buf);
- if (rc)
- goto out;
- break;
- default:
- break;
- }
+ rc = ia->ri_ops->ro_init(r_xprt);
+ if (rc)
+ goto out;

for (i = 0; i < buf->rb_max_requests; i++) {
struct rpcrdma_req *req;
diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index 9a727f9..90b60fe 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -341,6 +341,7 @@ struct rpcrdma_memreg_ops {
int (*ro_unmap)(struct rpcrdma_xprt *,
struct rpcrdma_mr_seg *);
size_t (*ro_maxpages)(struct rpcrdma_xprt *);
+ int (*ro_init)(struct rpcrdma_xprt *);
const char *ro_displayname;
};



2015-03-30 18:35:10

by Chuck Lever III

[permalink] [raw]
Subject: [PATCH v3 11/15] xprtrdma: Add "reset MRs" memreg op

This method is invoked when a transport instance is about to be
reconnected. Each Memory Region object is reset to its initial
state.

Signed-off-by: Chuck Lever <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
Tested-by: Devesh Sharma <[email protected]>
Tested-by: Meghana Cheripady <[email protected]>
Tested-by: Veeresh U. Kokatnur <[email protected]>
---
net/sunrpc/xprtrdma/fmr_ops.c | 23 ++++++++
net/sunrpc/xprtrdma/frwr_ops.c | 51 ++++++++++++++++++
net/sunrpc/xprtrdma/physical_ops.c | 6 ++
net/sunrpc/xprtrdma/verbs.c | 103 +-----------------------------------
net/sunrpc/xprtrdma/xprt_rdma.h | 1
5 files changed, 83 insertions(+), 101 deletions(-)

diff --git a/net/sunrpc/xprtrdma/fmr_ops.c b/net/sunrpc/xprtrdma/fmr_ops.c
index 825ce96..93261b0 100644
--- a/net/sunrpc/xprtrdma/fmr_ops.c
+++ b/net/sunrpc/xprtrdma/fmr_ops.c
@@ -146,10 +146,33 @@ out_err:
return nsegs;
}

+/* After a disconnect, unmap all FMRs.
+ *
+ * This is invoked only in the transport connect worker in order
+ * to serialize with rpcrdma_register_fmr_external().
+ */
+static void
+fmr_op_reset(struct rpcrdma_xprt *r_xprt)
+{
+ struct rpcrdma_buffer *buf = &r_xprt->rx_buf;
+ struct rpcrdma_mw *r;
+ LIST_HEAD(list);
+ int rc;
+
+ list_for_each_entry(r, &buf->rb_all, mw_all)
+ list_add(&r->r.fmr->list, &list);
+
+ rc = ib_unmap_fmr(&list);
+ if (rc)
+ dprintk("RPC: %s: ib_unmap_fmr failed %i\n",
+ __func__, rc);
+}
+
const struct rpcrdma_memreg_ops rpcrdma_fmr_memreg_ops = {
.ro_map = fmr_op_map,
.ro_unmap = fmr_op_unmap,
.ro_maxpages = fmr_op_maxpages,
.ro_init = fmr_op_init,
+ .ro_reset = fmr_op_reset,
.ro_displayname = "fmr",
};
diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c
index 9168c15..c2bb29d 100644
--- a/net/sunrpc/xprtrdma/frwr_ops.c
+++ b/net/sunrpc/xprtrdma/frwr_ops.c
@@ -46,6 +46,18 @@ out_list_err:
return rc;
}

+static void
+__frwr_release(struct rpcrdma_mw *r)
+{
+ int rc;
+
+ rc = ib_dereg_mr(r->r.frmr.fr_mr);
+ if (rc)
+ dprintk("RPC: %s: ib_dereg_mr status %i\n",
+ __func__, rc);
+ ib_free_fast_reg_page_list(r->r.frmr.fr_pgl);
+}
+
/* FRWR mode conveys a list of pages per chunk segment. The
* maximum length of that list is the FRWR page list depth.
*/
@@ -210,10 +222,49 @@ out_err:
return nsegs;
}

+/* After a disconnect, a flushed FAST_REG_MR can leave an FRMR in
+ * an unusable state. Find FRMRs in this state and dereg / reg
+ * each. FRMRs that are VALID and attached to an rpcrdma_req are
+ * also torn down.
+ *
+ * This gives all in-use FRMRs a fresh rkey and leaves them INVALID.
+ *
+ * This is invoked only in the transport connect worker in order
+ * to serialize with rpcrdma_register_frmr_external().
+ */
+static void
+frwr_op_reset(struct rpcrdma_xprt *r_xprt)
+{
+ struct rpcrdma_buffer *buf = &r_xprt->rx_buf;
+ struct ib_device *device = r_xprt->rx_ia.ri_id->device;
+ unsigned int depth = r_xprt->rx_ia.ri_max_frmr_depth;
+ struct ib_pd *pd = r_xprt->rx_ia.ri_pd;
+ struct rpcrdma_mw *r;
+ int rc;
+
+ list_for_each_entry(r, &buf->rb_all, mw_all) {
+ if (r->r.frmr.fr_state == FRMR_IS_INVALID)
+ continue;
+
+ __frwr_release(r);
+ rc = __frwr_init(r, pd, device, depth);
+ if (rc) {
+ dprintk("RPC: %s: mw %p left %s\n",
+ __func__, r,
+ (r->r.frmr.fr_state == FRMR_IS_STALE ?
+ "stale" : "valid"));
+ continue;
+ }
+
+ r->r.frmr.fr_state = FRMR_IS_INVALID;
+ }
+}
+
const struct rpcrdma_memreg_ops rpcrdma_frwr_memreg_ops = {
.ro_map = frwr_op_map,
.ro_unmap = frwr_op_unmap,
.ro_maxpages = frwr_op_maxpages,
.ro_init = frwr_op_init,
+ .ro_reset = frwr_op_reset,
.ro_displayname = "frwr",
};
diff --git a/net/sunrpc/xprtrdma/physical_ops.c b/net/sunrpc/xprtrdma/physical_ops.c
index c372051..e060713 100644
--- a/net/sunrpc/xprtrdma/physical_ops.c
+++ b/net/sunrpc/xprtrdma/physical_ops.c
@@ -59,10 +59,16 @@ physical_op_unmap(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg)
return 1;
}

+static void
+physical_op_reset(struct rpcrdma_xprt *r_xprt)
+{
+}
+
const struct rpcrdma_memreg_ops rpcrdma_physical_memreg_ops = {
.ro_map = physical_op_map,
.ro_unmap = physical_op_unmap,
.ro_maxpages = physical_op_maxpages,
.ro_init = physical_op_init,
+ .ro_reset = physical_op_reset,
.ro_displayname = "physical",
};
diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index e89a57d..1b2c1f4 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -63,9 +63,6 @@
# define RPCDBG_FACILITY RPCDBG_TRANS
#endif

-static void rpcrdma_reset_frmrs(struct rpcrdma_ia *);
-static void rpcrdma_reset_fmrs(struct rpcrdma_ia *);
-
/*
* internal functions
*/
@@ -945,21 +942,9 @@ retry:
rpcrdma_ep_disconnect(ep, ia);
rpcrdma_flush_cqs(ep);

- switch (ia->ri_memreg_strategy) {
- case RPCRDMA_FRMR:
- rpcrdma_reset_frmrs(ia);
- break;
- case RPCRDMA_MTHCAFMR:
- rpcrdma_reset_fmrs(ia);
- break;
- case RPCRDMA_ALLPHYSICAL:
- break;
- default:
- rc = -EIO;
- goto out;
- }
-
xprt = container_of(ia, struct rpcrdma_xprt, rx_ia);
+ ia->ri_ops->ro_reset(xprt);
+
id = rpcrdma_create_id(xprt, ia,
(struct sockaddr *)&xprt->rx_data.addr);
if (IS_ERR(id)) {
@@ -1289,90 +1274,6 @@ rpcrdma_buffer_destroy(struct rpcrdma_buffer *buf)
kfree(buf->rb_pool);
}

-/* After a disconnect, unmap all FMRs.
- *
- * This is invoked only in the transport connect worker in order
- * to serialize with rpcrdma_register_fmr_external().
- */
-static void
-rpcrdma_reset_fmrs(struct rpcrdma_ia *ia)
-{
- struct rpcrdma_xprt *r_xprt =
- container_of(ia, struct rpcrdma_xprt, rx_ia);
- struct rpcrdma_buffer *buf = &r_xprt->rx_buf;
- struct list_head *pos;
- struct rpcrdma_mw *r;
- LIST_HEAD(l);
- int rc;
-
- list_for_each(pos, &buf->rb_all) {
- r = list_entry(pos, struct rpcrdma_mw, mw_all);
-
- INIT_LIST_HEAD(&l);
- list_add(&r->r.fmr->list, &l);
- rc = ib_unmap_fmr(&l);
- if (rc)
- dprintk("RPC: %s: ib_unmap_fmr failed %i\n",
- __func__, rc);
- }
-}
-
-/* After a disconnect, a flushed FAST_REG_MR can leave an FRMR in
- * an unusable state. Find FRMRs in this state and dereg / reg
- * each. FRMRs that are VALID and attached to an rpcrdma_req are
- * also torn down.
- *
- * This gives all in-use FRMRs a fresh rkey and leaves them INVALID.
- *
- * This is invoked only in the transport connect worker in order
- * to serialize with rpcrdma_register_frmr_external().
- */
-static void
-rpcrdma_reset_frmrs(struct rpcrdma_ia *ia)
-{
- struct rpcrdma_xprt *r_xprt =
- container_of(ia, struct rpcrdma_xprt, rx_ia);
- struct rpcrdma_buffer *buf = &r_xprt->rx_buf;
- struct list_head *pos;
- struct rpcrdma_mw *r;
- int rc;
-
- list_for_each(pos, &buf->rb_all) {
- r = list_entry(pos, struct rpcrdma_mw, mw_all);
-
- if (r->r.frmr.fr_state == FRMR_IS_INVALID)
- continue;
-
- rc = ib_dereg_mr(r->r.frmr.fr_mr);
- if (rc)
- dprintk("RPC: %s: ib_dereg_mr failed %i\n",
- __func__, rc);
- ib_free_fast_reg_page_list(r->r.frmr.fr_pgl);
-
- r->r.frmr.fr_mr = ib_alloc_fast_reg_mr(ia->ri_pd,
- ia->ri_max_frmr_depth);
- if (IS_ERR(r->r.frmr.fr_mr)) {
- rc = PTR_ERR(r->r.frmr.fr_mr);
- dprintk("RPC: %s: ib_alloc_fast_reg_mr"
- " failed %i\n", __func__, rc);
- continue;
- }
- r->r.frmr.fr_pgl = ib_alloc_fast_reg_page_list(
- ia->ri_id->device,
- ia->ri_max_frmr_depth);
- if (IS_ERR(r->r.frmr.fr_pgl)) {
- rc = PTR_ERR(r->r.frmr.fr_pgl);
- dprintk("RPC: %s: "
- "ib_alloc_fast_reg_page_list "
- "failed %i\n", __func__, rc);
-
- ib_dereg_mr(r->r.frmr.fr_mr);
- continue;
- }
- r->r.frmr.fr_state = FRMR_IS_INVALID;
- }
-}
-
/* "*mw" can be NULL when rpcrdma_buffer_get_mrs() fails, leaving
* some req segments uninitialized.
*/
diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index 90b60fe..0680239 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -342,6 +342,7 @@ struct rpcrdma_memreg_ops {
struct rpcrdma_mr_seg *);
size_t (*ro_maxpages)(struct rpcrdma_xprt *);
int (*ro_init)(struct rpcrdma_xprt *);
+ void (*ro_reset)(struct rpcrdma_xprt *);
const char *ro_displayname;
};



2015-03-30 18:35:19

by Chuck Lever III

[permalink] [raw]
Subject: [PATCH v3 12/15] xprtrdma: Add "destroy MRs" memreg op

Memory Region objects associated with a transport instance are
destroyed before the instance is shutdown and destroyed.

Signed-off-by: Chuck Lever <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
Tested-by: Devesh Sharma <[email protected]>
Tested-by: Meghana Cheripady <[email protected]>
Tested-by: Veeresh U. Kokatnur <[email protected]>
---
net/sunrpc/xprtrdma/fmr_ops.c | 18 ++++++++++++
net/sunrpc/xprtrdma/frwr_ops.c | 14 ++++++++++
net/sunrpc/xprtrdma/physical_ops.c | 6 ++++
net/sunrpc/xprtrdma/verbs.c | 52 +-----------------------------------
net/sunrpc/xprtrdma/xprt_rdma.h | 1 +
5 files changed, 40 insertions(+), 51 deletions(-)

diff --git a/net/sunrpc/xprtrdma/fmr_ops.c b/net/sunrpc/xprtrdma/fmr_ops.c
index 93261b0..e9ca594 100644
--- a/net/sunrpc/xprtrdma/fmr_ops.c
+++ b/net/sunrpc/xprtrdma/fmr_ops.c
@@ -168,11 +168,29 @@ fmr_op_reset(struct rpcrdma_xprt *r_xprt)
__func__, rc);
}

+static void
+fmr_op_destroy(struct rpcrdma_buffer *buf)
+{
+ struct rpcrdma_mw *r;
+ int rc;
+
+ while (!list_empty(&buf->rb_all)) {
+ r = list_entry(buf->rb_all.next, struct rpcrdma_mw, mw_all);
+ list_del(&r->mw_all);
+ rc = ib_dealloc_fmr(r->r.fmr);
+ if (rc)
+ dprintk("RPC: %s: ib_dealloc_fmr failed %i\n",
+ __func__, rc);
+ kfree(r);
+ }
+}
+
const struct rpcrdma_memreg_ops rpcrdma_fmr_memreg_ops = {
.ro_map = fmr_op_map,
.ro_unmap = fmr_op_unmap,
.ro_maxpages = fmr_op_maxpages,
.ro_init = fmr_op_init,
.ro_reset = fmr_op_reset,
+ .ro_destroy = fmr_op_destroy,
.ro_displayname = "fmr",
};
diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c
index c2bb29d..121e400 100644
--- a/net/sunrpc/xprtrdma/frwr_ops.c
+++ b/net/sunrpc/xprtrdma/frwr_ops.c
@@ -260,11 +260,25 @@ frwr_op_reset(struct rpcrdma_xprt *r_xprt)
}
}

+static void
+frwr_op_destroy(struct rpcrdma_buffer *buf)
+{
+ struct rpcrdma_mw *r;
+
+ while (!list_empty(&buf->rb_all)) {
+ r = list_entry(buf->rb_all.next, struct rpcrdma_mw, mw_all);
+ list_del(&r->mw_all);
+ __frwr_release(r);
+ kfree(r);
+ }
+}
+
const struct rpcrdma_memreg_ops rpcrdma_frwr_memreg_ops = {
.ro_map = frwr_op_map,
.ro_unmap = frwr_op_unmap,
.ro_maxpages = frwr_op_maxpages,
.ro_init = frwr_op_init,
.ro_reset = frwr_op_reset,
+ .ro_destroy = frwr_op_destroy,
.ro_displayname = "frwr",
};
diff --git a/net/sunrpc/xprtrdma/physical_ops.c b/net/sunrpc/xprtrdma/physical_ops.c
index e060713..eb39011 100644
--- a/net/sunrpc/xprtrdma/physical_ops.c
+++ b/net/sunrpc/xprtrdma/physical_ops.c
@@ -64,11 +64,17 @@ physical_op_reset(struct rpcrdma_xprt *r_xprt)
{
}

+static void
+physical_op_destroy(struct rpcrdma_buffer *buf)
+{
+}
+
const struct rpcrdma_memreg_ops rpcrdma_physical_memreg_ops = {
.ro_map = physical_op_map,
.ro_unmap = physical_op_unmap,
.ro_maxpages = physical_op_maxpages,
.ro_init = physical_op_init,
.ro_reset = physical_op_reset,
+ .ro_destroy = physical_op_destroy,
.ro_displayname = "physical",
};
diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index 1b2c1f4..a7fb314 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -1199,47 +1199,6 @@ rpcrdma_destroy_req(struct rpcrdma_ia *ia, struct rpcrdma_req *req)
kfree(req);
}

-static void
-rpcrdma_destroy_fmrs(struct rpcrdma_buffer *buf)
-{
- struct rpcrdma_mw *r;
- int rc;
-
- while (!list_empty(&buf->rb_all)) {
- r = list_entry(buf->rb_all.next, struct rpcrdma_mw, mw_all);
- list_del(&r->mw_all);
- list_del(&r->mw_list);
-
- rc = ib_dealloc_fmr(r->r.fmr);
- if (rc)
- dprintk("RPC: %s: ib_dealloc_fmr failed %i\n",
- __func__, rc);
-
- kfree(r);
- }
-}
-
-static void
-rpcrdma_destroy_frmrs(struct rpcrdma_buffer *buf)
-{
- struct rpcrdma_mw *r;
- int rc;
-
- while (!list_empty(&buf->rb_all)) {
- r = list_entry(buf->rb_all.next, struct rpcrdma_mw, mw_all);
- list_del(&r->mw_all);
- list_del(&r->mw_list);
-
- rc = ib_dereg_mr(r->r.frmr.fr_mr);
- if (rc)
- dprintk("RPC: %s: ib_dereg_mr failed %i\n",
- __func__, rc);
- ib_free_fast_reg_page_list(r->r.frmr.fr_pgl);
-
- kfree(r);
- }
-}
-
void
rpcrdma_buffer_destroy(struct rpcrdma_buffer *buf)
{
@@ -1260,16 +1219,7 @@ rpcrdma_buffer_destroy(struct rpcrdma_buffer *buf)
rpcrdma_destroy_req(ia, buf->rb_send_bufs[i]);
}

- switch (ia->ri_memreg_strategy) {
- case RPCRDMA_FRMR:
- rpcrdma_destroy_frmrs(buf);
- break;
- case RPCRDMA_MTHCAFMR:
- rpcrdma_destroy_fmrs(buf);
- break;
- default:
- break;
- }
+ ia->ri_ops->ro_destroy(buf);

kfree(buf->rb_pool);
}
diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index 0680239..b95e223 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -343,6 +343,7 @@ struct rpcrdma_memreg_ops {
size_t (*ro_maxpages)(struct rpcrdma_xprt *);
int (*ro_init)(struct rpcrdma_xprt *);
void (*ro_reset)(struct rpcrdma_xprt *);
+ void (*ro_destroy)(struct rpcrdma_buffer *);
const char *ro_displayname;
};



2015-03-30 18:35:29

by Chuck Lever III

[permalink] [raw]
Subject: [PATCH v3 13/15] xprtrdma: Add "open" memreg op

The open op determines the size of various transport data structures
based on device capabilities and memory registration mode.

Signed-off-by: Chuck Lever <[email protected]>
Tested-by: Devesh Sharma <[email protected]>
Tested-by: Meghana Cheripady <[email protected]>
Tested-by: Veeresh U. Kokatnur <[email protected]>
---
net/sunrpc/xprtrdma/fmr_ops.c | 8 ++++++
net/sunrpc/xprtrdma/frwr_ops.c | 48 +++++++++++++++++++++++++++++++++++
net/sunrpc/xprtrdma/physical_ops.c | 8 ++++++
net/sunrpc/xprtrdma/verbs.c | 49 ++----------------------------------
net/sunrpc/xprtrdma/xprt_rdma.h | 3 ++
5 files changed, 70 insertions(+), 46 deletions(-)

diff --git a/net/sunrpc/xprtrdma/fmr_ops.c b/net/sunrpc/xprtrdma/fmr_ops.c
index e9ca594..e8a9837 100644
--- a/net/sunrpc/xprtrdma/fmr_ops.c
+++ b/net/sunrpc/xprtrdma/fmr_ops.c
@@ -20,6 +20,13 @@
/* Maximum scatter/gather per FMR */
#define RPCRDMA_MAX_FMR_SGES (64)

+static int
+fmr_op_open(struct rpcrdma_ia *ia, struct rpcrdma_ep *ep,
+ struct rpcrdma_create_data_internal *cdata)
+{
+ return 0;
+}
+
/* FMR mode conveys up to 64 pages of payload per chunk segment.
*/
static size_t
@@ -188,6 +195,7 @@ fmr_op_destroy(struct rpcrdma_buffer *buf)
const struct rpcrdma_memreg_ops rpcrdma_fmr_memreg_ops = {
.ro_map = fmr_op_map,
.ro_unmap = fmr_op_unmap,
+ .ro_open = fmr_op_open,
.ro_maxpages = fmr_op_maxpages,
.ro_init = fmr_op_init,
.ro_reset = fmr_op_reset,
diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c
index 121e400..e17d54d 100644
--- a/net/sunrpc/xprtrdma/frwr_ops.c
+++ b/net/sunrpc/xprtrdma/frwr_ops.c
@@ -58,6 +58,53 @@ __frwr_release(struct rpcrdma_mw *r)
ib_free_fast_reg_page_list(r->r.frmr.fr_pgl);
}

+static int
+frwr_op_open(struct rpcrdma_ia *ia, struct rpcrdma_ep *ep,
+ struct rpcrdma_create_data_internal *cdata)
+{
+ struct ib_device_attr *devattr = &ia->ri_devattr;
+ int depth, delta;
+
+ ia->ri_max_frmr_depth =
+ min_t(unsigned int, RPCRDMA_MAX_DATA_SEGS,
+ devattr->max_fast_reg_page_list_len);
+ dprintk("RPC: %s: device's max FR page list len = %u\n",
+ __func__, ia->ri_max_frmr_depth);
+
+ /* Add room for frmr register and invalidate WRs.
+ * 1. FRMR reg WR for head
+ * 2. FRMR invalidate WR for head
+ * 3. N FRMR reg WRs for pagelist
+ * 4. N FRMR invalidate WRs for pagelist
+ * 5. FRMR reg WR for tail
+ * 6. FRMR invalidate WR for tail
+ * 7. The RDMA_SEND WR
+ */
+ depth = 7;
+
+ /* Calculate N if the device max FRMR depth is smaller than
+ * RPCRDMA_MAX_DATA_SEGS.
+ */
+ if (ia->ri_max_frmr_depth < RPCRDMA_MAX_DATA_SEGS) {
+ delta = RPCRDMA_MAX_DATA_SEGS - ia->ri_max_frmr_depth;
+ do {
+ depth += 2; /* FRMR reg + invalidate */
+ delta -= ia->ri_max_frmr_depth;
+ } while (delta > 0);
+ }
+
+ ep->rep_attr.cap.max_send_wr *= depth;
+ if (ep->rep_attr.cap.max_send_wr > devattr->max_qp_wr) {
+ cdata->max_requests = devattr->max_qp_wr / depth;
+ if (!cdata->max_requests)
+ return -EINVAL;
+ ep->rep_attr.cap.max_send_wr = cdata->max_requests *
+ depth;
+ }
+
+ return 0;
+}
+
/* FRWR mode conveys a list of pages per chunk segment. The
* maximum length of that list is the FRWR page list depth.
*/
@@ -276,6 +323,7 @@ frwr_op_destroy(struct rpcrdma_buffer *buf)
const struct rpcrdma_memreg_ops rpcrdma_frwr_memreg_ops = {
.ro_map = frwr_op_map,
.ro_unmap = frwr_op_unmap,
+ .ro_open = frwr_op_open,
.ro_maxpages = frwr_op_maxpages,
.ro_init = frwr_op_init,
.ro_reset = frwr_op_reset,
diff --git a/net/sunrpc/xprtrdma/physical_ops.c b/net/sunrpc/xprtrdma/physical_ops.c
index eb39011..0ba130b 100644
--- a/net/sunrpc/xprtrdma/physical_ops.c
+++ b/net/sunrpc/xprtrdma/physical_ops.c
@@ -19,6 +19,13 @@
# define RPCDBG_FACILITY RPCDBG_TRANS
#endif

+static int
+physical_op_open(struct rpcrdma_ia *ia, struct rpcrdma_ep *ep,
+ struct rpcrdma_create_data_internal *cdata)
+{
+ return 0;
+}
+
/* PHYSICAL memory registration conveys one page per chunk segment.
*/
static size_t
@@ -72,6 +79,7 @@ physical_op_destroy(struct rpcrdma_buffer *buf)
const struct rpcrdma_memreg_ops rpcrdma_physical_memreg_ops = {
.ro_map = physical_op_map,
.ro_unmap = physical_op_unmap,
+ .ro_open = physical_op_open,
.ro_maxpages = physical_op_maxpages,
.ro_init = physical_op_init,
.ro_reset = physical_op_reset,
diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index a7fb314..b697b3e 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -622,11 +622,6 @@ rpcrdma_ia_open(struct rpcrdma_xprt *xprt, struct sockaddr *addr, int memreg)
dprintk("RPC: %s: FRMR registration "
"not supported by HCA\n", __func__);
memreg = RPCRDMA_MTHCAFMR;
- } else {
- /* Mind the ia limit on FRMR page list depth */
- ia->ri_max_frmr_depth = min_t(unsigned int,
- RPCRDMA_MAX_DATA_SEGS,
- devattr->max_fast_reg_page_list_len);
}
}
if (memreg == RPCRDMA_MTHCAFMR) {
@@ -741,49 +736,11 @@ rpcrdma_ep_create(struct rpcrdma_ep *ep, struct rpcrdma_ia *ia,

ep->rep_attr.event_handler = rpcrdma_qp_async_error_upcall;
ep->rep_attr.qp_context = ep;
- /* send_cq and recv_cq initialized below */
ep->rep_attr.srq = NULL;
ep->rep_attr.cap.max_send_wr = cdata->max_requests;
- switch (ia->ri_memreg_strategy) {
- case RPCRDMA_FRMR: {
- int depth = 7;
-
- /* Add room for frmr register and invalidate WRs.
- * 1. FRMR reg WR for head
- * 2. FRMR invalidate WR for head
- * 3. N FRMR reg WRs for pagelist
- * 4. N FRMR invalidate WRs for pagelist
- * 5. FRMR reg WR for tail
- * 6. FRMR invalidate WR for tail
- * 7. The RDMA_SEND WR
- */
-
- /* Calculate N if the device max FRMR depth is smaller than
- * RPCRDMA_MAX_DATA_SEGS.
- */
- if (ia->ri_max_frmr_depth < RPCRDMA_MAX_DATA_SEGS) {
- int delta = RPCRDMA_MAX_DATA_SEGS -
- ia->ri_max_frmr_depth;
-
- do {
- depth += 2; /* FRMR reg + invalidate */
- delta -= ia->ri_max_frmr_depth;
- } while (delta > 0);
-
- }
- ep->rep_attr.cap.max_send_wr *= depth;
- if (ep->rep_attr.cap.max_send_wr > devattr->max_qp_wr) {
- cdata->max_requests = devattr->max_qp_wr / depth;
- if (!cdata->max_requests)
- return -EINVAL;
- ep->rep_attr.cap.max_send_wr = cdata->max_requests *
- depth;
- }
- break;
- }
- default:
- break;
- }
+ rc = ia->ri_ops->ro_open(ia, ep, cdata);
+ if (rc)
+ return rc;
ep->rep_attr.cap.max_recv_wr = cdata->max_requests;
ep->rep_attr.cap.max_send_sge = (cdata->padding ? 4 : 2);
ep->rep_attr.cap.max_recv_sge = 1;
diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index b95e223..9036fb4 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -340,6 +340,9 @@ struct rpcrdma_memreg_ops {
struct rpcrdma_mr_seg *, int, bool);
int (*ro_unmap)(struct rpcrdma_xprt *,
struct rpcrdma_mr_seg *);
+ int (*ro_open)(struct rpcrdma_ia *,
+ struct rpcrdma_ep *,
+ struct rpcrdma_create_data_internal *);
size_t (*ro_maxpages)(struct rpcrdma_xprt *);
int (*ro_init)(struct rpcrdma_xprt *);
void (*ro_reset)(struct rpcrdma_xprt *);


2015-03-30 18:35:47

by Chuck Lever III

[permalink] [raw]
Subject: [PATCH v3 15/15] xprtrdma: Make rpcrdma_{un}map_one() into inline functions

These functions are called in a loop for each page transferred via
RDMA READ or WRITE. Extract loop invariants and inline them to
reduce CPU overhead.

Signed-off-by: Chuck Lever <[email protected]>
Tested-by: Devesh Sharma <[email protected]>
Tested-by: Meghana Cheripady <[email protected]>
Tested-by: Veeresh U. Kokatnur <[email protected]>
---
net/sunrpc/xprtrdma/fmr_ops.c | 10 ++++++--
net/sunrpc/xprtrdma/frwr_ops.c | 10 ++++++--
net/sunrpc/xprtrdma/physical_ops.c | 10 ++++++--
net/sunrpc/xprtrdma/verbs.c | 44 ++++++-----------------------------
net/sunrpc/xprtrdma/xprt_rdma.h | 45 ++++++++++++++++++++++++++++++++++--
5 files changed, 73 insertions(+), 46 deletions(-)

diff --git a/net/sunrpc/xprtrdma/fmr_ops.c b/net/sunrpc/xprtrdma/fmr_ops.c
index e8a9837..a91ba2c 100644
--- a/net/sunrpc/xprtrdma/fmr_ops.c
+++ b/net/sunrpc/xprtrdma/fmr_ops.c
@@ -85,6 +85,8 @@ fmr_op_map(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg,
int nsegs, bool writing)
{
struct rpcrdma_ia *ia = &r_xprt->rx_ia;
+ struct ib_device *device = ia->ri_id->device;
+ enum dma_data_direction direction = rpcrdma_data_dir(writing);
struct rpcrdma_mr_seg *seg1 = seg;
struct rpcrdma_mw *mw = seg1->rl_mw;
u64 physaddrs[RPCRDMA_MAX_DATA_SEGS];
@@ -97,7 +99,7 @@ fmr_op_map(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg,
if (nsegs > RPCRDMA_MAX_FMR_SGES)
nsegs = RPCRDMA_MAX_FMR_SGES;
for (i = 0; i < nsegs;) {
- rpcrdma_map_one(ia, seg, writing);
+ rpcrdma_map_one(device, seg, direction);
physaddrs[i] = seg->mr_dma;
len += seg->mr_len;
++seg;
@@ -123,7 +125,7 @@ out_maperr:
__func__, len, (unsigned long long)seg1->mr_dma,
pageoff, i, rc);
while (i--)
- rpcrdma_unmap_one(ia, --seg);
+ rpcrdma_unmap_one(device, --seg);
return rc;
}

@@ -135,14 +137,16 @@ fmr_op_unmap(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg)
{
struct rpcrdma_ia *ia = &r_xprt->rx_ia;
struct rpcrdma_mr_seg *seg1 = seg;
+ struct ib_device *device;
int rc, nsegs = seg->mr_nsegs;
LIST_HEAD(l);

list_add(&seg1->rl_mw->r.fmr->list, &l);
rc = ib_unmap_fmr(&l);
read_lock(&ia->ri_qplock);
+ device = ia->ri_id->device;
while (seg1->mr_nsegs--)
- rpcrdma_unmap_one(ia, seg++);
+ rpcrdma_unmap_one(device, seg++);
read_unlock(&ia->ri_qplock);
if (rc)
goto out_err;
diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c
index ea59c1b..0a7b9df 100644
--- a/net/sunrpc/xprtrdma/frwr_ops.c
+++ b/net/sunrpc/xprtrdma/frwr_ops.c
@@ -178,6 +178,8 @@ frwr_op_map(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg,
int nsegs, bool writing)
{
struct rpcrdma_ia *ia = &r_xprt->rx_ia;
+ struct ib_device *device = ia->ri_id->device;
+ enum dma_data_direction direction = rpcrdma_data_dir(writing);
struct rpcrdma_mr_seg *seg1 = seg;
struct rpcrdma_mw *mw = seg1->rl_mw;
struct rpcrdma_frmr *frmr = &mw->r.frmr;
@@ -197,7 +199,7 @@ frwr_op_map(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg,
if (nsegs > ia->ri_max_frmr_depth)
nsegs = ia->ri_max_frmr_depth;
for (page_no = i = 0; i < nsegs;) {
- rpcrdma_map_one(ia, seg, writing);
+ rpcrdma_map_one(device, seg, direction);
pa = seg->mr_dma;
for (seg_len = seg->mr_len; seg_len > 0; seg_len -= PAGE_SIZE) {
frmr->fr_pgl->page_list[page_no++] = pa;
@@ -247,7 +249,7 @@ out_senderr:
ib_update_fast_reg_key(mr, --key);
frmr->fr_state = FRMR_IS_INVALID;
while (i--)
- rpcrdma_unmap_one(ia, --seg);
+ rpcrdma_unmap_one(device, --seg);
return rc;
}

@@ -261,6 +263,7 @@ frwr_op_unmap(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg)
struct rpcrdma_ia *ia = &r_xprt->rx_ia;
struct ib_send_wr invalidate_wr, *bad_wr;
int rc, nsegs = seg->mr_nsegs;
+ struct ib_device *device;

seg1->rl_mw->r.frmr.fr_state = FRMR_IS_INVALID;

@@ -271,8 +274,9 @@ frwr_op_unmap(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg)
DECR_CQCOUNT(&r_xprt->rx_ep);

read_lock(&ia->ri_qplock);
+ device = ia->ri_id->device;
while (seg1->mr_nsegs--)
- rpcrdma_unmap_one(ia, seg++);
+ rpcrdma_unmap_one(device, seg++);
rc = ib_post_send(ia->ri_id->qp, &invalidate_wr, &bad_wr);
read_unlock(&ia->ri_qplock);
if (rc)
diff --git a/net/sunrpc/xprtrdma/physical_ops.c b/net/sunrpc/xprtrdma/physical_ops.c
index 0ba130b..ba518af 100644
--- a/net/sunrpc/xprtrdma/physical_ops.c
+++ b/net/sunrpc/xprtrdma/physical_ops.c
@@ -50,7 +50,8 @@ physical_op_map(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg,
{
struct rpcrdma_ia *ia = &r_xprt->rx_ia;

- rpcrdma_map_one(ia, seg, writing);
+ rpcrdma_map_one(ia->ri_id->device, seg,
+ rpcrdma_data_dir(writing));
seg->mr_rkey = ia->ri_bind_mem->rkey;
seg->mr_base = seg->mr_dma;
seg->mr_nsegs = 1;
@@ -62,7 +63,12 @@ physical_op_map(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg,
static int
physical_op_unmap(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg)
{
- rpcrdma_unmap_one(&r_xprt->rx_ia, seg);
+ struct rpcrdma_ia *ia = &r_xprt->rx_ia;
+
+ read_lock(&ia->ri_qplock);
+ rpcrdma_unmap_one(ia->ri_id->device, seg);
+ read_unlock(&ia->ri_qplock);
+
return 1;
}

diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index cac06f2..4870d27 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -1436,6 +1436,14 @@ rpcrdma_recv_buffer_put(struct rpcrdma_rep *rep)
* Wrappers for internal-use kmalloc memory registration, used by buffer code.
*/

+void
+rpcrdma_mapping_error(struct rpcrdma_mr_seg *seg)
+{
+ dprintk("RPC: map_one: offset %p iova %llx len %zu\n",
+ seg->mr_offset,
+ (unsigned long long)seg->mr_dma, seg->mr_dmalen);
+}
+
static int
rpcrdma_register_internal(struct rpcrdma_ia *ia, void *va, int len,
struct ib_mr **mrp, struct ib_sge *iov)
@@ -1561,42 +1569,6 @@ rpcrdma_free_regbuf(struct rpcrdma_ia *ia, struct rpcrdma_regbuf *rb)
}

/*
- * Wrappers for chunk registration, shared by read/write chunk code.
- */
-
-void
-rpcrdma_map_one(struct rpcrdma_ia *ia, struct rpcrdma_mr_seg *seg, bool writing)
-{
- seg->mr_dir = writing ? DMA_FROM_DEVICE : DMA_TO_DEVICE;
- seg->mr_dmalen = seg->mr_len;
- if (seg->mr_page)
- seg->mr_dma = ib_dma_map_page(ia->ri_id->device,
- seg->mr_page, offset_in_page(seg->mr_offset),
- seg->mr_dmalen, seg->mr_dir);
- else
- seg->mr_dma = ib_dma_map_single(ia->ri_id->device,
- seg->mr_offset,
- seg->mr_dmalen, seg->mr_dir);
- if (ib_dma_mapping_error(ia->ri_id->device, seg->mr_dma)) {
- dprintk("RPC: %s: mr_dma %llx mr_offset %p mr_dma_len %zu\n",
- __func__,
- (unsigned long long)seg->mr_dma,
- seg->mr_offset, seg->mr_dmalen);
- }
-}
-
-void
-rpcrdma_unmap_one(struct rpcrdma_ia *ia, struct rpcrdma_mr_seg *seg)
-{
- if (seg->mr_page)
- ib_dma_unmap_page(ia->ri_id->device,
- seg->mr_dma, seg->mr_dmalen, seg->mr_dir);
- else
- ib_dma_unmap_single(ia->ri_id->device,
- seg->mr_dma, seg->mr_dmalen, seg->mr_dir);
-}
-
-/*
* Prepost any receive buffer, then post send.
*
* Receive buffer is donated to hardware, reclaimed upon recv completion.
diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index 54bcbe4..78e0b8b 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -424,8 +424,49 @@ void rpcrdma_free_regbuf(struct rpcrdma_ia *,
struct rpcrdma_regbuf *);

unsigned int rpcrdma_max_segments(struct rpcrdma_xprt *);
-void rpcrdma_map_one(struct rpcrdma_ia *, struct rpcrdma_mr_seg *, bool);
-void rpcrdma_unmap_one(struct rpcrdma_ia *, struct rpcrdma_mr_seg *);
+
+/*
+ * Wrappers for chunk registration, shared by read/write chunk code.
+ */
+
+void rpcrdma_mapping_error(struct rpcrdma_mr_seg *);
+
+static inline enum dma_data_direction
+rpcrdma_data_dir(bool writing)
+{
+ return writing ? DMA_FROM_DEVICE : DMA_TO_DEVICE;
+}
+
+static inline void
+rpcrdma_map_one(struct ib_device *device, struct rpcrdma_mr_seg *seg,
+ enum dma_data_direction direction)
+{
+ seg->mr_dir = direction;
+ seg->mr_dmalen = seg->mr_len;
+
+ if (seg->mr_page)
+ seg->mr_dma = ib_dma_map_page(device,
+ seg->mr_page, offset_in_page(seg->mr_offset),
+ seg->mr_dmalen, seg->mr_dir);
+ else
+ seg->mr_dma = ib_dma_map_single(device,
+ seg->mr_offset,
+ seg->mr_dmalen, seg->mr_dir);
+
+ if (ib_dma_mapping_error(device, seg->mr_dma))
+ rpcrdma_mapping_error(seg);
+}
+
+static inline void
+rpcrdma_unmap_one(struct ib_device *device, struct rpcrdma_mr_seg *seg)
+{
+ if (seg->mr_page)
+ ib_dma_unmap_page(device,
+ seg->mr_dma, seg->mr_dmalen, seg->mr_dir);
+ else
+ ib_dma_unmap_single(device,
+ seg->mr_dma, seg->mr_dmalen, seg->mr_dir);
+}

/*
* RPC/RDMA connection management calls - xprtrdma/rpc_rdma.c


2015-03-30 18:35:38

by Chuck Lever III

[permalink] [raw]
Subject: [PATCH v3 14/15] xprtrdma: Handle non-SEND completions via a callout

Allow each memory registration mode to plug in a callout that handles
the completion of a memory registration operation.

Signed-off-by: Chuck Lever <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
Tested-by: Devesh Sharma <[email protected]>
Tested-by: Meghana Cheripady <[email protected]>
Tested-by: Veeresh U. Kokatnur <[email protected]>
---
net/sunrpc/xprtrdma/frwr_ops.c | 17 +++++++++++++++++
net/sunrpc/xprtrdma/verbs.c | 16 ++++++----------
net/sunrpc/xprtrdma/xprt_rdma.h | 5 +++++
3 files changed, 28 insertions(+), 10 deletions(-)

diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c
index e17d54d..ea59c1b 100644
--- a/net/sunrpc/xprtrdma/frwr_ops.c
+++ b/net/sunrpc/xprtrdma/frwr_ops.c
@@ -117,6 +117,22 @@ frwr_op_maxpages(struct rpcrdma_xprt *r_xprt)
rpcrdma_max_segments(r_xprt) * ia->ri_max_frmr_depth);
}

+/* If FAST_REG or LOCAL_INV failed, indicate the frmr needs to be reset. */
+static void
+frwr_sendcompletion(struct ib_wc *wc)
+{
+ struct rpcrdma_mw *r;
+
+ if (likely(wc->status == IB_WC_SUCCESS))
+ return;
+
+ /* WARNING: Only wr_id and status are reliable at this point */
+ r = (struct rpcrdma_mw *)(unsigned long)wc->wr_id;
+ dprintk("RPC: %s: frmr %p (stale), status %d\n",
+ __func__, r, wc->status);
+ r->r.frmr.fr_state = FRMR_IS_STALE;
+}
+
static int
frwr_op_init(struct rpcrdma_xprt *r_xprt)
{
@@ -148,6 +164,7 @@ frwr_op_init(struct rpcrdma_xprt *r_xprt)

list_add(&r->mw_list, &buf->rb_mws);
list_add(&r->mw_all, &buf->rb_all);
+ r->mw_sendcompletion = frwr_sendcompletion;
}

return 0;
diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index b697b3e..cac06f2 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -186,7 +186,7 @@ static const char * const wc_status[] = {
"remote access error",
"remote operation error",
"transport retry counter exceeded",
- "RNR retrycounter exceeded",
+ "RNR retry counter exceeded",
"local RDD violation error",
"remove invalid RD request",
"operation aborted",
@@ -204,21 +204,17 @@ static const char * const wc_status[] = {
static void
rpcrdma_sendcq_process_wc(struct ib_wc *wc)
{
- if (likely(wc->status == IB_WC_SUCCESS))
- return;
-
/* WARNING: Only wr_id and status are reliable at this point */
- if (wc->wr_id == 0ULL) {
- if (wc->status != IB_WC_WR_FLUSH_ERR)
+ if (wc->wr_id == RPCRDMA_IGNORE_COMPLETION) {
+ if (wc->status != IB_WC_SUCCESS &&
+ wc->status != IB_WC_WR_FLUSH_ERR)
pr_err("RPC: %s: SEND: %s\n",
__func__, COMPLETION_MSG(wc->status));
} else {
struct rpcrdma_mw *r;

r = (struct rpcrdma_mw *)(unsigned long)wc->wr_id;
- r->r.frmr.fr_state = FRMR_IS_STALE;
- pr_err("RPC: %s: frmr %p (stale): %s\n",
- __func__, r, COMPLETION_MSG(wc->status));
+ r->mw_sendcompletion(wc);
}
}

@@ -1622,7 +1618,7 @@ rpcrdma_ep_post(struct rpcrdma_ia *ia,
}

send_wr.next = NULL;
- send_wr.wr_id = 0ULL; /* no send cookie */
+ send_wr.wr_id = RPCRDMA_IGNORE_COMPLETION;
send_wr.sg_list = req->rl_send_iov;
send_wr.num_sge = req->rl_niovs;
send_wr.opcode = IB_WR_SEND;
diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index 9036fb4..54bcbe4 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -106,6 +106,10 @@ struct rpcrdma_ep {
#define INIT_CQCOUNT(ep) atomic_set(&(ep)->rep_cqcount, (ep)->rep_cqinit)
#define DECR_CQCOUNT(ep) atomic_sub_return(1, &(ep)->rep_cqcount)

+/* Force completion handler to ignore the signal
+ */
+#define RPCRDMA_IGNORE_COMPLETION (0ULL)
+
/* Registered buffer -- registered kmalloc'd memory for RDMA SEND/RECV
*
* The below structure appears at the front of a large region of kmalloc'd
@@ -206,6 +210,7 @@ struct rpcrdma_mw {
struct ib_fmr *fmr;
struct rpcrdma_frmr frmr;
} r;
+ void (*mw_sendcompletion)(struct ib_wc *);
struct list_head mw_list;
struct list_head mw_all;
};


2015-03-31 14:26:44

by Anna Schumaker

[permalink] [raw]
Subject: Re: [PATCH v3 00/15] NFS/RDMA patches proposed for 4.1

On 03/30/2015 02:33 PM, Chuck Lever wrote:
> This is a series of client-side patches for NFS/RDMA. In preparation
> for increasing the transport credit limit and maximum rsize/wsize,
> I've re-factored the memory registration logic into separate files,
> invoked via a method API.
>
> The series is available in the nfs-rdma-for-4.1 topic branch at
>
> git://linux-nfs.org/projects/cel/cel-2.6.git

Thanks! I've applied these and pushed them out to my nfs-rdma git tree.

Anna

>
> Changes since v2:
> - Rebased on 4.0-rc6
> - One minor fix squashed into 01/15
> - Tested-by tags added
>
> Changes since v1:
> - Rebased on 4.0-rc5
> - Main optimizations postponed to 4.2
> - Addressed review comments from Anna, Sagi, and Devesh
>
> ---
>
> Chuck Lever (15):
> SUNRPC: Introduce missing well-known netids
> xprtrdma: Display IPv6 addresses and port numbers correctly
> xprtrdma: Perform a full marshal on retransmit
> xprtrdma: Byte-align FRWR registration
> xprtrdma: Prevent infinite loop in rpcrdma_ep_create()
> xprtrdma: Add vector of ops for each memory registration strategy
> xprtrdma: Add a "max_payload" op for each memreg mode
> xprtrdma: Add a "register_external" op for each memreg mode
> xprtrdma: Add a "deregister_external" op for each memreg mode
> xprtrdma: Add "init MRs" memreg op
> xprtrdma: Add "reset MRs" memreg op
> xprtrdma: Add "destroy MRs" memreg op
> xprtrdma: Add "open" memreg op
> xprtrdma: Handle non-SEND completions via a callout
> xprtrdma: Make rpcrdma_{un}map_one() into inline functions
>
>
> include/linux/sunrpc/msg_prot.h | 8
> include/linux/sunrpc/xprtrdma.h | 5
> net/sunrpc/xprtrdma/Makefile | 3
> net/sunrpc/xprtrdma/fmr_ops.c | 208 +++++++++++
> net/sunrpc/xprtrdma/frwr_ops.c | 353 ++++++++++++++++++
> net/sunrpc/xprtrdma/physical_ops.c | 94 +++++
> net/sunrpc/xprtrdma/rpc_rdma.c | 87 ++--
> net/sunrpc/xprtrdma/transport.c | 61 ++-
> net/sunrpc/xprtrdma/verbs.c | 699 +++---------------------------------
> net/sunrpc/xprtrdma/xprt_rdma.h | 90 ++++-
> 10 files changed, 882 insertions(+), 726 deletions(-)
> create mode 100644 net/sunrpc/xprtrdma/fmr_ops.c
> create mode 100644 net/sunrpc/xprtrdma/frwr_ops.c
> create mode 100644 net/sunrpc/xprtrdma/physical_ops.c
>
> --
> Chuck Lever
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>