2015-05-11 18:02:19

by Chuck Lever

Subject: [PATCH v2 00/16] NFS/RDMA patches proposed for 4.2

I'd like these patches to be considered for merging upstream. This
patch series includes:

- JIT allocation of rpcrdma_mw structures
- Break-up of rb_lock
- Reduction of how many rpcrdma_mw structs are needed per transport

These are prerequisites for increasing the RPC slot count and
r/wsize on RPC/RDMA transports, and provide scalability benefits
even on their own. And:

- A generic transport fault injector

This is useful to discover regressions in logic that handles
transport reconnection.

You can find these in my git repo in the "nfs-rdma-for-4.2" topic
branch. See:

git://git.linux-nfs.org/projects/cel/cel-2.6.git


Changes since v1:

- Rebased on 4.1-rc3
- Transport fault injector controlled from debugfs rather than /proc
- Transport fault injector works for all transport types
- bc_send() clean up suggested by Christoph Hellwig
- Added Reviewed-by: tags. Many thanks to reviewers!
- Addressed all review comments but one: Sagi's comment about
ri_device remains unresolved.

---

Chuck Lever (16):
SUNRPC: Transport fault injection
xprtrdma: Warn when there are orphaned IB objects
xprtrdma: Replace rpcrdma_rep::rr_buffer with rr_rxprt
xprtrdma: Remove rr_func
xprtrdma: Use ib_device pointer safely
xprtrdma: Introduce helpers for allocating MWs
xprtrdma: Acquire FMRs in rpcrdma_fmr_register_external()
xprtrdma: Introduce an FRMR recovery workqueue
xprtrdma: Acquire MRs in rpcrdma_register_external()
xprtrdma: Remove unused LOCAL_INV recovery logic
xprtrdma: Remove ->ro_reset
xprtrdma: Remove rpcrdma_ia::ri_memreg_strategy
xprtrdma: Split rb_lock
xprtrdma: Stack relief in fmr_op_map()
xprtrdma: Reduce per-transport MR allocation
SUNRPC: Clean up bc_send()


include/linux/sunrpc/bc_xprt.h | 1
include/linux/sunrpc/xprt.h | 19 +++
include/linux/sunrpc/xprtrdma.h | 3
net/sunrpc/Makefile | 2
net/sunrpc/bc_svc.c | 63 ---------
net/sunrpc/clnt.c | 1
net/sunrpc/debugfs.c | 77 +++++++++++
net/sunrpc/svc.c | 33 ++++-
net/sunrpc/xprt.c | 2
net/sunrpc/xprtrdma/fmr_ops.c | 120 +++++++++++------
net/sunrpc/xprtrdma/frwr_ops.c | 227 +++++++++++++++++++++++---------
net/sunrpc/xprtrdma/physical_ops.c | 14 --
net/sunrpc/xprtrdma/rpc_rdma.c | 8 -
net/sunrpc/xprtrdma/transport.c | 30 +++-
net/sunrpc/xprtrdma/verbs.c | 257 +++++++++---------------------------
net/sunrpc/xprtrdma/xprt_rdma.h | 38 ++++-
net/sunrpc/xprtsock.c | 10 +
17 files changed, 492 insertions(+), 413 deletions(-)
delete mode 100644 net/sunrpc/bc_svc.c

--
Chuck Lever


2015-05-11 18:02:29

by Chuck Lever

Subject: [PATCH v2 01/16] SUNRPC: Transport fault injection

It has been exceptionally useful to exercise the logic that handles
local immediate errors and RDMA connection loss. To enable
developers to test this regularly and repeatably, add logic to
simulate connection loss every so often.

Fault injection is disabled by default. It is enabled with

$ echo xxx | sudo tee /sys/kernel/debug/sunrpc/inject_fault/disconnect

where "xxx" is a large positive number of transport method calls
to allow between injected disconnects. A value of several thousand
usually allows reasonable forward progress while still causing many
connection drops.

These hooks are disabled when SUNRPC_DEBUG is turned off.
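
The countdown behaviour described above can be modelled in user space.
The following is only a sketch, assuming C11 atomics in place of the
kernel's atomic_t; struct fake_xprt, inject_interval and both functions
are hypothetical names for illustration, not part of the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names are kernel API. */
struct fake_xprt {
	atomic_int countdown;		/* calls left before the next fault */
	unsigned int disconnects;	/* number of faults injected so far */
};

/* Models the global knob set through debugfs; 0 disables injection. */
static unsigned int inject_interval;

static void fake_xprt_init(struct fake_xprt *xprt)
{
	assert(xprt != NULL);
	atomic_init(&xprt->countdown, (int)inject_interval);
	xprt->disconnects = 0;
}

/* Call at each hook point. Returns true when a fault was injected. */
static bool fake_inject_disconnect(struct fake_xprt *xprt)
{
	if (!inject_interval)
		return false;
	/* atomic_fetch_sub() returns the old value; old - 1 is the new one */
	if (atomic_fetch_sub(&xprt->countdown, 1) - 1 != 0)
		return false;
	/* Counter reached zero: re-arm it and simulate the disconnect */
	atomic_store(&xprt->countdown, (int)inject_interval);
	xprt->disconnects++;
	return true;
}
```

With inject_interval set to 3 before fake_xprt_init(), every third call
to fake_inject_disconnect() reports a fault. In this model a transport
initialized while the knob is still 0 starts counting down from 0, so
the interval should be set before the transport is created.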

Signed-off-by: Chuck Lever <[email protected]>
---
include/linux/sunrpc/xprt.h | 19 ++++++++++
net/sunrpc/clnt.c | 1 +
net/sunrpc/debugfs.c | 77 +++++++++++++++++++++++++++++++++++++++
net/sunrpc/xprt.c | 2 +
net/sunrpc/xprtrdma/transport.c | 13 ++++++-
net/sunrpc/xprtsock.c | 10 +++++
6 files changed, 121 insertions(+), 1 deletion(-)

diff --git a/include/linux/sunrpc/xprt.h b/include/linux/sunrpc/xprt.h
index 8b93ef5..178190a 100644
--- a/include/linux/sunrpc/xprt.h
+++ b/include/linux/sunrpc/xprt.h
@@ -133,6 +133,7 @@ struct rpc_xprt_ops {
void (*close)(struct rpc_xprt *xprt);
void (*destroy)(struct rpc_xprt *xprt);
void (*print_stats)(struct rpc_xprt *xprt, struct seq_file *seq);
+ void (*inject_disconnect)(struct rpc_xprt *xprt);
};

/*
@@ -241,6 +242,7 @@ struct rpc_xprt {
const char *address_strings[RPC_DISPLAY_MAX];
#if IS_ENABLED(CONFIG_SUNRPC_DEBUG)
struct dentry *debugfs; /* debugfs directory */
+ atomic_t inject_disconnect;
#endif
};

@@ -431,6 +433,23 @@ static inline int xprt_test_and_set_binding(struct rpc_xprt *xprt)
return test_and_set_bit(XPRT_BINDING, &xprt->state);
}

+#if IS_ENABLED(CONFIG_SUNRPC_DEBUG)
+extern unsigned int rpc_inject_disconnect;
+static inline void xprt_inject_disconnect(struct rpc_xprt *xprt)
+{
+ if (!rpc_inject_disconnect)
+ return;
+ if (atomic_dec_return(&xprt->inject_disconnect))
+ return;
+ atomic_set(&xprt->inject_disconnect, rpc_inject_disconnect);
+ xprt->ops->inject_disconnect(xprt);
+}
+#else
+static inline void xprt_inject_disconnect(struct rpc_xprt *xprt)
+{
+}
+#endif
+
#endif /* __KERNEL__*/

#endif /* _LINUX_SUNRPC_XPRT_H */
diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index e6ce151..db4efb6 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -1614,6 +1614,7 @@ call_allocate(struct rpc_task *task)
req->rq_callsize + req->rq_rcvsize);
if (req->rq_buffer != NULL)
return;
+ xprt_inject_disconnect(xprt);

dprintk("RPC: %5u rpc_buffer allocation failed\n", task->tk_pid);

diff --git a/net/sunrpc/debugfs.c b/net/sunrpc/debugfs.c
index 82962f7..7cc1b8a 100644
--- a/net/sunrpc/debugfs.c
+++ b/net/sunrpc/debugfs.c
@@ -10,9 +10,12 @@
#include "netns.h"

static struct dentry *topdir;
+static struct dentry *rpc_fault_dir;
static struct dentry *rpc_clnt_dir;
static struct dentry *rpc_xprt_dir;

+unsigned int rpc_inject_disconnect;
+
struct rpc_clnt_iter {
struct rpc_clnt *clnt;
loff_t pos;
@@ -257,6 +260,8 @@ rpc_xprt_debugfs_register(struct rpc_xprt *xprt)
debugfs_remove_recursive(xprt->debugfs);
xprt->debugfs = NULL;
}
+
+ atomic_set(&xprt->inject_disconnect, rpc_inject_disconnect);
}

void
@@ -266,11 +271,78 @@ rpc_xprt_debugfs_unregister(struct rpc_xprt *xprt)
xprt->debugfs = NULL;
}

+static int
+fault_open(struct inode *inode, struct file *filp)
+{
+ filp->private_data = kmalloc(128, GFP_KERNEL);
+ if (!filp->private_data)
+ return -ENOMEM;
+ return 0;
+}
+
+static int
+fault_release(struct inode *inode, struct file *filp)
+{
+ kfree(filp->private_data);
+ return 0;
+}
+
+static ssize_t
+fault_disconnect_read(struct file *filp, char __user *user_buf,
+ size_t len, loff_t *offset)
+{
+ char *buffer = (char *)filp->private_data;
+ size_t size;
+
+ size = sprintf(buffer, "%u\n", rpc_inject_disconnect);
+ return simple_read_from_buffer(user_buf, len, offset, buffer, size);
+}
+
+static ssize_t
+fault_disconnect_write(struct file *filp, const char __user *user_buf,
+ size_t len, loff_t *offset)
+{
+ char buffer[16];
+
+ len = min(len, sizeof(buffer) - 1);
+ if (copy_from_user(buffer, user_buf, len))
+ return -EFAULT;
+ buffer[len] = '\0';
+ if (kstrtouint(buffer, 10, &rpc_inject_disconnect))
+ return -EINVAL;
+ return len;
+}
+
+static const struct file_operations fault_disconnect_fops = {
+ .owner = THIS_MODULE,
+ .open = fault_open,
+ .read = fault_disconnect_read,
+ .write = fault_disconnect_write,
+ .release = fault_release,
+};
+
+static struct dentry *
+inject_fault_dir(struct dentry *topdir)
+{
+ struct dentry *faultdir;
+
+ faultdir = debugfs_create_dir("inject_fault", topdir);
+ if (!faultdir)
+ return NULL;
+
+ if (!debugfs_create_file("disconnect", S_IFREG | S_IRUSR, faultdir,
+ NULL, &fault_disconnect_fops))
+ return NULL;
+
+ return faultdir;
+}
+
void __exit
sunrpc_debugfs_exit(void)
{
debugfs_remove_recursive(topdir);
topdir = NULL;
+ rpc_fault_dir = NULL;
rpc_clnt_dir = NULL;
rpc_xprt_dir = NULL;
}
@@ -282,6 +354,10 @@ sunrpc_debugfs_init(void)
if (!topdir)
return;

+ rpc_fault_dir = inject_fault_dir(topdir);
+ if (!rpc_fault_dir)
+ goto out_remove;
+
rpc_clnt_dir = debugfs_create_dir("rpc_clnt", topdir);
if (!rpc_clnt_dir)
goto out_remove;
@@ -294,5 +370,6 @@ sunrpc_debugfs_init(void)
out_remove:
debugfs_remove_recursive(topdir);
topdir = NULL;
+ rpc_fault_dir = NULL;
rpc_clnt_dir = NULL;
}
diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index 1d4fe24..e1fb538 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -967,6 +967,7 @@ void xprt_transmit(struct rpc_task *task)
task->tk_status = status;
return;
}
+ xprt_inject_disconnect(xprt);

dprintk("RPC: %5u xmit complete\n", task->tk_pid);
task->tk_flags |= RPC_TASK_SENT;
@@ -1285,6 +1286,7 @@ void xprt_release(struct rpc_task *task)
spin_unlock_bh(&xprt->transport_lock);
if (req->rq_buffer)
xprt->ops->buf_free(req->rq_buffer);
+ xprt_inject_disconnect(xprt);
if (req->rq_cred != NULL)
put_rpccred(req->rq_cred);
task->tk_rqstp = NULL;
diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c
index 54f23b1..dfcd52e 100644
--- a/net/sunrpc/xprtrdma/transport.c
+++ b/net/sunrpc/xprtrdma/transport.c
@@ -246,6 +246,16 @@ xprt_rdma_connect_worker(struct work_struct *work)
xprt_clear_connecting(xprt);
}

+static void
+xprt_rdma_inject_disconnect(struct rpc_xprt *xprt)
+{
+ struct rpcrdma_xprt *r_xprt = container_of(xprt, struct rpcrdma_xprt,
+ rx_xprt);
+
+ pr_info("rpcrdma: injecting transport disconnect on xprt=%p\n", xprt);
+ rdma_disconnect(r_xprt->rx_ia.ri_id);
+}
+
/*
* xprt_rdma_destroy
*
@@ -700,7 +710,8 @@ static struct rpc_xprt_ops xprt_rdma_procs = {
.send_request = xprt_rdma_send_request,
.close = xprt_rdma_close,
.destroy = xprt_rdma_destroy,
- .print_stats = xprt_rdma_print_stats
+ .print_stats = xprt_rdma_print_stats,
+ .inject_disconnect = xprt_rdma_inject_disconnect
};

static struct xprt_class xprt_rdma = {
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index 66891e3..a0e7138 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -863,6 +863,13 @@ static void xs_close(struct rpc_xprt *xprt)
xprt_disconnect_done(xprt);
}

+static void xs_inject_disconnect(struct rpc_xprt *xprt)
+{
+ dprintk("RPC: injecting transport disconnect on xprt=%p\n",
+ xprt);
+ xprt_disconnect_done(xprt);
+}
+
static void xs_xprt_free(struct rpc_xprt *xprt)
{
xs_free_peer_addresses(xprt);
@@ -2482,6 +2489,7 @@ static struct rpc_xprt_ops xs_udp_ops = {
.close = xs_close,
.destroy = xs_destroy,
.print_stats = xs_udp_print_stats,
+ .inject_disconnect = xs_inject_disconnect,
};

static struct rpc_xprt_ops xs_tcp_ops = {
@@ -2498,6 +2506,7 @@ static struct rpc_xprt_ops xs_tcp_ops = {
.close = xs_tcp_shutdown,
.destroy = xs_destroy,
.print_stats = xs_tcp_print_stats,
+ .inject_disconnect = xs_inject_disconnect,
};

/*
@@ -2515,6 +2524,7 @@ static struct rpc_xprt_ops bc_tcp_ops = {
.close = bc_close,
.destroy = bc_destroy,
.print_stats = xs_tcp_print_stats,
+ .inject_disconnect = xs_inject_disconnect,
};

static int xs_init_anyaddr(const int family, struct sockaddr *sap)


2015-05-11 18:02:38

by Chuck Lever

Subject: [PATCH v2 02/16] xprtrdma: Warn when there are orphaned IB objects

WARN during transport destruction if ib_dealloc_pd() fails. This is
a sign that xprtrdma orphaned one or more RDMA API objects at some
point, which can pin lower layer kernel modules and cause shutdown
to hang.
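
The idea can be sketched in user space as follows. This is a model
only: struct fake_pd, its usecnt field and both functions are
hypothetical, and the -EBUSY return for a busy protection domain is an
assumption made for the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a protection domain. */
struct fake_pd {
	int usecnt;	/* objects (MRs, QPs, ...) still attached to the pd */
};

/* Assumed behaviour: refuse to free a pd that is still referenced. */
static int fake_dealloc_pd(struct fake_pd *pd)
{
	assert(pd->usecnt >= 0);
	if (pd->usecnt > 0)
		return -EBUSY;
	return 0;
}

/* Tear-down path. Returns 1 if a warning was emitted, otherwise 0. */
static int fake_ia_close(struct fake_pd *pd)
{
	int rc;

	if (!pd)
		return 0;
	rc = fake_dealloc_pd(pd);
	if (rc) {
		/* Make the leak loud instead of burying it in debug output */
		fprintf(stderr, "warning: pd still busy (%d), "
			"a resource was orphaned\n", rc);
		return 1;
	}
	return 0;
}
```

The design point is that a failed deallocation is reported
unconditionally rather than through a debug message that is usually
compiled out or disabled.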

Signed-off-by: Chuck Lever <[email protected]>
Reviewed-by: Steve Wise <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
Reviewed-by: Devesh Sharma <[email protected]>
---
net/sunrpc/xprtrdma/verbs.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index 4870d27..51900e6 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -702,17 +702,17 @@ rpcrdma_ia_close(struct rpcrdma_ia *ia)
dprintk("RPC: %s: ib_dereg_mr returned %i\n",
__func__, rc);
}
+
if (ia->ri_id != NULL && !IS_ERR(ia->ri_id)) {
if (ia->ri_id->qp)
rdma_destroy_qp(ia->ri_id);
rdma_destroy_id(ia->ri_id);
ia->ri_id = NULL;
}
- if (ia->ri_pd != NULL && !IS_ERR(ia->ri_pd)) {
- rc = ib_dealloc_pd(ia->ri_pd);
- dprintk("RPC: %s: ib_dealloc_pd returned %i\n",
- __func__, rc);
- }
+
+ /* If the pd is still busy, xprtrdma missed freeing a resource */
+ if (ia->ri_pd && !IS_ERR(ia->ri_pd))
+ WARN_ON(ib_dealloc_pd(ia->ri_pd));
}

/*


2015-05-11 18:02:48

by Chuck Lever

Subject: [PATCH v2 03/16] xprtrdma: Replace rpcrdma_rep::rr_buffer with rr_rxprt

Clean up: Instead of carrying a pointer to the buffer pool and
the rpc_xprt, carry a pointer to the controlling rpcrdma_xprt.
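
A minimal sketch of why one back-pointer is enough, assuming simplified
stand-in structures. The member names rx_xprt, rx_buf and rr_rxprt
follow the patch; the fake_* types and the two accessor helpers are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified, hypothetical stand-ins for the real structures. */
struct fake_rpc_xprt { unsigned int connect_cookie; };
struct fake_buffer   { int recv_index; };

struct fake_rdma_xprt {
	struct fake_rpc_xprt rx_xprt;	/* embedded generic transport */
	struct fake_buffer rx_buf;	/* embedded buffer pool */
};

struct fake_rep {
	struct fake_rdma_xprt *rr_rxprt;	/* the single back-pointer */
};

/* Both objects are reachable from the one controlling pointer. */
static struct fake_rpc_xprt *fake_rep_to_xprt(struct fake_rep *rep)
{
	assert(rep->rr_rxprt != NULL);
	return &rep->rr_rxprt->rx_xprt;
}

static struct fake_buffer *fake_rep_to_buffer(struct fake_rep *rep)
{
	assert(rep->rr_rxprt != NULL);
	return &rep->rr_rxprt->rx_buf;
}
```

Because both members are embedded in the parent, each lookup is the
parent pointer plus a constant offset, and the reply structure shrinks
by one pointer.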

Signed-off-by: Chuck Lever <[email protected]>
Reviewed-by: Steve Wise <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
---
net/sunrpc/xprtrdma/rpc_rdma.c | 4 ++--
net/sunrpc/xprtrdma/transport.c | 7 ++-----
net/sunrpc/xprtrdma/verbs.c | 8 +++++---
net/sunrpc/xprtrdma/xprt_rdma.h | 3 +--
4 files changed, 10 insertions(+), 12 deletions(-)

diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c
index 2c53ea9..98a3b95 100644
--- a/net/sunrpc/xprtrdma/rpc_rdma.c
+++ b/net/sunrpc/xprtrdma/rpc_rdma.c
@@ -732,8 +732,8 @@ rpcrdma_reply_handler(struct rpcrdma_rep *rep)
struct rpcrdma_msg *headerp;
struct rpcrdma_req *req;
struct rpc_rqst *rqst;
- struct rpc_xprt *xprt = rep->rr_xprt;
- struct rpcrdma_xprt *r_xprt = rpcx_to_rdmax(xprt);
+ struct rpcrdma_xprt *r_xprt = rep->rr_rxprt;
+ struct rpc_xprt *xprt = &r_xprt->rx_xprt;
__be32 *iptr;
int rdmalen, status;
unsigned long cwnd;
diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c
index dfcd52e..25f7a6e 100644
--- a/net/sunrpc/xprtrdma/transport.c
+++ b/net/sunrpc/xprtrdma/transport.c
@@ -627,12 +627,9 @@ xprt_rdma_send_request(struct rpc_task *task)

if (req->rl_reply == NULL) /* e.g. reconnection */
rpcrdma_recv_buffer_get(req);
-
- if (req->rl_reply) {
+ /* rpcrdma_recv_buffer_get may have set rl_reply, so check again */
+ if (req->rl_reply)
req->rl_reply->rr_func = rpcrdma_reply_handler;
- /* this need only be done once, but... */
- req->rl_reply->rr_xprt = xprt;
- }

/* Must suppress retransmit to maintain credits */
if (req->rl_connect_cookie == xprt->connect_cookie)
diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index 51900e6..c55bfbc 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -278,6 +278,7 @@ rpcrdma_recvcq_process_wc(struct ib_wc *wc, struct list_head *sched_list)
{
struct rpcrdma_rep *rep =
(struct rpcrdma_rep *)(unsigned long)wc->wr_id;
+ struct rpcrdma_ia *ia;

/* WARNING: Only wr_id and status are reliable at this point */
if (wc->status != IB_WC_SUCCESS)
@@ -290,8 +291,9 @@ rpcrdma_recvcq_process_wc(struct ib_wc *wc, struct list_head *sched_list)
dprintk("RPC: %s: rep %p opcode 'recv', length %u: success\n",
__func__, rep, wc->byte_len);

+ ia = &rep->rr_rxprt->rx_ia;
rep->rr_len = wc->byte_len;
- ib_dma_sync_single_for_cpu(rdmab_to_ia(rep->rr_buffer)->ri_id->device,
+ ib_dma_sync_single_for_cpu(ia->ri_id->device,
rdmab_addr(rep->rr_rdmabuf),
rep->rr_len, DMA_FROM_DEVICE);
prefetch(rdmab_to_msg(rep->rr_rdmabuf));
@@ -1053,7 +1055,7 @@ rpcrdma_create_rep(struct rpcrdma_xprt *r_xprt)
goto out_free;
}

- rep->rr_buffer = &r_xprt->rx_buf;
+ rep->rr_rxprt = r_xprt;
return rep;

out_free:
@@ -1423,7 +1425,7 @@ rpcrdma_recv_buffer_get(struct rpcrdma_req *req)
void
rpcrdma_recv_buffer_put(struct rpcrdma_rep *rep)
{
- struct rpcrdma_buffer *buffers = rep->rr_buffer;
+ struct rpcrdma_buffer *buffers = &rep->rr_rxprt->rx_buf;
unsigned long flags;

rep->rr_func = NULL;
diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index 78e0b8b..c3d57c0 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -173,8 +173,7 @@ struct rpcrdma_buffer;

struct rpcrdma_rep {
unsigned int rr_len;
- struct rpcrdma_buffer *rr_buffer;
- struct rpc_xprt *rr_xprt;
+ struct rpcrdma_xprt *rr_rxprt;
void (*rr_func)(struct rpcrdma_rep *);
struct list_head rr_list;
struct rpcrdma_regbuf *rr_rdmabuf;


2015-05-11 18:02:57

by Chuck Lever

Subject: [PATCH v2 04/16] xprtrdma: Remove rr_func

A posted rpcrdma_rep never has rr_func set to anything but
rpcrdma_reply_handler.

Signed-off-by: Chuck Lever <[email protected]>
---
net/sunrpc/xprtrdma/rpc_rdma.c | 1 -
net/sunrpc/xprtrdma/transport.c | 3 ---
net/sunrpc/xprtrdma/verbs.c | 10 +---------
net/sunrpc/xprtrdma/xprt_rdma.h | 1 -
4 files changed, 1 insertion(+), 14 deletions(-)

diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c
index 98a3b95..3f422ca 100644
--- a/net/sunrpc/xprtrdma/rpc_rdma.c
+++ b/net/sunrpc/xprtrdma/rpc_rdma.c
@@ -770,7 +770,6 @@ rpcrdma_reply_handler(struct rpcrdma_rep *rep)
rep->rr_len);
repost:
r_xprt->rx_stats.bad_reply_count++;
- rep->rr_func = rpcrdma_reply_handler;
if (rpcrdma_ep_post_recv(&r_xprt->rx_ia, &r_xprt->rx_ep, rep))
rpcrdma_recv_buffer_put(rep);

diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c
index 25f7a6e..7c12556 100644
--- a/net/sunrpc/xprtrdma/transport.c
+++ b/net/sunrpc/xprtrdma/transport.c
@@ -627,9 +627,6 @@ xprt_rdma_send_request(struct rpc_task *task)

if (req->rl_reply == NULL) /* e.g. reconnection */
rpcrdma_recv_buffer_get(req);
- /* rpcrdma_recv_buffer_get may have set rl_reply, so check again */
- if (req->rl_reply)
- req->rl_reply->rr_func = rpcrdma_reply_handler;

/* Must suppress retransmit to maintain credits */
if (req->rl_connect_cookie == xprt->connect_cookie)
diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index c55bfbc..8e0bd84 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -80,7 +80,6 @@ static void
rpcrdma_run_tasklet(unsigned long data)
{
struct rpcrdma_rep *rep;
- void (*func)(struct rpcrdma_rep *);
unsigned long flags;

data = data;
@@ -89,14 +88,9 @@ rpcrdma_run_tasklet(unsigned long data)
rep = list_entry(rpcrdma_tasklets_g.next,
struct rpcrdma_rep, rr_list);
list_del(&rep->rr_list);
- func = rep->rr_func;
- rep->rr_func = NULL;
spin_unlock_irqrestore(&rpcrdma_tk_lock_g, flags);

- if (func)
- func(rep);
- else
- rpcrdma_recv_buffer_put(rep);
+ rpcrdma_reply_handler(rep);

spin_lock_irqsave(&rpcrdma_tk_lock_g, flags);
}
@@ -1213,7 +1207,6 @@ rpcrdma_buffer_put_sendbuf(struct rpcrdma_req *req, struct rpcrdma_buffer *buf)
req->rl_niovs = 0;
if (req->rl_reply) {
buf->rb_recv_bufs[--buf->rb_recv_index] = req->rl_reply;
- req->rl_reply->rr_func = NULL;
req->rl_reply = NULL;
}
}
@@ -1428,7 +1421,6 @@ rpcrdma_recv_buffer_put(struct rpcrdma_rep *rep)
struct rpcrdma_buffer *buffers = &rep->rr_rxprt->rx_buf;
unsigned long flags;

- rep->rr_func = NULL;
spin_lock_irqsave(&buffers->rb_lock, flags);
buffers->rb_recv_bufs[--buffers->rb_recv_index] = rep;
spin_unlock_irqrestore(&buffers->rb_lock, flags);
diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index c3d57c0..230e7fe 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -174,7 +174,6 @@ struct rpcrdma_buffer;
struct rpcrdma_rep {
unsigned int rr_len;
struct rpcrdma_xprt *rr_rxprt;
- void (*rr_func)(struct rpcrdma_rep *);
struct list_head rr_list;
struct rpcrdma_regbuf *rr_rdmabuf;
};


2015-05-11 18:03:07

by Chuck Lever

Subject: [PATCH v2 05/16] xprtrdma: Use ib_device pointer safely

The connect worker can replace ri_id, but prevents ri_id->device
from changing during the lifetime of a transport instance. The old
ID is kept around until a new ID is created and the ->device is
confirmed to be the same.

Cache a copy of ri_id->device in rpcrdma_ia and in rpcrdma_rep.
The cached copy can be used safely in code that does not serialize
with the connect worker.

Other code can use it to save an extra address generation (one
pointer dereference instead of two).
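
The invariant can be illustrated with a small user-space model. The
field names ri_device and ri_id follow the patch; the fake_* types and
functions are hypothetical, and locking is deliberately left out of the
sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for ib_device, rdma_cm_id and rpcrdma_ia. */
struct fake_device { const char *name; };
struct fake_cm_id  { struct fake_device *device; };

struct fake_ia {
	struct fake_device *ri_device;	/* cached once, never changes */
	struct fake_cm_id *ri_id;	/* may be replaced on reconnect */
};

static void fake_ia_open(struct fake_ia *ia, struct fake_cm_id *id)
{
	assert(id != NULL && id->device != NULL);
	ia->ri_id = id;
	ia->ri_device = id->device;	/* take the cached copy here */
}

/* Returns 0 on success, -1 if the new ID sits on a different device. */
static int fake_ia_reconnect(struct fake_ia *ia, struct fake_cm_id *new_id)
{
	if (ia->ri_device != new_id->device)
		return -1;	/* keep the old ID; the device must not change */
	ia->ri_id = new_id;
	return 0;
}
```

Readers of ri_device never need to look at ri_id, so in this model they
are unaffected when the reconnect path swaps the ID.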

Signed-off-by: Chuck Lever <[email protected]>
Reviewed-by: Steve Wise <[email protected]>
---
net/sunrpc/xprtrdma/fmr_ops.c | 8 +----
net/sunrpc/xprtrdma/frwr_ops.c | 12 +++----
net/sunrpc/xprtrdma/physical_ops.c | 8 +----
net/sunrpc/xprtrdma/verbs.c | 61 +++++++++++++++++++-----------------
net/sunrpc/xprtrdma/xprt_rdma.h | 2 +
5 files changed, 43 insertions(+), 48 deletions(-)

diff --git a/net/sunrpc/xprtrdma/fmr_ops.c b/net/sunrpc/xprtrdma/fmr_ops.c
index 302d4eb..0a96155 100644
--- a/net/sunrpc/xprtrdma/fmr_ops.c
+++ b/net/sunrpc/xprtrdma/fmr_ops.c
@@ -85,7 +85,7 @@ fmr_op_map(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg,
int nsegs, bool writing)
{
struct rpcrdma_ia *ia = &r_xprt->rx_ia;
- struct ib_device *device = ia->ri_id->device;
+ struct ib_device *device = ia->ri_device;
enum dma_data_direction direction = rpcrdma_data_dir(writing);
struct rpcrdma_mr_seg *seg1 = seg;
struct rpcrdma_mw *mw = seg1->rl_mw;
@@ -137,17 +137,13 @@ fmr_op_unmap(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg)
{
struct rpcrdma_ia *ia = &r_xprt->rx_ia;
struct rpcrdma_mr_seg *seg1 = seg;
- struct ib_device *device;
int rc, nsegs = seg->mr_nsegs;
LIST_HEAD(l);

list_add(&seg1->rl_mw->r.fmr->list, &l);
rc = ib_unmap_fmr(&l);
- read_lock(&ia->ri_qplock);
- device = ia->ri_id->device;
while (seg1->mr_nsegs--)
- rpcrdma_unmap_one(device, seg++);
- read_unlock(&ia->ri_qplock);
+ rpcrdma_unmap_one(ia->ri_device, seg++);
if (rc)
goto out_err;
return nsegs;
diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c
index dff0481..66a85fa 100644
--- a/net/sunrpc/xprtrdma/frwr_ops.c
+++ b/net/sunrpc/xprtrdma/frwr_ops.c
@@ -137,7 +137,7 @@ static int
frwr_op_init(struct rpcrdma_xprt *r_xprt)
{
struct rpcrdma_buffer *buf = &r_xprt->rx_buf;
- struct ib_device *device = r_xprt->rx_ia.ri_id->device;
+ struct ib_device *device = r_xprt->rx_ia.ri_device;
unsigned int depth = r_xprt->rx_ia.ri_max_frmr_depth;
struct ib_pd *pd = r_xprt->rx_ia.ri_pd;
int i;
@@ -178,7 +178,7 @@ frwr_op_map(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg,
int nsegs, bool writing)
{
struct rpcrdma_ia *ia = &r_xprt->rx_ia;
- struct ib_device *device = ia->ri_id->device;
+ struct ib_device *device = ia->ri_device;
enum dma_data_direction direction = rpcrdma_data_dir(writing);
struct rpcrdma_mr_seg *seg1 = seg;
struct rpcrdma_mw *mw = seg1->rl_mw;
@@ -263,7 +263,6 @@ frwr_op_unmap(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg)
struct rpcrdma_ia *ia = &r_xprt->rx_ia;
struct ib_send_wr invalidate_wr, *bad_wr;
int rc, nsegs = seg->mr_nsegs;
- struct ib_device *device;

seg1->rl_mw->r.frmr.fr_state = FRMR_IS_INVALID;

@@ -273,10 +272,9 @@ frwr_op_unmap(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg)
invalidate_wr.ex.invalidate_rkey = seg1->rl_mw->r.frmr.fr_mr->rkey;
DECR_CQCOUNT(&r_xprt->rx_ep);

- read_lock(&ia->ri_qplock);
- device = ia->ri_id->device;
while (seg1->mr_nsegs--)
- rpcrdma_unmap_one(device, seg++);
+ rpcrdma_unmap_one(ia->ri_device, seg++);
+ read_lock(&ia->ri_qplock);
rc = ib_post_send(ia->ri_id->qp, &invalidate_wr, &bad_wr);
read_unlock(&ia->ri_qplock);
if (rc)
@@ -304,7 +302,7 @@ static void
frwr_op_reset(struct rpcrdma_xprt *r_xprt)
{
struct rpcrdma_buffer *buf = &r_xprt->rx_buf;
- struct ib_device *device = r_xprt->rx_ia.ri_id->device;
+ struct ib_device *device = r_xprt->rx_ia.ri_device;
unsigned int depth = r_xprt->rx_ia.ri_max_frmr_depth;
struct ib_pd *pd = r_xprt->rx_ia.ri_pd;
struct rpcrdma_mw *r;
diff --git a/net/sunrpc/xprtrdma/physical_ops.c b/net/sunrpc/xprtrdma/physical_ops.c
index ba518af..da149e8 100644
--- a/net/sunrpc/xprtrdma/physical_ops.c
+++ b/net/sunrpc/xprtrdma/physical_ops.c
@@ -50,8 +50,7 @@ physical_op_map(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg,
{
struct rpcrdma_ia *ia = &r_xprt->rx_ia;

- rpcrdma_map_one(ia->ri_id->device, seg,
- rpcrdma_data_dir(writing));
+ rpcrdma_map_one(ia->ri_device, seg, rpcrdma_data_dir(writing));
seg->mr_rkey = ia->ri_bind_mem->rkey;
seg->mr_base = seg->mr_dma;
seg->mr_nsegs = 1;
@@ -65,10 +64,7 @@ physical_op_unmap(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg)
{
struct rpcrdma_ia *ia = &r_xprt->rx_ia;

- read_lock(&ia->ri_qplock);
- rpcrdma_unmap_one(ia->ri_id->device, seg);
- read_unlock(&ia->ri_qplock);
-
+ rpcrdma_unmap_one(ia->ri_device, seg);
return 1;
}

diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index 8e0bd84..ddd5b36 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -272,7 +272,6 @@ rpcrdma_recvcq_process_wc(struct ib_wc *wc, struct list_head *sched_list)
{
struct rpcrdma_rep *rep =
(struct rpcrdma_rep *)(unsigned long)wc->wr_id;
- struct rpcrdma_ia *ia;

/* WARNING: Only wr_id and status are reliable at this point */
if (wc->status != IB_WC_SUCCESS)
@@ -285,9 +284,8 @@ rpcrdma_recvcq_process_wc(struct ib_wc *wc, struct list_head *sched_list)
dprintk("RPC: %s: rep %p opcode 'recv', length %u: success\n",
__func__, rep, wc->byte_len);

- ia = &rep->rr_rxprt->rx_ia;
rep->rr_len = wc->byte_len;
- ib_dma_sync_single_for_cpu(ia->ri_id->device,
+ ib_dma_sync_single_for_cpu(rep->rr_device,
rdmab_addr(rep->rr_rdmabuf),
rep->rr_len, DMA_FROM_DEVICE);
prefetch(rdmab_to_msg(rep->rr_rdmabuf));
@@ -483,7 +481,7 @@ connected:

pr_info("rpcrdma: connection to %pIS:%u on %s, memreg '%s', %d credits, %d responders%s\n",
sap, rpc_get_port(sap),
- ia->ri_id->device->name,
+ ia->ri_device->name,
ia->ri_ops->ro_displayname,
xprt->rx_buf.rb_max_requests,
ird, ird < 4 && ird < tird / 2 ? " (low!)" : "");
@@ -584,8 +582,9 @@ rpcrdma_ia_open(struct rpcrdma_xprt *xprt, struct sockaddr *addr, int memreg)
rc = PTR_ERR(ia->ri_id);
goto out1;
}
+ ia->ri_device = ia->ri_id->device;

- ia->ri_pd = ib_alloc_pd(ia->ri_id->device);
+ ia->ri_pd = ib_alloc_pd(ia->ri_device);
if (IS_ERR(ia->ri_pd)) {
rc = PTR_ERR(ia->ri_pd);
dprintk("RPC: %s: ib_alloc_pd() failed %i\n",
@@ -593,7 +592,7 @@ rpcrdma_ia_open(struct rpcrdma_xprt *xprt, struct sockaddr *addr, int memreg)
goto out2;
}

- rc = ib_query_device(ia->ri_id->device, devattr);
+ rc = ib_query_device(ia->ri_device, devattr);
if (rc) {
dprintk("RPC: %s: ib_query_device failed %d\n",
__func__, rc);
@@ -602,7 +601,7 @@ rpcrdma_ia_open(struct rpcrdma_xprt *xprt, struct sockaddr *addr, int memreg)

if (devattr->device_cap_flags & IB_DEVICE_LOCAL_DMA_LKEY) {
ia->ri_have_dma_lkey = 1;
- ia->ri_dma_lkey = ia->ri_id->device->local_dma_lkey;
+ ia->ri_dma_lkey = ia->ri_device->local_dma_lkey;
}

if (memreg == RPCRDMA_FRMR) {
@@ -617,7 +616,7 @@ rpcrdma_ia_open(struct rpcrdma_xprt *xprt, struct sockaddr *addr, int memreg)
}
}
if (memreg == RPCRDMA_MTHCAFMR) {
- if (!ia->ri_id->device->alloc_fmr) {
+ if (!ia->ri_device->alloc_fmr) {
dprintk("RPC: %s: MTHCAFMR registration "
"not supported by HCA\n", __func__);
memreg = RPCRDMA_ALLPHYSICAL;
@@ -767,9 +766,9 @@ rpcrdma_ep_create(struct rpcrdma_ep *ep, struct rpcrdma_ia *ia,
init_waitqueue_head(&ep->rep_connect_wait);
INIT_DELAYED_WORK(&ep->rep_connect_worker, rpcrdma_connect_worker);

- sendcq = ib_create_cq(ia->ri_id->device, rpcrdma_sendcq_upcall,
- rpcrdma_cq_async_error_upcall, ep,
- ep->rep_attr.cap.max_send_wr + 1, 0);
+ sendcq = ib_create_cq(ia->ri_device, rpcrdma_sendcq_upcall,
+ rpcrdma_cq_async_error_upcall, ep,
+ ep->rep_attr.cap.max_send_wr + 1, 0);
if (IS_ERR(sendcq)) {
rc = PTR_ERR(sendcq);
dprintk("RPC: %s: failed to create send CQ: %i\n",
@@ -784,9 +783,9 @@ rpcrdma_ep_create(struct rpcrdma_ep *ep, struct rpcrdma_ia *ia,
goto out2;
}

- recvcq = ib_create_cq(ia->ri_id->device, rpcrdma_recvcq_upcall,
- rpcrdma_cq_async_error_upcall, ep,
- ep->rep_attr.cap.max_recv_wr + 1, 0);
+ recvcq = ib_create_cq(ia->ri_device, rpcrdma_recvcq_upcall,
+ rpcrdma_cq_async_error_upcall, ep,
+ ep->rep_attr.cap.max_recv_wr + 1, 0);
if (IS_ERR(recvcq)) {
rc = PTR_ERR(recvcq);
dprintk("RPC: %s: failed to create recv CQ: %i\n",
@@ -907,7 +906,7 @@ retry:
* More stuff I haven't thought of!
* Rrrgh!
*/
- if (ia->ri_id->device != id->device) {
+ if (ia->ri_device != id->device) {
printk("RPC: %s: can't reconnect on "
"different device!\n", __func__);
rdma_destroy_id(id);
@@ -1049,6 +1048,7 @@ rpcrdma_create_rep(struct rpcrdma_xprt *r_xprt)
goto out_free;
}

+ rep->rr_device = ia->ri_device;
rep->rr_rxprt = r_xprt;
return rep;

@@ -1449,9 +1449,9 @@ rpcrdma_register_internal(struct rpcrdma_ia *ia, void *va, int len,
/*
* All memory passed here was kmalloc'ed, therefore phys-contiguous.
*/
- iov->addr = ib_dma_map_single(ia->ri_id->device,
+ iov->addr = ib_dma_map_single(ia->ri_device,
va, len, DMA_BIDIRECTIONAL);
- if (ib_dma_mapping_error(ia->ri_id->device, iov->addr))
+ if (ib_dma_mapping_error(ia->ri_device, iov->addr))
return -ENOMEM;

iov->length = len;
@@ -1495,8 +1495,8 @@ rpcrdma_deregister_internal(struct rpcrdma_ia *ia,
{
int rc;

- ib_dma_unmap_single(ia->ri_id->device,
- iov->addr, iov->length, DMA_BIDIRECTIONAL);
+ ib_dma_unmap_single(ia->ri_device,
+ iov->addr, iov->length, DMA_BIDIRECTIONAL);

if (NULL == mr)
return 0;
@@ -1589,15 +1589,18 @@ rpcrdma_ep_post(struct rpcrdma_ia *ia,
send_wr.num_sge = req->rl_niovs;
send_wr.opcode = IB_WR_SEND;
if (send_wr.num_sge == 4) /* no need to sync any pad (constant) */
- ib_dma_sync_single_for_device(ia->ri_id->device,
- req->rl_send_iov[3].addr, req->rl_send_iov[3].length,
- DMA_TO_DEVICE);
- ib_dma_sync_single_for_device(ia->ri_id->device,
- req->rl_send_iov[1].addr, req->rl_send_iov[1].length,
- DMA_TO_DEVICE);
- ib_dma_sync_single_for_device(ia->ri_id->device,
- req->rl_send_iov[0].addr, req->rl_send_iov[0].length,
- DMA_TO_DEVICE);
+ ib_dma_sync_single_for_device(ia->ri_device,
+ req->rl_send_iov[3].addr,
+ req->rl_send_iov[3].length,
+ DMA_TO_DEVICE);
+ ib_dma_sync_single_for_device(ia->ri_device,
+ req->rl_send_iov[1].addr,
+ req->rl_send_iov[1].length,
+ DMA_TO_DEVICE);
+ ib_dma_sync_single_for_device(ia->ri_device,
+ req->rl_send_iov[0].addr,
+ req->rl_send_iov[0].length,
+ DMA_TO_DEVICE);

if (DECR_CQCOUNT(ep) > 0)
send_wr.send_flags = 0;
@@ -1630,7 +1633,7 @@ rpcrdma_ep_post_recv(struct rpcrdma_ia *ia,
recv_wr.sg_list = &rep->rr_rdmabuf->rg_iov;
recv_wr.num_sge = 1;

- ib_dma_sync_single_for_cpu(ia->ri_id->device,
+ ib_dma_sync_single_for_cpu(ia->ri_device,
rdmab_addr(rep->rr_rdmabuf),
rdmab_length(rep->rr_rdmabuf),
DMA_BIDIRECTIONAL);
diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index 230e7fe..300423d 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -62,6 +62,7 @@
struct rpcrdma_ia {
const struct rpcrdma_memreg_ops *ri_ops;
rwlock_t ri_qplock;
+ struct ib_device *ri_device;
struct rdma_cm_id *ri_id;
struct ib_pd *ri_pd;
struct ib_mr *ri_bind_mem;
@@ -173,6 +174,7 @@ struct rpcrdma_buffer;

struct rpcrdma_rep {
unsigned int rr_len;
+ struct ib_device *rr_device;
struct rpcrdma_xprt *rr_rxprt;
struct list_head rr_list;
struct rpcrdma_regbuf *rr_rdmabuf;


2015-05-11 18:03:17

by Chuck Lever

Subject: [PATCH v2 06/16] xprtrdma: Introduce helpers for allocating MWs

We eventually want to handle allocating MWs one at a time, as
needed, instead of grabbing 64 and throwing them at each RPC in the
pipeline.

Add a helper for grabbing an MW off rb_mws, and a helper for
returning an MW to rb_mws. These will be used in a subsequent patch.
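
The shape of the two helpers can be sketched in user space. This is an
illustration under stated assumptions, not the kernel code: a C11
atomic_flag spinlock stands in for rb_lock, a hand-rolled singly linked
FIFO stands in for the rb_mws list, and all fake_* names are
hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for rpcrdma_mw and the free list. */
struct fake_mw {
	struct fake_mw *next;
	int id;
};

struct fake_pool {
	atomic_flag lock;	/* plays the role of rb_lock */
	struct fake_mw *head;	/* oldest free MW */
	struct fake_mw *tail;	/* most recently returned MW */
};

static void fake_pool_init(struct fake_pool *pool)
{
	atomic_flag_clear(&pool->lock);
	pool->head = pool->tail = NULL;
}

static void fake_pool_lock(struct fake_pool *pool)
{
	while (atomic_flag_test_and_set_explicit(&pool->lock,
						 memory_order_acquire))
		;	/* spin until the flag is released */
}

static void fake_pool_unlock(struct fake_pool *pool)
{
	atomic_flag_clear_explicit(&pool->lock, memory_order_release);
}

/* Take the first free MW, or return NULL when the pool is empty. */
static struct fake_mw *fake_get_mw(struct fake_pool *pool)
{
	struct fake_mw *mw;

	fake_pool_lock(pool);
	mw = pool->head;
	if (mw) {
		pool->head = mw->next;
		if (!pool->head)
			pool->tail = NULL;
		mw->next = NULL;
	}
	fake_pool_unlock(pool);
	return mw;
}

/* Return an MW to the tail of the free list. */
static void fake_put_mw(struct fake_pool *pool, struct fake_mw *mw)
{
	assert(mw != NULL);
	fake_pool_lock(pool);
	mw->next = NULL;
	if (pool->tail)
		pool->tail->next = mw;
	else
		pool->head = mw;
	pool->tail = mw;
	fake_pool_unlock(pool);
}
```

Callers must handle a NULL return from the get helper, since an empty
pool is an expected condition once MWs are taken one at a time.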

Signed-off-by: Chuck Lever <[email protected]>
Reviewed-by: Steve Wise <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
---
net/sunrpc/xprtrdma/verbs.c | 31 +++++++++++++++++++++++++++++++
net/sunrpc/xprtrdma/xprt_rdma.h | 2 ++
2 files changed, 33 insertions(+)

diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index ddd5b36..b7ca73e 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -1173,6 +1173,37 @@ rpcrdma_buffer_destroy(struct rpcrdma_buffer *buf)
kfree(buf->rb_pool);
}

+struct rpcrdma_mw *
+rpcrdma_get_mw(struct rpcrdma_xprt *r_xprt)
+{
+ struct rpcrdma_buffer *buf = &r_xprt->rx_buf;
+ struct rpcrdma_mw *mw = NULL;
+ unsigned long flags;
+
+ spin_lock_irqsave(&buf->rb_lock, flags);
+ if (!list_empty(&buf->rb_mws)) {
+ mw = list_first_entry(&buf->rb_mws,
+ struct rpcrdma_mw, mw_list);
+ list_del_init(&mw->mw_list);
+ }
+ spin_unlock_irqrestore(&buf->rb_lock, flags);
+
+ if (!mw)
+ pr_err("RPC: %s: no MWs available\n", __func__);
+ return mw;
+}
+
+void
+rpcrdma_put_mw(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mw *mw)
+{
+ struct rpcrdma_buffer *buf = &r_xprt->rx_buf;
+ unsigned long flags;
+
+ spin_lock_irqsave(&buf->rb_lock, flags);
+ list_add_tail(&mw->mw_list, &buf->rb_mws);
+ spin_unlock_irqrestore(&buf->rb_lock, flags);
+}
+
/* "*mw" can be NULL when rpcrdma_buffer_get_mrs() fails, leaving
* some req segments uninitialized.
*/
diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index 300423d..5b801d5 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -413,6 +413,8 @@ int rpcrdma_ep_post_recv(struct rpcrdma_ia *, struct rpcrdma_ep *,
int rpcrdma_buffer_create(struct rpcrdma_xprt *);
void rpcrdma_buffer_destroy(struct rpcrdma_buffer *);

+struct rpcrdma_mw *rpcrdma_get_mw(struct rpcrdma_xprt *);
+void rpcrdma_put_mw(struct rpcrdma_xprt *, struct rpcrdma_mw *);
struct rpcrdma_req *rpcrdma_buffer_get(struct rpcrdma_buffer *);
void rpcrdma_buffer_put(struct rpcrdma_req *);
void rpcrdma_recv_buffer_get(struct rpcrdma_req *);


2015-05-11 18:03:26

by Chuck Lever

Subject: [PATCH v2 07/16] xprtrdma: Acquire FMRs in rpcrdma_fmr_register_external()

Acquiring 64 FMRs in rpcrdma_buffer_get() while holding the buffer
pool lock is expensive, and unnecessary because FMR mode can
transfer up to a 1MB payload using just a single ib_fmr.

Instead, acquire ib_fmrs one-at-a-time as chunks are registered, and
return them to rb_mws immediately during deregistration.
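
The control flow this describes (take an MR only when a chunk is mapped, re-use the one already attached on a retransmit, give it back at unmap time) might look roughly like the user-space sketch below. All names here (struct mr, struct seg, struct pool, seg_map(), seg_unmap()) are hypothetical, the pool is a tiny fixed array with no locking, and bumping an integer stands in for the fresh rkey that a real unmap/map cycle would produce.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define POOL_SLOTS 4		/* hypothetical pool capacity */

struct mr {
	int mapped;
	int rkey;		/* incremented on each map, for illustration */
};

struct seg {
	struct mr *mr;		/* plays the role of seg->rl_mw */
};

struct pool {
	struct mr *free[POOL_SLOTS];
	int nfree;
};

static struct mr *pool_get(struct pool *p)
{
	return p->nfree ? p->free[--p->nfree] : NULL;
}

static void pool_put(struct pool *p, struct mr *m)
{
	assert(p->nfree < POOL_SLOTS);
	p->free[p->nfree++] = m;
}

/* Map a segment, acquiring an MR only if none is attached yet. */
static int seg_map(struct pool *p, struct seg *s)
{
	struct mr *m = s->mr;

	s->mr = NULL;
	if (!m) {
		m = pool_get(p);
		if (!m)
			return -ENOMEM;
	} else {
		/* Retransmit: drop the old mapping before mapping again. */
		m->mapped = 0;
	}

	m->rkey++;
	m->mapped = 1;
	s->mr = m;
	return 0;
}

/* Unmap a segment and hand its MR straight back to the pool. */
static void seg_unmap(struct pool *p, struct seg *s)
{
	struct mr *m = s->mr;

	if (!m)
		return;
	s->mr = NULL;
	m->mapped = 0;
	pool_put(p, m);
}
```

Note that a retransmit does not touch the pool at all; only the first map and the final unmap of a segment do.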

Signed-off-by: Chuck Lever <[email protected]>
Reviewed-by: Steve Wise <[email protected]>
---
net/sunrpc/xprtrdma/fmr_ops.c | 52 ++++++++++++++++++++++++++++++++++++++---
net/sunrpc/xprtrdma/verbs.c | 26 ---------------------
2 files changed, 48 insertions(+), 30 deletions(-)

diff --git a/net/sunrpc/xprtrdma/fmr_ops.c b/net/sunrpc/xprtrdma/fmr_ops.c
index 0a96155..53fb649 100644
--- a/net/sunrpc/xprtrdma/fmr_ops.c
+++ b/net/sunrpc/xprtrdma/fmr_ops.c
@@ -11,6 +11,21 @@
* can take tens of usecs to complete.
*/

+/* Normal operation
+ *
+ * A Memory Region is prepared for RDMA READ or WRITE using the
+ * ib_map_phys_fmr verb (fmr_op_map). When the RDMA operation is
+ * finished, the Memory Region is unmapped using the ib_unmap_fmr
+ * verb (fmr_op_unmap).
+ */
+
+/* Transport recovery
+ *
+ * After a transport reconnect, fmr_op_map re-uses the MR already
+ * allocated for the RPC, but generates a fresh rkey then maps the
+ * MR again. This process is synchronous.
+ */
+
#include "xprt_rdma.h"

#if IS_ENABLED(CONFIG_SUNRPC_DEBUG)
@@ -77,6 +92,15 @@ out_fmr_err:
return rc;
}

+static int
+__fmr_unmap(struct rpcrdma_mw *r)
+{
+ LIST_HEAD(l);
+
+ list_add(&r->r.fmr->list, &l);
+ return ib_unmap_fmr(&l);
+}
+
/* Use the ib_map_phys_fmr() verb to register a memory region
* for remote access via RDMA READ or RDMA WRITE.
*/
@@ -88,9 +112,22 @@ fmr_op_map(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg,
struct ib_device *device = ia->ri_device;
enum dma_data_direction direction = rpcrdma_data_dir(writing);
struct rpcrdma_mr_seg *seg1 = seg;
- struct rpcrdma_mw *mw = seg1->rl_mw;
u64 physaddrs[RPCRDMA_MAX_DATA_SEGS];
int len, pageoff, i, rc;
+ struct rpcrdma_mw *mw;
+
+ mw = seg1->rl_mw;
+ seg1->rl_mw = NULL;
+ if (!mw) {
+ mw = rpcrdma_get_mw(r_xprt);
+ if (!mw)
+ return -ENOMEM;
+ } else {
+ /* this is a retransmit; generate a fresh rkey */
+ rc = __fmr_unmap(mw);
+ if (rc)
+ return rc;
+ }

pageoff = offset_in_page(seg1->mr_offset);
seg1->mr_offset -= pageoff; /* start of page */
@@ -114,6 +151,7 @@ fmr_op_map(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg,
if (rc)
goto out_maperr;

+ seg1->rl_mw = mw;
seg1->mr_rkey = mw->r.fmr->rkey;
seg1->mr_base = seg1->mr_dma + pageoff;
seg1->mr_nsegs = i;
@@ -137,18 +175,24 @@ fmr_op_unmap(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg)
{
struct rpcrdma_ia *ia = &r_xprt->rx_ia;
struct rpcrdma_mr_seg *seg1 = seg;
+ struct rpcrdma_mw *mw = seg1->rl_mw;
int rc, nsegs = seg->mr_nsegs;
- LIST_HEAD(l);

- list_add(&seg1->rl_mw->r.fmr->list, &l);
- rc = ib_unmap_fmr(&l);
+ dprintk("RPC: %s: FMR %p\n", __func__, mw);
+
+ seg1->rl_mw = NULL;
while (seg1->mr_nsegs--)
rpcrdma_unmap_one(ia->ri_device, seg++);
+ rc = __fmr_unmap(mw);
if (rc)
goto out_err;
+ rpcrdma_put_mw(r_xprt, mw);
return nsegs;

out_err:
+ /* The FMR is abandoned, but remains in rb_all. fmr_op_destroy
+ * will attempt to release it when the transport is destroyed.
+ */
dprintk("RPC: %s: ib_unmap_fmr status %i\n", __func__, rc);
return nsegs;
}
diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index b7ca73e..3188e36 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -1324,28 +1324,6 @@ rpcrdma_buffer_get_frmrs(struct rpcrdma_req *req, struct rpcrdma_buffer *buf,
return NULL;
}

-static struct rpcrdma_req *
-rpcrdma_buffer_get_fmrs(struct rpcrdma_req *req, struct rpcrdma_buffer *buf)
-{
- struct rpcrdma_mw *r;
- int i;
-
- i = RPCRDMA_MAX_SEGS - 1;
- while (!list_empty(&buf->rb_mws)) {
- r = list_entry(buf->rb_mws.next,
- struct rpcrdma_mw, mw_list);
- list_del(&r->mw_list);
- req->rl_segments[i].rl_mw = r;
- if (unlikely(i-- == 0))
- return req; /* Success */
- }
-
- /* Not enough entries on rb_mws for this req */
- rpcrdma_buffer_put_sendbuf(req, buf);
- rpcrdma_buffer_put_mrs(req, buf);
- return NULL;
-}
-
/*
* Get a set of request/reply buffers.
*
@@ -1387,9 +1365,6 @@ rpcrdma_buffer_get(struct rpcrdma_buffer *buffers)
case RPCRDMA_FRMR:
req = rpcrdma_buffer_get_frmrs(req, buffers, &stale);
break;
- case RPCRDMA_MTHCAFMR:
- req = rpcrdma_buffer_get_fmrs(req, buffers);
- break;
default:
break;
}
@@ -1414,7 +1389,6 @@ rpcrdma_buffer_put(struct rpcrdma_req *req)
rpcrdma_buffer_put_sendbuf(req, buffers);
switch (ia->ri_memreg_strategy) {
case RPCRDMA_FRMR:
- case RPCRDMA_MTHCAFMR:
rpcrdma_buffer_put_mrs(req, buffers);
break;
default:


2015-05-11 18:04:07

by Chuck Lever

Subject: [PATCH v2 11/16] xprtrdma: Remove ->ro_reset

An RPC can exit at any time. When it does so, xprt_rdma_free() is
called, and it calls ->ro_unmap().

If ->ro_reset() is running due to a transport disconnect, the two
methods can race while processing the same rpcrdma_mw. The results
are unpredictable.

Because of this, in previous patches I've altered ->ro_map() to
handle MR reset. ->ro_reset() is no longer needed and can be
removed.

Signed-off-by: Chuck Lever <[email protected]>
Reviewed-by: Steve Wise <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
Reviewed-by: Devesh Sharma <[email protected]>
---
net/sunrpc/xprtrdma/fmr_ops.c | 23 ---------------------
net/sunrpc/xprtrdma/frwr_ops.c | 39 ------------------------------------
net/sunrpc/xprtrdma/physical_ops.c | 6 ------
net/sunrpc/xprtrdma/verbs.c | 2 --
net/sunrpc/xprtrdma/xprt_rdma.h | 1 -
5 files changed, 71 deletions(-)

diff --git a/net/sunrpc/xprtrdma/fmr_ops.c b/net/sunrpc/xprtrdma/fmr_ops.c
index 53fb649..5dd77da 100644
--- a/net/sunrpc/xprtrdma/fmr_ops.c
+++ b/net/sunrpc/xprtrdma/fmr_ops.c
@@ -197,28 +197,6 @@ out_err:
return nsegs;
}

-/* After a disconnect, unmap all FMRs.
- *
- * This is invoked only in the transport connect worker in order
- * to serialize with rpcrdma_register_fmr_external().
- */
-static void
-fmr_op_reset(struct rpcrdma_xprt *r_xprt)
-{
- struct rpcrdma_buffer *buf = &r_xprt->rx_buf;
- struct rpcrdma_mw *r;
- LIST_HEAD(list);
- int rc;
-
- list_for_each_entry(r, &buf->rb_all, mw_all)
- list_add(&r->r.fmr->list, &list);
-
- rc = ib_unmap_fmr(&list);
- if (rc)
- dprintk("RPC: %s: ib_unmap_fmr failed %i\n",
- __func__, rc);
-}
-
static void
fmr_op_destroy(struct rpcrdma_buffer *buf)
{
@@ -242,7 +220,6 @@ const struct rpcrdma_memreg_ops rpcrdma_fmr_memreg_ops = {
.ro_open = fmr_op_open,
.ro_maxpages = fmr_op_maxpages,
.ro_init = fmr_op_init,
- .ro_reset = fmr_op_reset,
.ro_destroy = fmr_op_destroy,
.ro_displayname = "fmr",
};
diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c
index 133edf6..8622792 100644
--- a/net/sunrpc/xprtrdma/frwr_ops.c
+++ b/net/sunrpc/xprtrdma/frwr_ops.c
@@ -430,44 +430,6 @@ out_err:
return nsegs;
}

-/* After a disconnect, a flushed FAST_REG_MR can leave an FRMR in
- * an unusable state. Find FRMRs in this state and dereg / reg
- * each. FRMRs that are VALID and attached to an rpcrdma_req are
- * also torn down.
- *
- * This gives all in-use FRMRs a fresh rkey and leaves them INVALID.
- *
- * This is invoked only in the transport connect worker in order
- * to serialize with rpcrdma_register_frmr_external().
- */
-static void
-frwr_op_reset(struct rpcrdma_xprt *r_xprt)
-{
- struct rpcrdma_buffer *buf = &r_xprt->rx_buf;
- struct ib_device *device = r_xprt->rx_ia.ri_device;
- unsigned int depth = r_xprt->rx_ia.ri_max_frmr_depth;
- struct ib_pd *pd = r_xprt->rx_ia.ri_pd;
- struct rpcrdma_mw *r;
- int rc;
-
- list_for_each_entry(r, &buf->rb_all, mw_all) {
- if (r->r.frmr.fr_state == FRMR_IS_INVALID)
- continue;
-
- __frwr_release(r);
- rc = __frwr_init(r, pd, device, depth);
- if (rc) {
- dprintk("RPC: %s: mw %p left %s\n",
- __func__, r,
- (r->r.frmr.fr_state == FRMR_IS_STALE ?
- "stale" : "valid"));
- continue;
- }
-
- r->r.frmr.fr_state = FRMR_IS_INVALID;
- }
-}
-
static void
frwr_op_destroy(struct rpcrdma_buffer *buf)
{
@@ -490,7 +452,6 @@ const struct rpcrdma_memreg_ops rpcrdma_frwr_memreg_ops = {
.ro_open = frwr_op_open,
.ro_maxpages = frwr_op_maxpages,
.ro_init = frwr_op_init,
- .ro_reset = frwr_op_reset,
.ro_destroy = frwr_op_destroy,
.ro_displayname = "frwr",
};
diff --git a/net/sunrpc/xprtrdma/physical_ops.c b/net/sunrpc/xprtrdma/physical_ops.c
index da149e8..41985d0 100644
--- a/net/sunrpc/xprtrdma/physical_ops.c
+++ b/net/sunrpc/xprtrdma/physical_ops.c
@@ -69,11 +69,6 @@ physical_op_unmap(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg)
}

static void
-physical_op_reset(struct rpcrdma_xprt *r_xprt)
-{
-}
-
-static void
physical_op_destroy(struct rpcrdma_buffer *buf)
{
}
@@ -84,7 +79,6 @@ const struct rpcrdma_memreg_ops rpcrdma_physical_memreg_ops = {
.ro_open = physical_op_open,
.ro_maxpages = physical_op_maxpages,
.ro_init = physical_op_init,
- .ro_reset = physical_op_reset,
.ro_destroy = physical_op_destroy,
.ro_displayname = "physical",
};
diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index a891cf7..db9303a 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -891,8 +891,6 @@ retry:
rpcrdma_flush_cqs(ep);

xprt = container_of(ia, struct rpcrdma_xprt, rx_ia);
- ia->ri_ops->ro_reset(xprt);
-
id = rpcrdma_create_id(xprt, ia,
(struct sockaddr *)&xprt->rx_data.addr);
if (IS_ERR(id)) {
diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index c5862a4..f19376d 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -352,7 +352,6 @@ struct rpcrdma_memreg_ops {
struct rpcrdma_create_data_internal *);
size_t (*ro_maxpages)(struct rpcrdma_xprt *);
int (*ro_init)(struct rpcrdma_xprt *);
- void (*ro_reset)(struct rpcrdma_xprt *);
void (*ro_destroy)(struct rpcrdma_buffer *);
const char *ro_displayname;
};


2015-05-11 18:04:15

by Chuck Lever

Subject: [PATCH v2 12/16] xprtrdma: Remove rpcrdma_ia::ri_memreg_strategy

Clean up: This field is no longer used.

Signed-off-by: Chuck Lever <[email protected]>
Reviewed-by: Steve Wise <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
Reviewed-by: Devesh Sharma <[email protected]>
---
include/linux/sunrpc/xprtrdma.h | 3 ++-
net/sunrpc/xprtrdma/verbs.c | 3 ---
net/sunrpc/xprtrdma/xprt_rdma.h | 1 -
3 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/include/linux/sunrpc/xprtrdma.h b/include/linux/sunrpc/xprtrdma.h
index c984c85..b176130 100644
--- a/include/linux/sunrpc/xprtrdma.h
+++ b/include/linux/sunrpc/xprtrdma.h
@@ -56,7 +56,8 @@

#define RPCRDMA_INLINE_PAD_THRESH (512)/* payload threshold to pad (bytes) */

-/* memory registration strategies */
+/* Memory registration strategies, by number.
+ * This is part of a kernel / user space API. Do not remove. */
enum rpcrdma_memreg {
RPCRDMA_BOUNCEBUFFERS = 0,
RPCRDMA_REGISTER,
diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index db9303a..cc1a526 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -665,9 +665,6 @@ rpcrdma_ia_open(struct rpcrdma_xprt *xprt, struct sockaddr *addr, int memreg)
dprintk("RPC: %s: memory registration strategy is '%s'\n",
__func__, ia->ri_ops->ro_displayname);

- /* Else will do memory reg/dereg for each chunk */
- ia->ri_memreg_strategy = memreg;
-
rwlock_init(&ia->ri_qplock);
return 0;

diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index f19376d..3ecee38 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -70,7 +70,6 @@ struct rpcrdma_ia {
int ri_have_dma_lkey;
struct completion ri_done;
int ri_async_rc;
- enum rpcrdma_memreg ri_memreg_strategy;
unsigned int ri_max_frmr_depth;
struct ib_device_attr ri_devattr;
struct ib_qp_attr ri_qp_attr;


2015-05-11 18:04:25

by Chuck Lever

Subject: [PATCH v2 13/16] xprtrdma: Split rb_lock

/proc/lock_stat showed contention between rpcrdma_buffer_get/put
and the MR allocation functions during I/O intensive workloads.

Now that MRs are no longer allocated in rpcrdma_buffer_get(),
there's no reason the rb_mws list has to be managed using the
same lock as the send/receive buffers. Split that lock. The
new lock does not need to disable interrupts because buffer
get/put is never called in an interrupt context.

struct rpcrdma_buffer is re-arranged to ensure rb_mwlock and rb_mws
are always in a different cacheline than rb_lock and the buffer
pointers.
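
To illustrate the cacheline point, here is a small user-space sketch. It is not what the patch does: the patch relies on field ordering alone, whereas this sketch forces the separation with C11 alignas and checks it at compile time. The struct, its field names and the 64-byte line size are all assumptions for the example (the kernel would use L1_CACHE_BYTES).

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

#define CACHELINE 64	/* assumed line size for this sketch */

/* Hypothetical stand-in for struct rpcrdma_buffer. */
struct buffer_sketch {
	/* Touched by the MR get/put path. */
	alignas(CACHELINE) int mw_lock;
	void *mws_head;

	/* Touched by the send/receive buffer get/put path. */
	alignas(CACHELINE) int buf_lock;
	int send_index;
	int recv_index;
};

/* Fail the build if the two locks could land on the same line. */
static_assert(offsetof(struct buffer_sketch, mw_lock) / CACHELINE !=
	      offsetof(struct buffer_sketch, buf_lock) / CACHELINE,
	      "locks must not share a cache line");

/* Run-time form of the same check; returns 1 if the locks share a line. */
static int locks_share_cacheline(void)
{
	return offsetof(struct buffer_sketch, mw_lock) / CACHELINE ==
	       offsetof(struct buffer_sketch, buf_lock) / CACHELINE;
}
```

Explicit alignment costs some padding per structure; that is usually acceptable for an object allocated once per transport.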

Signed-off-by: Chuck Lever <[email protected]>
Reviewed-by: Steve Wise <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
---
net/sunrpc/xprtrdma/fmr_ops.c | 1 +
net/sunrpc/xprtrdma/frwr_ops.c | 1 +
net/sunrpc/xprtrdma/verbs.c | 10 ++++------
net/sunrpc/xprtrdma/xprt_rdma.h | 16 +++++++++-------
4 files changed, 15 insertions(+), 13 deletions(-)

diff --git a/net/sunrpc/xprtrdma/fmr_ops.c b/net/sunrpc/xprtrdma/fmr_ops.c
index 5dd77da..52f9ad5 100644
--- a/net/sunrpc/xprtrdma/fmr_ops.c
+++ b/net/sunrpc/xprtrdma/fmr_ops.c
@@ -65,6 +65,7 @@ fmr_op_init(struct rpcrdma_xprt *r_xprt)
struct rpcrdma_mw *r;
int i, rc;

+ spin_lock_init(&buf->rb_mwlock);
INIT_LIST_HEAD(&buf->rb_mws);
INIT_LIST_HEAD(&buf->rb_all);

diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c
index 8622792..18b7305 100644
--- a/net/sunrpc/xprtrdma/frwr_ops.c
+++ b/net/sunrpc/xprtrdma/frwr_ops.c
@@ -266,6 +266,7 @@ frwr_op_init(struct rpcrdma_xprt *r_xprt)
struct ib_pd *pd = r_xprt->rx_ia.ri_pd;
int i;

+ spin_lock_init(&buf->rb_mwlock);
INIT_LIST_HEAD(&buf->rb_mws);
INIT_LIST_HEAD(&buf->rb_all);

diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index cc1a526..2340835 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -1173,15 +1173,14 @@ rpcrdma_get_mw(struct rpcrdma_xprt *r_xprt)
{
struct rpcrdma_buffer *buf = &r_xprt->rx_buf;
struct rpcrdma_mw *mw = NULL;
- unsigned long flags;

- spin_lock_irqsave(&buf->rb_lock, flags);
+ spin_lock(&buf->rb_mwlock);
if (!list_empty(&buf->rb_mws)) {
mw = list_first_entry(&buf->rb_mws,
struct rpcrdma_mw, mw_list);
list_del_init(&mw->mw_list);
}
- spin_unlock_irqrestore(&buf->rb_lock, flags);
+ spin_unlock(&buf->rb_mwlock);

if (!mw)
pr_err("RPC: %s: no MWs available\n", __func__);
@@ -1192,11 +1191,10 @@ void
rpcrdma_put_mw(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mw *mw)
{
struct rpcrdma_buffer *buf = &r_xprt->rx_buf;
- unsigned long flags;

- spin_lock_irqsave(&buf->rb_lock, flags);
+ spin_lock(&buf->rb_mwlock);
list_add_tail(&mw->mw_list, &buf->rb_mws);
- spin_unlock_irqrestore(&buf->rb_lock, flags);
+ spin_unlock(&buf->rb_mwlock);
}

static void
diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index 3ecee38..df92884 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -282,15 +282,17 @@ rpcr_to_rdmar(struct rpc_rqst *rqst)
* One of these is associated with a transport instance
*/
struct rpcrdma_buffer {
- spinlock_t rb_lock; /* protects indexes */
- u32 rb_max_requests;/* client max requests */
- struct list_head rb_mws; /* optional memory windows/fmrs/frmrs */
- struct list_head rb_all;
- int rb_send_index;
+ spinlock_t rb_mwlock; /* protect rb_mws list */
+ struct list_head rb_mws;
+ struct list_head rb_all;
+ char *rb_pool;
+
+ spinlock_t rb_lock; /* protect buf arrays */
+ u32 rb_max_requests;
+ int rb_send_index;
+ int rb_recv_index;
struct rpcrdma_req **rb_send_bufs;
- int rb_recv_index;
struct rpcrdma_rep **rb_recv_bufs;
- char *rb_pool;
};
#define rdmab_to_ia(b) (&container_of((b), struct rpcrdma_xprt, rx_buf)->rx_ia)



2015-05-11 18:03:55

by Chuck Lever

Subject: [PATCH v2 10/16] xprtrdma: Remove unused LOCAL_INV recovery logic

Clean up: Remove functions no longer used to recover broken FRMRs.

Signed-off-by: Chuck Lever <[email protected]>
Reviewed-by: Steve Wise <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
Reviewed-by: Devesh Sharma <[email protected]>
---
net/sunrpc/xprtrdma/verbs.c | 109 -------------------------------------------
1 file changed, 109 deletions(-)

diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index 768bb77..a891cf7 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -1204,33 +1204,6 @@ rpcrdma_put_mw(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mw *mw)
spin_unlock_irqrestore(&buf->rb_lock, flags);
}

-/* "*mw" can be NULL when rpcrdma_buffer_get_mrs() fails, leaving
- * some req segments uninitialized.
- */
-static void
-rpcrdma_buffer_put_mr(struct rpcrdma_mw **mw, struct rpcrdma_buffer *buf)
-{
- if (*mw) {
- list_add_tail(&(*mw)->mw_list, &buf->rb_mws);
- *mw = NULL;
- }
-}
-
-/* Cycle mw's back in reverse order, and "spin" them.
- * This delays and scrambles reuse as much as possible.
- */
-static void
-rpcrdma_buffer_put_mrs(struct rpcrdma_req *req, struct rpcrdma_buffer *buf)
-{
- struct rpcrdma_mr_seg *seg = req->rl_segments;
- struct rpcrdma_mr_seg *seg1 = seg;
- int i;
-
- for (i = 1, seg++; i < RPCRDMA_MAX_SEGS; seg++, i++)
- rpcrdma_buffer_put_mr(&seg->rl_mw, buf);
- rpcrdma_buffer_put_mr(&seg1->rl_mw, buf);
-}
-
static void
rpcrdma_buffer_put_sendbuf(struct rpcrdma_req *req, struct rpcrdma_buffer *buf)
{
@@ -1242,88 +1215,6 @@ rpcrdma_buffer_put_sendbuf(struct rpcrdma_req *req, struct rpcrdma_buffer *buf)
}
}

-/* rpcrdma_unmap_one() was already done during deregistration.
- * Redo only the ib_post_send().
- */
-static void
-rpcrdma_retry_local_inv(struct rpcrdma_mw *r, struct rpcrdma_ia *ia)
-{
- struct rpcrdma_xprt *r_xprt =
- container_of(ia, struct rpcrdma_xprt, rx_ia);
- struct ib_send_wr invalidate_wr, *bad_wr;
- int rc;
-
- dprintk("RPC: %s: FRMR %p is stale\n", __func__, r);
-
- /* When this FRMR is re-inserted into rb_mws, it is no longer stale */
- r->r.frmr.fr_state = FRMR_IS_INVALID;
-
- memset(&invalidate_wr, 0, sizeof(invalidate_wr));
- invalidate_wr.wr_id = (unsigned long)(void *)r;
- invalidate_wr.opcode = IB_WR_LOCAL_INV;
- invalidate_wr.ex.invalidate_rkey = r->r.frmr.fr_mr->rkey;
- DECR_CQCOUNT(&r_xprt->rx_ep);
-
- dprintk("RPC: %s: frmr %p invalidating rkey %08x\n",
- __func__, r, r->r.frmr.fr_mr->rkey);
-
- read_lock(&ia->ri_qplock);
- rc = ib_post_send(ia->ri_id->qp, &invalidate_wr, &bad_wr);
- read_unlock(&ia->ri_qplock);
- if (rc) {
- /* Force rpcrdma_buffer_get() to retry */
- r->r.frmr.fr_state = FRMR_IS_STALE;
- dprintk("RPC: %s: ib_post_send failed, %i\n",
- __func__, rc);
- }
-}
-
-static void
-rpcrdma_retry_flushed_linv(struct list_head *stale,
- struct rpcrdma_buffer *buf)
-{
- struct rpcrdma_ia *ia = rdmab_to_ia(buf);
- struct list_head *pos;
- struct rpcrdma_mw *r;
- unsigned long flags;
-
- list_for_each(pos, stale) {
- r = list_entry(pos, struct rpcrdma_mw, mw_list);
- rpcrdma_retry_local_inv(r, ia);
- }
-
- spin_lock_irqsave(&buf->rb_lock, flags);
- list_splice_tail(stale, &buf->rb_mws);
- spin_unlock_irqrestore(&buf->rb_lock, flags);
-}
-
-static struct rpcrdma_req *
-rpcrdma_buffer_get_frmrs(struct rpcrdma_req *req, struct rpcrdma_buffer *buf,
- struct list_head *stale)
-{
- struct rpcrdma_mw *r;
- int i;
-
- i = RPCRDMA_MAX_SEGS - 1;
- while (!list_empty(&buf->rb_mws)) {
- r = list_entry(buf->rb_mws.next,
- struct rpcrdma_mw, mw_list);
- list_del(&r->mw_list);
- if (r->r.frmr.fr_state == FRMR_IS_STALE) {
- list_add(&r->mw_list, stale);
- continue;
- }
- req->rl_segments[i].rl_mw = r;
- if (unlikely(i-- == 0))
- return req; /* Success */
- }
-
- /* Not enough entries on rb_mws for this req */
- rpcrdma_buffer_put_sendbuf(req, buf);
- rpcrdma_buffer_put_mrs(req, buf);
- return NULL;
-}
-
/*
* Get a set of request/reply buffers.
*


2015-05-11 18:04:34

by Chuck Lever

Subject: [PATCH v2 14/16] xprtrdma: Stack relief in fmr_op_map()

fmr_op_map() declares a 64 element array of u64 in automatic
storage. This is 512 bytes (8 * 64) on the stack.

Instead, when FMR memory registration is in use, pre-allocate a
physaddr array for each rpcrdma_mw.

This is a pre-requisite for increasing the r/wsize maximum for
FMR on platforms with 4KB pages.
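
The trade being made is a one-time heap allocation per MW in exchange for a smaller stack frame on every map call. A user-space sketch of that idea follows, with hypothetical names (struct fmr_sketch, fmr_sketch_alloc(), fmr_sketch_fill(), fmr_sketch_free()) and MAX_SGES assumed to be 64 to match the arithmetic above; malloc()/free() stand in for kmalloc()/kfree().

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_SGES 64	/* assumed to mirror the 64-element array */

/* The on-stack array this replaces would cost 8 * 64 = 512 bytes. */
static_assert(MAX_SGES * sizeof(uint64_t) == 512,
	      "unexpected physaddr array size");

/* Hypothetical stand-in for the per-MW FMR state. */
struct fmr_sketch {
	uint64_t *physaddrs;	/* allocated once, reused for every map */
};

static struct fmr_sketch *fmr_sketch_alloc(void)
{
	struct fmr_sketch *f = calloc(1, sizeof(*f));

	if (!f)
		return NULL;
	f->physaddrs = malloc(MAX_SGES * sizeof(*f->physaddrs));
	if (!f->physaddrs) {
		free(f);
		return NULL;
	}
	return f;
}

static void fmr_sketch_free(struct fmr_sketch *f)
{
	if (!f)
		return;
	free(f->physaddrs);
	free(f);
}

/* Copy up to MAX_SGES addresses into the pre-allocated array.
 * Returns the number of entries stored.
 */
static int fmr_sketch_fill(struct fmr_sketch *f, const uint64_t *addrs, int n)
{
	int i;

	if (n > MAX_SGES)
		n = MAX_SGES;
	for (i = 0; i < n; i++)
		f->physaddrs[i] = addrs[i];
	return i;
}
```

Because the array now lives with the MW, its size is no longer bounded by what is reasonable to place on the kernel stack.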

Signed-off-by: Chuck Lever <[email protected]>
Reviewed-by: Steve Wise <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
Reviewed-by: Devesh Sharma <[email protected]>
---
net/sunrpc/xprtrdma/fmr_ops.c | 32 ++++++++++++++++++++++----------
net/sunrpc/xprtrdma/xprt_rdma.h | 7 ++++++-
2 files changed, 28 insertions(+), 11 deletions(-)

diff --git a/net/sunrpc/xprtrdma/fmr_ops.c b/net/sunrpc/xprtrdma/fmr_ops.c
index 52f9ad5..4a53ad5 100644
--- a/net/sunrpc/xprtrdma/fmr_ops.c
+++ b/net/sunrpc/xprtrdma/fmr_ops.c
@@ -72,13 +72,19 @@ fmr_op_init(struct rpcrdma_xprt *r_xprt)
i = (buf->rb_max_requests + 1) * RPCRDMA_MAX_SEGS;
dprintk("RPC: %s: initializing %d FMRs\n", __func__, i);

+ rc = -ENOMEM;
while (i--) {
r = kzalloc(sizeof(*r), GFP_KERNEL);
if (!r)
- return -ENOMEM;
+ goto out;
+
+ r->r.fmr.physaddrs = kmalloc(RPCRDMA_MAX_FMR_SGES *
+ sizeof(u64), GFP_KERNEL);
+ if (!r->r.fmr.physaddrs)
+ goto out_free;

- r->r.fmr = ib_alloc_fmr(pd, mr_access_flags, &fmr_attr);
- if (IS_ERR(r->r.fmr))
+ r->r.fmr.fmr = ib_alloc_fmr(pd, mr_access_flags, &fmr_attr);
+ if (IS_ERR(r->r.fmr.fmr))
goto out_fmr_err;

list_add(&r->mw_list, &buf->rb_mws);
@@ -87,9 +93,12 @@ fmr_op_init(struct rpcrdma_xprt *r_xprt)
return 0;

out_fmr_err:
- rc = PTR_ERR(r->r.fmr);
+ rc = PTR_ERR(r->r.fmr.fmr);
dprintk("RPC: %s: ib_alloc_fmr status %i\n", __func__, rc);
+ kfree(r->r.fmr.physaddrs);
+out_free:
kfree(r);
+out:
return rc;
}

@@ -98,7 +107,7 @@ __fmr_unmap(struct rpcrdma_mw *r)
{
LIST_HEAD(l);

- list_add(&r->r.fmr->list, &l);
+ list_add(&r->r.fmr.fmr->list, &l);
return ib_unmap_fmr(&l);
}

@@ -113,7 +122,6 @@ fmr_op_map(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg,
struct ib_device *device = ia->ri_device;
enum dma_data_direction direction = rpcrdma_data_dir(writing);
struct rpcrdma_mr_seg *seg1 = seg;
- u64 physaddrs[RPCRDMA_MAX_DATA_SEGS];
int len, pageoff, i, rc;
struct rpcrdma_mw *mw;

@@ -138,7 +146,7 @@ fmr_op_map(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg,
nsegs = RPCRDMA_MAX_FMR_SGES;
for (i = 0; i < nsegs;) {
rpcrdma_map_one(device, seg, direction);
- physaddrs[i] = seg->mr_dma;
+ mw->r.fmr.physaddrs[i] = seg->mr_dma;
len += seg->mr_len;
++seg;
++i;
@@ -148,12 +156,13 @@ fmr_op_map(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg,
break;
}

- rc = ib_map_phys_fmr(mw->r.fmr, physaddrs, i, seg1->mr_dma);
+ rc = ib_map_phys_fmr(mw->r.fmr.fmr, mw->r.fmr.physaddrs,
+ i, seg1->mr_dma);
if (rc)
goto out_maperr;

seg1->rl_mw = mw;
- seg1->mr_rkey = mw->r.fmr->rkey;
+ seg1->mr_rkey = mw->r.fmr.fmr->rkey;
seg1->mr_base = seg1->mr_dma + pageoff;
seg1->mr_nsegs = i;
seg1->mr_len = len;
@@ -207,10 +216,13 @@ fmr_op_destroy(struct rpcrdma_buffer *buf)
while (!list_empty(&buf->rb_all)) {
r = list_entry(buf->rb_all.next, struct rpcrdma_mw, mw_all);
list_del(&r->mw_all);
- rc = ib_dealloc_fmr(r->r.fmr);
+ kfree(r->r.fmr.physaddrs);
+
+ rc = ib_dealloc_fmr(r->r.fmr.fmr);
if (rc)
dprintk("RPC: %s: ib_dealloc_fmr failed %i\n",
__func__, rc);
+
kfree(r);
}
}
diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index df92884..110d685 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -206,9 +206,14 @@ struct rpcrdma_frmr {
struct rpcrdma_xprt *fr_xprt;
};

+struct rpcrdma_fmr {
+ struct ib_fmr *fmr;
+ u64 *physaddrs;
+};
+
struct rpcrdma_mw {
union {
- struct ib_fmr *fmr;
+ struct rpcrdma_fmr fmr;
struct rpcrdma_frmr frmr;
} r;
void (*mw_sendcompletion)(struct ib_wc *);


2015-05-11 18:04:44

by Chuck Lever

Subject: [PATCH v2 15/16] xprtrdma: Reduce per-transport MR allocation

Reduce resource consumption per-transport to make way for increasing
the credit limit and maximum r/wsize. Pre-allocate fewer MRs.
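
A rough feel for the savings can be had by evaluating the old and new formulas from the diff below in user space. The constants are assumptions for this sketch: 64 data segments, two extra segments in the old per-request count, and 32 request slots in the usage note; check xprt_rdma.h for the values in your tree. The function names are hypothetical.

```c
#include <assert.h>

/* Assumed values for illustration only. */
enum {
	MAX_DATA_SEGS = 64,
	MAX_SEGS = MAX_DATA_SEGS + 2,
};

/* Old scheme: a full set of segments' worth of MRs per slot, plus one. */
static int old_mr_count(int max_requests)
{
	return (max_requests + 1) * MAX_SEGS;
}

/* New scheme: enough MRs to cover the data segments at the given
 * depth (pages per MR), plus head and tail, for each RPC slot.
 */
static int new_mr_count(int max_requests, int depth)
{
	int i;

	assert(depth > 0);
	i = MAX_DATA_SEGS / depth;
	if (i < 1)
		i = 1;
	i += 2;			/* head + tail */
	return i * max_requests;	/* one set for each RPC slot */
}
```

Under these assumptions, 32 slots at a depth of 64 drop from 2178 pre-allocated MRs to 96; a shallower depth of 16 gives 192.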

Signed-off-by: Chuck Lever <[email protected]>
Reviewed-by: Steve Wise <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
Reviewed-by: Devesh Sharma <[email protected]>
---
net/sunrpc/xprtrdma/fmr_ops.c | 6 ++++--
net/sunrpc/xprtrdma/frwr_ops.c | 6 ++++--
2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/net/sunrpc/xprtrdma/fmr_ops.c b/net/sunrpc/xprtrdma/fmr_ops.c
index 4a53ad5..f1e8daf 100644
--- a/net/sunrpc/xprtrdma/fmr_ops.c
+++ b/net/sunrpc/xprtrdma/fmr_ops.c
@@ -69,8 +69,10 @@ fmr_op_init(struct rpcrdma_xprt *r_xprt)
INIT_LIST_HEAD(&buf->rb_mws);
INIT_LIST_HEAD(&buf->rb_all);

- i = (buf->rb_max_requests + 1) * RPCRDMA_MAX_SEGS;
- dprintk("RPC: %s: initializing %d FMRs\n", __func__, i);
+ i = max_t(int, RPCRDMA_MAX_DATA_SEGS / RPCRDMA_MAX_FMR_SGES, 1);
+ i += 2; /* head + tail */
+ i *= buf->rb_max_requests; /* one set for each RPC slot */
+ dprintk("RPC: %s: initializing %d FMRs\n", __func__, i);

rc = -ENOMEM;
while (i--) {
diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c
index 18b7305..661fbc1 100644
--- a/net/sunrpc/xprtrdma/frwr_ops.c
+++ b/net/sunrpc/xprtrdma/frwr_ops.c
@@ -270,8 +270,10 @@ frwr_op_init(struct rpcrdma_xprt *r_xprt)
INIT_LIST_HEAD(&buf->rb_mws);
INIT_LIST_HEAD(&buf->rb_all);

- i = (buf->rb_max_requests + 1) * RPCRDMA_MAX_SEGS;
- dprintk("RPC: %s: initializing %d FRMRs\n", __func__, i);
+ i = max_t(int, RPCRDMA_MAX_DATA_SEGS / depth, 1);
+ i += 2; /* head + tail */
+ i *= buf->rb_max_requests; /* one set for each RPC slot */
+ dprintk("RPC: %s: initializing %d FRMRs\n", __func__, i);

while (i--) {
struct rpcrdma_mw *r;


2015-05-11 18:04:53

by Chuck Lever

Subject: [PATCH v2 16/16] SUNRPC: Clean up bc_send()

Clean up: Merge bc_send() into bc_svc_process().

Note: even though this touches svc.c, it is a client-side change.

Signed-off-by: Chuck Lever <[email protected]>
---
include/linux/sunrpc/bc_xprt.h | 1 -
net/sunrpc/Makefile | 2 +
net/sunrpc/bc_svc.c | 63 ----------------------------------------
net/sunrpc/svc.c | 33 ++++++++++++++++-----
4 files changed, 26 insertions(+), 73 deletions(-)
delete mode 100644 net/sunrpc/bc_svc.c

diff --git a/include/linux/sunrpc/bc_xprt.h b/include/linux/sunrpc/bc_xprt.h
index 2ca67b5..8df43c9f 100644
--- a/include/linux/sunrpc/bc_xprt.h
+++ b/include/linux/sunrpc/bc_xprt.h
@@ -37,7 +37,6 @@ void xprt_complete_bc_request(struct rpc_rqst *req, uint32_t copied);
void xprt_free_bc_request(struct rpc_rqst *req);
int xprt_setup_backchannel(struct rpc_xprt *, unsigned int min_reqs);
void xprt_destroy_backchannel(struct rpc_xprt *, unsigned int max_reqs);
-int bc_send(struct rpc_rqst *req);

/*
* Determine if a shared backchannel is in use
diff --git a/net/sunrpc/Makefile b/net/sunrpc/Makefile
index 15e6f6c..1b8e68d 100644
--- a/net/sunrpc/Makefile
+++ b/net/sunrpc/Makefile
@@ -15,6 +15,6 @@ sunrpc-y := clnt.o xprt.o socklib.o xprtsock.o sched.o \
sunrpc_syms.o cache.o rpc_pipe.o \
svc_xprt.o
sunrpc-$(CONFIG_SUNRPC_DEBUG) += debugfs.o
-sunrpc-$(CONFIG_SUNRPC_BACKCHANNEL) += backchannel_rqst.o bc_svc.o
+sunrpc-$(CONFIG_SUNRPC_BACKCHANNEL) += backchannel_rqst.o
sunrpc-$(CONFIG_PROC_FS) += stats.o
sunrpc-$(CONFIG_SYSCTL) += sysctl.o
diff --git a/net/sunrpc/bc_svc.c b/net/sunrpc/bc_svc.c
deleted file mode 100644
index 15c7a8a..0000000
--- a/net/sunrpc/bc_svc.c
+++ /dev/null
@@ -1,63 +0,0 @@
-/******************************************************************************
-
-(c) 2007 Network Appliance, Inc. All Rights Reserved.
-(c) 2009 NetApp. All Rights Reserved.
-
-NetApp provides this source code under the GPL v2 License.
-The GPL v2 license is available at
-http://opensource.org/licenses/gpl-license.php.
-
-THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
-"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
-LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
-A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
-CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
-EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
-PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
-PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
-LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
-NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
-SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-
-******************************************************************************/
-
-/*
- * The NFSv4.1 callback service helper routines.
- * They implement the transport level processing required to send the
- * reply over an existing open connection previously established by the client.
- */
-
-#include <linux/module.h>
-
-#include <linux/sunrpc/xprt.h>
-#include <linux/sunrpc/sched.h>
-#include <linux/sunrpc/bc_xprt.h>
-
-#define RPCDBG_FACILITY RPCDBG_SVCDSP
-
-/* Empty callback ops */
-static const struct rpc_call_ops nfs41_callback_ops = {
-};
-
-
-/*
- * Send the callback reply
- */
-int bc_send(struct rpc_rqst *req)
-{
- struct rpc_task *task;
- int ret;
-
- dprintk("RPC: bc_send req= %p\n", req);
- task = rpc_run_bc_task(req, &nfs41_callback_ops);
- if (IS_ERR(task))
- ret = PTR_ERR(task);
- else {
- WARN_ON_ONCE(atomic_read(&task->tk_count) != 1);
- ret = task->tk_status;
- rpc_put_task(task);
- }
- dprintk("RPC: bc_send ret= %d\n", ret);
- return ret;
-}
-
diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
index 78974e4..e144902 100644
--- a/net/sunrpc/svc.c
+++ b/net/sunrpc/svc.c
@@ -1350,6 +1350,11 @@ bc_svc_process(struct svc_serv *serv, struct rpc_rqst *req,
{
struct kvec *argv = &rqstp->rq_arg.head[0];
struct kvec *resv = &rqstp->rq_res.head[0];
+ static const struct rpc_call_ops reply_ops = { };
+ struct rpc_task *task;
+ int error;
+
+ dprintk("svc: %s(%p)\n", __func__, req);

/* Build the svc_rqst used by the common processing routine */
rqstp->rq_xprt = serv->sv_bc_xprt;
@@ -1372,21 +1377,33 @@ bc_svc_process(struct svc_serv *serv, struct rpc_rqst *req,

/*
* Skip the next two words because they've already been
- * processed in the trasport
+ * processed in the transport
*/
svc_getu32(argv); /* XID */
svc_getnl(argv); /* CALLDIR */

- /* Returns 1 for send, 0 for drop */
- if (svc_process_common(rqstp, argv, resv)) {
- memcpy(&req->rq_snd_buf, &rqstp->rq_res,
- sizeof(req->rq_snd_buf));
- return bc_send(req);
- } else {
- /* drop request */
+ /* Parse and execute the bc call */
+ if (!svc_process_common(rqstp, argv, resv)) {
+ /* Processing error: drop the request */
xprt_free_bc_request(req);
return 0;
}
+
+ /* Finally, send the reply synchronously */
+ memcpy(&req->rq_snd_buf, &rqstp->rq_res, sizeof(req->rq_snd_buf));
+ task = rpc_run_bc_task(req, &reply_ops);
+ if (IS_ERR(task)) {
+ error = PTR_ERR(task);
+ goto out;
+ }
+
+ WARN_ON_ONCE(atomic_read(&task->tk_count) != 1);
+ error = task->tk_status;
+ rpc_put_task(task);
+
+out:
+ dprintk("svc: %s(), error=%d\n", __func__, error);
+ return error;
}
EXPORT_SYMBOL_GPL(bc_svc_process);
#endif /* CONFIG_SUNRPC_BACKCHANNEL */
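
Note (illustrative, not part of the patch): the reply path added to bc_svc_process() above relies on the kernel's error-pointer convention, where a failed call returns a small negative errno encoded in the pointer value, and on a single "out" label so the debug message sees every exit code. The following is a minimal userspace sketch of that shape. The names struct task, run_task() and send_reply() are hypothetical stand-ins, and ERR_PTR/PTR_ERR/IS_ERR are re-implemented here only to model the convention.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace model of the kernel's ERR_PTR convention (illustrative only). */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)(intptr_t)error; }
static inline long PTR_ERR(const void *ptr) { return (long)(intptr_t)ptr; }
static inline int IS_ERR(const void *ptr)
{
	/* The top MAX_ERRNO addresses are reserved for encoded errnos. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct task { int status; };	/* hypothetical stand-in for struct rpc_task */

/* Hypothetical stand-in for rpc_run_bc_task(): fails on request. */
static struct task *run_task(struct task *t, int fail)
{
	return fail ? ERR_PTR(-ENOMEM) : t;
}

/* Same shape as the new reply path: one exit point, one error code. */
static int send_reply(struct task *t, int fail)
{
	struct task *task;
	int error;

	assert(t != NULL);
	task = run_task(t, fail);
	if (IS_ERR(task)) {
		error = (int)PTR_ERR(task);
		goto out;
	}
	error = task->status;
out:
	/* The patch emits its dprintk() here, covering both paths. */
	return error;
}
```

With this model, a failed launch surfaces as -ENOMEM and a completed task surfaces its own status, both through the same return statement.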


2015-05-11 18:03:45

by Chuck Lever

Subject: [PATCH v2 09/16] xprtrdma: Acquire MRs in rpcrdma_register_external()

Acquiring 64 MRs in rpcrdma_buffer_get() while holding the buffer
pool lock is expensive, and unnecessary because most modern adapters
can transfer 100s of KBs of payload using just a single MR.

Instead, acquire MRs one-at-a-time as chunks are registered, and
return them to rb_mws immediately during deregistration.

Note: commit 539431a437d2 ("xprtrdma: Don't invalidate FRMRs if
registration fails") is reverted: There is now a valid case where
registration can fail (with -ENOMEM) but the QP is still in RTS.

Signed-off-by: Chuck Lever <[email protected]>
Reviewed-by: Steve Wise <[email protected]>
---
net/sunrpc/xprtrdma/frwr_ops.c | 100 +++++++++++++++++++++++++++++++++++-----
net/sunrpc/xprtrdma/rpc_rdma.c | 3 -
net/sunrpc/xprtrdma/verbs.c | 21 --------
3 files changed, 89 insertions(+), 35 deletions(-)
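
Note (illustrative, not part of the patch): the acquisition loop this patch adds to frwr_op_map() can be modeled in userspace as below. This is a sketch under assumptions: struct mw, struct pool, pool_get() and queue_recovery() are hypothetical stand-ins for rpcrdma_mw, the rb_mws list, rpcrdma_get_mw() and __frwr_queue_recovery(), and the "recovery" list simply records what would be handed to the workqueue.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

enum mr_state { MR_INVALID, MR_VALID, MR_STALE };

struct mw {			/* hypothetical stand-in for struct rpcrdma_mw */
	enum mr_state state;
	struct mw *next;
};

struct pool {
	struct mw *free;	/* models the rb_mws free list */
	struct mw *recovery;	/* models MWs handed to the recovery worker */
};

/* Pop one MW from the free list, or return NULL if it is empty. */
static struct mw *pool_get(struct pool *p)
{
	struct mw *m = p->free;

	if (m) {
		p->free = m->next;
		m->next = NULL;
	}
	return m;
}

/* Set a broken MW aside instead of reusing it. */
static void queue_recovery(struct pool *p, struct mw *m)
{
	m->next = p->recovery;
	p->recovery = m;
}

/* Take MWs until an INVALID one turns up; defer the others to recovery. */
static int acquire_mw(struct pool *p, struct mw **out)
{
	struct mw *m = NULL;

	assert(out != NULL);
	do {
		if (m)
			queue_recovery(p, m);
		m = pool_get(p);
		if (!m)
			return -ENOMEM;
	} while (m->state != MR_INVALID);

	m->state = MR_VALID;
	*out = m;
	return 0;
}
```

The -ENOMEM return is the case the commit message refers to: registration can now fail because the pool is empty even though the QP is still usable.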

diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c
index a06d9a3..133edf6 100644
--- a/net/sunrpc/xprtrdma/frwr_ops.c
+++ b/net/sunrpc/xprtrdma/frwr_ops.c
@@ -11,6 +11,62 @@
* but most complex memory registration mode.
*/

+/* Normal operation
+ *
+ * A Memory Region is prepared for RDMA READ or WRITE using a FAST_REG
+ * Work Request (frwr_op_map). When the RDMA operation is finished, this
+ * Memory Region is invalidated using a LOCAL_INV Work Request
+ * (frwr_op_unmap).
+ *
+ * Typically these Work Requests are not signaled, and neither are RDMA
+ * SEND Work Requests (with the exception of signaling occasionally to
+ * prevent provider work queue overflows). This greatly reduces HCA
+ * interrupt workload.
+ *
+ * As an optimization, frwr_op_unmap marks MRs INVALID before the
+ * LOCAL_INV WR is posted. If posting succeeds, the MR is placed on
+ * rb_mws immediately so that no work (like managing a linked list
+ * under a spinlock) is needed in the completion upcall.
+ *
+ * But this means that frwr_op_map() can occasionally encounter an MR
+ * that is INVALID but the LOCAL_INV WR has not completed. Work Queue
+ * ordering prevents a subsequent FAST_REG WR from executing against
+ * that MR while it is still being invalidated.
+ */
+
+/* Transport recovery
+ *
+ * ->op_map and the transport connect worker cannot run at the same
+ * time, but ->op_unmap can fire while the transport connect worker
+ * is running. Thus MR recovery is handled in ->op_map, to guarantee
+ * that recovered MRs are owned by a sending RPC, and not one where
+ * ->op_unmap could fire at the same time transport reconnect is
+ * being done.
+ *
+ * When the underlying transport disconnects, MRs are left in one of
+ * three states:
+ *
+ * INVALID: The MR was not in use before the QP entered ERROR state.
+ * (Or, the LOCAL_INV WR has not completed or flushed yet).
+ *
+ * STALE: The MR was being registered or unregistered when the QP
+ * entered ERROR state, and the pending WR was flushed.
+ *
+ * VALID: The MR was registered before the QP entered ERROR state.
+ *
+ * When frwr_op_map encounters STALE and VALID MRs, they are recovered
+ * with ib_dereg_mr and then are re-initialized. Because MR recovery
+ * allocates fresh resources, it is deferred to a workqueue, and the
+ * recovered MRs are placed back on the rb_mws list when recovery is
+ * complete. frwr_op_map allocates another MR for the current RPC while
+ * the broken MR is reset.
+ *
+ * To ensure that frwr_op_map doesn't encounter an MR that is marked
+ * INVALID but that is about to be flushed due to a previous transport
+ * disconnect, the transport connect worker attempts to drain all
+ * pending send queue WRs before the transport is reconnected.
+ */
+
#include "xprt_rdma.h"

#if IS_ENABLED(CONFIG_SUNRPC_DEBUG)
@@ -250,9 +306,9 @@ frwr_op_map(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg,
struct ib_device *device = ia->ri_device;
enum dma_data_direction direction = rpcrdma_data_dir(writing);
struct rpcrdma_mr_seg *seg1 = seg;
- struct rpcrdma_mw *mw = seg1->rl_mw;
- struct rpcrdma_frmr *frmr = &mw->r.frmr;
- struct ib_mr *mr = frmr->fr_mr;
+ struct rpcrdma_mw *mw;
+ struct rpcrdma_frmr *frmr;
+ struct ib_mr *mr;
struct ib_send_wr fastreg_wr, *bad_wr;
u8 key;
int len, pageoff;
@@ -261,12 +317,25 @@ frwr_op_map(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg,
u64 pa;
int page_no;

+ mw = seg1->rl_mw;
+ seg1->rl_mw = NULL;
+ do {
+ if (mw)
+ __frwr_queue_recovery(mw);
+ mw = rpcrdma_get_mw(r_xprt);
+ if (!mw)
+ return -ENOMEM;
+ } while (mw->r.frmr.fr_state != FRMR_IS_INVALID);
+ frmr = &mw->r.frmr;
+ frmr->fr_state = FRMR_IS_VALID;
+
pageoff = offset_in_page(seg1->mr_offset);
seg1->mr_offset -= pageoff; /* start of page */
seg1->mr_len += pageoff;
len = -pageoff;
if (nsegs > ia->ri_max_frmr_depth)
nsegs = ia->ri_max_frmr_depth;
+
for (page_no = i = 0; i < nsegs;) {
rpcrdma_map_one(device, seg, direction);
pa = seg->mr_dma;
@@ -285,8 +354,6 @@ frwr_op_map(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg,
dprintk("RPC: %s: Using frmr %p to map %d segments (%d bytes)\n",
__func__, mw, i, len);

- frmr->fr_state = FRMR_IS_VALID;
-
memset(&fastreg_wr, 0, sizeof(fastreg_wr));
fastreg_wr.wr_id = (unsigned long)(void *)mw;
fastreg_wr.opcode = IB_WR_FAST_REG_MR;
@@ -298,6 +365,7 @@ frwr_op_map(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg,
fastreg_wr.wr.fast_reg.access_flags = writing ?
IB_ACCESS_REMOTE_WRITE | IB_ACCESS_LOCAL_WRITE :
IB_ACCESS_REMOTE_READ;
+ mr = frmr->fr_mr;
key = (u8)(mr->rkey & 0x000000FF);
ib_update_fast_reg_key(mr, ++key);
fastreg_wr.wr.fast_reg.rkey = mr->rkey;
@@ -307,6 +375,7 @@ frwr_op_map(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg,
if (rc)
goto out_senderr;

+ seg1->rl_mw = mw;
seg1->mr_rkey = mr->rkey;
seg1->mr_base = seg1->mr_dma + pageoff;
seg1->mr_nsegs = i;
@@ -315,10 +384,9 @@ frwr_op_map(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg,

out_senderr:
dprintk("RPC: %s: ib_post_send status %i\n", __func__, rc);
- ib_update_fast_reg_key(mr, --key);
- frmr->fr_state = FRMR_IS_INVALID;
while (i--)
rpcrdma_unmap_one(device, --seg);
+ __frwr_queue_recovery(mw);
return rc;
}

@@ -330,15 +398,19 @@ frwr_op_unmap(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg)
{
struct rpcrdma_mr_seg *seg1 = seg;
struct rpcrdma_ia *ia = &r_xprt->rx_ia;
+ struct rpcrdma_mw *mw = seg1->rl_mw;
struct ib_send_wr invalidate_wr, *bad_wr;
int rc, nsegs = seg->mr_nsegs;

- seg1->rl_mw->r.frmr.fr_state = FRMR_IS_INVALID;
+ dprintk("RPC: %s: FRMR %p\n", __func__, mw);
+
+ seg1->rl_mw = NULL;
+ mw->r.frmr.fr_state = FRMR_IS_INVALID;

memset(&invalidate_wr, 0, sizeof(invalidate_wr));
- invalidate_wr.wr_id = (unsigned long)(void *)seg1->rl_mw;
+ invalidate_wr.wr_id = (unsigned long)(void *)mw;
invalidate_wr.opcode = IB_WR_LOCAL_INV;
- invalidate_wr.ex.invalidate_rkey = seg1->rl_mw->r.frmr.fr_mr->rkey;
+ invalidate_wr.ex.invalidate_rkey = mw->r.frmr.fr_mr->rkey;
DECR_CQCOUNT(&r_xprt->rx_ep);

while (seg1->mr_nsegs--)
@@ -348,12 +420,13 @@ frwr_op_unmap(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg)
read_unlock(&ia->ri_qplock);
if (rc)
goto out_err;
+
+ rpcrdma_put_mw(r_xprt, mw);
return nsegs;

out_err:
- /* Force rpcrdma_buffer_get() to retry */
- seg1->rl_mw->r.frmr.fr_state = FRMR_IS_STALE;
dprintk("RPC: %s: ib_post_send status %i\n", __func__, rc);
+ __frwr_queue_recovery(mw);
return nsegs;
}

@@ -400,6 +473,9 @@ frwr_op_destroy(struct rpcrdma_buffer *buf)
{
struct rpcrdma_mw *r;

+ /* Ensure stale MWs for "buf" are no longer in flight */
+ flush_workqueue(frwr_recovery_wq);
+
while (!list_empty(&buf->rb_all)) {
r = list_entry(buf->rb_all.next, struct rpcrdma_mw, mw_all);
list_del(&r->mw_all);
diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c
index 3f422ca..84ea37d 100644
--- a/net/sunrpc/xprtrdma/rpc_rdma.c
+++ b/net/sunrpc/xprtrdma/rpc_rdma.c
@@ -284,9 +284,6 @@ rpcrdma_create_chunks(struct rpc_rqst *rqst, struct xdr_buf *target,
return (unsigned char *)iptr - (unsigned char *)headerp;

out:
- if (r_xprt->rx_ia.ri_memreg_strategy == RPCRDMA_FRMR)
- return n;
-
for (pos = 0; nchunks--;)
pos += r_xprt->rx_ia.ri_ops->ro_unmap(r_xprt,
&req->rl_segments[pos]);
diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index 3188e36..768bb77 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -1336,12 +1336,11 @@ rpcrdma_buffer_get_frmrs(struct rpcrdma_req *req, struct rpcrdma_buffer *buf,
struct rpcrdma_req *
rpcrdma_buffer_get(struct rpcrdma_buffer *buffers)
{
- struct rpcrdma_ia *ia = rdmab_to_ia(buffers);
- struct list_head stale;
struct rpcrdma_req *req;
unsigned long flags;

spin_lock_irqsave(&buffers->rb_lock, flags);
+
if (buffers->rb_send_index == buffers->rb_max_requests) {
spin_unlock_irqrestore(&buffers->rb_lock, flags);
dprintk("RPC: %s: out of request buffers\n", __func__);
@@ -1360,17 +1359,7 @@ rpcrdma_buffer_get(struct rpcrdma_buffer *buffers)
}
buffers->rb_send_bufs[buffers->rb_send_index++] = NULL;

- INIT_LIST_HEAD(&stale);
- switch (ia->ri_memreg_strategy) {
- case RPCRDMA_FRMR:
- req = rpcrdma_buffer_get_frmrs(req, buffers, &stale);
- break;
- default:
- break;
- }
spin_unlock_irqrestore(&buffers->rb_lock, flags);
- if (!list_empty(&stale))
- rpcrdma_retry_flushed_linv(&stale, buffers);
return req;
}

@@ -1382,18 +1371,10 @@ void
rpcrdma_buffer_put(struct rpcrdma_req *req)
{
struct rpcrdma_buffer *buffers = req->rl_buffer;
- struct rpcrdma_ia *ia = rdmab_to_ia(buffers);
unsigned long flags;

spin_lock_irqsave(&buffers->rb_lock, flags);
rpcrdma_buffer_put_sendbuf(req, buffers);
- switch (ia->ri_memreg_strategy) {
- case RPCRDMA_FRMR:
- rpcrdma_buffer_put_mrs(req, buffers);
- break;
- default:
- break;
- }
spin_unlock_irqrestore(&buffers->rb_lock, flags);
}



2015-05-11 18:03:36

by Chuck Lever

Subject: [PATCH v2 08/16] xprtrdma: Introduce an FRMR recovery workqueue

After a transport disconnect, FRMRs can be left in an undetermined
state. In particular, the MR's rkey is no good.

Currently, FRMRs are fixed up by the transport connect worker, but
that can race with ->ro_unmap if an RPC happens to exit while the
transport connect worker is running.

A better way of dealing with broken FRMRs is to detect them before
they are re-used by ->ro_map. Such FRMRs are either already invalid
or are owned by the sending RPC, and thus no race with ->ro_unmap
is possible.

Introduce a mechanism for handing broken FRMRs to a workqueue to be
reset in a context that is appropriate for allocating resources
(i.e., an ib_alloc_fast_reg_mr() API call).

This mechanism is not yet used, but will be in subsequent patches.

Signed-off-by: Chuck Lever <[email protected]>
Reviewed-by: Steve Wise <[email protected]>
Reviewed-by: Devesh Sharma <[email protected]>
---
net/sunrpc/xprtrdma/frwr_ops.c | 71 ++++++++++++++++++++++++++++++++++++++-
net/sunrpc/xprtrdma/transport.c | 11 +++++-
net/sunrpc/xprtrdma/xprt_rdma.h | 5 +++
3 files changed, 84 insertions(+), 3 deletions(-)
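
Note (illustrative, not part of the patch): the recovery mechanism embeds a work item inside each FRMR and uses container_of() in the worker to get back to the owning structure. The userspace sketch below models that pattern with a toy FIFO in place of a real workqueue. All names (struct work, struct wq, wq_queue(), wq_flush(), the generation counter) are hypothetical; the counter merely stands in for "the MR was replaced and has a fresh rkey".

```c
#include <assert.h>
#include <stddef.h>

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct work {				/* models struct work_struct */
	void (*func)(struct work *);
	struct work *next;
};

enum mr_state { MR_INVALID, MR_VALID, MR_STALE };

struct frmr {				/* hypothetical, simplified FRMR */
	enum mr_state state;
	int generation;			/* stands in for a freshly allocated MR */
	struct work work;
};

struct wq { struct work *head, *tail; };	/* toy FIFO "workqueue" */

static void wq_queue(struct wq *q, struct work *w)
{
	assert(w->func != NULL);
	w->next = NULL;
	if (q->tail)
		q->tail->next = w;
	else
		q->head = w;
	q->tail = w;
}

/* Run every queued item; loosely models waiting in flush_workqueue(). */
static void wq_flush(struct wq *q)
{
	struct work *w;

	while ((w = q->head) != NULL) {
		q->head = w->next;
		if (!q->head)
			q->tail = NULL;
		w->func(w);
	}
}

/* Worker: recover the owning FRMR from the embedded work item. */
static void recovery_worker(struct work *w)
{
	struct frmr *f = container_of(w, struct frmr, work);

	f->generation++;		/* "dereg + alloc" yields a new MR */
	f->state = MR_INVALID;		/* ready for reuse */
}

/* Caller side: may not sleep, so only queue the work. */
static void queue_recovery(struct wq *q, struct frmr *f)
{
	f->work.func = recovery_worker;
	wq_queue(q, &f->work);
}
```

Nothing changes in the FRMR until the queue runs, which is the point of the deferral: the caller that discovers a broken MR does no allocation itself.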

diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c
index 66a85fa..a06d9a3 100644
--- a/net/sunrpc/xprtrdma/frwr_ops.c
+++ b/net/sunrpc/xprtrdma/frwr_ops.c
@@ -17,6 +17,74 @@
# define RPCDBG_FACILITY RPCDBG_TRANS
#endif

+static struct workqueue_struct *frwr_recovery_wq;
+
+#define FRWR_RECOVERY_WQ_FLAGS (WQ_UNBOUND | WQ_MEM_RECLAIM)
+
+int
+frwr_alloc_recovery_wq(void)
+{
+ frwr_recovery_wq = alloc_workqueue("frwr_recovery",
+ FRWR_RECOVERY_WQ_FLAGS, 0);
+ return !frwr_recovery_wq ? -ENOMEM : 0;
+}
+
+void
+frwr_destroy_recovery_wq(void)
+{
+ struct workqueue_struct *wq;
+
+ if (!frwr_recovery_wq)
+ return;
+
+ wq = frwr_recovery_wq;
+ frwr_recovery_wq = NULL;
+ destroy_workqueue(wq);
+}
+
+/* Deferred reset of a single FRMR. Generate a fresh rkey by
+ * replacing the MR.
+ *
+ * There's no recovery if this fails. The FRMR is abandoned, but
+ * remains in rb_all. It will be cleaned up when the transport is
+ * destroyed.
+ */
+static void
+__frwr_recovery_worker(struct work_struct *work)
+{
+ struct rpcrdma_mw *r = container_of(work, struct rpcrdma_mw,
+ r.frmr.fr_work);
+ struct rpcrdma_xprt *r_xprt = r->r.frmr.fr_xprt;
+ unsigned int depth = r_xprt->rx_ia.ri_max_frmr_depth;
+ struct ib_pd *pd = r_xprt->rx_ia.ri_pd;
+
+ if (ib_dereg_mr(r->r.frmr.fr_mr))
+ goto out_fail;
+
+ r->r.frmr.fr_mr = ib_alloc_fast_reg_mr(pd, depth);
+ if (IS_ERR(r->r.frmr.fr_mr))
+ goto out_fail;
+
+ dprintk("RPC: %s: recovered FRMR %p\n", __func__, r);
+ r->r.frmr.fr_state = FRMR_IS_INVALID;
+ rpcrdma_put_mw(r_xprt, r);
+ return;
+
+out_fail:
+ pr_warn("RPC: %s: FRMR %p unrecovered\n",
+ __func__, r);
+}
+
+/* A broken MR was discovered in a context that can't sleep.
+ * Defer recovery to the recovery worker.
+ */
+static void
+__frwr_queue_recovery(struct rpcrdma_mw *r)
+{
+ INIT_WORK(&r->r.frmr.fr_work, __frwr_recovery_worker);
+ queue_work(frwr_recovery_wq, &r->r.frmr.fr_work);
+}
+
static int
__frwr_init(struct rpcrdma_mw *r, struct ib_pd *pd, struct ib_device *device,
unsigned int depth)
@@ -128,7 +196,7 @@ frwr_sendcompletion(struct ib_wc *wc)

/* WARNING: Only wr_id and status are reliable at this point */
r = (struct rpcrdma_mw *)(unsigned long)wc->wr_id;
- dprintk("RPC: %s: frmr %p (stale), status %d\n",
+ pr_warn("RPC: %s: frmr %p flushed, status %d\n",
__func__, r, wc->status);
r->r.frmr.fr_state = FRMR_IS_STALE;
}
@@ -165,6 +233,7 @@ frwr_op_init(struct rpcrdma_xprt *r_xprt)
list_add(&r->mw_list, &buf->rb_mws);
list_add(&r->mw_all, &buf->rb_all);
r->mw_sendcompletion = frwr_sendcompletion;
+ r->r.frmr.fr_xprt = r_xprt;
}

return 0;
diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c
index 7c12556..6f8943c 100644
--- a/net/sunrpc/xprtrdma/transport.c
+++ b/net/sunrpc/xprtrdma/transport.c
@@ -731,17 +731,24 @@ static void __exit xprt_rdma_cleanup(void)
if (rc)
dprintk("RPC: %s: xprt_unregister returned %i\n",
__func__, rc);
+
+ frwr_destroy_recovery_wq();
}

static int __init xprt_rdma_init(void)
{
int rc;

- rc = xprt_register_transport(&xprt_rdma);
-
+ rc = frwr_alloc_recovery_wq();
if (rc)
return rc;

+ rc = xprt_register_transport(&xprt_rdma);
+ if (rc) {
+ frwr_destroy_recovery_wq();
+ return rc;
+ }
+
dprintk("RPCRDMA Module Init, register RPC RDMA transport\n");

dprintk("Defaults:\n");
diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index 5b801d5..c5862a4 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -203,6 +203,8 @@ struct rpcrdma_frmr {
struct ib_fast_reg_page_list *fr_pgl;
struct ib_mr *fr_mr;
enum rpcrdma_frmr_state fr_state;
+ struct work_struct fr_work;
+ struct rpcrdma_xprt *fr_xprt;
};

struct rpcrdma_mw {
@@ -427,6 +429,9 @@ void rpcrdma_free_regbuf(struct rpcrdma_ia *,

unsigned int rpcrdma_max_segments(struct rpcrdma_xprt *);

+int frwr_alloc_recovery_wq(void);
+void frwr_destroy_recovery_wq(void);
+
/*
* Wrappers for chunk registration, shared by read/write chunk code.
*/
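
Note (illustrative, not part of the patch): the transport.c hunk above reorders module init so the recovery workqueue exists before the transport is registered, and tears the workqueue down again if registration fails. A minimal userspace sketch of that ordering follows; alloc_wq(), destroy_wq(), register_xprt() and the two flags are hypothetical stand-ins, and the failure codes are chosen arbitrarily for the model.

```c
#include <assert.h>
#include <errno.h>

static int wq_allocated;	/* models frwr_recovery_wq != NULL */
static int registered;		/* models the transport being registered */

static int alloc_wq(int fail)
{
	if (fail)
		return -ENOMEM;
	wq_allocated = 1;
	return 0;
}

static void destroy_wq(void)
{
	assert(wq_allocated);
	wq_allocated = 0;
}

static int register_xprt(int fail)
{
	if (fail)
		return -EEXIST;	/* arbitrary error for the model */
	registered = 1;
	return 0;
}

/* Allocate first, register second, unwind in reverse on failure. */
static int module_init_model(int fail_wq, int fail_reg)
{
	int rc;

	rc = alloc_wq(fail_wq);
	if (rc)
		return rc;

	rc = register_xprt(fail_reg);
	if (rc) {
		destroy_wq();
		return rc;
	}
	return 0;
}
```

Either failure leaves no workqueue behind and no transport registered, so a failed module load has nothing left to clean up.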


2015-05-12 18:12:42

by Anna Schumaker

Subject: Re: [PATCH v2 02/16] xprtrdma: Warn when there are orphaned IB objects

Hi Chuck,

On 05/11/2015 02:02 PM, Chuck Lever wrote:
> WARN during transport destruction if ib_dealloc_pd() fails. This is
> a sign that xprtrdma orphaned one or more RDMA API objects at some
> point, which can pin lower layer kernel modules and cause shutdown
> to hang.

I'm curious, what would cause an RDMA object to get orphaned in the first place? Is there any way to prevent that?

Anna

>
> Signed-off-by: Chuck Lever <[email protected]>
> Reviewed-by: Steve Wise <[email protected]>
> Reviewed-by: Sagi Grimberg <[email protected]>
> Reviewed-by: Devesh Sharma <[email protected]>
> ---
> net/sunrpc/xprtrdma/verbs.c | 10 +++++-----
> 1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
> index 4870d27..51900e6 100644
> --- a/net/sunrpc/xprtrdma/verbs.c
> +++ b/net/sunrpc/xprtrdma/verbs.c
> @@ -702,17 +702,17 @@ rpcrdma_ia_close(struct rpcrdma_ia *ia)
> dprintk("RPC: %s: ib_dereg_mr returned %i\n",
> __func__, rc);
> }
> +
> if (ia->ri_id != NULL && !IS_ERR(ia->ri_id)) {
> if (ia->ri_id->qp)
> rdma_destroy_qp(ia->ri_id);
> rdma_destroy_id(ia->ri_id);
> ia->ri_id = NULL;
> }
> - if (ia->ri_pd != NULL && !IS_ERR(ia->ri_pd)) {
> - rc = ib_dealloc_pd(ia->ri_pd);
> - dprintk("RPC: %s: ib_dealloc_pd returned %i\n",
> - __func__, rc);
> - }
> +
> + /* If the pd is still busy, xprtrdma missed freeing a resource */
> + if (ia->ri_pd && !IS_ERR(ia->ri_pd))
> + WARN_ON(ib_dealloc_pd(ia->ri_pd));
> }
>
> /*
>


2015-05-12 18:14:05

by Anna Schumaker

Subject: Re: [PATCH v2 02/16] xprtrdma: Warn when there are orphaned IB objects

Hi Chuck,

On 05/11/2015 02:02 PM, Chuck Lever wrote:
> WARN during transport destruction if ib_dealloc_pd() fails. This is
> a sign that xprtrdma orphaned one or more RDMA API objects at some
> point, which can pin lower layer kernel modules and cause shutdown
> to hang.

I'm curious, what would cause the API objects to get orphaned in the first place? Is there any way to prevent it?

Anna

>
> Signed-off-by: Chuck Lever <[email protected]>
> Reviewed-by: Steve Wise <[email protected]>
> Reviewed-by: Sagi Grimberg <[email protected]>
> Reviewed-by: Devesh Sharma <[email protected]>
> ---
> net/sunrpc/xprtrdma/verbs.c | 10 +++++-----
> 1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
> index 4870d27..51900e6 100644
> --- a/net/sunrpc/xprtrdma/verbs.c
> +++ b/net/sunrpc/xprtrdma/verbs.c
> @@ -702,17 +702,17 @@ rpcrdma_ia_close(struct rpcrdma_ia *ia)
> dprintk("RPC: %s: ib_dereg_mr returned %i\n",
> __func__, rc);
> }
> +
> if (ia->ri_id != NULL && !IS_ERR(ia->ri_id)) {
> if (ia->ri_id->qp)
> rdma_destroy_qp(ia->ri_id);
> rdma_destroy_id(ia->ri_id);
> ia->ri_id = NULL;
> }
> - if (ia->ri_pd != NULL && !IS_ERR(ia->ri_pd)) {
> - rc = ib_dealloc_pd(ia->ri_pd);
> - dprintk("RPC: %s: ib_dealloc_pd returned %i\n",
> - __func__, rc);
> - }
> +
> + /* If the pd is still busy, xprtrdma missed freeing a resource */
> + if (ia->ri_pd && !IS_ERR(ia->ri_pd))
> + WARN_ON(ib_dealloc_pd(ia->ri_pd));
> }
>
> /*
>


2015-05-12 18:19:36

by Chuck Lever

Subject: Re: [PATCH v2 02/16] xprtrdma: Warn when there are orphaned IB objects


On May 12, 2015, at 2:12 PM, Anna Schumaker <[email protected]> wrote:

> Hi Chuck,
>
> On 05/11/2015 02:02 PM, Chuck Lever wrote:
>> WARN during transport destruction if ib_dealloc_pd() fails. This is
>> a sign that xprtrdma orphaned one or more RDMA API objects at some
>> point, which can pin lower layer kernel modules and cause shutdown
>> to hang.
>
> I'm curious, what would cause an RDMA object to get orphaned in the first place?

A leaked object means there's a software bug in the API consumer, which
is xprtrdma in this case. xprtrdma is supposed to track and clean up
every object it creates.

> Is there any way to prevent that?

The usual thing to do is find and fix the bug that allowed the leak.
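
Editorial note (illustrative, not from the original message): the check in the patch works because a protection domain cannot be freed while other objects still depend on it. The userspace sketch below models that idea with a simple use count. It assumes the dealloc call reports "busy" when dependents remain; struct pd, struct mr, mr_alloc(), mr_dereg() and pd_dealloc() are hypothetical names, not the verbs API.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct pd { int usecnt; };	/* hypothetical stand-in for struct ib_pd */
struct mr { struct pd *pd; };	/* any object allocated against the PD */

static void mr_alloc(struct mr *mr, struct pd *pd)
{
	mr->pd = pd;
	pd->usecnt++;
}

static void mr_dereg(struct mr *mr)
{
	assert(mr->pd != NULL && mr->pd->usecnt > 0);
	mr->pd->usecnt--;
	mr->pd = NULL;
}

/* Refuse to free a PD that still has dependents. */
static int pd_dealloc(struct pd *pd)
{
	return pd->usecnt ? -EBUSY : 0;
}
```

In this model, forgetting a single mr_dereg() makes pd_dealloc() fail at teardown, which is the condition the new WARN_ON() is meant to expose.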

> Anna
>
>>
>> Signed-off-by: Chuck Lever <[email protected]>
>> Reviewed-by: Steve Wise <[email protected]>
>> Reviewed-by: Sagi Grimberg <[email protected]>
>> Reviewed-by: Devesh Sharma <[email protected]>
>> ---
>> net/sunrpc/xprtrdma/verbs.c | 10 +++++-----
>> 1 file changed, 5 insertions(+), 5 deletions(-)
>>
>> diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
>> index 4870d27..51900e6 100644
>> --- a/net/sunrpc/xprtrdma/verbs.c
>> +++ b/net/sunrpc/xprtrdma/verbs.c
>> @@ -702,17 +702,17 @@ rpcrdma_ia_close(struct rpcrdma_ia *ia)
>> dprintk("RPC: %s: ib_dereg_mr returned %i\n",
>> __func__, rc);
>> }
>> +
>> if (ia->ri_id != NULL && !IS_ERR(ia->ri_id)) {
>> if (ia->ri_id->qp)
>> rdma_destroy_qp(ia->ri_id);
>> rdma_destroy_id(ia->ri_id);
>> ia->ri_id = NULL;
>> }
>> - if (ia->ri_pd != NULL && !IS_ERR(ia->ri_pd)) {
>> - rc = ib_dealloc_pd(ia->ri_pd);
>> - dprintk("RPC: %s: ib_dealloc_pd returned %i\n",
>> - __func__, rc);
>> - }
>> +
>> + /* If the pd is still busy, xprtrdma missed freeing a resource */
>> + if (ia->ri_pd && !IS_ERR(ia->ri_pd))
>> + WARN_ON(ib_dealloc_pd(ia->ri_pd));
>> }
>>
>> /*
>>

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com




2015-05-20 17:51:08

by Devesh Sharma

Subject: RE: [PATCH v2 00/16] NFS/RDMA patches proposed for 4.2

Medusa test passes with average load.

Tested-by: Devesh Sharma <[email protected]>

> -----Original Message-----
> From: [email protected] [mailto:linux-rdma-
> [email protected]] On Behalf Of Chuck Lever
> Sent: Monday, May 11, 2015 11:32 PM
> To: [email protected]; [email protected]
> Subject: [PATCH v2 00/16] NFS/RDMA patches proposed for 4.2
>
> I'd like these patches to be considered for merging upstream. This patch
> series includes:
>
> - JIT allocation of rpcrdma_mw structures
> - Break-up of rb_lock
> - Reduction of how many rpcrdma_mw structs are needed per transport
>
> These are pre-requisites for increasing the RPC slot count and r/wsize on
> RPC/RDMA transports, and provide scalability benefits even on their own.
> And:
>
> - A generic transport fault injector
>
> This is useful to discover regressions in logic that handles transport
> reconnection.
>
> You can find these in my git repo in the "nfs-rdma-for-4.2" topic branch.
> See:
>
> git://git.linux-nfs.org/projects/cel/cel-2.6.git
>
>
> Changes since v1:
>
> - Rebased on 4.1-rc3
> - Transport fault injector controlled from debugfs rather than /proc
> - Transport fault injector works for all transport types
> - bc_send() clean up suggested by Christoph Hellwig
> - Added Reviewed-by: tags. Many thanks to reviewers!
> - Addressed all review comments but one: Sagi's comment about
> ri_device remains unresolved.
>
> ---
>
> Chuck Lever (16):
> SUNRPC: Transport fault injection
> xprtrdma: Warn when there are orphaned IB objects
> xprtrdma: Replace rpcrdma_rep::rr_buffer with rr_rxprt
> xprtrdma: Remove rr_func
> xprtrdma: Use ib_device pointer safely
> xprtrdma: Introduce helpers for allocating MWs
> xprtrdma: Acquire FMRs in rpcrdma_fmr_register_external()
> xprtrdma: Introduce an FRMR recovery workqueue
> xprtrdma: Acquire MRs in rpcrdma_register_external()
> xprtrdma: Remove unused LOCAL_INV recovery logic
> xprtrdma: Remove ->ro_reset
> xprtrdma: Remove rpcrdma_ia::ri_memreg_strategy
> xprtrdma: Split rb_lock
> xprtrdma: Stack relief in fmr_op_map()
> xprtrdma: Reduce per-transport MR allocation
> SUNRPC: Clean up bc_send()
>
>
> include/linux/sunrpc/bc_xprt.h | 1
> include/linux/sunrpc/xprt.h | 19 +++
> include/linux/sunrpc/xprtrdma.h | 3
> net/sunrpc/Makefile | 2
> net/sunrpc/bc_svc.c | 63 ---------
> net/sunrpc/clnt.c | 1
> net/sunrpc/debugfs.c | 77 +++++++++++
> net/sunrpc/svc.c | 33 ++++-
> net/sunrpc/xprt.c | 2
> net/sunrpc/xprtrdma/fmr_ops.c | 120 +++++++++++------
> net/sunrpc/xprtrdma/frwr_ops.c | 227 +++++++++++++++++++++++---------
> net/sunrpc/xprtrdma/physical_ops.c | 14 --
> net/sunrpc/xprtrdma/rpc_rdma.c | 8 -
> net/sunrpc/xprtrdma/transport.c | 30 +++-
> net/sunrpc/xprtrdma/verbs.c | 257 +++++++++---------------------------
> net/sunrpc/xprtrdma/xprt_rdma.h | 38 ++++-
> net/sunrpc/xprtsock.c | 10 +
> 17 files changed, 492 insertions(+), 413 deletions(-)
> delete mode 100644 net/sunrpc/bc_svc.c
>
> --
> Chuck Lever

2015-05-26 15:28:41

by Doug Ledford

Subject: Re: [PATCH v2 00/16] NFS/RDMA patches proposed for 4.2

On Mon, 2015-05-11 at 14:02 -0400, Chuck Lever wrote:
> I'd like these patches to be considered for merging upstream. This
> patch series includes:
>
> - JIT allocation of rpcrdma_mw structures
> - Break-up of rb_lock
> - Reduction of how many rpcrdma_mw structs are needed per transport
>
> These are pre-requisites for increasing the RPC slot count and
> r/wsize on RPC/RDMA transports, and provide scalability benefits
> even on their own. And:
>
> - A generic transport fault injector
>
> This is useful to discover regressions in logic that handles
> transport reconnection.
>
> You can find these in my git repo in the "nfs-rdma-for-4.2" topic
> branch. See:
>
> git://git.linux-nfs.org/projects/cel/cel-2.6.git

I assume you are planning on this going in through the nfs tree. As
such, I'm planning on removing this patchset from the linux-rdma
patchworks site.

However, I'll add this for the series:

Reviewed-by: Doug Ledford <[email protected]>

--
Doug Ledford <[email protected]>
GPG KeyID: 0E572FDD



2015-05-26 15:36:56

by Chuck Lever

Subject: Re: [PATCH v2 00/16] NFS/RDMA patches proposed for 4.2


On May 26, 2015, at 11:28 AM, Doug Ledford <[email protected]> wrote:

> On Mon, 2015-05-11 at 14:02 -0400, Chuck Lever wrote:
>> I'd like these patches to be considered for merging upstream. This
>> patch series includes:
>>
>> - JIT allocation of rpcrdma_mw structures
>> - Break-up of rb_lock
>> - Reduction of how many rpcrdma_mw structs are needed per transport
>>
>> These are pre-requisites for increasing the RPC slot count and
>> r/wsize on RPC/RDMA transports, and provide scalability benefits
>> even on their own. And:
>>
>> - A generic transport fault injector
>>
>> This is useful to discover regressions in logic that handles
>> transport reconnection.
>>
>> You can find these in my git repo in the "nfs-rdma-for-4.2" topic
>> branch. See:
>>
>> git://git.linux-nfs.org/projects/cel/cel-2.6.git
>
> I assume you are planning on this going in through the nfs tree. As
> such, I'm planning on removing this patchset from the linux-rdma
> patchworks site.

Yes, patches to net/sunrpc/xprtrdma/ will typically go through
Anna or Bruce. I post to linux-rdma for review of RDMA-related
changes.

> However, I'll add this for the series:
>
> Reviewed-by: Doug Ledford <[email protected]>

Thanks, I will post a refresh today.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com