2019-04-10 20:06:36

by Chuck Lever III

Subject: [PATCH v1 00/19] Proposed NFS/RDMA patches for v5.2

For review, this is a series of minor clean-ups, bug fixes, and
performance optimizations.

See also:

http://git.linux-nfs.org/?p=cel/cel-2.6.git;a=shortlog;h=refs/heads/nfs-for-5.2

---

Chuck Lever (19):
SUNRPC: Avoid digging into the ATOMIC pool
xprtrdma: Fix an frwr_map recovery nit
xprtrdma: Defer completion only when local invalidation is needed
xprtrdma: Clean up rpcrdma_create_req()
xprtrdma: Clean up rpcrdma_create_rep() and rpcrdma_destroy_rep()
xprtrdma: rpcrdma_regbuf alignment
xprtrdma: Allocate req's regbufs at xprt create time
xprtrdma: De-duplicate "allocate new, free old regbuf"
xprtrdma: Clean up regbuf helpers
xprtrdma: Backchannel can use GFP_KERNEL allocations
xprtrdma: Increase maximum number of backchannel requests
xprtrdma: Trace marshaling failures
xprtrdma: Clean up sendctx functions
xprtrdma: More Send completion batching
xprtrdma: Eliminate rpcrdma_ia::ri_device
SUNRPC: Update comments based on recent changes
xprtrdma: Remove rpcrdma_create_data_internal::rsize and wsize
xprtrdma: Aggregate the inline settings in struct rpcrdma_ep
xprtrdma: Eliminate struct rpcrdma_create_data_internal


include/trace/events/rpcrdma.h | 27 +++
net/sunrpc/socklib.c | 2
net/sunrpc/xprt.c | 4
net/sunrpc/xprtrdma/backchannel.c | 120 +++++---------
net/sunrpc/xprtrdma/frwr_ops.c | 40 ++---
net/sunrpc/xprtrdma/rpc_rdma.c | 146 +++++++++--------
net/sunrpc/xprtrdma/transport.c | 103 ++----------
net/sunrpc/xprtrdma/verbs.c | 327 +++++++++++++++++++++----------------
net/sunrpc/xprtrdma/xprt_rdma.h | 114 ++++++-------
9 files changed, 430 insertions(+), 453 deletions(-)

--
Chuck Lever


2019-04-10 20:06:40

by Chuck Lever III

Subject: [PATCH v1 01/19] SUNRPC: Avoid digging into the ATOMIC pool

Page allocation requests made when the SPARSE_PAGES flag is set are
allowed to fail, and are not critical. No need to spend a rare
resource.

Signed-off-by: Chuck Lever <[email protected]>
---
net/sunrpc/socklib.c | 2 +-
net/sunrpc/xprtrdma/rpc_rdma.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/sunrpc/socklib.c b/net/sunrpc/socklib.c
index 7e55cfc..9faea12 100644
--- a/net/sunrpc/socklib.c
+++ b/net/sunrpc/socklib.c
@@ -106,7 +106,7 @@ static size_t xdr_skb_read_and_csum_bits(struct xdr_skb_reader *desc, void *to,
/* ACL likes to be lazy in allocating pages - ACLs
* are small by default but can get huge. */
if ((xdr->flags & XDRBUF_SPARSE_PAGES) && *ppage == NULL) {
- *ppage = alloc_page(GFP_ATOMIC);
+ *ppage = alloc_page(GFP_NOWAIT | __GFP_NOWARN);
if (unlikely(*ppage == NULL)) {
if (copied == 0)
copied = -ENOMEM;
diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c
index 6c1fb27..b759b16 100644
--- a/net/sunrpc/xprtrdma/rpc_rdma.c
+++ b/net/sunrpc/xprtrdma/rpc_rdma.c
@@ -238,7 +238,7 @@ static bool rpcrdma_results_inline(struct rpcrdma_xprt *r_xprt,
*/
if (unlikely(xdrbuf->flags & XDRBUF_SPARSE_PAGES)) {
if (!*ppages)
- *ppages = alloc_page(GFP_ATOMIC);
+ *ppages = alloc_page(GFP_NOWAIT | __GFP_NOWARN);
if (!*ppages)
return -ENOBUFS;
}
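
For context: GFP_ATOMIC grants access to the kernel's emergency
memory reserves, while GFP_NOWAIT only forbids sleeping and never
dips into those reserves; __GFP_NOWARN suppresses the
allocation-failure warning. A minimal sketch of the best-effort
pattern (illustrative only, not the exact kernel code):

	#include <linux/gfp.h>

	/* Best effort: never sleep, never touch the atomic reserves,
	 * and stay quiet if the allocation fails.
	 */
	static struct page *alloc_optional_page(void)
	{
		return alloc_page(GFP_NOWAIT | __GFP_NOWARN);
	}

A NULL return is treated as a recoverable condition, which is how the
two XDRBUF_SPARSE_PAGES call sites above handle it.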


2019-04-10 20:06:45

by Chuck Lever III

Subject: [PATCH v1 02/19] xprtrdma: Fix an frwr_map recovery nit

After a DMA map failure in frwr_map, mark the MR so that recycling
won't attempt to DMA unmap it.

Signed-off-by: Chuck Lever <[email protected]>
Fixes: e2f34e26710b ("xprtrdma: Yet another double DMA-unmap")
---
net/sunrpc/xprtrdma/frwr_ops.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c
index 52cb6c1..a2a2e01 100644
--- a/net/sunrpc/xprtrdma/frwr_ops.c
+++ b/net/sunrpc/xprtrdma/frwr_ops.c
@@ -466,7 +466,7 @@ struct rpcrdma_mr_seg *frwr_map(struct rpcrdma_xprt *r_xprt,
return seg;

out_dmamap_err:
- frwr->fr_state = FRWR_IS_INVALID;
+ mr->mr_dir = DMA_NONE;
trace_xprtrdma_frwr_sgerr(mr, i);
rpcrdma_mr_put(mr);
return ERR_PTR(-EIO);
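
This works because the MR release/recycle path keys off mr->mr_dir:
an MR whose direction is DMA_NONE is skipped rather than unmapped a
second time. Roughly (a paraphrase of that guard, not the exact
code):

	/* Assumed shape of the guard in the MR recycle path: an MR
	 * that was never DMA-mapped, or whose mapping failed, stays
	 * at DMA_NONE and is not unmapped.
	 */
	if (mr->mr_dir != DMA_NONE) {
		ib_dma_unmap_sg(ia->ri_device, mr->mr_sg,
				mr->mr_nents, mr->mr_dir);
		mr->mr_dir = DMA_NONE;
	}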


2019-04-10 20:06:51

by Chuck Lever III

Subject: [PATCH v1 03/19] xprtrdma: Defer completion only when local invalidation is needed

While looking at another issue, I noticed that deferred completion
happens to run on the same CPU as Receive completion, thanks to the
fact that the deferred completion workqueue is BOUND. That suggests
there's really no benefit to deferring completion unless it will
have to context switch while waiting for LocalInv to complete.

A somewhat non-intuitive side benefit of this change is that there
are fewer waits for Send completions. Now that this wait is always
done in the Reply handler (a single process), it serializes
subsequent replies. Send completions are batched, so waiting for one
Send completion means waiting for all outstanding Send completions
at once. When the Reply handler gets to subsequent replies, waiting
(and the context switch that goes with it) is less likely to be
needed.

Measurements of IOPS throughput without deferred completion show an
improvement of several percent, and latency is just as good or
slightly better for 4KB 100% read and 8KB 70% read / 30% write.

Signed-off-by: Chuck Lever <[email protected]>
---
net/sunrpc/xprtrdma/rpc_rdma.c | 31 +++++++++++++++++++++++++------
net/sunrpc/xprtrdma/verbs.c | 8 ++++----
net/sunrpc/xprtrdma/xprt_rdma.h | 1 -
3 files changed, 29 insertions(+), 11 deletions(-)

diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c
index b759b16..c3bd18a 100644
--- a/net/sunrpc/xprtrdma/rpc_rdma.c
+++ b/net/sunrpc/xprtrdma/rpc_rdma.c
@@ -1226,7 +1226,7 @@ static int decode_reply_chunk(struct xdr_stream *xdr, u32 *length)
* RPC completion while holding the transport lock to ensure
* the rep, rqst, and rq_task pointers remain stable.
*/
-void rpcrdma_complete_rqst(struct rpcrdma_rep *rep)
+static void rpcrdma_complete_rqst(struct rpcrdma_rep *rep)
{
struct rpcrdma_xprt *r_xprt = rep->rr_rxprt;
struct rpc_xprt *xprt = &r_xprt->rx_xprt;
@@ -1268,6 +1268,12 @@ void rpcrdma_complete_rqst(struct rpcrdma_rep *rep)
goto out;
}

+/**
+ * rpcrdma_release_rqst - Release hardware resources
+ * @r_xprt: controlling transport
+ * @req: request with resources to release
+ *
+ */
void rpcrdma_release_rqst(struct rpcrdma_xprt *r_xprt, struct rpcrdma_req *req)
{
/* Invalidate and unmap the data payloads before waking
@@ -1295,7 +1301,11 @@ void rpcrdma_release_rqst(struct rpcrdma_xprt *r_xprt, struct rpcrdma_req *req)
}
}

-/* Reply handling runs in the poll worker thread. Anything that
+/**
+ * rpcrdma_deferred_completion
+ * @work: work struct embedded in an rpcrdma_rep
+ *
+ * Reply handling runs in the poll worker thread. Anything that
* might wait is deferred to a separate workqueue.
*/
void rpcrdma_deferred_completion(struct work_struct *work)
@@ -1306,13 +1316,14 @@ void rpcrdma_deferred_completion(struct work_struct *work)
struct rpcrdma_xprt *r_xprt = rep->rr_rxprt;

trace_xprtrdma_defer_cmp(rep);
- if (rep->rr_wc_flags & IB_WC_WITH_INVALIDATE)
- frwr_reminv(rep, &req->rl_registered);
+
rpcrdma_release_rqst(r_xprt, req);
rpcrdma_complete_rqst(rep);
}

-/* Process received RPC/RDMA messages.
+/**
+ * rpcrdma_reply_handler - Process received RPC/RDMA messages
+ * @rep: Incoming rpcrdma_rep object to process
*
* Errors must result in the RPC task either being awakened, or
* allowed to timeout, to discover the errors at that time.
@@ -1375,7 +1386,15 @@ void rpcrdma_reply_handler(struct rpcrdma_rep *rep)
clear_bit(RPCRDMA_REQ_F_PENDING, &req->rl_flags);

trace_xprtrdma_reply(rqst->rq_task, rep, req, credits);
- queue_work(buf->rb_completion_wq, &rep->rr_work);
+
+ if (rep->rr_wc_flags & IB_WC_WITH_INVALIDATE)
+ frwr_reminv(rep, &req->rl_registered);
+ if (!list_empty(&req->rl_registered)) {
+ queue_work(buf->rb_completion_wq, &rep->rr_work);
+ } else {
+ rpcrdma_release_rqst(r_xprt, req);
+ rpcrdma_complete_rqst(rep);
+ }
return;

out_badversion:
diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index 30cfc0e..fe005c6 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -1106,10 +1106,10 @@ struct rpcrdma_req *
if (rc)
goto out;

- buf->rb_completion_wq = alloc_workqueue("rpcrdma-%s",
- WQ_MEM_RECLAIM | WQ_HIGHPRI,
- 0,
- r_xprt->rx_xprt.address_strings[RPC_DISPLAY_ADDR]);
+ buf->rb_completion_wq =
+ alloc_workqueue("rpcrdma-%s",
+ WQ_MEM_RECLAIM | WQ_HIGHPRI | WQ_UNBOUND, 0,
+ r_xprt->rx_xprt.address_strings[RPC_DISPLAY_ADDR]);
if (!buf->rb_completion_wq) {
rc = -ENOMEM;
goto out;
diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index 10f6593..6a49597 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -613,7 +613,6 @@ int rpcrdma_prepare_send_sges(struct rpcrdma_xprt *r_xprt,
void rpcrdma_unmap_sendctx(struct rpcrdma_sendctx *sc);
int rpcrdma_marshal_req(struct rpcrdma_xprt *r_xprt, struct rpc_rqst *rqst);
void rpcrdma_set_max_header_sizes(struct rpcrdma_xprt *);
-void rpcrdma_complete_rqst(struct rpcrdma_rep *rep);
void rpcrdma_reply_handler(struct rpcrdma_rep *rep);
void rpcrdma_release_rqst(struct rpcrdma_xprt *r_xprt,
struct rpcrdma_req *req);
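
Condensed from the hunks above, the tail of the Reply handler now
amounts to a two-way dispatch (comments added here for clarity):

	if (rep->rr_wc_flags & IB_WC_WITH_INVALIDATE)
		frwr_reminv(rep, &req->rl_registered);
	if (!list_empty(&req->rl_registered)) {
		/* LocalInv still needed: this can sleep, so hand the
		 * reply to the (now WQ_UNBOUND) completion workqueue.
		 */
		queue_work(buf->rb_completion_wq, &rep->rr_work);
	} else {
		/* Everything was remotely invalidated: complete the
		 * RPC right here in the Receive poll context.
		 */
		rpcrdma_release_rqst(r_xprt, req);
		rpcrdma_complete_rqst(rep);
	}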


2019-04-10 20:06:56

by Chuck Lever III

Subject: [PATCH v1 04/19] xprtrdma: Clean up rpcrdma_create_req()

Eventually, I'd like to invoke rpcrdma_create_req() during the
call_reserve step. Memory allocation there probably needs to use
GFP_NOIO. Therefore a set of GFP flags needs to be passed in.

As an additional clean up, just return a pointer or NULL, because
the only error return code here is -ENOMEM.

Lastly, clean up the function names to be consistent with the
pattern: "rpcrdma" _ object-type _ action

Signed-off-by: Chuck Lever <[email protected]>
---
net/sunrpc/xprtrdma/backchannel.c | 6 +++---
net/sunrpc/xprtrdma/verbs.c | 29 ++++++++++++++++-------------
net/sunrpc/xprtrdma/xprt_rdma.h | 3 ++-
3 files changed, 21 insertions(+), 17 deletions(-)

diff --git a/net/sunrpc/xprtrdma/backchannel.c b/net/sunrpc/xprtrdma/backchannel.c
index d79b18c..713961a 100644
--- a/net/sunrpc/xprtrdma/backchannel.c
+++ b/net/sunrpc/xprtrdma/backchannel.c
@@ -31,9 +31,9 @@ static int rpcrdma_bc_setup_reqs(struct rpcrdma_xprt *r_xprt,
struct rpcrdma_regbuf *rb;
size_t size;

- req = rpcrdma_create_req(r_xprt);
- if (IS_ERR(req))
- return PTR_ERR(req);
+ req = rpcrdma_req_create(r_xprt, GFP_KERNEL);
+ if (!req)
+ return -ENOMEM;
rqst = &req->rl_slot;

rqst->rq_xprt = xprt;
diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index fe005c6..82ea298 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -996,22 +996,27 @@ struct rpcrdma_sendctx *rpcrdma_sendctx_get_locked(struct rpcrdma_buffer *buf)
rpcrdma_mrs_create(r_xprt);
}

-struct rpcrdma_req *
-rpcrdma_create_req(struct rpcrdma_xprt *r_xprt)
+/**
+ * rpcrdma_req_create - Allocate an rpcrdma_req object
+ * @r_xprt: controlling r_xprt
+ * @flags: GFP flags passed to memory allocators
+ *
+ * Returns an allocated and fully initialized rpcrdma_req or NULL.
+ */
+struct rpcrdma_req *rpcrdma_req_create(struct rpcrdma_xprt *r_xprt, gfp_t flags)
{
struct rpcrdma_buffer *buffer = &r_xprt->rx_buf;
struct rpcrdma_regbuf *rb;
struct rpcrdma_req *req;

- req = kzalloc(sizeof(*req), GFP_KERNEL);
+ req = kzalloc(sizeof(*req), flags);
if (req == NULL)
- return ERR_PTR(-ENOMEM);
+ return NULL;

- rb = rpcrdma_alloc_regbuf(RPCRDMA_HDRBUF_SIZE,
- DMA_TO_DEVICE, GFP_KERNEL);
+ rb = rpcrdma_alloc_regbuf(RPCRDMA_HDRBUF_SIZE, DMA_TO_DEVICE, flags);
if (IS_ERR(rb)) {
kfree(req);
- return ERR_PTR(-ENOMEM);
+ return NULL;
}
req->rl_rdmabuf = rb;
xdr_buf_init(&req->rl_hdrbuf, rb->rg_base, rdmab_length(rb));
@@ -1086,16 +1091,14 @@ struct rpcrdma_req *

INIT_LIST_HEAD(&buf->rb_send_bufs);
INIT_LIST_HEAD(&buf->rb_allreqs);
+
+ rc = -ENOMEM;
for (i = 0; i < buf->rb_max_requests; i++) {
struct rpcrdma_req *req;

- req = rpcrdma_create_req(r_xprt);
- if (IS_ERR(req)) {
- dprintk("RPC: %s: request buffer %d alloc"
- " failed\n", __func__, i);
- rc = PTR_ERR(req);
+ req = rpcrdma_req_create(r_xprt, GFP_KERNEL);
+ if (!req)
goto out;
- }
list_add(&req->rl_list, &buf->rb_send_bufs);
}

diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index 6a49597..ad6d8df 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -528,7 +528,8 @@ int rpcrdma_ep_post(struct rpcrdma_ia *, struct rpcrdma_ep *,
/*
* Buffer calls - xprtrdma/verbs.c
*/
-struct rpcrdma_req *rpcrdma_create_req(struct rpcrdma_xprt *);
+struct rpcrdma_req *rpcrdma_req_create(struct rpcrdma_xprt *r_xprt,
+ gfp_t flags);
void rpcrdma_req_destroy(struct rpcrdma_req *req);
int rpcrdma_buffer_create(struct rpcrdma_xprt *);
void rpcrdma_buffer_destroy(struct rpcrdma_buffer *);
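
A sketch of the intended future caller (hypothetical; the
call_reserve hook-up is not part of this series):

	/* Hypothetical only: allocate a req from the call_reserve
	 * path, where GFP_NOIO prevents recursing into the
	 * filesystem/I/O layers during memory reclaim.
	 */
	req = rpcrdma_req_create(r_xprt, GFP_NOIO);
	if (!req)
		return -ENOMEM;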


2019-04-10 20:07:01

by Chuck Lever III

Subject: [PATCH v1 05/19] xprtrdma: Clean up rpcrdma_create_rep() and rpcrdma_destroy_rep()

For code legibility, clean up the function names to be consistent
with the pattern: "rpcrdma" _ object-type _ action

Signed-off-by: Chuck Lever <[email protected]>
---
net/sunrpc/xprtrdma/verbs.c | 21 ++++++++-------------
1 file changed, 8 insertions(+), 13 deletions(-)

diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index 82ea298..3bc751e 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -76,7 +76,6 @@
static void rpcrdma_sendctx_put_locked(struct rpcrdma_sendctx *sc);
static void rpcrdma_mrs_create(struct rpcrdma_xprt *r_xprt);
static void rpcrdma_mrs_destroy(struct rpcrdma_buffer *buf);
-static int rpcrdma_create_rep(struct rpcrdma_xprt *r_xprt, bool temp);
static void rpcrdma_dma_unmap_regbuf(struct rpcrdma_regbuf *rb);
static void rpcrdma_post_recvs(struct rpcrdma_xprt *r_xprt, bool temp);

@@ -1029,15 +1028,12 @@ struct rpcrdma_req *rpcrdma_req_create(struct rpcrdma_xprt *r_xprt, gfp_t flags)
return req;
}

-static int
-rpcrdma_create_rep(struct rpcrdma_xprt *r_xprt, bool temp)
+static bool rpcrdma_rep_create(struct rpcrdma_xprt *r_xprt, bool temp)
{
struct rpcrdma_create_data_internal *cdata = &r_xprt->rx_data;
struct rpcrdma_buffer *buf = &r_xprt->rx_buf;
struct rpcrdma_rep *rep;
- int rc;

- rc = -ENOMEM;
rep = kzalloc(sizeof(*rep), GFP_KERNEL);
if (rep == NULL)
goto out;
@@ -1063,12 +1059,12 @@ struct rpcrdma_req *rpcrdma_req_create(struct rpcrdma_xprt *r_xprt, gfp_t flags)
spin_lock(&buf->rb_lock);
list_add(&rep->rr_list, &buf->rb_recv_bufs);
spin_unlock(&buf->rb_lock);
- return 0;
+ return true;

out_free:
kfree(rep);
out:
- return rc;
+ return false;
}

int
@@ -1124,8 +1120,7 @@ struct rpcrdma_req *rpcrdma_req_create(struct rpcrdma_xprt *r_xprt, gfp_t flags)
return rc;
}

-static void
-rpcrdma_destroy_rep(struct rpcrdma_rep *rep)
+static void rpcrdma_rep_destroy(struct rpcrdma_rep *rep)
{
rpcrdma_free_regbuf(rep->rr_rdmabuf);
kfree(rep);
@@ -1205,7 +1200,7 @@ struct rpcrdma_req *rpcrdma_req_create(struct rpcrdma_xprt *r_xprt, gfp_t flags)
rep = list_first_entry(&buf->rb_recv_bufs,
struct rpcrdma_rep, rr_list);
list_del(&rep->rr_list);
- rpcrdma_destroy_rep(rep);
+ rpcrdma_rep_destroy(rep);
}

while (!list_empty(&buf->rb_send_bufs)) {
@@ -1334,7 +1329,7 @@ struct rpcrdma_req *
}
spin_unlock(&buffers->rb_lock);
if (rep)
- rpcrdma_destroy_rep(rep);
+ rpcrdma_rep_destroy(rep);
}

/*
@@ -1351,7 +1346,7 @@ struct rpcrdma_req *
list_add(&rep->rr_list, &buffers->rb_recv_bufs);
spin_unlock(&buffers->rb_lock);
} else {
- rpcrdma_destroy_rep(rep);
+ rpcrdma_rep_destroy(rep);
}
}

@@ -1500,7 +1495,7 @@ struct rpcrdma_regbuf *
list_del(&rep->rr_list);
spin_unlock(&buf->rb_lock);
if (!rep) {
- if (rpcrdma_create_rep(r_xprt, temp))
+ if (!rpcrdma_rep_create(r_xprt, temp))
break;
continue;
}


2019-04-10 20:07:07

by Chuck Lever III

Subject: [PATCH v1 06/19] xprtrdma: rpcrdma_regbuf alignment

Allocate the struct rpcrdma_regbuf separately from the I/O buffer
to better guarantee the alignment of the I/O buffer and eliminate
the wasted space between the rpcrdma_regbuf metadata and the buffer
itself.

Signed-off-by: Chuck Lever <[email protected]>
---
net/sunrpc/xprtrdma/backchannel.c | 6 +++---
net/sunrpc/xprtrdma/rpc_rdma.c | 4 ++--
net/sunrpc/xprtrdma/transport.c | 8 ++++----
net/sunrpc/xprtrdma/verbs.c | 29 ++++++++++++++++-------------
net/sunrpc/xprtrdma/xprt_rdma.h | 19 ++++++++++---------
5 files changed, 35 insertions(+), 31 deletions(-)

diff --git a/net/sunrpc/xprtrdma/backchannel.c b/net/sunrpc/xprtrdma/backchannel.c
index 713961a..6170ec7 100644
--- a/net/sunrpc/xprtrdma/backchannel.c
+++ b/net/sunrpc/xprtrdma/backchannel.c
@@ -45,10 +45,10 @@ static int rpcrdma_bc_setup_reqs(struct rpcrdma_xprt *r_xprt,

size = r_xprt->rx_data.inline_rsize;
rb = rpcrdma_alloc_regbuf(size, DMA_TO_DEVICE, GFP_KERNEL);
- if (IS_ERR(rb))
+ if (!rb)
goto out_fail;
req->rl_sendbuf = rb;
- xdr_buf_init(&rqst->rq_snd_buf, rb->rg_base,
+ xdr_buf_init(&rqst->rq_snd_buf, rdmab_data(rb),
min_t(size_t, size, PAGE_SIZE));
}
return 0;
@@ -123,7 +123,7 @@ static int rpcrdma_bc_marshal_reply(struct rpc_rqst *rqst)

rpcrdma_set_xdrlen(&req->rl_hdrbuf, 0);
xdr_init_encode(&req->rl_stream, &req->rl_hdrbuf,
- req->rl_rdmabuf->rg_base, rqst);
+ rdmab_data(req->rl_rdmabuf), rqst);

p = xdr_reserve_space(&req->rl_stream, 28);
if (unlikely(!p))
diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c
index c3bd18a..31032c7 100644
--- a/net/sunrpc/xprtrdma/rpc_rdma.c
+++ b/net/sunrpc/xprtrdma/rpc_rdma.c
@@ -747,8 +747,8 @@ static bool rpcrdma_results_inline(struct rpcrdma_xprt *r_xprt,
int ret;

rpcrdma_set_xdrlen(&req->rl_hdrbuf, 0);
- xdr_init_encode(xdr, &req->rl_hdrbuf,
- req->rl_rdmabuf->rg_base, rqst);
+ xdr_init_encode(xdr, &req->rl_hdrbuf, rdmab_data(req->rl_rdmabuf),
+ rqst);

/* Fixed header fields */
ret = -EMSGSIZE;
diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c
index 5d26135..e3b5b91 100644
--- a/net/sunrpc/xprtrdma/transport.c
+++ b/net/sunrpc/xprtrdma/transport.c
@@ -595,7 +595,7 @@ void xprt_rdma_close(struct rpc_xprt *xprt)
return true;

rb = rpcrdma_alloc_regbuf(size, DMA_TO_DEVICE, flags);
- if (IS_ERR(rb))
+ if (!rb)
return false;

rpcrdma_free_regbuf(req->rl_sendbuf);
@@ -625,7 +625,7 @@ void xprt_rdma_close(struct rpc_xprt *xprt)
return true;

rb = rpcrdma_alloc_regbuf(size, DMA_NONE, flags);
- if (IS_ERR(rb))
+ if (!rb)
return false;

rpcrdma_free_regbuf(req->rl_recvbuf);
@@ -660,8 +660,8 @@ void xprt_rdma_close(struct rpc_xprt *xprt)
if (!rpcrdma_get_recvbuf(r_xprt, req, rqst->rq_rcvsize, flags))
goto out_fail;

- rqst->rq_buffer = req->rl_sendbuf->rg_base;
- rqst->rq_rbuffer = req->rl_recvbuf->rg_base;
+ rqst->rq_buffer = rdmab_data(req->rl_sendbuf);
+ rqst->rq_rbuffer = rdmab_data(req->rl_recvbuf);
trace_xprtrdma_op_allocate(task, req);
return 0;

diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index 3bc751e..ca2d6d8 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -1013,12 +1013,12 @@ struct rpcrdma_req *rpcrdma_req_create(struct rpcrdma_xprt *r_xprt, gfp_t flags)
return NULL;

rb = rpcrdma_alloc_regbuf(RPCRDMA_HDRBUF_SIZE, DMA_TO_DEVICE, flags);
- if (IS_ERR(rb)) {
+ if (!rb) {
kfree(req);
return NULL;
}
req->rl_rdmabuf = rb;
- xdr_buf_init(&req->rl_hdrbuf, rb->rg_base, rdmab_length(rb));
+ xdr_buf_init(&req->rl_hdrbuf, rdmab_data(rb), rdmab_length(rb));
req->rl_buffer = buffer;
INIT_LIST_HEAD(&req->rl_registered);

@@ -1040,11 +1040,9 @@ static bool rpcrdma_rep_create(struct rpcrdma_xprt *r_xprt, bool temp)

rep->rr_rdmabuf = rpcrdma_alloc_regbuf(cdata->inline_rsize,
DMA_FROM_DEVICE, GFP_KERNEL);
- if (IS_ERR(rep->rr_rdmabuf)) {
- rc = PTR_ERR(rep->rr_rdmabuf);
+ if (!rep->rr_rdmabuf)
goto out_free;
- }
- xdr_buf_init(&rep->rr_hdrbuf, rep->rr_rdmabuf->rg_base,
+ xdr_buf_init(&rep->rr_hdrbuf, rdmab_data(rep->rr_rdmabuf),
rdmab_length(rep->rr_rdmabuf));

rep->rr_cqe.done = rpcrdma_wc_receive;
@@ -1356,8 +1354,7 @@ struct rpcrdma_req *
* @direction: direction of data movement
* @flags: GFP flags
*
- * Returns an ERR_PTR, or a pointer to a regbuf, a buffer that
- * can be persistently DMA-mapped for I/O.
+ * Returns a pointer to a rpcrdma_regbuf object, or NULL.
*
* xprtrdma uses a regbuf for posting an outgoing RDMA SEND, or for
* receiving the payload of RDMA RECV operations. During Long Calls
@@ -1369,14 +1366,18 @@ struct rpcrdma_regbuf *
{
struct rpcrdma_regbuf *rb;

- rb = kmalloc(sizeof(*rb) + size, flags);
- if (rb == NULL)
- return ERR_PTR(-ENOMEM);
+ rb = kmalloc(sizeof(*rb), flags);
+ if (!rb)
+ return NULL;
+ rb->rg_data = kmalloc(size, flags);
+ if (!rb->rg_data) {
+ kfree(rb);
+ return NULL;
+ }

rb->rg_device = NULL;
rb->rg_direction = direction;
rb->rg_iov.length = size;
-
return rb;
}

@@ -1394,7 +1395,7 @@ struct rpcrdma_regbuf *
return false;

rb->rg_iov.addr = ib_dma_map_single(device,
- (void *)rb->rg_base,
+ rdmab_data(rb),
rdmab_length(rb),
rb->rg_direction);
if (ib_dma_mapping_error(device, rdmab_addr(rb))) {
@@ -1429,6 +1430,8 @@ struct rpcrdma_regbuf *
rpcrdma_free_regbuf(struct rpcrdma_regbuf *rb)
{
rpcrdma_dma_unmap_regbuf(rb);
+ if (rb)
+ kfree(rb->rg_data);
kfree(rb);
}

diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index ad6d8df..73f9e54 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -121,33 +121,34 @@ struct rpcrdma_regbuf {
struct ib_sge rg_iov;
struct ib_device *rg_device;
enum dma_data_direction rg_direction;
- __be32 rg_base[0] __attribute__ ((aligned(256)));
+ void *rg_data;
};

-static inline u64
-rdmab_addr(struct rpcrdma_regbuf *rb)
+static inline u64 rdmab_addr(struct rpcrdma_regbuf *rb)
{
return rb->rg_iov.addr;
}

-static inline u32
-rdmab_length(struct rpcrdma_regbuf *rb)
+static inline u32 rdmab_length(struct rpcrdma_regbuf *rb)
{
return rb->rg_iov.length;
}

-static inline u32
-rdmab_lkey(struct rpcrdma_regbuf *rb)
+static inline u32 rdmab_lkey(struct rpcrdma_regbuf *rb)
{
return rb->rg_iov.lkey;
}

-static inline struct ib_device *
-rdmab_device(struct rpcrdma_regbuf *rb)
+static inline struct ib_device *rdmab_device(struct rpcrdma_regbuf *rb)
{
return rb->rg_device;
}

+static inline void *rdmab_data(const struct rpcrdma_regbuf *rb)
+{
+ return rb->rg_data;
+}
+
#define RPCRDMA_DEF_GFP (GFP_NOIO | __GFP_NOWARN)

/* To ensure a transport can always make forward progress,
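
The layout change, side by side (taken from the xprt_rdma.h hunk
above):

	/* Before: metadata and I/O buffer share one allocation, and
	 * the aligned(256) attribute pads the header out to a
	 * 256-byte boundary before the data begins.
	 */
	struct rpcrdma_regbuf {
		struct ib_sge		rg_iov;
		struct ib_device	*rg_device;
		enum dma_data_direction	rg_direction;
		__be32			rg_base[0]
					__attribute__ ((aligned(256)));
	};

	/* After: the I/O buffer is a separate kmalloc'd region, so
	 * its alignment no longer depends on the metadata in front
	 * of it.
	 */
	struct rpcrdma_regbuf {
		struct ib_sge		rg_iov;
		struct ib_device	*rg_device;
		enum dma_data_direction	rg_direction;
		void			*rg_data;
	};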


2019-04-10 20:07:12

by Chuck Lever III

Subject: [PATCH v1 07/19] xprtrdma: Allocate req's regbufs at xprt create time

Allocating an rpcrdma_req's regbufs at xprt create time enables
a pair of micro-optimizations:

First, if these regbufs are always there, we can eliminate two
conditional branches from the hot xprt_rdma_allocate path.

Second, allocating a 1KB buffer up front places a lower bound on the
size of these buffers, without adding yet another conditional
branch. The lower bound reduces the number of hardway
re-allocations. In fact, for some workloads it completely eliminates
hardway allocations.

Signed-off-by: Chuck Lever <[email protected]>
---
net/sunrpc/xprtrdma/backchannel.c | 18 ++++--------------
net/sunrpc/xprtrdma/transport.c | 4 ++--
net/sunrpc/xprtrdma/verbs.c | 34 ++++++++++++++++++++++++++--------
net/sunrpc/xprtrdma/xprt_rdma.h | 2 +-
4 files changed, 33 insertions(+), 25 deletions(-)

diff --git a/net/sunrpc/xprtrdma/backchannel.c b/net/sunrpc/xprtrdma/backchannel.c
index 6170ec7..e1a125a 100644
--- a/net/sunrpc/xprtrdma/backchannel.c
+++ b/net/sunrpc/xprtrdma/backchannel.c
@@ -28,10 +28,10 @@ static int rpcrdma_bc_setup_reqs(struct rpcrdma_xprt *r_xprt,
unsigned int i;

for (i = 0; i < (count << 1); i++) {
- struct rpcrdma_regbuf *rb;
size_t size;

- req = rpcrdma_req_create(r_xprt, GFP_KERNEL);
+ size = min_t(size_t, r_xprt->rx_data.inline_rsize, PAGE_SIZE);
+ req = rpcrdma_req_create(r_xprt, size, GFP_KERNEL);
if (!req)
return -ENOMEM;
rqst = &req->rl_slot;
@@ -42,20 +42,10 @@ static int rpcrdma_bc_setup_reqs(struct rpcrdma_xprt *r_xprt,
spin_lock(&xprt->bc_pa_lock);
list_add(&rqst->rq_bc_pa_list, &xprt->bc_pa_list);
spin_unlock(&xprt->bc_pa_lock);
-
- size = r_xprt->rx_data.inline_rsize;
- rb = rpcrdma_alloc_regbuf(size, DMA_TO_DEVICE, GFP_KERNEL);
- if (!rb)
- goto out_fail;
- req->rl_sendbuf = rb;
- xdr_buf_init(&rqst->rq_snd_buf, rdmab_data(rb),
- min_t(size_t, size, PAGE_SIZE));
+ xdr_buf_init(&rqst->rq_snd_buf, rdmab_data(req->rl_sendbuf),
+ size);
}
return 0;
-
-out_fail:
- rpcrdma_req_destroy(req);
- return -ENOMEM;
}

/**
diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c
index e3b5b91..09a4693 100644
--- a/net/sunrpc/xprtrdma/transport.c
+++ b/net/sunrpc/xprtrdma/transport.c
@@ -591,7 +591,7 @@ void xprt_rdma_close(struct rpc_xprt *xprt)
{
struct rpcrdma_regbuf *rb;

- if (req->rl_sendbuf && rdmab_length(req->rl_sendbuf) >= size)
+ if (likely(rdmab_length(req->rl_sendbuf) >= size))
return true;

rb = rpcrdma_alloc_regbuf(size, DMA_TO_DEVICE, flags);
@@ -621,7 +621,7 @@ void xprt_rdma_close(struct rpc_xprt *xprt)
{
struct rpcrdma_regbuf *rb;

- if (req->rl_recvbuf && rdmab_length(req->rl_recvbuf) >= size)
+ if (likely(rdmab_length(req->rl_recvbuf) >= size))
return true;

rb = rpcrdma_alloc_regbuf(size, DMA_NONE, flags);
diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index ca2d6d8..e4644fd 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -998,11 +998,13 @@ struct rpcrdma_sendctx *rpcrdma_sendctx_get_locked(struct rpcrdma_buffer *buf)
/**
* rpcrdma_req_create - Allocate an rpcrdma_req object
* @r_xprt: controlling r_xprt
+ * @size: initial size, in bytes, of send and receive buffers
* @flags: GFP flags passed to memory allocators
*
* Returns an allocated and fully initialized rpcrdma_req or NULL.
*/
-struct rpcrdma_req *rpcrdma_req_create(struct rpcrdma_xprt *r_xprt, gfp_t flags)
+struct rpcrdma_req *rpcrdma_req_create(struct rpcrdma_xprt *r_xprt, size_t size,
+ gfp_t flags)
{
struct rpcrdma_buffer *buffer = &r_xprt->rx_buf;
struct rpcrdma_regbuf *rb;
@@ -1010,22 +1012,37 @@ struct rpcrdma_req *rpcrdma_req_create(struct rpcrdma_xprt *r_xprt, gfp_t flags)

req = kzalloc(sizeof(*req), flags);
if (req == NULL)
- return NULL;
+ goto out1;

rb = rpcrdma_alloc_regbuf(RPCRDMA_HDRBUF_SIZE, DMA_TO_DEVICE, flags);
- if (!rb) {
- kfree(req);
- return NULL;
- }
+ if (!rb)
+ goto out2;
req->rl_rdmabuf = rb;
xdr_buf_init(&req->rl_hdrbuf, rdmab_data(rb), rdmab_length(rb));
+
+ req->rl_sendbuf = rpcrdma_alloc_regbuf(size, DMA_TO_DEVICE, flags);
+ if (!req->rl_sendbuf)
+ goto out3;
+
+ req->rl_recvbuf = rpcrdma_alloc_regbuf(size, DMA_NONE, flags);
+ if (!req->rl_recvbuf)
+ goto out4;
+
req->rl_buffer = buffer;
INIT_LIST_HEAD(&req->rl_registered);
-
spin_lock(&buffer->rb_lock);
list_add(&req->rl_all, &buffer->rb_allreqs);
spin_unlock(&buffer->rb_lock);
return req;
+
+out4:
+ kfree(req->rl_sendbuf);
+out3:
+ kfree(req->rl_rdmabuf);
+out2:
+ kfree(req);
+out1:
+ return NULL;
}

static bool rpcrdma_rep_create(struct rpcrdma_xprt *r_xprt, bool temp)
@@ -1090,7 +1107,8 @@ static bool rpcrdma_rep_create(struct rpcrdma_xprt *r_xprt, bool temp)
for (i = 0; i < buf->rb_max_requests; i++) {
struct rpcrdma_req *req;

- req = rpcrdma_req_create(r_xprt, GFP_KERNEL);
+ req = rpcrdma_req_create(r_xprt, RPCRDMA_V1_DEF_INLINE_SIZE,
+ GFP_KERNEL);
if (!req)
goto out;
list_add(&req->rl_list, &buf->rb_send_bufs);
diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index 73f9e54..202294a 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -529,7 +529,7 @@ int rpcrdma_ep_post(struct rpcrdma_ia *, struct rpcrdma_ep *,
/*
* Buffer calls - xprtrdma/verbs.c
*/
-struct rpcrdma_req *rpcrdma_req_create(struct rpcrdma_xprt *r_xprt,
+struct rpcrdma_req *rpcrdma_req_create(struct rpcrdma_xprt *r_xprt, size_t size,
gfp_t flags);
void rpcrdma_req_destroy(struct rpcrdma_req *req);
int rpcrdma_buffer_create(struct rpcrdma_xprt *);
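
For reference, RPCRDMA_V1_DEF_INLINE_SIZE is 1024 bytes, which is
where the "1KB" lower bound above comes from. On the hot path the
effect is that the existing length check (see the transport.c hunks)
now almost always takes the fast exit:

	/* Sketch: with 1KB send/recv regbufs allocated at create
	 * time, a typical small RPC (rq_callsize well under 1KB)
	 * never reaches the hardway re-allocation path.
	 */
	if (likely(rdmab_length(req->rl_sendbuf) >= rqst->rq_callsize))
		return true;	/* the common case after this patch */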


2019-04-10 20:07:17

by Chuck Lever III

Subject: [PATCH v1 08/19] xprtrdma: De-duplicate "allocate new, free old regbuf"

Clean up by providing an API to do this common task.

At this point, the difference between rpcrdma_get_sendbuf and
rpcrdma_get_recvbuf has become tiny. These can be collapsed into a
single helper.

Signed-off-by: Chuck Lever <[email protected]>
---
net/sunrpc/xprtrdma/transport.c | 59 ++++++++-------------------------------
net/sunrpc/xprtrdma/verbs.c | 25 +++++++++++++++++
net/sunrpc/xprtrdma/xprt_rdma.h | 2 +
3 files changed, 39 insertions(+), 47 deletions(-)

diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c
index 09a4693..47d2e00 100644
--- a/net/sunrpc/xprtrdma/transport.c
+++ b/net/sunrpc/xprtrdma/transport.c
@@ -585,52 +585,15 @@ void xprt_rdma_close(struct rpc_xprt *xprt)
rpc_wake_up_next(&xprt->backlog);
}

-static bool
-rpcrdma_get_sendbuf(struct rpcrdma_xprt *r_xprt, struct rpcrdma_req *req,
- size_t size, gfp_t flags)
+static bool rpcrdma_check_regbuf(struct rpcrdma_xprt *r_xprt,
+ struct rpcrdma_regbuf *rb, size_t size,
+ gfp_t flags)
{
- struct rpcrdma_regbuf *rb;
-
- if (likely(rdmab_length(req->rl_sendbuf) >= size))
- return true;
-
- rb = rpcrdma_alloc_regbuf(size, DMA_TO_DEVICE, flags);
- if (!rb)
- return false;
-
- rpcrdma_free_regbuf(req->rl_sendbuf);
- r_xprt->rx_stats.hardway_register_count += size;
- req->rl_sendbuf = rb;
- return true;
-}
-
-/* The rq_rcv_buf is used only if a Reply chunk is necessary.
- * The decision to use a Reply chunk is made later in
- * rpcrdma_marshal_req. This buffer is registered at that time.
- *
- * Otherwise, the associated RPC Reply arrives in a separate
- * Receive buffer, arbitrarily chosen by the HCA. The buffer
- * allocated here for the RPC Reply is not utilized in that
- * case. See rpcrdma_inline_fixup.
- *
- * A regbuf is used here to remember the buffer size.
- */
-static bool
-rpcrdma_get_recvbuf(struct rpcrdma_xprt *r_xprt, struct rpcrdma_req *req,
- size_t size, gfp_t flags)
-{
- struct rpcrdma_regbuf *rb;
-
- if (likely(rdmab_length(req->rl_recvbuf) >= size))
- return true;
-
- rb = rpcrdma_alloc_regbuf(size, DMA_NONE, flags);
- if (!rb)
- return false;
-
- rpcrdma_free_regbuf(req->rl_recvbuf);
- r_xprt->rx_stats.hardway_register_count += size;
- req->rl_recvbuf = rb;
+ if (unlikely(rdmab_length(rb) < size)) {
+ if (!rpcrdma_regbuf_realloc(rb, size, flags))
+ return false;
+ r_xprt->rx_stats.hardway_register_count += size;
+ }
return true;
}

@@ -655,9 +618,11 @@ void xprt_rdma_close(struct rpc_xprt *xprt)
if (RPC_IS_SWAPPER(task))
flags = __GFP_MEMALLOC | GFP_NOWAIT | __GFP_NOWARN;

- if (!rpcrdma_get_sendbuf(r_xprt, req, rqst->rq_callsize, flags))
+ if (!rpcrdma_check_regbuf(r_xprt, req->rl_sendbuf, rqst->rq_callsize,
+ flags))
goto out_fail;
- if (!rpcrdma_get_recvbuf(r_xprt, req, rqst->rq_rcvsize, flags))
+ if (!rpcrdma_check_regbuf(r_xprt, req->rl_recvbuf, rqst->rq_rcvsize,
+ flags))
goto out_fail;

rqst->rq_buffer = rdmab_data(req->rl_sendbuf);
diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index e4644fd..c691939 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -1400,6 +1400,31 @@ struct rpcrdma_regbuf *
}

/**
+ * rpcrdma_regbuf_realloc - re-allocate a SEND/RECV buffer
+ * @rb: regbuf to reallocate
+ * @size: size of buffer to be allocated, in bytes
+ * @flags: GFP flags
+ *
+ * Returns true if reallocation was successful. If false is
+ * returned, @rb is left untouched.
+ */
+bool rpcrdma_regbuf_realloc(struct rpcrdma_regbuf *rb, size_t size, gfp_t flags)
+{
+ void *buf;
+
+ buf = kmalloc(size, flags);
+ if (!buf)
+ return false;
+
+ rpcrdma_dma_unmap_regbuf(rb);
+ kfree(rb->rg_data);
+
+ rb->rg_data = buf;
+ rb->rg_iov.length = size;
+ return true;
+}
+
+/**
* __rpcrdma_map_regbuf - DMA-map a regbuf
* @ia: controlling rpcrdma_ia
* @rb: regbuf to be mapped
diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index 202294a..a484ab3 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -552,6 +552,8 @@ struct rpcrdma_req *rpcrdma_req_create(struct rpcrdma_xprt *r_xprt, size_t size,

struct rpcrdma_regbuf *rpcrdma_alloc_regbuf(size_t, enum dma_data_direction,
gfp_t);
+bool rpcrdma_regbuf_realloc(struct rpcrdma_regbuf *rb, size_t size,
+ gfp_t flags);
bool __rpcrdma_dma_map_regbuf(struct rpcrdma_ia *, struct rpcrdma_regbuf *);
void rpcrdma_free_regbuf(struct rpcrdma_regbuf *);



2019-04-10 20:07:23

by Chuck Lever III

Subject: [PATCH v1 09/19] xprtrdma: Clean up regbuf helpers

For code legibility, clean up the function names to be consistent
with the pattern: "rpcrdma" _ object-type _ action

Also rpcrdma_regbuf_alloc and rpcrdma_regbuf_free no longer have any
callers outside of verbs.c, and can thus be made static.

Signed-off-by: Chuck Lever <[email protected]>
---
net/sunrpc/xprtrdma/rpc_rdma.c | 47 ++++++++++-----------
net/sunrpc/xprtrdma/verbs.c | 88 +++++++++++++++++----------------------
net/sunrpc/xprtrdma/xprt_rdma.h | 27 ++++++++----
3 files changed, 80 insertions(+), 82 deletions(-)

diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c
index 31032c7..20b0df7 100644
--- a/net/sunrpc/xprtrdma/rpc_rdma.c
+++ b/net/sunrpc/xprtrdma/rpc_rdma.c
@@ -536,22 +536,21 @@ static bool rpcrdma_results_inline(struct rpcrdma_xprt *r_xprt,

/* Prepare an SGE for the RPC-over-RDMA transport header.
*/
-static bool
-rpcrdma_prepare_hdr_sge(struct rpcrdma_ia *ia, struct rpcrdma_req *req,
- u32 len)
+static bool rpcrdma_prepare_hdr_sge(struct rpcrdma_xprt *r_xprt,
+ struct rpcrdma_req *req, u32 len)
{
struct rpcrdma_sendctx *sc = req->rl_sendctx;
struct rpcrdma_regbuf *rb = req->rl_rdmabuf;
struct ib_sge *sge = sc->sc_sges;

- if (!rpcrdma_dma_map_regbuf(ia, rb))
+ if (!rpcrdma_regbuf_dma_map(r_xprt, rb))
goto out_regbuf;
sge->addr = rdmab_addr(rb);
sge->length = len;
sge->lkey = rdmab_lkey(rb);

- ib_dma_sync_single_for_device(rdmab_device(rb), sge->addr,
- sge->length, DMA_TO_DEVICE);
+ ib_dma_sync_single_for_device(rdmab_device(rb), sge->addr, sge->length,
+ DMA_TO_DEVICE);
sc->sc_wr.num_sge++;
return true;

@@ -563,22 +562,21 @@ static bool rpcrdma_results_inline(struct rpcrdma_xprt *r_xprt,
/* Prepare the Send SGEs. The head and tail iovec, and each entry
* in the page list, gets its own SGE.
*/
-static bool
-rpcrdma_prepare_msg_sges(struct rpcrdma_ia *ia, struct rpcrdma_req *req,
- struct xdr_buf *xdr, enum rpcrdma_chunktype rtype)
+static bool rpcrdma_prepare_msg_sges(struct rpcrdma_xprt *r_xprt,
+ struct rpcrdma_req *req,
+ struct xdr_buf *xdr,
+ enum rpcrdma_chunktype rtype)
{
struct rpcrdma_sendctx *sc = req->rl_sendctx;
unsigned int sge_no, page_base, len, remaining;
struct rpcrdma_regbuf *rb = req->rl_sendbuf;
- struct ib_device *device = ia->ri_device;
struct ib_sge *sge = sc->sc_sges;
- u32 lkey = ia->ri_pd->local_dma_lkey;
struct page *page, **ppages;

/* The head iovec is straightforward, as it is already
* DMA-mapped. Sync the content that has changed.
*/
- if (!rpcrdma_dma_map_regbuf(ia, rb))
+ if (!rpcrdma_regbuf_dma_map(r_xprt, rb))
goto out_regbuf;
sge_no = 1;
sge[sge_no].addr = rdmab_addr(rb);
@@ -626,13 +624,14 @@ static bool rpcrdma_results_inline(struct rpcrdma_xprt *r_xprt,
goto out_mapping_overflow;

len = min_t(u32, PAGE_SIZE - page_base, remaining);
- sge[sge_no].addr = ib_dma_map_page(device, *ppages,
- page_base, len,
- DMA_TO_DEVICE);
- if (ib_dma_mapping_error(device, sge[sge_no].addr))
+ sge[sge_no].addr =
+ ib_dma_map_page(rdmab_device(rb), *ppages,
+ page_base, len, DMA_TO_DEVICE);
+ if (ib_dma_mapping_error(rdmab_device(rb),
+ sge[sge_no].addr))
goto out_mapping_err;
sge[sge_no].length = len;
- sge[sge_no].lkey = lkey;
+ sge[sge_no].lkey = rdmab_lkey(rb);

sc->sc_unmap_count++;
ppages++;
@@ -653,13 +652,13 @@ static bool rpcrdma_results_inline(struct rpcrdma_xprt *r_xprt,

map_tail:
sge_no++;
- sge[sge_no].addr = ib_dma_map_page(device, page,
- page_base, len,
- DMA_TO_DEVICE);
- if (ib_dma_mapping_error(device, sge[sge_no].addr))
+ sge[sge_no].addr =
+ ib_dma_map_page(rdmab_device(rb), page, page_base, len,
+ DMA_TO_DEVICE);
+ if (ib_dma_mapping_error(rdmab_device(rb), sge[sge_no].addr))
goto out_mapping_err;
sge[sge_no].length = len;
- sge[sge_no].lkey = lkey;
+ sge[sge_no].lkey = rdmab_lkey(rb);
sc->sc_unmap_count++;
}

@@ -707,11 +706,11 @@ static bool rpcrdma_results_inline(struct rpcrdma_xprt *r_xprt,
req->rl_sendctx->sc_req = req;
__clear_bit(RPCRDMA_REQ_F_TX_RESOURCES, &req->rl_flags);

- if (!rpcrdma_prepare_hdr_sge(&r_xprt->rx_ia, req, hdrlen))
+ if (!rpcrdma_prepare_hdr_sge(r_xprt, req, hdrlen))
return -EIO;

if (rtype != rpcrdma_areadch)
- if (!rpcrdma_prepare_msg_sges(&r_xprt->rx_ia, req, xdr, rtype))
+ if (!rpcrdma_prepare_msg_sges(r_xprt, req, xdr, rtype))
return -EIO;

return 0;
diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index c691939..127c1f6 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -76,7 +76,11 @@
static void rpcrdma_sendctx_put_locked(struct rpcrdma_sendctx *sc);
static void rpcrdma_mrs_create(struct rpcrdma_xprt *r_xprt);
static void rpcrdma_mrs_destroy(struct rpcrdma_buffer *buf);
-static void rpcrdma_dma_unmap_regbuf(struct rpcrdma_regbuf *rb);
+static struct rpcrdma_regbuf *
+rpcrdma_regbuf_alloc(size_t size, enum dma_data_direction direction,
+ gfp_t flags);
+static void rpcrdma_regbuf_dma_unmap(struct rpcrdma_regbuf *rb);
+static void rpcrdma_regbuf_free(struct rpcrdma_regbuf *rb);
static void rpcrdma_post_recvs(struct rpcrdma_xprt *r_xprt, bool temp);

/* Wait for outstanding transport work to finish.
@@ -437,11 +441,11 @@ static void rpcrdma_xprt_drain(struct rpcrdma_xprt *r_xprt)
* mappings and MRs are gone.
*/
list_for_each_entry(rep, &buf->rb_recv_bufs, rr_list)
- rpcrdma_dma_unmap_regbuf(rep->rr_rdmabuf);
+ rpcrdma_regbuf_dma_unmap(rep->rr_rdmabuf);
list_for_each_entry(req, &buf->rb_allreqs, rl_all) {
- rpcrdma_dma_unmap_regbuf(req->rl_rdmabuf);
- rpcrdma_dma_unmap_regbuf(req->rl_sendbuf);
- rpcrdma_dma_unmap_regbuf(req->rl_recvbuf);
+ rpcrdma_regbuf_dma_unmap(req->rl_rdmabuf);
+ rpcrdma_regbuf_dma_unmap(req->rl_sendbuf);
+ rpcrdma_regbuf_dma_unmap(req->rl_recvbuf);
}
rpcrdma_mrs_destroy(buf);
ib_dealloc_pd(ia->ri_pd);
@@ -1014,17 +1018,17 @@ struct rpcrdma_req *rpcrdma_req_create(struct rpcrdma_xprt *r_xprt, size_t size,
if (req == NULL)
goto out1;

- rb = rpcrdma_alloc_regbuf(RPCRDMA_HDRBUF_SIZE, DMA_TO_DEVICE, flags);
+ rb = rpcrdma_regbuf_alloc(RPCRDMA_HDRBUF_SIZE, DMA_TO_DEVICE, flags);
if (!rb)
goto out2;
req->rl_rdmabuf = rb;
xdr_buf_init(&req->rl_hdrbuf, rdmab_data(rb), rdmab_length(rb));

- req->rl_sendbuf = rpcrdma_alloc_regbuf(size, DMA_TO_DEVICE, flags);
+ req->rl_sendbuf = rpcrdma_regbuf_alloc(size, DMA_TO_DEVICE, flags);
if (!req->rl_sendbuf)
goto out3;

- req->rl_recvbuf = rpcrdma_alloc_regbuf(size, DMA_NONE, flags);
+ req->rl_recvbuf = rpcrdma_regbuf_alloc(size, DMA_NONE, flags);
if (!req->rl_recvbuf)
goto out4;

@@ -1055,7 +1059,7 @@ static bool rpcrdma_rep_create(struct rpcrdma_xprt *r_xprt, bool temp)
if (rep == NULL)
goto out;

- rep->rr_rdmabuf = rpcrdma_alloc_regbuf(cdata->inline_rsize,
+ rep->rr_rdmabuf = rpcrdma_regbuf_alloc(cdata->inline_rsize,
DMA_FROM_DEVICE, GFP_KERNEL);
if (!rep->rr_rdmabuf)
goto out_free;
@@ -1138,7 +1142,7 @@ static bool rpcrdma_rep_create(struct rpcrdma_xprt *r_xprt, bool temp)

static void rpcrdma_rep_destroy(struct rpcrdma_rep *rep)
{
- rpcrdma_free_regbuf(rep->rr_rdmabuf);
+ rpcrdma_regbuf_free(rep->rr_rdmabuf);
kfree(rep);
}

@@ -1154,9 +1158,9 @@ static void rpcrdma_rep_destroy(struct rpcrdma_rep *rep)
{
list_del(&req->rl_all);

- rpcrdma_free_regbuf(req->rl_recvbuf);
- rpcrdma_free_regbuf(req->rl_sendbuf);
- rpcrdma_free_regbuf(req->rl_rdmabuf);
+ rpcrdma_regbuf_free(req->rl_recvbuf);
+ rpcrdma_regbuf_free(req->rl_sendbuf);
+ rpcrdma_regbuf_free(req->rl_rdmabuf);
kfree(req);
}

@@ -1366,20 +1370,14 @@ struct rpcrdma_req *
}
}

-/**
- * rpcrdma_alloc_regbuf - allocate and DMA-map memory for SEND/RECV buffers
- * @size: size of buffer to be allocated, in bytes
- * @direction: direction of data movement
- * @flags: GFP flags
- *
- * Returns a pointer to a rpcrdma_regbuf object, or NULL.
+/* Returns a pointer to a rpcrdma_regbuf object, or NULL.
*
* xprtrdma uses a regbuf for posting an outgoing RDMA SEND, or for
* receiving the payload of RDMA RECV operations. During Long Calls
* or Replies they may be registered externally via frwr_map.
*/
-struct rpcrdma_regbuf *
-rpcrdma_alloc_regbuf(size_t size, enum dma_data_direction direction,
+static struct rpcrdma_regbuf *
+rpcrdma_regbuf_alloc(size_t size, enum dma_data_direction direction,
gfp_t flags)
{
struct rpcrdma_regbuf *rb;
@@ -1416,7 +1414,7 @@ bool rpcrdma_regbuf_realloc(struct rpcrdma_regbuf *rb, size_t size, gfp_t flags)
if (!buf)
return false;

- rpcrdma_dma_unmap_regbuf(rb);
+ rpcrdma_regbuf_dma_unmap(rb);
kfree(rb->rg_data);

rb->rg_data = buf;
@@ -1425,34 +1423,33 @@ bool rpcrdma_regbuf_realloc(struct rpcrdma_regbuf *rb, size_t size, gfp_t flags)
}

/**
- * __rpcrdma_map_regbuf - DMA-map a regbuf
- * @ia: controlling rpcrdma_ia
+ * __rpcrdma_regbuf_dma_map - DMA-map a regbuf
+ * @r_xprt: controlling transport instance
* @rb: regbuf to be mapped
+ *
+ * Returns true if the buffer is now DMA mapped to @r_xprt's device
*/
-bool
-__rpcrdma_dma_map_regbuf(struct rpcrdma_ia *ia, struct rpcrdma_regbuf *rb)
+bool __rpcrdma_regbuf_dma_map(struct rpcrdma_xprt *r_xprt,
+ struct rpcrdma_regbuf *rb)
{
- struct ib_device *device = ia->ri_device;
+ struct ib_device *device = r_xprt->rx_ia.ri_device;

if (rb->rg_direction == DMA_NONE)
return false;

- rb->rg_iov.addr = ib_dma_map_single(device,
- rdmab_data(rb),
- rdmab_length(rb),
- rb->rg_direction);
+ rb->rg_iov.addr = ib_dma_map_single(device, rdmab_data(rb),
+ rdmab_length(rb), rb->rg_direction);
if (ib_dma_mapping_error(device, rdmab_addr(rb))) {
trace_xprtrdma_dma_maperr(rdmab_addr(rb));
return false;
}

rb->rg_device = device;
- rb->rg_iov.lkey = ia->ri_pd->local_dma_lkey;
+ rb->rg_iov.lkey = r_xprt->rx_ia.ri_pd->local_dma_lkey;
return true;
}

-static void
-rpcrdma_dma_unmap_regbuf(struct rpcrdma_regbuf *rb)
+static void rpcrdma_regbuf_dma_unmap(struct rpcrdma_regbuf *rb)
{
if (!rb)
return;
@@ -1460,19 +1457,14 @@ bool rpcrdma_regbuf_realloc(struct rpcrdma_regbuf *rb, size_t size, gfp_t flags)
if (!rpcrdma_regbuf_is_mapped(rb))
return;

- ib_dma_unmap_single(rb->rg_device, rdmab_addr(rb),
- rdmab_length(rb), rb->rg_direction);
+ ib_dma_unmap_single(rb->rg_device, rdmab_addr(rb), rdmab_length(rb),
+ rb->rg_direction);
rb->rg_device = NULL;
}

-/**
- * rpcrdma_free_regbuf - deregister and free registered buffer
- * @rb: regbuf to be deregistered and freed
- */
-void
-rpcrdma_free_regbuf(struct rpcrdma_regbuf *rb)
+static void rpcrdma_regbuf_free(struct rpcrdma_regbuf *rb)
{
- rpcrdma_dma_unmap_regbuf(rb);
+ rpcrdma_regbuf_dma_unmap(rb);
if (rb)
kfree(rb->rg_data);
kfree(rb);
@@ -1547,11 +1539,9 @@ bool rpcrdma_regbuf_realloc(struct rpcrdma_regbuf *rb, size_t size, gfp_t flags)
}

rb = rep->rr_rdmabuf;
- if (!rpcrdma_regbuf_is_mapped(rb)) {
- if (!__rpcrdma_dma_map_regbuf(&r_xprt->rx_ia, rb)) {
- rpcrdma_recv_buffer_put(rep);
- break;
- }
+ if (!rpcrdma_regbuf_dma_map(r_xprt, rb)) {
+ rpcrdma_recv_buffer_put(rep);
+ break;
}

trace_xprtrdma_post_recv(rep->rr_recv_wr.wr_cqe);
diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index a484ab3..9871cd9 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -550,25 +550,34 @@ struct rpcrdma_req *rpcrdma_req_create(struct rpcrdma_xprt *r_xprt, size_t size,
void rpcrdma_buffer_put(struct rpcrdma_req *);
void rpcrdma_recv_buffer_put(struct rpcrdma_rep *);

-struct rpcrdma_regbuf *rpcrdma_alloc_regbuf(size_t, enum dma_data_direction,
- gfp_t);
bool rpcrdma_regbuf_realloc(struct rpcrdma_regbuf *rb, size_t size,
gfp_t flags);
-bool __rpcrdma_dma_map_regbuf(struct rpcrdma_ia *, struct rpcrdma_regbuf *);
-void rpcrdma_free_regbuf(struct rpcrdma_regbuf *);
+bool __rpcrdma_regbuf_dma_map(struct rpcrdma_xprt *r_xprt,
+ struct rpcrdma_regbuf *rb);

-static inline bool
-rpcrdma_regbuf_is_mapped(struct rpcrdma_regbuf *rb)
+/**
+ * rpcrdma_regbuf_is_mapped - check if buffer is DMA mapped
+ *
+ * Returns true if the buffer is now mapped to rb->rg_device.
+ */
+static inline bool rpcrdma_regbuf_is_mapped(struct rpcrdma_regbuf *rb)
{
return rb->rg_device != NULL;
}

-static inline bool
-rpcrdma_dma_map_regbuf(struct rpcrdma_ia *ia, struct rpcrdma_regbuf *rb)
+/**
+ * rpcrdma_regbuf_dma_map - DMA-map a regbuf
+ * @r_xprt: controlling transport instance
+ * @rb: regbuf to be mapped
+ *
+ * Returns true if the buffer is currently DMA mapped.
+ */
+static inline bool rpcrdma_regbuf_dma_map(struct rpcrdma_xprt *r_xprt,
+ struct rpcrdma_regbuf *rb)
{
if (likely(rpcrdma_regbuf_is_mapped(rb)))
return true;
- return __rpcrdma_dma_map_regbuf(ia, rb);
+ return __rpcrdma_regbuf_dma_map(r_xprt, rb);
}

/*


2019-04-10 20:07:29

by Chuck Lever III

Subject: [PATCH v1 10/19] xprtrdma: Backchannel can use GFP_KERNEL allocations

The Receive handler runs in process context, so it can use on-demand
GFP_KERNEL allocations instead of pre-allocation.

This makes the xprtrdma backchannel independent of the number of
backchannel session slots provisioned by the Upper Layer protocol.

Signed-off-by: Chuck Lever <[email protected]>
---
net/sunrpc/xprtrdma/backchannel.c | 104 ++++++++++++++-----------------------
1 file changed, 40 insertions(+), 64 deletions(-)

diff --git a/net/sunrpc/xprtrdma/backchannel.c b/net/sunrpc/xprtrdma/backchannel.c
index e1a125a..ae51ef6 100644
--- a/net/sunrpc/xprtrdma/backchannel.c
+++ b/net/sunrpc/xprtrdma/backchannel.c
@@ -19,35 +19,6 @@

#undef RPCRDMA_BACKCHANNEL_DEBUG

-static int rpcrdma_bc_setup_reqs(struct rpcrdma_xprt *r_xprt,
- unsigned int count)
-{
- struct rpc_xprt *xprt = &r_xprt->rx_xprt;
- struct rpcrdma_req *req;
- struct rpc_rqst *rqst;
- unsigned int i;
-
- for (i = 0; i < (count << 1); i++) {
- size_t size;
-
- size = min_t(size_t, r_xprt->rx_data.inline_rsize, PAGE_SIZE);
- req = rpcrdma_req_create(r_xprt, size, GFP_KERNEL);
- if (!req)
- return -ENOMEM;
- rqst = &req->rl_slot;
-
- rqst->rq_xprt = xprt;
- INIT_LIST_HEAD(&rqst->rq_bc_list);
- __set_bit(RPC_BC_PA_IN_USE, &rqst->rq_bc_pa_state);
- spin_lock(&xprt->bc_pa_lock);
- list_add(&rqst->rq_bc_pa_list, &xprt->bc_pa_list);
- spin_unlock(&xprt->bc_pa_lock);
- xdr_buf_init(&rqst->rq_snd_buf, rdmab_data(req->rl_sendbuf),
- size);
- }
- return 0;
-}
-
/**
* xprt_rdma_bc_setup - Pre-allocate resources for handling backchannel requests
* @xprt: transport associated with these backchannel resources
@@ -58,34 +29,10 @@ static int rpcrdma_bc_setup_reqs(struct rpcrdma_xprt *r_xprt,
int xprt_rdma_bc_setup(struct rpc_xprt *xprt, unsigned int reqs)
{
struct rpcrdma_xprt *r_xprt = rpcx_to_rdmax(xprt);
- int rc;
-
- /* The backchannel reply path returns each rpc_rqst to the
- * bc_pa_list _after_ the reply is sent. If the server is
- * faster than the client, it can send another backward
- * direction request before the rpc_rqst is returned to the
- * list. The client rejects the request in this case.
- *
- * Twice as many rpc_rqsts are prepared to ensure there is
- * always an rpc_rqst available as soon as a reply is sent.
- */
- if (reqs > RPCRDMA_BACKWARD_WRS >> 1)
- goto out_err;
-
- rc = rpcrdma_bc_setup_reqs(r_xprt, reqs);
- if (rc)
- goto out_free;

- r_xprt->rx_buf.rb_bc_srv_max_requests = reqs;
+ r_xprt->rx_buf.rb_bc_srv_max_requests = RPCRDMA_BACKWARD_WRS >> 1;
trace_xprtrdma_cb_setup(r_xprt, reqs);
return 0;
-
-out_free:
- xprt_rdma_bc_destroy(xprt, reqs);
-
-out_err:
- pr_err("RPC: %s: setup backchannel transport failed\n", __func__);
- return -ENOMEM;
}

/**
@@ -213,6 +160,43 @@ void xprt_rdma_bc_free_rqst(struct rpc_rqst *rqst)
spin_unlock(&xprt->bc_pa_lock);
}

+static struct rpc_rqst *rpcrdma_bc_rqst_get(struct rpcrdma_xprt *r_xprt)
+{
+ struct rpc_xprt *xprt = &r_xprt->rx_xprt;
+ struct rpcrdma_req *req;
+ struct rpc_rqst *rqst;
+ size_t size;
+
+ spin_lock(&xprt->bc_pa_lock);
+ rqst = list_first_entry_or_null(&xprt->bc_pa_list, struct rpc_rqst,
+ rq_bc_pa_list);
+ if (!rqst)
+ goto create_req;
+ list_del(&rqst->rq_bc_pa_list);
+ spin_unlock(&xprt->bc_pa_lock);
+ return rqst;
+
+create_req:
+ spin_unlock(&xprt->bc_pa_lock);
+
+ /* Set a limit to prevent a remote from overrunning our resources.
+ */
+ if (xprt->bc_alloc_count >= RPCRDMA_BACKWARD_WRS)
+ return NULL;
+
+ size = min_t(size_t, r_xprt->rx_data.inline_rsize, PAGE_SIZE);
+ req = rpcrdma_req_create(r_xprt, size, GFP_KERNEL);
+ if (!req)
+ return NULL;
+
+ xprt->bc_alloc_count++;
+ rqst = &req->rl_slot;
+ rqst->rq_xprt = xprt;
+ __set_bit(RPC_BC_PA_IN_USE, &rqst->rq_bc_pa_state);
+ xdr_buf_init(&rqst->rq_snd_buf, rdmab_data(req->rl_sendbuf), size);
+ return rqst;
+}
+
/**
* rpcrdma_bc_receive_call - Handle a backward direction call
* @r_xprt: transport receiving the call
@@ -244,18 +228,10 @@ void rpcrdma_bc_receive_call(struct rpcrdma_xprt *r_xprt,
pr_info("RPC: %s: %*ph\n", __func__, size, p);
#endif

- /* Grab a free bc rqst */
- spin_lock(&xprt->bc_pa_lock);
- if (list_empty(&xprt->bc_pa_list)) {
- spin_unlock(&xprt->bc_pa_lock);
+ rqst = rpcrdma_bc_rqst_get(r_xprt);
+ if (!rqst)
goto out_overflow;
- }
- rqst = list_first_entry(&xprt->bc_pa_list,
- struct rpc_rqst, rq_bc_pa_list);
- list_del(&rqst->rq_bc_pa_list);
- spin_unlock(&xprt->bc_pa_lock);

- /* Prepare rqst */
rqst->rq_reply_bytes_recvd = 0;
rqst->rq_xid = *p;



2019-04-10 20:07:33

by Chuck Lever III

Subject: [PATCH v1 11/19] xprtrdma: Increase maximum number of backchannel requests

This reflects the change introduced in commit 067c46967160 ("NFSv4.1:
Bump the default callback session slot count to 16").

Signed-off-by: Chuck Lever <[email protected]>
---
net/sunrpc/xprtrdma/xprt_rdma.h | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index 9871cd9..f1bbe36 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -103,12 +103,14 @@ struct rpcrdma_ep {

/* Pre-allocate extra Work Requests for handling backward receives
* and sends. This is a fixed value because the Work Queues are
- * allocated when the forward channel is set up.
+ * allocated when the forward channel is set up, long before the
+ * backchannel is provisioned. This value is two times
+ * NFS4_DEF_CB_SLOT_TABLE_SIZE.
*/
#if defined(CONFIG_SUNRPC_BACKCHANNEL)
-#define RPCRDMA_BACKWARD_WRS (8)
+#define RPCRDMA_BACKWARD_WRS (32)
#else
-#define RPCRDMA_BACKWARD_WRS (0)
+#define RPCRDMA_BACKWARD_WRS (0)
#endif

/* Registered buffer -- registered kmalloc'd memory for RDMA SEND/RECV
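
The arithmetic behind the new value, spelled out (constant name and
value as introduced by commit 067c46967160; shown here only for
illustration):

	#define NFS4_DEF_CB_SLOT_TABLE_SIZE	(16U)
	#define RPCRDMA_BACKWARD_WRS		(32)	/* 2 * 16 */

The factor of two presumably covers one backward Receive and one
backward Send Work Request per callback slot, matching the header
comment's "backward receives and sends".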


2019-04-10 20:07:38

by Chuck Lever III

Subject: [PATCH v1 12/19] xprtrdma: Trace marshaling failures

Record an event when rpcrdma_marshal_req returns a non-zero value,
to help track down why an xprt close might have occurred.

Signed-off-by: Chuck Lever <[email protected]>
---
include/trace/events/rpcrdma.h | 27 +++++++++++++++++++++++++++
net/sunrpc/xprtrdma/rpc_rdma.c | 1 +
2 files changed, 28 insertions(+)

diff --git a/include/trace/events/rpcrdma.h b/include/trace/events/rpcrdma.h
index 962975b..df9851cb 100644
--- a/include/trace/events/rpcrdma.h
+++ b/include/trace/events/rpcrdma.h
@@ -511,6 +511,33 @@
)
);

+TRACE_EVENT(xprtrdma_marshal_failed,
+ TP_PROTO(const struct rpc_rqst *rqst,
+ int ret
+ ),
+
+ TP_ARGS(rqst, ret),
+
+ TP_STRUCT__entry(
+ __field(unsigned int, task_id)
+ __field(unsigned int, client_id)
+ __field(u32, xid)
+ __field(int, ret)
+ ),
+
+ TP_fast_assign(
+ __entry->task_id = rqst->rq_task->tk_pid;
+ __entry->client_id = rqst->rq_task->tk_client->cl_clid;
+ __entry->xid = be32_to_cpu(rqst->rq_xid);
+ __entry->ret = ret;
+ ),
+
+ TP_printk("task:%u@%u xid=0x%08x: ret=%d",
+ __entry->task_id, __entry->client_id, __entry->xid,
+ __entry->ret
+ )
+);
+
TRACE_EVENT(xprtrdma_post_send,
TP_PROTO(
const struct rpcrdma_req *req,
diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c
index 20b0df7..a3eb299 100644
--- a/net/sunrpc/xprtrdma/rpc_rdma.c
+++ b/net/sunrpc/xprtrdma/rpc_rdma.c
@@ -875,6 +875,7 @@ static bool rpcrdma_prepare_msg_sges(struct rpcrdma_xprt *r_xprt,
return 0;

out_err:
+ trace_xprtrdma_marshal_failed(rqst, ret);
switch (ret) {
case -EAGAIN:
xprt_wait_for_buffer_space(rqst->rq_xprt);
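
With the new trace event enabled, a marshaling failure shows up as a
single record rendered by the TP_printk format above, for example
(hypothetical values):

	xprtrdma_marshal_failed: task:51@4 xid=0x8a2e55c3: ret=-90

where -90 is -EMSGSIZE, one of the return values the marshaling code
can produce.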


2019-04-10 20:07:44

by Chuck Lever III

Subject: [PATCH v1 13/19] xprtrdma: Clean up sendctx functions

Minor clean-ups I've stumbled on since sendctx was merged last year.
In particular, making Send completion processing more efficient
appears to have a measurable impact on IOPS throughput.

Note: test_and_clear_bit() returns a value and therefore implies a
full memory barrier, so an explicit smp_mb__after_atomic() is not
necessary.

Signed-off-by: Chuck Lever <[email protected]>
---
net/sunrpc/xprtrdma/rpc_rdma.c | 27 ++++++++++++---------------
net/sunrpc/xprtrdma/verbs.c | 17 ++++++++---------
net/sunrpc/xprtrdma/xprt_rdma.h | 5 +++--
3 files changed, 23 insertions(+), 26 deletions(-)

diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c
index a3eb299..fcd29a4 100644
--- a/net/sunrpc/xprtrdma/rpc_rdma.c
+++ b/net/sunrpc/xprtrdma/rpc_rdma.c
@@ -508,30 +508,26 @@ static bool rpcrdma_results_inline(struct rpcrdma_xprt *r_xprt,
}

/**
- * rpcrdma_unmap_sendctx - DMA-unmap Send buffers
+ * rpcrdma_sendctx_unmap - DMA-unmap Send buffer
* @sc: sendctx containing SGEs to unmap
*
*/
-void
-rpcrdma_unmap_sendctx(struct rpcrdma_sendctx *sc)
+void rpcrdma_sendctx_unmap(struct rpcrdma_sendctx *sc)
{
- struct rpcrdma_ia *ia = &sc->sc_xprt->rx_ia;
struct ib_sge *sge;
- unsigned int count;

/* The first two SGEs contain the transport header and
* the inline buffer. These are always left mapped so
* they can be cheaply re-used.
*/
- sge = &sc->sc_sges[2];
- for (count = sc->sc_unmap_count; count; ++sge, --count)
- ib_dma_unmap_page(ia->ri_device,
- sge->addr, sge->length, DMA_TO_DEVICE);
+ for (sge = &sc->sc_sges[2]; sc->sc_unmap_count;
+ ++sge, --sc->sc_unmap_count)
+ ib_dma_unmap_page(sc->sc_device, sge->addr, sge->length,
+ DMA_TO_DEVICE);

- if (test_and_clear_bit(RPCRDMA_REQ_F_TX_RESOURCES, &sc->sc_req->rl_flags)) {
- smp_mb__after_atomic();
+ if (test_and_clear_bit(RPCRDMA_REQ_F_TX_RESOURCES,
+ &sc->sc_req->rl_flags))
wake_up_bit(&sc->sc_req->rl_flags, RPCRDMA_REQ_F_TX_RESOURCES);
- }
}

/* Prepare an SGE for the RPC-over-RDMA transport header.
@@ -578,6 +574,7 @@ static bool rpcrdma_prepare_msg_sges(struct rpcrdma_xprt *r_xprt,
*/
if (!rpcrdma_regbuf_dma_map(r_xprt, rb))
goto out_regbuf;
+ sc->sc_device = rdmab_device(rb);
sge_no = 1;
sge[sge_no].addr = rdmab_addr(rb);
sge[sge_no].length = xdr->head[0].iov_len;
@@ -673,12 +670,12 @@ static bool rpcrdma_prepare_msg_sges(struct rpcrdma_xprt *r_xprt,
return false;

out_mapping_overflow:
- rpcrdma_unmap_sendctx(sc);
+ rpcrdma_sendctx_unmap(sc);
pr_err("rpcrdma: too many Send SGEs (%u)\n", sge_no);
return false;

out_mapping_err:
- rpcrdma_unmap_sendctx(sc);
+ rpcrdma_sendctx_unmap(sc);
trace_xprtrdma_dma_maperr(sge[sge_no].addr);
return false;
}
@@ -698,7 +695,7 @@ static bool rpcrdma_prepare_msg_sges(struct rpcrdma_xprt *r_xprt,
struct rpcrdma_req *req, u32 hdrlen,
struct xdr_buf *xdr, enum rpcrdma_chunktype rtype)
{
- req->rl_sendctx = rpcrdma_sendctx_get_locked(&r_xprt->rx_buf);
+ req->rl_sendctx = rpcrdma_sendctx_get_locked(r_xprt);
if (!req->rl_sendctx)
return -EAGAIN;
req->rl_sendctx->sc_wr.num_sge = 0;
diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index 127c1f6..378b460 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -870,20 +870,20 @@ static unsigned long rpcrdma_sendctx_next(struct rpcrdma_buffer *buf,

/**
* rpcrdma_sendctx_get_locked - Acquire a send context
- * @buf: transport buffers from which to acquire an unused context
+ * @r_xprt: controlling transport instance
*
* Returns pointer to a free send completion context; or NULL if
* the queue is empty.
*
* Usage: Called to acquire an SGE array before preparing a Send WR.
*
- * The caller serializes calls to this function (per rpcrdma_buffer),
- * and provides an effective memory barrier that flushes the new value
+ * The caller serializes calls to this function (per transport), and
+ * provides an effective memory barrier that flushes the new value
* of rb_sc_head.
*/
-struct rpcrdma_sendctx *rpcrdma_sendctx_get_locked(struct rpcrdma_buffer *buf)
+struct rpcrdma_sendctx *rpcrdma_sendctx_get_locked(struct rpcrdma_xprt *r_xprt)
{
- struct rpcrdma_xprt *r_xprt;
+ struct rpcrdma_buffer *buf = &r_xprt->rx_buf;
struct rpcrdma_sendctx *sc;
unsigned long next_head;

@@ -908,7 +908,6 @@ struct rpcrdma_sendctx *rpcrdma_sendctx_get_locked(struct rpcrdma_buffer *buf)
* backing up. Cause the caller to pause and try again.
*/
set_bit(RPCRDMA_BUF_F_EMPTY_SCQ, &buf->rb_flags);
- r_xprt = container_of(buf, struct rpcrdma_xprt, rx_buf);
r_xprt->rx_stats.empty_sendctx_q++;
return NULL;
}
@@ -920,7 +919,7 @@ struct rpcrdma_sendctx *rpcrdma_sendctx_get_locked(struct rpcrdma_buffer *buf)
* Usage: Called from Send completion to return a sendctxt
* to the queue.
*
- * The caller serializes calls to this function (per rpcrdma_buffer).
+ * The caller serializes calls to this function (per transport).
*/
static void
rpcrdma_sendctx_put_locked(struct rpcrdma_sendctx *sc)
@@ -928,7 +927,7 @@ struct rpcrdma_sendctx *rpcrdma_sendctx_get_locked(struct rpcrdma_buffer *buf)
struct rpcrdma_buffer *buf = &sc->sc_xprt->rx_buf;
unsigned long next_tail;

- /* Unmap SGEs of previously completed by unsignaled
+ /* Unmap SGEs of previously completed but unsignaled
* Sends by walking up the queue until @sc is found.
*/
next_tail = buf->rb_sc_tail;
@@ -936,7 +935,7 @@ struct rpcrdma_sendctx *rpcrdma_sendctx_get_locked(struct rpcrdma_buffer *buf)
next_tail = rpcrdma_sendctx_next(buf, next_tail);

/* ORDER: item must be accessed _before_ tail is updated */
- rpcrdma_unmap_sendctx(buf->rb_sc_ctxs[next_tail]);
+ rpcrdma_sendctx_unmap(buf->rb_sc_ctxs[next_tail]);

} while (buf->rb_sc_ctxs[next_tail] != sc);

diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index f1bbe36..7d82c91 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -225,6 +225,7 @@ enum {
struct rpcrdma_sendctx {
struct ib_send_wr sc_wr;
struct ib_cqe sc_cqe;
+ struct ib_device *sc_device;
struct rpcrdma_xprt *sc_xprt;
struct rpcrdma_req *sc_req;
unsigned int sc_unmap_count;
@@ -536,7 +537,7 @@ struct rpcrdma_req *rpcrdma_req_create(struct rpcrdma_xprt *r_xprt, size_t size,
void rpcrdma_req_destroy(struct rpcrdma_req *req);
int rpcrdma_buffer_create(struct rpcrdma_xprt *);
void rpcrdma_buffer_destroy(struct rpcrdma_buffer *);
-struct rpcrdma_sendctx *rpcrdma_sendctx_get_locked(struct rpcrdma_buffer *buf);
+struct rpcrdma_sendctx *rpcrdma_sendctx_get_locked(struct rpcrdma_xprt *r_xprt);

struct rpcrdma_mr *rpcrdma_mr_get(struct rpcrdma_xprt *r_xprt);
void rpcrdma_mr_put(struct rpcrdma_mr *mr);
@@ -625,7 +626,7 @@ int rpcrdma_prepare_send_sges(struct rpcrdma_xprt *r_xprt,
struct rpcrdma_req *req, u32 hdrlen,
struct xdr_buf *xdr,
enum rpcrdma_chunktype rtype);
-void rpcrdma_unmap_sendctx(struct rpcrdma_sendctx *sc);
+void rpcrdma_sendctx_unmap(struct rpcrdma_sendctx *sc);
int rpcrdma_marshal_req(struct rpcrdma_xprt *r_xprt, struct rpc_rqst *rqst);
void rpcrdma_set_max_header_sizes(struct rpcrdma_xprt *);
void rpcrdma_reply_handler(struct rpcrdma_rep *rep);


2019-04-10 20:07:49

by Chuck Lever III

[permalink] [raw]
Subject: [PATCH v1 14/19] xprtrdma: More Send completion batching

Instead of using a fixed number, allow the amount of Send completion
batching to vary based on the client's maximum credit limit.

- A larger default gives a small boost to IOPS throughput.

- Reducing it based on max_requests gives a safe result when the
max credit limit is cranked down (e.g. when the device has a small
max_qp_wr).
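
To make the effect concrete, here is a small standalone sketch of the
batch sizes produced for a few credit limits. The values and the
program are purely illustrative of the arithmetic (max_requests >> 3);
this is not code from the patch itself.

#include <stdio.h>

int main(void)
{
	/* Candidate credit limits (slot table sizes); hypothetical values */
	unsigned int max_requests[] = { 16, 32, 128, 1024 };
	unsigned int i;

	for (i = 0; i < sizeof(max_requests) / sizeof(max_requests[0]); i++) {
		/* Derive the Send completion batch from the credit limit
		 * instead of a fixed constant.
		 */
		unsigned int batch = max_requests[i] >> 3;

		printf("max_requests=%4u -> rep_send_batch=%u\n",
		       max_requests[i], batch);
	}
	return 0;
}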

Signed-off-by: Chuck Lever <[email protected]>
---
net/sunrpc/xprtrdma/verbs.c | 4 +---
net/sunrpc/xprtrdma/xprt_rdma.h | 10 ----------
2 files changed, 1 insertion(+), 13 deletions(-)

diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index 378b460..f65d17a 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -521,9 +521,7 @@ static void rpcrdma_xprt_drain(struct rpcrdma_xprt *r_xprt)
ep->rep_attr.cap.max_send_sge,
ep->rep_attr.cap.max_recv_sge);

- /* set trigger for requesting send completion */
- ep->rep_send_batch = min_t(unsigned int, RPCRDMA_MAX_SEND_BATCH,
- cdata->max_requests >> 2);
+ ep->rep_send_batch = cdata->max_requests >> 3;
ep->rep_send_count = ep->rep_send_batch;
init_waitqueue_head(&ep->rep_connect_wait);
ep->rep_receive_count = 0;
diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index 7d82c91..5063fa7 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -232,16 +232,6 @@ struct rpcrdma_sendctx {
struct ib_sge sc_sges[];
};

-/* Limit the number of SGEs that can be unmapped during one
- * Send completion. This caps the amount of work a single
- * completion can do before returning to the provider.
- *
- * Setting this to zero disables Send completion batching.
- */
-enum {
- RPCRDMA_MAX_SEND_BATCH = 7,
-};
-
/*
* struct rpcrdma_mr - external memory region metadata
*


2019-04-10 20:07:57

by Chuck Lever III

[permalink] [raw]
Subject: [PATCH v1 15/19] xprtrdma: Eliminate rpcrdma_ia::ri_device

Clean up.

Since commit 54cbd6b0c6b9 ("xprtrdma: Delay DMA mapping Send and
Receive buffers"), a pointer to the device is now saved in each
regbuf when it is DMA mapped.
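
As a rough illustration of the access pattern the patch standardizes
on, the structures below are simplified stand-ins (not the kernel's
definitions): with the cached ri_device pointer gone, callers reach
the device through the connection manager ID.

/* Simplified stand-ins, for illustration only */
struct ib_device;

struct rdma_cm_id {
	struct ib_device *device;
};

struct rpcrdma_ia {
	struct rdma_cm_id *ri_id;	/* the ri_device cache is gone */
};

/* The device is always reached via ia->ri_id->device */
static inline struct ib_device *ia_device(const struct rpcrdma_ia *ia)
{
	return ia->ri_id->device;
}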

Signed-off-by: Chuck Lever <[email protected]>
---
net/sunrpc/xprtrdma/frwr_ops.c | 17 +++++++++--------
net/sunrpc/xprtrdma/verbs.c | 29 +++++++++++++----------------
net/sunrpc/xprtrdma/xprt_rdma.h | 7 +++----
3 files changed, 25 insertions(+), 28 deletions(-)

diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c
index a2a2e01..7cd2718 100644
--- a/net/sunrpc/xprtrdma/frwr_ops.c
+++ b/net/sunrpc/xprtrdma/frwr_ops.c
@@ -82,13 +82,13 @@

/**
* frwr_is_supported - Check if device supports FRWR
- * @ia: interface adapter to check
+ * @device: interface adapter to check
*
* Returns true if device supports FRWR, otherwise false
*/
-bool frwr_is_supported(struct rpcrdma_ia *ia)
+bool frwr_is_supported(struct ib_device *device)
{
- struct ib_device_attr *attrs = &ia->ri_device->attrs;
+ struct ib_device_attr *attrs = &device->attrs;

if (!(attrs->device_cap_flags & IB_DEVICE_MEM_MGT_EXTENSIONS))
goto out_not_supported;
@@ -98,7 +98,7 @@ bool frwr_is_supported(struct rpcrdma_ia *ia)

out_not_supported:
pr_info("rpcrdma: 'frwr' mode is not supported by device %s\n",
- ia->ri_device->name);
+ device->name);
return false;
}

@@ -131,7 +131,7 @@ void frwr_release_mr(struct rpcrdma_mr *mr)

if (mr->mr_dir != DMA_NONE) {
trace_xprtrdma_mr_unmap(mr);
- ib_dma_unmap_sg(r_xprt->rx_ia.ri_device,
+ ib_dma_unmap_sg(r_xprt->rx_ia.ri_id->device,
mr->mr_sg, mr->mr_nents, mr->mr_dir);
mr->mr_dir = DMA_NONE;
}
@@ -211,7 +211,7 @@ int frwr_init_mr(struct rpcrdma_ia *ia, struct rpcrdma_mr *mr)
int frwr_open(struct rpcrdma_ia *ia, struct rpcrdma_ep *ep,
struct rpcrdma_create_data_internal *cdata)
{
- struct ib_device_attr *attrs = &ia->ri_device->attrs;
+ struct ib_device_attr *attrs = &ia->ri_id->device->attrs;
int max_qp_wr, depth, delta;

ia->ri_mrtype = IB_MR_TYPE_MEM_REG;
@@ -253,7 +253,7 @@ int frwr_open(struct rpcrdma_ia *ia, struct rpcrdma_ep *ep,
} while (delta > 0);
}

- max_qp_wr = ia->ri_device->attrs.max_qp_wr;
+ max_qp_wr = ia->ri_id->device->attrs.max_qp_wr;
max_qp_wr -= RPCRDMA_BACKWARD_WRS;
max_qp_wr -= 1;
if (max_qp_wr < RPCRDMA_MIN_SLOT_TABLE)
@@ -436,7 +436,8 @@ struct rpcrdma_mr_seg *frwr_map(struct rpcrdma_xprt *r_xprt,
}
mr->mr_dir = rpcrdma_data_dir(writing);

- mr->mr_nents = ib_dma_map_sg(ia->ri_device, mr->mr_sg, i, mr->mr_dir);
+ mr->mr_nents =
+ ib_dma_map_sg(ia->ri_id->device, mr->mr_sg, i, mr->mr_dir);
if (!mr->mr_nents)
goto out_dmamap_err;

diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index f65d17a..9ed5fd8 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -250,7 +250,7 @@ static void rpcrdma_xprt_drain(struct rpcrdma_xprt *r_xprt)
case RDMA_CM_EVENT_DEVICE_REMOVAL:
#if IS_ENABLED(CONFIG_SUNRPC_DEBUG)
pr_info("rpcrdma: removing device %s for %s:%s\n",
- ia->ri_device->name,
+ ia->ri_id->device->name,
rpcrdma_addrstr(r_xprt), rpcrdma_portstr(r_xprt));
#endif
set_bit(RPCRDMA_IAF_REMOVING, &ia->ri_flags);
@@ -259,7 +259,6 @@ static void rpcrdma_xprt_drain(struct rpcrdma_xprt *r_xprt)
wait_for_completion(&ia->ri_remove_done);

ia->ri_id = NULL;
- ia->ri_device = NULL;
/* Return 1 to ensure the core destroys the id. */
return 1;
case RDMA_CM_EVENT_ESTABLISHED:
@@ -294,7 +293,7 @@ static void rpcrdma_xprt_drain(struct rpcrdma_xprt *r_xprt)

dprintk("RPC: %s: %s:%s on %s/frwr: %s\n", __func__,
rpcrdma_addrstr(r_xprt), rpcrdma_portstr(r_xprt),
- ia->ri_device->name, rdma_event_msg(event->event));
+ ia->ri_id->device->name, rdma_event_msg(event->event));
return 0;
}

@@ -373,9 +372,8 @@ static void rpcrdma_xprt_drain(struct rpcrdma_xprt *r_xprt)
rc = PTR_ERR(ia->ri_id);
goto out_err;
}
- ia->ri_device = ia->ri_id->device;

- ia->ri_pd = ib_alloc_pd(ia->ri_device, 0);
+ ia->ri_pd = ib_alloc_pd(ia->ri_id->device, 0);
if (IS_ERR(ia->ri_pd)) {
rc = PTR_ERR(ia->ri_pd);
pr_err("rpcrdma: ib_alloc_pd() returned %d\n", rc);
@@ -384,12 +382,12 @@ static void rpcrdma_xprt_drain(struct rpcrdma_xprt *r_xprt)

switch (xprt_rdma_memreg_strategy) {
case RPCRDMA_FRWR:
- if (frwr_is_supported(ia))
+ if (frwr_is_supported(ia->ri_id->device))
break;
/*FALLTHROUGH*/
default:
pr_err("rpcrdma: Device %s does not support memreg mode %d\n",
- ia->ri_device->name, xprt_rdma_memreg_strategy);
+ ia->ri_id->device->name, xprt_rdma_memreg_strategy);
rc = -EINVAL;
goto out_err;
}
@@ -471,7 +469,6 @@ static void rpcrdma_xprt_drain(struct rpcrdma_xprt *r_xprt)
rdma_destroy_id(ia->ri_id);
}
ia->ri_id = NULL;
- ia->ri_device = NULL;

/* If the pd is still busy, xprtrdma missed freeing a resource */
if (ia->ri_pd && !IS_ERR(ia->ri_pd))
@@ -491,7 +488,7 @@ static void rpcrdma_xprt_drain(struct rpcrdma_xprt *r_xprt)
unsigned int max_sge;
int rc;

- max_sge = min_t(unsigned int, ia->ri_device->attrs.max_send_sge,
+ max_sge = min_t(unsigned int, ia->ri_id->device->attrs.max_send_sge,
RPCRDMA_MAX_SEND_SGES);
if (max_sge < RPCRDMA_MIN_SEND_SGES) {
pr_warn("rpcrdma: HCA provides only %d send SGEs\n", max_sge);
@@ -526,16 +523,16 @@ static void rpcrdma_xprt_drain(struct rpcrdma_xprt *r_xprt)
init_waitqueue_head(&ep->rep_connect_wait);
ep->rep_receive_count = 0;

- sendcq = ib_alloc_cq(ia->ri_device, NULL,
+ sendcq = ib_alloc_cq(ia->ri_id->device, NULL,
ep->rep_attr.cap.max_send_wr + 1,
- ia->ri_device->num_comp_vectors > 1 ? 1 : 0,
+ ia->ri_id->device->num_comp_vectors > 1 ? 1 : 0,
IB_POLL_WORKQUEUE);
if (IS_ERR(sendcq)) {
rc = PTR_ERR(sendcq);
goto out1;
}

- recvcq = ib_alloc_cq(ia->ri_device, NULL,
+ recvcq = ib_alloc_cq(ia->ri_id->device, NULL,
ep->rep_attr.cap.max_recv_wr + 1,
0, IB_POLL_WORKQUEUE);
if (IS_ERR(recvcq)) {
@@ -561,7 +558,7 @@ static void rpcrdma_xprt_drain(struct rpcrdma_xprt *r_xprt)
/* Client offers RDMA Read but does not initiate */
ep->rep_remote_cma.initiator_depth = 0;
ep->rep_remote_cma.responder_resources =
- min_t(int, U8_MAX, ia->ri_device->attrs.max_qp_rd_atom);
+ min_t(int, U8_MAX, ia->ri_id->device->attrs.max_qp_rd_atom);

/* Limit transport retries so client can detect server
* GID changes quickly. RPC layer handles re-establishing
@@ -673,7 +670,7 @@ static void rpcrdma_xprt_drain(struct rpcrdma_xprt *r_xprt)
*/
old = id;
rc = -ENETUNREACH;
- if (ia->ri_device != id->device) {
+ if (ia->ri_id->device != id->device) {
pr_err("rpcrdma: can't reconnect on different device!\n");
goto out_destroy;
}
@@ -1296,7 +1293,7 @@ struct rpcrdma_mr *

if (mr->mr_dir != DMA_NONE) {
trace_xprtrdma_mr_unmap(mr);
- ib_dma_unmap_sg(r_xprt->rx_ia.ri_device,
+ ib_dma_unmap_sg(r_xprt->rx_ia.ri_id->device,
mr->mr_sg, mr->mr_nents, mr->mr_dir);
mr->mr_dir = DMA_NONE;
}
@@ -1429,7 +1426,7 @@ bool rpcrdma_regbuf_realloc(struct rpcrdma_regbuf *rb, size_t size, gfp_t flags)
bool __rpcrdma_regbuf_dma_map(struct rpcrdma_xprt *r_xprt,
struct rpcrdma_regbuf *rb)
{
- struct ib_device *device = r_xprt->rx_ia.ri_device;
+ struct ib_device *device = r_xprt->rx_ia.ri_id->device;

if (rb->rg_direction == DMA_NONE)
return false;
diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index 5063fa7..99f7e54 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -66,11 +66,8 @@
* Interface Adapter -- one per transport instance
*/
struct rpcrdma_ia {
- struct ib_device *ri_device;
struct rdma_cm_id *ri_id;
struct ib_pd *ri_pd;
- struct completion ri_done;
- struct completion ri_remove_done;
int ri_async_rc;
unsigned int ri_max_segs;
unsigned int ri_max_frwr_depth;
@@ -80,6 +77,8 @@ struct rpcrdma_ia {
bool ri_implicit_roundup;
enum ib_mr_type ri_mrtype;
unsigned long ri_flags;
+ struct completion ri_done;
+ struct completion ri_remove_done;
};

enum {
@@ -585,7 +584,7 @@ static inline bool rpcrdma_regbuf_dma_map(struct rpcrdma_xprt *r_xprt,

/* Memory registration calls xprtrdma/frwr_ops.c
*/
-bool frwr_is_supported(struct rpcrdma_ia *);
+bool frwr_is_supported(struct ib_device *device);
int frwr_open(struct rpcrdma_ia *ia, struct rpcrdma_ep *ep,
struct rpcrdma_create_data_internal *cdata);
int frwr_init_mr(struct rpcrdma_ia *ia, struct rpcrdma_mr *mr);


2019-04-10 20:08:00

by Chuck Lever III

[permalink] [raw]
Subject: [PATCH v1 16/19] SUNRPC: Update comments based on recent changes

Signed-off-by: Chuck Lever <[email protected]>
---
net/sunrpc/xprt.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index d7117d2..73db033 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -949,7 +949,7 @@ struct rpc_rqst *xprt_lookup_rqst(struct rpc_xprt *xprt, __be32 xid)
* @req: Request to pin
*
* Caller must ensure this is atomic with the call to xprt_lookup_rqst()
- * so should be holding the xprt receive lock.
+ * so should be holding xprt->queue_lock.
*/
void xprt_pin_rqst(struct rpc_rqst *req)
{
@@ -961,7 +961,7 @@ void xprt_pin_rqst(struct rpc_rqst *req)
* xprt_unpin_rqst - Unpin a request on the transport receive list
* @req: Request to pin
*
- * Caller should be holding the xprt receive lock.
+ * Caller should be holding xprt->queue_lock.
*/
void xprt_unpin_rqst(struct rpc_rqst *req)
{


2019-04-10 20:08:05

by Chuck Lever III

[permalink] [raw]
Subject: [PATCH v1 17/19] xprtrdma: Remove rpcrdma_create_data_internal::rsize and wsize

Clean up.

xprt_rdma_max_inline_{read,write} cannot be set to large values
by virtue of proc_dointvec_minmax. The current maximum is
RPCRDMA_MAX_INLINE, which is much smaller than RPCRDMA_MAX_SEGS *
PAGE_SIZE.

The .rsize and .wsize fields are otherwise unused in the current
code base, and thus can be removed.
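
For reference, a sketch of how such a bounded sysctl entry looks. The
field values and the table name below are illustrative stand-ins, not
the transport's actual tunables table.

#include <linux/sysctl.h>

static unsigned int xprt_rdma_max_inline_read = 4096;	/* illustrative */
static unsigned int min_inline_size = 1024;	/* stand-in for RPCRDMA_MIN_INLINE */
static unsigned int max_inline_size = 4096;	/* stand-in for RPCRDMA_MAX_INLINE */

static struct ctl_table xr_tunables_sketch[] = {
	{
		.procname	= "rdma_max_inline_read",
		.data		= &xprt_rdma_max_inline_read,
		.maxlen		= sizeof(unsigned int),
		.mode		= 0644,
		/* proc_dointvec_minmax rejects writes outside [extra1, extra2] */
		.proc_handler	= proc_dointvec_minmax,
		.extra1		= &min_inline_size,
		.extra2		= &max_inline_size,
	},
	{ },
};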

Signed-off-by: Chuck Lever <[email protected]>
---
net/sunrpc/xprtrdma/transport.c | 9 ---------
net/sunrpc/xprtrdma/xprt_rdma.h | 2 --
2 files changed, 11 deletions(-)

diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c
index 47d2e00..f362642 100644
--- a/net/sunrpc/xprtrdma/transport.c
+++ b/net/sunrpc/xprtrdma/transport.c
@@ -350,17 +350,8 @@
xprt_rdma_format_addresses(xprt, sap);

cdata.max_requests = xprt_rdma_slot_table_entries;
-
- cdata.rsize = RPCRDMA_MAX_SEGS * PAGE_SIZE; /* RDMA write max */
- cdata.wsize = RPCRDMA_MAX_SEGS * PAGE_SIZE; /* RDMA read max */
-
cdata.inline_wsize = xprt_rdma_max_inline_write;
- if (cdata.inline_wsize > cdata.wsize)
- cdata.inline_wsize = cdata.wsize;
-
cdata.inline_rsize = xprt_rdma_max_inline_read;
- if (cdata.inline_rsize > cdata.rsize)
- cdata.inline_rsize = cdata.rsize;

/*
* Create new transport instance, which includes initialized
diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index 99f7e54..c1e265a 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -419,8 +419,6 @@ enum {
*/
struct rpcrdma_create_data_internal {
unsigned int max_requests; /* max requests (slots) in flight */
- unsigned int rsize; /* mount rsize - max read hdr+data */
- unsigned int wsize; /* mount wsize - max write hdr+data */
unsigned int inline_rsize; /* max non-rdma read data payload */
unsigned int inline_wsize; /* max non-rdma write data payload */
};


2019-04-10 20:08:12

by Chuck Lever III

[permalink] [raw]
Subject: [PATCH v1 18/19] xprtrdma: Aggregate the inline settings in struct rpcrdma_ep

Clean up.

The inline settings are actually a characteristic of the endpoint,
and not related to the device. They are also modified after the
transport instance is created, so they do not belong in the cdata
structure either.

Lastly, let's use names that are more natural to RDMA than to NFS:
inline_write -> inline_send and inline_read -> inline_recv. The
/proc files retain their names to avoid breaking user space.
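
A minimal standalone sketch of the negotiation these endpoint fields
model: start from the local tunables, then clamp to what the peer
advertises during connection setup. Names are simplified; the real
logic lives in rpcrdma_update_connect_private.

struct ep_inline {
	unsigned int inline_send;	/* largest inline payload we send */
	unsigned int inline_recv;	/* largest inline payload we accept */
};

static void clamp_to_peer(struct ep_inline *ep,
			  unsigned int peer_send_size,
			  unsigned int peer_recv_size)
{
	/* The peer never sends more than peer_send_size inline */
	if (peer_send_size < ep->inline_recv)
		ep->inline_recv = peer_send_size;

	/* The peer never accepts more than peer_recv_size inline */
	if (peer_recv_size < ep->inline_send)
		ep->inline_send = peer_recv_size;
}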

Signed-off-by: Chuck Lever <[email protected]>
---
net/sunrpc/xprtrdma/backchannel.c | 6 +++---
net/sunrpc/xprtrdma/rpc_rdma.c | 34 +++++++++++++++++++---------------
net/sunrpc/xprtrdma/transport.c | 4 +---
net/sunrpc/xprtrdma/verbs.c | 24 +++++++++++++-----------
net/sunrpc/xprtrdma/xprt_rdma.h | 9 +++++----
5 files changed, 41 insertions(+), 36 deletions(-)

diff --git a/net/sunrpc/xprtrdma/backchannel.c b/net/sunrpc/xprtrdma/backchannel.c
index ae51ef6..ce98659 100644
--- a/net/sunrpc/xprtrdma/backchannel.c
+++ b/net/sunrpc/xprtrdma/backchannel.c
@@ -44,10 +44,10 @@ int xprt_rdma_bc_setup(struct rpc_xprt *xprt, unsigned int reqs)
size_t xprt_rdma_bc_maxpayload(struct rpc_xprt *xprt)
{
struct rpcrdma_xprt *r_xprt = rpcx_to_rdmax(xprt);
- struct rpcrdma_create_data_internal *cdata = &r_xprt->rx_data;
+ struct rpcrdma_ep *ep = &r_xprt->rx_ep;
size_t maxmsg;

- maxmsg = min_t(unsigned int, cdata->inline_rsize, cdata->inline_wsize);
+ maxmsg = min_t(unsigned int, ep->rep_inline_send, ep->rep_inline_recv);
maxmsg = min_t(unsigned int, maxmsg, PAGE_SIZE);
return maxmsg - RPCRDMA_HDRLEN_MIN;
}
@@ -184,7 +184,7 @@ static struct rpc_rqst *rpcrdma_bc_rqst_get(struct rpcrdma_xprt *r_xprt)
if (xprt->bc_alloc_count >= RPCRDMA_BACKWARD_WRS)
return NULL;

- size = min_t(size_t, r_xprt->rx_data.inline_rsize, PAGE_SIZE);
+ size = min_t(size_t, r_xprt->rx_ep.rep_inline_recv, PAGE_SIZE);
req = rpcrdma_req_create(r_xprt, size, GFP_KERNEL);
if (!req)
return NULL;
diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c
index fcd29a4..4f0b879 100644
--- a/net/sunrpc/xprtrdma/rpc_rdma.c
+++ b/net/sunrpc/xprtrdma/rpc_rdma.c
@@ -105,16 +105,23 @@ static unsigned int rpcrdma_max_reply_header_size(unsigned int maxsegs)
return size;
}

+/**
+ * rpcrdma_set_max_header_sizes - Initialize inline payload sizes
+ * @r_xprt: transport instance to initialize
+ *
+ * The max_inline fields contain the maximum size of an RPC message
+ * so the marshaling code doesn't have to repeat this calculation
+ * for every RPC.
+ */
void rpcrdma_set_max_header_sizes(struct rpcrdma_xprt *r_xprt)
{
- struct rpcrdma_create_data_internal *cdata = &r_xprt->rx_data;
- struct rpcrdma_ia *ia = &r_xprt->rx_ia;
- unsigned int maxsegs = ia->ri_max_segs;
-
- ia->ri_max_inline_write = cdata->inline_wsize -
- rpcrdma_max_call_header_size(maxsegs);
- ia->ri_max_inline_read = cdata->inline_rsize -
- rpcrdma_max_reply_header_size(maxsegs);
+ unsigned int maxsegs = r_xprt->rx_ia.ri_max_segs;
+ struct rpcrdma_ep *ep = &r_xprt->rx_ep;
+
+ ep->rep_max_inline_send =
+ ep->rep_inline_send - rpcrdma_max_call_header_size(maxsegs);
+ ep->rep_max_inline_recv =
+ ep->rep_inline_recv - rpcrdma_max_reply_header_size(maxsegs);
}

/* The client can send a request inline as long as the RPCRDMA header
@@ -131,7 +138,7 @@ static bool rpcrdma_args_inline(struct rpcrdma_xprt *r_xprt,
struct xdr_buf *xdr = &rqst->rq_snd_buf;
unsigned int count, remaining, offset;

- if (xdr->len > r_xprt->rx_ia.ri_max_inline_write)
+ if (xdr->len > r_xprt->rx_ep.rep_max_inline_send)
return false;

if (xdr->page_len) {
@@ -159,9 +166,7 @@ static bool rpcrdma_args_inline(struct rpcrdma_xprt *r_xprt,
static bool rpcrdma_results_inline(struct rpcrdma_xprt *r_xprt,
struct rpc_rqst *rqst)
{
- struct rpcrdma_ia *ia = &r_xprt->rx_ia;
-
- return rqst->rq_rcv_buf.buflen <= ia->ri_max_inline_read;
+ return rqst->rq_rcv_buf.buflen <= r_xprt->rx_ep.rep_max_inline_recv;
}

/* The client is required to provide a Reply chunk if the maximum
@@ -173,10 +178,9 @@ static bool rpcrdma_results_inline(struct rpcrdma_xprt *r_xprt,
const struct rpc_rqst *rqst)
{
const struct xdr_buf *buf = &rqst->rq_rcv_buf;
- const struct rpcrdma_ia *ia = &r_xprt->rx_ia;

- return buf->head[0].iov_len + buf->tail[0].iov_len <
- ia->ri_max_inline_read;
+ return (buf->head[0].iov_len + buf->tail[0].iov_len) <
+ r_xprt->rx_ep.rep_max_inline_recv;
}

/* Split @vec on page boundaries into SGEs. FMR registers pages, not
diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c
index f362642..4e5b8fe 100644
--- a/net/sunrpc/xprtrdma/transport.c
+++ b/net/sunrpc/xprtrdma/transport.c
@@ -70,7 +70,7 @@

static unsigned int xprt_rdma_slot_table_entries = RPCRDMA_DEF_SLOT_TABLE;
unsigned int xprt_rdma_max_inline_read = RPCRDMA_DEF_INLINE;
-static unsigned int xprt_rdma_max_inline_write = RPCRDMA_DEF_INLINE;
+unsigned int xprt_rdma_max_inline_write = RPCRDMA_DEF_INLINE;
unsigned int xprt_rdma_memreg_strategy = RPCRDMA_FRWR;
int xprt_rdma_pad_optimize;

@@ -350,8 +350,6 @@
xprt_rdma_format_addresses(xprt, sap);

cdata.max_requests = xprt_rdma_slot_table_entries;
- cdata.inline_wsize = xprt_rdma_max_inline_write;
- cdata.inline_rsize = xprt_rdma_max_inline_read;

/*
* Create new transport instance, which includes initialized
diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index 9ed5fd8..b419906 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -188,7 +188,6 @@ static void rpcrdma_xprt_drain(struct rpcrdma_xprt *r_xprt)
rpcrdma_update_connect_private(struct rpcrdma_xprt *r_xprt,
struct rdma_conn_param *param)
{
- struct rpcrdma_create_data_internal *cdata = &r_xprt->rx_data;
const struct rpcrdma_connect_private *pmsg = param->private_data;
unsigned int rsize, wsize;

@@ -205,12 +204,13 @@ static void rpcrdma_xprt_drain(struct rpcrdma_xprt *r_xprt)
wsize = rpcrdma_decode_buffer_size(pmsg->cp_recv_size);
}

- if (rsize < cdata->inline_rsize)
- cdata->inline_rsize = rsize;
- if (wsize < cdata->inline_wsize)
- cdata->inline_wsize = wsize;
- dprintk("RPC: %s: max send %u, max recv %u\n",
- __func__, cdata->inline_wsize, cdata->inline_rsize);
+ if (rsize < r_xprt->rx_ep.rep_inline_recv)
+ r_xprt->rx_ep.rep_inline_recv = rsize;
+ if (wsize < r_xprt->rx_ep.rep_inline_send)
+ r_xprt->rx_ep.rep_inline_send = wsize;
+ dprintk("RPC: %s: max send %u, max recv %u\n", __func__,
+ r_xprt->rx_ep.rep_inline_send,
+ r_xprt->rx_ep.rep_inline_recv);
rpcrdma_set_max_header_sizes(r_xprt);
}

@@ -488,6 +488,9 @@ static void rpcrdma_xprt_drain(struct rpcrdma_xprt *r_xprt)
unsigned int max_sge;
int rc;

+ ep->rep_inline_send = xprt_rdma_max_inline_write;
+ ep->rep_inline_recv = xprt_rdma_max_inline_read;
+
max_sge = min_t(unsigned int, ia->ri_id->device->attrs.max_send_sge,
RPCRDMA_MAX_SEND_SGES);
if (max_sge < RPCRDMA_MIN_SEND_SGES) {
@@ -550,8 +553,8 @@ static void rpcrdma_xprt_drain(struct rpcrdma_xprt *r_xprt)
pmsg->cp_magic = rpcrdma_cmp_magic;
pmsg->cp_version = RPCRDMA_CMP_VERSION;
pmsg->cp_flags |= RPCRDMA_CMP_F_SND_W_INV_OK;
- pmsg->cp_send_size = rpcrdma_encode_buffer_size(cdata->inline_wsize);
- pmsg->cp_recv_size = rpcrdma_encode_buffer_size(cdata->inline_rsize);
+ pmsg->cp_send_size = rpcrdma_encode_buffer_size(ep->rep_inline_send);
+ pmsg->cp_recv_size = rpcrdma_encode_buffer_size(ep->rep_inline_recv);
ep->rep_remote_cma.private_data = pmsg;
ep->rep_remote_cma.private_data_len = sizeof(*pmsg);

@@ -1045,7 +1048,6 @@ struct rpcrdma_req *rpcrdma_req_create(struct rpcrdma_xprt *r_xprt, size_t size,

static bool rpcrdma_rep_create(struct rpcrdma_xprt *r_xprt, bool temp)
{
- struct rpcrdma_create_data_internal *cdata = &r_xprt->rx_data;
struct rpcrdma_buffer *buf = &r_xprt->rx_buf;
struct rpcrdma_rep *rep;

@@ -1053,7 +1055,7 @@ static bool rpcrdma_rep_create(struct rpcrdma_xprt *r_xprt, bool temp)
if (rep == NULL)
goto out;

- rep->rr_rdmabuf = rpcrdma_regbuf_alloc(cdata->inline_rsize,
+ rep->rr_rdmabuf = rpcrdma_regbuf_alloc(r_xprt->rx_ep.rep_inline_recv,
DMA_FROM_DEVICE, GFP_KERNEL);
if (!rep->rr_rdmabuf)
goto out_free;
diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index c1e265a..9beb7fc 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -71,8 +71,6 @@ struct rpcrdma_ia {
int ri_async_rc;
unsigned int ri_max_segs;
unsigned int ri_max_frwr_depth;
- unsigned int ri_max_inline_write;
- unsigned int ri_max_inline_read;
unsigned int ri_max_send_sges;
bool ri_implicit_roundup;
enum ib_mr_type ri_mrtype;
@@ -92,11 +90,15 @@ enum {
struct rpcrdma_ep {
unsigned int rep_send_count;
unsigned int rep_send_batch;
+ unsigned int rep_max_inline_send;
+ unsigned int rep_max_inline_recv;
int rep_connected;
struct ib_qp_init_attr rep_attr;
wait_queue_head_t rep_connect_wait;
struct rpcrdma_connect_private rep_cm_private;
struct rdma_conn_param rep_remote_cma;
+ unsigned int rep_inline_send; /* negotiated */
+ unsigned int rep_inline_recv; /* negotiated */
int rep_receive_count;
};

@@ -419,8 +421,6 @@ enum {
*/
struct rpcrdma_create_data_internal {
unsigned int max_requests; /* max requests (slots) in flight */
- unsigned int inline_rsize; /* max non-rdma read data payload */
- unsigned int inline_wsize; /* max non-rdma write data payload */
};

/*
@@ -630,6 +630,7 @@ static inline void rpcrdma_set_xdrlen(struct xdr_buf *xdr, size_t len)
/* RPC/RDMA module init - xprtrdma/transport.c
*/
extern unsigned int xprt_rdma_max_inline_read;
+extern unsigned int xprt_rdma_max_inline_write;
void xprt_rdma_format_addresses(struct rpc_xprt *xprt, struct sockaddr *sap);
void xprt_rdma_free_addresses(struct rpc_xprt *xprt);
void xprt_rdma_close(struct rpc_xprt *xprt);


2019-04-10 20:08:17

by Chuck Lever III

[permalink] [raw]
Subject: [PATCH v1 19/19] xprtrdma: Eliminate struct rpcrdma_create_data_internal

Clean up.

Move the remaining field out of rpcrdma_create_data_internal so that
the structure can be removed.

Signed-off-by: Chuck Lever <[email protected]>
---
net/sunrpc/xprtrdma/frwr_ops.c | 21 +++++++----------
net/sunrpc/xprtrdma/transport.c | 27 +++-------------------
net/sunrpc/xprtrdma/verbs.c | 47 +++++++++++++++++++++++----------------
net/sunrpc/xprtrdma/xprt_rdma.h | 21 ++++-------------
4 files changed, 46 insertions(+), 70 deletions(-)

diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c
index 7cd2718..1d369b6 100644
--- a/net/sunrpc/xprtrdma/frwr_ops.c
+++ b/net/sunrpc/xprtrdma/frwr_ops.c
@@ -194,12 +194,11 @@ int frwr_init_mr(struct rpcrdma_ia *ia, struct rpcrdma_mr *mr)
* frwr_open - Prepare an endpoint for use with FRWR
* @ia: interface adapter this endpoint will use
* @ep: endpoint to prepare
- * @cdata: transport parameters
*
* On success, sets:
* ep->rep_attr.cap.max_send_wr
* ep->rep_attr.cap.max_recv_wr
- * cdata->max_requests
+ * ep->rep_max_requests
* ia->ri_max_segs
*
* And these FRWR-related fields:
@@ -208,8 +207,7 @@ int frwr_init_mr(struct rpcrdma_ia *ia, struct rpcrdma_mr *mr)
*
* On failure, a negative errno is returned.
*/
-int frwr_open(struct rpcrdma_ia *ia, struct rpcrdma_ep *ep,
- struct rpcrdma_create_data_internal *cdata)
+int frwr_open(struct rpcrdma_ia *ia, struct rpcrdma_ep *ep)
{
struct ib_device_attr *attrs = &ia->ri_id->device->attrs;
int max_qp_wr, depth, delta;
@@ -258,19 +256,18 @@ int frwr_open(struct rpcrdma_ia *ia, struct rpcrdma_ep *ep,
max_qp_wr -= 1;
if (max_qp_wr < RPCRDMA_MIN_SLOT_TABLE)
return -ENOMEM;
- if (cdata->max_requests > max_qp_wr)
- cdata->max_requests = max_qp_wr;
- ep->rep_attr.cap.max_send_wr = cdata->max_requests * depth;
+ if (ep->rep_max_requests > max_qp_wr)
+ ep->rep_max_requests = max_qp_wr;
+ ep->rep_attr.cap.max_send_wr = ep->rep_max_requests * depth;
if (ep->rep_attr.cap.max_send_wr > max_qp_wr) {
- cdata->max_requests = max_qp_wr / depth;
- if (!cdata->max_requests)
+ ep->rep_max_requests = max_qp_wr / depth;
+ if (!ep->rep_max_requests)
return -EINVAL;
- ep->rep_attr.cap.max_send_wr = cdata->max_requests *
- depth;
+ ep->rep_attr.cap.max_send_wr = ep->rep_max_requests * depth;
}
ep->rep_attr.cap.max_send_wr += RPCRDMA_BACKWARD_WRS;
ep->rep_attr.cap.max_send_wr += 1; /* for ib_drain_sq */
- ep->rep_attr.cap.max_recv_wr = cdata->max_requests;
+ ep->rep_attr.cap.max_recv_wr = ep->rep_max_requests;
ep->rep_attr.cap.max_recv_wr += RPCRDMA_BACKWARD_WRS;
ep->rep_attr.cap.max_recv_wr += 1; /* for ib_drain_rq */

diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c
index 4e5b8fe..f0b13a9 100644
--- a/net/sunrpc/xprtrdma/transport.c
+++ b/net/sunrpc/xprtrdma/transport.c
@@ -68,7 +68,7 @@
* tunables
*/

-static unsigned int xprt_rdma_slot_table_entries = RPCRDMA_DEF_SLOT_TABLE;
+unsigned int xprt_rdma_slot_table_entries = RPCRDMA_DEF_SLOT_TABLE;
unsigned int xprt_rdma_max_inline_read = RPCRDMA_DEF_INLINE;
unsigned int xprt_rdma_max_inline_write = RPCRDMA_DEF_INLINE;
unsigned int xprt_rdma_memreg_strategy = RPCRDMA_FRWR;
@@ -288,7 +288,7 @@

cancel_delayed_work_sync(&r_xprt->rx_connect_worker);

- rpcrdma_ep_destroy(&r_xprt->rx_ep, &r_xprt->rx_ia);
+ rpcrdma_ep_destroy(r_xprt);
rpcrdma_buffer_destroy(&r_xprt->rx_buf);
rpcrdma_ia_close(&r_xprt->rx_ia);

@@ -311,10 +311,8 @@
static struct rpc_xprt *
xprt_setup_rdma(struct xprt_create *args)
{
- struct rpcrdma_create_data_internal cdata;
struct rpc_xprt *xprt;
struct rpcrdma_xprt *new_xprt;
- struct rpcrdma_ep *new_ep;
struct sockaddr *sap;
int rc;

@@ -349,29 +347,12 @@
xprt_set_bound(xprt);
xprt_rdma_format_addresses(xprt, sap);

- cdata.max_requests = xprt_rdma_slot_table_entries;
-
- /*
- * Create new transport instance, which includes initialized
- * o ia
- * o endpoint
- * o buffers
- */
-
new_xprt = rpcx_to_rdmax(xprt);
-
rc = rpcrdma_ia_open(new_xprt);
if (rc)
goto out1;

- /*
- * initialize and create ep
- */
- new_xprt->rx_data = cdata;
- new_ep = &new_xprt->rx_ep;
-
- rc = rpcrdma_ep_create(&new_xprt->rx_ep,
- &new_xprt->rx_ia, &new_xprt->rx_data);
+ rc = rpcrdma_ep_create(new_xprt);
if (rc)
goto out2;

@@ -402,7 +383,7 @@
rpcrdma_buffer_destroy(&new_xprt->rx_buf);
rc = -ENODEV;
out3:
- rpcrdma_ep_destroy(new_ep, &new_xprt->rx_ia);
+ rpcrdma_ep_destroy(new_xprt);
out2:
rpcrdma_ia_close(&new_xprt->rx_ia);
out1:
diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index b419906..c65edcf 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -476,18 +476,22 @@ static void rpcrdma_xprt_drain(struct rpcrdma_xprt *r_xprt)
ia->ri_pd = NULL;
}

-/*
- * Create unconnected endpoint.
+/**
+ * rpcrdma_ep_create - Create unconnected endpoint
+ * @r_xprt: transport to instantiate
+ *
+ * Returns zero on success, or a negative errno.
*/
-int
-rpcrdma_ep_create(struct rpcrdma_ep *ep, struct rpcrdma_ia *ia,
- struct rpcrdma_create_data_internal *cdata)
+int rpcrdma_ep_create(struct rpcrdma_xprt *r_xprt)
{
+ struct rpcrdma_ep *ep = &r_xprt->rx_ep;
+ struct rpcrdma_ia *ia = &r_xprt->rx_ia;
struct rpcrdma_connect_private *pmsg = &ep->rep_cm_private;
struct ib_cq *sendcq, *recvcq;
unsigned int max_sge;
int rc;

+ ep->rep_max_requests = xprt_rdma_slot_table_entries;
ep->rep_inline_send = xprt_rdma_max_inline_write;
ep->rep_inline_recv = xprt_rdma_max_inline_read;

@@ -499,7 +503,7 @@ static void rpcrdma_xprt_drain(struct rpcrdma_xprt *r_xprt)
}
ia->ri_max_send_sges = max_sge;

- rc = frwr_open(ia, ep, cdata);
+ rc = frwr_open(ia, ep);
if (rc)
return rc;

@@ -521,7 +525,7 @@ static void rpcrdma_xprt_drain(struct rpcrdma_xprt *r_xprt)
ep->rep_attr.cap.max_send_sge,
ep->rep_attr.cap.max_recv_sge);

- ep->rep_send_batch = cdata->max_requests >> 3;
+ ep->rep_send_batch = ep->rep_max_requests >> 3;
ep->rep_send_count = ep->rep_send_batch;
init_waitqueue_head(&ep->rep_connect_wait);
ep->rep_receive_count = 0;
@@ -584,16 +588,16 @@ static void rpcrdma_xprt_drain(struct rpcrdma_xprt *r_xprt)
return rc;
}

-/*
- * rpcrdma_ep_destroy
+/**
+ * rpcrdma_ep_destroy - Disconnect and destroy endpoint.
+ * @r_xprt: transport instance to shut down
*
- * Disconnect and destroy endpoint. After this, the only
- * valid operations on the ep are to free it (if dynamically
- * allocated) or re-create it.
*/
-void
-rpcrdma_ep_destroy(struct rpcrdma_ep *ep, struct rpcrdma_ia *ia)
+void rpcrdma_ep_destroy(struct rpcrdma_xprt *r_xprt)
{
+ struct rpcrdma_ep *ep = &r_xprt->rx_ep;
+ struct rpcrdma_ia *ia = &r_xprt->rx_ia;
+
if (ia->ri_id && ia->ri_id->qp) {
rpcrdma_ep_disconnect(ep, ia);
rdma_destroy_qp(ia->ri_id);
@@ -623,7 +627,7 @@ static void rpcrdma_xprt_drain(struct rpcrdma_xprt *r_xprt)
goto out1;

rc = -ENOMEM;
- err = rpcrdma_ep_create(ep, ia, &r_xprt->rx_data);
+ err = rpcrdma_ep_create(r_xprt);
if (err) {
pr_err("rpcrdma: rpcrdma_ep_create returned %d\n", err);
goto out2;
@@ -640,7 +644,7 @@ static void rpcrdma_xprt_drain(struct rpcrdma_xprt *r_xprt)
return 0;

out3:
- rpcrdma_ep_destroy(ep, ia);
+ rpcrdma_ep_destroy(r_xprt);
out2:
rpcrdma_ia_close(ia);
out1:
@@ -1082,14 +1086,19 @@ static bool rpcrdma_rep_create(struct rpcrdma_xprt *r_xprt, bool temp)
return false;
}

-int
-rpcrdma_buffer_create(struct rpcrdma_xprt *r_xprt)
+/**
+ * rpcrdma_buffer_create - Create initial set of req/rep objects
+ * @r_xprt: transport instance to (re)initialize
+ *
+ * Returns zero on success, otherwise a negative errno.
+ */
+int rpcrdma_buffer_create(struct rpcrdma_xprt *r_xprt)
{
struct rpcrdma_buffer *buf = &r_xprt->rx_buf;
int i, rc;

buf->rb_flags = 0;
- buf->rb_max_requests = r_xprt->rx_data.max_requests;
+ buf->rb_max_requests = r_xprt->rx_ep.rep_max_requests;
buf->rb_bc_srv_max_requests = 0;
spin_lock_init(&buf->rb_mrlock);
spin_lock_init(&buf->rb_lock);
diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index 9beb7fc..618d97b 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -97,6 +97,7 @@ struct rpcrdma_ep {
wait_queue_head_t rep_connect_wait;
struct rpcrdma_connect_private rep_cm_private;
struct rdma_conn_param rep_remote_cma;
+ unsigned int rep_max_requests; /* set by /proc */
unsigned int rep_inline_send; /* negotiated */
unsigned int rep_inline_recv; /* negotiated */
int rep_receive_count;
@@ -414,16 +415,6 @@ enum {
};

/*
- * Internal structure for transport instance creation. This
- * exists primarily for modularity.
- *
- * This data should be set with mount options
- */
-struct rpcrdma_create_data_internal {
- unsigned int max_requests; /* max requests (slots) in flight */
-};
-
-/*
* Statistics for RPCRDMA
*/
struct rpcrdma_stats {
@@ -467,7 +458,6 @@ struct rpcrdma_xprt {
struct rpcrdma_ia rx_ia;
struct rpcrdma_ep rx_ep;
struct rpcrdma_buffer rx_buf;
- struct rpcrdma_create_data_internal rx_data;
struct delayed_work rx_connect_worker;
struct rpcrdma_stats rx_stats;
};
@@ -507,9 +497,8 @@ struct rpcrdma_xprt {
/*
* Endpoint calls - xprtrdma/verbs.c
*/
-int rpcrdma_ep_create(struct rpcrdma_ep *, struct rpcrdma_ia *,
- struct rpcrdma_create_data_internal *);
-void rpcrdma_ep_destroy(struct rpcrdma_ep *, struct rpcrdma_ia *);
+int rpcrdma_ep_create(struct rpcrdma_xprt *r_xprt);
+void rpcrdma_ep_destroy(struct rpcrdma_xprt *r_xprt);
int rpcrdma_ep_connect(struct rpcrdma_ep *, struct rpcrdma_ia *);
void rpcrdma_ep_disconnect(struct rpcrdma_ep *, struct rpcrdma_ia *);

@@ -583,8 +572,7 @@ static inline bool rpcrdma_regbuf_dma_map(struct rpcrdma_xprt *r_xprt,
/* Memory registration calls xprtrdma/frwr_ops.c
*/
bool frwr_is_supported(struct ib_device *device);
-int frwr_open(struct rpcrdma_ia *ia, struct rpcrdma_ep *ep,
- struct rpcrdma_create_data_internal *cdata);
+int frwr_open(struct rpcrdma_ia *ia, struct rpcrdma_ep *ep);
int frwr_init_mr(struct rpcrdma_ia *ia, struct rpcrdma_mr *mr);
void frwr_release_mr(struct rpcrdma_mr *mr);
size_t frwr_maxpages(struct rpcrdma_xprt *r_xprt);
@@ -629,6 +617,7 @@ static inline void rpcrdma_set_xdrlen(struct xdr_buf *xdr, size_t len)

/* RPC/RDMA module init - xprtrdma/transport.c
*/
+extern unsigned int xprt_rdma_slot_table_entries;
extern unsigned int xprt_rdma_max_inline_read;
extern unsigned int xprt_rdma_max_inline_write;
void xprt_rdma_format_addresses(struct rpc_xprt *xprt, struct sockaddr *sap);


2019-04-11 20:47:29

by Anna Schumaker

[permalink] [raw]
Subject: Re: [PATCH v1 05/19] xprtrdma: Clean up rpcrdma_create_rep() and rpcrdma_destroy_rep()

Hi Chuck,

On Wed, 2019-04-10 at 16:06 -0400, Chuck Lever wrote:
> For code legibility, clean up the function names to be consistent
> with the pattern: "rpcrdma" _ object-type _ action
>
> Signed-off-by: Chuck Lever <[email protected]>
> ---
> net/sunrpc/xprtrdma/verbs.c | 21 ++++++++-------------
> 1 file changed, 8 insertions(+), 13 deletions(-)
>
> diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
> index 82ea298..3bc751e 100644
> --- a/net/sunrpc/xprtrdma/verbs.c
> +++ b/net/sunrpc/xprtrdma/verbs.c
> @@ -76,7 +76,6 @@
> static void rpcrdma_sendctx_put_locked(struct rpcrdma_sendctx *sc);
> static void rpcrdma_mrs_create(struct rpcrdma_xprt *r_xprt);
> static void rpcrdma_mrs_destroy(struct rpcrdma_buffer *buf);
> -static int rpcrdma_create_rep(struct rpcrdma_xprt *r_xprt, bool temp);
> static void rpcrdma_dma_unmap_regbuf(struct rpcrdma_regbuf *rb);
> static void rpcrdma_post_recvs(struct rpcrdma_xprt *r_xprt, bool temp);
>
> @@ -1029,15 +1028,12 @@ struct rpcrdma_req *rpcrdma_req_create(struct
> rpcrdma_xprt *r_xprt, gfp_t flags)
> return req;
> }
>
> -static int
> -rpcrdma_create_rep(struct rpcrdma_xprt *r_xprt, bool temp)
> +static bool rpcrdma_rep_create(struct rpcrdma_xprt *r_xprt, bool temp)
> {
> struct rpcrdma_create_data_internal *cdata = &r_xprt->rx_data;
> struct rpcrdma_buffer *buf = &r_xprt->rx_buf;
> struct rpcrdma_rep *rep;
> - int rc;
>
> - rc = -ENOMEM;

rc is used again a little lower, in the IS_ERR(rep->rr_rdmabuf) check. Can you
remove it there too, please?

Thanks,
Anna

> rep = kzalloc(sizeof(*rep), GFP_KERNEL);
> if (rep == NULL)
> goto out;
> @@ -1063,12 +1059,12 @@ struct rpcrdma_req *rpcrdma_req_create(struct
> rpcrdma_xprt *r_xprt, gfp_t flags)
> spin_lock(&buf->rb_lock);
> list_add(&rep->rr_list, &buf->rb_recv_bufs);
> spin_unlock(&buf->rb_lock);
> - return 0;
> + return true;
>
> out_free:
> kfree(rep);
> out:
> - return rc;
> + return false;
> }
>
> int
> @@ -1124,8 +1120,7 @@ struct rpcrdma_req *rpcrdma_req_create(struct
> rpcrdma_xprt *r_xprt, gfp_t flags)
> return rc;
> }
>
> -static void
> -rpcrdma_destroy_rep(struct rpcrdma_rep *rep)
> +static void rpcrdma_rep_destroy(struct rpcrdma_rep *rep)
> {
> rpcrdma_free_regbuf(rep->rr_rdmabuf);
> kfree(rep);
> @@ -1205,7 +1200,7 @@ struct rpcrdma_req *rpcrdma_req_create(struct
> rpcrdma_xprt *r_xprt, gfp_t flags)
> rep = list_first_entry(&buf->rb_recv_bufs,
> struct rpcrdma_rep, rr_list);
> list_del(&rep->rr_list);
> - rpcrdma_destroy_rep(rep);
> + rpcrdma_rep_destroy(rep);
> }
>
> while (!list_empty(&buf->rb_send_bufs)) {
> @@ -1334,7 +1329,7 @@ struct rpcrdma_req *
> }
> spin_unlock(&buffers->rb_lock);
> if (rep)
> - rpcrdma_destroy_rep(rep);
> + rpcrdma_rep_destroy(rep);
> }
>
> /*
> @@ -1351,7 +1346,7 @@ struct rpcrdma_req *
> list_add(&rep->rr_list, &buffers->rb_recv_bufs);
> spin_unlock(&buffers->rb_lock);
> } else {
> - rpcrdma_destroy_rep(rep);
> + rpcrdma_rep_destroy(rep);
> }
> }
>
> @@ -1500,7 +1495,7 @@ struct rpcrdma_regbuf *
> list_del(&rep->rr_list);
> spin_unlock(&buf->rb_lock);
> if (!rep) {
> - if (rpcrdma_create_rep(r_xprt, temp))
> + if (!rpcrdma_rep_create(r_xprt, temp))
> break;
> continue;
> }
>