2016-11-29 16:04:21

by Chuck Lever III

[permalink] [raw]
Subject: [PATCH v3 00/10] Server-side NFS/RDMA patches proposed for v4.10

Hi Bruce-

These patches are ready for you to consider for v4.10. Changes proposed
for v4.10 include:

- Drop connection on GSS sequence window overflow
- Remove unnecessary spin lock in the svc_rdma_send path
- A number of minor clean-ups

Available in the "nfsd-rdma-for-4.10" topic branch of this git repo:

git://git.linux-nfs.org/projects/cel/cel-2.6.git


Or for browsing:

http://git.linux-nfs.org/?p=cel/cel-2.6.git;a=log;h=refs/heads/nfsd-rdma-for-4.10


Changes since v2:
- Rebased on v4.9-rc7


Changes since v1:
- Rebased on v4.9-rc5
- Address checkpatch.pl warnings
- Address compiler warnings
- Remove a dprintk in svc_rdma_get_inv_rkey()
- Patches reordered so fixes come before clean-ups

---

Chuck Lever (10):
svcrdma: Clear xpt_bc_xps in xprt_setup_rdma_bc() error exit arm
svcauth_gss: Close connection when dropping an incoming message
svcrdma: Renovate sendto chunk list parsing
svcrdma: Remove BH-disabled spin locking in svc_rdma_send()
svcrdma: Remove DMA map accounting
svcrdma: Remove svc_rdma_op_ctxt::wc_status
svcrdma: Remove unused variables in xprt_rdma_bc_allocate()
svcrdma: Remove unused variable in rdma_copy_tail()
svcrdma: Break up dprintk format in svc_rdma_accept()
svcrdma: Further clean-up of svc_rdma_get_inv_rkey()


include/linux/sunrpc/svc_rdma.h | 7 --
net/sunrpc/auth_gss/svcauth_gss.c | 2
net/sunrpc/svc.c | 14 ++-
net/sunrpc/xprtrdma/svc_rdma_backchannel.c | 5 -
net/sunrpc/xprtrdma/svc_rdma_recvfrom.c | 21 ++++-
net/sunrpc/xprtrdma/svc_rdma_sendto.c | 116 ++++++++--------------------
net/sunrpc/xprtrdma/svc_rdma_transport.c | 94 +++++++----------------
7 files changed, 94 insertions(+), 165 deletions(-)

--
Chuck Lever


2016-11-29 16:04:28

by Chuck Lever III

[permalink] [raw]
Subject: [PATCH v3 01/10] svcrdma: Clear xpt_bc_xps in xprt_setup_rdma_bc() error exit arm

Logic copied from xs_setup_bc_tcp().

Fixes: 39a9beab5acb ('rpc: share one xps between all backchannels')
Signed-off-by: Chuck Lever <[email protected]>
---
net/sunrpc/xprtrdma/svc_rdma_backchannel.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/net/sunrpc/xprtrdma/svc_rdma_backchannel.c b/net/sunrpc/xprtrdma/svc_rdma_backchannel.c
index 20027f8..6035c5a 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_backchannel.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_backchannel.c
@@ -359,6 +359,7 @@ static int svc_rdma_bc_sendto(struct svcxprt_rdma *rdma,
out_fail:
xprt_rdma_free_addresses(xprt);
args->bc_xprt->xpt_bc_xprt = NULL;
+ args->bc_xprt->xpt_bc_xps = NULL;
xprt_put(xprt);
xprt_free(xprt);
return ERR_PTR(-EINVAL);


2016-11-29 16:04:36

by Chuck Lever III

[permalink] [raw]
Subject: [PATCH v3 02/10] svcauth_gss: Close connection when dropping an incoming message

S5.3.3.1 of RFC 2203 requires that an incoming GSS-wrapped message
whose sequence number lies outside the current window is dropped.
The rationale is:

The reason for discarding requests silently is that the server
is unable to determine if the duplicate or out of range request
was due to a sequencing problem in the client, network, or the
operating system, or due to some quirk in routing, or a replay
attack by an intruder. Discarding the request allows the client
to recover after timing out, if indeed the duplication was
unintentional or well intended.

However, clients may rely on the server dropping the connection to
indicate that a retransmit is needed. Without a connection reset, a
client can wait forever without retransmitting, and the workload
just stops dead. I've reproduced this behavior by running xfstests
generic/323 on an NFSv4.0 mount with proto=rdma and sec=krb5i.

To address this issue, have the server close the connection when it
silently discards an incoming message due to a GSS sequence number
problem.

There are a few other places where the server will never reply.
Change those spots in a similar fashion.

Signed-off-by: Chuck Lever <[email protected]>
---
net/sunrpc/auth_gss/svcauth_gss.c | 2 +-
net/sunrpc/svc.c | 14 +++++++++-----
2 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/net/sunrpc/auth_gss/svcauth_gss.c b/net/sunrpc/auth_gss/svcauth_gss.c
index 45662d7..886e9d38 100644
--- a/net/sunrpc/auth_gss/svcauth_gss.c
+++ b/net/sunrpc/auth_gss/svcauth_gss.c
@@ -1548,7 +1548,7 @@ static void destroy_use_gss_proxy_proc_entry(struct net *net) {}
ret = SVC_COMPLETE;
goto out;
drop:
- ret = SVC_DROP;
+ ret = SVC_CLOSE;
out:
if (rsci)
cache_put(&rsci->h, sn->rsc_cache);
diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
index 7c8070e..75f290b 100644
--- a/net/sunrpc/svc.c
+++ b/net/sunrpc/svc.c
@@ -1155,8 +1155,7 @@ static __printf(2,3) void svc_printk(struct svc_rqst *rqstp, const char *fmt, ..
case SVC_DENIED:
goto err_bad_auth;
case SVC_CLOSE:
- if (test_bit(XPT_TEMP, &rqstp->rq_xprt->xpt_flags))
- svc_close_xprt(rqstp->rq_xprt);
+ goto close;
case SVC_DROP:
goto dropit;
case SVC_COMPLETE:
@@ -1246,7 +1245,7 @@ static __printf(2,3) void svc_printk(struct svc_rqst *rqstp, const char *fmt, ..

sendit:
if (svc_authorise(rqstp))
- goto dropit;
+ goto close;
return 1; /* Caller can now send it */

dropit:
@@ -1254,11 +1253,16 @@ static __printf(2,3) void svc_printk(struct svc_rqst *rqstp, const char *fmt, ..
dprintk("svc: svc_process dropit\n");
return 0;

+ close:
+ if (test_bit(XPT_TEMP, &rqstp->rq_xprt->xpt_flags))
+ svc_close_xprt(rqstp->rq_xprt);
+ dprintk("svc: svc_process close\n");
+ return 0;
+
err_short_len:
svc_printk(rqstp, "short len %Zd, dropping request\n",
argv->iov_len);
-
- goto dropit; /* drop request */
+ goto close;

err_bad_rpc:
serv->sv_stats->rpcbadfmt++;


2016-11-29 16:04:44

by Chuck Lever III

[permalink] [raw]
Subject: [PATCH v3 03/10] svcrdma: Renovate sendto chunk list parsing

The current sendto code appears to support clients that provide only
one of a Read list, a Write list, or a Reply chunk. My reading of
that code is that it doesn't support the following cases:

- Read list + Write list
- Read list + Reply chunk
- Write list + Reply chunk
- Read list + Write list + Reply chunk

The protocol allows more than one Read or Write chunk in those
lists. Some clients do send a Read list and Reply chunk
simultaneously. NFSv4 WRITE uses a Read list for the data payload,
and a Reply chunk because the GETATTR result in the reply can
contain a large object like an ACL.

Generalize one of the sendto code paths needed to support all of
the above cases, and attempt to ensure that only one pass is done
through the RPC Call's transport header to gather chunk list
information for building the reply.

Signed-off-by: Chuck Lever <[email protected]>
---
include/linux/sunrpc/svc_rdma.h | 2 -
net/sunrpc/xprtrdma/svc_rdma_recvfrom.c | 14 +++++
net/sunrpc/xprtrdma/svc_rdma_sendto.c | 92 ++++++++-----------------------
3 files changed, 39 insertions(+), 69 deletions(-)

diff --git a/include/linux/sunrpc/svc_rdma.h b/include/linux/sunrpc/svc_rdma.h
index cc3ae16..6aef63b 100644
--- a/include/linux/sunrpc/svc_rdma.h
+++ b/include/linux/sunrpc/svc_rdma.h
@@ -236,8 +236,6 @@ extern int rdma_read_chunk_frmr(struct svcxprt_rdma *, struct svc_rqst *,
extern int svc_rdma_map_xdr(struct svcxprt_rdma *, struct xdr_buf *,
struct svc_rdma_req_map *, bool);
extern int svc_rdma_sendto(struct svc_rqst *);
-extern struct rpcrdma_read_chunk *
- svc_rdma_get_read_chunk(struct rpcrdma_msg *);
extern void svc_rdma_send_error(struct svcxprt_rdma *, struct rpcrdma_msg *,
int);

diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
index ad1df97..873c2a9 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
@@ -415,6 +415,20 @@ int rdma_read_chunk_frmr(struct svcxprt_rdma *xprt,
return 1;
}

+/* Returns the address of the first read chunk or <nul> if no read chunk
+ * is present
+ */
+static struct rpcrdma_read_chunk *
+svc_rdma_get_read_chunk(struct rpcrdma_msg *rmsgp)
+{
+ struct rpcrdma_read_chunk *ch =
+ (struct rpcrdma_read_chunk *)&rmsgp->rm_body.rm_chunks[0];
+
+ if (ch->rc_discrim == xdr_zero)
+ return NULL;
+ return ch;
+}
+
static int rdma_read_chunks(struct svcxprt_rdma *xprt,
struct rpcrdma_msg *rmsgp,
struct svc_rqst *rqstp,
diff --git a/net/sunrpc/xprtrdma/svc_rdma_sendto.c b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
index f5a91ed..0a58d40 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_sendto.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
@@ -153,76 +153,35 @@ static dma_addr_t dma_map_xdr(struct svcxprt_rdma *xprt,
return dma_addr;
}

-/* Returns the address of the first read chunk or <nul> if no read chunk
- * is present
+/* Parse the RPC Call's transport header.
*/
-struct rpcrdma_read_chunk *
-svc_rdma_get_read_chunk(struct rpcrdma_msg *rmsgp)
+static void svc_rdma_get_write_arrays(struct rpcrdma_msg *rmsgp,
+ struct rpcrdma_write_array **write,
+ struct rpcrdma_write_array **reply)
{
- struct rpcrdma_read_chunk *ch =
- (struct rpcrdma_read_chunk *)&rmsgp->rm_body.rm_chunks[0];
+ __be32 *p;

- if (ch->rc_discrim == xdr_zero)
- return NULL;
- return ch;
-}
-
-/* Returns the address of the first read write array element or <nul>
- * if no write array list is present
- */
-static struct rpcrdma_write_array *
-svc_rdma_get_write_array(struct rpcrdma_msg *rmsgp)
-{
- if (rmsgp->rm_body.rm_chunks[0] != xdr_zero ||
- rmsgp->rm_body.rm_chunks[1] == xdr_zero)
- return NULL;
- return (struct rpcrdma_write_array *)&rmsgp->rm_body.rm_chunks[1];
-}
+ p = (__be32 *)&rmsgp->rm_body.rm_chunks[0];

-/* Returns the address of the first reply array element or <nul> if no
- * reply array is present
- */
-static struct rpcrdma_write_array *
-svc_rdma_get_reply_array(struct rpcrdma_msg *rmsgp,
- struct rpcrdma_write_array *wr_ary)
-{
- struct rpcrdma_read_chunk *rch;
- struct rpcrdma_write_array *rp_ary;
+ /* Read list */
+ while (*p++ != xdr_zero)
+ p += 5;

- /* XXX: Need to fix when reply chunk may occur with read list
- * and/or write list.
- */
- if (rmsgp->rm_body.rm_chunks[0] != xdr_zero ||
- rmsgp->rm_body.rm_chunks[1] != xdr_zero)
- return NULL;
-
- rch = svc_rdma_get_read_chunk(rmsgp);
- if (rch) {
- while (rch->rc_discrim != xdr_zero)
- rch++;
-
- /* The reply chunk follows an empty write array located
- * at 'rc_position' here. The reply array is at rc_target.
- */
- rp_ary = (struct rpcrdma_write_array *)&rch->rc_target;
- goto found_it;
- }
-
- if (wr_ary) {
- int chunk = be32_to_cpu(wr_ary->wc_nchunks);
-
- rp_ary = (struct rpcrdma_write_array *)
- &wr_ary->wc_array[chunk].wc_target.rs_length;
- goto found_it;
+ /* Write list */
+ if (*p != xdr_zero) {
+ *write = (struct rpcrdma_write_array *)p;
+ while (*p++ != xdr_zero)
+ p += 1 + be32_to_cpu(*p) * 4;
+ } else {
+ *write = NULL;
+ p++;
}

- /* No read list, no write list */
- rp_ary = (struct rpcrdma_write_array *)&rmsgp->rm_body.rm_chunks[2];
-
- found_it:
- if (rp_ary->wc_discrim == xdr_zero)
- return NULL;
- return rp_ary;
+ /* Reply chunk */
+ if (*p != xdr_zero)
+ *reply = (struct rpcrdma_write_array *)p;
+ else
+ *reply = NULL;
}

/* RPC-over-RDMA Version One private extension: Remote Invalidation.
@@ -244,8 +203,8 @@ static u32 svc_rdma_get_inv_rkey(struct rpcrdma_msg *rdma_argp,

inv_rkey = 0;

- rd_ary = svc_rdma_get_read_chunk(rdma_argp);
- if (rd_ary) {
+ rd_ary = (struct rpcrdma_read_chunk *)&rdma_argp->rm_body.rm_chunks[0];
+ if (rd_ary->rc_discrim != xdr_zero) {
inv_rkey = be32_to_cpu(rd_ary->rc_target.rs_handle);
goto out;
}
@@ -622,8 +581,7 @@ int svc_rdma_sendto(struct svc_rqst *rqstp)
* places this at the start of page 0.
*/
rdma_argp = page_address(rqstp->rq_pages[0]);
- wr_ary = svc_rdma_get_write_array(rdma_argp);
- rp_ary = svc_rdma_get_reply_array(rdma_argp, wr_ary);
+ svc_rdma_get_write_arrays(rdma_argp, &wr_ary, &rp_ary);

inv_rkey = 0;
if (rdma->sc_snd_w_inv)


2016-11-29 16:04:52

by Chuck Lever III

[permalink] [raw]
Subject: [PATCH v3 04/10] svcrdma: Remove BH-disabled spin locking in svc_rdma_send()

svcrdma's current SQ accounting algorithm takes sc_lock and disables
bottom-halves while posting all RDMA Read, Write, and Send WRs.

This is relatively heavyweight serialization. And note that Write and
Send are already fully serialized by the xpt_mutex.

Using a single atomic_t should be all that is necessary to guarantee
that ib_post_send() is called only when there is enough space on the
send queue. This is what the other RDMA-enabled storage targets do.

Signed-off-by: Chuck Lever <[email protected]>
---
include/linux/sunrpc/svc_rdma.h | 2 +-
net/sunrpc/xprtrdma/svc_rdma_sendto.c | 7 ++++++-
net/sunrpc/xprtrdma/svc_rdma_transport.c | 25 ++++++++++---------------
3 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/include/linux/sunrpc/svc_rdma.h b/include/linux/sunrpc/svc_rdma.h
index 6aef63b..601cb07 100644
--- a/include/linux/sunrpc/svc_rdma.h
+++ b/include/linux/sunrpc/svc_rdma.h
@@ -139,7 +139,7 @@ struct svcxprt_rdma {
int sc_max_sge_rd; /* max sge for read target */
bool sc_snd_w_inv; /* OK to use Send With Invalidate */

- atomic_t sc_sq_count; /* Number of SQ WR on queue */
+ atomic_t sc_sq_avail; /* SQEs ready to be consumed */
unsigned int sc_sq_depth; /* Depth of SQ */
unsigned int sc_rq_depth; /* Depth of RQ */
u32 sc_max_requests; /* Forward credits */
diff --git a/net/sunrpc/xprtrdma/svc_rdma_sendto.c b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
index 0a58d40..30eeab5 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_sendto.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
@@ -594,7 +594,12 @@ int svc_rdma_sendto(struct svc_rqst *rqstp)
goto err0;
inline_bytes = rqstp->rq_res.len;

- /* Create the RDMA response header */
+ /* Create the RDMA response header. xprt->xpt_mutex,
+ * acquired in svc_send(), serializes RPC replies. The
+ * code path below that inserts the credit grant value
+ * into each transport header runs only inside this
+ * critical section.
+ */
ret = -ENOMEM;
res_page = alloc_page(GFP_KERNEL);
if (!res_page)
diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
index 1334de2..3e68e3d 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
@@ -436,7 +436,7 @@ static void svc_rdma_send_wc_common(struct svcxprt_rdma *xprt,
goto err;

out:
- atomic_dec(&xprt->sc_sq_count);
+ atomic_inc(&xprt->sc_sq_avail);
wake_up(&xprt->sc_send_wait);
return;

@@ -1010,6 +1010,7 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
newxprt->sc_rq_depth = newxprt->sc_max_requests +
newxprt->sc_max_bc_requests;
newxprt->sc_sq_depth = RPCRDMA_SQ_DEPTH_MULT * newxprt->sc_rq_depth;
+ atomic_set(&newxprt->sc_sq_avail, newxprt->sc_sq_depth);

if (!svc_rdma_prealloc_ctxts(newxprt))
goto errout;
@@ -1339,15 +1340,13 @@ int svc_rdma_send(struct svcxprt_rdma *xprt, struct ib_send_wr *wr)

/* If the SQ is full, wait until an SQ entry is available */
while (1) {
- spin_lock_bh(&xprt->sc_lock);
- if (xprt->sc_sq_depth < atomic_read(&xprt->sc_sq_count) + wr_count) {
- spin_unlock_bh(&xprt->sc_lock);
+ if ((atomic_sub_return(wr_count, &xprt->sc_sq_avail) < 0)) {
atomic_inc(&rdma_stat_sq_starve);

/* Wait until SQ WR available if SQ still full */
+ atomic_add(wr_count, &xprt->sc_sq_avail);
wait_event(xprt->sc_send_wait,
- atomic_read(&xprt->sc_sq_count) <
- xprt->sc_sq_depth);
+ atomic_read(&xprt->sc_sq_avail) > wr_count);
if (test_bit(XPT_CLOSE, &xprt->sc_xprt.xpt_flags))
return -ENOTCONN;
continue;
@@ -1357,21 +1356,17 @@ int svc_rdma_send(struct svcxprt_rdma *xprt, struct ib_send_wr *wr)
svc_xprt_get(&xprt->sc_xprt);

/* Bump used SQ WR count and post */
- atomic_add(wr_count, &xprt->sc_sq_count);
ret = ib_post_send(xprt->sc_qp, wr, &bad_wr);
if (ret) {
set_bit(XPT_CLOSE, &xprt->sc_xprt.xpt_flags);
- atomic_sub(wr_count, &xprt->sc_sq_count);
for (i = 0; i < wr_count; i ++)
svc_xprt_put(&xprt->sc_xprt);
- dprintk("svcrdma: failed to post SQ WR rc=%d, "
- "sc_sq_count=%d, sc_sq_depth=%d\n",
- ret, atomic_read(&xprt->sc_sq_count),
- xprt->sc_sq_depth);
- }
- spin_unlock_bh(&xprt->sc_lock);
- if (ret)
+ dprintk("svcrdma: failed to post SQ WR rc=%d\n", ret);
+ dprintk(" sc_sq_avail=%d, sc_sq_depth=%d\n",
+ atomic_read(&xprt->sc_sq_avail),
+ xprt->sc_sq_depth);
wake_up(&xprt->sc_send_wait);
+ }
break;
}
return ret;


2016-11-29 16:05:01

by Chuck Lever III

[permalink] [raw]
Subject: [PATCH v3 05/10] svcrdma: Remove DMA map accounting

Clean up: sc_dma_used is not required for correct operation. It is
simply a debugging tool to report when svcrdma has leaked DMA maps.

However, manipulating an atomic has a measurable CPU cost, and DMA
map accounting specific to svcrdma will be meaningless once svcrdma
is converted to use the new generic r/w API.

A similar kind of debug accounting can be done simply by enabling
the IOMMU or by using CONFIG_DMA_API_DEBUG, CONFIG_IOMMU_DEBUG, and
CONFIG_IOMMU_LEAK.

Signed-off-by: Chuck Lever <[email protected]>
---
include/linux/sunrpc/svc_rdma.h | 2 --
net/sunrpc/xprtrdma/svc_rdma_recvfrom.c | 1 -
net/sunrpc/xprtrdma/svc_rdma_transport.c | 13 +++----------
3 files changed, 3 insertions(+), 13 deletions(-)

diff --git a/include/linux/sunrpc/svc_rdma.h b/include/linux/sunrpc/svc_rdma.h
index 601cb07..43d7c70 100644
--- a/include/linux/sunrpc/svc_rdma.h
+++ b/include/linux/sunrpc/svc_rdma.h
@@ -148,7 +148,6 @@ struct svcxprt_rdma {

struct ib_pd *sc_pd;

- atomic_t sc_dma_used;
spinlock_t sc_ctxt_lock;
struct list_head sc_ctxts;
int sc_ctxt_used;
@@ -200,7 +199,6 @@ static inline void svc_rdma_count_mappings(struct svcxprt_rdma *rdma,
struct svc_rdma_op_ctxt *ctxt)
{
ctxt->mapped_sges++;
- atomic_inc(&rdma->sc_dma_used);
}

/* svc_rdma_backchannel.c */
diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
index 873c2a9..283246e 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
@@ -279,7 +279,6 @@ int rdma_read_chunk_frmr(struct svcxprt_rdma *xprt,
frmr->sg);
return -ENOMEM;
}
- atomic_inc(&xprt->sc_dma_used);

n = ib_map_mr_sg(frmr->mr, frmr->sg, frmr->sg_nents, NULL, PAGE_SIZE);
if (unlikely(n != frmr->sg_nents)) {
diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
index 3e68e3d..61ae392 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
@@ -226,25 +226,22 @@ void svc_rdma_unmap_dma(struct svc_rdma_op_ctxt *ctxt)
struct svcxprt_rdma *xprt = ctxt->xprt;
struct ib_device *device = xprt->sc_cm_id->device;
u32 lkey = xprt->sc_pd->local_dma_lkey;
- unsigned int i, count;
+ unsigned int i;

- for (count = 0, i = 0; i < ctxt->mapped_sges; i++) {
+ for (i = 0; i < ctxt->mapped_sges; i++) {
/*
* Unmap the DMA addr in the SGE if the lkey matches
* the local_dma_lkey, otherwise, ignore it since it is
* an FRMR lkey and will be unmapped later when the
* last WR that uses it completes.
*/
- if (ctxt->sge[i].lkey == lkey) {
- count++;
+ if (ctxt->sge[i].lkey == lkey)
ib_dma_unmap_page(device,
ctxt->sge[i].addr,
ctxt->sge[i].length,
ctxt->direction);
- }
}
ctxt->mapped_sges = 0;
- atomic_sub(count, &xprt->sc_dma_used);
}

void svc_rdma_put_context(struct svc_rdma_op_ctxt *ctxt, int free_pages)
@@ -946,7 +943,6 @@ void svc_rdma_put_frmr(struct svcxprt_rdma *rdma,
if (frmr) {
ib_dma_unmap_sg(rdma->sc_cm_id->device,
frmr->sg, frmr->sg_nents, frmr->direction);
- atomic_dec(&rdma->sc_dma_used);
spin_lock_bh(&rdma->sc_frmr_q_lock);
WARN_ON_ONCE(!list_empty(&frmr->frmr_list));
list_add(&frmr->frmr_list, &rdma->sc_frmr_q);
@@ -1258,9 +1254,6 @@ static void __svc_rdma_free(struct work_struct *work)
if (rdma->sc_ctxt_used != 0)
pr_err("svcrdma: ctxt still in use? (%d)\n",
rdma->sc_ctxt_used);
- if (atomic_read(&rdma->sc_dma_used) != 0)
- pr_err("svcrdma: dma still in use? (%d)\n",
- atomic_read(&rdma->sc_dma_used));

/* Final put of backchannel client transport */
if (xprt->xpt_bc_xprt) {


2016-11-29 16:05:08

by Chuck Lever III

[permalink] [raw]
Subject: [PATCH v3 06/10] svcrdma: Remove svc_rdma_op_ctxt::wc_status

Clean up: Completion status is already reported in the individual
completion handlers. Save a few bytes in struct svc_rdma_op_ctxt.

Signed-off-by: Chuck Lever <[email protected]>
---
include/linux/sunrpc/svc_rdma.h | 1 -
net/sunrpc/xprtrdma/svc_rdma_recvfrom.c | 4 ++--
net/sunrpc/xprtrdma/svc_rdma_transport.c | 1 -
3 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/include/linux/sunrpc/svc_rdma.h b/include/linux/sunrpc/svc_rdma.h
index 43d7c70..757fb96 100644
--- a/include/linux/sunrpc/svc_rdma.h
+++ b/include/linux/sunrpc/svc_rdma.h
@@ -79,7 +79,6 @@ struct svc_rdma_op_ctxt {
struct ib_cqe reg_cqe;
struct ib_cqe inv_cqe;
struct list_head dto_q;
- enum ib_wc_status wc_status;
u32 byte_len;
u32 position;
struct svcxprt_rdma *xprt;
diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
index 283246e..428ab4ef 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
@@ -640,8 +640,8 @@ int svc_rdma_recvfrom(struct svc_rqst *rqstp)
goto defer;
goto out;
}
- dprintk("svcrdma: processing ctxt=%p on xprt=%p, rqstp=%p, status=%d\n",
- ctxt, rdma_xprt, rqstp, ctxt->wc_status);
+ dprintk("svcrdma: processing ctxt=%p on xprt=%p, rqstp=%p\n",
+ ctxt, rdma_xprt, rqstp);
atomic_inc(&rdma_stat_recv);

/* Build up the XDR from the receive buffers. */
diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
index 61ae392..733d6e03 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
@@ -395,7 +395,6 @@ static void svc_rdma_wc_receive(struct ib_cq *cq, struct ib_wc *wc)

/* WARNING: Only wc->wr_cqe and wc->status are reliable */
ctxt = container_of(cqe, struct svc_rdma_op_ctxt, cqe);
- ctxt->wc_status = wc->status;
svc_rdma_unmap_dma(ctxt);

if (wc->status != IB_WC_SUCCESS)


2016-11-29 16:05:18

by Chuck Lever III

[permalink] [raw]
Subject: [PATCH v3 07/10] svcrdma: Remove unused variables in xprt_rdma_bc_allocate()

Clean up.

/linux-2.6/net/sunrpc/xprtrdma/svc_rdma_backchannel.c: In function
‘xprt_rdma_bc_allocate’:
linux-2.6/net/sunrpc/xprtrdma/svc_rdma_backchannel.c:169:23: warning:
variable ‘rdma’ set but not used [-Wunused-but-set-variable]
struct svcxprt_rdma *rdma;
^

Fixes: 5d252f90a800 ("svcrdma: Add class for RDMA backwards ...")
Signed-off-by: Chuck Lever <[email protected]>
---
net/sunrpc/xprtrdma/svc_rdma_backchannel.c | 4 ----
1 file changed, 4 deletions(-)

diff --git a/net/sunrpc/xprtrdma/svc_rdma_backchannel.c b/net/sunrpc/xprtrdma/svc_rdma_backchannel.c
index 6035c5a..288e35c 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_backchannel.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_backchannel.c
@@ -164,13 +164,9 @@ static int svc_rdma_bc_sendto(struct svcxprt_rdma *rdma,
xprt_rdma_bc_allocate(struct rpc_task *task)
{
struct rpc_rqst *rqst = task->tk_rqstp;
- struct svc_xprt *sxprt = rqst->rq_xprt->bc_xprt;
size_t size = rqst->rq_callsize;
- struct svcxprt_rdma *rdma;
struct page *page;

- rdma = container_of(sxprt, struct svcxprt_rdma, sc_xprt);
-
if (size > PAGE_SIZE) {
WARN_ONCE(1, "svcrdma: large bc buffer request (size %zu)\n",
size);


2016-11-29 16:05:25

by Chuck Lever III

[permalink] [raw]
Subject: [PATCH v3 08/10] svcrdma: Remove unused variable in rdma_copy_tail()

Clean up.

linux-2.6/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c: In function
‘rdma_copy_tail’:
linux-2.6/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c:376:6: warning:
variable ‘ret’ set but not used [-Wunused-but-set-variable]
int ret;
^

Fixes: a97c331f9aa9 ("svcrdma: Handle additional inline content")
Signed-off-by: Chuck Lever <[email protected]>
---
net/sunrpc/xprtrdma/svc_rdma_recvfrom.c | 2 --
1 file changed, 2 deletions(-)

diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
index 428ab4ef..57d35fb 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
@@ -373,9 +373,7 @@ int rdma_read_chunk_frmr(struct svcxprt_rdma *xprt,
u32 position, u32 byte_count, u32 page_offset, int page_no)
{
char *srcp, *destp;
- int ret;

- ret = 0;
srcp = head->arg.head[0].iov_base + position;
byte_count = head->arg.head[0].iov_len - position;
if (byte_count > PAGE_SIZE) {


2016-11-29 16:05:33

by Chuck Lever III

[permalink] [raw]
Subject: [PATCH v3 09/10] svcrdma: Break up dprintk format in svc_rdma_accept()

The current code results in:

Nov 7 14:50:19 klimt kernel: svcrdma: newxprt->sc_cm_id=ffff88085590c800,
newxprt->sc_pd=ffff880852a7ce00#012 cm_id->device=ffff88084dd20000,
sc_pd->device=ffff88084dd20000#012 cap.max_send_wr = 272#012
cap.max_recv_wr = 34#012 cap.max_send_sge = 32#012
cap.max_recv_sge = 32
Nov 7 14:50:19 klimt kernel: svcrdma: new connection ffff880855908000
accepted with the following attributes:#012 local_ip :
10.0.0.5#012 local_port#011 : 20049#012 remote_ip :
10.0.0.2#012 remote_port : 59909#012 max_sge : 32#012
max_sge_rd : 30#012 sq_depth : 272#012 max_requests :
32#012 ord : 16

Split up the output over multiple dprintks and take the opportunity
to fix the display of IPv6 addresses.

Signed-off-by: Chuck Lever <[email protected]>
---
net/sunrpc/xprtrdma/svc_rdma_transport.c | 55 ++++++++++--------------------
1 file changed, 18 insertions(+), 37 deletions(-)

diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
index 733d6e03..ca2799a 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
@@ -41,6 +41,7 @@
*/

#include <linux/sunrpc/svc_xprt.h>
+#include <linux/sunrpc/addr.h>
#include <linux/sunrpc/debug.h>
#include <linux/sunrpc/rpc_rdma.h>
#include <linux/interrupt.h>
@@ -968,6 +969,7 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
struct rpcrdma_connect_private pmsg;
struct ib_qp_init_attr qp_attr;
struct ib_device *dev;
+ struct sockaddr *sap;
unsigned int i;
int ret = 0;

@@ -1048,18 +1050,12 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
qp_attr.qp_type = IB_QPT_RC;
qp_attr.send_cq = newxprt->sc_sq_cq;
qp_attr.recv_cq = newxprt->sc_rq_cq;
- dprintk("svcrdma: newxprt->sc_cm_id=%p, newxprt->sc_pd=%p\n"
- " cm_id->device=%p, sc_pd->device=%p\n"
- " cap.max_send_wr = %d\n"
- " cap.max_recv_wr = %d\n"
- " cap.max_send_sge = %d\n"
- " cap.max_recv_sge = %d\n",
- newxprt->sc_cm_id, newxprt->sc_pd,
- dev, newxprt->sc_pd->device,
- qp_attr.cap.max_send_wr,
- qp_attr.cap.max_recv_wr,
- qp_attr.cap.max_send_sge,
- qp_attr.cap.max_recv_sge);
+ dprintk("svcrdma: newxprt->sc_cm_id=%p, newxprt->sc_pd=%p\n",
+ newxprt->sc_cm_id, newxprt->sc_pd);
+ dprintk(" cap.max_send_wr = %d, cap.max_recv_wr = %d\n",
+ qp_attr.cap.max_send_wr, qp_attr.cap.max_recv_wr);
+ dprintk(" cap.max_send_sge = %d, cap.max_recv_sge = %d\n",
+ qp_attr.cap.max_send_sge, qp_attr.cap.max_recv_sge);

ret = rdma_create_qp(newxprt->sc_cm_id, newxprt->sc_pd, &qp_attr);
if (ret) {
@@ -1142,31 +1138,16 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
goto errout;
}

- dprintk("svcrdma: new connection %p accepted with the following "
- "attributes:\n"
- " local_ip : %pI4\n"
- " local_port : %d\n"
- " remote_ip : %pI4\n"
- " remote_port : %d\n"
- " max_sge : %d\n"
- " max_sge_rd : %d\n"
- " sq_depth : %d\n"
- " max_requests : %d\n"
- " ord : %d\n",
- newxprt,
- &((struct sockaddr_in *)&newxprt->sc_cm_id->
- route.addr.src_addr)->sin_addr.s_addr,
- ntohs(((struct sockaddr_in *)&newxprt->sc_cm_id->
- route.addr.src_addr)->sin_port),
- &((struct sockaddr_in *)&newxprt->sc_cm_id->
- route.addr.dst_addr)->sin_addr.s_addr,
- ntohs(((struct sockaddr_in *)&newxprt->sc_cm_id->
- route.addr.dst_addr)->sin_port),
- newxprt->sc_max_sge,
- newxprt->sc_max_sge_rd,
- newxprt->sc_sq_depth,
- newxprt->sc_max_requests,
- newxprt->sc_ord);
+ dprintk("svcrdma: new connection %p accepted:\n", newxprt);
+ sap = (struct sockaddr *)&newxprt->sc_cm_id->route.addr.src_addr;
+ dprintk(" local address : %pIS:%u\n", sap, rpc_get_port(sap));
+ sap = (struct sockaddr *)&newxprt->sc_cm_id->route.addr.dst_addr;
+ dprintk(" remote address : %pIS:%u\n", sap, rpc_get_port(sap));
+ dprintk(" max_sge : %d\n", newxprt->sc_max_sge);
+ dprintk(" max_sge_rd : %d\n", newxprt->sc_max_sge_rd);
+ dprintk(" sq_depth : %d\n", newxprt->sc_sq_depth);
+ dprintk(" max_requests : %d\n", newxprt->sc_max_requests);
+ dprintk(" ord : %d\n", newxprt->sc_ord);

return &newxprt->sc_xprt;



2016-11-29 16:05:41

by Chuck Lever III

[permalink] [raw]
Subject: [PATCH v3 10/10] svcrdma: Further clean-up of svc_rdma_get_inv_rkey()

No longer any need for the dprintk().

Signed-off-by: Chuck Lever <[email protected]>
---
net/sunrpc/xprtrdma/svc_rdma_sendto.c | 19 +++++--------------
1 file changed, 5 insertions(+), 14 deletions(-)

diff --git a/net/sunrpc/xprtrdma/svc_rdma_sendto.c b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
index 30eeab5..ad4d286 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_sendto.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
@@ -199,31 +199,22 @@ static u32 svc_rdma_get_inv_rkey(struct rpcrdma_msg *rdma_argp,
{
struct rpcrdma_read_chunk *rd_ary;
struct rpcrdma_segment *arg_ch;
- u32 inv_rkey;
-
- inv_rkey = 0;

rd_ary = (struct rpcrdma_read_chunk *)&rdma_argp->rm_body.rm_chunks[0];
- if (rd_ary->rc_discrim != xdr_zero) {
- inv_rkey = be32_to_cpu(rd_ary->rc_target.rs_handle);
- goto out;
- }
+ if (rd_ary->rc_discrim != xdr_zero)
+ return be32_to_cpu(rd_ary->rc_target.rs_handle);

if (wr_ary && be32_to_cpu(wr_ary->wc_nchunks)) {
arg_ch = &wr_ary->wc_array[0].wc_target;
- inv_rkey = be32_to_cpu(arg_ch->rs_handle);
- goto out;
+ return be32_to_cpu(arg_ch->rs_handle);
}

if (rp_ary && be32_to_cpu(rp_ary->wc_nchunks)) {
arg_ch = &rp_ary->wc_array[0].wc_target;
- inv_rkey = be32_to_cpu(arg_ch->rs_handle);
- goto out;
+ return be32_to_cpu(arg_ch->rs_handle);
}

-out:
- dprintk("svcrdma: Send With Invalidate rkey=%08x\n", inv_rkey);
- return inv_rkey;
+ return 0;
}

/* Assumptions:


2016-11-30 22:30:07

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH v3 00/10] Server-side NFS/RDMA patches proposed for v4.10

On Tue, Nov 29, 2016 at 11:04:16AM -0500, Chuck Lever wrote:
> These patches are ready for you to consider for v4.10. Changes proposed
> for v4.10 include:

They look fine to me, thanks; applying for 4.10.--b.

>
> - Drop connection on GSS sequence window overflow
> - Remove unnecessary spin lock in the svc_rdma_send path
> - A number of minor clean-ups
>
> Available in the "nfsd-rdma-for-4.10" topic branch of this git repo:
>
> git://git.linux-nfs.org/projects/cel/cel-2.6.git
>
>
> Or for browsing:
>
> http://git.linux-nfs.org/?p=cel/cel-2.6.git;a=log;h=refs/heads/nfsd-rdma-for-4.10
>
>
> Changes since v2:
> - Rebased on v4.9-rc7
>
>
> Changes since v1:
> - Rebased on v4.9-rc5
> - Address checkpatch.pl warnings
> - Address compiler warnings
> - Remove a dprintk in svc_rdma_get_inv_rkey()
> - Patches reordered so fixes come before clean-ups
>
> ---
>
> Chuck Lever (10):
> svcrdma: Clear xpt_bc_xps in xprt_setup_rdma_bc() error exit arm
> svcauth_gss: Close connection when dropping an incoming message
> svcrdma: Renovate sendto chunk list parsing
> svcrdma: Remove BH-disabled spin locking in svc_rdma_send()
> svcrdma: Remove DMA map accounting
> svcrdma: Remove svc_rdma_op_ctxt::wc_status
> svcrdma: Remove unused variables in xprt_rdma_bc_allocate()
> svcrdma: Remove unused variable in rdma_copy_tail()
> svcrdma: Break up dprintk format in svc_rdma_accept()
> svcrdma: Further clean-up of svc_rdma_get_inv_rkey()
>
>
> include/linux/sunrpc/svc_rdma.h | 7 --
> net/sunrpc/auth_gss/svcauth_gss.c | 2
> net/sunrpc/svc.c | 14 ++-
> net/sunrpc/xprtrdma/svc_rdma_backchannel.c | 5 -
> net/sunrpc/xprtrdma/svc_rdma_recvfrom.c | 21 ++++-
> net/sunrpc/xprtrdma/svc_rdma_sendto.c | 116 ++++++++--------------------
> net/sunrpc/xprtrdma/svc_rdma_transport.c | 94 +++++++----------------
> 7 files changed, 94 insertions(+), 165 deletions(-)
>
> --
> Chuck Lever

2016-12-03 03:17:54

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH v3 02/10] svcauth_gss: Close connection when dropping an incoming message

By the way, pynfs test GSS1 now fails with a complaint about the dropped
connection.

I'll fix it up to require the dropped connection, since we now think
it's actually a bug to do otherwise.

--b.

On Tue, Nov 29, 2016 at 11:04:34AM -0500, Chuck Lever wrote:
> S5.3.3.1 of RFC 2203 requires that an incoming GSS-wrapped message
> whose sequence number lies outside the current window is dropped.
> The rationale is:
>
> The reason for discarding requests silently is that the server
> is unable to determine if the duplicate or out of range request
> was due to a sequencing problem in the client, network, or the
> operating system, or due to some quirk in routing, or a replay
> attack by an intruder. Discarding the request allows the client
> to recover after timing out, if indeed the duplication was
> unintentional or well intended.
>
> However, clients may rely on the server dropping the connection to
> indicate that a retransmit is needed. Without a connection reset, a
> client can wait forever without retransmitting, and the workload
> just stops dead. I've reproduced this behavior by running xfstests
> generic/323 on an NFSv4.0 mount with proto=rdma and sec=krb5i.
>
> To address this issue, have the server close the connection when it
> silently discards an incoming message due to a GSS sequence number
> problem.
>
> There are a few other places where the server will never reply.
> Change those spots in a similar fashion.
>
> Signed-off-by: Chuck Lever <[email protected]>
> ---
> net/sunrpc/auth_gss/svcauth_gss.c | 2 +-
> net/sunrpc/svc.c | 14 +++++++++-----
> 2 files changed, 10 insertions(+), 6 deletions(-)
>
> diff --git a/net/sunrpc/auth_gss/svcauth_gss.c b/net/sunrpc/auth_gss/svcauth_gss.c
> index 45662d7..886e9d38 100644
> --- a/net/sunrpc/auth_gss/svcauth_gss.c
> +++ b/net/sunrpc/auth_gss/svcauth_gss.c
> @@ -1548,7 +1548,7 @@ static void destroy_use_gss_proxy_proc_entry(struct net *net) {}
> ret = SVC_COMPLETE;
> goto out;
> drop:
> - ret = SVC_DROP;
> + ret = SVC_CLOSE;
> out:
> if (rsci)
> cache_put(&rsci->h, sn->rsc_cache);
> diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
> index 7c8070e..75f290b 100644
> --- a/net/sunrpc/svc.c
> +++ b/net/sunrpc/svc.c
> @@ -1155,8 +1155,7 @@ static __printf(2,3) void svc_printk(struct svc_rqst *rqstp, const char *fmt, ..
> case SVC_DENIED:
> goto err_bad_auth;
> case SVC_CLOSE:
> - if (test_bit(XPT_TEMP, &rqstp->rq_xprt->xpt_flags))
> - svc_close_xprt(rqstp->rq_xprt);
> + goto close;
> case SVC_DROP:
> goto dropit;
> case SVC_COMPLETE:
> @@ -1246,7 +1245,7 @@ static __printf(2,3) void svc_printk(struct svc_rqst *rqstp, const char *fmt, ..
>
> sendit:
> if (svc_authorise(rqstp))
> - goto dropit;
> + goto close;
> return 1; /* Caller can now send it */
>
> dropit:
> @@ -1254,11 +1253,16 @@ static __printf(2,3) void svc_printk(struct svc_rqst *rqstp, const char *fmt, ..
> dprintk("svc: svc_process dropit\n");
> return 0;
>
> + close:
> + if (test_bit(XPT_TEMP, &rqstp->rq_xprt->xpt_flags))
> + svc_close_xprt(rqstp->rq_xprt);
> + dprintk("svc: svc_process close\n");
> + return 0;
> +
> err_short_len:
> svc_printk(rqstp, "short len %Zd, dropping request\n",
> argv->iov_len);
> -
> - goto dropit; /* drop request */
> + goto close;
>
> err_bad_rpc:
> serv->sv_stats->rpcbadfmt++;