This series contains a major fix for client-side NFS/RDMA.
When a signal fires, the server's RPC reply can race with the
client code that terminates the signaled RPC. The result is that
both of these code paths try to invalidate the same MR at the same
time.
Because FRWR invalidation is typically fast, the race is nearly
impossible to hit with FRWR. However, FMR invalidation takes about
as long as the NFS server takes to respond, so the window is much
more likely to be hit when using FMR. FMR is also more sensitive to
concurrent operations on the same MR, which can result in a kernel
crash or an HCA firmware reset.
As part of closing the signal race window, the reply handler is
restructured and several error recovery paths in the invalidation
code are fixed.
In addition to this fix, the series includes a pair of small but
important changes that make NFSv4.1 Transparent State Migration
work. These enable basic migration test cases to pass successfully.
Available in the "nfs-rdma-for-4.13" topic branch of this git repo:
git://git.linux-nfs.org/projects/cel/cel-2.6.git
Or for browsing:
http://git.linux-nfs.org/?p=cel/cel-2.6.git;a=log;h=refs/heads/nfs-rdma-for-4.13
Changes since v1:
- Moved back-portable fixes to top of series
- Split fixes for CONFIRMED_R and contrived slot into separate patches
- Tested and reworked fixes for CONFIRMED_R and contrived slot number
---
Chuck Lever (13):
xprtrdma: On invalidation failure, remove MWs from rl_registered
xprtrdma: Pre-mark remotely invalidated MRs
xprtrdma: Pass only the list of registered MRs to ro_unmap_sync
xprtrdma: Rename rpcrdma_req::rl_free
xprtrdma: Fix client lock-up after application signal fires
xprtrdma: Fix FRWR invalidation error recovery
xprtrdma: Don't defer MR recovery if ro_map fails
NFSv4.1: Handle EXCHGID4_FLAG_CONFIRMED_R during NFSv4.1 migration
NFSv4.1: Use seqid returned by EXCHANGE_ID after state migration
xprtrdma: Demote "connect" log messages
xprtrdma: FMR does not need list_del_init()
xprtrdma: Replace PAGE_MASK with offset_in_page()
xprtrdma: Fix documenting comments in frwr_ops.c
fs/nfs/nfs4client.c | 5 ++
fs/nfs/nfs4proc.c | 7 +-
fs/nfs/nfs4state.c | 16 +++--
include/linux/nfs_fs_sb.h | 2 +
net/sunrpc/xprtrdma/fmr_ops.c | 47 ++++++++-------
net/sunrpc/xprtrdma/frwr_ops.c | 69 ++++++++++------------
net/sunrpc/xprtrdma/rpc_rdma.c | 125 +++++++++++++++++++++++++--------------
net/sunrpc/xprtrdma/transport.c | 3 +
net/sunrpc/xprtrdma/verbs.c | 55 ++++-------------
net/sunrpc/xprtrdma/xprt_rdma.h | 40 ++++++++++++
10 files changed, 215 insertions(+), 154 deletions(-)
--
Chuck Lever
Callers assume the ro_unmap_sync and ro_unmap_safe methods empty
the list of registered MRs. Ensure that all paths through
fmr_op_unmap_sync() remove MWs from that list.
Fixes: 9d6b04097882 ("xprtrdma: Place registered MWs on a ... ")
Signed-off-by: Chuck Lever <[email protected]>
---
net/sunrpc/xprtrdma/fmr_ops.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/net/sunrpc/xprtrdma/fmr_ops.c b/net/sunrpc/xprtrdma/fmr_ops.c
index 59e6402..21f3cd5 100644
--- a/net/sunrpc/xprtrdma/fmr_ops.c
+++ b/net/sunrpc/xprtrdma/fmr_ops.c
@@ -295,6 +295,7 @@ enum {
pr_err("rpcrdma: ib_unmap_fmr failed (%i)\n", rc);
list_for_each_entry_safe(mw, tmp, &req->rl_registered, mw_list) {
+ list_del_init(&mw->mw_list);
list_del_init(&mw->fmr.fm_mr->list);
fmr_op_recover_mr(mw);
}
There are rare cases where an rpcrdma_req and its matched
rpcrdma_rep can be re-used, via rpcrdma_buffer_put, while the RPC
reply handler is still using that req. This is typically due to a
signal firing at just the wrong instant.
As part of closing this race window, avoid using the wrong
rpcrdma_rep to detect remotely invalidated MRs. Mark MRs as
invalidated while we are sure the rep is still OK to use.
BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=305
Fixes: 68791649a725 ('xprtrdma: Invalidate in the RPC reply ... ')
Signed-off-by: Chuck Lever <[email protected]>
---
net/sunrpc/xprtrdma/frwr_ops.c | 4 +---
net/sunrpc/xprtrdma/rpc_rdma.c | 22 ++++++++++++++++++++--
net/sunrpc/xprtrdma/verbs.c | 1 +
net/sunrpc/xprtrdma/xprt_rdma.h | 6 ++++++
4 files changed, 28 insertions(+), 5 deletions(-)
diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c
index f81dd93..31290cb 100644
--- a/net/sunrpc/xprtrdma/frwr_ops.c
+++ b/net/sunrpc/xprtrdma/frwr_ops.c
@@ -464,7 +464,6 @@
frwr_op_unmap_sync(struct rpcrdma_xprt *r_xprt, struct rpcrdma_req *req)
{
struct ib_send_wr *first, **prev, *last, *bad_wr;
- struct rpcrdma_rep *rep = req->rl_reply;
struct rpcrdma_ia *ia = &r_xprt->rx_ia;
struct rpcrdma_frmr *f;
struct rpcrdma_mw *mw;
@@ -483,8 +482,7 @@
list_for_each_entry(mw, &req->rl_registered, mw_list) {
mw->frmr.fr_state = FRMR_IS_INVALID;
- if ((rep->rr_wc_flags & IB_WC_WITH_INVALIDATE) &&
- (mw->mw_handle == rep->rr_inv_rkey))
+ if (mw->mw_flags & RPCRDMA_MW_F_RI)
continue;
f = &mw->frmr;
diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c
index 694e9b1..2356a63 100644
--- a/net/sunrpc/xprtrdma/rpc_rdma.c
+++ b/net/sunrpc/xprtrdma/rpc_rdma.c
@@ -928,6 +928,24 @@ static bool rpcrdma_results_inline(struct rpcrdma_xprt *r_xprt,
return fixup_copy_count;
}
+/* Caller must guarantee @rep remains stable during this call.
+ */
+static void
+rpcrdma_mark_remote_invalidation(struct list_head *mws,
+ struct rpcrdma_rep *rep)
+{
+ struct rpcrdma_mw *mw;
+
+ if (!(rep->rr_wc_flags & IB_WC_WITH_INVALIDATE))
+ return;
+
+ list_for_each_entry(mw, mws, mw_list)
+ if (mw->mw_handle == rep->rr_inv_rkey) {
+ mw->mw_flags = RPCRDMA_MW_F_RI;
+ break; /* only one invalidated MR per RPC */
+ }
+}
+
#if defined(CONFIG_SUNRPC_BACKCHANNEL)
/* By convention, backchannel calls arrive via rdma_msg type
* messages, and never populate the chunk lists. This makes
@@ -1006,13 +1024,13 @@ static bool rpcrdma_results_inline(struct rpcrdma_xprt *r_xprt,
/* Sanity checking has passed. We are now committed
* to complete this transaction.
*/
+ rpcrdma_mark_remote_invalidation(&req->rl_registered, rep);
list_del_init(&rqst->rq_list);
+ req->rl_reply = rep;
spin_unlock_bh(&xprt->transport_lock);
dprintk("RPC: %s: reply %p completes request %p (xid 0x%08x)\n",
__func__, rep, req, be32_to_cpu(headerp->rm_xid));
- /* from here on, the reply is no longer an orphan */
- req->rl_reply = rep;
xprt->reestablish_timeout = 0;
if (headerp->rm_vers != rpcrdma_version)
diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index 3dbce9a..a8be66d 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -1187,6 +1187,7 @@ struct rpcrdma_mw *
if (!mw)
goto out_nomws;
+ mw->mw_flags = 0;
return mw;
out_nomws:
diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index 1d66acf..2e02733 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -271,6 +271,7 @@ struct rpcrdma_mw {
struct scatterlist *mw_sg;
int mw_nents;
enum dma_data_direction mw_dir;
+ unsigned long mw_flags;
union {
struct rpcrdma_fmr fmr;
struct rpcrdma_frmr frmr;
@@ -282,6 +283,11 @@ struct rpcrdma_mw {
struct list_head mw_all;
};
+/* mw_flags */
+enum {
+ RPCRDMA_MW_F_RI = 1,
+};
+
/*
* struct rpcrdma_req -- structure central to the request/reply sequence.
*
There are rare cases where an rpcrdma_req can be re-used (via
rpcrdma_buffer_put) while the RPC reply handler is still running.
This is due to a signal firing at just the wrong instant.
Since commit 9d6b04097882 ("xprtrdma: Place registered MWs on a
per-req list"), rpcrdma_mws are self-contained; ie., they fully
describe an MR and scatterlist, and no part of that information is
stored in struct rpcrdma_req.
As part of closing the above race window, pass only the req's list
of registered MRs to ro_unmap_sync, rather than the rpcrdma_req
itself.
Some extra transport header sanity checking is removed. Since the
client depends on its own recollection of what memory had been
registered, there doesn't seem to be a way to abuse this change.
And, the check was not terribly effective. If the client had sent
Read chunks, the "list_empty" test is negative in both of the
removed cases, which are actually looking for Write or Reply
chunks.
BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=305
Fixes: 68791649a725 ('xprtrdma: Invalidate in the RPC reply ... ')
Signed-off-by: Chuck Lever <[email protected]>
---
net/sunrpc/xprtrdma/fmr_ops.c | 16 +++++++++-------
net/sunrpc/xprtrdma/frwr_ops.c | 19 +++++++++----------
net/sunrpc/xprtrdma/rpc_rdma.c | 16 +++++++---------
net/sunrpc/xprtrdma/xprt_rdma.h | 2 +-
4 files changed, 26 insertions(+), 27 deletions(-)
diff --git a/net/sunrpc/xprtrdma/fmr_ops.c b/net/sunrpc/xprtrdma/fmr_ops.c
index 21f3cd5..5556ed9 100644
--- a/net/sunrpc/xprtrdma/fmr_ops.c
+++ b/net/sunrpc/xprtrdma/fmr_ops.c
@@ -255,24 +255,26 @@ enum {
* Sleeps until it is safe for the host CPU to access the
* previously mapped memory regions.
*
- * Caller ensures that req->rl_registered is not empty.
+ * Caller ensures that @mws is not empty before the call. This
+ * function empties the list.
*/
static void
-fmr_op_unmap_sync(struct rpcrdma_xprt *r_xprt, struct rpcrdma_req *req)
+fmr_op_unmap_sync(struct rpcrdma_xprt *r_xprt, struct list_head *mws)
{
struct rpcrdma_mw *mw, *tmp;
LIST_HEAD(unmap_list);
int rc;
- dprintk("RPC: %s: req %p\n", __func__, req);
-
/* ORDER: Invalidate all of the req's MRs first
*
* ib_unmap_fmr() is slow, so use a single call instead
* of one call per mapped FMR.
*/
- list_for_each_entry(mw, &req->rl_registered, mw_list)
+ list_for_each_entry(mw, mws, mw_list) {
+ dprintk("RPC: %s: unmapping fmr %p\n",
+ __func__, &mw->fmr);
list_add_tail(&mw->fmr.fm_mr->list, &unmap_list);
+ }
r_xprt->rx_stats.local_inv_needed++;
rc = ib_unmap_fmr(&unmap_list);
if (rc)
@@ -281,7 +283,7 @@ enum {
/* ORDER: Now DMA unmap all of the req's MRs, and return
* them to the free MW list.
*/
- list_for_each_entry_safe(mw, tmp, &req->rl_registered, mw_list) {
+ list_for_each_entry_safe(mw, tmp, mws, mw_list) {
list_del_init(&mw->mw_list);
list_del_init(&mw->fmr.fm_mr->list);
ib_dma_unmap_sg(r_xprt->rx_ia.ri_device,
@@ -294,7 +296,7 @@ enum {
out_reset:
pr_err("rpcrdma: ib_unmap_fmr failed (%i)\n", rc);
- list_for_each_entry_safe(mw, tmp, &req->rl_registered, mw_list) {
+ list_for_each_entry_safe(mw, tmp, mws, mw_list) {
list_del_init(&mw->mw_list);
list_del_init(&mw->fmr.fm_mr->list);
fmr_op_recover_mr(mw);
diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c
index 31290cb..97f9f85 100644
--- a/net/sunrpc/xprtrdma/frwr_ops.c
+++ b/net/sunrpc/xprtrdma/frwr_ops.c
@@ -458,10 +458,11 @@
* Sleeps until it is safe for the host CPU to access the
* previously mapped memory regions.
*
- * Caller ensures that req->rl_registered is not empty.
+ * Caller ensures that @mws is not empty before the call. This
+ * function empties the list.
*/
static void
-frwr_op_unmap_sync(struct rpcrdma_xprt *r_xprt, struct rpcrdma_req *req)
+frwr_op_unmap_sync(struct rpcrdma_xprt *r_xprt, struct list_head *mws)
{
struct ib_send_wr *first, **prev, *last, *bad_wr;
struct rpcrdma_ia *ia = &r_xprt->rx_ia;
@@ -469,9 +470,7 @@
struct rpcrdma_mw *mw;
int count, rc;
- dprintk("RPC: %s: req %p\n", __func__, req);
-
- /* ORDER: Invalidate all of the req's MRs first
+ /* ORDER: Invalidate all of the MRs first
*
* Chain the LOCAL_INV Work Requests and post them with
* a single ib_post_send() call.
@@ -479,7 +478,7 @@
f = NULL;
count = 0;
prev = &first;
- list_for_each_entry(mw, &req->rl_registered, mw_list) {
+ list_for_each_entry(mw, mws, mw_list) {
mw->frmr.fr_state = FRMR_IS_INVALID;
if (mw->mw_flags & RPCRDMA_MW_F_RI)
@@ -528,12 +527,12 @@
wait_for_completion(&f->fr_linv_done);
- /* ORDER: Now DMA unmap all of the req's MRs, and return
+ /* ORDER: Now DMA unmap all of the MRs, and return
* them to the free MW list.
*/
unmap:
- while (!list_empty(&req->rl_registered)) {
- mw = rpcrdma_pop_mw(&req->rl_registered);
+ while (!list_empty(mws)) {
+ mw = rpcrdma_pop_mw(mws);
dprintk("RPC: %s: DMA unmapping frmr %p\n",
__func__, &mw->frmr);
ib_dma_unmap_sg(ia->ri_device,
@@ -549,7 +548,7 @@
/* Find and reset the MRs in the LOCAL_INV WRs that did not
* get posted. This is synchronous, and slow.
*/
- list_for_each_entry(mw, &req->rl_registered, mw_list) {
+ list_for_each_entry(mw, mws, mw_list) {
f = &mw->frmr;
if (mw->mw_handle == bad_wr->ex.invalidate_rkey) {
__frwr_reset_mr(ia, mw);
diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c
index 2356a63..c88132d 100644
--- a/net/sunrpc/xprtrdma/rpc_rdma.c
+++ b/net/sunrpc/xprtrdma/rpc_rdma.c
@@ -995,6 +995,7 @@ static bool rpcrdma_results_inline(struct rpcrdma_xprt *r_xprt,
__be32 *iptr;
int rdmalen, status, rmerr;
unsigned long cwnd;
+ struct list_head mws;
dprintk("RPC: %s: incoming rep %p\n", __func__, rep);
@@ -1024,7 +1025,8 @@ static bool rpcrdma_results_inline(struct rpcrdma_xprt *r_xprt,
/* Sanity checking has passed. We are now committed
* to complete this transaction.
*/
- rpcrdma_mark_remote_invalidation(&req->rl_registered, rep);
+ list_replace_init(&req->rl_registered, &mws);
+ rpcrdma_mark_remote_invalidation(&mws, rep);
list_del_init(&rqst->rq_list);
req->rl_reply = rep;
spin_unlock_bh(&xprt->transport_lock);
@@ -1042,12 +1044,9 @@ static bool rpcrdma_results_inline(struct rpcrdma_xprt *r_xprt,
case rdma_msg:
/* never expect read chunks */
/* never expect reply chunks (two ways to check) */
- /* never expect write chunks without having offered RDMA */
if (headerp->rm_body.rm_chunks[0] != xdr_zero ||
(headerp->rm_body.rm_chunks[1] == xdr_zero &&
- headerp->rm_body.rm_chunks[2] != xdr_zero) ||
- (headerp->rm_body.rm_chunks[1] != xdr_zero &&
- list_empty(&req->rl_registered)))
+ headerp->rm_body.rm_chunks[2] != xdr_zero))
goto badheader;
if (headerp->rm_body.rm_chunks[1] != xdr_zero) {
/* count any expected write chunks in read reply */
@@ -1084,8 +1083,7 @@ static bool rpcrdma_results_inline(struct rpcrdma_xprt *r_xprt,
/* never expect read or write chunks, always reply chunks */
if (headerp->rm_body.rm_chunks[0] != xdr_zero ||
headerp->rm_body.rm_chunks[1] != xdr_zero ||
- headerp->rm_body.rm_chunks[2] != xdr_one ||
- list_empty(&req->rl_registered))
+ headerp->rm_body.rm_chunks[2] != xdr_one)
goto badheader;
iptr = (__be32 *)((unsigned char *)headerp +
RPCRDMA_HDRLEN_MIN);
@@ -1118,8 +1116,8 @@ static bool rpcrdma_results_inline(struct rpcrdma_xprt *r_xprt,
* control: waking the next RPC waits until this RPC has
* relinquished all its Send Queue entries.
*/
- if (!list_empty(&req->rl_registered))
- r_xprt->rx_ia.ri_ops->ro_unmap_sync(r_xprt, req);
+ if (!list_empty(&mws))
+ r_xprt->rx_ia.ri_ops->ro_unmap_sync(r_xprt, &mws);
spin_lock_bh(&xprt->transport_lock);
cwnd = xprt->cwnd;
diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index 2e02733..1c23117 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -467,7 +467,7 @@ struct rpcrdma_memreg_ops {
struct rpcrdma_mr_seg *, int, bool,
struct rpcrdma_mw **);
void (*ro_unmap_sync)(struct rpcrdma_xprt *,
- struct rpcrdma_req *);
+ struct list_head *);
void (*ro_unmap_safe)(struct rpcrdma_xprt *,
struct rpcrdma_req *, bool);
void (*ro_recover_mr)(struct rpcrdma_mw *);
Clean up: I'm about to use the rl_free field for purposes other than
a free list. So use a more generic name.
This is a refactoring change only.
BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=305
Fixes: 68791649a725 ('xprtrdma: Invalidate in the RPC reply ... ')
Signed-off-by: Chuck Lever <[email protected]>
---
net/sunrpc/xprtrdma/verbs.c | 9 ++++-----
net/sunrpc/xprtrdma/xprt_rdma.h | 2 +-
2 files changed, 5 insertions(+), 6 deletions(-)
diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index a8be66d..df72224 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -971,7 +971,6 @@ struct rpcrdma_req *
if (req == NULL)
return ERR_PTR(-ENOMEM);
- INIT_LIST_HEAD(&req->rl_free);
spin_lock(&buffer->rb_reqslock);
list_add(&req->rl_all, &buffer->rb_allreqs);
spin_unlock(&buffer->rb_reqslock);
@@ -1055,7 +1054,7 @@ struct rpcrdma_rep *
goto out;
}
req->rl_backchannel = false;
- list_add(&req->rl_free, &buf->rb_send_bufs);
+ list_add(&req->rl_list, &buf->rb_send_bufs);
}
INIT_LIST_HEAD(&buf->rb_recv_bufs);
@@ -1084,8 +1083,8 @@ struct rpcrdma_rep *
struct rpcrdma_req *req;
req = list_first_entry(&buf->rb_send_bufs,
- struct rpcrdma_req, rl_free);
- list_del(&req->rl_free);
+ struct rpcrdma_req, rl_list);
+ list_del(&req->rl_list);
return req;
}
@@ -1268,7 +1267,7 @@ struct rpcrdma_req *
spin_lock(&buffers->rb_lock);
buffers->rb_send_count--;
- list_add_tail(&req->rl_free, &buffers->rb_send_bufs);
+ list_add_tail(&req->rl_list, &buffers->rb_send_bufs);
if (rep) {
buffers->rb_recv_count--;
list_add_tail(&rep->rr_list, &buffers->rb_recv_bufs);
diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index 1c23117..ad918c8 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -340,7 +340,7 @@ enum {
struct rpcrdma_buffer;
struct rpcrdma_req {
- struct list_head rl_free;
+ struct list_head rl_list;
unsigned int rl_mapped_sges;
unsigned int rl_connect_cookie;
struct rpcrdma_buffer *rl_buffer;
After a signal, the RPC client aborts synchronous RPCs running on
behalf of the signaled application.
The server is still executing those RPCs, and will write the results
back into the client's memory when it's done. By the time the server
writes the results, that memory is likely being used for other
purposes. Therefore xprtrdma has to immediately invalidate all
memory regions used by those aborted RPCs to prevent the server's
writes from clobbering that re-used memory.
With FMR memory registration, invalidation takes a relatively long
time. In fact, the invalidation is often still running when the
server tries to write the results into the memory regions that are
being invalidated.
This sets up a race between two processes:
1. After the signal, xprt_rdma_free calls ro_unmap_safe.
2. While ro_unmap_safe is still running, the server replies and
rpcrdma_reply_handler runs, calling ro_unmap_sync.
Both processes invoke ib_unmap_fmr on the same FMR.
The mlx4 driver allows two ib_unmap_fmr calls on the same FMR at
the same time, but HCAs generally don't tolerate this. Sometimes
this can result in a system crash.
If the HCA happens to survive, rpcrdma_reply_handler continues. It
removes the rpc_rqst from rq_list and releases the transport_lock.
This enables xprt_rdma_free to run in another process, and the
rpc_rqst is released while rpcrdma_reply_handler is still waiting
for the ib_unmap_fmr call to finish.
But further down in rpcrdma_reply_handler, the transport_lock is
taken again, and "rqst" is dereferenced. If "rqst" has already been
released, this triggers a general protection fault. Since bottom-
halves are disabled, the system locks up.
Address both issues by reversing the order of the xprt_lookup_rqst
call and the ro_unmap_sync call. Introduce a separate lookup
mechanism for rpcrdma_reqs to enable calling ro_unmap_sync before
xprt_lookup_rqst. Now the handler takes the transport_lock once
and holds it for the XID lookup and RPC completion.
BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=305
Fixes: 68791649a725 ('xprtrdma: Invalidate in the RPC reply ... ')
Signed-off-by: Chuck Lever <[email protected]>
---
net/sunrpc/xprtrdma/rpc_rdma.c | 79 +++++++++++++++++++++++++--------------
net/sunrpc/xprtrdma/transport.c | 3 +
net/sunrpc/xprtrdma/verbs.c | 3 +
net/sunrpc/xprtrdma/xprt_rdma.h | 30 +++++++++++++++
4 files changed, 84 insertions(+), 31 deletions(-)
diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c
index c88132d..b6584ae 100644
--- a/net/sunrpc/xprtrdma/rpc_rdma.c
+++ b/net/sunrpc/xprtrdma/rpc_rdma.c
@@ -734,6 +734,9 @@ static bool rpcrdma_results_inline(struct rpcrdma_xprt *r_xprt,
rpclen = 0;
}
+ req->rl_xid = rqst->rq_xid;
+ rpcrdma_insert_req(&r_xprt->rx_buf, req);
+
/* This implementation supports the following combinations
* of chunk lists in one RPC-over-RDMA Call message:
*
@@ -987,11 +990,12 @@ static bool rpcrdma_results_inline(struct rpcrdma_xprt *r_xprt,
{
struct rpcrdma_rep *rep =
container_of(work, struct rpcrdma_rep, rr_work);
+ struct rpcrdma_xprt *r_xprt = rep->rr_rxprt;
+ struct rpcrdma_buffer *buf = &r_xprt->rx_buf;
+ struct rpc_xprt *xprt = &r_xprt->rx_xprt;
struct rpcrdma_msg *headerp;
struct rpcrdma_req *req;
struct rpc_rqst *rqst;
- struct rpcrdma_xprt *r_xprt = rep->rr_rxprt;
- struct rpc_xprt *xprt = &r_xprt->rx_xprt;
__be32 *iptr;
int rdmalen, status, rmerr;
unsigned long cwnd;
@@ -1013,28 +1017,45 @@ static bool rpcrdma_results_inline(struct rpcrdma_xprt *r_xprt,
/* Match incoming rpcrdma_rep to an rpcrdma_req to
* get context for handling any incoming chunks.
*/
- spin_lock_bh(&xprt->transport_lock);
- rqst = xprt_lookup_rqst(xprt, headerp->rm_xid);
- if (!rqst)
+ spin_lock(&buf->rb_lock);
+ req = rpcrdma_lookup_req_locked(&r_xprt->rx_buf,
+ headerp->rm_xid);
+ if (!req)
goto out_nomatch;
-
- req = rpcr_to_rdmar(rqst);
if (req->rl_reply)
goto out_duplicate;
- /* Sanity checking has passed. We are now committed
- * to complete this transaction.
- */
list_replace_init(&req->rl_registered, &mws);
rpcrdma_mark_remote_invalidation(&mws, rep);
- list_del_init(&rqst->rq_list);
+
+ /* Avoid races with signals and duplicate replies
+ * by marking this req as matched.
+ */
req->rl_reply = rep;
- spin_unlock_bh(&xprt->transport_lock);
+ spin_unlock(&buf->rb_lock);
+
dprintk("RPC: %s: reply %p completes request %p (xid 0x%08x)\n",
__func__, rep, req, be32_to_cpu(headerp->rm_xid));
- xprt->reestablish_timeout = 0;
+ /* Invalidate and unmap the data payloads before waking the
+ * waiting application. This guarantees the memory regions
+ * are properly fenced from the server before the application
+ * accesses the data. It also ensures proper send flow control:
+ * waking the next RPC waits until this RPC has relinquished
+ * all its Send Queue entries.
+ */
+ if (!list_empty(&mws))
+ r_xprt->rx_ia.ri_ops->ro_unmap_sync(r_xprt, &mws);
+ /* Perform XID lookup, reconstruction of the RPC reply, and
+ * RPC completion while holding the transport lock to ensure
+ * the rep, rqst, and rq_task pointers remain stable.
+ */
+ spin_lock_bh(&xprt->transport_lock);
+ rqst = xprt_lookup_rqst(xprt, headerp->rm_xid);
+ if (!rqst)
+ goto out_norqst;
+ xprt->reestablish_timeout = 0;
if (headerp->rm_vers != rpcrdma_version)
goto out_badversion;
@@ -1109,17 +1130,6 @@ static bool rpcrdma_results_inline(struct rpcrdma_xprt *r_xprt,
}
out:
- /* Invalidate and flush the data payloads before waking the
- * waiting application. This guarantees the memory region is
- * properly fenced from the server before the application
- * accesses the data. It also ensures proper send flow
- * control: waking the next RPC waits until this RPC has
- * relinquished all its Send Queue entries.
- */
- if (!list_empty(&mws))
- r_xprt->rx_ia.ri_ops->ro_unmap_sync(r_xprt, &mws);
-
- spin_lock_bh(&xprt->transport_lock);
cwnd = xprt->cwnd;
xprt->cwnd = atomic_read(&r_xprt->rx_buf.rb_credits) << RPC_CWNDSHIFT;
if (xprt->cwnd > cwnd)
@@ -1128,7 +1138,7 @@ static bool rpcrdma_results_inline(struct rpcrdma_xprt *r_xprt,
xprt_complete_rqst(rqst->rq_task, status);
spin_unlock_bh(&xprt->transport_lock);
dprintk("RPC: %s: xprt_complete_rqst(0x%p, 0x%p, %d)\n",
- __func__, xprt, rqst, status);
+ __func__, xprt, rqst, status);
return;
out_badstatus:
@@ -1177,26 +1187,37 @@ static bool rpcrdma_results_inline(struct rpcrdma_xprt *r_xprt,
r_xprt->rx_stats.bad_reply_count++;
goto out;
-/* If no pending RPC transaction was matched, post a replacement
- * receive buffer before returning.
+/* The req was still available, but by the time the transport_lock
+ * was acquired, the rqst and task had been released. Thus the RPC
+ * has already been terminated.
*/
+out_norqst:
+ spin_unlock_bh(&xprt->transport_lock);
+ rpcrdma_buffer_put(req);
+ dprintk("RPC: %s: race, no rqst left for req %p\n",
+ __func__, req);
+ return;
+
out_shortreply:
dprintk("RPC: %s: short/invalid reply\n", __func__);
goto repost;
out_nomatch:
- spin_unlock_bh(&xprt->transport_lock);
+ spin_unlock(&buf->rb_lock);
dprintk("RPC: %s: no match for incoming xid 0x%08x len %d\n",
__func__, be32_to_cpu(headerp->rm_xid),
rep->rr_len);
goto repost;
out_duplicate:
- spin_unlock_bh(&xprt->transport_lock);
+ spin_unlock(&buf->rb_lock);
dprintk("RPC: %s: "
"duplicate reply %p to RPC request %p: xid 0x%08x\n",
__func__, rep, req, be32_to_cpu(headerp->rm_xid));
+/* If no pending RPC transaction was matched, post a replacement
+ * receive buffer before returning.
+ */
repost:
r_xprt->rx_stats.bad_reply_count++;
if (rpcrdma_ep_post_recv(&r_xprt->rx_ia, rep))
diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c
index 62ecbcc..d1c458e 100644
--- a/net/sunrpc/xprtrdma/transport.c
+++ b/net/sunrpc/xprtrdma/transport.c
@@ -684,7 +684,8 @@
dprintk("RPC: %s: called on 0x%p\n", __func__, req->rl_reply);
- if (unlikely(!list_empty(&req->rl_registered)))
+ rpcrdma_remove_req(&r_xprt->rx_buf, req);
+ if (!list_empty(&req->rl_registered))
ia->ri_ops->ro_unmap_safe(r_xprt, req, !RPC_IS_ASYNC(task));
rpcrdma_unmap_sges(ia, req);
rpcrdma_buffer_put(req);
diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index df72224..a215a87 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -1032,6 +1032,7 @@ struct rpcrdma_rep *
spin_lock_init(&buf->rb_recovery_lock);
INIT_LIST_HEAD(&buf->rb_mws);
INIT_LIST_HEAD(&buf->rb_all);
+ INIT_LIST_HEAD(&buf->rb_pending);
INIT_LIST_HEAD(&buf->rb_stale_mrs);
INIT_DELAYED_WORK(&buf->rb_refresh_worker,
rpcrdma_mr_refresh_worker);
@@ -1084,7 +1085,7 @@ struct rpcrdma_rep *
req = list_first_entry(&buf->rb_send_bufs,
struct rpcrdma_req, rl_list);
- list_del(&req->rl_list);
+ list_del_init(&req->rl_list);
return req;
}
diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index ad918c8..b282d3f 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -341,6 +341,7 @@ enum {
struct rpcrdma_buffer;
struct rpcrdma_req {
struct list_head rl_list;
+ __be32 rl_xid;
unsigned int rl_mapped_sges;
unsigned int rl_connect_cookie;
struct rpcrdma_buffer *rl_buffer;
@@ -402,6 +403,7 @@ struct rpcrdma_buffer {
int rb_send_count, rb_recv_count;
struct list_head rb_send_bufs;
struct list_head rb_recv_bufs;
+ struct list_head rb_pending;
u32 rb_max_requests;
atomic_t rb_credits; /* most recent credit grant */
@@ -550,6 +552,34 @@ int rpcrdma_ep_post(struct rpcrdma_ia *, struct rpcrdma_ep *,
int rpcrdma_buffer_create(struct rpcrdma_xprt *);
void rpcrdma_buffer_destroy(struct rpcrdma_buffer *);
+static inline void
+rpcrdma_insert_req(struct rpcrdma_buffer *buffers, struct rpcrdma_req *req)
+{
+ spin_lock(&buffers->rb_lock);
+ if (list_empty(&req->rl_list))
+ list_add_tail(&req->rl_list, &buffers->rb_pending);
+ spin_unlock(&buffers->rb_lock);
+}
+
+static inline struct rpcrdma_req *
+rpcrdma_lookup_req_locked(struct rpcrdma_buffer *buffers, __be32 xid)
+{
+ struct rpcrdma_req *pos;
+
+ list_for_each_entry(pos, &buffers->rb_pending, rl_list)
+ if (pos->rl_xid == xid)
+ return pos;
+ return NULL;
+}
+
+static inline void
+rpcrdma_remove_req(struct rpcrdma_buffer *buffers, struct rpcrdma_req *req)
+{
+ spin_lock(&buffers->rb_lock);
+ list_del(&req->rl_list);
+ spin_unlock(&buffers->rb_lock);
+}
+
struct rpcrdma_mw *rpcrdma_get_mw(struct rpcrdma_xprt *);
void rpcrdma_put_mw(struct rpcrdma_xprt *, struct rpcrdma_mw *);
struct rpcrdma_req *rpcrdma_buffer_get(struct rpcrdma_buffer *);
When ib_post_send() fails, all LOCAL_INV WRs past @bad_wr have to be
examined, and the MRs reset by hand.
I'm not sure how the existing code can work by comparing R_keys.
Restructure the logic so that instead it walks the chain of WRs,
starting from the first bad one.
Make sure to wait for completion if at least one WR was actually
posted. Otherwise, if ib_post_send() fails, we can end up
DMA-unmapping the MR while LOCAL_INV operations are in flight.
Commit 7a89f9c626e3 ("xprtrdma: Honor ->send_request API contract")
added the rdma_disconnect() call site. The disconnect actually
causes more problems than it solves, and SQ overruns happen only as
a result of software bugs. So remove it.
Fixes: d7a21c1bed54 ("xprtrdma: Reset MRs in frwr_op_unmap_sync()")
Signed-off-by: Chuck Lever <[email protected]>
---
net/sunrpc/xprtrdma/frwr_ops.c | 23 +++++++++++++----------
1 file changed, 13 insertions(+), 10 deletions(-)
diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c
index 97f9f85..24631e0 100644
--- a/net/sunrpc/xprtrdma/frwr_ops.c
+++ b/net/sunrpc/xprtrdma/frwr_ops.c
@@ -521,12 +521,13 @@
* unless ri_id->qp is a valid pointer.
*/
r_xprt->rx_stats.local_inv_needed++;
+ bad_wr = NULL;
rc = ib_post_send(ia->ri_id->qp, first, &bad_wr);
+ if (bad_wr != first)
+ wait_for_completion(&f->fr_linv_done);
if (rc)
goto reset_mrs;
- wait_for_completion(&f->fr_linv_done);
-
/* ORDER: Now DMA unmap all of the MRs, and return
* them to the free MW list.
*/
@@ -543,17 +544,19 @@
reset_mrs:
pr_err("rpcrdma: FRMR invalidate ib_post_send returned %i\n", rc);
- rdma_disconnect(ia->ri_id);
/* Find and reset the MRs in the LOCAL_INV WRs that did not
- * get posted. This is synchronous, and slow.
+ * get posted.
*/
- list_for_each_entry(mw, mws, mw_list) {
- f = &mw->frmr;
- if (mw->mw_handle == bad_wr->ex.invalidate_rkey) {
- __frwr_reset_mr(ia, mw);
- bad_wr = bad_wr->next;
- }
+ rpcrdma_init_cqcount(&r_xprt->rx_ep, -count);
+ while (bad_wr) {
+ f = container_of(bad_wr, struct rpcrdma_frmr,
+ fr_invwr);
+ mw = container_of(f, struct rpcrdma_mw, frmr);
+
+ __frwr_reset_mr(ia, mw);
+
+ bad_wr = bad_wr->next;
}
goto unmap;
}
Deferred MR recovery does a DMA-unmapping of the MW. However, ro_map
invokes rpcrdma_defer_mr_recovery in some error cases where the MW
has not even been DMA-mapped yet.
Avoid the DMA-unmapping error by releasing the MW directly in these
paths instead of calling rpcrdma_defer_mr_recovery.
Also note that if ib_dma_map_sg is asked to map 0 nents, it will
return 0. So the extra "if (i == 0)" check is no longer needed.
Fixes: 42fe28f60763 ("xprtrdma: Do not leak an MW during a DMA ...")
Fixes: 505bbe64dd04 ("xprtrdma: Refactor MR recovery work queues")
Signed-off-by: Chuck Lever <[email protected]>
---
net/sunrpc/xprtrdma/fmr_ops.c | 18 +++++++++---------
net/sunrpc/xprtrdma/frwr_ops.c | 19 ++++++++-----------
2 files changed, 17 insertions(+), 20 deletions(-)
diff --git a/net/sunrpc/xprtrdma/fmr_ops.c b/net/sunrpc/xprtrdma/fmr_ops.c
index 5556ed9..2f4eacd 100644
--- a/net/sunrpc/xprtrdma/fmr_ops.c
+++ b/net/sunrpc/xprtrdma/fmr_ops.c
@@ -213,13 +213,11 @@ enum {
offset_in_page((seg-1)->mr_offset + (seg-1)->mr_len))
break;
}
- mw->mw_nents = i;
mw->mw_dir = rpcrdma_data_dir(writing);
- if (i == 0)
- goto out_dmamap_err;
- if (!ib_dma_map_sg(r_xprt->rx_ia.ri_device,
- mw->mw_sg, mw->mw_nents, mw->mw_dir))
+ mw->mw_nents = ib_dma_map_sg(r_xprt->rx_ia.ri_device,
+ mw->mw_sg, i, mw->mw_dir);
+ if (!mw->mw_nents)
goto out_dmamap_err;
for (i = 0, dma_pages = mw->fmr.fm_physaddrs; i < mw->mw_nents; i++)
@@ -237,16 +235,18 @@ enum {
return mw->mw_nents;
out_dmamap_err:
- pr_err("rpcrdma: failed to dma map sg %p sg_nents %u\n",
- mw->mw_sg, mw->mw_nents);
- rpcrdma_defer_mr_recovery(mw);
+ pr_err("rpcrdma: failed to DMA map sg %p sg_nents %d\n",
+ mw->mw_sg, i);
+ rpcrdma_put_mw(r_xprt, mw);
return -EIO;
out_maperr:
pr_err("rpcrdma: ib_map_phys_fmr %u@0x%llx+%i (%d) status %i\n",
len, (unsigned long long)dma_pages[0],
pageoff, mw->mw_nents, rc);
- rpcrdma_defer_mr_recovery(mw);
+ ib_dma_unmap_sg(r_xprt->rx_ia.ri_device,
+ mw->mw_sg, mw->mw_nents, mw->mw_dir);
+ rpcrdma_put_mw(r_xprt, mw);
return -EIO;
}
diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c
index 24631e0..8f63d38 100644
--- a/net/sunrpc/xprtrdma/frwr_ops.c
+++ b/net/sunrpc/xprtrdma/frwr_ops.c
@@ -355,7 +355,7 @@
struct ib_mr *mr;
struct ib_reg_wr *reg_wr;
struct ib_send_wr *bad_wr;
- int rc, i, n, dma_nents;
+ int rc, i, n;
u8 key;
mw = NULL;
@@ -391,14 +391,10 @@
offset_in_page((seg-1)->mr_offset + (seg-1)->mr_len))
break;
}
- mw->mw_nents = i;
mw->mw_dir = rpcrdma_data_dir(writing);
- if (i == 0)
- goto out_dmamap_err;
- dma_nents = ib_dma_map_sg(ia->ri_device,
- mw->mw_sg, mw->mw_nents, mw->mw_dir);
- if (!dma_nents)
+ mw->mw_nents = ib_dma_map_sg(ia->ri_device, mw->mw_sg, i, mw->mw_dir);
+ if (!mw->mw_nents)
goto out_dmamap_err;
n = ib_map_mr_sg(mr, mw->mw_sg, mw->mw_nents, NULL, PAGE_SIZE);
@@ -436,13 +432,14 @@
return mw->mw_nents;
out_dmamap_err:
- pr_err("rpcrdma: failed to dma map sg %p sg_nents %u\n",
- mw->mw_sg, mw->mw_nents);
- rpcrdma_defer_mr_recovery(mw);
+ pr_err("rpcrdma: failed to DMA map sg %p sg_nents %d\n",
+ mw->mw_sg, i);
+ frmr->fr_state = FRMR_IS_INVALID;
+ rpcrdma_put_mw(r_xprt, mw);
return -EIO;
out_mapmr_err:
- pr_err("rpcrdma: failed to map mr %p (%u/%u)\n",
+ pr_err("rpcrdma: failed to map mr %p (%d/%d)\n",
frmr->fr_mr, n, mw->mw_nents);
rpcrdma_defer_mr_recovery(mw);
return -EIO;
Transparent State Migration copies a client's lease state from the
server where a filesystem used to reside to the server where it now
resides. When an NFSv4.1 client first contacts that destination
server, it uses EXCHANGE_ID to detect trunking relationships.
The lease that was copied there is returned to that client, but the
destination server sets EXCHGID4_FLAG_CONFIRMED_R when replying to
the client. This is because the lease was confirmed on the source
server (before it was copied).
Normally, when CONFIRMED_R is set, a client purges the lease and
creates a new one. However, that throws away the entire benefit of
Transparent State Migration.
Therefore, the client must not purge that lease when it is possible
that Transparent State Migration has occurred.
Reported-by: Xuan Qi <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>
Tested-by: Xuan Qi <[email protected]>
---
fs/nfs/nfs4client.c | 5 +++++
fs/nfs/nfs4state.c | 16 +++++++++++-----
include/linux/nfs_fs_sb.h | 2 ++
3 files changed, 18 insertions(+), 5 deletions(-)
diff --git a/fs/nfs/nfs4client.c b/fs/nfs/nfs4client.c
index 692a7a8..203d7a3 100644
--- a/fs/nfs/nfs4client.c
+++ b/fs/nfs/nfs4client.c
@@ -414,6 +414,7 @@ struct nfs_client *nfs4_init_client(struct nfs_client *clp,
if (clp != old)
clp->cl_preserve_clid = true;
nfs_put_client(clp);
+ clear_bit(NFS_CS_TSM_POSSIBLE, &clp->cl_flags);
return old;
error:
@@ -853,6 +854,8 @@ static int nfs4_set_client(struct nfs_server *server,
set_bit(NFS_CS_NORESVPORT, &cl_init.init_flags);
if (server->options & NFS_OPTION_MIGRATION)
set_bit(NFS_CS_MIGRATION, &cl_init.init_flags);
+ if (test_bit(NFS_MIG_TSM_POSSIBLE, &server->mig_status))
+ set_bit(NFS_CS_TSM_POSSIBLE, &cl_init.init_flags);
/* Allocate or find a client reference we can use */
clp = nfs_get_client(&cl_init);
@@ -1213,9 +1216,11 @@ int nfs4_update_server(struct nfs_server *server, const char *hostname,
return -EAFNOSUPPORT;
nfs_server_remove_lists(server);
+ set_bit(NFS_MIG_TSM_POSSIBLE, &server->mig_status);
error = nfs4_set_client(server, hostname, sap, salen, buf,
clp->cl_proto, clnt->cl_timeout,
clp->cl_minorversion, net);
+ clear_bit(NFS_MIG_TSM_POSSIBLE, &server->mig_status);
nfs_put_client(clp);
if (error != 0) {
nfs_server_insert_lists(server);
diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index b34de03..83f8e60 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -352,11 +352,17 @@ int nfs41_discover_server_trunking(struct nfs_client *clp,
if (clp != *result)
return 0;
- /* Purge state if the client id was established in a prior instance */
- if (clp->cl_exchange_flags & EXCHGID4_FLAG_CONFIRMED_R)
- set_bit(NFS4CLNT_PURGE_STATE, &clp->cl_state);
- else
- set_bit(NFS4CLNT_LEASE_CONFIRM, &clp->cl_state);
+ /*
+ * Purge state if the client id was established in a prior
+ * instance and the client id could not have arrived on the
+ * server via Transparent State Migration.
+ */
+ if (clp->cl_exchange_flags & EXCHGID4_FLAG_CONFIRMED_R) {
+ if (!test_bit(NFS_CS_TSM_POSSIBLE, &clp->cl_flags))
+ set_bit(NFS4CLNT_PURGE_STATE, &clp->cl_state);
+ else
+ set_bit(NFS4CLNT_LEASE_CONFIRM, &clp->cl_state);
+ }
nfs4_schedule_state_manager(clp);
status = nfs_wait_client_init_complete(clp);
if (status < 0)
diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
index e418a10..74c4466 100644
--- a/include/linux/nfs_fs_sb.h
+++ b/include/linux/nfs_fs_sb.h
@@ -42,6 +42,7 @@ struct nfs_client {
#define NFS_CS_MIGRATION 2 /* - transparent state migr */
#define NFS_CS_INFINITE_SLOTS 3 /* - don't limit TCP slots */
#define NFS_CS_NO_RETRANS_TIMEOUT 4 /* - Disable retransmit timeouts */
+#define NFS_CS_TSM_POSSIBLE 5 /* - Maybe state migration */
struct sockaddr_storage cl_addr; /* server identifier */
size_t cl_addrlen;
char * cl_hostname; /* hostname of server */
@@ -210,6 +211,7 @@ struct nfs_server {
unsigned long mig_status;
#define NFS_MIG_IN_TRANSITION (1)
#define NFS_MIG_FAILED (2)
+#define NFS_MIG_TSM_POSSIBLE (3)
void (*destroy)(struct nfs_server *);
Some have complained about the log messages generated when xprtrdma
opens or closes a connection to a server. When an NFS mount is
mostly idle, these can appear every few minutes as the client idles
out the connection and reconnects.
Connection and disconnection are a normal part of operation, not
exceptional, so change these to dprintk's for now. At some point
all of these will be converted to tracepoints, but that's for
another day.
Signed-off-by: Chuck Lever <[email protected]>
---
net/sunrpc/xprtrdma/verbs.c | 44 ++++++++-----------------------------------
1 file changed, 8 insertions(+), 36 deletions(-)
diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index a215a87..e4171f2 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -243,8 +243,6 @@
#if IS_ENABLED(CONFIG_SUNRPC_DEBUG)
struct sockaddr *sap = (struct sockaddr *)&ep->rep_remote_addr;
#endif
- struct ib_qp_attr *attr = &ia->ri_qp_attr;
- struct ib_qp_init_attr *iattr = &ia->ri_qp_init_attr;
int connstate = 0;
switch (event->event) {
@@ -267,7 +265,8 @@
break;
case RDMA_CM_EVENT_DEVICE_REMOVAL:
#if IS_ENABLED(CONFIG_SUNRPC_DEBUG)
- pr_info("rpcrdma: removing device for %pIS:%u\n",
+ pr_info("rpcrdma: removing device %s for %pIS:%u\n",
+ ia->ri_device->name,
sap, rpc_get_port(sap));
#endif
set_bit(RPCRDMA_IAF_REMOVING, &ia->ri_flags);
@@ -282,13 +281,6 @@
return 1;
case RDMA_CM_EVENT_ESTABLISHED:
connstate = 1;
- ib_query_qp(ia->ri_id->qp, attr,
- IB_QP_MAX_QP_RD_ATOMIC | IB_QP_MAX_DEST_RD_ATOMIC,
- iattr);
- dprintk("RPC: %s: %d responder resources"
- " (%d initiator)\n",
- __func__, attr->max_dest_rd_atomic,
- attr->max_rd_atomic);
rpcrdma_update_connect_private(xprt, &event->param.conn);
goto connected;
case RDMA_CM_EVENT_CONNECT_ERROR:
@@ -298,11 +290,9 @@
connstate = -ENETDOWN;
goto connected;
case RDMA_CM_EVENT_REJECTED:
-#if IS_ENABLED(CONFIG_SUNRPC_DEBUG)
- pr_info("rpcrdma: connection to %pIS:%u on %s rejected: %s\n",
- sap, rpc_get_port(sap), ia->ri_device->name,
+ dprintk("rpcrdma: connection to %pIS:%u rejected: %s\n",
+ sap, rpc_get_port(sap),
rdma_reject_msg(id, event->status));
-#endif
connstate = -ECONNREFUSED;
if (event->status == IB_CM_REJ_STALE_CONN)
connstate = -EAGAIN;
@@ -310,37 +300,19 @@
case RDMA_CM_EVENT_DISCONNECTED:
connstate = -ECONNABORTED;
connected:
- dprintk("RPC: %s: %sconnected\n",
- __func__, connstate > 0 ? "" : "dis");
atomic_set(&xprt->rx_buf.rb_credits, 1);
ep->rep_connected = connstate;
rpcrdma_conn_func(ep);
wake_up_all(&ep->rep_connect_wait);
/*FALLTHROUGH*/
default:
- dprintk("RPC: %s: %pIS:%u (ep 0x%p): %s\n",
- __func__, sap, rpc_get_port(sap), ep,
- rdma_event_msg(event->event));
+ dprintk("RPC: %s: %pIS:%u on %s/%s (ep 0x%p): %s\n",
+ __func__, sap, rpc_get_port(sap),
+ ia->ri_device->name, ia->ri_ops->ro_displayname,
+ ep, rdma_event_msg(event->event));
break;
}
-#if IS_ENABLED(CONFIG_SUNRPC_DEBUG)
- if (connstate == 1) {
- int ird = attr->max_dest_rd_atomic;
- int tird = ep->rep_remote_cma.responder_resources;
-
- pr_info("rpcrdma: connection to %pIS:%u on %s, memreg '%s', %d credits, %d responders%s\n",
- sap, rpc_get_port(sap),
- ia->ri_device->name,
- ia->ri_ops->ro_displayname,
- xprt->rx_buf.rb_max_requests,
- ird, ird < 4 && ird < tird / 2 ? " (low!)" : "");
- } else if (connstate < 0) {
- pr_info("rpcrdma: connection to %pIS:%u closed (%d)\n",
- sap, rpc_get_port(sap), connstate);
- }
-#endif
-
return 0;
}
Clean up.
Commit 38f1932e60ba ("xprtrdma: Remove FMRs from the unmap list
after unmapping") utilized list_del_init() to try to prevent some
list corruption. The corruption was actually caused by the reply
handler racing with a signal. Now that MR invalidation is properly
serialized, list_del_init() can safely be replaced with plain
list_del().
Signed-off-by: Chuck Lever <[email protected]>
---
net/sunrpc/xprtrdma/fmr_ops.c | 18 ++++++++++--------
1 file changed, 10 insertions(+), 8 deletions(-)
diff --git a/net/sunrpc/xprtrdma/fmr_ops.c b/net/sunrpc/xprtrdma/fmr_ops.c
index 2f4eacd..d3f84bb 100644
--- a/net/sunrpc/xprtrdma/fmr_ops.c
+++ b/net/sunrpc/xprtrdma/fmr_ops.c
@@ -91,7 +91,7 @@ enum {
list_add(&mw->fmr.fm_mr->list, &l);
rc = ib_unmap_fmr(&l);
- list_del_init(&mw->fmr.fm_mr->list);
+ list_del(&mw->fmr.fm_mr->list);
return rc;
}
@@ -261,7 +261,7 @@ enum {
static void
fmr_op_unmap_sync(struct rpcrdma_xprt *r_xprt, struct list_head *mws)
{
- struct rpcrdma_mw *mw, *tmp;
+ struct rpcrdma_mw *mw;
LIST_HEAD(unmap_list);
int rc;
@@ -283,9 +283,11 @@ enum {
/* ORDER: Now DMA unmap all of the req's MRs, and return
* them to the free MW list.
*/
- list_for_each_entry_safe(mw, tmp, mws, mw_list) {
- list_del_init(&mw->mw_list);
- list_del_init(&mw->fmr.fm_mr->list);
+ while (!list_empty(mws)) {
+ mw = rpcrdma_pop_mw(mws);
+ dprintk("RPC: %s: DMA unmapping fmr %p\n",
+ __func__, &mw->fmr);
+ list_del(&mw->fmr.fm_mr->list);
ib_dma_unmap_sg(r_xprt->rx_ia.ri_device,
mw->mw_sg, mw->mw_nents, mw->mw_dir);
rpcrdma_put_mw(r_xprt, mw);
@@ -296,9 +298,9 @@ enum {
out_reset:
pr_err("rpcrdma: ib_unmap_fmr failed (%i)\n", rc);
- list_for_each_entry_safe(mw, tmp, mws, mw_list) {
- list_del_init(&mw->mw_list);
- list_del_init(&mw->fmr.fm_mr->list);
+ while (!list_empty(mws)) {
+ mw = rpcrdma_pop_mw(mws);
+ list_del(&mw->fmr.fm_mr->list);
fmr_op_recover_mr(mw);
}
}
Transparent State Migration copies a client's lease state from the
server where a filesystem used to reside to the server where it now
resides. When an NFSv4.1 client first contacts that destination
server, it uses EXCHANGE_ID to detect trunking relationships.
The lease that was copied there is returned to that client, but the
destination server sets EXCHGID4_FLAG_CONFIRMED_R when replying to
the client. This is because the lease was confirmed on the source
server (before it was copied).
When CONFIRMED_R is set, the client throws away the sequence ID
returned by the server. During a Transparent State Migration, however,
there's no other way for the client to know what sequence ID to use
with a lease that's been migrated.
Therefore, the client must save and use the contrived slot sequence
value returned by the destination server even when CONFIRMED_R is
set.
Note that some servers always return a seqid of 1 after a migration.
Reported-by: Xuan Qi <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>
Tested-by: Xuan Qi <[email protected]>
---
fs/nfs/nfs4proc.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index c08c46a..04dd718 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -7375,12 +7375,11 @@ static void nfs4_exchange_id_done(struct rpc_task *task, void *data)
if (status == 0) {
clp->cl_clientid = cdata->res.clientid;
clp->cl_exchange_flags = cdata->res.flags;
+ clp->cl_seqid = cdata->res.seqid;
/* Client ID is not confirmed */
- if (!(cdata->res.flags & EXCHGID4_FLAG_CONFIRMED_R)) {
+ if (!(cdata->res.flags & EXCHGID4_FLAG_CONFIRMED_R))
clear_bit(NFS4_SESSION_ESTABLISHED,
- &clp->cl_session->session_state);
- clp->cl_seqid = cdata->res.seqid;
- }
+ &clp->cl_session->session_state);
kfree(clp->cl_serverowner);
clp->cl_serverowner = cdata->res.server_owner;
Clean up: replace open-coded page-offset masking with the
offset_in_page() helper.
Reported-by: Geliang Tang <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>
---
net/sunrpc/xprtrdma/rpc_rdma.c | 16 ++++++++--------
1 file changed, 8 insertions(+), 8 deletions(-)
diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c
index b6584ae..ca4d6e4 100644
--- a/net/sunrpc/xprtrdma/rpc_rdma.c
+++ b/net/sunrpc/xprtrdma/rpc_rdma.c
@@ -141,7 +141,7 @@ static bool rpcrdma_args_inline(struct rpcrdma_xprt *r_xprt,
if (xdr->page_len) {
remaining = xdr->page_len;
- offset = xdr->page_base & ~PAGE_MASK;
+ offset = offset_in_page(xdr->page_base);
count = 0;
while (remaining) {
remaining -= min_t(unsigned int,
@@ -222,7 +222,7 @@ static bool rpcrdma_results_inline(struct rpcrdma_xprt *r_xprt,
len = xdrbuf->page_len;
ppages = xdrbuf->pages + (xdrbuf->page_base >> PAGE_SHIFT);
- page_base = xdrbuf->page_base & ~PAGE_MASK;
+ page_base = offset_in_page(xdrbuf->page_base);
p = 0;
while (len && n < RPCRDMA_MAX_SEGS) {
if (!ppages[p]) {
@@ -540,7 +540,7 @@ static bool rpcrdma_results_inline(struct rpcrdma_xprt *r_xprt,
goto out;
page = virt_to_page(xdr->tail[0].iov_base);
- page_base = (unsigned long)xdr->tail[0].iov_base & ~PAGE_MASK;
+ page_base = offset_in_page(xdr->tail[0].iov_base);
/* If the content in the page list is an odd length,
* xdr_write_pages() has added a pad at the beginning
@@ -557,7 +557,7 @@ static bool rpcrdma_results_inline(struct rpcrdma_xprt *r_xprt,
*/
if (xdr->page_len) {
ppages = xdr->pages + (xdr->page_base >> PAGE_SHIFT);
- page_base = xdr->page_base & ~PAGE_MASK;
+ page_base = offset_in_page(xdr->page_base);
remaining = xdr->page_len;
while (remaining) {
sge_no++;
@@ -587,7 +587,7 @@ static bool rpcrdma_results_inline(struct rpcrdma_xprt *r_xprt,
*/
if (xdr->tail[0].iov_len) {
page = virt_to_page(xdr->tail[0].iov_base);
- page_base = (unsigned long)xdr->tail[0].iov_base & ~PAGE_MASK;
+ page_base = offset_in_page(xdr->tail[0].iov_base);
len = xdr->tail[0].iov_len;
map_tail:
@@ -878,9 +878,9 @@ static bool rpcrdma_results_inline(struct rpcrdma_xprt *r_xprt,
srcp += curlen;
copy_len -= curlen;
- page_base = rqst->rq_rcv_buf.page_base;
- ppages = rqst->rq_rcv_buf.pages + (page_base >> PAGE_SHIFT);
- page_base &= ~PAGE_MASK;
+ ppages = rqst->rq_rcv_buf.pages +
+ (rqst->rq_rcv_buf.page_base >> PAGE_SHIFT);
+ page_base = offset_in_page(rqst->rq_rcv_buf.page_base);
fixup_copy_count = 0;
if (copy_len && rqst->rq_rcv_buf.page_len) {
int pagelist_len;
Clean up.
FASTREG and LOCAL_INV WRs are typically not signaled. localinv_wake
is used for the last LOCAL_INV WR in a chain, which is always
signaled. The documenting comments should reflect that.
Signed-off-by: Chuck Lever <[email protected]>
---
net/sunrpc/xprtrdma/frwr_ops.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c
index 8f63d38..6aea36a 100644
--- a/net/sunrpc/xprtrdma/frwr_ops.c
+++ b/net/sunrpc/xprtrdma/frwr_ops.c
@@ -277,7 +277,7 @@
}
/**
- * frwr_wc_fastreg - Invoked by RDMA provider for each polled FastReg WC
+ * frwr_wc_fastreg - Invoked by RDMA provider for a flushed FastReg WC
* @cq: completion queue (ignored)
* @wc: completed WR
*
@@ -298,7 +298,7 @@
}
/**
- * frwr_wc_localinv - Invoked by RDMA provider for each polled LocalInv WC
+ * frwr_wc_localinv - Invoked by RDMA provider for a flushed LocalInv WC
* @cq: completion queue (ignored)
* @wc: completed WR
*
@@ -319,7 +319,7 @@
}
/**
- * frwr_wc_localinv - Invoked by RDMA provider for each polled LocalInv WC
+ * frwr_wc_localinv_wake - Invoked by RDMA provider for a signaled LocalInv WC
* @cq: completion queue (ignored)
* @wc: completed WR
*
Hi Chuck,
On 06/08/2017 11:52 AM, Chuck Lever wrote:
> Clean up: I'm about to use the rl_free field for purposes other than
> a free list. So use a more generic name.
>
> This is a refactoring change only.
>
> BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=305
> Fixes: 68791649a725 ('xprtrdma: Invalidate in the RPC reply ... ')
> Signed-off-by: Chuck Lever <[email protected]>
> ---
> net/sunrpc/xprtrdma/verbs.c | 9 ++++-----
> net/sunrpc/xprtrdma/xprt_rdma.h | 2 +-
> 2 files changed, 5 insertions(+), 6 deletions(-)
>
> diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
> index a8be66d..df72224 100644
> --- a/net/sunrpc/xprtrdma/verbs.c
> +++ b/net/sunrpc/xprtrdma/verbs.c
> @@ -971,7 +971,6 @@ struct rpcrdma_req *
> if (req == NULL)
> return ERR_PTR(-ENOMEM);
>
> - INIT_LIST_HEAD(&req->rl_free);
Does rl_list still need to get initialized somewhere?
Thanks,
Anna
> spin_lock(&buffer->rb_reqslock);
> list_add(&req->rl_all, &buffer->rb_allreqs);
> spin_unlock(&buffer->rb_reqslock);
> @@ -1055,7 +1054,7 @@ struct rpcrdma_rep *
> goto out;
> }
> req->rl_backchannel = false;
> - list_add(&req->rl_free, &buf->rb_send_bufs);
> + list_add(&req->rl_list, &buf->rb_send_bufs);
> }
>
> INIT_LIST_HEAD(&buf->rb_recv_bufs);
> @@ -1084,8 +1083,8 @@ struct rpcrdma_rep *
> struct rpcrdma_req *req;
>
> req = list_first_entry(&buf->rb_send_bufs,
> - struct rpcrdma_req, rl_free);
> - list_del(&req->rl_free);
> + struct rpcrdma_req, rl_list);
> + list_del(&req->rl_list);
> return req;
> }
>
> @@ -1268,7 +1267,7 @@ struct rpcrdma_req *
>
> spin_lock(&buffers->rb_lock);
> buffers->rb_send_count--;
> - list_add_tail(&req->rl_free, &buffers->rb_send_bufs);
> + list_add_tail(&req->rl_list, &buffers->rb_send_bufs);
> if (rep) {
> buffers->rb_recv_count--;
> list_add_tail(&rep->rr_list, &buffers->rb_recv_bufs);
> diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
> index 1c23117..ad918c8 100644
> --- a/net/sunrpc/xprtrdma/xprt_rdma.h
> +++ b/net/sunrpc/xprtrdma/xprt_rdma.h
> @@ -340,7 +340,7 @@ enum {
>
> struct rpcrdma_buffer;
> struct rpcrdma_req {
> - struct list_head rl_free;
> + struct list_head rl_list;
> unsigned int rl_mapped_sges;
> unsigned int rl_connect_cookie;
> struct rpcrdma_buffer *rl_buffer;
>
> On Jun 9, 2017, at 2:58 PM, Anna Schumaker <[email protected]> wrote:
>
> Hi Chuck,
>
> On 06/08/2017 11:52 AM, Chuck Lever wrote:
>> Clean up: I'm about to use the rl_free field for purposes other than
>> a free list. So use a more generic name.
>>
>> This is a refactoring change only.
>>
>> BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=305
>> Fixes: 68791649a725 ('xprtrdma: Invalidate in the RPC reply ... ')
>> Signed-off-by: Chuck Lever <[email protected]>
>> ---
>> net/sunrpc/xprtrdma/verbs.c | 9 ++++-----
>> net/sunrpc/xprtrdma/xprt_rdma.h | 2 +-
>> 2 files changed, 5 insertions(+), 6 deletions(-)
>>
>> diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
>> index a8be66d..df72224 100644
>> --- a/net/sunrpc/xprtrdma/verbs.c
>> +++ b/net/sunrpc/xprtrdma/verbs.c
>> @@ -971,7 +971,6 @@ struct rpcrdma_req *
>> if (req == NULL)
>> return ERR_PTR(-ENOMEM);
>>
>> - INIT_LIST_HEAD(&req->rl_free);
>
> Does rl_list still need to get initialized somewhere?
rl_free/rl_list isn't the anchor of a list. It is only
ever a member of a list. So this INIT_LIST_HEAD is actually
unnecessary.
> Thanks,
> Anna
>
>> spin_lock(&buffer->rb_reqslock);
>> list_add(&req->rl_all, &buffer->rb_allreqs);
>> spin_unlock(&buffer->rb_reqslock);
>> @@ -1055,7 +1054,7 @@ struct rpcrdma_rep *
>> goto out;
>> }
>> req->rl_backchannel = false;
>> - list_add(&req->rl_free, &buf->rb_send_bufs);
>> + list_add(&req->rl_list, &buf->rb_send_bufs);
>> }
>>
>> INIT_LIST_HEAD(&buf->rb_recv_bufs);
>> @@ -1084,8 +1083,8 @@ struct rpcrdma_rep *
>> struct rpcrdma_req *req;
>>
>> req = list_first_entry(&buf->rb_send_bufs,
>> - struct rpcrdma_req, rl_free);
>> - list_del(&req->rl_free);
>> + struct rpcrdma_req, rl_list);
>> + list_del(&req->rl_list);
>> return req;
>> }
>>
>> @@ -1268,7 +1267,7 @@ struct rpcrdma_req *
>>
>> spin_lock(&buffers->rb_lock);
>> buffers->rb_send_count--;
>> - list_add_tail(&req->rl_free, &buffers->rb_send_bufs);
>> + list_add_tail(&req->rl_list, &buffers->rb_send_bufs);
>> if (rep) {
>> buffers->rb_recv_count--;
>> list_add_tail(&rep->rr_list, &buffers->rb_recv_bufs);
>> diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
>> index 1c23117..ad918c8 100644
>> --- a/net/sunrpc/xprtrdma/xprt_rdma.h
>> +++ b/net/sunrpc/xprtrdma/xprt_rdma.h
>> @@ -340,7 +340,7 @@ enum {
>>
>> struct rpcrdma_buffer;
>> struct rpcrdma_req {
>> - struct list_head rl_free;
>> + struct list_head rl_list;
>> unsigned int rl_mapped_sges;
>> unsigned int rl_connect_cookie;
>> struct rpcrdma_buffer *rl_buffer;
>>
--
Chuck Lever
> On Jun 9, 2017, at 3:03 PM, Chuck Lever <[email protected]> wrote:
>
>>
>> On Jun 9, 2017, at 2:58 PM, Anna Schumaker <[email protected]> wrote:
>>
>> Hi Chuck,
>>
>> On 06/08/2017 11:52 AM, Chuck Lever wrote:
>>> Clean up: I'm about to use the rl_free field for purposes other than
>>> a free list. So use a more generic name.
>>>
>>> This is a refactoring change only.
>>>
>>> BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=305
>>> Fixes: 68791649a725 ('xprtrdma: Invalidate in the RPC reply ... ')
>>> Signed-off-by: Chuck Lever <[email protected]>
>>> ---
>>> net/sunrpc/xprtrdma/verbs.c | 9 ++++-----
>>> net/sunrpc/xprtrdma/xprt_rdma.h | 2 +-
>>> 2 files changed, 5 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
>>> index a8be66d..df72224 100644
>>> --- a/net/sunrpc/xprtrdma/verbs.c
>>> +++ b/net/sunrpc/xprtrdma/verbs.c
>>> @@ -971,7 +971,6 @@ struct rpcrdma_req *
>>> if (req == NULL)
>>> return ERR_PTR(-ENOMEM);
>>>
>>> - INIT_LIST_HEAD(&req->rl_free);
>>
>> Does rl_list still need to get initialized somewhere?
>
> rl_free/rl_list isn't the anchor of a list. It is only
> ever a member of a list. So this INIT_LIST_HEAD is actually
> unnecessary.
Or, put another way: since list_empty() is not used
on this field anywhere, it shouldn't be necessary to
ensure it is initialized.
>> Thanks,
>> Anna
>>
>>> spin_lock(&buffer->rb_reqslock);
>>> list_add(&req->rl_all, &buffer->rb_allreqs);
>>> spin_unlock(&buffer->rb_reqslock);
>>> @@ -1055,7 +1054,7 @@ struct rpcrdma_rep *
>>> goto out;
>>> }
>>> req->rl_backchannel = false;
>>> - list_add(&req->rl_free, &buf->rb_send_bufs);
>>> + list_add(&req->rl_list, &buf->rb_send_bufs);
>>> }
>>>
>>> INIT_LIST_HEAD(&buf->rb_recv_bufs);
>>> @@ -1084,8 +1083,8 @@ struct rpcrdma_rep *
>>> struct rpcrdma_req *req;
>>>
>>> req = list_first_entry(&buf->rb_send_bufs,
>>> - struct rpcrdma_req, rl_free);
>>> - list_del(&req->rl_free);
>>> + struct rpcrdma_req, rl_list);
>>> + list_del(&req->rl_list);
>>> return req;
>>> }
>>>
>>> @@ -1268,7 +1267,7 @@ struct rpcrdma_req *
>>>
>>> spin_lock(&buffers->rb_lock);
>>> buffers->rb_send_count--;
>>> - list_add_tail(&req->rl_free, &buffers->rb_send_bufs);
>>> + list_add_tail(&req->rl_list, &buffers->rb_send_bufs);
>>> if (rep) {
>>> buffers->rb_recv_count--;
>>> list_add_tail(&rep->rr_list, &buffers->rb_recv_bufs);
>>> diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
>>> index 1c23117..ad918c8 100644
>>> --- a/net/sunrpc/xprtrdma/xprt_rdma.h
>>> +++ b/net/sunrpc/xprtrdma/xprt_rdma.h
>>> @@ -340,7 +340,7 @@ enum {
>>>
>>> struct rpcrdma_buffer;
>>> struct rpcrdma_req {
>>> - struct list_head rl_free;
>>> + struct list_head rl_list;
>>> unsigned int rl_mapped_sges;
>>> unsigned int rl_connect_cookie;
>>> struct rpcrdma_buffer *rl_buffer;
>>>
--
Chuck Lever
On 06/09/2017 03:12 PM, Chuck Lever wrote:
>
>> On Jun 9, 2017, at 3:03 PM, Chuck Lever <[email protected]> wrote:
>>
>>>
>>> On Jun 9, 2017, at 2:58 PM, Anna Schumaker <[email protected]> wrote:
>>>
>>> Hi Chuck,
>>>
>>> On 06/08/2017 11:52 AM, Chuck Lever wrote:
>>>> Clean up: I'm about to use the rl_free field for purposes other than
>>>> a free list. So use a more generic name.
>>>>
>>>> This is a refactoring change only.
>>>>
>>>> BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=305
>>>> Fixes: 68791649a725 ('xprtrdma: Invalidate in the RPC reply ... ')
>>>> Signed-off-by: Chuck Lever <[email protected]>
>>>> ---
>>>> net/sunrpc/xprtrdma/verbs.c | 9 ++++-----
>>>> net/sunrpc/xprtrdma/xprt_rdma.h | 2 +-
>>>> 2 files changed, 5 insertions(+), 6 deletions(-)
>>>>
>>>> diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
>>>> index a8be66d..df72224 100644
>>>> --- a/net/sunrpc/xprtrdma/verbs.c
>>>> +++ b/net/sunrpc/xprtrdma/verbs.c
>>>> @@ -971,7 +971,6 @@ struct rpcrdma_req *
>>>> if (req == NULL)
>>>> return ERR_PTR(-ENOMEM);
>>>>
>>>> - INIT_LIST_HEAD(&req->rl_free);
>>>
>>> Does rl_list still need to get initialized somewhere?
>>
>> rl_free/rl_list isn't the anchor of a list. It is only
>> ever a member of a list. So this INIT_LIST_HEAD is actually
>> unnecessary.
>
> Or, put another way: since list_empty() is not used
> on this field anywhere, it shouldn't be necessary to
> ensure it is initialized.
Thanks for the explanation! I took a second look, and that all sounds right :)
>
>
>>> Thanks,
>>> Anna
>>>
>>>> spin_lock(&buffer->rb_reqslock);
>>>> list_add(&req->rl_all, &buffer->rb_allreqs);
>>>> spin_unlock(&buffer->rb_reqslock);
>>>> @@ -1055,7 +1054,7 @@ struct rpcrdma_rep *
>>>> goto out;
>>>> }
>>>> req->rl_backchannel = false;
>>>> - list_add(&req->rl_free, &buf->rb_send_bufs);
>>>> + list_add(&req->rl_list, &buf->rb_send_bufs);
>>>> }
>>>>
>>>> INIT_LIST_HEAD(&buf->rb_recv_bufs);
>>>> @@ -1084,8 +1083,8 @@ struct rpcrdma_rep *
>>>> struct rpcrdma_req *req;
>>>>
>>>> req = list_first_entry(&buf->rb_send_bufs,
>>>> - struct rpcrdma_req, rl_free);
>>>> - list_del(&req->rl_free);
>>>> + struct rpcrdma_req, rl_list);
>>>> + list_del(&req->rl_list);
>>>> return req;
>>>> }
>>>>
>>>> @@ -1268,7 +1267,7 @@ struct rpcrdma_req *
>>>>
>>>> spin_lock(&buffers->rb_lock);
>>>> buffers->rb_send_count--;
>>>> - list_add_tail(&req->rl_free, &buffers->rb_send_bufs);
>>>> + list_add_tail(&req->rl_list, &buffers->rb_send_bufs);
>>>> if (rep) {
>>>> buffers->rb_recv_count--;
>>>> list_add_tail(&rep->rr_list, &buffers->rb_recv_bufs);
>>>> diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
>>>> index 1c23117..ad918c8 100644
>>>> --- a/net/sunrpc/xprtrdma/xprt_rdma.h
>>>> +++ b/net/sunrpc/xprtrdma/xprt_rdma.h
>>>> @@ -340,7 +340,7 @@ enum {
>>>>
>>>> struct rpcrdma_buffer;
>>>> struct rpcrdma_req {
>>>> - struct list_head rl_free;
>>>> + struct list_head rl_list;
>>>> unsigned int rl_mapped_sges;
>>>> unsigned int rl_connect_cookie;
>>>> struct rpcrdma_buffer *rl_buffer;
>>>>