2018-09-17 18:31:13

by Trond Myklebust

Subject: [PATCH v3 00/44] Convert RPC client transmission to a queued model

For historical reasons, the RPC client heavily serialises the process
of transmitting a request through the XPRT_LOCK. A request is required
to take that lock before it can start XDR encoding, and it must hold it
until it is done transmitting. In essence, the lock protects the
following functions:

- Stream based transport connect/reconnect
- RPCSEC_GSS encoding of the RPC message
- Transmission of a single RPC message

The following patch set assumes that we do not need to do much to
improve performance of the connect/reconnect case, as that is supposed
to be a rare occurrence.

The set deals with the RPCSEC_GSS issues by removing serialisation
while encoding. Instead we simply assume that if, after grabbing the
XPRT_LOCK, we detect that we're about to transmit a message with a
sequence number that has fallen outside the window allowed by RFC 2203,
then we can abort the transmission of that message and schedule it for
re-encoding. Since window sizes are typically expected to lie above 100
messages or so, we expect these cases where we miss the window to be
rare, in general.
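
To illustrate the window test (a minimal sketch only: it mirrors the
gss_seq_is_newer()/gss_xmit_need_reencode() logic added in patch 03 of
this series, and the helper names below are otherwise hypothetical):

static bool seq_is_newer(u32 new, u32 old)
{
	/* Signed comparison, so the test survives u32 wraparound */
	return (s32)(new - old) > 0;
}

/* Re-encode once the sequence number is no longer newer than
 * (newest transmitted sequence number - window size). */
static bool seq_outside_window(u32 seqno, u32 seq_xmit, u32 win)
{
	return !seq_is_newer(seqno, seq_xmit - win);
}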

We avoid the requirement that every request be woken up to grab the
XPRT_LOCK in order to transmit itself, by instead allowing a request
that currently holds the XPRT_LOCK to grab other requests from an
ordered queue and transmit them too. The bulk of the changes in this
patchset are dedicated to providing this functionality.
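
Purely illustrative pseudocode for that model (the queue and helper
names here are placeholders; the real transmit queue is introduced by
the patches below):

/* The task holding XPRT_LOCK drains the ordered transmit queue,
 * sending queued requests on behalf of their owning tasks. */
while ((next = first_queued_request(xprt)) != NULL) {
	transmit_one_request(next);
	dequeue_and_wake_owner(next);
}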

In addition, the XPRT_LOCK queue in earlier versions of this series
provided some extra functionality:
- Throttling of the TCP slot allocation (as Chuck pointed out)
- Fair queueing, to ensure batch jobs don't crowd out interactive ones

This version of the patchset does add functionality to ensure that the
resulting transmission queue is fair, and also fixes up the RPC wait
queues to ensure that they don't compromise fairness.
For now, however, it discards the TCP slot throttling. We may still
want to throttle in the case where the connection is lost, but if we
do so, we should ensure that we do not serialise all requests while in
the connected state.


The last few patches also take a new look at the client receive code,
now that we have the iterator method for reading socket data into page
buffers. They convert the TCP and the UNIX stream code to use the
iterator method and perform some cleanups.

---
v2: - Address feedback by Chuck.
- Handle UDP/RDMA credits correctly
- Remove throttling of TCP slot allocations
- Minor nits
- Clean up the write_space handling
- Fair queueing
v3: - Performance improvements, bugfixes and cleanups
- Socket stream receive queue improvements

Trond Myklebust (44):
SUNRPC: Clean up initialisation of the struct rpc_rqst
SUNRPC: If there is no reply expected, bail early from call_decode
SUNRPC: The transmitted message must lie in the RPCSEC window of validity
SUNRPC: Simplify identification of when the message send/receive is complete
SUNRPC: Avoid holding locks across the XDR encoding of the RPC message
SUNRPC: Rename TCP receive-specific state variables
SUNRPC: Move reset of TCP state variables into the reconnect code
SUNRPC: Add socket transmit queue offset tracking
SUNRPC: Simplify dealing with aborted partially transmitted messages
SUNRPC: Refactor the transport request pinning
SUNRPC: Add a helper to wake up a sleeping rpc_task and set its status
SUNRPC: Test whether the task is queued before grabbing the queue spinlocks
SUNRPC: Don't wake queued RPC calls multiple times in xprt_transmit
SUNRPC: Rename xprt->recv_lock to xprt->queue_lock
SUNRPC: Refactor xprt_transmit() to remove the reply queue code
SUNRPC: Refactor xprt_transmit() to remove wait for reply code
SUNRPC: Minor cleanup for call_transmit()
SUNRPC: Distinguish between the slot allocation list and receive queue
SUNRPC: Add a transmission queue for RPC requests
SUNRPC: Refactor RPC call encoding
SUNRPC: Fix up the back channel transmit
SUNRPC: Treat the task and request as separate in the xprt_ops->send_request()
SUNRPC: Don't reset the request 'bytes_sent' counter when releasing XPRT_LOCK
SUNRPC: Simplify xprt_prepare_transmit()
SUNRPC: Move RPC retransmission stat counter to xprt_transmit()
SUNRPC: Improve latency for interactive tasks
SUNRPC: Support for congestion control when queuing is enabled
SUNRPC: Enqueue swapper tagged RPCs at the head of the transmit queue
SUNRPC: Allow calls to xprt_transmit() to drain the entire transmit queue
SUNRPC: Allow soft RPC calls to time out when waiting for the XPRT_LOCK
SUNRPC: Turn off throttling of RPC slots for TCP sockets
SUNRPC: Clean up transport write space handling
SUNRPC: Cleanup: remove the unused 'task' argument from the request_send()
SUNRPC: Don't take transport->lock unnecessarily when taking XPRT_LOCK
SUNRPC: Convert xprt receive queue to use an rbtree
SUNRPC: Fix priority queue fairness
SUNRPC: Convert the xprt->sending queue back to an ordinary wait queue
SUNRPC: Add a label for RPC calls that require allocation on receive
SUNRPC: Add a bvec array to struct xdr_buf for use with iovec_iter()
SUNRPC: Simplify TCP receive code by switching to using iterators
SUNRPC: Clean up - rename xs_tcp_data_receive() to xs_stream_data_receive()
SUNRPC: Allow AF_LOCAL sockets to use the generic stream receive
SUNRPC: Clean up xs_udp_data_receive()
SUNRPC: Unexport xdr_partial_copy_from_skb()

fs/nfs/nfs3xdr.c | 4 +-
include/linux/sunrpc/auth.h | 2 +
include/linux/sunrpc/auth_gss.h | 1 +
include/linux/sunrpc/bc_xprt.h | 1 +
include/linux/sunrpc/sched.h | 10 +-
include/linux/sunrpc/svc_xprt.h | 1 -
include/linux/sunrpc/xdr.h | 11 +-
include/linux/sunrpc/xprt.h | 35 +-
include/linux/sunrpc/xprtsock.h | 36 +-
include/trace/events/sunrpc.h | 37 +-
net/sunrpc/auth.c | 10 +
net/sunrpc/auth_gss/auth_gss.c | 41 +
net/sunrpc/auth_gss/gss_rpc_xdr.c | 1 +
net/sunrpc/backchannel_rqst.c | 1 -
net/sunrpc/clnt.c | 174 ++--
net/sunrpc/sched.c | 178 ++--
net/sunrpc/socklib.c | 10 +-
net/sunrpc/svc_xprt.c | 2 -
net/sunrpc/svcsock.c | 6 +-
net/sunrpc/xdr.c | 34 +
net/sunrpc/xprt.c | 893 ++++++++++++-----
net/sunrpc/xprtrdma/backchannel.c | 4 +-
net/sunrpc/xprtrdma/rpc_rdma.c | 12 +-
net/sunrpc/xprtrdma/svc_rdma_backchannel.c | 14 +-
net/sunrpc/xprtrdma/transport.c | 10 +-
net/sunrpc/xprtsock.c | 1060 +++++++++-----------
26 files changed, 1474 insertions(+), 1114 deletions(-)

--
2.17.1


2018-09-17 18:31:23

by Trond Myklebust

Subject: [PATCH v3 09/44] SUNRPC: Simplify dealing with aborted partially transmitted messages

If the previous message was only partially transmitted, we need to close
the socket in order to avoid corruption of the message stream. To do so,
we currently hijack the unlocking of the socket in order to schedule
the close.
Now that we track the transmit offset in the socket state, we can move
that kind of checking out of the socket lock code, which is needed to
allow messages to remain queued after dropping the socket lock. An
aborted message is detected by the socket-level transmit offset being
non-zero while the current request has not yet sent any bytes.

Signed-off-by: Trond Myklebust <[email protected]>
---
net/sunrpc/xprtsock.c | 51 +++++++++++++++++++++----------------------
1 file changed, 25 insertions(+), 26 deletions(-)

diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index 629cc45e1e6c..3fbccebd0b10 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -491,6 +491,16 @@ static int xs_nospace(struct rpc_task *task)
return ret;
}

+/*
+ * Determine if the previous message in the stream was aborted before it
+ * could complete transmission.
+ */
+static bool
+xs_send_request_was_aborted(struct sock_xprt *transport, struct rpc_rqst *req)
+{
+ return transport->xmit.offset != 0 && req->rq_bytes_sent == 0;
+}
+
/*
* Construct a stream transport record marker in @buf.
*/
@@ -522,6 +532,12 @@ static int xs_local_send_request(struct rpc_task *task)
int status;
int sent = 0;

+ /* Close the stream if the previous transmission was incomplete */
+ if (xs_send_request_was_aborted(transport, req)) {
+ xs_close(xprt);
+ return -ENOTCONN;
+ }
+
xs_encode_stream_record_marker(&req->rq_snd_buf);

xs_pktdump("packet data:",
@@ -665,6 +681,13 @@ static int xs_tcp_send_request(struct rpc_task *task)
int status;
int sent;

+ /* Close the stream if the previous transmission was incomplete */
+ if (xs_send_request_was_aborted(transport, req)) {
+ if (transport->sock != NULL)
+ kernel_sock_shutdown(transport->sock, SHUT_RDWR);
+ return -ENOTCONN;
+ }
+
xs_encode_stream_record_marker(&req->rq_snd_buf);

xs_pktdump("packet data:",
@@ -755,30 +778,6 @@ static int xs_tcp_send_request(struct rpc_task *task)
return status;
}

-/**
- * xs_tcp_release_xprt - clean up after a tcp transmission
- * @xprt: transport
- * @task: rpc task
- *
- * This cleans up if an error causes us to abort the transmission of a request.
- * In this case, the socket may need to be reset in order to avoid confusing
- * the server.
- */
-static void xs_tcp_release_xprt(struct rpc_xprt *xprt, struct rpc_task *task)
-{
- struct sock_xprt *transport = container_of(xprt, struct sock_xprt, xprt);
-
- if (task != xprt->snd_task)
- return;
- if (task == NULL)
- goto out_release;
- if (transport->xmit.offset == 0 || !xprt_connected(xprt))
- goto out_release;
- set_bit(XPRT_CLOSE_WAIT, &xprt->state);
-out_release:
- xprt_release_xprt(xprt, task);
-}
-
static void xs_save_old_callbacks(struct sock_xprt *transport, struct sock *sk)
{
transport->old_data_ready = sk->sk_data_ready;
@@ -2764,7 +2763,7 @@ static void bc_destroy(struct rpc_xprt *xprt)

static const struct rpc_xprt_ops xs_local_ops = {
.reserve_xprt = xprt_reserve_xprt,
- .release_xprt = xs_tcp_release_xprt,
+ .release_xprt = xprt_release_xprt,
.alloc_slot = xprt_alloc_slot,
.free_slot = xprt_free_slot,
.rpcbind = xs_local_rpcbind,
@@ -2806,7 +2805,7 @@ static const struct rpc_xprt_ops xs_udp_ops = {

static const struct rpc_xprt_ops xs_tcp_ops = {
.reserve_xprt = xprt_reserve_xprt,
- .release_xprt = xs_tcp_release_xprt,
+ .release_xprt = xprt_release_xprt,
.alloc_slot = xprt_lock_and_alloc_slot,
.free_slot = xprt_free_slot,
.rpcbind = rpcb_getport_async,
--
2.17.1

2018-09-17 18:31:17

by Trond Myklebust

Subject: [PATCH v3 03/44] SUNRPC: The transmitted message must lie in the RPCSEC window of validity

If a message has been encoded using RPCSEC_GSS, the server is
maintaining a window of sequence numbers that it considers valid.
The client should normally be tracking that window, and needs to
verify that the sequence number used by the message being transmitted
still lies inside the window of validity.

So far, we've been able to assume this condition would be realised
automatically, since the client has been encoding the message only
after taking the socket lock. Once we change that condition, we
will need the explicit check.
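
As a concrete example (the numbers are invented for illustration):
with a window of gc_win = 128 and a newest transmitted sequence number
of gc_seq_xmit = 1000, a request carrying rq_seqno = 870 is no longer
newer than 1000 - 128 = 872, so gss_xmit_need_reencode() below returns
true and the message is scheduled for re-encoding, whereas rq_seqno =
900 still lies inside the window and may be transmitted as-is.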

Signed-off-by: Trond Myklebust <[email protected]>
---
include/linux/sunrpc/auth.h | 2 ++
include/linux/sunrpc/auth_gss.h | 1 +
net/sunrpc/auth.c | 10 ++++++++
net/sunrpc/auth_gss/auth_gss.c | 41 +++++++++++++++++++++++++++++++++
net/sunrpc/clnt.c | 3 +++
net/sunrpc/xprt.c | 7 ++++++
6 files changed, 64 insertions(+)

diff --git a/include/linux/sunrpc/auth.h b/include/linux/sunrpc/auth.h
index 58a6765c1c5e..2c97a3933ef9 100644
--- a/include/linux/sunrpc/auth.h
+++ b/include/linux/sunrpc/auth.h
@@ -157,6 +157,7 @@ struct rpc_credops {
int (*crkey_timeout)(struct rpc_cred *);
bool (*crkey_to_expire)(struct rpc_cred *);
char * (*crstringify_acceptor)(struct rpc_cred *);
+ bool (*crneed_reencode)(struct rpc_task *);
};

extern const struct rpc_authops authunix_ops;
@@ -192,6 +193,7 @@ __be32 * rpcauth_marshcred(struct rpc_task *, __be32 *);
__be32 * rpcauth_checkverf(struct rpc_task *, __be32 *);
int rpcauth_wrap_req(struct rpc_task *task, kxdreproc_t encode, void *rqstp, __be32 *data, void *obj);
int rpcauth_unwrap_resp(struct rpc_task *task, kxdrdproc_t decode, void *rqstp, __be32 *data, void *obj);
+bool rpcauth_xmit_need_reencode(struct rpc_task *task);
int rpcauth_refreshcred(struct rpc_task *);
void rpcauth_invalcred(struct rpc_task *);
int rpcauth_uptodatecred(struct rpc_task *);
diff --git a/include/linux/sunrpc/auth_gss.h b/include/linux/sunrpc/auth_gss.h
index 0c9eac351aab..30427b729070 100644
--- a/include/linux/sunrpc/auth_gss.h
+++ b/include/linux/sunrpc/auth_gss.h
@@ -70,6 +70,7 @@ struct gss_cl_ctx {
refcount_t count;
enum rpc_gss_proc gc_proc;
u32 gc_seq;
+ u32 gc_seq_xmit;
spinlock_t gc_seq_lock;
struct gss_ctx *gc_gss_ctx;
struct xdr_netobj gc_wire_ctx;
diff --git a/net/sunrpc/auth.c b/net/sunrpc/auth.c
index 305ecea92170..59df5cdba0ac 100644
--- a/net/sunrpc/auth.c
+++ b/net/sunrpc/auth.c
@@ -817,6 +817,16 @@ rpcauth_unwrap_resp(struct rpc_task *task, kxdrdproc_t decode, void *rqstp,
return rpcauth_unwrap_req_decode(decode, rqstp, data, obj);
}

+bool
+rpcauth_xmit_need_reencode(struct rpc_task *task)
+{
+ struct rpc_cred *cred = task->tk_rqstp->rq_cred;
+
+ if (!cred || !cred->cr_ops->crneed_reencode)
+ return false;
+ return cred->cr_ops->crneed_reencode(task);
+}
+
int
rpcauth_refreshcred(struct rpc_task *task)
{
diff --git a/net/sunrpc/auth_gss/auth_gss.c b/net/sunrpc/auth_gss/auth_gss.c
index 21c0aa0a0d1d..c898a7c75e84 100644
--- a/net/sunrpc/auth_gss/auth_gss.c
+++ b/net/sunrpc/auth_gss/auth_gss.c
@@ -1984,6 +1984,46 @@ gss_unwrap_req_decode(kxdrdproc_t decode, struct rpc_rqst *rqstp,
return decode(rqstp, &xdr, obj);
}

+static bool
+gss_seq_is_newer(u32 new, u32 old)
+{
+ return (s32)(new - old) > 0;
+}
+
+static bool
+gss_xmit_need_reencode(struct rpc_task *task)
+{
+ struct rpc_rqst *req = task->tk_rqstp;
+ struct rpc_cred *cred = req->rq_cred;
+ struct gss_cl_ctx *ctx = gss_cred_get_ctx(cred);
+ u32 win, seq_xmit;
+ bool ret = true;
+
+ if (!ctx)
+ return true;
+
+ if (gss_seq_is_newer(req->rq_seqno, READ_ONCE(ctx->gc_seq)))
+ goto out;
+
+ seq_xmit = READ_ONCE(ctx->gc_seq_xmit);
+ while (gss_seq_is_newer(req->rq_seqno, seq_xmit)) {
+ u32 tmp = seq_xmit;
+
+ seq_xmit = cmpxchg(&ctx->gc_seq_xmit, tmp, req->rq_seqno);
+ if (seq_xmit == tmp) {
+ ret = false;
+ goto out;
+ }
+ }
+
+ win = ctx->gc_win;
+ if (win > 0)
+ ret = !gss_seq_is_newer(req->rq_seqno, seq_xmit - win);
+out:
+ gss_put_ctx(ctx);
+ return ret;
+}
+
static int
gss_unwrap_resp(struct rpc_task *task,
kxdrdproc_t decode, void *rqstp, __be32 *p, void *obj)
@@ -2052,6 +2092,7 @@ static const struct rpc_credops gss_credops = {
.crunwrap_resp = gss_unwrap_resp,
.crkey_timeout = gss_key_timeout,
.crstringify_acceptor = gss_stringify_acceptor,
+ .crneed_reencode = gss_xmit_need_reencode,
};

static const struct rpc_credops gss_nullops = {
diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index 4f1ec8013332..d41b5ac1d4e8 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -2184,6 +2184,9 @@ call_status(struct rpc_task *task)
/* shutdown or soft timeout */
rpc_exit(task, status);
break;
+ case -EBADMSG:
+ task->tk_action = call_transmit;
+ break;
default:
if (clnt->cl_chatty)
printk("%s: RPC call returned error %d\n",
diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index 6aa09edc9567..3973e10ea2bd 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -1014,6 +1014,13 @@ void xprt_transmit(struct rpc_task *task)
dprintk("RPC: %5u xprt_transmit(%u)\n", task->tk_pid, req->rq_slen);

if (!req->rq_reply_bytes_recvd) {
+
+ /* Verify that our message lies in the RPCSEC_GSS window */
+ if (!req->rq_bytes_sent && rpcauth_xmit_need_reencode(task)) {
+ task->tk_status = -EBADMSG;
+ return;
+ }
+
if (list_empty(&req->rq_list) && rpc_reply_expected(task)) {
/*
* Add to the list only if we're expecting a reply
--
2.17.1

2018-09-17 18:31:18

by Trond Myklebust

Subject: [PATCH v3 05/44] SUNRPC: Avoid holding locks across the XDR encoding of the RPC message

Currently, we grab the socket bit lock before we allow the message
to be XDR encoded. That significantly slows down the transmission
rate, since we serialise on a potentially blocking operation.
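
In sketch form, the hunk below reorders call_transmit() from

	xprt_prepare_transmit();	/* takes the socket bit lock */
	rpc_xdr_encode();		/* potentially blocking */
	xprt_transmit();

to

	rpc_xdr_encode();		/* now runs before we take the lock */
	xprt_prepare_transmit();
	xprt_transmit();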

Signed-off-by: Trond Myklebust <[email protected]>
---
net/sunrpc/clnt.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index e5ac35e803ad..a858366cd15d 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -1949,9 +1949,6 @@ call_transmit(struct rpc_task *task)
task->tk_action = call_status;
if (task->tk_status < 0)
return;
- if (!xprt_prepare_transmit(task))
- return;
- task->tk_action = call_transmit_status;
/* Encode here so that rpcsec_gss can use correct sequence number. */
if (rpc_task_need_encode(task)) {
rpc_xdr_encode(task);
@@ -1965,6 +1962,9 @@ call_transmit(struct rpc_task *task)
return;
}
}
+ if (!xprt_prepare_transmit(task))
+ return;
+ task->tk_action = call_transmit_status;
xprt_transmit(task);
if (task->tk_status < 0)
return;
--
2.17.1

2018-09-17 18:31:17

by Trond Myklebust

Subject: [PATCH v3 04/44] SUNRPC: Simplify identification of when the message send/receive is complete

Add states to indicate that the message send and receive are not yet
complete.
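
For reference, the lifecycle of the two new flags, as assembled from
the hunks below (a summary, not additional code):

	rpc_xdr_encode()     sets   RPC_TASK_NEED_XMIT  (message encoded)
	xprt_transmit()      clears RPC_TASK_NEED_XMIT  (message on the wire)
	xprt_transmit()      sets   RPC_TASK_NEED_RECV  (queued for a reply)
	xprt_complete_rqst() clears RPC_TASK_NEED_RECV  (reply has arrived)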

Signed-off-by: Trond Myklebust <[email protected]>
---
include/linux/sunrpc/sched.h | 6 ++++--
net/sunrpc/clnt.c | 19 +++++++------------
net/sunrpc/xprt.c | 17 ++++++++++++++---
3 files changed, 25 insertions(+), 17 deletions(-)

diff --git a/include/linux/sunrpc/sched.h b/include/linux/sunrpc/sched.h
index 592653becd91..9e655df70131 100644
--- a/include/linux/sunrpc/sched.h
+++ b/include/linux/sunrpc/sched.h
@@ -140,8 +140,10 @@ struct rpc_task_setup {
#define RPC_TASK_RUNNING 0
#define RPC_TASK_QUEUED 1
#define RPC_TASK_ACTIVE 2
-#define RPC_TASK_MSG_RECV 3
-#define RPC_TASK_MSG_RECV_WAIT 4
+#define RPC_TASK_NEED_XMIT 3
+#define RPC_TASK_NEED_RECV 4
+#define RPC_TASK_MSG_RECV 5
+#define RPC_TASK_MSG_RECV_WAIT 6

#define RPC_IS_RUNNING(t) test_bit(RPC_TASK_RUNNING, &(t)->tk_runstate)
#define rpc_set_running(t) set_bit(RPC_TASK_RUNNING, &(t)->tk_runstate)
diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index d41b5ac1d4e8..e5ac35e803ad 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -1156,6 +1156,7 @@ struct rpc_task *rpc_run_bc_task(struct rpc_rqst *req)
*/
xbufp->len = xbufp->head[0].iov_len + xbufp->page_len +
xbufp->tail[0].iov_len;
+ set_bit(RPC_TASK_NEED_XMIT, &task->tk_runstate);

task->tk_action = call_bc_transmit;
atomic_inc(&task->tk_count);
@@ -1720,17 +1721,10 @@ call_allocate(struct rpc_task *task)
rpc_exit(task, -ERESTARTSYS);
}

-static inline int
+static int
rpc_task_need_encode(struct rpc_task *task)
{
- return task->tk_rqstp->rq_snd_buf.len == 0;
-}
-
-static inline void
-rpc_task_force_reencode(struct rpc_task *task)
-{
- task->tk_rqstp->rq_snd_buf.len = 0;
- task->tk_rqstp->rq_bytes_sent = 0;
+ return test_bit(RPC_TASK_NEED_XMIT, &task->tk_runstate) == 0;
}

/*
@@ -1765,6 +1759,8 @@ rpc_xdr_encode(struct rpc_task *task)

task->tk_status = rpcauth_wrap_req(task, encode, req, p,
task->tk_msg.rpc_argp);
+ if (task->tk_status == 0)
+ set_bit(RPC_TASK_NEED_XMIT, &task->tk_runstate);
}

/*
@@ -1999,7 +1995,6 @@ call_transmit_status(struct rpc_task *task)
*/
if (task->tk_status == 0) {
xprt_end_transmit(task);
- rpc_task_force_reencode(task);
return;
}

@@ -2010,7 +2005,6 @@ call_transmit_status(struct rpc_task *task)
default:
dprint_status(task);
xprt_end_transmit(task);
- rpc_task_force_reencode(task);
break;
/*
* Special cases: if we've been waiting on the
@@ -2038,7 +2032,7 @@ call_transmit_status(struct rpc_task *task)
case -EADDRINUSE:
case -ENOTCONN:
case -EPIPE:
- rpc_task_force_reencode(task);
+ break;
}
}

@@ -2185,6 +2179,7 @@ call_status(struct rpc_task *task)
rpc_exit(task, status);
break;
case -EBADMSG:
+ clear_bit(RPC_TASK_NEED_XMIT, &task->tk_runstate);
task->tk_action = call_transmit;
break;
default:
diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index 3973e10ea2bd..45d580cd93ac 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -936,10 +936,18 @@ void xprt_complete_rqst(struct rpc_task *task, int copied)
/* req->rq_reply_bytes_recvd */
smp_wmb();
req->rq_reply_bytes_recvd = copied;
+ clear_bit(RPC_TASK_NEED_RECV, &task->tk_runstate);
rpc_wake_up_queued_task(&xprt->pending, task);
}
EXPORT_SYMBOL_GPL(xprt_complete_rqst);

+static bool
+xprt_request_data_received(struct rpc_task *task)
+{
+ return !test_bit(RPC_TASK_NEED_RECV, &task->tk_runstate) &&
+ task->tk_rqstp->rq_reply_bytes_recvd != 0;
+}
+
static void xprt_timer(struct rpc_task *task)
{
struct rpc_rqst *req = task->tk_rqstp;
@@ -1031,12 +1039,13 @@ void xprt_transmit(struct rpc_task *task)
/* Add request to the receive list */
spin_lock(&xprt->recv_lock);
list_add_tail(&req->rq_list, &xprt->recv);
+ set_bit(RPC_TASK_NEED_RECV, &task->tk_runstate);
spin_unlock(&xprt->recv_lock);
xprt_reset_majortimeo(req);
/* Turn off autodisconnect */
del_singleshot_timer_sync(&xprt->timer);
}
- } else if (!req->rq_bytes_sent)
+ } else if (xprt_request_data_received(task) && !req->rq_bytes_sent)
return;

connect_cookie = xprt->connect_cookie;
@@ -1046,9 +1055,11 @@ void xprt_transmit(struct rpc_task *task)
task->tk_status = status;
return;
}
+
xprt_inject_disconnect(xprt);

dprintk("RPC: %5u xmit complete\n", task->tk_pid);
+ clear_bit(RPC_TASK_NEED_XMIT, &task->tk_runstate);
task->tk_flags |= RPC_TASK_SENT;
spin_lock_bh(&xprt->transport_lock);

@@ -1062,14 +1073,14 @@ void xprt_transmit(struct rpc_task *task)
spin_unlock_bh(&xprt->transport_lock);

req->rq_connect_cookie = connect_cookie;
- if (rpc_reply_expected(task) && !READ_ONCE(req->rq_reply_bytes_recvd)) {
+ if (test_bit(RPC_TASK_NEED_RECV, &task->tk_runstate)) {
/*
* Sleep on the pending queue if we're expecting a reply.
* The spinlock ensures atomicity between the test of
* req->rq_reply_bytes_recvd, and the call to rpc_sleep_on().
*/
spin_lock(&xprt->recv_lock);
- if (!req->rq_reply_bytes_recvd) {
+ if (test_bit(RPC_TASK_NEED_RECV, &task->tk_runstate)) {
rpc_sleep_on(&xprt->pending, task, xprt_timer);
/*
* Send an extra queue wakeup call if the
--
2.17.1

2018-09-17 18:31:33

by Trond Myklebust

Subject: [PATCH v3 16/44] SUNRPC: Refactor xprt_transmit() to remove wait for reply code

Allow the caller in clnt.c to wait for a reply by calling into the new
xprt_request_wait_receive() after xprt_transmit() has returned. Again,
the reason is that the backchannel code does not need this
functionality.
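
In sketch form, the resulting call flow on the forward channel
(summarising the hunks below):

	call_transmit(task)
		xprt_transmit(task)		/* send only; no longer sleeps */
	call_transmit_status(task)
		xprt_end_transmit(task)
		xprt_request_wait_receive(task)	/* sleep on xprt->pending */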

Signed-off-by: Trond Myklebust <[email protected]>
---
include/linux/sunrpc/xprt.h | 1 +
net/sunrpc/clnt.c | 10 +----
net/sunrpc/xprt.c | 74 ++++++++++++++++++++++++++-----------
3 files changed, 54 insertions(+), 31 deletions(-)

diff --git a/include/linux/sunrpc/xprt.h b/include/linux/sunrpc/xprt.h
index 0250294c904a..4fa2af087cff 100644
--- a/include/linux/sunrpc/xprt.h
+++ b/include/linux/sunrpc/xprt.h
@@ -335,6 +335,7 @@ void xprt_free_slot(struct rpc_xprt *xprt,
void xprt_lock_and_alloc_slot(struct rpc_xprt *xprt, struct rpc_task *task);
bool xprt_prepare_transmit(struct rpc_task *task);
void xprt_request_enqueue_receive(struct rpc_task *task);
+void xprt_request_wait_receive(struct rpc_task *task);
void xprt_transmit(struct rpc_task *task);
void xprt_end_transmit(struct rpc_task *task);
int xprt_adjust_timeout(struct rpc_rqst *req);
diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index 414966273a3f..775d6e80b6e8 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -1975,15 +1975,6 @@ call_transmit(struct rpc_task *task)
return;
if (is_retrans)
task->tk_client->cl_stats->rpcretrans++;
- /*
- * On success, ensure that we call xprt_end_transmit() before sleeping
- * in order to allow access to the socket to other RPC requests.
- */
- call_transmit_status(task);
- if (rpc_reply_expected(task))
- return;
- task->tk_action = rpc_exit_task;
- rpc_wake_up_queued_task(&task->tk_rqstp->rq_xprt->pending, task);
}

/*
@@ -2000,6 +1991,7 @@ call_transmit_status(struct rpc_task *task)
*/
if (task->tk_status == 0) {
xprt_end_transmit(task);
+ xprt_request_wait_receive(task);
return;
}

diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index d8f870b5dd46..fe857ab18ee2 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -654,6 +654,22 @@ void xprt_force_disconnect(struct rpc_xprt *xprt)
}
EXPORT_SYMBOL_GPL(xprt_force_disconnect);

+static unsigned int
+xprt_connect_cookie(struct rpc_xprt *xprt)
+{
+ return READ_ONCE(xprt->connect_cookie);
+}
+
+static bool
+xprt_request_retransmit_after_disconnect(struct rpc_task *task)
+{
+ struct rpc_rqst *req = task->tk_rqstp;
+ struct rpc_xprt *xprt = req->rq_xprt;
+
+ return req->rq_connect_cookie != xprt_connect_cookie(xprt) ||
+ !xprt_connected(xprt);
+}
+
/**
* xprt_conditional_disconnect - force a transport to disconnect
* @xprt: transport to disconnect
@@ -1008,6 +1024,39 @@ static void xprt_timer(struct rpc_task *task)
task->tk_status = 0;
}

+/**
+ * xprt_request_wait_receive - wait for the reply to an RPC request
+ * @task: RPC task about to send a request
+ *
+ */
+void xprt_request_wait_receive(struct rpc_task *task)
+{
+ struct rpc_rqst *req = task->tk_rqstp;
+ struct rpc_xprt *xprt = req->rq_xprt;
+
+ if (!test_bit(RPC_TASK_NEED_RECV, &task->tk_runstate))
+ return;
+ /*
+ * Sleep on the pending queue if we're expecting a reply.
+ * The spinlock ensures atomicity between the test of
+ * req->rq_reply_bytes_recvd, and the call to rpc_sleep_on().
+ */
+ spin_lock(&xprt->queue_lock);
+ if (test_bit(RPC_TASK_NEED_RECV, &task->tk_runstate)) {
+ xprt->ops->set_retrans_timeout(task);
+ rpc_sleep_on(&xprt->pending, task, xprt_timer);
+ /*
+ * Send an extra queue wakeup call if the
+ * connection was dropped in case the call to
+ * rpc_sleep_on() raced.
+ */
+ if (xprt_request_retransmit_after_disconnect(task))
+ rpc_wake_up_queued_task_set_status(&xprt->pending,
+ task, -ENOTCONN);
+ }
+ spin_unlock(&xprt->queue_lock);
+}
+
/**
* xprt_prepare_transmit - reserve the transport before sending a request
* @task: RPC task about to send a request
@@ -1027,9 +1076,8 @@ bool xprt_prepare_transmit(struct rpc_task *task)
task->tk_status = req->rq_reply_bytes_recvd;
goto out_unlock;
}
- if ((task->tk_flags & RPC_TASK_NO_RETRANS_TIMEOUT)
- && xprt_connected(xprt)
- && req->rq_connect_cookie == xprt->connect_cookie) {
+ if ((task->tk_flags & RPC_TASK_NO_RETRANS_TIMEOUT) &&
+ !xprt_request_retransmit_after_disconnect(task)) {
xprt->ops->set_retrans_timeout(task);
rpc_sleep_on(&xprt->pending, task, xprt_timer);
goto out_unlock;
@@ -1090,8 +1138,6 @@ void xprt_transmit(struct rpc_task *task)
task->tk_flags |= RPC_TASK_SENT;
spin_lock_bh(&xprt->transport_lock);

- xprt->ops->set_retrans_timeout(task);
-
xprt->stat.sends++;
xprt->stat.req_u += xprt->stat.sends - xprt->stat.recvs;
xprt->stat.bklog_u += xprt->backlog.qlen;
@@ -1100,22 +1146,6 @@ void xprt_transmit(struct rpc_task *task)
spin_unlock_bh(&xprt->transport_lock);

req->rq_connect_cookie = connect_cookie;
- if (test_bit(RPC_TASK_NEED_RECV, &task->tk_runstate)) {
- /*
- * Sleep on the pending queue if we're expecting a reply.
- * The spinlock ensures atomicity between the test of
- * req->rq_reply_bytes_recvd, and the call to rpc_sleep_on().
- */
- spin_lock(&xprt->queue_lock);
- if (test_bit(RPC_TASK_NEED_RECV, &task->tk_runstate)) {
- rpc_sleep_on(&xprt->pending, task, xprt_timer);
- /* Wake up immediately if the connection was dropped */
- if (!xprt_connected(xprt))
- rpc_wake_up_queued_task_set_status(&xprt->pending,
- task, -ENOTCONN);
- }
- spin_unlock(&xprt->queue_lock);
- }
}

static void xprt_add_backlog(struct rpc_xprt *xprt, struct rpc_task *task)
@@ -1320,7 +1350,7 @@ xprt_request_init(struct rpc_task *task)
req->rq_xprt = xprt;
req->rq_buffer = NULL;
req->rq_xid = xprt_alloc_xid(xprt);
- req->rq_connect_cookie = xprt->connect_cookie - 1;
+ req->rq_connect_cookie = xprt_connect_cookie(xprt) - 1;
req->rq_bytes_sent = 0;
req->rq_snd_buf.len = 0;
req->rq_snd_buf.buflen = 0;
--
2.17.1

2018-09-17 18:31:21

by Trond Myklebust

Subject: [PATCH v3 06/44] SUNRPC: Rename TCP receive-specific state variables

Since we will want to introduce similar TCP state variables for the
transmission of requests, let's rename the existing ones to make it
clear that they belong to the receive side.

Signed-off-by: Trond Myklebust <[email protected]>
---
include/linux/sunrpc/xprtsock.h | 16 +--
include/trace/events/sunrpc.h | 10 +-
net/sunrpc/xprtsock.c | 178 ++++++++++++++++----------------
3 files changed, 103 insertions(+), 101 deletions(-)

diff --git a/include/linux/sunrpc/xprtsock.h b/include/linux/sunrpc/xprtsock.h
index ae0f99b9b965..90d5ca8e65f4 100644
--- a/include/linux/sunrpc/xprtsock.h
+++ b/include/linux/sunrpc/xprtsock.h
@@ -30,15 +30,17 @@ struct sock_xprt {
/*
* State of TCP reply receive
*/
- __be32 tcp_fraghdr,
- tcp_xid,
- tcp_calldir;
+ struct {
+ __be32 fraghdr,
+ xid,
+ calldir;

- u32 tcp_offset,
- tcp_reclen;
+ u32 offset,
+ len;

- unsigned long tcp_copied,
- tcp_flags;
+ unsigned long copied,
+ flags;
+ } recv;

/*
* Connection of transports
diff --git a/include/trace/events/sunrpc.h b/include/trace/events/sunrpc.h
index bbb08a3ef5cc..0aa347194e0f 100644
--- a/include/trace/events/sunrpc.h
+++ b/include/trace/events/sunrpc.h
@@ -525,11 +525,11 @@ TRACE_EVENT(xs_tcp_data_recv,
TP_fast_assign(
__assign_str(addr, xs->xprt.address_strings[RPC_DISPLAY_ADDR]);
__assign_str(port, xs->xprt.address_strings[RPC_DISPLAY_PORT]);
- __entry->xid = be32_to_cpu(xs->tcp_xid);
- __entry->flags = xs->tcp_flags;
- __entry->copied = xs->tcp_copied;
- __entry->reclen = xs->tcp_reclen;
- __entry->offset = xs->tcp_offset;
+ __entry->xid = be32_to_cpu(xs->recv.xid);
+ __entry->flags = xs->recv.flags;
+ __entry->copied = xs->recv.copied;
+ __entry->reclen = xs->recv.len;
+ __entry->offset = xs->recv.offset;
),

TP_printk("peer=[%s]:%s xid=0x%08x flags=%s copied=%lu reclen=%u offset=%lu",
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index 6b7539c0466e..cd7d093721ae 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -1169,42 +1169,42 @@ static inline void xs_tcp_read_fraghdr(struct rpc_xprt *xprt, struct xdr_skb_rea
size_t len, used;
char *p;

- p = ((char *) &transport->tcp_fraghdr) + transport->tcp_offset;
- len = sizeof(transport->tcp_fraghdr) - transport->tcp_offset;
+ p = ((char *) &transport->recv.fraghdr) + transport->recv.offset;
+ len = sizeof(transport->recv.fraghdr) - transport->recv.offset;
used = xdr_skb_read_bits(desc, p, len);
- transport->tcp_offset += used;
+ transport->recv.offset += used;
if (used != len)
return;

- transport->tcp_reclen = ntohl(transport->tcp_fraghdr);
- if (transport->tcp_reclen & RPC_LAST_STREAM_FRAGMENT)
- transport->tcp_flags |= TCP_RCV_LAST_FRAG;
+ transport->recv.len = ntohl(transport->recv.fraghdr);
+ if (transport->recv.len & RPC_LAST_STREAM_FRAGMENT)
+ transport->recv.flags |= TCP_RCV_LAST_FRAG;
else
- transport->tcp_flags &= ~TCP_RCV_LAST_FRAG;
- transport->tcp_reclen &= RPC_FRAGMENT_SIZE_MASK;
+ transport->recv.flags &= ~TCP_RCV_LAST_FRAG;
+ transport->recv.len &= RPC_FRAGMENT_SIZE_MASK;

- transport->tcp_flags &= ~TCP_RCV_COPY_FRAGHDR;
- transport->tcp_offset = 0;
+ transport->recv.flags &= ~TCP_RCV_COPY_FRAGHDR;
+ transport->recv.offset = 0;

/* Sanity check of the record length */
- if (unlikely(transport->tcp_reclen < 8)) {
+ if (unlikely(transport->recv.len < 8)) {
dprintk("RPC: invalid TCP record fragment length\n");
xs_tcp_force_close(xprt);
return;
}
dprintk("RPC: reading TCP record fragment of length %d\n",
- transport->tcp_reclen);
+ transport->recv.len);
}

static void xs_tcp_check_fraghdr(struct sock_xprt *transport)
{
- if (transport->tcp_offset == transport->tcp_reclen) {
- transport->tcp_flags |= TCP_RCV_COPY_FRAGHDR;
- transport->tcp_offset = 0;
- if (transport->tcp_flags & TCP_RCV_LAST_FRAG) {
- transport->tcp_flags &= ~TCP_RCV_COPY_DATA;
- transport->tcp_flags |= TCP_RCV_COPY_XID;
- transport->tcp_copied = 0;
+ if (transport->recv.offset == transport->recv.len) {
+ transport->recv.flags |= TCP_RCV_COPY_FRAGHDR;
+ transport->recv.offset = 0;
+ if (transport->recv.flags & TCP_RCV_LAST_FRAG) {
+ transport->recv.flags &= ~TCP_RCV_COPY_DATA;
+ transport->recv.flags |= TCP_RCV_COPY_XID;
+ transport->recv.copied = 0;
}
}
}
@@ -1214,20 +1214,20 @@ static inline void xs_tcp_read_xid(struct sock_xprt *transport, struct xdr_skb_r
size_t len, used;
char *p;

- len = sizeof(transport->tcp_xid) - transport->tcp_offset;
+ len = sizeof(transport->recv.xid) - transport->recv.offset;
dprintk("RPC: reading XID (%zu bytes)\n", len);
- p = ((char *) &transport->tcp_xid) + transport->tcp_offset;
+ p = ((char *) &transport->recv.xid) + transport->recv.offset;
used = xdr_skb_read_bits(desc, p, len);
- transport->tcp_offset += used;
+ transport->recv.offset += used;
if (used != len)
return;
- transport->tcp_flags &= ~TCP_RCV_COPY_XID;
- transport->tcp_flags |= TCP_RCV_READ_CALLDIR;
- transport->tcp_copied = 4;
+ transport->recv.flags &= ~TCP_RCV_COPY_XID;
+ transport->recv.flags |= TCP_RCV_READ_CALLDIR;
+ transport->recv.copied = 4;
dprintk("RPC: reading %s XID %08x\n",
- (transport->tcp_flags & TCP_RPC_REPLY) ? "reply for"
+ (transport->recv.flags & TCP_RPC_REPLY) ? "reply for"
: "request with",
- ntohl(transport->tcp_xid));
+ ntohl(transport->recv.xid));
xs_tcp_check_fraghdr(transport);
}

@@ -1239,34 +1239,34 @@ static inline void xs_tcp_read_calldir(struct sock_xprt *transport,
char *p;

/*
- * We want transport->tcp_offset to be 8 at the end of this routine
+ * We want transport->recv.offset to be 8 at the end of this routine
* (4 bytes for the xid and 4 bytes for the call/reply flag).
* When this function is called for the first time,
- * transport->tcp_offset is 4 (after having already read the xid).
+ * transport->recv.offset is 4 (after having already read the xid).
*/
- offset = transport->tcp_offset - sizeof(transport->tcp_xid);
- len = sizeof(transport->tcp_calldir) - offset;
+ offset = transport->recv.offset - sizeof(transport->recv.xid);
+ len = sizeof(transport->recv.calldir) - offset;
dprintk("RPC: reading CALL/REPLY flag (%zu bytes)\n", len);
- p = ((char *) &transport->tcp_calldir) + offset;
+ p = ((char *) &transport->recv.calldir) + offset;
used = xdr_skb_read_bits(desc, p, len);
- transport->tcp_offset += used;
+ transport->recv.offset += used;
if (used != len)
return;
- transport->tcp_flags &= ~TCP_RCV_READ_CALLDIR;
+ transport->recv.flags &= ~TCP_RCV_READ_CALLDIR;
/*
* We don't yet have the XDR buffer, so we will write the calldir
* out after we get the buffer from the 'struct rpc_rqst'
*/
- switch (ntohl(transport->tcp_calldir)) {
+ switch (ntohl(transport->recv.calldir)) {
case RPC_REPLY:
- transport->tcp_flags |= TCP_RCV_COPY_CALLDIR;
- transport->tcp_flags |= TCP_RCV_COPY_DATA;
- transport->tcp_flags |= TCP_RPC_REPLY;
+ transport->recv.flags |= TCP_RCV_COPY_CALLDIR;
+ transport->recv.flags |= TCP_RCV_COPY_DATA;
+ transport->recv.flags |= TCP_RPC_REPLY;
break;
case RPC_CALL:
- transport->tcp_flags |= TCP_RCV_COPY_CALLDIR;
- transport->tcp_flags |= TCP_RCV_COPY_DATA;
- transport->tcp_flags &= ~TCP_RPC_REPLY;
+ transport->recv.flags |= TCP_RCV_COPY_CALLDIR;
+ transport->recv.flags |= TCP_RCV_COPY_DATA;
+ transport->recv.flags &= ~TCP_RPC_REPLY;
break;
default:
dprintk("RPC: invalid request message type\n");
@@ -1287,21 +1287,21 @@ static inline void xs_tcp_read_common(struct rpc_xprt *xprt,

rcvbuf = &req->rq_private_buf;

- if (transport->tcp_flags & TCP_RCV_COPY_CALLDIR) {
+ if (transport->recv.flags & TCP_RCV_COPY_CALLDIR) {
/*
* Save the RPC direction in the XDR buffer
*/
- memcpy(rcvbuf->head[0].iov_base + transport->tcp_copied,
- &transport->tcp_calldir,
- sizeof(transport->tcp_calldir));
- transport->tcp_copied += sizeof(transport->tcp_calldir);
- transport->tcp_flags &= ~TCP_RCV_COPY_CALLDIR;
+ memcpy(rcvbuf->head[0].iov_base + transport->recv.copied,
+ &transport->recv.calldir,
+ sizeof(transport->recv.calldir));
+ transport->recv.copied += sizeof(transport->recv.calldir);
+ transport->recv.flags &= ~TCP_RCV_COPY_CALLDIR;
}

len = desc->count;
- if (len > transport->tcp_reclen - transport->tcp_offset)
- desc->count = transport->tcp_reclen - transport->tcp_offset;
- r = xdr_partial_copy_from_skb(rcvbuf, transport->tcp_copied,
+ if (len > transport->recv.len - transport->recv.offset)
+ desc->count = transport->recv.len - transport->recv.offset;
+ r = xdr_partial_copy_from_skb(rcvbuf, transport->recv.copied,
desc, xdr_skb_read_bits);

if (desc->count) {
@@ -1314,31 +1314,31 @@ static inline void xs_tcp_read_common(struct rpc_xprt *xprt,
* Any remaining data from this record will
* be discarded.
*/
- transport->tcp_flags &= ~TCP_RCV_COPY_DATA;
+ transport->recv.flags &= ~TCP_RCV_COPY_DATA;
dprintk("RPC: XID %08x truncated request\n",
- ntohl(transport->tcp_xid));
- dprintk("RPC: xprt = %p, tcp_copied = %lu, "
- "tcp_offset = %u, tcp_reclen = %u\n",
- xprt, transport->tcp_copied,
- transport->tcp_offset, transport->tcp_reclen);
+ ntohl(transport->recv.xid));
+ dprintk("RPC: xprt = %p, recv.copied = %lu, "
+ "recv.offset = %u, recv.len = %u\n",
+ xprt, transport->recv.copied,
+ transport->recv.offset, transport->recv.len);
return;
}

- transport->tcp_copied += r;
- transport->tcp_offset += r;
+ transport->recv.copied += r;
+ transport->recv.offset += r;
desc->count = len - r;

dprintk("RPC: XID %08x read %zd bytes\n",
- ntohl(transport->tcp_xid), r);
- dprintk("RPC: xprt = %p, tcp_copied = %lu, tcp_offset = %u, "
- "tcp_reclen = %u\n", xprt, transport->tcp_copied,
- transport->tcp_offset, transport->tcp_reclen);
-
- if (transport->tcp_copied == req->rq_private_buf.buflen)
- transport->tcp_flags &= ~TCP_RCV_COPY_DATA;
- else if (transport->tcp_offset == transport->tcp_reclen) {
- if (transport->tcp_flags & TCP_RCV_LAST_FRAG)
- transport->tcp_flags &= ~TCP_RCV_COPY_DATA;
+ ntohl(transport->recv.xid), r);
+ dprintk("RPC: xprt = %p, recv.copied = %lu, recv.offset = %u, "
+ "recv.len = %u\n", xprt, transport->recv.copied,
+ transport->recv.offset, transport->recv.len);
+
+ if (transport->recv.copied == req->rq_private_buf.buflen)
+ transport->recv.flags &= ~TCP_RCV_COPY_DATA;
+ else if (transport->recv.offset == transport->recv.len) {
+ if (transport->recv.flags & TCP_RCV_LAST_FRAG)
+ transport->recv.flags &= ~TCP_RCV_COPY_DATA;
}
}

@@ -1353,14 +1353,14 @@ static inline int xs_tcp_read_reply(struct rpc_xprt *xprt,
container_of(xprt, struct sock_xprt, xprt);
struct rpc_rqst *req;

- dprintk("RPC: read reply XID %08x\n", ntohl(transport->tcp_xid));
+ dprintk("RPC: read reply XID %08x\n", ntohl(transport->recv.xid));

/* Find and lock the request corresponding to this xid */
spin_lock(&xprt->recv_lock);
- req = xprt_lookup_rqst(xprt, transport->tcp_xid);
+ req = xprt_lookup_rqst(xprt, transport->recv.xid);
if (!req) {
dprintk("RPC: XID %08x request not found!\n",
- ntohl(transport->tcp_xid));
+ ntohl(transport->recv.xid));
spin_unlock(&xprt->recv_lock);
return -1;
}
@@ -1370,8 +1370,8 @@ static inline int xs_tcp_read_reply(struct rpc_xprt *xprt,
xs_tcp_read_common(xprt, desc, req);

spin_lock(&xprt->recv_lock);
- if (!(transport->tcp_flags & TCP_RCV_COPY_DATA))
- xprt_complete_rqst(req->rq_task, transport->tcp_copied);
+ if (!(transport->recv.flags & TCP_RCV_COPY_DATA))
+ xprt_complete_rqst(req->rq_task, transport->recv.copied);
xprt_unpin_rqst(req);
spin_unlock(&xprt->recv_lock);
return 0;
@@ -1393,7 +1393,7 @@ static int xs_tcp_read_callback(struct rpc_xprt *xprt,
struct rpc_rqst *req;

/* Look up the request corresponding to the given XID */
- req = xprt_lookup_bc_request(xprt, transport->tcp_xid);
+ req = xprt_lookup_bc_request(xprt, transport->recv.xid);
if (req == NULL) {
printk(KERN_WARNING "Callback slot table overflowed\n");
xprt_force_disconnect(xprt);
@@ -1403,8 +1403,8 @@ static int xs_tcp_read_callback(struct rpc_xprt *xprt,
dprintk("RPC: read callback XID %08x\n", ntohl(req->rq_xid));
xs_tcp_read_common(xprt, desc, req);

- if (!(transport->tcp_flags & TCP_RCV_COPY_DATA))
- xprt_complete_bc_request(req, transport->tcp_copied);
+ if (!(transport->recv.flags & TCP_RCV_COPY_DATA))
+ xprt_complete_bc_request(req, transport->recv.copied);

return 0;
}
@@ -1415,7 +1415,7 @@ static inline int _xs_tcp_read_data(struct rpc_xprt *xprt,
struct sock_xprt *transport =
container_of(xprt, struct sock_xprt, xprt);

- return (transport->tcp_flags & TCP_RPC_REPLY) ?
+ return (transport->recv.flags & TCP_RPC_REPLY) ?
xs_tcp_read_reply(xprt, desc) :
xs_tcp_read_callback(xprt, desc);
}
@@ -1458,9 +1458,9 @@ static void xs_tcp_read_data(struct rpc_xprt *xprt,
else {
/*
* The transport_lock protects the request handling.
- * There's no need to hold it to update the tcp_flags.
+ * There's no need to hold it to update the recv.flags.
*/
- transport->tcp_flags &= ~TCP_RCV_COPY_DATA;
+ transport->recv.flags &= ~TCP_RCV_COPY_DATA;
}
}

@@ -1468,12 +1468,12 @@ static inline void xs_tcp_read_discard(struct sock_xprt *transport, struct xdr_s
{
size_t len;

- len = transport->tcp_reclen - transport->tcp_offset;
+ len = transport->recv.len - transport->recv.offset;
if (len > desc->count)
len = desc->count;
desc->count -= len;
desc->offset += len;
- transport->tcp_offset += len;
+ transport->recv.offset += len;
dprintk("RPC: discarded %zu bytes\n", len);
xs_tcp_check_fraghdr(transport);
}
@@ -1494,22 +1494,22 @@ static int xs_tcp_data_recv(read_descriptor_t *rd_desc, struct sk_buff *skb, uns
trace_xs_tcp_data_recv(transport);
/* Read in a new fragment marker if necessary */
/* Can we ever really expect to get completely empty fragments? */
- if (transport->tcp_flags & TCP_RCV_COPY_FRAGHDR) {
+ if (transport->recv.flags & TCP_RCV_COPY_FRAGHDR) {
xs_tcp_read_fraghdr(xprt, &desc);
continue;
}
/* Read in the xid if necessary */
- if (transport->tcp_flags & TCP_RCV_COPY_XID) {
+ if (transport->recv.flags & TCP_RCV_COPY_XID) {
xs_tcp_read_xid(transport, &desc);
continue;
}
/* Read in the call/reply flag */
- if (transport->tcp_flags & TCP_RCV_READ_CALLDIR) {
+ if (transport->recv.flags & TCP_RCV_READ_CALLDIR) {
xs_tcp_read_calldir(transport, &desc);
continue;
}
/* Read in the request data */
- if (transport->tcp_flags & TCP_RCV_COPY_DATA) {
+ if (transport->recv.flags & TCP_RCV_COPY_DATA) {
xs_tcp_read_data(xprt, &desc);
continue;
}
@@ -1602,10 +1602,10 @@ static void xs_tcp_state_change(struct sock *sk)
if (!xprt_test_and_set_connected(xprt)) {

/* Reset TCP record info */
- transport->tcp_offset = 0;
- transport->tcp_reclen = 0;
- transport->tcp_copied = 0;
- transport->tcp_flags =
+ transport->recv.offset = 0;
+ transport->recv.len = 0;
+ transport->recv.copied = 0;
+ transport->recv.flags =
TCP_RCV_COPY_FRAGHDR | TCP_RCV_COPY_XID;
xprt->connect_cookie++;
clear_bit(XPRT_SOCK_CONNECTING, &transport->sock_state);
--
2.17.1

2018-09-17 18:31:24

by Trond Myklebust

Subject: [PATCH v3 10/44] SUNRPC: Refactor the transport request pinning

We are going to need to pin requests for both send and receive.
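
For illustration, the intended usage pattern on the receive side
(mirroring what xs_tcp_read_reply() already does; the copy step stands
in for the real work):

	spin_lock(&xprt->recv_lock);
	req = xprt_lookup_rqst(xprt, xid);
	xprt_pin_rqst(req);	/* reference survives dropping the lock */
	spin_unlock(&xprt->recv_lock);

	/* ... copy the reply data without holding the lock ... */

	spin_lock(&xprt->recv_lock);
	xprt_unpin_rqst(req);	/* wakes xprt_wait_on_pinned_rqst() */
	spin_unlock(&xprt->recv_lock);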

Signed-off-by: Trond Myklebust <[email protected]>
---
include/linux/sunrpc/sched.h | 3 +--
include/linux/sunrpc/xprt.h | 1 +
net/sunrpc/xprt.c | 43 +++++++++++++++++++-----------------
3 files changed, 25 insertions(+), 22 deletions(-)

diff --git a/include/linux/sunrpc/sched.h b/include/linux/sunrpc/sched.h
index 9e655df70131..8062ce6b18e5 100644
--- a/include/linux/sunrpc/sched.h
+++ b/include/linux/sunrpc/sched.h
@@ -142,8 +142,7 @@ struct rpc_task_setup {
#define RPC_TASK_ACTIVE 2
#define RPC_TASK_NEED_XMIT 3
#define RPC_TASK_NEED_RECV 4
-#define RPC_TASK_MSG_RECV 5
-#define RPC_TASK_MSG_RECV_WAIT 6
+#define RPC_TASK_MSG_PIN_WAIT 5

#define RPC_IS_RUNNING(t) test_bit(RPC_TASK_RUNNING, &(t)->tk_runstate)
#define rpc_set_running(t) set_bit(RPC_TASK_RUNNING, &(t)->tk_runstate)
diff --git a/include/linux/sunrpc/xprt.h b/include/linux/sunrpc/xprt.h
index 3d80524e92d6..bd743c51a865 100644
--- a/include/linux/sunrpc/xprt.h
+++ b/include/linux/sunrpc/xprt.h
@@ -103,6 +103,7 @@ struct rpc_rqst {
/* A cookie used to track the
state of the transport
connection */
+ atomic_t rq_pin;

/*
* Partial send handling
diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index 45d580cd93ac..649a40cfae6d 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -847,16 +847,22 @@ struct rpc_rqst *xprt_lookup_rqst(struct rpc_xprt *xprt, __be32 xid)
}
EXPORT_SYMBOL_GPL(xprt_lookup_rqst);

+static bool
+xprt_is_pinned_rqst(struct rpc_rqst *req)
+{
+ return atomic_read(&req->rq_pin) != 0;
+}
+
/**
* xprt_pin_rqst - Pin a request on the transport receive list
* @req: Request to pin
*
* Caller must ensure this is atomic with the call to xprt_lookup_rqst()
- * so should be holding the xprt transport lock.
+ * so should be holding the xprt receive lock.
*/
void xprt_pin_rqst(struct rpc_rqst *req)
{
- set_bit(RPC_TASK_MSG_RECV, &req->rq_task->tk_runstate);
+ atomic_inc(&req->rq_pin);
}
EXPORT_SYMBOL_GPL(xprt_pin_rqst);

@@ -864,31 +870,22 @@ EXPORT_SYMBOL_GPL(xprt_pin_rqst);
* xprt_unpin_rqst - Unpin a request on the transport receive list
* @req: Request to pin
*
- * Caller should be holding the xprt transport lock.
+ * Caller should be holding the xprt receive lock.
*/
void xprt_unpin_rqst(struct rpc_rqst *req)
{
- struct rpc_task *task = req->rq_task;
-
- clear_bit(RPC_TASK_MSG_RECV, &task->tk_runstate);
- if (test_bit(RPC_TASK_MSG_RECV_WAIT, &task->tk_runstate))
- wake_up_bit(&task->tk_runstate, RPC_TASK_MSG_RECV);
+ if (!test_bit(RPC_TASK_MSG_PIN_WAIT, &req->rq_task->tk_runstate)) {
+ atomic_dec(&req->rq_pin);
+ return;
+ }
+ if (atomic_dec_and_test(&req->rq_pin))
+ wake_up_var(&req->rq_pin);
}
EXPORT_SYMBOL_GPL(xprt_unpin_rqst);

static void xprt_wait_on_pinned_rqst(struct rpc_rqst *req)
-__must_hold(&req->rq_xprt->recv_lock)
{
- struct rpc_task *task = req->rq_task;
-
- if (task && test_bit(RPC_TASK_MSG_RECV, &task->tk_runstate)) {
- spin_unlock(&req->rq_xprt->recv_lock);
- set_bit(RPC_TASK_MSG_RECV_WAIT, &task->tk_runstate);
- wait_on_bit(&task->tk_runstate, RPC_TASK_MSG_RECV,
- TASK_UNINTERRUPTIBLE);
- clear_bit(RPC_TASK_MSG_RECV_WAIT, &task->tk_runstate);
- spin_lock(&req->rq_xprt->recv_lock);
- }
+ wait_var_event(&req->rq_pin, !xprt_is_pinned_rqst(req));
}

/**
@@ -1388,7 +1385,13 @@ void xprt_release(struct rpc_task *task)
spin_lock(&xprt->recv_lock);
if (!list_empty(&req->rq_list)) {
list_del_init(&req->rq_list);
- xprt_wait_on_pinned_rqst(req);
+ if (xprt_is_pinned_rqst(req)) {
+ set_bit(RPC_TASK_MSG_PIN_WAIT, &req->rq_task->tk_runstate);
+ spin_unlock(&xprt->recv_lock);
+ xprt_wait_on_pinned_rqst(req);
+ spin_lock(&xprt->recv_lock);
+ clear_bit(RPC_TASK_MSG_PIN_WAIT, &req->rq_task->tk_runstate);
+ }
}
spin_unlock(&xprt->recv_lock);
spin_lock_bh(&xprt->transport_lock);
--
2.17.1

2018-09-17 18:31:15

by Trond Myklebust

Subject: [PATCH v3 02/44] SUNRPC: If there is no reply expected, bail early from call_decode

Signed-off-by: Trond Myklebust <[email protected]>
---
net/sunrpc/clnt.c | 13 ++++++++-----
1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index bc9d020bf71f..4f1ec8013332 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -2260,6 +2260,11 @@ call_decode(struct rpc_task *task)

dprint_status(task);

+ if (!decode) {
+ task->tk_action = rpc_exit_task;
+ return;
+ }
+
if (task->tk_flags & RPC_CALL_MAJORSEEN) {
if (clnt->cl_chatty) {
printk(KERN_NOTICE "%s: server %s OK\n",
@@ -2297,13 +2302,11 @@ call_decode(struct rpc_task *task)
goto out_retry;
return;
}
-
task->tk_action = rpc_exit_task;

- if (decode) {
- task->tk_status = rpcauth_unwrap_resp(task, decode, req, p,
- task->tk_msg.rpc_resp);
- }
+ task->tk_status = rpcauth_unwrap_resp(task, decode, req, p,
+ task->tk_msg.rpc_resp);
+
dprintk("RPC: %5u call_decode result %d\n", task->tk_pid,
task->tk_status);
return;
--
2.17.1

2018-09-17 18:31:20

by Trond Myklebust

Subject: [PATCH v3 07/44] SUNRPC: Move reset of TCP state variables into the reconnect code

Rather than resetting the state variables in the socket state_change()
callback, do it in the sunrpc TCP connect function itself.

Signed-off-by: Trond Myklebust <[email protected]>
---
net/sunrpc/xprtsock.c | 13 ++++++-------
1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index cd7d093721ae..ec1e3f93e707 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -1600,13 +1600,6 @@ static void xs_tcp_state_change(struct sock *sk)
case TCP_ESTABLISHED:
spin_lock(&xprt->transport_lock);
if (!xprt_test_and_set_connected(xprt)) {
-
- /* Reset TCP record info */
- transport->recv.offset = 0;
- transport->recv.len = 0;
- transport->recv.copied = 0;
- transport->recv.flags =
- TCP_RCV_COPY_FRAGHDR | TCP_RCV_COPY_XID;
xprt->connect_cookie++;
clear_bit(XPRT_SOCK_CONNECTING, &transport->sock_state);
xprt_clear_connecting(xprt);
@@ -2386,6 +2379,12 @@ static int xs_tcp_finish_connecting(struct rpc_xprt *xprt, struct socket *sock)

xs_set_memalloc(xprt);

+ /* Reset TCP record info */
+ transport->recv.offset = 0;
+ transport->recv.len = 0;
+ transport->recv.copied = 0;
+ transport->recv.flags = TCP_RCV_COPY_FRAGHDR | TCP_RCV_COPY_XID;
+
/* Tell the socket layer to start connecting... */
xprt->stat.connect_count++;
xprt->stat.connect_start = jiffies;
--
2.17.1

2018-09-17 18:31:26

by Trond Myklebust

Subject: [PATCH v3 12/44] SUNRPC: Test whether the task is queued before grabbing the queue spinlocks

When asked to wake up an RPC task, it makes sense to test whether or
not the task is still queued before we grab the queue spinlocks: if it
is not, we can skip taking the locks altogether. The unlocked test is
racy, but benign, since the locked wake-up path re-checks the queued
state.

Signed-off-by: Trond Myklebust <[email protected]>
---
net/sunrpc/sched.c | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
index dec01bd1b71c..9a8ec012b449 100644
--- a/net/sunrpc/sched.c
+++ b/net/sunrpc/sched.c
@@ -479,6 +479,8 @@ void rpc_wake_up_queued_task_on_wq(struct workqueue_struct *wq,
struct rpc_wait_queue *queue,
struct rpc_task *task)
{
+ if (!RPC_IS_QUEUED(task))
+ return;
spin_lock_bh(&queue->lock);
rpc_wake_up_task_on_wq_queue_locked(wq, queue, task);
spin_unlock_bh(&queue->lock);
@@ -489,6 +491,8 @@ void rpc_wake_up_queued_task_on_wq(struct workqueue_struct *wq,
*/
void rpc_wake_up_queued_task(struct rpc_wait_queue *queue, struct rpc_task *task)
{
+ if (!RPC_IS_QUEUED(task))
+ return;
spin_lock_bh(&queue->lock);
rpc_wake_up_task_queue_locked(queue, task);
spin_unlock_bh(&queue->lock);
--
2.17.1

2018-09-17 18:31:22

by Trond Myklebust

Subject: [PATCH v3 08/44] SUNRPC: Add socket transmit queue offset tracking

Signed-off-by: Trond Myklebust <[email protected]>
---
include/linux/sunrpc/xprtsock.h | 7 ++++++
net/sunrpc/xprtsock.c | 40 ++++++++++++++++++---------------
2 files changed, 29 insertions(+), 18 deletions(-)

diff --git a/include/linux/sunrpc/xprtsock.h b/include/linux/sunrpc/xprtsock.h
index 90d5ca8e65f4..005cfb6e7238 100644
--- a/include/linux/sunrpc/xprtsock.h
+++ b/include/linux/sunrpc/xprtsock.h
@@ -42,6 +42,13 @@ struct sock_xprt {
flags;
} recv;

+ /*
+ * State of TCP transmit queue
+ */
+ struct {
+ u32 offset;
+ } xmit;
+
/*
* Connection of transports
*/
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index ec1e3f93e707..629cc45e1e6c 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -461,7 +461,7 @@ static int xs_nospace(struct rpc_task *task)
int ret = -EAGAIN;

dprintk("RPC: %5u xmit incomplete (%u left of %u)\n",
- task->tk_pid, req->rq_slen - req->rq_bytes_sent,
+ task->tk_pid, req->rq_slen - transport->xmit.offset,
req->rq_slen);

/* Protect against races with write_space */
@@ -528,19 +528,22 @@ static int xs_local_send_request(struct rpc_task *task)
req->rq_svec->iov_base, req->rq_svec->iov_len);

req->rq_xtime = ktime_get();
- status = xs_sendpages(transport->sock, NULL, 0, xdr, req->rq_bytes_sent,
+ status = xs_sendpages(transport->sock, NULL, 0, xdr,
+ transport->xmit.offset,
true, &sent);
dprintk("RPC: %s(%u) = %d\n",
- __func__, xdr->len - req->rq_bytes_sent, status);
+ __func__, xdr->len - transport->xmit.offset, status);

if (status == -EAGAIN && sock_writeable(transport->inet))
status = -ENOBUFS;

if (likely(sent > 0) || status == 0) {
- req->rq_bytes_sent += sent;
- req->rq_xmit_bytes_sent += sent;
+ transport->xmit.offset += sent;
+ req->rq_bytes_sent = transport->xmit.offset;
if (likely(req->rq_bytes_sent >= req->rq_slen)) {
+ req->rq_xmit_bytes_sent += transport->xmit.offset;
req->rq_bytes_sent = 0;
+ transport->xmit.offset = 0;
return 0;
}
status = -EAGAIN;
@@ -592,10 +595,10 @@ static int xs_udp_send_request(struct rpc_task *task)
return -ENOTCONN;
req->rq_xtime = ktime_get();
status = xs_sendpages(transport->sock, xs_addr(xprt), xprt->addrlen,
- xdr, req->rq_bytes_sent, true, &sent);
+ xdr, 0, true, &sent);

dprintk("RPC: xs_udp_send_request(%u) = %d\n",
- xdr->len - req->rq_bytes_sent, status);
+ xdr->len, status);

/* firewall is blocking us, don't return -EAGAIN or we end up looping */
if (status == -EPERM)
@@ -684,17 +687,20 @@ static int xs_tcp_send_request(struct rpc_task *task)
while (1) {
sent = 0;
status = xs_sendpages(transport->sock, NULL, 0, xdr,
- req->rq_bytes_sent, zerocopy, &sent);
+ transport->xmit.offset,
+ zerocopy, &sent);

dprintk("RPC: xs_tcp_send_request(%u) = %d\n",
- xdr->len - req->rq_bytes_sent, status);
+ xdr->len - transport->xmit.offset, status);

/* If we've sent the entire packet, immediately
* reset the count of bytes sent. */
- req->rq_bytes_sent += sent;
- req->rq_xmit_bytes_sent += sent;
+ transport->xmit.offset += sent;
+ req->rq_bytes_sent = transport->xmit.offset;
if (likely(req->rq_bytes_sent >= req->rq_slen)) {
+ req->rq_xmit_bytes_sent += transport->xmit.offset;
req->rq_bytes_sent = 0;
+ transport->xmit.offset = 0;
return 0;
}

@@ -760,18 +766,13 @@ static int xs_tcp_send_request(struct rpc_task *task)
*/
static void xs_tcp_release_xprt(struct rpc_xprt *xprt, struct rpc_task *task)
{
- struct rpc_rqst *req;
+ struct sock_xprt *transport = container_of(xprt, struct sock_xprt, xprt);

if (task != xprt->snd_task)
return;
if (task == NULL)
goto out_release;
- req = task->tk_rqstp;
- if (req == NULL)
- goto out_release;
- if (req->rq_bytes_sent == 0)
- goto out_release;
- if (req->rq_bytes_sent == req->rq_snd_buf.len)
+ if (transport->xmit.offset == 0 || !xprt_connected(xprt))
goto out_release;
set_bit(XPRT_CLOSE_WAIT, &xprt->state);
out_release:
@@ -2021,6 +2022,8 @@ static int xs_local_finish_connecting(struct rpc_xprt *xprt,
write_unlock_bh(&sk->sk_callback_lock);
}

+ transport->xmit.offset = 0;
+
/* Tell the socket layer to start connecting... */
xprt->stat.connect_count++;
xprt->stat.connect_start = jiffies;
@@ -2384,6 +2387,7 @@ static int xs_tcp_finish_connecting(struct rpc_xprt *xprt, struct socket *sock)
transport->recv.len = 0;
transport->recv.copied = 0;
transport->recv.flags = TCP_RCV_COPY_FRAGHDR | TCP_RCV_COPY_XID;
+ transport->xmit.offset = 0;

/* Tell the socket layer to start connecting... */
xprt->stat.connect_count++;
--
2.17.1

2018-09-17 18:31:14

by Trond Myklebust

Subject: [PATCH v3 01/44] SUNRPC: Clean up initialisation of the struct rpc_rqst

Move the initialisation back into xprt.c.

Signed-off-by: Trond Myklebust <[email protected]>
---
include/linux/sunrpc/xprt.h | 1 -
net/sunrpc/clnt.c | 1 -
net/sunrpc/xprt.c | 91 +++++++++++++++++++++----------------
3 files changed, 51 insertions(+), 42 deletions(-)

diff --git a/include/linux/sunrpc/xprt.h b/include/linux/sunrpc/xprt.h
index 336fd1a19cca..3d80524e92d6 100644
--- a/include/linux/sunrpc/xprt.h
+++ b/include/linux/sunrpc/xprt.h
@@ -325,7 +325,6 @@ struct xprt_class {
struct rpc_xprt *xprt_create_transport(struct xprt_create *args);
void xprt_connect(struct rpc_task *task);
void xprt_reserve(struct rpc_task *task);
-void xprt_request_init(struct rpc_task *task);
void xprt_retry_reserve(struct rpc_task *task);
int xprt_reserve_xprt(struct rpc_xprt *xprt, struct rpc_task *task);
int xprt_reserve_xprt_cong(struct rpc_xprt *xprt, struct rpc_task *task);
diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index 8ea2f5fadd96..bc9d020bf71f 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -1558,7 +1558,6 @@ call_reserveresult(struct rpc_task *task)
task->tk_status = 0;
if (status >= 0) {
if (task->tk_rqstp) {
- xprt_request_init(task);
task->tk_action = call_refresh;
return;
}
diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index a8db2e3f8904..6aa09edc9567 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -1250,6 +1250,55 @@ void xprt_free(struct rpc_xprt *xprt)
}
EXPORT_SYMBOL_GPL(xprt_free);

+static __be32
+xprt_alloc_xid(struct rpc_xprt *xprt)
+{
+ __be32 xid;
+
+ spin_lock(&xprt->reserve_lock);
+ xid = (__force __be32)xprt->xid++;
+ spin_unlock(&xprt->reserve_lock);
+ return xid;
+}
+
+static void
+xprt_init_xid(struct rpc_xprt *xprt)
+{
+ xprt->xid = prandom_u32();
+}
+
+static void
+xprt_request_init(struct rpc_task *task)
+{
+ struct rpc_xprt *xprt = task->tk_xprt;
+ struct rpc_rqst *req = task->tk_rqstp;
+
+ INIT_LIST_HEAD(&req->rq_list);
+ req->rq_timeout = task->tk_client->cl_timeout->to_initval;
+ req->rq_task = task;
+ req->rq_xprt = xprt;
+ req->rq_buffer = NULL;
+ req->rq_xid = xprt_alloc_xid(xprt);
+ req->rq_connect_cookie = xprt->connect_cookie - 1;
+ req->rq_bytes_sent = 0;
+ req->rq_snd_buf.len = 0;
+ req->rq_snd_buf.buflen = 0;
+ req->rq_rcv_buf.len = 0;
+ req->rq_rcv_buf.buflen = 0;
+ req->rq_release_snd_buf = NULL;
+ xprt_reset_majortimeo(req);
+ dprintk("RPC: %5u reserved req %p xid %08x\n", task->tk_pid,
+ req, ntohl(req->rq_xid));
+}
+
+static void
+xprt_do_reserve(struct rpc_xprt *xprt, struct rpc_task *task)
+{
+ xprt->ops->alloc_slot(xprt, task);
+ if (task->tk_rqstp != NULL)
+ xprt_request_init(task);
+}
+
/**
* xprt_reserve - allocate an RPC request slot
* @task: RPC task requesting a slot allocation
@@ -1269,7 +1318,7 @@ void xprt_reserve(struct rpc_task *task)
task->tk_timeout = 0;
task->tk_status = -EAGAIN;
if (!xprt_throttle_congested(xprt, task))
- xprt->ops->alloc_slot(xprt, task);
+ xprt_do_reserve(xprt, task);
}

/**
@@ -1291,45 +1340,7 @@ void xprt_retry_reserve(struct rpc_task *task)

task->tk_timeout = 0;
task->tk_status = -EAGAIN;
- xprt->ops->alloc_slot(xprt, task);
-}
-
-static inline __be32 xprt_alloc_xid(struct rpc_xprt *xprt)
-{
- __be32 xid;
-
- spin_lock(&xprt->reserve_lock);
- xid = (__force __be32)xprt->xid++;
- spin_unlock(&xprt->reserve_lock);
- return xid;
-}
-
-static inline void xprt_init_xid(struct rpc_xprt *xprt)
-{
- xprt->xid = prandom_u32();
-}
-
-void xprt_request_init(struct rpc_task *task)
-{
- struct rpc_xprt *xprt = task->tk_xprt;
- struct rpc_rqst *req = task->tk_rqstp;
-
- INIT_LIST_HEAD(&req->rq_list);
- req->rq_timeout = task->tk_client->cl_timeout->to_initval;
- req->rq_task = task;
- req->rq_xprt = xprt;
- req->rq_buffer = NULL;
- req->rq_xid = xprt_alloc_xid(xprt);
- req->rq_connect_cookie = xprt->connect_cookie - 1;
- req->rq_bytes_sent = 0;
- req->rq_snd_buf.len = 0;
- req->rq_snd_buf.buflen = 0;
- req->rq_rcv_buf.len = 0;
- req->rq_rcv_buf.buflen = 0;
- req->rq_release_snd_buf = NULL;
- xprt_reset_majortimeo(req);
- dprintk("RPC: %5u reserved req %p xid %08x\n", task->tk_pid,
- req, ntohl(req->rq_xid));
+ xprt_do_reserve(xprt, task);
}

/**
--
2.17.1

2018-09-17 18:31:29

by Trond Myklebust

Subject: [PATCH v3 14/44] SUNRPC: Rename xprt->recv_lock to xprt->queue_lock

We will use the same lock to protect both the transmit and receive queues.

Signed-off-by: Trond Myklebust <[email protected]>
---
include/linux/sunrpc/xprt.h | 2 +-
net/sunrpc/svcsock.c | 6 ++---
net/sunrpc/xprt.c | 24 ++++++++---------
net/sunrpc/xprtrdma/rpc_rdma.c | 10 ++++----
net/sunrpc/xprtrdma/svc_rdma_backchannel.c | 4 +--
net/sunrpc/xprtsock.c | 30 +++++++++++-----------
6 files changed, 38 insertions(+), 38 deletions(-)

diff --git a/include/linux/sunrpc/xprt.h b/include/linux/sunrpc/xprt.h
index bd743c51a865..c25d0a5fda69 100644
--- a/include/linux/sunrpc/xprt.h
+++ b/include/linux/sunrpc/xprt.h
@@ -235,7 +235,7 @@ struct rpc_xprt {
*/
spinlock_t transport_lock; /* lock transport info */
spinlock_t reserve_lock; /* lock slot table */
- spinlock_t recv_lock; /* lock receive list */
+ spinlock_t queue_lock; /* send/receive queue lock */
u32 xid; /* Next XID value to use */
struct rpc_task * snd_task; /* Task blocked in send */
struct svc_xprt *bc_xprt; /* NFSv4.1 backchannel */
diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index 5445145e639c..db8bb6b3a2b0 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -1004,7 +1004,7 @@ static int receive_cb_reply(struct svc_sock *svsk, struct svc_rqst *rqstp)

if (!bc_xprt)
return -EAGAIN;
- spin_lock(&bc_xprt->recv_lock);
+ spin_lock(&bc_xprt->queue_lock);
req = xprt_lookup_rqst(bc_xprt, xid);
if (!req)
goto unlock_notfound;
@@ -1022,7 +1022,7 @@ static int receive_cb_reply(struct svc_sock *svsk, struct svc_rqst *rqstp)
memcpy(dst->iov_base, src->iov_base, src->iov_len);
xprt_complete_rqst(req->rq_task, rqstp->rq_arg.len);
rqstp->rq_arg.len = 0;
- spin_unlock(&bc_xprt->recv_lock);
+ spin_unlock(&bc_xprt->queue_lock);
return 0;
unlock_notfound:
printk(KERN_NOTICE
@@ -1031,7 +1031,7 @@ static int receive_cb_reply(struct svc_sock *svsk, struct svc_rqst *rqstp)
__func__, ntohl(calldir),
bc_xprt, ntohl(xid));
unlock_eagain:
- spin_unlock(&bc_xprt->recv_lock);
+ spin_unlock(&bc_xprt->queue_lock);
return -EAGAIN;
}

diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index 3a3b3445a7c0..6e3d4b4ee79e 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -826,7 +826,7 @@ static void xprt_connect_status(struct rpc_task *task)
* @xprt: transport on which the original request was transmitted
* @xid: RPC XID of incoming reply
*
- * Caller holds xprt->recv_lock.
+ * Caller holds xprt->queue_lock.
*/
struct rpc_rqst *xprt_lookup_rqst(struct rpc_xprt *xprt, __be32 xid)
{
@@ -892,7 +892,7 @@ static void xprt_wait_on_pinned_rqst(struct rpc_rqst *req)
* xprt_update_rtt - Update RPC RTT statistics
* @task: RPC request that recently completed
*
- * Caller holds xprt->recv_lock.
+ * Caller holds xprt->queue_lock.
*/
void xprt_update_rtt(struct rpc_task *task)
{
@@ -914,7 +914,7 @@ EXPORT_SYMBOL_GPL(xprt_update_rtt);
* @task: RPC request that recently completed
* @copied: actual number of bytes received from the transport
*
- * Caller holds xprt->recv_lock.
+ * Caller holds xprt->queue_lock.
*/
void xprt_complete_rqst(struct rpc_task *task, int copied)
{
@@ -1034,10 +1034,10 @@ void xprt_transmit(struct rpc_task *task)
memcpy(&req->rq_private_buf, &req->rq_rcv_buf,
sizeof(req->rq_private_buf));
/* Add request to the receive list */
- spin_lock(&xprt->recv_lock);
+ spin_lock(&xprt->queue_lock);
list_add_tail(&req->rq_list, &xprt->recv);
set_bit(RPC_TASK_NEED_RECV, &task->tk_runstate);
- spin_unlock(&xprt->recv_lock);
+ spin_unlock(&xprt->queue_lock);
xprt_reset_majortimeo(req);
/* Turn off autodisconnect */
del_singleshot_timer_sync(&xprt->timer);
@@ -1076,7 +1076,7 @@ void xprt_transmit(struct rpc_task *task)
* The spinlock ensures atomicity between the test of
* req->rq_reply_bytes_recvd, and the call to rpc_sleep_on().
*/
- spin_lock(&xprt->recv_lock);
+ spin_lock(&xprt->queue_lock);
if (test_bit(RPC_TASK_NEED_RECV, &task->tk_runstate)) {
rpc_sleep_on(&xprt->pending, task, xprt_timer);
/* Wake up immediately if the connection was dropped */
@@ -1084,7 +1084,7 @@ void xprt_transmit(struct rpc_task *task)
rpc_wake_up_queued_task_set_status(&xprt->pending,
task, -ENOTCONN);
}
- spin_unlock(&xprt->recv_lock);
+ spin_unlock(&xprt->queue_lock);
}
}

@@ -1379,18 +1379,18 @@ void xprt_release(struct rpc_task *task)
task->tk_ops->rpc_count_stats(task, task->tk_calldata);
else if (task->tk_client)
rpc_count_iostats(task, task->tk_client->cl_metrics);
- spin_lock(&xprt->recv_lock);
+ spin_lock(&xprt->queue_lock);
if (!list_empty(&req->rq_list)) {
list_del_init(&req->rq_list);
if (xprt_is_pinned_rqst(req)) {
set_bit(RPC_TASK_MSG_PIN_WAIT, &req->rq_task->tk_runstate);
- spin_unlock(&xprt->recv_lock);
+ spin_unlock(&xprt->queue_lock);
xprt_wait_on_pinned_rqst(req);
- spin_lock(&xprt->recv_lock);
+ spin_lock(&xprt->queue_lock);
clear_bit(RPC_TASK_MSG_PIN_WAIT, &req->rq_task->tk_runstate);
}
}
- spin_unlock(&xprt->recv_lock);
+ spin_unlock(&xprt->queue_lock);
spin_lock_bh(&xprt->transport_lock);
xprt->ops->release_xprt(xprt, task);
if (xprt->ops->release_request)
@@ -1420,7 +1420,7 @@ static void xprt_init(struct rpc_xprt *xprt, struct net *net)

spin_lock_init(&xprt->transport_lock);
spin_lock_init(&xprt->reserve_lock);
- spin_lock_init(&xprt->recv_lock);
+ spin_lock_init(&xprt->queue_lock);

INIT_LIST_HEAD(&xprt->free);
INIT_LIST_HEAD(&xprt->recv);
diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c
index c8ae983c6cc0..0020dc401215 100644
--- a/net/sunrpc/xprtrdma/rpc_rdma.c
+++ b/net/sunrpc/xprtrdma/rpc_rdma.c
@@ -1238,7 +1238,7 @@ void rpcrdma_complete_rqst(struct rpcrdma_rep *rep)
goto out_badheader;

out:
- spin_lock(&xprt->recv_lock);
+ spin_lock(&xprt->queue_lock);
cwnd = xprt->cwnd;
xprt->cwnd = r_xprt->rx_buf.rb_credits << RPC_CWNDSHIFT;
if (xprt->cwnd > cwnd)
@@ -1246,7 +1246,7 @@ void rpcrdma_complete_rqst(struct rpcrdma_rep *rep)

xprt_complete_rqst(rqst->rq_task, status);
xprt_unpin_rqst(rqst);
- spin_unlock(&xprt->recv_lock);
+ spin_unlock(&xprt->queue_lock);
return;

/* If the incoming reply terminated a pending RPC, the next
@@ -1345,7 +1345,7 @@ void rpcrdma_reply_handler(struct rpcrdma_rep *rep)
/* Match incoming rpcrdma_rep to an rpcrdma_req to
* get context for handling any incoming chunks.
*/
- spin_lock(&xprt->recv_lock);
+ spin_lock(&xprt->queue_lock);
rqst = xprt_lookup_rqst(xprt, rep->rr_xid);
if (!rqst)
goto out_norqst;
@@ -1357,7 +1357,7 @@ void rpcrdma_reply_handler(struct rpcrdma_rep *rep)
credits = buf->rb_max_requests;
buf->rb_credits = credits;

- spin_unlock(&xprt->recv_lock);
+ spin_unlock(&xprt->queue_lock);

req = rpcr_to_rdmar(rqst);
req->rl_reply = rep;
@@ -1378,7 +1378,7 @@ void rpcrdma_reply_handler(struct rpcrdma_rep *rep)
* is corrupt.
*/
out_norqst:
- spin_unlock(&xprt->recv_lock);
+ spin_unlock(&xprt->queue_lock);
trace_xprtrdma_reply_rqst(rep);
goto repost;

diff --git a/net/sunrpc/xprtrdma/svc_rdma_backchannel.c b/net/sunrpc/xprtrdma/svc_rdma_backchannel.c
index a68180090554..09b12b7568fe 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_backchannel.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_backchannel.c
@@ -56,7 +56,7 @@ int svc_rdma_handle_bc_reply(struct rpc_xprt *xprt, __be32 *rdma_resp,
if (src->iov_len < 24)
goto out_shortreply;

- spin_lock(&xprt->recv_lock);
+ spin_lock(&xprt->queue_lock);
req = xprt_lookup_rqst(xprt, xid);
if (!req)
goto out_notfound;
@@ -86,7 +86,7 @@ int svc_rdma_handle_bc_reply(struct rpc_xprt *xprt, __be32 *rdma_resp,
rcvbuf->len = 0;

out_unlock:
- spin_unlock(&xprt->recv_lock);
+ spin_unlock(&xprt->queue_lock);
out:
return ret;

diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index 3fbccebd0b10..8d6404259ff9 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -966,12 +966,12 @@ static void xs_local_data_read_skb(struct rpc_xprt *xprt,
return;

/* Look up and lock the request corresponding to the given XID */
- spin_lock(&xprt->recv_lock);
+ spin_lock(&xprt->queue_lock);
rovr = xprt_lookup_rqst(xprt, *xp);
if (!rovr)
goto out_unlock;
xprt_pin_rqst(rovr);
- spin_unlock(&xprt->recv_lock);
+ spin_unlock(&xprt->queue_lock);
task = rovr->rq_task;

copied = rovr->rq_private_buf.buflen;
@@ -980,16 +980,16 @@ static void xs_local_data_read_skb(struct rpc_xprt *xprt,

if (xs_local_copy_to_xdr(&rovr->rq_private_buf, skb)) {
dprintk("RPC: sk_buff copy failed\n");
- spin_lock(&xprt->recv_lock);
+ spin_lock(&xprt->queue_lock);
goto out_unpin;
}

- spin_lock(&xprt->recv_lock);
+ spin_lock(&xprt->queue_lock);
xprt_complete_rqst(task, copied);
out_unpin:
xprt_unpin_rqst(rovr);
out_unlock:
- spin_unlock(&xprt->recv_lock);
+ spin_unlock(&xprt->queue_lock);
}

static void xs_local_data_receive(struct sock_xprt *transport)
@@ -1058,13 +1058,13 @@ static void xs_udp_data_read_skb(struct rpc_xprt *xprt,
return;

/* Look up and lock the request corresponding to the given XID */
- spin_lock(&xprt->recv_lock);
+ spin_lock(&xprt->queue_lock);
rovr = xprt_lookup_rqst(xprt, *xp);
if (!rovr)
goto out_unlock;
xprt_pin_rqst(rovr);
xprt_update_rtt(rovr->rq_task);
- spin_unlock(&xprt->recv_lock);
+ spin_unlock(&xprt->queue_lock);
task = rovr->rq_task;

if ((copied = rovr->rq_private_buf.buflen) > repsize)
@@ -1072,7 +1072,7 @@ static void xs_udp_data_read_skb(struct rpc_xprt *xprt,

/* Suck it into the iovec, verify checksum if not done by hw. */
if (csum_partial_copy_to_xdr(&rovr->rq_private_buf, skb)) {
- spin_lock(&xprt->recv_lock);
+ spin_lock(&xprt->queue_lock);
__UDPX_INC_STATS(sk, UDP_MIB_INERRORS);
goto out_unpin;
}
@@ -1081,13 +1081,13 @@ static void xs_udp_data_read_skb(struct rpc_xprt *xprt,
spin_lock_bh(&xprt->transport_lock);
xprt_adjust_cwnd(xprt, task, copied);
spin_unlock_bh(&xprt->transport_lock);
- spin_lock(&xprt->recv_lock);
+ spin_lock(&xprt->queue_lock);
xprt_complete_rqst(task, copied);
__UDPX_INC_STATS(sk, UDP_MIB_INDATAGRAMS);
out_unpin:
xprt_unpin_rqst(rovr);
out_unlock:
- spin_unlock(&xprt->recv_lock);
+ spin_unlock(&xprt->queue_lock);
}

static void xs_udp_data_receive(struct sock_xprt *transport)
@@ -1356,24 +1356,24 @@ static inline int xs_tcp_read_reply(struct rpc_xprt *xprt,
dprintk("RPC: read reply XID %08x\n", ntohl(transport->recv.xid));

/* Find and lock the request corresponding to this xid */
- spin_lock(&xprt->recv_lock);
+ spin_lock(&xprt->queue_lock);
req = xprt_lookup_rqst(xprt, transport->recv.xid);
if (!req) {
dprintk("RPC: XID %08x request not found!\n",
ntohl(transport->recv.xid));
- spin_unlock(&xprt->recv_lock);
+ spin_unlock(&xprt->queue_lock);
return -1;
}
xprt_pin_rqst(req);
- spin_unlock(&xprt->recv_lock);
+ spin_unlock(&xprt->queue_lock);

xs_tcp_read_common(xprt, desc, req);

- spin_lock(&xprt->recv_lock);
+ spin_lock(&xprt->queue_lock);
if (!(transport->recv.flags & TCP_RCV_COPY_DATA))
xprt_complete_rqst(req->rq_task, transport->recv.copied);
xprt_unpin_rqst(req);
- spin_unlock(&xprt->recv_lock);
+ spin_unlock(&xprt->queue_lock);
return 0;
}

--
2.17.1

2018-09-17 18:31:37

by Trond Myklebust

Subject: [PATCH v3 20/44] SUNRPC: Refactor RPC call encoding

Move the call encoding so that it occurs before the transport binding
and connection steps.
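
The client FSM ordering thus becomes, in outline (a simplified sketch;
status and error legs omitted):

	call_allocate
	   -> call_encode	/* XDR encode, enqueue for receive/transmit */
	   -> call_bind
	   -> call_connect
	   -> call_transmit	/* transmission only; encoding already done */

Retransmission paths (call_status, call_timeout, rpc_verify_header) now
restart at call_encode rather than call_bind, so that the message can
be re-encoded when necessary.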

Signed-off-by: Trond Myklebust <[email protected]>
---
include/linux/sunrpc/xprt.h | 1 +
net/sunrpc/clnt.c | 81 ++++++++++++++++++++++---------------
net/sunrpc/xprt.c | 22 +++++-----
3 files changed, 63 insertions(+), 41 deletions(-)

diff --git a/include/linux/sunrpc/xprt.h b/include/linux/sunrpc/xprt.h
index 81a6c2c8dfc7..b8a7de161f67 100644
--- a/include/linux/sunrpc/xprt.h
+++ b/include/linux/sunrpc/xprt.h
@@ -347,6 +347,7 @@ bool xprt_prepare_transmit(struct rpc_task *task);
void xprt_request_enqueue_transmit(struct rpc_task *task);
void xprt_request_enqueue_receive(struct rpc_task *task);
void xprt_request_wait_receive(struct rpc_task *task);
+bool xprt_request_need_retransmit(struct rpc_task *task);
void xprt_transmit(struct rpc_task *task);
void xprt_end_transmit(struct rpc_task *task);
int xprt_adjust_timeout(struct rpc_rqst *req);
diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index c1a19a3e1356..64159716be30 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -61,6 +61,7 @@ static void call_start(struct rpc_task *task);
static void call_reserve(struct rpc_task *task);
static void call_reserveresult(struct rpc_task *task);
static void call_allocate(struct rpc_task *task);
+static void call_encode(struct rpc_task *task);
static void call_decode(struct rpc_task *task);
static void call_bind(struct rpc_task *task);
static void call_bind_status(struct rpc_task *task);
@@ -1140,7 +1141,8 @@ struct rpc_task *rpc_run_bc_task(struct rpc_rqst *req)
struct xdr_buf *xbufp = &req->rq_snd_buf;
struct rpc_task_setup task_setup_data = {
.callback_ops = &rpc_default_ops,
- .flags = RPC_TASK_SOFTCONN,
+ .flags = RPC_TASK_SOFTCONN |
+ RPC_TASK_NO_RETRANS_TIMEOUT,
};

dprintk("RPC: rpc_run_bc_task req= %p\n", req);
@@ -1160,7 +1162,6 @@ struct rpc_task *rpc_run_bc_task(struct rpc_rqst *req)
task->tk_action = call_bc_transmit;
atomic_inc(&task->tk_count);
WARN_ON_ONCE(atomic_read(&task->tk_count) != 2);
- xprt_request_enqueue_transmit(task);
rpc_execute(task);

dprintk("RPC: rpc_run_bc_task: task= %p\n", task);
@@ -1680,7 +1681,7 @@ call_allocate(struct rpc_task *task)
dprint_status(task);

task->tk_status = 0;
- task->tk_action = call_bind;
+ task->tk_action = call_encode;

if (req->rq_buffer)
return;
@@ -1724,12 +1725,12 @@ call_allocate(struct rpc_task *task)
static int
rpc_task_need_encode(struct rpc_task *task)
{
- return test_bit(RPC_TASK_NEED_XMIT, &task->tk_runstate) == 0;
+ return test_bit(RPC_TASK_NEED_XMIT, &task->tk_runstate) == 0 &&
+ (!(task->tk_flags & RPC_TASK_SENT) ||
+ !(task->tk_flags & RPC_TASK_NO_RETRANS_TIMEOUT) ||
+ xprt_request_need_retransmit(task));
}

-/*
- * 3. Encode arguments of an RPC call
- */
static void
rpc_xdr_encode(struct rpc_task *task)
{
@@ -1745,6 +1746,7 @@ rpc_xdr_encode(struct rpc_task *task)
xdr_buf_init(&req->rq_rcv_buf,
req->rq_rbuffer,
req->rq_rcvsize);
+ req->rq_bytes_sent = 0;

p = rpc_encode_header(task);
if (p == NULL) {
@@ -1761,6 +1763,34 @@ rpc_xdr_encode(struct rpc_task *task)
task->tk_msg.rpc_argp);
}

+/*
+ * 3. Encode arguments of an RPC call
+ */
+static void
+call_encode(struct rpc_task *task)
+{
+ if (!rpc_task_need_encode(task))
+ goto out;
+ /* Encode here so that rpcsec_gss can use correct sequence number. */
+ rpc_xdr_encode(task);
+ /* Did the encode result in an error condition? */
+ if (task->tk_status != 0) {
+ /* Was the error nonfatal? */
+ if (task->tk_status == -EAGAIN)
+ rpc_delay(task, HZ >> 4);
+ else
+ rpc_exit(task, task->tk_status);
+ return;
+ }
+
+ /* Add task to reply queue before transmission to avoid races */
+ if (rpc_reply_expected(task))
+ xprt_request_enqueue_receive(task);
+ xprt_request_enqueue_transmit(task);
+out:
+ task->tk_action = call_bind;
+}
+
/*
* 4. Get the server port number if not yet set
*/
@@ -1945,24 +1975,8 @@ call_transmit(struct rpc_task *task)
dprint_status(task);

task->tk_action = call_transmit_status;
- /* Encode here so that rpcsec_gss can use correct sequence number. */
- if (rpc_task_need_encode(task)) {
- rpc_xdr_encode(task);
- /* Did the encode result in an error condition? */
- if (task->tk_status != 0) {
- /* Was the error nonfatal? */
- if (task->tk_status == -EAGAIN)
- rpc_delay(task, HZ >> 4);
- else
- rpc_exit(task, task->tk_status);
- return;
- }
- }
-
- /* Add task to reply queue before transmission to avoid races */
- if (rpc_reply_expected(task))
- xprt_request_enqueue_receive(task);
- xprt_request_enqueue_transmit(task);
+ if (!test_bit(RPC_TASK_NEED_XMIT, &task->tk_runstate))
+ return;

if (!xprt_prepare_transmit(task))
return;
@@ -1997,9 +2011,9 @@ call_transmit_status(struct rpc_task *task)
xprt_end_transmit(task);
break;
case -EBADMSG:
- task->tk_action = call_transmit;
- task->tk_status = 0;
xprt_end_transmit(task);
+ task->tk_status = 0;
+ task->tk_action = call_encode;
break;
/*
* Special cases: if we've been waiting on the
@@ -2048,6 +2062,9 @@ call_bc_transmit(struct rpc_task *task)
{
struct rpc_rqst *req = task->tk_rqstp;

+ if (rpc_task_need_encode(task))
+ xprt_request_enqueue_transmit(task);
+
if (!xprt_prepare_transmit(task))
goto out_retry;

@@ -2169,7 +2186,7 @@ call_status(struct rpc_task *task)
case -EPIPE:
case -ENOTCONN:
case -EAGAIN:
- task->tk_action = call_bind;
+ task->tk_action = call_encode;
break;
case -EIO:
/* shutdown or soft timeout */
@@ -2234,7 +2251,7 @@ call_timeout(struct rpc_task *task)
rpcauth_invalcred(task);

retry:
- task->tk_action = call_bind;
+ task->tk_action = call_encode;
task->tk_status = 0;
}

@@ -2278,7 +2295,7 @@ call_decode(struct rpc_task *task)

if (req->rq_rcv_buf.len < 12) {
if (!RPC_IS_SOFT(task)) {
- task->tk_action = call_bind;
+ task->tk_action = call_encode;
goto out_retry;
}
dprintk("RPC: %s: too small RPC reply size (%d bytes)\n",
@@ -2409,7 +2426,7 @@ rpc_verify_header(struct rpc_task *task)
task->tk_garb_retry--;
dprintk("RPC: %5u %s: retry garbled creds\n",
task->tk_pid, __func__);
- task->tk_action = call_bind;
+ task->tk_action = call_encode;
goto out_retry;
case RPC_AUTH_TOOWEAK:
printk(KERN_NOTICE "RPC: server %s requires stronger "
@@ -2478,7 +2495,7 @@ rpc_verify_header(struct rpc_task *task)
task->tk_garb_retry--;
dprintk("RPC: %5u %s: retrying\n",
task->tk_pid, __func__);
- task->tk_action = call_bind;
+ task->tk_action = call_encode;
out_retry:
return ERR_PTR(-EAGAIN);
}
diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index 39a6f6e8ae01..426a3a05e075 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -1057,18 +1057,10 @@ void xprt_request_wait_receive(struct rpc_task *task)
spin_unlock(&xprt->queue_lock);
}

-static bool
-xprt_request_need_transmit(struct rpc_task *task)
-{
- return !(task->tk_flags & RPC_TASK_NO_RETRANS_TIMEOUT) ||
- xprt_request_retransmit_after_disconnect(task);
-}
-
static bool
xprt_request_need_enqueue_transmit(struct rpc_task *task, struct rpc_rqst *req)
{
- return xprt_request_need_transmit(task) &&
- !test_bit(RPC_TASK_NEED_XMIT, &task->tk_runstate);
+ return !test_bit(RPC_TASK_NEED_XMIT, &task->tk_runstate);
}

/**
@@ -1123,6 +1115,18 @@ xprt_request_dequeue_transmit(struct rpc_task *task)
spin_unlock(&xprt->queue_lock);
}

+/**
+ * xprt_request_need_retransmit - Test if a task needs retransmission
+ * @task: pointer to rpc_task
+ *
+ * Test for whether a connection breakage requires the task to retransmit
+ */
+bool
+xprt_request_need_retransmit(struct rpc_task *task)
+{
+ return xprt_request_retransmit_after_disconnect(task);
+}
+
/**
* xprt_prepare_transmit - reserve the transport before sending a request
* @task: RPC task about to send a request
--
2.17.1

2018-09-17 18:31:47

by Trond Myklebust

Subject: [PATCH v3 29/44] SUNRPC: Allow calls to xprt_transmit() to drain the entire transmit queue

Rather than forcing each and every RPC task to grab the socket write
lock in order to send itself, we allow whichever task is holding the
write lock to attempt to drain the entire transmit queue.
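
In outline, the drain loop looks like this (a simplified sketch; the
request being sent is pinned so that it survives dropping the queue
lock around the actual socket send):

	spin_lock(&xprt->queue_lock);
	while (!list_empty(&xprt->xmit_queue)) {
		next = list_first_entry(&xprt->xmit_queue,
				struct rpc_rqst, rq_xmit);
		xprt_pin_rqst(next);
		spin_unlock(&xprt->queue_lock);
		status = xprt_request_transmit(next, task);
		spin_lock(&xprt->queue_lock);
		xprt_unpin_rqst(next);
		if (status < 0)
			break;
	}
	spin_unlock(&xprt->queue_lock);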

Signed-off-by: Trond Myklebust <[email protected]>
---
net/sunrpc/xprt.c | 71 +++++++++++++++++++++++++++++++++++++++--------
1 file changed, 60 insertions(+), 11 deletions(-)

diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index 68974966b2e4..ae1109c7b9b4 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -1223,15 +1223,20 @@ void xprt_end_transmit(struct rpc_task *task)
}

/**
- * xprt_transmit - send an RPC request on a transport
- * @task: controlling RPC task
+ * xprt_request_transmit - send an RPC request on a transport
+ * @req: pointer to request to transmit
+ * @snd_task: RPC task that owns the transport lock
*
- * We have to copy the iovec because sendmsg fiddles with its contents.
+ * This performs the transmission of a single request.
+ * Note that if the request is not the same as snd_task's own request,
+ * then the caller needs to pin it.
+ * Returns '0' on success.
*/
-void xprt_transmit(struct rpc_task *task)
+static int
+xprt_request_transmit(struct rpc_rqst *req, struct rpc_task *snd_task)
{
- struct rpc_rqst *req = task->tk_rqstp;
- struct rpc_xprt *xprt = req->rq_xprt;
+ struct rpc_xprt *xprt = req->rq_xprt;
+ struct rpc_task *task = req->rq_task;
unsigned int connect_cookie;
int is_retrans = RPC_WAS_SENT(task);
int status;
@@ -1239,11 +1244,13 @@ void xprt_transmit(struct rpc_task *task)
dprintk("RPC: %5u xprt_transmit(%u)\n", task->tk_pid, req->rq_slen);

if (!req->rq_bytes_sent) {
- if (xprt_request_data_received(task))
+ if (xprt_request_data_received(task)) {
+ status = 0;
goto out_dequeue;
+ }
/* Verify that our message lies in the RPCSEC_GSS window */
if (rpcauth_xmit_need_reencode(task)) {
- task->tk_status = -EBADMSG;
+ status = -EBADMSG;
goto out_dequeue;
}
}
@@ -1256,12 +1263,11 @@ void xprt_transmit(struct rpc_task *task)
req->rq_ntrans++;

connect_cookie = xprt->connect_cookie;
- status = xprt->ops->send_request(req, task);
+ status = xprt->ops->send_request(req, snd_task);
trace_xprt_transmit(xprt, req->rq_xid, status);
if (status != 0) {
req->rq_ntrans--;
- task->tk_status = status;
- return;
+ return status;
}

if (is_retrans)
@@ -1283,6 +1289,49 @@ void xprt_transmit(struct rpc_task *task)
req->rq_connect_cookie = connect_cookie;
out_dequeue:
xprt_request_dequeue_transmit(task);
+ rpc_wake_up_queued_task_set_status(&xprt->sending, task, status);
+ return status;
+}
+
+/**
+ * xprt_transmit - send an RPC request on a transport
+ * @task: controlling RPC task
+ *
+ * Attempts to drain the transmit queue. On exit, either the transport
+ * signalled an error that needs to be handled before transmission can
+ * resume, or @task finished transmitting, and detected that it already
+ * received a reply.
+ */
+void
+xprt_transmit(struct rpc_task *task)
+{
+ struct rpc_rqst *next, *req = task->tk_rqstp;
+ struct rpc_xprt *xprt = req->rq_xprt;
+ int status;
+
+ spin_lock(&xprt->queue_lock);
+ while (!list_empty(&xprt->xmit_queue)) {
+ next = list_first_entry(&xprt->xmit_queue,
+ struct rpc_rqst, rq_xmit);
+ xprt_pin_rqst(next);
+ spin_unlock(&xprt->queue_lock);
+ status = xprt_request_transmit(next, task);
+ if (status == -EBADMSG && next != req)
+ status = 0;
+ cond_resched();
+ spin_lock(&xprt->queue_lock);
+ xprt_unpin_rqst(next);
+ if (status == 0) {
+ if (!xprt_request_data_received(task) ||
+ test_bit(RPC_TASK_NEED_XMIT, &task->tk_runstate))
+ continue;
+ } else if (!test_bit(RPC_TASK_NEED_XMIT, &task->tk_runstate))
+ rpc_wake_up_queued_task(&xprt->pending, task);
+ else
+ task->tk_status = status;
+ break;
+ }
+ spin_unlock(&xprt->queue_lock);
}

static void xprt_add_backlog(struct rpc_xprt *xprt, struct rpc_task *task)
--
2.17.1

2018-09-17 18:31:45

by Trond Myklebust

Subject: [PATCH v3 27/44] SUNRPC: Support for congestion control when queuing is enabled

Both RDMA and UDP transports require a request to obtain a "congestion
control" credit before it can be transmitted. Right now, this is done
when the request locks the socket. We'd like it to happen when the
request is first transmitted.
In order to support retransmission of requests that already hold such
credits, we also want to ensure that they get queued first, so that we
don't deadlock with requests that have yet to obtain a credit.
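
A congestion-controlled transport then simply tries to take a credit at
the top of its ->send_request() method; a sketch of the pattern
(call_transmit_status() turns the -EBADSLT into a retry):

	/* In the transport's ->send_request() */
	if (!xprt_request_get_cong(xprt, req))
		return -EBADSLT;	/* no credit available yet */

Requests that already hold a credit (i.e. retransmissions) are inserted
at the head of the transmit queue, ahead of any request still waiting
for a credit.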

Signed-off-by: Trond Myklebust <[email protected]>
---
include/linux/sunrpc/xprt.h | 2 +
net/sunrpc/clnt.c | 5 ++
net/sunrpc/xprt.c | 128 +++++++++++++++++++++---------
net/sunrpc/xprtrdma/backchannel.c | 3 +
net/sunrpc/xprtrdma/transport.c | 3 +
net/sunrpc/xprtsock.c | 4 +
6 files changed, 109 insertions(+), 36 deletions(-)

diff --git a/include/linux/sunrpc/xprt.h b/include/linux/sunrpc/xprt.h
index e377620b9744..0d0cc127615e 100644
--- a/include/linux/sunrpc/xprt.h
+++ b/include/linux/sunrpc/xprt.h
@@ -397,6 +397,7 @@ void xprt_complete_rqst(struct rpc_task *task, int copied);
void xprt_pin_rqst(struct rpc_rqst *req);
void xprt_unpin_rqst(struct rpc_rqst *req);
void xprt_release_rqst_cong(struct rpc_task *task);
+bool xprt_request_get_cong(struct rpc_xprt *xprt, struct rpc_rqst *req);
void xprt_disconnect_done(struct rpc_xprt *xprt);
void xprt_force_disconnect(struct rpc_xprt *xprt);
void xprt_conditional_disconnect(struct rpc_xprt *xprt, unsigned int cookie);
@@ -415,6 +416,7 @@ void xprt_unlock_connect(struct rpc_xprt *, void *);
#define XPRT_BINDING (5)
#define XPRT_CLOSING (6)
#define XPRT_CONGESTED (9)
+#define XPRT_CWND_WAIT (10)

static inline void xprt_set_connected(struct rpc_xprt *xprt)
{
diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index 8dc3d33827c4..f03911f84953 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -1996,6 +1996,11 @@ call_transmit_status(struct rpc_task *task)
dprint_status(task);
xprt_end_transmit(task);
break;
+ case -EBADSLT:
+ xprt_end_transmit(task);
+ task->tk_action = call_transmit;
+ task->tk_status = 0;
+ break;
case -EBADMSG:
xprt_end_transmit(task);
task->tk_status = 0;
diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index 3e68f35f71f6..e07a54fbe1e7 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -68,8 +68,6 @@
static void xprt_init(struct rpc_xprt *xprt, struct net *net);
static __be32 xprt_alloc_xid(struct rpc_xprt *xprt);
static void xprt_connect_status(struct rpc_task *task);
-static int __xprt_get_cong(struct rpc_xprt *, struct rpc_task *);
-static void __xprt_put_cong(struct rpc_xprt *, struct rpc_rqst *);
static void xprt_destroy(struct rpc_xprt *xprt);

static DEFINE_SPINLOCK(xprt_list_lock);
@@ -221,6 +219,31 @@ static void xprt_clear_locked(struct rpc_xprt *xprt)
queue_work(xprtiod_workqueue, &xprt->task_cleanup);
}

+static bool
+xprt_need_congestion_window_wait(struct rpc_xprt *xprt)
+{
+ return test_bit(XPRT_CWND_WAIT, &xprt->state);
+}
+
+static void
+xprt_set_congestion_window_wait(struct rpc_xprt *xprt)
+{
+ if (!list_empty(&xprt->xmit_queue)) {
+ /* Peek at head of queue to see if it can make progress */
+ if (list_first_entry(&xprt->xmit_queue, struct rpc_rqst,
+ rq_xmit)->rq_cong)
+ return;
+ }
+ set_bit(XPRT_CWND_WAIT, &xprt->state);
+}
+
+static void
+xprt_test_and_clear_congestion_window_wait(struct rpc_xprt *xprt)
+{
+ if (!RPCXPRT_CONGESTED(xprt))
+ clear_bit(XPRT_CWND_WAIT, &xprt->state);
+}
+
/*
* xprt_reserve_xprt_cong - serialize write access to transports
* @task: task that is requesting access to the transport
@@ -228,6 +251,7 @@ static void xprt_clear_locked(struct rpc_xprt *xprt)
* Same as xprt_reserve_xprt, but Van Jacobson congestion control is
* integrated into the decision of whether a request is allowed to be
* woken up and given access to the transport.
+ * Note that the lock is only granted if we know there are free slots.
*/
int xprt_reserve_xprt_cong(struct rpc_xprt *xprt, struct rpc_task *task)
{
@@ -243,14 +267,12 @@ int xprt_reserve_xprt_cong(struct rpc_xprt *xprt, struct rpc_task *task)
xprt->snd_task = task;
return 1;
}
- if (__xprt_get_cong(xprt, task)) {
+ if (!xprt_need_congestion_window_wait(xprt)) {
xprt->snd_task = task;
return 1;
}
xprt_clear_locked(xprt);
out_sleep:
- if (req)
- __xprt_put_cong(xprt, req);
dprintk("RPC: %5u failed to lock transport %p\n", task->tk_pid, xprt);
task->tk_timeout = 0;
task->tk_status = -EAGAIN;
@@ -294,32 +316,14 @@ static void __xprt_lock_write_next(struct rpc_xprt *xprt)
xprt_clear_locked(xprt);
}

-static bool __xprt_lock_write_cong_func(struct rpc_task *task, void *data)
-{
- struct rpc_xprt *xprt = data;
- struct rpc_rqst *req;
-
- req = task->tk_rqstp;
- if (req == NULL) {
- xprt->snd_task = task;
- return true;
- }
- if (__xprt_get_cong(xprt, task)) {
- xprt->snd_task = task;
- req->rq_ntrans++;
- return true;
- }
- return false;
-}
-
static void __xprt_lock_write_next_cong(struct rpc_xprt *xprt)
{
if (test_and_set_bit(XPRT_LOCKED, &xprt->state))
return;
- if (RPCXPRT_CONGESTED(xprt))
+ if (xprt_need_congestion_window_wait(xprt))
goto out_unlock;
if (rpc_wake_up_first_on_wq(xprtiod_workqueue, &xprt->sending,
- __xprt_lock_write_cong_func, xprt))
+ __xprt_lock_write_func, xprt))
return;
out_unlock:
xprt_clear_locked(xprt);
@@ -370,16 +374,16 @@ static inline void xprt_release_write(struct rpc_xprt *xprt, struct rpc_task *ta
* overflowed. Put the task to sleep if this is the case.
*/
static int
-__xprt_get_cong(struct rpc_xprt *xprt, struct rpc_task *task)
+__xprt_get_cong(struct rpc_xprt *xprt, struct rpc_rqst *req)
{
- struct rpc_rqst *req = task->tk_rqstp;
-
if (req->rq_cong)
return 1;
dprintk("RPC: %5u xprt_cwnd_limited cong = %lu cwnd = %lu\n",
- task->tk_pid, xprt->cong, xprt->cwnd);
- if (RPCXPRT_CONGESTED(xprt))
+ req->rq_task->tk_pid, xprt->cong, xprt->cwnd);
+ if (RPCXPRT_CONGESTED(xprt)) {
+ xprt_set_congestion_window_wait(xprt);
return 0;
+ }
req->rq_cong = 1;
xprt->cong += RPC_CWNDSCALE;
return 1;
@@ -396,9 +400,31 @@ __xprt_put_cong(struct rpc_xprt *xprt, struct rpc_rqst *req)
return;
req->rq_cong = 0;
xprt->cong -= RPC_CWNDSCALE;
+ xprt_test_and_clear_congestion_window_wait(xprt);
__xprt_lock_write_next_cong(xprt);
}

+/**
+ * xprt_request_get_cong - Request congestion control credits
+ * @xprt: pointer to transport
+ * @req: pointer to RPC request
+ *
+ * Useful for transports that require congestion control.
+ */
+bool
+xprt_request_get_cong(struct rpc_xprt *xprt, struct rpc_rqst *req)
+{
+ bool ret = false;
+
+ if (req->rq_cong)
+ return true;
+ spin_lock_bh(&xprt->transport_lock);
+ ret = __xprt_get_cong(xprt, req) != 0;
+ spin_unlock_bh(&xprt->transport_lock);
+ return ret;
+}
+EXPORT_SYMBOL_GPL(xprt_request_get_cong);
+
/**
* xprt_release_rqst_cong - housekeeping when request is complete
* @task: RPC request that recently completed
@@ -413,6 +439,20 @@ void xprt_release_rqst_cong(struct rpc_task *task)
}
EXPORT_SYMBOL_GPL(xprt_release_rqst_cong);

+/*
+ * Clear the congestion window wait flag and wake up the next
+ * entry on xprt->sending
+ */
+static void
+xprt_clear_congestion_window_wait(struct rpc_xprt *xprt)
+{
+ if (test_and_clear_bit(XPRT_CWND_WAIT, &xprt->state)) {
+ spin_lock_bh(&xprt->transport_lock);
+ __xprt_lock_write_next_cong(xprt);
+ spin_unlock_bh(&xprt->transport_lock);
+ }
+}
+
/**
* xprt_adjust_cwnd - adjust transport congestion window
* @xprt: pointer to xprt
@@ -1057,12 +1097,28 @@ xprt_request_enqueue_transmit(struct rpc_task *task)

if (xprt_request_need_enqueue_transmit(task, req)) {
spin_lock(&xprt->queue_lock);
- list_for_each_entry(pos, &xprt->xmit_queue, rq_xmit) {
- if (pos->rq_task->tk_owner != task->tk_owner)
- continue;
- list_add_tail(&req->rq_xmit2, &pos->rq_xmit2);
- INIT_LIST_HEAD(&req->rq_xmit);
- goto out;
+ /*
+ * Requests that carry congestion control credits are added
+ * to the head of the list to avoid starvation issues.
+ */
+ if (req->rq_cong) {
+ xprt_clear_congestion_window_wait(xprt);
+ list_for_each_entry(pos, &xprt->xmit_queue, rq_xmit) {
+ if (pos->rq_cong)
+ continue;
+ /* Note: req is added _before_ pos */
+ list_add_tail(&req->rq_xmit, &pos->rq_xmit);
+ INIT_LIST_HEAD(&req->rq_xmit2);
+ goto out;
+ }
+ } else {
+ list_for_each_entry(pos, &xprt->xmit_queue, rq_xmit) {
+ if (pos->rq_task->tk_owner != task->tk_owner)
+ continue;
+ list_add_tail(&req->rq_xmit2, &pos->rq_xmit2);
+ INIT_LIST_HEAD(&req->rq_xmit);
+ goto out;
+ }
}
list_add_tail(&req->rq_xmit, &xprt->xmit_queue);
INIT_LIST_HEAD(&req->rq_xmit2);
diff --git a/net/sunrpc/xprtrdma/backchannel.c b/net/sunrpc/xprtrdma/backchannel.c
index ed58761e6b23..e7c445cee16f 100644
--- a/net/sunrpc/xprtrdma/backchannel.c
+++ b/net/sunrpc/xprtrdma/backchannel.c
@@ -200,6 +200,9 @@ int xprt_rdma_bc_send_reply(struct rpc_rqst *rqst)
if (!xprt_connected(rqst->rq_xprt))
goto drop_connection;

+ if (!xprt_request_get_cong(rqst->rq_xprt, rqst))
+ return -EBADSLT;
+
rc = rpcrdma_bc_marshal_reply(rqst);
if (rc < 0)
goto failed_marshal;
diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c
index fa684bf4d090..9ff322e53f37 100644
--- a/net/sunrpc/xprtrdma/transport.c
+++ b/net/sunrpc/xprtrdma/transport.c
@@ -721,6 +721,9 @@ xprt_rdma_send_request(struct rpc_rqst *rqst, struct rpc_task *task)
if (!xprt_connected(xprt))
goto drop_connection;

+ if (!xprt_request_get_cong(xprt, rqst))
+ return -EBADSLT;
+
rc = rpcrdma_marshal_req(r_xprt, rqst);
if (rc < 0)
goto failed_marshal;
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index b8143eded4af..8831e84a058a 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -609,6 +609,10 @@ static int xs_udp_send_request(struct rpc_rqst *req, struct rpc_task *task)

if (!xprt_bound(xprt))
return -ENOTCONN;
+
+ if (!xprt_request_get_cong(xprt, req))
+ return -EBADSLT;
+
req->rq_xtime = ktime_get();
status = xs_sendpages(transport->sock, xs_addr(xprt), xprt->addrlen,
xdr, 0, true, &sent);
--
2.17.1

2018-09-17 18:32:00

by Trond Myklebust

Subject: [PATCH v3 43/44] SUNRPC: Clean up xs_udp_data_receive()

Simplify the retry logic.

Signed-off-by: Trond Myklebust <[email protected]>
---
net/sunrpc/xprtsock.c | 17 +++++------------
1 file changed, 5 insertions(+), 12 deletions(-)

diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index 1daa179b7706..175347f62875 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -1341,25 +1341,18 @@ static void xs_udp_data_receive(struct sock_xprt *transport)
struct sock *sk;
int err;

-restart:
mutex_lock(&transport->recv_mutex);
sk = transport->inet;
if (sk == NULL)
goto out;
+ clear_bit(XPRT_SOCK_DATA_READY, &transport->sock_state);
for (;;) {
skb = skb_recv_udp(sk, 0, 1, &err);
- if (skb != NULL) {
- xs_udp_data_read_skb(&transport->xprt, sk, skb);
- consume_skb(skb);
- continue;
- }
- if (!test_and_clear_bit(XPRT_SOCK_DATA_READY, &transport->sock_state))
+ if (skb == NULL)
break;
- if (need_resched()) {
- mutex_unlock(&transport->recv_mutex);
- cond_resched();
- goto restart;
- }
+ xs_udp_data_read_skb(&transport->xprt, sk, skb);
+ consume_skb(skb);
+ cond_resched();
}
out:
mutex_unlock(&transport->recv_mutex);
--
2.17.1

2018-09-17 18:31:59

by Trond Myklebust

Subject: [PATCH v3 42/44] SUNRPC: Allow AF_LOCAL sockets to use the generic stream receive

Signed-off-by: Trond Myklebust <[email protected]>
---
include/linux/sunrpc/xdr.h | 1 -
net/sunrpc/socklib.c | 4 +-
net/sunrpc/xprtsock.c | 137 +++++--------------------------------
3 files changed, 18 insertions(+), 124 deletions(-)

diff --git a/include/linux/sunrpc/xdr.h b/include/linux/sunrpc/xdr.h
index 745587132a87..8815be7cae72 100644
--- a/include/linux/sunrpc/xdr.h
+++ b/include/linux/sunrpc/xdr.h
@@ -185,7 +185,6 @@ struct xdr_skb_reader {

typedef size_t (*xdr_skb_read_actor)(struct xdr_skb_reader *desc, void *to, size_t len);

-size_t xdr_skb_read_bits(struct xdr_skb_reader *desc, void *to, size_t len);
extern int csum_partial_copy_to_xdr(struct xdr_buf *, struct sk_buff *);
extern ssize_t xdr_partial_copy_from_skb(struct xdr_buf *, unsigned int,
struct xdr_skb_reader *, xdr_skb_read_actor);
diff --git a/net/sunrpc/socklib.c b/net/sunrpc/socklib.c
index 08f00a98151f..0e7c0dee7578 100644
--- a/net/sunrpc/socklib.c
+++ b/net/sunrpc/socklib.c
@@ -26,7 +26,8 @@
* Possibly called several times to iterate over an sk_buff and copy
* data out of it.
*/
-size_t xdr_skb_read_bits(struct xdr_skb_reader *desc, void *to, size_t len)
+static size_t
+xdr_skb_read_bits(struct xdr_skb_reader *desc, void *to, size_t len)
{
if (len > desc->count)
len = desc->count;
@@ -36,7 +37,6 @@ size_t xdr_skb_read_bits(struct xdr_skb_reader *desc, void *to, size_t len)
desc->offset += len;
return len;
}
-EXPORT_SYMBOL_GPL(xdr_skb_read_bits);

/**
* xdr_skb_read_and_csum_bits - copy and checksum from skb to buffer
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index 15364e2746bd..1daa179b7706 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -667,6 +667,17 @@ static void xs_stream_data_receive_workfn(struct work_struct *work)
xs_stream_data_receive(transport);
}

+static void
+xs_stream_reset_connect(struct sock_xprt *transport)
+{
+ transport->recv.offset = 0;
+ transport->recv.len = 0;
+ transport->recv.copied = 0;
+ transport->xmit.offset = 0;
+ transport->xprt.stat.connect_count++;
+ transport->xprt.stat.connect_start = jiffies;
+}
+
#define XS_SENDMSG_FLAGS (MSG_DONTWAIT | MSG_NOSIGNAL)

static int xs_send_kvec(struct socket *sock, struct sockaddr *addr, int addrlen, struct kvec *vec, unsigned int base, int more)
@@ -1263,114 +1274,6 @@ static void xs_destroy(struct rpc_xprt *xprt)
module_put(THIS_MODULE);
}

-static int xs_local_copy_to_xdr(struct xdr_buf *xdr, struct sk_buff *skb)
-{
- struct xdr_skb_reader desc = {
- .skb = skb,
- .offset = sizeof(rpc_fraghdr),
- .count = skb->len - sizeof(rpc_fraghdr),
- };
-
- if (xdr_partial_copy_from_skb(xdr, 0, &desc, xdr_skb_read_bits) < 0)
- return -1;
- if (desc.count)
- return -1;
- return 0;
-}
-
-/**
- * xs_local_data_read_skb
- * @xprt: transport
- * @sk: socket
- * @skb: skbuff
- *
- * Currently this assumes we can read the whole reply in a single gulp.
- */
-static void xs_local_data_read_skb(struct rpc_xprt *xprt,
- struct sock *sk,
- struct sk_buff *skb)
-{
- struct rpc_task *task;
- struct rpc_rqst *rovr;
- int repsize, copied;
- u32 _xid;
- __be32 *xp;
-
- repsize = skb->len - sizeof(rpc_fraghdr);
- if (repsize < 4) {
- dprintk("RPC: impossible RPC reply size %d\n", repsize);
- return;
- }
-
- /* Copy the XID from the skb... */
- xp = skb_header_pointer(skb, sizeof(rpc_fraghdr), sizeof(_xid), &_xid);
- if (xp == NULL)
- return;
-
- /* Look up and lock the request corresponding to the given XID */
- spin_lock(&xprt->queue_lock);
- rovr = xprt_lookup_rqst(xprt, *xp);
- if (!rovr)
- goto out_unlock;
- xprt_pin_rqst(rovr);
- spin_unlock(&xprt->queue_lock);
- task = rovr->rq_task;
-
- copied = rovr->rq_private_buf.buflen;
- if (copied > repsize)
- copied = repsize;
-
- if (xs_local_copy_to_xdr(&rovr->rq_private_buf, skb)) {
- dprintk("RPC: sk_buff copy failed\n");
- spin_lock(&xprt->queue_lock);
- goto out_unpin;
- }
-
- spin_lock(&xprt->queue_lock);
- xprt_complete_rqst(task, copied);
-out_unpin:
- xprt_unpin_rqst(rovr);
- out_unlock:
- spin_unlock(&xprt->queue_lock);
-}
-
-static void xs_local_data_receive(struct sock_xprt *transport)
-{
- struct sk_buff *skb;
- struct sock *sk;
- int err;
-
-restart:
- mutex_lock(&transport->recv_mutex);
- sk = transport->inet;
- if (sk == NULL)
- goto out;
- for (;;) {
- skb = skb_recv_datagram(sk, 0, 1, &err);
- if (skb != NULL) {
- xs_local_data_read_skb(&transport->xprt, sk, skb);
- skb_free_datagram(sk, skb);
- continue;
- }
- if (!test_and_clear_bit(XPRT_SOCK_DATA_READY, &transport->sock_state))
- break;
- if (need_resched()) {
- mutex_unlock(&transport->recv_mutex);
- cond_resched();
- goto restart;
- }
- }
-out:
- mutex_unlock(&transport->recv_mutex);
-}
-
-static void xs_local_data_receive_workfn(struct work_struct *work)
-{
- struct sock_xprt *transport =
- container_of(work, struct sock_xprt, recv_worker);
- xs_local_data_receive(transport);
-}
-
/**
* xs_udp_data_read_skb - receive callback for UDP sockets
* @xprt: transport
@@ -1971,11 +1874,8 @@ static int xs_local_finish_connecting(struct rpc_xprt *xprt,
write_unlock_bh(&sk->sk_callback_lock);
}

- transport->xmit.offset = 0;
+ xs_stream_reset_connect(transport);

- /* Tell the socket layer to start connecting... */
- xprt->stat.connect_count++;
- xprt->stat.connect_start = jiffies;
return kernel_connect(sock, xs_addr(xprt), xprt->addrlen, 0);
}

@@ -2332,14 +2232,9 @@ static int xs_tcp_finish_connecting(struct rpc_xprt *xprt, struct socket *sock)
xs_set_memalloc(xprt);

/* Reset TCP record info */
- transport->recv.offset = 0;
- transport->recv.len = 0;
- transport->recv.copied = 0;
- transport->xmit.offset = 0;
+ xs_stream_reset_connect(transport);

/* Tell the socket layer to start connecting... */
- xprt->stat.connect_count++;
- xprt->stat.connect_start = jiffies;
set_bit(XPRT_SOCK_CONNECTING, &transport->sock_state);
ret = kernel_connect(sock, xs_addr(xprt), xprt->addrlen, O_NONBLOCK);
switch (ret) {
@@ -2714,6 +2609,7 @@ static const struct rpc_xprt_ops xs_local_ops = {
.connect = xs_local_connect,
.buf_alloc = rpc_malloc,
.buf_free = rpc_free,
+ .prepare_request = xs_stream_prepare_request,
.send_request = xs_local_send_request,
.set_retrans_timeout = xprt_set_retrans_timeout_def,
.close = xs_close,
@@ -2898,9 +2794,8 @@ static struct rpc_xprt *xs_setup_local(struct xprt_create *args)
xprt->ops = &xs_local_ops;
xprt->timeout = &xs_local_default_timeout;

- INIT_WORK(&transport->recv_worker, xs_local_data_receive_workfn);
- INIT_DELAYED_WORK(&transport->connect_worker,
- xs_dummy_setup_socket);
+ INIT_WORK(&transport->recv_worker, xs_stream_data_receive_workfn);
+ INIT_DELAYED_WORK(&transport->connect_worker, xs_dummy_setup_socket);

switch (sun->sun_family) {
case AF_LOCAL:
--
2.17.1

2018-09-17 18:31:49

by Trond Myklebust

Subject: [PATCH v3 31/44] SUNRPC: Turn off throttling of RPC slots for TCP sockets

The theory was that we would need to grab the socket lock anyway, so we
might as well use it to gate the allocation of RPC slots for a TCP
socket.

Signed-off-by: Trond Myklebust <[email protected]>
---
include/linux/sunrpc/xprt.h | 1 -
net/sunrpc/xprt.c | 14 --------------
net/sunrpc/xprtsock.c | 2 +-
3 files changed, 1 insertion(+), 16 deletions(-)

diff --git a/include/linux/sunrpc/xprt.h b/include/linux/sunrpc/xprt.h
index 0d0cc127615e..14c9b4d49fb4 100644
--- a/include/linux/sunrpc/xprt.h
+++ b/include/linux/sunrpc/xprt.h
@@ -343,7 +343,6 @@ int xprt_reserve_xprt_cong(struct rpc_xprt *xprt, struct rpc_task *task);
void xprt_alloc_slot(struct rpc_xprt *xprt, struct rpc_task *task);
void xprt_free_slot(struct rpc_xprt *xprt,
struct rpc_rqst *req);
-void xprt_lock_and_alloc_slot(struct rpc_xprt *xprt, struct rpc_task *task);
bool xprt_prepare_transmit(struct rpc_task *task);
void xprt_request_enqueue_transmit(struct rpc_task *task);
void xprt_request_enqueue_receive(struct rpc_task *task);
diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index a523e59a074e..6bdc10147297 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -1428,20 +1428,6 @@ void xprt_alloc_slot(struct rpc_xprt *xprt, struct rpc_task *task)
}
EXPORT_SYMBOL_GPL(xprt_alloc_slot);

-void xprt_lock_and_alloc_slot(struct rpc_xprt *xprt, struct rpc_task *task)
-{
- /* Note: grabbing the xprt_lock_write() ensures that we throttle
- * new slot allocation if the transport is congested (i.e. when
- * reconnecting a stream transport or when out of socket write
- * buffer space).
- */
- if (xprt_lock_write(xprt, task)) {
- xprt_alloc_slot(xprt, task);
- xprt_release_write(xprt, task);
- }
-}
-EXPORT_SYMBOL_GPL(xprt_lock_and_alloc_slot);
-
void xprt_free_slot(struct rpc_xprt *xprt, struct rpc_rqst *req)
{
spin_lock(&xprt->reserve_lock);
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index 8831e84a058a..f54e8110f4c6 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -2809,7 +2809,7 @@ static const struct rpc_xprt_ops xs_udp_ops = {
static const struct rpc_xprt_ops xs_tcp_ops = {
.reserve_xprt = xprt_reserve_xprt,
.release_xprt = xprt_release_xprt,
- .alloc_slot = xprt_lock_and_alloc_slot,
+ .alloc_slot = xprt_alloc_slot,
.free_slot = xprt_free_slot,
.rpcbind = rpcb_getport_async,
.set_port = xs_set_port,
--
2.17.1

2018-09-17 18:32:02

by Trond Myklebust

Subject: [PATCH v3 44/44] SUNRPC: Unexport xdr_partial_copy_from_skb()

It is no longer used outside of net/sunrpc/socklib.c.

Signed-off-by: Trond Myklebust <[email protected]>
---
include/linux/sunrpc/xdr.h | 2 --
net/sunrpc/socklib.c | 4 ++--
2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/include/linux/sunrpc/xdr.h b/include/linux/sunrpc/xdr.h
index 8815be7cae72..43106ffa6788 100644
--- a/include/linux/sunrpc/xdr.h
+++ b/include/linux/sunrpc/xdr.h
@@ -186,8 +186,6 @@ struct xdr_skb_reader {
typedef size_t (*xdr_skb_read_actor)(struct xdr_skb_reader *desc, void *to, size_t len);

extern int csum_partial_copy_to_xdr(struct xdr_buf *, struct sk_buff *);
-extern ssize_t xdr_partial_copy_from_skb(struct xdr_buf *, unsigned int,
- struct xdr_skb_reader *, xdr_skb_read_actor);

extern int xdr_encode_word(struct xdr_buf *, unsigned int, u32);
extern int xdr_decode_word(struct xdr_buf *, unsigned int, u32 *);
diff --git a/net/sunrpc/socklib.c b/net/sunrpc/socklib.c
index 0e7c0dee7578..9062967575c4 100644
--- a/net/sunrpc/socklib.c
+++ b/net/sunrpc/socklib.c
@@ -69,7 +69,8 @@ static size_t xdr_skb_read_and_csum_bits(struct xdr_skb_reader *desc, void *to,
* @copy_actor: virtual method for copying data
*
*/
-ssize_t xdr_partial_copy_from_skb(struct xdr_buf *xdr, unsigned int base, struct xdr_skb_reader *desc, xdr_skb_read_actor copy_actor)
+static ssize_t
+xdr_partial_copy_from_skb(struct xdr_buf *xdr, unsigned int base, struct xdr_skb_reader *desc, xdr_skb_read_actor copy_actor)
{
struct page **ppage = xdr->pages;
unsigned int len, pglen = xdr->page_len;
@@ -140,7 +141,6 @@ ssize_t xdr_partial_copy_from_skb(struct xdr_buf *xdr, unsigned int base, struct
out:
return copied;
}
-EXPORT_SYMBOL_GPL(xdr_partial_copy_from_skb);

/**
* csum_partial_copy_to_xdr - checksum and copy data
--
2.17.1

2018-09-17 18:31:33

by Trond Myklebust

Subject: [PATCH v3 17/44] SUNRPC: Minor cleanup for call_transmit()

Signed-off-by: Trond Myklebust <[email protected]>
---
net/sunrpc/clnt.c | 32 +++++++++++++++-----------------
1 file changed, 15 insertions(+), 17 deletions(-)

diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index 775d6e80b6e8..be0f06a8156b 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -1946,9 +1946,7 @@ call_transmit(struct rpc_task *task)

dprint_status(task);

- task->tk_action = call_status;
- if (task->tk_status < 0)
- return;
+ task->tk_action = call_transmit_status;
/* Encode here so that rpcsec_gss can use correct sequence number. */
if (rpc_task_need_encode(task)) {
rpc_xdr_encode(task);
@@ -1969,7 +1967,6 @@ call_transmit(struct rpc_task *task)

if (!xprt_prepare_transmit(task))
return;
- task->tk_action = call_transmit_status;
xprt_transmit(task);
if (task->tk_status < 0)
return;
@@ -1996,19 +1993,29 @@ call_transmit_status(struct rpc_task *task)
}

switch (task->tk_status) {
- case -EAGAIN:
- case -ENOBUFS:
- break;
default:
dprint_status(task);
xprt_end_transmit(task);
break;
+ case -EBADMSG:
+ clear_bit(RPC_TASK_NEED_XMIT, &task->tk_runstate);
+ task->tk_action = call_transmit;
+ task->tk_status = 0;
+ xprt_end_transmit(task);
+ break;
/*
* Special cases: if we've been waiting on the
* socket's write_space() callback, or if the
* socket just returned a connection error,
* then hold onto the transport lock.
*/
+ case -ENOBUFS:
+ rpc_delay(task, HZ>>2);
+ /* fall through */
+ case -EAGAIN:
+ task->tk_action = call_transmit;
+ task->tk_status = 0;
+ break;
case -ECONNREFUSED:
case -EHOSTDOWN:
case -ENETDOWN:
@@ -2163,22 +2170,13 @@ call_status(struct rpc_task *task)
/* fall through */
case -EPIPE:
case -ENOTCONN:
- task->tk_action = call_bind;
- break;
- case -ENOBUFS:
- rpc_delay(task, HZ>>2);
- /* fall through */
case -EAGAIN:
- task->tk_action = call_transmit;
+ task->tk_action = call_bind;
break;
case -EIO:
/* shutdown or soft timeout */
rpc_exit(task, status);
break;
- case -EBADMSG:
- clear_bit(RPC_TASK_NEED_XMIT, &task->tk_runstate);
- task->tk_action = call_transmit;
- break;
default:
if (clnt->cl_chatty)
printk("%s: RPC call returned error %d\n",
--
2.17.1

2018-09-17 18:31:28

by Trond Myklebust

Subject: [PATCH v3 13/44] SUNRPC: Don't wake queued RPC calls multiple times in xprt_transmit

Rather than waking up the entire queue of RPC messages a second time,
just wake up the task that was put to sleep.

Signed-off-by: Trond Myklebust <[email protected]>
---
net/sunrpc/xprt.c | 9 +++------
1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index 649a40cfae6d..3a3b3445a7c0 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -1079,13 +1079,10 @@ void xprt_transmit(struct rpc_task *task)
spin_lock(&xprt->recv_lock);
if (test_bit(RPC_TASK_NEED_RECV, &task->tk_runstate)) {
rpc_sleep_on(&xprt->pending, task, xprt_timer);
- /*
- * Send an extra queue wakeup call if the
- * connection was dropped in case the call to
- * rpc_sleep_on() raced.
- */
+ /* Wake up immediately if the connection was dropped */
if (!xprt_connected(xprt))
- xprt_wake_pending_tasks(xprt, -ENOTCONN);
+ rpc_wake_up_queued_task_set_status(&xprt->pending,
+ task, -ENOTCONN);
}
spin_unlock(&xprt->recv_lock);
}
--
2.17.1

2018-09-17 18:31:44

by Trond Myklebust

Subject: [PATCH v3 26/44] SUNRPC: Improve latency for interactive tasks

One of the intentions with the priority queues was to ensure that no
single process can hog the transport. The field task->tk_owner therefore
identifies the RPC call's origin, and is intended to allow the RPC layer
to organise queues for fairness.
This commit modifies the transmit queue to group requests by
task->tk_owner, and ensures that we round-robin among those groups.
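
The resulting transmit queue is effectively a list of per-owner
sublists; a sketch of the layout:

	/*
	 * xmit_queue links one request per owner through rq_xmit;
	 * further requests from the same owner hang off it via rq_xmit2:
	 *
	 *	xmit_queue:  reqA1 --- reqB1 --- reqC1
	 *	               |         |
	 *	             reqA2     reqB2
	 *	               |
	 *	             reqA3
	 *
	 * Dequeuing reqA1 moves reqA2 to the tail of xmit_queue, so the
	 * owners take turns at the head of the queue.
	 */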

Signed-off-by: Trond Myklebust <[email protected]>
---
include/linux/sunrpc/xprt.h | 1 +
net/sunrpc/xprt.c | 27 ++++++++++++++++++++++++---
2 files changed, 25 insertions(+), 3 deletions(-)

diff --git a/include/linux/sunrpc/xprt.h b/include/linux/sunrpc/xprt.h
index 8c2bb078f00c..e377620b9744 100644
--- a/include/linux/sunrpc/xprt.h
+++ b/include/linux/sunrpc/xprt.h
@@ -89,6 +89,7 @@ struct rpc_rqst {
};

struct list_head rq_xmit; /* Send queue */
+ struct list_head rq_xmit2; /* Send queue */

void *rq_buffer; /* Call XDR encode buffer */
size_t rq_callsize;
diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index 35f5df367591..3e68f35f71f6 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -1052,12 +1052,21 @@ xprt_request_need_enqueue_transmit(struct rpc_task *task, struct rpc_rqst *req)
void
xprt_request_enqueue_transmit(struct rpc_task *task)
{
- struct rpc_rqst *req = task->tk_rqstp;
+ struct rpc_rqst *pos, *req = task->tk_rqstp;
struct rpc_xprt *xprt = req->rq_xprt;

if (xprt_request_need_enqueue_transmit(task, req)) {
spin_lock(&xprt->queue_lock);
+ list_for_each_entry(pos, &xprt->xmit_queue, rq_xmit) {
+ if (pos->rq_task->tk_owner != task->tk_owner)
+ continue;
+ list_add_tail(&req->rq_xmit2, &pos->rq_xmit2);
+ INIT_LIST_HEAD(&req->rq_xmit);
+ goto out;
+ }
list_add_tail(&req->rq_xmit, &xprt->xmit_queue);
+ INIT_LIST_HEAD(&req->rq_xmit2);
+out:
set_bit(RPC_TASK_NEED_XMIT, &task->tk_runstate);
spin_unlock(&xprt->queue_lock);
}
@@ -1073,8 +1082,20 @@ xprt_request_enqueue_transmit(struct rpc_task *task)
static void
xprt_request_dequeue_transmit_locked(struct rpc_task *task)
{
- if (test_and_clear_bit(RPC_TASK_NEED_XMIT, &task->tk_runstate))
- list_del(&task->tk_rqstp->rq_xmit);
+ struct rpc_rqst *req = task->tk_rqstp;
+
+ if (!test_and_clear_bit(RPC_TASK_NEED_XMIT, &task->tk_runstate))
+ return;
+ if (!list_empty(&req->rq_xmit)) {
+ list_del(&req->rq_xmit);
+ if (!list_empty(&req->rq_xmit2)) {
+ struct rpc_rqst *next = list_first_entry(&req->rq_xmit2,
+ struct rpc_rqst, rq_xmit2);
+ list_del(&req->rq_xmit2);
+ list_add_tail(&next->rq_xmit, &next->rq_xprt->xmit_queue);
+ }
+ } else
+ list_del(&req->rq_xmit2);
}

/**
--
2.17.1

2018-09-17 18:31:51

by Trond Myklebust

Subject: [PATCH v3 33/44] SUNRPC: Cleanup: remove the unused 'task' argument from ->send_request()

Signed-off-by: Trond Myklebust <[email protected]>
---
include/linux/sunrpc/xprt.h | 2 +-
net/sunrpc/xprt.c | 2 +-
net/sunrpc/xprtrdma/svc_rdma_backchannel.c | 2 +-
net/sunrpc/xprtrdma/transport.c | 4 ++--
net/sunrpc/xprtsock.c | 11 ++++-------
5 files changed, 9 insertions(+), 12 deletions(-)

diff --git a/include/linux/sunrpc/xprt.h b/include/linux/sunrpc/xprt.h
index 5600242ccbf9..823860cce0bc 100644
--- a/include/linux/sunrpc/xprt.h
+++ b/include/linux/sunrpc/xprt.h
@@ -141,7 +141,7 @@ struct rpc_xprt_ops {
void (*connect)(struct rpc_xprt *xprt, struct rpc_task *task);
int (*buf_alloc)(struct rpc_task *task);
void (*buf_free)(struct rpc_task *task);
- int (*send_request)(struct rpc_rqst *req, struct rpc_task *task);
+ int (*send_request)(struct rpc_rqst *req);
void (*set_retrans_timeout)(struct rpc_task *task);
void (*timer)(struct rpc_xprt *xprt, struct rpc_task *task);
void (*release_request)(struct rpc_task *task);
diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index e4d57f5be5e2..d1ea88b3f9d4 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -1282,7 +1282,7 @@ xprt_request_transmit(struct rpc_rqst *req, struct rpc_task *snd_task)
req->rq_ntrans++;

connect_cookie = xprt->connect_cookie;
- status = xprt->ops->send_request(req, snd_task);
+ status = xprt->ops->send_request(req);
trace_xprt_transmit(xprt, req->rq_xid, status);
if (status != 0) {
req->rq_ntrans--;
diff --git a/net/sunrpc/xprtrdma/svc_rdma_backchannel.c b/net/sunrpc/xprtrdma/svc_rdma_backchannel.c
index 35a8c3aab302..992312504cfd 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_backchannel.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_backchannel.c
@@ -215,7 +215,7 @@ rpcrdma_bc_send_request(struct svcxprt_rdma *rdma, struct rpc_rqst *rqst)
* connection.
*/
static int
-xprt_rdma_bc_send_request(struct rpc_rqst *rqst, struct rpc_task *task)
+xprt_rdma_bc_send_request(struct rpc_rqst *rqst)
{
struct svc_xprt *sxprt = rqst->rq_xprt->bc_xprt;
struct svcxprt_rdma *rdma;
diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c
index 9ff322e53f37..a5a6a4a353f2 100644
--- a/net/sunrpc/xprtrdma/transport.c
+++ b/net/sunrpc/xprtrdma/transport.c
@@ -693,7 +693,7 @@ xprt_rdma_free(struct rpc_task *task)

/**
* xprt_rdma_send_request - marshal and send an RPC request
- * @task: RPC task with an RPC message in rq_snd_buf
+ * @rqst: RPC message in rq_snd_buf
*
* Caller holds the transport's write lock.
*
@@ -706,7 +706,7 @@ xprt_rdma_free(struct rpc_task *task)
* sent. Do not try to send this message again.
*/
static int
-xprt_rdma_send_request(struct rpc_rqst *rqst, struct rpc_task *task)
+xprt_rdma_send_request(struct rpc_rqst *rqst)
{
struct rpc_xprt *xprt = rqst->rq_xprt;
struct rpcrdma_req *req = rpcr_to_rdmar(rqst);
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index ef8d0e81cbda..f16406228ead 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -507,7 +507,6 @@ static inline void xs_encode_stream_record_marker(struct xdr_buf *buf)
/**
* xs_local_send_request - write an RPC request to an AF_LOCAL socket
* @req: pointer to RPC request
- * @task: RPC task that manages the state of an RPC request
*
* Return values:
* 0: The request has been sent
@@ -516,7 +515,7 @@ static inline void xs_encode_stream_record_marker(struct xdr_buf *buf)
* ENOTCONN: Caller needs to invoke connect logic then call again
* other: Some other error occurred, the request was not sent
*/
-static int xs_local_send_request(struct rpc_rqst *req, struct rpc_task *task)
+static int xs_local_send_request(struct rpc_rqst *req)
{
struct rpc_xprt *xprt = req->rq_xprt;
struct sock_xprt *transport =
@@ -579,7 +578,6 @@ static int xs_local_send_request(struct rpc_rqst *req, struct rpc_task *task)
/**
* xs_udp_send_request - write an RPC request to a UDP socket
* @req: pointer to RPC request
- * @task: address of RPC task that manages the state of an RPC request
*
* Return values:
* 0: The request has been sent
@@ -588,7 +586,7 @@ static int xs_local_send_request(struct rpc_rqst *req, struct rpc_task *task)
* ENOTCONN: Caller needs to invoke connect logic then call again
* other: Some other error occurred, the request was not sent
*/
-static int xs_udp_send_request(struct rpc_rqst *req, struct rpc_task *task)
+static int xs_udp_send_request(struct rpc_rqst *req)
{
struct rpc_xprt *xprt = req->rq_xprt;
struct sock_xprt *transport = container_of(xprt, struct sock_xprt, xprt);
@@ -656,7 +654,6 @@ static int xs_udp_send_request(struct rpc_rqst *req, struct rpc_task *task)
/**
* xs_tcp_send_request - write an RPC request to a TCP socket
* @req: pointer to RPC request
- * @task: address of RPC task that manages the state of an RPC request
*
* Return values:
* 0: The request has been sent
@@ -668,7 +665,7 @@ static int xs_udp_send_request(struct rpc_rqst *req, struct rpc_task *task)
* XXX: In the case of soft timeouts, should we eventually give up
* if sendmsg is not able to make progress?
*/
-static int xs_tcp_send_request(struct rpc_rqst *req, struct rpc_task *task)
+static int xs_tcp_send_request(struct rpc_rqst *req)
{
struct rpc_xprt *xprt = req->rq_xprt;
struct sock_xprt *transport = container_of(xprt, struct sock_xprt, xprt);
@@ -2704,7 +2701,7 @@ static int bc_sendto(struct rpc_rqst *req)
/*
* The send routine. Borrows from svc_send
*/
-static int bc_send_request(struct rpc_rqst *req, struct rpc_task *task)
+static int bc_send_request(struct rpc_rqst *req)
{
struct svc_xprt *xprt;
int len;
--
2.17.1

2018-09-17 18:31:38

by Trond Myklebust

[permalink] [raw]
Subject: [PATCH v3 21/44] SUNRPC: Fix up the back channel transmit

Fix up the back channel code to recognise that the request has already
been transmitted, and so does not need to be transmitted again.
Also ensure that we set req->rq_task.

Signed-off-by: Trond Myklebust <[email protected]>
---
include/linux/sunrpc/bc_xprt.h | 1 +
net/sunrpc/clnt.c | 19 +++++--------------
net/sunrpc/xprt.c | 27 ++++++++++++++++++++++++++-
3 files changed, 32 insertions(+), 15 deletions(-)

diff --git a/include/linux/sunrpc/bc_xprt.h b/include/linux/sunrpc/bc_xprt.h
index 4397a4824c81..28721cf73ec3 100644
--- a/include/linux/sunrpc/bc_xprt.h
+++ b/include/linux/sunrpc/bc_xprt.h
@@ -34,6 +34,7 @@ SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#ifdef CONFIG_SUNRPC_BACKCHANNEL
struct rpc_rqst *xprt_lookup_bc_request(struct rpc_xprt *xprt, __be32 xid);
void xprt_complete_bc_request(struct rpc_rqst *req, uint32_t copied);
+void xprt_init_bc_request(struct rpc_rqst *req, struct rpc_task *task);
void xprt_free_bc_request(struct rpc_rqst *req);
int xprt_setup_backchannel(struct rpc_xprt *, unsigned int min_reqs);
void xprt_destroy_backchannel(struct rpc_xprt *, unsigned int max_reqs);
diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index 64159716be30..dcefbf406482 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -1138,7 +1138,6 @@ EXPORT_SYMBOL_GPL(rpc_call_async);
struct rpc_task *rpc_run_bc_task(struct rpc_rqst *req)
{
struct rpc_task *task;
- struct xdr_buf *xbufp = &req->rq_snd_buf;
struct rpc_task_setup task_setup_data = {
.callback_ops = &rpc_default_ops,
.flags = RPC_TASK_SOFTCONN |
@@ -1150,14 +1149,7 @@ struct rpc_task *rpc_run_bc_task(struct rpc_rqst *req)
* Create an rpc_task to send the data
*/
task = rpc_new_task(&task_setup_data);
- task->tk_rqstp = req;
-
- /*
- * Set up the xdr_buf length.
- * This also indicates that the buffer is XDR encoded already.
- */
- xbufp->len = xbufp->head[0].iov_len + xbufp->page_len +
- xbufp->tail[0].iov_len;
+ xprt_init_bc_request(req, task);

task->tk_action = call_bc_transmit;
atomic_inc(&task->tk_count);
@@ -2064,6 +2056,8 @@ call_bc_transmit(struct rpc_task *task)

if (rpc_task_need_encode(task))
xprt_request_enqueue_transmit(task);
+ if (!test_bit(RPC_TASK_NEED_XMIT, &task->tk_runstate))
+ goto out_wakeup;

if (!xprt_prepare_transmit(task))
goto out_retry;
@@ -2073,13 +2067,11 @@ call_bc_transmit(struct rpc_task *task)
"error: %d\n", task->tk_status);
goto out_done;
}
- if (req->rq_connect_cookie != req->rq_xprt->connect_cookie)
- req->rq_bytes_sent = 0;

xprt_transmit(task);

if (task->tk_status == -EAGAIN)
- goto out_nospace;
+ goto out_retry;

xprt_end_transmit(task);
dprint_status(task);
@@ -2119,12 +2111,11 @@ call_bc_transmit(struct rpc_task *task)
"error: %d\n", task->tk_status);
break;
}
+out_wakeup:
rpc_wake_up_queued_task(&req->rq_xprt->pending, task);
out_done:
task->tk_action = rpc_exit_task;
return;
-out_nospace:
- req->rq_connect_cookie = req->rq_xprt->connect_cookie;
out_retry:
task->tk_status = 0;
}
diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index 426a3a05e075..d418bd4db7ff 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -1389,6 +1389,12 @@ void xprt_free(struct rpc_xprt *xprt)
}
EXPORT_SYMBOL_GPL(xprt_free);

+static void
+xprt_init_connect_cookie(struct rpc_rqst *req, struct rpc_xprt *xprt)
+{
+ req->rq_connect_cookie = xprt_connect_cookie(xprt) - 1;
+}
+
static __be32
xprt_alloc_xid(struct rpc_xprt *xprt)
{
@@ -1417,7 +1423,7 @@ xprt_request_init(struct rpc_task *task)
req->rq_xprt = xprt;
req->rq_buffer = NULL;
req->rq_xid = xprt_alloc_xid(xprt);
- req->rq_connect_cookie = xprt_connect_cookie(xprt) - 1;
+ xprt_init_connect_cookie(req, xprt);
req->rq_bytes_sent = 0;
req->rq_snd_buf.len = 0;
req->rq_snd_buf.buflen = 0;
@@ -1551,6 +1557,25 @@ void xprt_release(struct rpc_task *task)
xprt_free_bc_request(req);
}

+#ifdef CONFIG_SUNRPC_BACKCHANNEL
+void
+xprt_init_bc_request(struct rpc_rqst *req, struct rpc_task *task)
+{
+ struct xdr_buf *xbufp = &req->rq_snd_buf;
+
+ task->tk_rqstp = req;
+ req->rq_task = task;
+ xprt_init_connect_cookie(req, req->rq_xprt);
+ /*
+ * Set up the xdr_buf length.
+ * This also indicates that the buffer is XDR encoded already.
+ */
+ xbufp->len = xbufp->head[0].iov_len + xbufp->page_len +
+ xbufp->tail[0].iov_len;
+ req->rq_bytes_sent = 0;
+}
+#endif
+
static void xprt_init(struct rpc_xprt *xprt, struct net *net)
{
kref_init(&xprt->kref);
--
2.17.1

2018-09-17 18:31:45

by Trond Myklebust

[permalink] [raw]
Subject: [PATCH v3 28/44] SUNRPC: Enqueue swapper tagged RPCs at the head of the transmit queue

Avoid memory starvation by giving RPCs that are tagged with the
RPC_TASK_SWAPPER flag the highest priority.
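
As an illustration (a hypothetical caller, not part of this patch), a
task issued for swap I/O carries the RPC_TASK_SWAPPER flag from its
setup, and the hunk below then queues its request ahead of any queued
request that has not yet started transmitting:

	struct rpc_task_setup task_setup_data = {
		.rpc_client = clnt,			/* hypothetical */
		.callback_ops = &swap_io_ops,		/* hypothetical */
		.flags = RPC_TASK_SWAPPER | RPC_TASK_ASYNC,
	};
	task = rpc_run_task(&task_setup_data);
	/* xprt_request_enqueue_transmit() will insert task->tk_rqstp
	 * before the first entry with rq_bytes_sent == 0 that is
	 * neither congestion-blocked nor itself a swapper request. */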

Signed-off-by: Trond Myklebust <[email protected]>
---
net/sunrpc/xprt.c | 11 +++++++++++
1 file changed, 11 insertions(+)

diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index e07a54fbe1e7..68974966b2e4 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -1111,6 +1111,17 @@ xprt_request_enqueue_transmit(struct rpc_task *task)
INIT_LIST_HEAD(&req->rq_xmit2);
goto out;
}
+ } else if (RPC_IS_SWAPPER(task)) {
+ list_for_each_entry(pos, &xprt->xmit_queue, rq_xmit) {
+ if (pos->rq_cong || pos->rq_bytes_sent)
+ continue;
+ if (RPC_IS_SWAPPER(pos->rq_task))
+ continue;
+ /* Note: req is added _before_ pos */
+ list_add_tail(&req->rq_xmit, &pos->rq_xmit);
+ INIT_LIST_HEAD(&req->rq_xmit2);
+ goto out;
+ }
} else {
list_for_each_entry(pos, &xprt->xmit_queue, rq_xmit) {
if (pos->rq_task->tk_owner != task->tk_owner)
--
2.17.1

2018-09-17 18:31:58

by Trond Myklebust

[permalink] [raw]
Subject: [PATCH v3 41/44] SUNRPC: Clean up - rename xs_tcp_data_receive() to xs_stream_data_receive()

In preparation for sharing with AF_LOCAL.

Signed-off-by: Trond Myklebust <[email protected]>
---
include/trace/events/sunrpc.h | 16 ++++----
net/sunrpc/xprtsock.c | 71 +++++++++++++++--------------------
2 files changed, 38 insertions(+), 49 deletions(-)

diff --git a/include/trace/events/sunrpc.h b/include/trace/events/sunrpc.h
index 19e08d12696c..28e384186c35 100644
--- a/include/trace/events/sunrpc.h
+++ b/include/trace/events/sunrpc.h
@@ -470,14 +470,14 @@ TRACE_EVENT(xprt_ping,
__get_str(addr), __get_str(port), __entry->status)
);

-TRACE_EVENT(xs_tcp_data_ready,
- TP_PROTO(struct rpc_xprt *xprt, int err, unsigned int total),
+TRACE_EVENT(xs_stream_read_data,
+ TP_PROTO(struct rpc_xprt *xprt, ssize_t err, size_t total),

TP_ARGS(xprt, err, total),

TP_STRUCT__entry(
- __field(int, err)
- __field(unsigned int, total)
+ __field(ssize_t, err)
+ __field(size_t, total)
__string(addr, xprt ? xprt->address_strings[RPC_DISPLAY_ADDR] :
"(null)")
__string(port, xprt ? xprt->address_strings[RPC_DISPLAY_PORT] :
@@ -493,11 +493,11 @@ TRACE_EVENT(xs_tcp_data_ready,
xprt->address_strings[RPC_DISPLAY_PORT] : "(null)");
),

- TP_printk("peer=[%s]:%s err=%d total=%u", __get_str(addr),
+ TP_printk("peer=[%s]:%s err=%zd total=%zu", __get_str(addr),
__get_str(port), __entry->err, __entry->total)
);

-TRACE_EVENT(xs_tcp_data_recv,
+TRACE_EVENT(xs_stream_read_request,
TP_PROTO(struct sock_xprt *xs),

TP_ARGS(xs),
@@ -508,7 +508,7 @@ TRACE_EVENT(xs_tcp_data_recv,
__field(u32, xid)
__field(unsigned long, copied)
__field(unsigned int, reclen)
- __field(unsigned long, offset)
+ __field(unsigned int, offset)
),

TP_fast_assign(
@@ -520,7 +520,7 @@ TRACE_EVENT(xs_tcp_data_recv,
__entry->offset = xs->recv.offset;
),

- TP_printk("peer=[%s]:%s xid=0x%08x copied=%lu reclen=%u offset=%lu",
+ TP_printk("peer=[%s]:%s xid=0x%08x copied=%lu reclen=%u offset=%u",
__get_str(addr), __get_str(port), __entry->xid,
__entry->copied, __entry->reclen, __entry->offset)
);
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index 5269ad98bb08..15364e2746bd 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -623,7 +623,7 @@ xs_read_stream(struct sock_xprt *transport, int flags)
read += ret;
}
if (xs_read_stream_request_done(transport)) {
- trace_xs_tcp_data_recv(transport);
+ trace_xs_stream_read_request(transport);
transport->recv.copied = 0;
}
transport->recv.offset = 0;
@@ -639,6 +639,34 @@ xs_read_stream(struct sock_xprt *transport, int flags)
return ret;
}

+static void xs_stream_data_receive(struct sock_xprt *transport)
+{
+ size_t read = 0;
+ ssize_t ret = 0;
+
+ mutex_lock(&transport->recv_mutex);
+ if (transport->sock == NULL)
+ goto out;
+ clear_bit(XPRT_SOCK_DATA_READY, &transport->sock_state);
+ for (;;) {
+ ret = xs_read_stream(transport, MSG_DONTWAIT | MSG_NOSIGNAL);
+ if (ret <= 0)
+ break;
+ read += ret;
+ cond_resched();
+ }
+out:
+ mutex_unlock(&transport->recv_mutex);
+ trace_xs_stream_read_data(&transport->xprt, ret, read);
+}
+
+static void xs_stream_data_receive_workfn(struct work_struct *work)
+{
+ struct sock_xprt *transport =
+ container_of(work, struct sock_xprt, recv_worker);
+ xs_stream_data_receive(transport);
+}
+
#define XS_SENDMSG_FLAGS (MSG_DONTWAIT | MSG_NOSIGNAL)

static int xs_send_kvec(struct socket *sock, struct sockaddr *addr, int addrlen, struct kvec *vec, unsigned int base, int more)
@@ -1495,45 +1523,6 @@ static size_t xs_tcp_bc_maxpayload(struct rpc_xprt *xprt)
}
#endif /* CONFIG_SUNRPC_BACKCHANNEL */

-static void xs_tcp_data_receive(struct sock_xprt *transport)
-{
- struct rpc_xprt *xprt = &transport->xprt;
- struct sock *sk;
- size_t read = 0;
- ssize_t ret = 0;
-
-restart:
- mutex_lock(&transport->recv_mutex);
- sk = transport->inet;
- if (sk == NULL)
- goto out;
-
- for (;;) {
- clear_bit(XPRT_SOCK_DATA_READY, &transport->sock_state);
- ret = xs_read_stream(transport, MSG_DONTWAIT | MSG_NOSIGNAL);
- if (ret < 0)
- break;
- read += ret;
- if (need_resched()) {
- mutex_unlock(&transport->recv_mutex);
- cond_resched();
- goto restart;
- }
- }
- if (test_bit(XPRT_SOCK_DATA_READY, &transport->sock_state))
- queue_work(xprtiod_workqueue, &transport->recv_worker);
-out:
- mutex_unlock(&transport->recv_mutex);
- trace_xs_tcp_data_ready(xprt, ret, read);
-}
-
-static void xs_tcp_data_receive_workfn(struct work_struct *work)
-{
- struct sock_xprt *transport =
- container_of(work, struct sock_xprt, recv_worker);
- xs_tcp_data_receive(transport);
-}
-
/**
* xs_tcp_state_change - callback to handle TCP socket state changes
* @sk: socket whose state has changed
@@ -3063,7 +3052,7 @@ static struct rpc_xprt *xs_setup_tcp(struct xprt_create *args)
xprt->connect_timeout = xprt->timeout->to_initval *
(xprt->timeout->to_retries + 1);

- INIT_WORK(&transport->recv_worker, xs_tcp_data_receive_workfn);
+ INIT_WORK(&transport->recv_worker, xs_stream_data_receive_workfn);
INIT_DELAYED_WORK(&transport->connect_worker, xs_tcp_setup_socket);

switch (addr->sa_family) {
--
2.17.1

2018-09-17 18:31:52

by Trond Myklebust

[permalink] [raw]
Subject: [PATCH v3 34/44] SUNRPC: Don't take transport->lock unnecessarily when taking XPRT_LOCK

Signed-off-by: Trond Myklebust <[email protected]>
---
net/sunrpc/xprt.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index d1ea88b3f9d4..a1cb28a4adad 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -298,6 +298,8 @@ static inline int xprt_lock_write(struct rpc_xprt *xprt, struct rpc_task *task)
{
int retval;

+ if (test_bit(XPRT_LOCKED, &xprt->state) && xprt->snd_task == task)
+ return 1;
spin_lock_bh(&xprt->transport_lock);
retval = xprt->ops->reserve_xprt(xprt, task);
spin_unlock_bh(&xprt->transport_lock);
@@ -375,6 +377,8 @@ EXPORT_SYMBOL_GPL(xprt_release_xprt_cong);

static inline void xprt_release_write(struct rpc_xprt *xprt, struct rpc_task *task)
{
+ if (xprt->snd_task != task)
+ return;
spin_lock_bh(&xprt->transport_lock);
xprt->ops->release_xprt(xprt, task);
spin_unlock_bh(&xprt->transport_lock);
@@ -1644,8 +1648,7 @@ void xprt_release(struct rpc_task *task)
if (req == NULL) {
if (task->tk_client) {
xprt = task->tk_xprt;
- if (xprt->snd_task == task)
- xprt_release_write(xprt, task);
+ xprt_release_write(xprt, task);
}
return;
}
--
2.17.1

2018-09-17 18:31:36

by Trond Myklebust

[permalink] [raw]
Subject: [PATCH v3 19/44] SUNRPC: Add a transmission queue for RPC requests

Add the queue that will enforce the ordering of RPC task transmission.
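
For orientation, the intended life cycle of a request on this queue,
condensed from the calls added below (locking and error handling
elided), looks roughly like this:

	/* in call_transmit() */
	xprt_request_enqueue_transmit(task);	/* set RPC_TASK_NEED_XMIT and
						 * link req->rq_xmit */
	if (xprt_prepare_transmit(task))
		xprt_transmit(task);		/* send, then fall through to
						 * xprt_request_dequeue_transmit() */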

Signed-off-by: Trond Myklebust <[email protected]>
---
include/linux/sunrpc/xprt.h | 6 +++
net/sunrpc/clnt.c | 6 +--
net/sunrpc/xprt.c | 84 +++++++++++++++++++++++++++++++++----
3 files changed, 83 insertions(+), 13 deletions(-)

diff --git a/include/linux/sunrpc/xprt.h b/include/linux/sunrpc/xprt.h
index 9cec2d0811f2..81a6c2c8dfc7 100644
--- a/include/linux/sunrpc/xprt.h
+++ b/include/linux/sunrpc/xprt.h
@@ -88,6 +88,8 @@ struct rpc_rqst {
struct list_head rq_recv; /* Receive queue */
};

+ struct list_head rq_xmit; /* Send queue */
+
void *rq_buffer; /* Call XDR encode buffer */
size_t rq_callsize;
void *rq_rbuffer; /* Reply XDR decode buffer */
@@ -242,6 +244,9 @@ struct rpc_xprt {
spinlock_t queue_lock; /* send/receive queue lock */
u32 xid; /* Next XID value to use */
struct rpc_task * snd_task; /* Task blocked in send */
+
+ struct list_head xmit_queue; /* Send queue */
+
struct svc_xprt *bc_xprt; /* NFSv4.1 backchannel */
#if defined(CONFIG_SUNRPC_BACKCHANNEL)
struct svc_serv *bc_serv; /* The RPC service which will */
@@ -339,6 +344,7 @@ void xprt_free_slot(struct rpc_xprt *xprt,
struct rpc_rqst *req);
void xprt_lock_and_alloc_slot(struct rpc_xprt *xprt, struct rpc_task *task);
bool xprt_prepare_transmit(struct rpc_task *task);
+void xprt_request_enqueue_transmit(struct rpc_task *task);
void xprt_request_enqueue_receive(struct rpc_task *task);
void xprt_request_wait_receive(struct rpc_task *task);
void xprt_transmit(struct rpc_task *task);
diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index be0f06a8156b..c1a19a3e1356 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -1156,11 +1156,11 @@ struct rpc_task *rpc_run_bc_task(struct rpc_rqst *req)
*/
xbufp->len = xbufp->head[0].iov_len + xbufp->page_len +
xbufp->tail[0].iov_len;
- set_bit(RPC_TASK_NEED_XMIT, &task->tk_runstate);

task->tk_action = call_bc_transmit;
atomic_inc(&task->tk_count);
WARN_ON_ONCE(atomic_read(&task->tk_count) != 2);
+ xprt_request_enqueue_transmit(task);
rpc_execute(task);

dprintk("RPC: rpc_run_bc_task: task= %p\n", task);
@@ -1759,8 +1759,6 @@ rpc_xdr_encode(struct rpc_task *task)

task->tk_status = rpcauth_wrap_req(task, encode, req, p,
task->tk_msg.rpc_argp);
- if (task->tk_status == 0)
- set_bit(RPC_TASK_NEED_XMIT, &task->tk_runstate);
}

/*
@@ -1964,6 +1962,7 @@ call_transmit(struct rpc_task *task)
/* Add task to reply queue before transmission to avoid races */
if (rpc_reply_expected(task))
xprt_request_enqueue_receive(task);
+ xprt_request_enqueue_transmit(task);

if (!xprt_prepare_transmit(task))
return;
@@ -1998,7 +1997,6 @@ call_transmit_status(struct rpc_task *task)
xprt_end_transmit(task);
break;
case -EBADMSG:
- clear_bit(RPC_TASK_NEED_XMIT, &task->tk_runstate);
task->tk_action = call_transmit;
task->tk_status = 0;
xprt_end_transmit(task);
diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index b242a1c78f8a..39a6f6e8ae01 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -1057,6 +1057,72 @@ void xprt_request_wait_receive(struct rpc_task *task)
spin_unlock(&xprt->queue_lock);
}

+static bool
+xprt_request_need_transmit(struct rpc_task *task)
+{
+ return !(task->tk_flags & RPC_TASK_NO_RETRANS_TIMEOUT) ||
+ xprt_request_retransmit_after_disconnect(task);
+}
+
+static bool
+xprt_request_need_enqueue_transmit(struct rpc_task *task, struct rpc_rqst *req)
+{
+ return xprt_request_need_transmit(task) &&
+ !test_bit(RPC_TASK_NEED_XMIT, &task->tk_runstate);
+}
+
+/**
+ * xprt_request_enqueue_transmit - queue a task for transmission
+ * @task: pointer to rpc_task
+ *
+ * Add a task to the transmission queue.
+ */
+void
+xprt_request_enqueue_transmit(struct rpc_task *task)
+{
+ struct rpc_rqst *req = task->tk_rqstp;
+ struct rpc_xprt *xprt = req->rq_xprt;
+
+ if (xprt_request_need_enqueue_transmit(task, req)) {
+ spin_lock(&xprt->queue_lock);
+ list_add_tail(&req->rq_xmit, &xprt->xmit_queue);
+ set_bit(RPC_TASK_NEED_XMIT, &task->tk_runstate);
+ spin_unlock(&xprt->queue_lock);
+ }
+}
+
+/**
+ * xprt_request_dequeue_transmit_locked - remove a task from the transmission queue
+ * @task: pointer to rpc_task
+ *
+ * Remove a task from the transmission queue
+ * Caller must hold xprt->queue_lock
+ */
+static void
+xprt_request_dequeue_transmit_locked(struct rpc_task *task)
+{
+ xprt_task_clear_bytes_sent(task);
+ if (test_and_clear_bit(RPC_TASK_NEED_XMIT, &task->tk_runstate))
+ list_del(&task->tk_rqstp->rq_xmit);
+}
+
+/**
+ * xprt_request_dequeue_transmit - remove a task from the transmission queue
+ * @task: pointer to rpc_task
+ *
+ * Remove a task from the transmission queue
+ */
+static void
+xprt_request_dequeue_transmit(struct rpc_task *task)
+{
+ struct rpc_rqst *req = task->tk_rqstp;
+ struct rpc_xprt *xprt = req->rq_xprt;
+
+ spin_lock(&xprt->queue_lock);
+ xprt_request_dequeue_transmit_locked(task);
+ spin_unlock(&xprt->queue_lock);
+}
+
/**
* xprt_prepare_transmit - reserve the transport before sending a request
* @task: RPC task about to send a request
@@ -1076,12 +1142,8 @@ bool xprt_prepare_transmit(struct rpc_task *task)
task->tk_status = req->rq_reply_bytes_recvd;
goto out_unlock;
}
- if ((task->tk_flags & RPC_TASK_NO_RETRANS_TIMEOUT) &&
- !xprt_request_retransmit_after_disconnect(task)) {
- xprt->ops->set_retrans_timeout(task);
- rpc_sleep_on(&xprt->pending, task, xprt_timer);
+ if (!test_bit(RPC_TASK_NEED_XMIT, &task->tk_runstate))
goto out_unlock;
- }
}
if (!xprt->ops->reserve_xprt(xprt, task)) {
task->tk_status = -EAGAIN;
@@ -1115,11 +1177,11 @@ void xprt_transmit(struct rpc_task *task)

if (!req->rq_bytes_sent) {
if (xprt_request_data_received(task))
- return;
+ goto out_dequeue;
/* Verify that our message lies in the RPCSEC_GSS window */
if (rpcauth_xmit_need_reencode(task)) {
task->tk_status = -EBADMSG;
- return;
+ goto out_dequeue;
}
}

@@ -1134,7 +1196,6 @@ void xprt_transmit(struct rpc_task *task)
xprt_inject_disconnect(xprt);

dprintk("RPC: %5u xmit complete\n", task->tk_pid);
- clear_bit(RPC_TASK_NEED_XMIT, &task->tk_runstate);
task->tk_flags |= RPC_TASK_SENT;
spin_lock_bh(&xprt->transport_lock);

@@ -1146,6 +1207,8 @@ void xprt_transmit(struct rpc_task *task)
spin_unlock_bh(&xprt->transport_lock);

req->rq_connect_cookie = connect_cookie;
+out_dequeue:
+ xprt_request_dequeue_transmit(task);
}

static void xprt_add_backlog(struct rpc_xprt *xprt, struct rpc_task *task)
@@ -1419,9 +1482,11 @@ xprt_request_dequeue_all(struct rpc_task *task, struct rpc_rqst *req)
{
struct rpc_xprt *xprt = req->rq_xprt;

- if (test_bit(RPC_TASK_NEED_RECV, &task->tk_runstate) ||
+ if (test_bit(RPC_TASK_NEED_XMIT, &task->tk_runstate) ||
+ test_bit(RPC_TASK_NEED_RECV, &task->tk_runstate) ||
xprt_is_pinned_rqst(req)) {
spin_lock(&xprt->queue_lock);
+ xprt_request_dequeue_transmit_locked(task);
xprt_request_dequeue_receive_locked(task);
while (xprt_is_pinned_rqst(req)) {
set_bit(RPC_TASK_MSG_PIN_WAIT, &task->tk_runstate);
@@ -1492,6 +1557,7 @@ static void xprt_init(struct rpc_xprt *xprt, struct net *net)

INIT_LIST_HEAD(&xprt->free);
INIT_LIST_HEAD(&xprt->recv_queue);
+ INIT_LIST_HEAD(&xprt->xmit_queue);
#if defined(CONFIG_SUNRPC_BACKCHANNEL)
spin_lock_init(&xprt->bc_pa_lock);
INIT_LIST_HEAD(&xprt->bc_pa_list);
--
2.17.1

2018-09-17 18:31:41

by Trond Myklebust

[permalink] [raw]
Subject: [PATCH v3 24/44] SUNRPC: Simplify xprt_prepare_transmit()

Remove the checks for whether or not we need to transmit, and whether
or not a reply has been received. Those are already handled in
call_transmit() itself.

Signed-off-by: Trond Myklebust <[email protected]>
---
net/sunrpc/xprt.c | 23 +++++++----------------
1 file changed, 7 insertions(+), 16 deletions(-)

diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index 3b31830ef851..385ee9f64353 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -1122,27 +1122,18 @@ bool xprt_prepare_transmit(struct rpc_task *task)
{
struct rpc_rqst *req = task->tk_rqstp;
struct rpc_xprt *xprt = req->rq_xprt;
- bool ret = false;

dprintk("RPC: %5u xprt_prepare_transmit\n", task->tk_pid);

- spin_lock_bh(&xprt->transport_lock);
- if (!req->rq_bytes_sent) {
- if (req->rq_reply_bytes_recvd) {
- task->tk_status = req->rq_reply_bytes_recvd;
- goto out_unlock;
- }
+ if (!xprt_lock_write(xprt, task)) {
+ /* Race breaker: someone may have transmitted us */
if (!test_bit(RPC_TASK_NEED_XMIT, &task->tk_runstate))
- goto out_unlock;
- }
- if (!xprt->ops->reserve_xprt(xprt, task)) {
- task->tk_status = -EAGAIN;
- goto out_unlock;
+ rpc_wake_up_queued_task_set_status(&xprt->sending,
+ task, 0);
+ return false;
+
}
- ret = true;
-out_unlock:
- spin_unlock_bh(&xprt->transport_lock);
- return ret;
+ return true;
}

void xprt_end_transmit(struct rpc_task *task)
--
2.17.1

2018-09-17 18:31:40

by Trond Myklebust

[permalink] [raw]
Subject: [PATCH v3 23/44] SUNRPC: Don't reset the request 'bytes_sent' counter when releasing XPRT_LOCK

If the request is still on the transmit queue, resetting its
'bytes_sent' counter when releasing the XPRT_LOCK would be incorrect
behaviour.

Signed-off-by: Trond Myklebust <[email protected]>
---
net/sunrpc/clnt.c | 4 ----
net/sunrpc/xprt.c | 14 --------------
2 files changed, 18 deletions(-)

diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index dcefbf406482..4ca23a6607ba 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -2128,15 +2128,11 @@ static void
call_status(struct rpc_task *task)
{
struct rpc_clnt *clnt = task->tk_client;
- struct rpc_rqst *req = task->tk_rqstp;
int status;

if (!task->tk_msg.rpc_proc->p_proc)
trace_xprt_ping(task->tk_xprt, task->tk_status);

- if (req->rq_reply_bytes_recvd > 0 && !req->rq_bytes_sent)
- task->tk_status = req->rq_reply_bytes_recvd;
-
dprint_status(task);

status = task->tk_status;
diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index 00b17cb49910..3b31830ef851 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -332,15 +332,6 @@ static void __xprt_lock_write_next_cong(struct rpc_xprt *xprt)
xprt_clear_locked(xprt);
}

-static void xprt_task_clear_bytes_sent(struct rpc_task *task)
-{
- if (task != NULL) {
- struct rpc_rqst *req = task->tk_rqstp;
- if (req != NULL)
- req->rq_bytes_sent = 0;
- }
-}
-
/**
* xprt_release_xprt - allow other requests to use a transport
* @xprt: transport with other tasks potentially waiting
@@ -351,7 +342,6 @@ static void xprt_task_clear_bytes_sent(struct rpc_task *task)
void xprt_release_xprt(struct rpc_xprt *xprt, struct rpc_task *task)
{
if (xprt->snd_task == task) {
- xprt_task_clear_bytes_sent(task);
xprt_clear_locked(xprt);
__xprt_lock_write_next(xprt);
}
@@ -369,7 +359,6 @@ EXPORT_SYMBOL_GPL(xprt_release_xprt);
void xprt_release_xprt_cong(struct rpc_xprt *xprt, struct rpc_task *task)
{
if (xprt->snd_task == task) {
- xprt_task_clear_bytes_sent(task);
xprt_clear_locked(xprt);
__xprt_lock_write_next_cong(xprt);
}
@@ -742,7 +731,6 @@ bool xprt_lock_connect(struct rpc_xprt *xprt,
goto out;
if (xprt->snd_task != task)
goto out;
- xprt_task_clear_bytes_sent(task);
xprt->snd_task = cookie;
ret = true;
out:
@@ -788,7 +776,6 @@ void xprt_connect(struct rpc_task *task)
xprt->ops->close(xprt);

if (!xprt_connected(xprt)) {
- task->tk_rqstp->rq_bytes_sent = 0;
task->tk_timeout = task->tk_rqstp->rq_timeout;
task->tk_rqstp->rq_connect_cookie = xprt->connect_cookie;
rpc_sleep_on(&xprt->pending, task, xprt_connect_status);
@@ -1093,7 +1080,6 @@ xprt_request_enqueue_transmit(struct rpc_task *task)
static void
xprt_request_dequeue_transmit_locked(struct rpc_task *task)
{
- xprt_task_clear_bytes_sent(task);
if (test_and_clear_bit(RPC_TASK_NEED_XMIT, &task->tk_runstate))
list_del(&task->tk_rqstp->rq_xmit);
}
--
2.17.1

2018-09-17 18:31:40

by Trond Myklebust

[permalink] [raw]
Subject: [PATCH v3 22/44] SUNRPC: Treat the task and request as separate in the xprt_ops->send_request()

When we shift to using the transmit queue, the task that holds the
write lock will not necessarily be the same as the task whose request
is being transmitted.
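
As a minimal sketch of what this prepares for (illustrative only; the
logic that drains other tasks' requests arrives later in the series),
the lock holder may end up sending a request it does not own, so the
send path must not assume req == task->tk_rqstp:

	struct rpc_rqst *next;

	/* snd_task holds the write lock, but 'next' may belong to a
	 * different task that queued its request for transmission. */
	next = list_first_entry(&xprt->xmit_queue, struct rpc_rqst, rq_xmit);
	status = xprt->ops->send_request(next, snd_task);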

Signed-off-by: Trond Myklebust <[email protected]>
---
include/linux/sunrpc/xprt.h | 2 +-
net/sunrpc/xprt.c | 2 +-
net/sunrpc/xprtrdma/svc_rdma_backchannel.c | 3 +--
net/sunrpc/xprtrdma/transport.c | 5 ++--
net/sunrpc/xprtsock.c | 27 +++++++++++-----------
5 files changed, 18 insertions(+), 21 deletions(-)

diff --git a/include/linux/sunrpc/xprt.h b/include/linux/sunrpc/xprt.h
index b8a7de161f67..8c2bb078f00c 100644
--- a/include/linux/sunrpc/xprt.h
+++ b/include/linux/sunrpc/xprt.h
@@ -140,7 +140,7 @@ struct rpc_xprt_ops {
void (*connect)(struct rpc_xprt *xprt, struct rpc_task *task);
int (*buf_alloc)(struct rpc_task *task);
void (*buf_free)(struct rpc_task *task);
- int (*send_request)(struct rpc_task *task);
+ int (*send_request)(struct rpc_rqst *req, struct rpc_task *task);
void (*set_retrans_timeout)(struct rpc_task *task);
void (*timer)(struct rpc_xprt *xprt, struct rpc_task *task);
void (*release_request)(struct rpc_task *task);
diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index d418bd4db7ff..00b17cb49910 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -1190,7 +1190,7 @@ void xprt_transmit(struct rpc_task *task)
}

connect_cookie = xprt->connect_cookie;
- status = xprt->ops->send_request(task);
+ status = xprt->ops->send_request(req, task);
trace_xprt_transmit(xprt, req->rq_xid, status);
if (status != 0) {
task->tk_status = status;
diff --git a/net/sunrpc/xprtrdma/svc_rdma_backchannel.c b/net/sunrpc/xprtrdma/svc_rdma_backchannel.c
index 09b12b7568fe..d1618c70edb4 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_backchannel.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_backchannel.c
@@ -215,9 +215,8 @@ rpcrdma_bc_send_request(struct svcxprt_rdma *rdma, struct rpc_rqst *rqst)
* connection.
*/
static int
-xprt_rdma_bc_send_request(struct rpc_task *task)
+xprt_rdma_bc_send_request(struct rpc_rqst *rqst, struct rpc_task *task)
{
- struct rpc_rqst *rqst = task->tk_rqstp;
struct svc_xprt *sxprt = rqst->rq_xprt->bc_xprt;
struct svcxprt_rdma *rdma;
int ret;
diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c
index 143ce2579ba9..fa684bf4d090 100644
--- a/net/sunrpc/xprtrdma/transport.c
+++ b/net/sunrpc/xprtrdma/transport.c
@@ -706,9 +706,8 @@ xprt_rdma_free(struct rpc_task *task)
* sent. Do not try to send this message again.
*/
static int
-xprt_rdma_send_request(struct rpc_task *task)
+xprt_rdma_send_request(struct rpc_rqst *rqst, struct rpc_task *task)
{
- struct rpc_rqst *rqst = task->tk_rqstp;
struct rpc_xprt *xprt = rqst->rq_xprt;
struct rpcrdma_req *req = rpcr_to_rdmar(rqst);
struct rpcrdma_xprt *r_xprt = rpcx_to_rdmax(xprt);
@@ -741,7 +740,7 @@ xprt_rdma_send_request(struct rpc_task *task)
/* An RPC with no reply will throw off credit accounting,
* so drop the connection to reset the credit grant.
*/
- if (!rpc_reply_expected(task))
+ if (!rpc_reply_expected(rqst->rq_task))
goto drop_connection;
return 0;

diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index 8d6404259ff9..b8143eded4af 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -449,12 +449,12 @@ static void xs_nospace_callback(struct rpc_task *task)

/**
* xs_nospace - place task on wait queue if transmit was incomplete
+ * @req: pointer to RPC request
* @task: task to put to sleep
*
*/
-static int xs_nospace(struct rpc_task *task)
+static int xs_nospace(struct rpc_rqst *req, struct rpc_task *task)
{
- struct rpc_rqst *req = task->tk_rqstp;
struct rpc_xprt *xprt = req->rq_xprt;
struct sock_xprt *transport = container_of(xprt, struct sock_xprt, xprt);
struct sock *sk = transport->inet;
@@ -513,6 +513,7 @@ static inline void xs_encode_stream_record_marker(struct xdr_buf *buf)

/**
* xs_local_send_request - write an RPC request to an AF_LOCAL socket
+ * @req: pointer to RPC request
* @task: RPC task that manages the state of an RPC request
*
* Return values:
@@ -522,9 +523,8 @@ static inline void xs_encode_stream_record_marker(struct xdr_buf *buf)
* ENOTCONN: Caller needs to invoke connect logic then call again
* other: Some other error occured, the request was not sent
*/
-static int xs_local_send_request(struct rpc_task *task)
+static int xs_local_send_request(struct rpc_rqst *req, struct rpc_task *task)
{
- struct rpc_rqst *req = task->tk_rqstp;
struct rpc_xprt *xprt = req->rq_xprt;
struct sock_xprt *transport =
container_of(xprt, struct sock_xprt, xprt);
@@ -569,7 +569,7 @@ static int xs_local_send_request(struct rpc_task *task)
case -ENOBUFS:
break;
case -EAGAIN:
- status = xs_nospace(task);
+ status = xs_nospace(req, task);
break;
default:
dprintk("RPC: sendmsg returned unrecognized error %d\n",
@@ -585,6 +585,7 @@ static int xs_local_send_request(struct rpc_task *task)

/**
* xs_udp_send_request - write an RPC request to a UDP socket
+ * @req: pointer to RPC request
* @task: address of RPC task that manages the state of an RPC request
*
* Return values:
@@ -594,9 +595,8 @@ static int xs_local_send_request(struct rpc_task *task)
* ENOTCONN: Caller needs to invoke connect logic then call again
* other: Some other error occurred, the request was not sent
*/
-static int xs_udp_send_request(struct rpc_task *task)
+static int xs_udp_send_request(struct rpc_rqst *req, struct rpc_task *task)
{
- struct rpc_rqst *req = task->tk_rqstp;
struct rpc_xprt *xprt = req->rq_xprt;
struct sock_xprt *transport = container_of(xprt, struct sock_xprt, xprt);
struct xdr_buf *xdr = &req->rq_snd_buf;
@@ -638,7 +638,7 @@ static int xs_udp_send_request(struct rpc_task *task)
/* Should we call xs_close() here? */
break;
case -EAGAIN:
- status = xs_nospace(task);
+ status = xs_nospace(req, task);
break;
case -ENETUNREACH:
case -ENOBUFS:
@@ -658,6 +658,7 @@ static int xs_udp_send_request(struct rpc_task *task)

/**
* xs_tcp_send_request - write an RPC request to a TCP socket
+ * @req: pointer to RPC request
* @task: address of RPC task that manages the state of an RPC request
*
* Return values:
@@ -670,9 +671,8 @@ static int xs_udp_send_request(struct rpc_task *task)
* XXX: In the case of soft timeouts, should we eventually give up
* if sendmsg is not able to make progress?
*/
-static int xs_tcp_send_request(struct rpc_task *task)
+static int xs_tcp_send_request(struct rpc_rqst *req, struct rpc_task *task)
{
- struct rpc_rqst *req = task->tk_rqstp;
struct rpc_xprt *xprt = req->rq_xprt;
struct sock_xprt *transport = container_of(xprt, struct sock_xprt, xprt);
struct xdr_buf *xdr = &req->rq_snd_buf;
@@ -697,7 +697,7 @@ static int xs_tcp_send_request(struct rpc_task *task)
* completes while the socket holds a reference to the pages,
* then we may end up resending corrupted data.
*/
- if (task->tk_flags & RPC_TASK_SENT)
+ if (req->rq_task->tk_flags & RPC_TASK_SENT)
zerocopy = false;

if (test_bit(XPRT_SOCK_UPD_TIMEOUT, &transport->sock_state))
@@ -761,7 +761,7 @@ static int xs_tcp_send_request(struct rpc_task *task)
/* Should we call xs_close() here? */
break;
case -EAGAIN:
- status = xs_nospace(task);
+ status = xs_nospace(req, task);
break;
case -ECONNRESET:
case -ECONNREFUSED:
@@ -2706,9 +2706,8 @@ static int bc_sendto(struct rpc_rqst *req)
/*
* The send routine. Borrows from svc_send
*/
-static int bc_send_request(struct rpc_task *task)
+static int bc_send_request(struct rpc_rqst *req, struct rpc_task *task)
{
- struct rpc_rqst *req = task->tk_rqstp;
struct svc_xprt *xprt;
int len;

--
2.17.1

2018-09-17 18:31:48

by Trond Myklebust

[permalink] [raw]
Subject: [PATCH v3 30/44] SUNRPC: Allow soft RPC calls to time out when waiting for the XPRT_LOCK

Now that waiting for the XPRT_LOCK no longer causes a request to lose
its place in the transmission queue, it is safe to let soft RPC calls
time out while they wait.

Signed-off-by: Trond Myklebust <[email protected]>
---
net/sunrpc/xprt.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index ae1109c7b9b4..a523e59a074e 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -195,7 +195,7 @@ int xprt_reserve_xprt(struct rpc_xprt *xprt, struct rpc_task *task)
out_sleep:
dprintk("RPC: %5u failed to lock transport %p\n",
task->tk_pid, xprt);
- task->tk_timeout = 0;
+ task->tk_timeout = RPC_IS_SOFT(task) ? req->rq_timeout : 0;
task->tk_status = -EAGAIN;
if (req == NULL)
priority = RPC_PRIORITY_LOW;
@@ -274,7 +274,7 @@ int xprt_reserve_xprt_cong(struct rpc_xprt *xprt, struct rpc_task *task)
xprt_clear_locked(xprt);
out_sleep:
dprintk("RPC: %5u failed to lock transport %p\n", task->tk_pid, xprt);
- task->tk_timeout = 0;
+ task->tk_timeout = RPC_IS_SOFT(task) ? req->rq_timeout : 0;
task->tk_status = -EAGAIN;
if (req == NULL)
priority = RPC_PRIORITY_LOW;
--
2.17.1

2018-09-17 18:31:25

by Trond Myklebust

[permalink] [raw]
Subject: [PATCH v3 11/44] SUNRPC: Add a helper to wake up a sleeping rpc_task and set its status

Add a helper that will wake up a task that is sleeping on a specific
queue, and will set the value of task->tk_status. This is mainly
intended for use by the transport layer to notify the task of an
error condition.
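
As a usage sketch (a hypothetical caller, not part of this patch), a
transport that notices a lost connection could fail a task sleeping on
its 'pending' queue in a single step:

	/* Wake the task and hand it the error in task->tk_status */
	rpc_wake_up_queued_task_set_status(&xprt->pending, task, -ENOTCONN);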

Signed-off-by: Trond Myklebust <[email protected]>
---
include/linux/sunrpc/sched.h | 3 ++
net/sunrpc/sched.c | 65 ++++++++++++++++++++++++++++++------
2 files changed, 58 insertions(+), 10 deletions(-)

diff --git a/include/linux/sunrpc/sched.h b/include/linux/sunrpc/sched.h
index 8062ce6b18e5..8840a420cf4c 100644
--- a/include/linux/sunrpc/sched.h
+++ b/include/linux/sunrpc/sched.h
@@ -235,6 +235,9 @@ void rpc_wake_up_queued_task_on_wq(struct workqueue_struct *wq,
struct rpc_task *task);
void rpc_wake_up_queued_task(struct rpc_wait_queue *,
struct rpc_task *);
+void rpc_wake_up_queued_task_set_status(struct rpc_wait_queue *,
+ struct rpc_task *,
+ int);
void rpc_wake_up(struct rpc_wait_queue *);
struct rpc_task *rpc_wake_up_next(struct rpc_wait_queue *);
struct rpc_task *rpc_wake_up_first_on_wq(struct workqueue_struct *wq,
diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
index 3fe5d60ab0e2..dec01bd1b71c 100644
--- a/net/sunrpc/sched.c
+++ b/net/sunrpc/sched.c
@@ -440,14 +440,28 @@ static void __rpc_do_wake_up_task_on_wq(struct workqueue_struct *wq,
/*
* Wake up a queued task while the queue lock is being held
*/
-static void rpc_wake_up_task_on_wq_queue_locked(struct workqueue_struct *wq,
- struct rpc_wait_queue *queue, struct rpc_task *task)
+static struct rpc_task *
+rpc_wake_up_task_on_wq_queue_action_locked(struct workqueue_struct *wq,
+ struct rpc_wait_queue *queue, struct rpc_task *task,
+ bool (*action)(struct rpc_task *, void *), void *data)
{
if (RPC_IS_QUEUED(task)) {
smp_rmb();
- if (task->tk_waitqueue == queue)
- __rpc_do_wake_up_task_on_wq(wq, queue, task);
+ if (task->tk_waitqueue == queue) {
+ if (action == NULL || action(task, data)) {
+ __rpc_do_wake_up_task_on_wq(wq, queue, task);
+ return task;
+ }
+ }
}
+ return NULL;
+}
+
+static void
+rpc_wake_up_task_on_wq_queue_locked(struct workqueue_struct *wq,
+ struct rpc_wait_queue *queue, struct rpc_task *task)
+{
+ rpc_wake_up_task_on_wq_queue_action_locked(wq, queue, task, NULL, NULL);
}

/*
@@ -481,6 +495,40 @@ void rpc_wake_up_queued_task(struct rpc_wait_queue *queue, struct rpc_task *task
}
EXPORT_SYMBOL_GPL(rpc_wake_up_queued_task);

+static bool rpc_task_action_set_status(struct rpc_task *task, void *status)
+{
+ task->tk_status = *(int *)status;
+ return true;
+}
+
+static void
+rpc_wake_up_task_queue_set_status_locked(struct rpc_wait_queue *queue,
+ struct rpc_task *task, int status)
+{
+ rpc_wake_up_task_on_wq_queue_action_locked(rpciod_workqueue, queue,
+ task, rpc_task_action_set_status, &status);
+}
+
+/**
+ * rpc_wake_up_queued_task_set_status - wake up a task and set task->tk_status
+ * @queue: pointer to rpc_wait_queue
+ * @task: pointer to rpc_task
+ * @status: integer error value
+ *
+ * If @task is queued on @queue, then it is woken up, and @task->tk_status is
+ * set to the value of @status.
+ */
+void
+rpc_wake_up_queued_task_set_status(struct rpc_wait_queue *queue,
+ struct rpc_task *task, int status)
+{
+ if (!RPC_IS_QUEUED(task))
+ return;
+ spin_lock_bh(&queue->lock);
+ rpc_wake_up_task_queue_set_status_locked(queue, task, status);
+ spin_unlock_bh(&queue->lock);
+}
+
/*
* Wake up the next task on a priority queue.
*/
@@ -553,12 +601,9 @@ struct rpc_task *rpc_wake_up_first_on_wq(struct workqueue_struct *wq,
queue, rpc_qname(queue));
spin_lock_bh(&queue->lock);
task = __rpc_find_next_queued(queue);
- if (task != NULL) {
- if (func(task, data))
- rpc_wake_up_task_on_wq_queue_locked(wq, queue, task);
- else
- task = NULL;
- }
+ if (task != NULL)
+ task = rpc_wake_up_task_on_wq_queue_action_locked(wq, queue,
+ task, func, data);
spin_unlock_bh(&queue->lock);

return task;
--
2.17.1

2018-09-17 18:31:34

by Trond Myklebust

[permalink] [raw]
Subject: [PATCH v3 18/44] SUNRPC: Distinguish between the slot allocation list and receive queue

When storing a struct rpc_rqst on the slot allocation list, we currently
use the same field 'rq_list' as we use to store the request on the
receive queue. Since the structure is never on both lists at the same
time, this is OK.
However, for clarity, let's make that a union with different names for
the different lists so that we can more easily distinguish between
the two states.

Signed-off-by: Trond Myklebust <[email protected]>
---
include/linux/sunrpc/xprt.h | 9 +++++++--
net/sunrpc/xprt.c | 12 ++++++------
2 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/include/linux/sunrpc/xprt.h b/include/linux/sunrpc/xprt.h
index 4fa2af087cff..9cec2d0811f2 100644
--- a/include/linux/sunrpc/xprt.h
+++ b/include/linux/sunrpc/xprt.h
@@ -82,7 +82,11 @@ struct rpc_rqst {
struct page **rq_enc_pages; /* scratch pages for use by
gss privacy code */
void (*rq_release_snd_buf)(struct rpc_rqst *); /* release rq_enc_pages */
- struct list_head rq_list;
+
+ union {
+ struct list_head rq_list; /* Slot allocation list */
+ struct list_head rq_recv; /* Receive queue */
+ };

void *rq_buffer; /* Call XDR encode buffer */
size_t rq_callsize;
@@ -249,7 +253,8 @@ struct rpc_xprt {
struct list_head bc_pa_list; /* List of preallocated
* backchannel rpc_rqst's */
#endif /* CONFIG_SUNRPC_BACKCHANNEL */
- struct list_head recv;
+
+ struct list_head recv_queue; /* Receive queue */

struct {
unsigned long bind_count, /* total number of binds */
diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index fe857ab18ee2..b242a1c78f8a 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -708,7 +708,7 @@ static void
xprt_schedule_autodisconnect(struct rpc_xprt *xprt)
__must_hold(&xprt->transport_lock)
{
- if (list_empty(&xprt->recv) && xprt_has_timer(xprt))
+ if (list_empty(&xprt->recv_queue) && xprt_has_timer(xprt))
mod_timer(&xprt->timer, xprt->last_used + xprt->idle_timeout);
}

@@ -718,7 +718,7 @@ xprt_init_autodisconnect(struct timer_list *t)
struct rpc_xprt *xprt = from_timer(xprt, t, timer);

spin_lock(&xprt->transport_lock);
- if (!list_empty(&xprt->recv))
+ if (!list_empty(&xprt->recv_queue))
goto out_abort;
/* Reset xprt->last_used to avoid connect/autodisconnect cycling */
xprt->last_used = jiffies;
@@ -848,7 +848,7 @@ struct rpc_rqst *xprt_lookup_rqst(struct rpc_xprt *xprt, __be32 xid)
{
struct rpc_rqst *entry;

- list_for_each_entry(entry, &xprt->recv, rq_list)
+ list_for_each_entry(entry, &xprt->recv_queue, rq_recv)
if (entry->rq_xid == xid) {
trace_xprt_lookup_rqst(xprt, xid, 0);
entry->rq_rtt = ktime_sub(ktime_get(), entry->rq_xtime);
@@ -937,7 +937,7 @@ xprt_request_enqueue_receive(struct rpc_task *task)
sizeof(req->rq_private_buf));

/* Add request to the receive list */
- list_add_tail(&req->rq_list, &xprt->recv);
+ list_add_tail(&req->rq_recv, &xprt->recv_queue);
set_bit(RPC_TASK_NEED_RECV, &task->tk_runstate);
spin_unlock(&xprt->queue_lock);

@@ -956,7 +956,7 @@ static void
xprt_request_dequeue_receive_locked(struct rpc_task *task)
{
if (test_and_clear_bit(RPC_TASK_NEED_RECV, &task->tk_runstate))
- list_del(&task->tk_rqstp->rq_list);
+ list_del(&task->tk_rqstp->rq_recv);
}

/**
@@ -1491,7 +1491,7 @@ static void xprt_init(struct rpc_xprt *xprt, struct net *net)
spin_lock_init(&xprt->queue_lock);

INIT_LIST_HEAD(&xprt->free);
- INIT_LIST_HEAD(&xprt->recv);
+ INIT_LIST_HEAD(&xprt->recv_queue);
#if defined(CONFIG_SUNRPC_BACKCHANNEL)
spin_lock_init(&xprt->bc_pa_lock);
INIT_LIST_HEAD(&xprt->bc_pa_list);
--
2.17.1

2018-09-17 18:31:53

by Trond Myklebust

[permalink] [raw]
Subject: [PATCH v3 35/44] SUNRPC: Convert xprt receive queue to use an rbtree

If the server is slow, we can find ourselves with quite a lot of entries
on the receive queue. Converting the search from O(n) to O(log(n))
can make a significant difference, particularly since we have to hold
a number of locks while searching.
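
As a rough worked example: with 1024 requests outstanding, an xid
lookup now costs on the order of log2(1024) = 10 comparisons under
xprt->queue_lock, instead of visiting up to 1024 list entries.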

Signed-off-by: Trond Myklebust <[email protected]>
---
include/linux/sunrpc/xprt.h | 4 +-
net/sunrpc/xprt.c | 93 ++++++++++++++++++++++++++++++++-----
2 files changed, 84 insertions(+), 13 deletions(-)

diff --git a/include/linux/sunrpc/xprt.h b/include/linux/sunrpc/xprt.h
index 823860cce0bc..9be399020dab 100644
--- a/include/linux/sunrpc/xprt.h
+++ b/include/linux/sunrpc/xprt.h
@@ -85,7 +85,7 @@ struct rpc_rqst {

union {
struct list_head rq_list; /* Slot allocation list */
- struct list_head rq_recv; /* Receive queue */
+ struct rb_node rq_recv; /* Receive queue */
};

struct list_head rq_xmit; /* Send queue */
@@ -260,7 +260,7 @@ struct rpc_xprt {
* backchannel rpc_rqst's */
#endif /* CONFIG_SUNRPC_BACKCHANNEL */

- struct list_head recv_queue; /* Receive queue */
+ struct rb_root recv_queue; /* Receive queue */

struct {
unsigned long bind_count, /* total number of binds */
diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index a1cb28a4adad..051638d5b39c 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -753,7 +753,7 @@ static void
xprt_schedule_autodisconnect(struct rpc_xprt *xprt)
__must_hold(&xprt->transport_lock)
{
- if (list_empty(&xprt->recv_queue) && xprt_has_timer(xprt))
+ if (RB_EMPTY_ROOT(&xprt->recv_queue) && xprt_has_timer(xprt))
mod_timer(&xprt->timer, xprt->last_used + xprt->idle_timeout);
}

@@ -763,7 +763,7 @@ xprt_init_autodisconnect(struct timer_list *t)
struct rpc_xprt *xprt = from_timer(xprt, t, timer);

spin_lock(&xprt->transport_lock);
- if (!list_empty(&xprt->recv_queue))
+ if (!RB_EMPTY_ROOT(&xprt->recv_queue))
goto out_abort;
/* Reset xprt->last_used to avoid connect/autodisconnect cycling */
xprt->last_used = jiffies;
@@ -880,6 +880,75 @@ static void xprt_connect_status(struct rpc_task *task)
}
}

+enum xprt_xid_rb_cmp {
+ XID_RB_EQUAL,
+ XID_RB_LEFT,
+ XID_RB_RIGHT,
+};
+static enum xprt_xid_rb_cmp
+xprt_xid_cmp(__be32 xid1, __be32 xid2)
+{
+ if (xid1 == xid2)
+ return XID_RB_EQUAL;
+ if ((__force u32)xid1 < (__force u32)xid2)
+ return XID_RB_LEFT;
+ return XID_RB_RIGHT;
+}
+
+static struct rpc_rqst *
+xprt_request_rb_find(struct rpc_xprt *xprt, __be32 xid)
+{
+ struct rb_node *n = xprt->recv_queue.rb_node;
+ struct rpc_rqst *req;
+
+ while (n != NULL) {
+ req = rb_entry(n, struct rpc_rqst, rq_recv);
+ switch (xprt_xid_cmp(xid, req->rq_xid)) {
+ case XID_RB_LEFT:
+ n = n->rb_left;
+ break;
+ case XID_RB_RIGHT:
+ n = n->rb_right;
+ break;
+ case XID_RB_EQUAL:
+ return req;
+ }
+ }
+ return NULL;
+}
+
+static void
+xprt_request_rb_insert(struct rpc_xprt *xprt, struct rpc_rqst *new)
+{
+ struct rb_node **p = &xprt->recv_queue.rb_node;
+ struct rb_node *n = NULL;
+ struct rpc_rqst *req;
+
+ while (*p != NULL) {
+ n = *p;
+ req = rb_entry(n, struct rpc_rqst, rq_recv);
+ switch(xprt_xid_cmp(new->rq_xid, req->rq_xid)) {
+ case XID_RB_LEFT:
+ p = &n->rb_left;
+ break;
+ case XID_RB_RIGHT:
+ p = &n->rb_right;
+ break;
+ case XID_RB_EQUAL:
+ WARN_ON_ONCE(new != req);
+ return;
+ }
+ }
+ rb_link_node(&new->rq_recv, n, p);
+ rb_insert_color(&new->rq_recv, &xprt->recv_queue);
+}
+
+static void
+xprt_request_rb_remove(struct rpc_xprt *xprt, struct rpc_rqst *req)
+{
+ rb_erase(&req->rq_recv, &xprt->recv_queue);
+}
+
/**
* xprt_lookup_rqst - find an RPC request corresponding to an XID
* @xprt: transport on which the original request was transmitted
@@ -891,12 +960,12 @@ struct rpc_rqst *xprt_lookup_rqst(struct rpc_xprt *xprt, __be32 xid)
{
struct rpc_rqst *entry;

- list_for_each_entry(entry, &xprt->recv_queue, rq_recv)
- if (entry->rq_xid == xid) {
- trace_xprt_lookup_rqst(xprt, xid, 0);
- entry->rq_rtt = ktime_sub(ktime_get(), entry->rq_xtime);
- return entry;
- }
+ entry = xprt_request_rb_find(xprt, xid);
+ if (entry != NULL) {
+ trace_xprt_lookup_rqst(xprt, xid, 0);
+ entry->rq_rtt = ktime_sub(ktime_get(), entry->rq_xtime);
+ return entry;
+ }

dprintk("RPC: xprt_lookup_rqst did not find xid %08x\n",
ntohl(xid));
@@ -980,7 +1049,7 @@ xprt_request_enqueue_receive(struct rpc_task *task)
sizeof(req->rq_private_buf));

/* Add request to the receive list */
- list_add_tail(&req->rq_recv, &xprt->recv_queue);
+ xprt_request_rb_insert(xprt, req);
set_bit(RPC_TASK_NEED_RECV, &task->tk_runstate);
spin_unlock(&xprt->queue_lock);

@@ -998,8 +1067,10 @@ xprt_request_enqueue_receive(struct rpc_task *task)
static void
xprt_request_dequeue_receive_locked(struct rpc_task *task)
{
+ struct rpc_rqst *req = task->tk_rqstp;
+
if (test_and_clear_bit(RPC_TASK_NEED_RECV, &task->tk_runstate))
- list_del(&task->tk_rqstp->rq_recv);
+ xprt_request_rb_remove(req->rq_xprt, req);
}

/**
@@ -1710,7 +1781,7 @@ static void xprt_init(struct rpc_xprt *xprt, struct net *net)
spin_lock_init(&xprt->queue_lock);

INIT_LIST_HEAD(&xprt->free);
- INIT_LIST_HEAD(&xprt->recv_queue);
+ xprt->recv_queue = RB_ROOT;
INIT_LIST_HEAD(&xprt->xmit_queue);
#if defined(CONFIG_SUNRPC_BACKCHANNEL)
spin_lock_init(&xprt->bc_pa_lock);
--
2.17.1

2018-09-17 18:31:55

by Trond Myklebust

[permalink] [raw]
Subject: [PATCH v3 36/44] SUNRPC: Fix priority queue fairness

Fix up the priority queue so that it batches by queue rather than by
owner, allowing '1 << priority' elements to be dequeued before we
switch to the next priority queue.
The owner field is still used to wake up requests in round-robin order
by owner, so that a single process cannot hog the RPC layer by loading
up the queues.
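
Concretely, switching the queue to priority level 'p' now arms a
per-queue batch counter, queue->nr = 1 << p, so (assuming the four
priority levels currently defined) a level gets bursts of roughly 1,
2, 4 or 8 dequeues before the scan considers moving on, replacing the
old fixed per-owner batch of RPC_BATCH_COUNT (16).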

Signed-off-by: Trond Myklebust <[email protected]>
---
include/linux/sunrpc/sched.h | 2 -
net/sunrpc/sched.c | 109 +++++++++++++++++------------------
2 files changed, 54 insertions(+), 57 deletions(-)

diff --git a/include/linux/sunrpc/sched.h b/include/linux/sunrpc/sched.h
index 8840a420cf4c..7b540c066594 100644
--- a/include/linux/sunrpc/sched.h
+++ b/include/linux/sunrpc/sched.h
@@ -189,7 +189,6 @@ struct rpc_timer {
struct rpc_wait_queue {
spinlock_t lock;
struct list_head tasks[RPC_NR_PRIORITY]; /* task queue for each priority level */
- pid_t owner; /* process id of last task serviced */
unsigned char maxpriority; /* maximum priority (0 if queue is not a priority queue) */
unsigned char priority; /* current priority */
unsigned char nr; /* # tasks remaining for cookie */
@@ -205,7 +204,6 @@ struct rpc_wait_queue {
* from a single cookie. The aim is to improve
* performance of NFS operations such as read/write.
*/
-#define RPC_BATCH_COUNT 16
#define RPC_IS_PRIORITY(q) ((q)->maxpriority > 0)

/*
diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
index 9a8ec012b449..57ca5bead1cb 100644
--- a/net/sunrpc/sched.c
+++ b/net/sunrpc/sched.c
@@ -99,64 +99,78 @@ __rpc_add_timer(struct rpc_wait_queue *queue, struct rpc_task *task)
list_add(&task->u.tk_wait.timer_list, &queue->timer_list.list);
}

-static void rpc_rotate_queue_owner(struct rpc_wait_queue *queue)
-{
- struct list_head *q = &queue->tasks[queue->priority];
- struct rpc_task *task;
-
- if (!list_empty(q)) {
- task = list_first_entry(q, struct rpc_task, u.tk_wait.list);
- if (task->tk_owner == queue->owner)
- list_move_tail(&task->u.tk_wait.list, q);
- }
-}
-
static void rpc_set_waitqueue_priority(struct rpc_wait_queue *queue, int priority)
{
if (queue->priority != priority) {
- /* Fairness: rotate the list when changing priority */
- rpc_rotate_queue_owner(queue);
queue->priority = priority;
+ queue->nr = 1U << priority;
}
}

-static void rpc_set_waitqueue_owner(struct rpc_wait_queue *queue, pid_t pid)
-{
- queue->owner = pid;
- queue->nr = RPC_BATCH_COUNT;
-}
-
static void rpc_reset_waitqueue_priority(struct rpc_wait_queue *queue)
{
rpc_set_waitqueue_priority(queue, queue->maxpriority);
- rpc_set_waitqueue_owner(queue, 0);
}

/*
- * Add new request to a priority queue.
+ * Add a request to a queue list
*/
-static void __rpc_add_wait_queue_priority(struct rpc_wait_queue *queue,
- struct rpc_task *task,
- unsigned char queue_priority)
+static void
+__rpc_list_enqueue_task(struct list_head *q, struct rpc_task *task)
{
- struct list_head *q;
struct rpc_task *t;

- INIT_LIST_HEAD(&task->u.tk_wait.links);
- if (unlikely(queue_priority > queue->maxpriority))
- queue_priority = queue->maxpriority;
- if (queue_priority > queue->priority)
- rpc_set_waitqueue_priority(queue, queue_priority);
- q = &queue->tasks[queue_priority];
list_for_each_entry(t, q, u.tk_wait.list) {
if (t->tk_owner == task->tk_owner) {
- list_add_tail(&task->u.tk_wait.list, &t->u.tk_wait.links);
+ list_add_tail(&task->u.tk_wait.links,
+ &t->u.tk_wait.links);
+ /* Cache the queue head in task->u.tk_wait.list */
+ task->u.tk_wait.list.next = q;
+ task->u.tk_wait.list.prev = NULL;
return;
}
}
+ INIT_LIST_HEAD(&task->u.tk_wait.links);
list_add_tail(&task->u.tk_wait.list, q);
}

+/*
+ * Remove request from a queue list
+ */
+static void
+__rpc_list_dequeue_task(struct rpc_task *task)
+{
+ struct list_head *q;
+ struct rpc_task *t;
+
+ if (task->u.tk_wait.list.prev == NULL) {
+ list_del(&task->u.tk_wait.links);
+ return;
+ }
+ if (!list_empty(&task->u.tk_wait.links)) {
+ t = list_first_entry(&task->u.tk_wait.links,
+ struct rpc_task,
+ u.tk_wait.links);
+ /* Assume __rpc_list_enqueue_task() cached the queue head */
+ q = t->u.tk_wait.list.next;
+ list_add_tail(&t->u.tk_wait.list, q);
+ list_del(&task->u.tk_wait.links);
+ }
+ list_del(&task->u.tk_wait.list);
+}
+
+/*
+ * Add new request to a priority queue.
+ */
+static void __rpc_add_wait_queue_priority(struct rpc_wait_queue *queue,
+ struct rpc_task *task,
+ unsigned char queue_priority)
+{
+ if (unlikely(queue_priority > queue->maxpriority))
+ queue_priority = queue->maxpriority;
+ __rpc_list_enqueue_task(&queue->tasks[queue_priority], task);
+}
+
/*
* Add new request to wait queue.
*
@@ -194,13 +208,7 @@ static void __rpc_add_wait_queue(struct rpc_wait_queue *queue,
*/
static void __rpc_remove_wait_queue_priority(struct rpc_task *task)
{
- struct rpc_task *t;
-
- if (!list_empty(&task->u.tk_wait.links)) {
- t = list_entry(task->u.tk_wait.links.next, struct rpc_task, u.tk_wait.list);
- list_move(&t->u.tk_wait.list, &task->u.tk_wait.list);
- list_splice_init(&task->u.tk_wait.links, &t->u.tk_wait.links);
- }
+ __rpc_list_dequeue_task(task);
}

/*
@@ -212,7 +220,8 @@ static void __rpc_remove_wait_queue(struct rpc_wait_queue *queue, struct rpc_tas
__rpc_disable_timer(queue, task);
if (RPC_IS_PRIORITY(queue))
__rpc_remove_wait_queue_priority(task);
- list_del(&task->u.tk_wait.list);
+ else
+ list_del(&task->u.tk_wait.list);
queue->qlen--;
dprintk("RPC: %5u removed from queue %p \"%s\"\n",
task->tk_pid, queue, rpc_qname(queue));
@@ -545,17 +554,9 @@ static struct rpc_task *__rpc_find_next_queued_priority(struct rpc_wait_queue *q
* Service a batch of tasks from a single owner.
*/
q = &queue->tasks[queue->priority];
- if (!list_empty(q)) {
- task = list_entry(q->next, struct rpc_task, u.tk_wait.list);
- if (queue->owner == task->tk_owner) {
- if (--queue->nr)
- goto out;
- list_move_tail(&task->u.tk_wait.list, q);
- }
- /*
- * Check if we need to switch queues.
- */
- goto new_owner;
+ if (!list_empty(q) && --queue->nr) {
+ task = list_first_entry(q, struct rpc_task, u.tk_wait.list);
+ goto out;
}

/*
@@ -567,7 +568,7 @@ static struct rpc_task *__rpc_find_next_queued_priority(struct rpc_wait_queue *q
else
q = q - 1;
if (!list_empty(q)) {
- task = list_entry(q->next, struct rpc_task, u.tk_wait.list);
+ task = list_first_entry(q, struct rpc_task, u.tk_wait.list);
goto new_queue;
}
} while (q != &queue->tasks[queue->priority]);
@@ -577,8 +578,6 @@ static struct rpc_task *__rpc_find_next_queued_priority(struct rpc_wait_queue *q

new_queue:
rpc_set_waitqueue_priority(queue, (unsigned int)(q - &queue->tasks[0]));
-new_owner:
- rpc_set_waitqueue_owner(queue, task->tk_owner);
out:
return task;
}
--
2.17.1

2018-09-17 18:31:55

by Trond Myklebust

[permalink] [raw]
Subject: [PATCH v3 38/44] SUNRPC: Add a label for RPC calls that require allocation on receive

If the RPC call relies on the receive call allocating pages as buffers,
then let's label it so that we
a) Don't leak memory by allocating pages for requests that do not expect
this behaviour
b) Can optimise for the common case where calls do not require allocation.

Signed-off-by: Trond Myklebust <[email protected]>
---
fs/nfs/nfs3xdr.c | 4 +++-
include/linux/sunrpc/xdr.h | 1 +
net/sunrpc/auth_gss/gss_rpc_xdr.c | 1 +
net/sunrpc/socklib.c | 2 +-
4 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/fs/nfs/nfs3xdr.c b/fs/nfs/nfs3xdr.c
index 64e4fa33d89f..d8c4c10b15f7 100644
--- a/fs/nfs/nfs3xdr.c
+++ b/fs/nfs/nfs3xdr.c
@@ -1364,10 +1364,12 @@ static void nfs3_xdr_enc_getacl3args(struct rpc_rqst *req,

encode_nfs_fh3(xdr, args->fh);
encode_uint32(xdr, args->mask);
- if (args->mask & (NFS_ACL | NFS_DFACL))
+ if (args->mask & (NFS_ACL | NFS_DFACL)) {
prepare_reply_buffer(req, args->pages, 0,
NFSACL_MAXPAGES << PAGE_SHIFT,
ACL3_getaclres_sz);
+ req->rq_rcv_buf.flags |= XDRBUF_SPARSE_PAGES;
+ }
}

static void nfs3_xdr_enc_setacl3args(struct rpc_rqst *req,
diff --git a/include/linux/sunrpc/xdr.h b/include/linux/sunrpc/xdr.h
index 2bd68177a442..431829233392 100644
--- a/include/linux/sunrpc/xdr.h
+++ b/include/linux/sunrpc/xdr.h
@@ -58,6 +58,7 @@ struct xdr_buf {
flags; /* Flags for data disposition */
#define XDRBUF_READ 0x01 /* target of file read */
#define XDRBUF_WRITE 0x02 /* source of file write */
+#define XDRBUF_SPARSE_PAGES 0x04 /* Page array is sparse */

unsigned int buflen, /* Total length of storage buffer */
len; /* Length of XDR encoded message */
diff --git a/net/sunrpc/auth_gss/gss_rpc_xdr.c b/net/sunrpc/auth_gss/gss_rpc_xdr.c
index 444380f968f1..006062ad5f58 100644
--- a/net/sunrpc/auth_gss/gss_rpc_xdr.c
+++ b/net/sunrpc/auth_gss/gss_rpc_xdr.c
@@ -784,6 +784,7 @@ void gssx_enc_accept_sec_context(struct rpc_rqst *req,
xdr_inline_pages(&req->rq_rcv_buf,
PAGE_SIZE/2 /* pretty arbitrary */,
arg->pages, 0 /* page base */, arg->npages * PAGE_SIZE);
+ req->rq_rcv_buf.flags |= XDRBUF_SPARSE_PAGES;
done:
if (err)
dprintk("RPC: gssx_enc_accept_sec_context: %d\n", err);
diff --git a/net/sunrpc/socklib.c b/net/sunrpc/socklib.c
index f217c348b341..08f00a98151f 100644
--- a/net/sunrpc/socklib.c
+++ b/net/sunrpc/socklib.c
@@ -104,7 +104,7 @@ ssize_t xdr_partial_copy_from_skb(struct xdr_buf *xdr, unsigned int base, struct

/* ACL likes to be lazy in allocating pages - ACLs
* are small by default but can get huge. */
- if (unlikely(*ppage == NULL)) {
+ if ((xdr->flags & XDRBUF_SPARSE_PAGES) && *ppage == NULL) {
*ppage = alloc_page(GFP_ATOMIC);
if (unlikely(*ppage == NULL)) {
if (copied == 0)
--
2.17.1

2018-09-17 18:31:37

by Trond Myklebust

[permalink] [raw]
Subject: [PATCH v3 15/44] SUNRPC: Refactor xprt_transmit() to remove the reply queue code

Separate out the action of adding a request to the reply queue so that the
backchannel code can simply skip calling it altogether.
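
The intended call site then becomes a simple two-step sequence (sketch,
condensed from the clnt.c hunk below): enqueue the request on the receive
queue first, so that even a very fast reply can be matched by XID, and
only then transmit.

	if (rpc_reply_expected(task))
		xprt_request_enqueue_receive(task);	/* findable by XID */
	if (!xprt_prepare_transmit(task))
		return;
	task->tk_action = call_transmit_status;
	xprt_transmit(task);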

Signed-off-by: Trond Myklebust <[email protected]>
---
include/linux/sunrpc/xprt.h | 1 +
net/sunrpc/backchannel_rqst.c | 1 -
net/sunrpc/clnt.c | 5 ++
net/sunrpc/xprt.c | 126 +++++++++++++++++++-----------
net/sunrpc/xprtrdma/backchannel.c | 1 -
5 files changed, 88 insertions(+), 46 deletions(-)

diff --git a/include/linux/sunrpc/xprt.h b/include/linux/sunrpc/xprt.h
index c25d0a5fda69..0250294c904a 100644
--- a/include/linux/sunrpc/xprt.h
+++ b/include/linux/sunrpc/xprt.h
@@ -334,6 +334,7 @@ void xprt_free_slot(struct rpc_xprt *xprt,
struct rpc_rqst *req);
void xprt_lock_and_alloc_slot(struct rpc_xprt *xprt, struct rpc_task *task);
bool xprt_prepare_transmit(struct rpc_task *task);
+void xprt_request_enqueue_receive(struct rpc_task *task);
void xprt_transmit(struct rpc_task *task);
void xprt_end_transmit(struct rpc_task *task);
int xprt_adjust_timeout(struct rpc_rqst *req);
diff --git a/net/sunrpc/backchannel_rqst.c b/net/sunrpc/backchannel_rqst.c
index 3c15a99b9700..fa5ba6ed3197 100644
--- a/net/sunrpc/backchannel_rqst.c
+++ b/net/sunrpc/backchannel_rqst.c
@@ -91,7 +91,6 @@ struct rpc_rqst *xprt_alloc_bc_req(struct rpc_xprt *xprt, gfp_t gfp_flags)
return NULL;

req->rq_xprt = xprt;
- INIT_LIST_HEAD(&req->rq_list);
INIT_LIST_HEAD(&req->rq_bc_list);

/* Preallocate one XDR receive buffer */
diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index a858366cd15d..414966273a3f 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -1962,6 +1962,11 @@ call_transmit(struct rpc_task *task)
return;
}
}
+
+ /* Add task to reply queue before transmission to avoid races */
+ if (rpc_reply_expected(task))
+ xprt_request_enqueue_receive(task);
+
if (!xprt_prepare_transmit(task))
return;
task->tk_action = call_transmit_status;
diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index 6e3d4b4ee79e..d8f870b5dd46 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -888,6 +888,61 @@ static void xprt_wait_on_pinned_rqst(struct rpc_rqst *req)
wait_var_event(&req->rq_pin, !xprt_is_pinned_rqst(req));
}

+static bool
+xprt_request_data_received(struct rpc_task *task)
+{
+ return !test_bit(RPC_TASK_NEED_RECV, &task->tk_runstate) &&
+ READ_ONCE(task->tk_rqstp->rq_reply_bytes_recvd) != 0;
+}
+
+static bool
+xprt_request_need_enqueue_receive(struct rpc_task *task, struct rpc_rqst *req)
+{
+ return !xprt_request_data_received(task);
+}
+
+/**
+ * xprt_request_enqueue_receive - Add a request to the receive queue
+ * @task: RPC task
+ *
+ */
+void
+xprt_request_enqueue_receive(struct rpc_task *task)
+{
+ struct rpc_rqst *req = task->tk_rqstp;
+ struct rpc_xprt *xprt = req->rq_xprt;
+
+ if (!xprt_request_need_enqueue_receive(task, req))
+ return;
+ spin_lock(&xprt->queue_lock);
+
+ /* Update the softirq receive buffer */
+ memcpy(&req->rq_private_buf, &req->rq_rcv_buf,
+ sizeof(req->rq_private_buf));
+
+ /* Add request to the receive list */
+ list_add_tail(&req->rq_list, &xprt->recv);
+ set_bit(RPC_TASK_NEED_RECV, &task->tk_runstate);
+ spin_unlock(&xprt->queue_lock);
+
+ xprt_reset_majortimeo(req);
+ /* Turn off autodisconnect */
+ del_singleshot_timer_sync(&xprt->timer);
+}
+
+/**
+ * xprt_request_dequeue_receive_locked - Remove a request from the receive queue
+ * @task: RPC task
+ *
+ * Caller must hold xprt->queue_lock.
+ */
+static void
+xprt_request_dequeue_receive_locked(struct rpc_task *task)
+{
+ if (test_and_clear_bit(RPC_TASK_NEED_RECV, &task->tk_runstate))
+ list_del(&task->tk_rqstp->rq_list);
+}
+
/**
* xprt_update_rtt - Update RPC RTT statistics
* @task: RPC request that recently completed
@@ -927,24 +982,16 @@ void xprt_complete_rqst(struct rpc_task *task, int copied)

xprt->stat.recvs++;

- list_del_init(&req->rq_list);
req->rq_private_buf.len = copied;
/* Ensure all writes are done before we update */
/* req->rq_reply_bytes_recvd */
smp_wmb();
req->rq_reply_bytes_recvd = copied;
- clear_bit(RPC_TASK_NEED_RECV, &task->tk_runstate);
+ xprt_request_dequeue_receive_locked(task);
rpc_wake_up_queued_task(&xprt->pending, task);
}
EXPORT_SYMBOL_GPL(xprt_complete_rqst);

-static bool
-xprt_request_data_received(struct rpc_task *task)
-{
- return !test_bit(RPC_TASK_NEED_RECV, &task->tk_runstate) &&
- task->tk_rqstp->rq_reply_bytes_recvd != 0;
-}
-
static void xprt_timer(struct rpc_task *task)
{
struct rpc_rqst *req = task->tk_rqstp;
@@ -1018,32 +1065,15 @@ void xprt_transmit(struct rpc_task *task)

dprintk("RPC: %5u xprt_transmit(%u)\n", task->tk_pid, req->rq_slen);

- if (!req->rq_reply_bytes_recvd) {
-
+ if (!req->rq_bytes_sent) {
+ if (xprt_request_data_received(task))
+ return;
/* Verify that our message lies in the RPCSEC_GSS window */
- if (!req->rq_bytes_sent && rpcauth_xmit_need_reencode(task)) {
+ if (rpcauth_xmit_need_reencode(task)) {
task->tk_status = -EBADMSG;
return;
}
-
- if (list_empty(&req->rq_list) && rpc_reply_expected(task)) {
- /*
- * Add to the list only if we're expecting a reply
- */
- /* Update the softirq receive buffer */
- memcpy(&req->rq_private_buf, &req->rq_rcv_buf,
- sizeof(req->rq_private_buf));
- /* Add request to the receive list */
- spin_lock(&xprt->queue_lock);
- list_add_tail(&req->rq_list, &xprt->recv);
- set_bit(RPC_TASK_NEED_RECV, &task->tk_runstate);
- spin_unlock(&xprt->queue_lock);
- xprt_reset_majortimeo(req);
- /* Turn off autodisconnect */
- del_singleshot_timer_sync(&xprt->timer);
- }
- } else if (xprt_request_data_received(task) && !req->rq_bytes_sent)
- return;
+ }

connect_cookie = xprt->connect_cookie;
status = xprt->ops->send_request(task);
@@ -1285,7 +1315,6 @@ xprt_request_init(struct rpc_task *task)
struct rpc_xprt *xprt = task->tk_xprt;
struct rpc_rqst *req = task->tk_rqstp;

- INIT_LIST_HEAD(&req->rq_list);
req->rq_timeout = task->tk_client->cl_timeout->to_initval;
req->rq_task = task;
req->rq_xprt = xprt;
@@ -1355,6 +1384,26 @@ void xprt_retry_reserve(struct rpc_task *task)
xprt_do_reserve(xprt, task);
}

+static void
+xprt_request_dequeue_all(struct rpc_task *task, struct rpc_rqst *req)
+{
+ struct rpc_xprt *xprt = req->rq_xprt;
+
+ if (test_bit(RPC_TASK_NEED_RECV, &task->tk_runstate) ||
+ xprt_is_pinned_rqst(req)) {
+ spin_lock(&xprt->queue_lock);
+ xprt_request_dequeue_receive_locked(task);
+ while (xprt_is_pinned_rqst(req)) {
+ set_bit(RPC_TASK_MSG_PIN_WAIT, &task->tk_runstate);
+ spin_unlock(&xprt->queue_lock);
+ xprt_wait_on_pinned_rqst(req);
+ spin_lock(&xprt->queue_lock);
+ clear_bit(RPC_TASK_MSG_PIN_WAIT, &task->tk_runstate);
+ }
+ spin_unlock(&xprt->queue_lock);
+ }
+}
+
/**
* xprt_release - release an RPC request slot
* @task: task which is finished with the slot
@@ -1379,18 +1428,7 @@ void xprt_release(struct rpc_task *task)
task->tk_ops->rpc_count_stats(task, task->tk_calldata);
else if (task->tk_client)
rpc_count_iostats(task, task->tk_client->cl_metrics);
- spin_lock(&xprt->queue_lock);
- if (!list_empty(&req->rq_list)) {
- list_del_init(&req->rq_list);
- if (xprt_is_pinned_rqst(req)) {
- set_bit(RPC_TASK_MSG_PIN_WAIT, &req->rq_task->tk_runstate);
- spin_unlock(&xprt->queue_lock);
- xprt_wait_on_pinned_rqst(req);
- spin_lock(&xprt->queue_lock);
- clear_bit(RPC_TASK_MSG_PIN_WAIT, &req->rq_task->tk_runstate);
- }
- }
- spin_unlock(&xprt->queue_lock);
+ xprt_request_dequeue_all(task, req);
spin_lock_bh(&xprt->transport_lock);
xprt->ops->release_xprt(xprt, task);
if (xprt->ops->release_request)
diff --git a/net/sunrpc/xprtrdma/backchannel.c b/net/sunrpc/xprtrdma/backchannel.c
index 90adeff4c06b..ed58761e6b23 100644
--- a/net/sunrpc/xprtrdma/backchannel.c
+++ b/net/sunrpc/xprtrdma/backchannel.c
@@ -51,7 +51,6 @@ static int rpcrdma_bc_setup_reqs(struct rpcrdma_xprt *r_xprt,
rqst = &req->rl_slot;

rqst->rq_xprt = xprt;
- INIT_LIST_HEAD(&rqst->rq_list);
INIT_LIST_HEAD(&rqst->rq_bc_list);
__set_bit(RPC_BC_PA_IN_USE, &rqst->rq_bc_pa_state);
spin_lock_bh(&xprt->bc_pa_lock);
--
2.17.1

2018-09-17 18:31:52

by Trond Myklebust

[permalink] [raw]
Subject: [PATCH v3 32/44] SUNRPC: Clean up transport write space handling

Treat socket write space handling in the same way we now treat transport
congestion: by denying the XPRT_LOCK until the transport signals that it
has free buffer space.
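
In effect, XPRT_WRITE_SPACE gates the lock itself (sketch, condensed from
the xprt.c hunks below): a sender that fills the socket buffer sets the
bit, and later attempts to take XPRT_LOCK back off until ->write_space()
clears the bit and kicks the sending queue.

	if (test_and_set_bit(XPRT_LOCKED, &xprt->state))
		goto out_sleep;				/* lock is busy */
	if (test_bit(XPRT_WRITE_SPACE, &xprt->state)) {
		xprt_clear_locked(xprt);		/* no buffer space yet */
		goto out_sleep;
	}
	xprt->snd_task = task;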

Signed-off-by: Trond Myklebust <[email protected]>
---
include/linux/sunrpc/svc_xprt.h | 1 -
include/linux/sunrpc/xprt.h | 5 +-
net/sunrpc/clnt.c | 28 +++-----
net/sunrpc/svc_xprt.c | 2 -
net/sunrpc/xprt.c | 77 +++++++++++++---------
net/sunrpc/xprtrdma/rpc_rdma.c | 2 +-
net/sunrpc/xprtrdma/svc_rdma_backchannel.c | 7 +-
net/sunrpc/xprtsock.c | 33 ++++------
8 files changed, 73 insertions(+), 82 deletions(-)

diff --git a/include/linux/sunrpc/svc_xprt.h b/include/linux/sunrpc/svc_xprt.h
index c3d72066d4b1..6b7a86c4d6e6 100644
--- a/include/linux/sunrpc/svc_xprt.h
+++ b/include/linux/sunrpc/svc_xprt.h
@@ -84,7 +84,6 @@ struct svc_xprt {
struct sockaddr_storage xpt_remote; /* remote peer's address */
size_t xpt_remotelen; /* length of address */
char xpt_remotebuf[INET6_ADDRSTRLEN + 10];
- struct rpc_wait_queue xpt_bc_pending; /* backchannel wait queue */
struct list_head xpt_users; /* callbacks on free */

struct net *xpt_net;
diff --git a/include/linux/sunrpc/xprt.h b/include/linux/sunrpc/xprt.h
index 14c9b4d49fb4..5600242ccbf9 100644
--- a/include/linux/sunrpc/xprt.h
+++ b/include/linux/sunrpc/xprt.h
@@ -387,8 +387,8 @@ int xprt_load_transport(const char *);
void xprt_set_retrans_timeout_def(struct rpc_task *task);
void xprt_set_retrans_timeout_rtt(struct rpc_task *task);
void xprt_wake_pending_tasks(struct rpc_xprt *xprt, int status);
-void xprt_wait_for_buffer_space(struct rpc_task *task, rpc_action action);
-void xprt_write_space(struct rpc_xprt *xprt);
+void xprt_wait_for_buffer_space(struct rpc_xprt *xprt);
+bool xprt_write_space(struct rpc_xprt *xprt);
void xprt_adjust_cwnd(struct rpc_xprt *xprt, struct rpc_task *task, int result);
struct rpc_rqst * xprt_lookup_rqst(struct rpc_xprt *xprt, __be32 xid);
void xprt_update_rtt(struct rpc_task *task);
@@ -416,6 +416,7 @@ void xprt_unlock_connect(struct rpc_xprt *, void *);
#define XPRT_CLOSING (6)
#define XPRT_CONGESTED (9)
#define XPRT_CWND_WAIT (10)
+#define XPRT_WRITE_SPACE (11)

static inline void xprt_set_connected(struct rpc_xprt *xprt)
{
diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index f03911f84953..0c4b2e7d791f 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -1964,13 +1964,14 @@ call_transmit(struct rpc_task *task)
{
dprint_status(task);

+ task->tk_status = 0;
+ if (test_bit(RPC_TASK_NEED_XMIT, &task->tk_runstate)) {
+ if (!xprt_prepare_transmit(task))
+ return;
+ xprt_transmit(task);
+ }
task->tk_action = call_transmit_status;
- if (!test_bit(RPC_TASK_NEED_XMIT, &task->tk_runstate))
- return;
-
- if (!xprt_prepare_transmit(task))
- return;
- xprt_transmit(task);
+ xprt_end_transmit(task);
}

/*
@@ -1986,7 +1987,6 @@ call_transmit_status(struct rpc_task *task)
* test first.
*/
if (task->tk_status == 0) {
- xprt_end_transmit(task);
xprt_request_wait_receive(task);
return;
}
@@ -1994,15 +1994,8 @@ call_transmit_status(struct rpc_task *task)
switch (task->tk_status) {
default:
dprint_status(task);
- xprt_end_transmit(task);
- break;
- case -EBADSLT:
- xprt_end_transmit(task);
- task->tk_action = call_transmit;
- task->tk_status = 0;
break;
case -EBADMSG:
- xprt_end_transmit(task);
task->tk_status = 0;
task->tk_action = call_encode;
break;
@@ -2015,6 +2008,7 @@ call_transmit_status(struct rpc_task *task)
case -ENOBUFS:
rpc_delay(task, HZ>>2);
/* fall through */
+ case -EBADSLT:
case -EAGAIN:
task->tk_action = call_transmit;
task->tk_status = 0;
@@ -2026,7 +2020,6 @@ call_transmit_status(struct rpc_task *task)
case -ENETUNREACH:
case -EPERM:
if (RPC_IS_SOFTCONN(task)) {
- xprt_end_transmit(task);
if (!task->tk_msg.rpc_proc->p_proc)
trace_xprt_ping(task->tk_xprt,
task->tk_status);
@@ -2069,9 +2062,6 @@ call_bc_transmit(struct rpc_task *task)

xprt_transmit(task);

- if (task->tk_status == -EAGAIN)
- goto out_retry;
-
xprt_end_transmit(task);
dprint_status(task);
switch (task->tk_status) {
@@ -2087,6 +2077,8 @@ call_bc_transmit(struct rpc_task *task)
case -ENOTCONN:
case -EPIPE:
break;
+ case -EAGAIN:
+ goto out_retry;
case -ETIMEDOUT:
/*
* Problem reaching the server. Disconnect and let the
diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
index 5185efb9027b..87533fbb96cf 100644
--- a/net/sunrpc/svc_xprt.c
+++ b/net/sunrpc/svc_xprt.c
@@ -171,7 +171,6 @@ void svc_xprt_init(struct net *net, struct svc_xprt_class *xcl,
mutex_init(&xprt->xpt_mutex);
spin_lock_init(&xprt->xpt_lock);
set_bit(XPT_BUSY, &xprt->xpt_flags);
- rpc_init_wait_queue(&xprt->xpt_bc_pending, "xpt_bc_pending");
xprt->xpt_net = get_net(net);
strcpy(xprt->xpt_remotebuf, "uninitialized");
}
@@ -895,7 +894,6 @@ int svc_send(struct svc_rqst *rqstp)
else
len = xprt->xpt_ops->xpo_sendto(rqstp);
mutex_unlock(&xprt->xpt_mutex);
- rpc_wake_up(&xprt->xpt_bc_pending);
trace_svc_send(rqstp, len);
svc_xprt_release(rqstp);

diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index 6bdc10147297..e4d57f5be5e2 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -169,6 +169,17 @@ int xprt_load_transport(const char *transport_name)
}
EXPORT_SYMBOL_GPL(xprt_load_transport);

+static void xprt_clear_locked(struct rpc_xprt *xprt)
+{
+ xprt->snd_task = NULL;
+ if (!test_bit(XPRT_CLOSE_WAIT, &xprt->state)) {
+ smp_mb__before_atomic();
+ clear_bit(XPRT_LOCKED, &xprt->state);
+ smp_mb__after_atomic();
+ } else
+ queue_work(xprtiod_workqueue, &xprt->task_cleanup);
+}
+
/**
* xprt_reserve_xprt - serialize write access to transports
* @task: task that is requesting access to the transport
@@ -188,10 +199,14 @@ int xprt_reserve_xprt(struct rpc_xprt *xprt, struct rpc_task *task)
return 1;
goto out_sleep;
}
+ if (test_bit(XPRT_WRITE_SPACE, &xprt->state))
+ goto out_unlock;
xprt->snd_task = task;

return 1;

+out_unlock:
+ xprt_clear_locked(xprt);
out_sleep:
dprintk("RPC: %5u failed to lock transport %p\n",
task->tk_pid, xprt);
@@ -208,17 +223,6 @@ int xprt_reserve_xprt(struct rpc_xprt *xprt, struct rpc_task *task)
}
EXPORT_SYMBOL_GPL(xprt_reserve_xprt);

-static void xprt_clear_locked(struct rpc_xprt *xprt)
-{
- xprt->snd_task = NULL;
- if (!test_bit(XPRT_CLOSE_WAIT, &xprt->state)) {
- smp_mb__before_atomic();
- clear_bit(XPRT_LOCKED, &xprt->state);
- smp_mb__after_atomic();
- } else
- queue_work(xprtiod_workqueue, &xprt->task_cleanup);
-}
-
static bool
xprt_need_congestion_window_wait(struct rpc_xprt *xprt)
{
@@ -267,10 +271,13 @@ int xprt_reserve_xprt_cong(struct rpc_xprt *xprt, struct rpc_task *task)
xprt->snd_task = task;
return 1;
}
+ if (test_bit(XPRT_WRITE_SPACE, &xprt->state))
+ goto out_unlock;
if (!xprt_need_congestion_window_wait(xprt)) {
xprt->snd_task = task;
return 1;
}
+out_unlock:
xprt_clear_locked(xprt);
out_sleep:
dprintk("RPC: %5u failed to lock transport %p\n", task->tk_pid, xprt);
@@ -309,10 +316,12 @@ static void __xprt_lock_write_next(struct rpc_xprt *xprt)
{
if (test_and_set_bit(XPRT_LOCKED, &xprt->state))
return;
-
+ if (test_bit(XPRT_WRITE_SPACE, &xprt->state))
+ goto out_unlock;
if (rpc_wake_up_first_on_wq(xprtiod_workqueue, &xprt->sending,
__xprt_lock_write_func, xprt))
return;
+out_unlock:
xprt_clear_locked(xprt);
}

@@ -320,6 +329,8 @@ static void __xprt_lock_write_next_cong(struct rpc_xprt *xprt)
{
if (test_and_set_bit(XPRT_LOCKED, &xprt->state))
return;
+ if (test_bit(XPRT_WRITE_SPACE, &xprt->state))
+ goto out_unlock;
if (xprt_need_congestion_window_wait(xprt))
goto out_unlock;
if (rpc_wake_up_first_on_wq(xprtiod_workqueue, &xprt->sending,
@@ -510,39 +521,46 @@ EXPORT_SYMBOL_GPL(xprt_wake_pending_tasks);

/**
* xprt_wait_for_buffer_space - wait for transport output buffer to clear
- * @task: task to be put to sleep
- * @action: function pointer to be executed after wait
+ * @xprt: transport
*
* Note that we only set the timer for the case of RPC_IS_SOFT(), since
* we don't in general want to force a socket disconnection due to
* an incomplete RPC call transmission.
*/
-void xprt_wait_for_buffer_space(struct rpc_task *task, rpc_action action)
+void xprt_wait_for_buffer_space(struct rpc_xprt *xprt)
{
- struct rpc_rqst *req = task->tk_rqstp;
- struct rpc_xprt *xprt = req->rq_xprt;
-
- task->tk_timeout = RPC_IS_SOFT(task) ? req->rq_timeout : 0;
- rpc_sleep_on(&xprt->pending, task, action);
+ set_bit(XPRT_WRITE_SPACE, &xprt->state);
}
EXPORT_SYMBOL_GPL(xprt_wait_for_buffer_space);

+static bool
+xprt_clear_write_space_locked(struct rpc_xprt *xprt)
+{
+ if (test_and_clear_bit(XPRT_WRITE_SPACE, &xprt->state)) {
+ __xprt_lock_write_next(xprt);
+ dprintk("RPC: write space: waking waiting task on "
+ "xprt %p\n", xprt);
+ return true;
+ }
+ return false;
+}
+
/**
* xprt_write_space - wake the task waiting for transport output buffer space
* @xprt: transport with waiting tasks
*
* Can be called in a soft IRQ context, so xprt_write_space never sleeps.
*/
-void xprt_write_space(struct rpc_xprt *xprt)
+bool xprt_write_space(struct rpc_xprt *xprt)
{
+ bool ret;
+
+ if (!test_bit(XPRT_WRITE_SPACE, &xprt->state))
+ return false;
spin_lock_bh(&xprt->transport_lock);
- if (xprt->snd_task) {
- dprintk("RPC: write space: waking waiting task on "
- "xprt %p\n", xprt);
- rpc_wake_up_queued_task_on_wq(xprtiod_workqueue,
- &xprt->pending, xprt->snd_task);
- }
+ ret = xprt_clear_write_space_locked(xprt);
spin_unlock_bh(&xprt->transport_lock);
+ return ret;
}
EXPORT_SYMBOL_GPL(xprt_write_space);

@@ -653,6 +671,7 @@ void xprt_disconnect_done(struct rpc_xprt *xprt)
dprintk("RPC: disconnected transport %p\n", xprt);
spin_lock_bh(&xprt->transport_lock);
xprt_clear_connected(xprt);
+ xprt_clear_write_space_locked(xprt);
xprt_wake_pending_tasks(xprt, -EAGAIN);
spin_unlock_bh(&xprt->transport_lock);
}
@@ -1325,9 +1344,7 @@ xprt_transmit(struct rpc_task *task)
if (!xprt_request_data_received(task) ||
test_bit(RPC_TASK_NEED_XMIT, &task->tk_runstate))
continue;
- } else if (!test_bit(RPC_TASK_NEED_XMIT, &task->tk_runstate))
- rpc_wake_up_queued_task(&xprt->pending, task);
- else
+ } else if (test_bit(RPC_TASK_NEED_XMIT, &task->tk_runstate))
task->tk_status = status;
break;
}
diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c
index 0020dc401215..53fa95d60015 100644
--- a/net/sunrpc/xprtrdma/rpc_rdma.c
+++ b/net/sunrpc/xprtrdma/rpc_rdma.c
@@ -866,7 +866,7 @@ rpcrdma_marshal_req(struct rpcrdma_xprt *r_xprt, struct rpc_rqst *rqst)
out_err:
switch (ret) {
case -EAGAIN:
- xprt_wait_for_buffer_space(rqst->rq_task, NULL);
+ xprt_wait_for_buffer_space(rqst->rq_xprt);
break;
case -ENOBUFS:
break;
diff --git a/net/sunrpc/xprtrdma/svc_rdma_backchannel.c b/net/sunrpc/xprtrdma/svc_rdma_backchannel.c
index d1618c70edb4..35a8c3aab302 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_backchannel.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_backchannel.c
@@ -224,12 +224,7 @@ xprt_rdma_bc_send_request(struct rpc_rqst *rqst, struct rpc_task *task)
dprintk("svcrdma: sending bc call with xid: %08x\n",
be32_to_cpu(rqst->rq_xid));

- if (!mutex_trylock(&sxprt->xpt_mutex)) {
- rpc_sleep_on(&sxprt->xpt_bc_pending, task, NULL);
- if (!mutex_trylock(&sxprt->xpt_mutex))
- return -EAGAIN;
- rpc_wake_up_queued_task(&sxprt->xpt_bc_pending, task);
- }
+ mutex_lock(&sxprt->xpt_mutex);

ret = -ENOTCONN;
rdma = container_of(sxprt, struct svcxprt_rdma, sc_xprt);
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index f54e8110f4c6..ef8d0e81cbda 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -440,20 +440,12 @@ static int xs_sendpages(struct socket *sock, struct sockaddr *addr, int addrlen,
return err;
}

-static void xs_nospace_callback(struct rpc_task *task)
-{
- struct sock_xprt *transport = container_of(task->tk_rqstp->rq_xprt, struct sock_xprt, xprt);
-
- transport->inet->sk_write_pending--;
-}
-
/**
- * xs_nospace - place task on wait queue if transmit was incomplete
+ * xs_nospace - handle an incomplete transmit
* @req: pointer to RPC request
- * @task: task to put to sleep
*
*/
-static int xs_nospace(struct rpc_rqst *req, struct rpc_task *task)
+static int xs_nospace(struct rpc_rqst *req)
{
struct rpc_xprt *xprt = req->rq_xprt;
struct sock_xprt *transport = container_of(xprt, struct sock_xprt, xprt);
@@ -461,7 +453,8 @@ static int xs_nospace(struct rpc_rqst *req, struct rpc_task *task)
int ret = -EAGAIN;

dprintk("RPC: %5u xmit incomplete (%u left of %u)\n",
- task->tk_pid, req->rq_slen - transport->xmit.offset,
+ req->rq_task->tk_pid,
+ req->rq_slen - transport->xmit.offset,
req->rq_slen);

/* Protect against races with write_space */
@@ -471,7 +464,7 @@ static int xs_nospace(struct rpc_rqst *req, struct rpc_task *task)
if (xprt_connected(xprt)) {
/* wait for more buffer space */
sk->sk_write_pending++;
- xprt_wait_for_buffer_space(task, xs_nospace_callback);
+ xprt_wait_for_buffer_space(xprt);
} else
ret = -ENOTCONN;

@@ -569,7 +562,7 @@ static int xs_local_send_request(struct rpc_rqst *req, struct rpc_task *task)
case -ENOBUFS:
break;
case -EAGAIN:
- status = xs_nospace(req, task);
+ status = xs_nospace(req);
break;
default:
dprintk("RPC: sendmsg returned unrecognized error %d\n",
@@ -642,7 +635,7 @@ static int xs_udp_send_request(struct rpc_rqst *req, struct rpc_task *task)
/* Should we call xs_close() here? */
break;
case -EAGAIN:
- status = xs_nospace(req, task);
+ status = xs_nospace(req);
break;
case -ENETUNREACH:
case -ENOBUFS:
@@ -765,7 +758,7 @@ static int xs_tcp_send_request(struct rpc_rqst *req, struct rpc_task *task)
/* Should we call xs_close() here? */
break;
case -EAGAIN:
- status = xs_nospace(req, task);
+ status = xs_nospace(req);
break;
case -ECONNRESET:
case -ECONNREFUSED:
@@ -1672,7 +1665,8 @@ static void xs_write_space(struct sock *sk)
if (!wq || test_and_clear_bit(SOCKWQ_ASYNC_NOSPACE, &wq->flags) == 0)
goto out;

- xprt_write_space(xprt);
+ if (xprt_write_space(xprt))
+ sk->sk_write_pending--;
out:
rcu_read_unlock();
}
@@ -2725,12 +2719,7 @@ static int bc_send_request(struct rpc_rqst *req, struct rpc_task *task)
* Grab the mutex to serialize data as the connection is shared
* with the fore channel
*/
- if (!mutex_trylock(&xprt->xpt_mutex)) {
- rpc_sleep_on(&xprt->xpt_bc_pending, task, NULL);
- if (!mutex_trylock(&xprt->xpt_mutex))
- return -EAGAIN;
- rpc_wake_up_queued_task(&xprt->xpt_bc_pending, task);
- }
+ mutex_lock(&xprt->xpt_mutex);
if (test_bit(XPT_DEAD, &xprt->xpt_flags))
len = -ENOTCONN;
else
--
2.17.1

2018-09-17 18:31:43

by Trond Myklebust

[permalink] [raw]
Subject: [PATCH v3 25/44] SUNRPC: Move RPC retransmission stat counter to xprt_transmit()
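
A rough sketch of the resulting ordering in xprt_transmit() (condensed
from the diff below; xprt_update_rtt() only wants RTT samples from first
transmissions, hence the increment before the send):

	req->rq_ntrans++;			/* visible before the send */
	status = xprt->ops->send_request(req, task);
	if (status != 0) {
		req->rq_ntrans--;		/* nothing actually went out */
		task->tk_status = status;
		return;
	}
	if (is_retrans)
		task->tk_client->cl_stats->rpcretrans++;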

Signed-off-by: Trond Myklebust <[email protected]>
---
net/sunrpc/clnt.c | 6 ------
net/sunrpc/xprt.c | 19 ++++++++++++-------
2 files changed, 12 insertions(+), 13 deletions(-)

diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index 4ca23a6607ba..8dc3d33827c4 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -1962,8 +1962,6 @@ call_connect_status(struct rpc_task *task)
static void
call_transmit(struct rpc_task *task)
{
- int is_retrans = RPC_WAS_SENT(task);
-
dprint_status(task);

task->tk_action = call_transmit_status;
@@ -1973,10 +1971,6 @@ call_transmit(struct rpc_task *task)
if (!xprt_prepare_transmit(task))
return;
xprt_transmit(task);
- if (task->tk_status < 0)
- return;
- if (is_retrans)
- task->tk_client->cl_stats->rpcretrans++;
}

/*
diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index 385ee9f64353..35f5df367591 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -191,8 +191,6 @@ int xprt_reserve_xprt(struct rpc_xprt *xprt, struct rpc_task *task)
goto out_sleep;
}
xprt->snd_task = task;
- if (req != NULL)
- req->rq_ntrans++;

return 1;

@@ -247,7 +245,6 @@ int xprt_reserve_xprt_cong(struct rpc_xprt *xprt, struct rpc_task *task)
}
if (__xprt_get_cong(xprt, task)) {
xprt->snd_task = task;
- req->rq_ntrans++;
return 1;
}
xprt_clear_locked(xprt);
@@ -281,12 +278,8 @@ static inline int xprt_lock_write(struct rpc_xprt *xprt, struct rpc_task *task)
static bool __xprt_lock_write_func(struct rpc_task *task, void *data)
{
struct rpc_xprt *xprt = data;
- struct rpc_rqst *req;

- req = task->tk_rqstp;
xprt->snd_task = task;
- if (req)
- req->rq_ntrans++;
return true;
}

@@ -1152,6 +1145,7 @@ void xprt_transmit(struct rpc_task *task)
struct rpc_rqst *req = task->tk_rqstp;
struct rpc_xprt *xprt = req->rq_xprt;
unsigned int connect_cookie;
+ int is_retrans = RPC_WAS_SENT(task);
int status;

dprintk("RPC: %5u xprt_transmit(%u)\n", task->tk_pid, req->rq_slen);
@@ -1166,14 +1160,25 @@ void xprt_transmit(struct rpc_task *task)
}
}

+ /*
+ * Update req->rq_ntrans before transmitting to avoid races with
+ * xprt_update_rtt(), which needs to know that it is recording a
+ * reply to the first transmission.
+ */
+ req->rq_ntrans++;
+
connect_cookie = xprt->connect_cookie;
status = xprt->ops->send_request(req, task);
trace_xprt_transmit(xprt, req->rq_xid, status);
if (status != 0) {
+ req->rq_ntrans--;
task->tk_status = status;
return;
}

+ if (is_retrans)
+ task->tk_client->cl_stats->rpcretrans++;
+
xprt_inject_disconnect(xprt);

dprintk("RPC: %5u xmit complete\n", task->tk_pid);
--
2.17.1

2018-09-17 18:31:58

by Trond Myklebust

[permalink] [raw]
Subject: [PATCH v3 40/44] SUNRPC: Simplify TCP receive code by switching to using iterators

Most of this code should also be reusable with other socket types.
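
A rough sketch of the approach (illustrative only; the real flow is
xs_read_stream() in the diff below): the receive worker reads the RPC
record marker through a kvec iterator, then pulls the payload straight
into the request's pages.

	__be32 marker;
	struct kvec kvec = { .iov_base = &marker, .iov_len = sizeof(marker) };
	struct msghdr msg = { 0 };
	ssize_t ret;

	iov_iter_kvec(&msg.msg_iter, READ | ITER_KVEC, &kvec, 1, sizeof(marker));
	ret = sock_recvmsg(sock, &msg, MSG_DONTWAIT);
	if (ret == sizeof(marker)) {
		size_t len = be32_to_cpu(marker) & RPC_FRAGMENT_SIZE_MASK;

		/* read len payload bytes via iov_iter_bvec() into the
		 * request's pages; RPC_LAST_STREAM_FRAGMENT in the marker
		 * signals the end of the record, as in xs_read_stream() */
	}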

Signed-off-by: Trond Myklebust <[email protected]>
---
include/linux/sunrpc/xprtsock.h | 19 +-
include/trace/events/sunrpc.h | 15 +-
net/sunrpc/xprtsock.c | 694 +++++++++++++++-----------------
3 files changed, 335 insertions(+), 393 deletions(-)

diff --git a/include/linux/sunrpc/xprtsock.h b/include/linux/sunrpc/xprtsock.h
index 005cfb6e7238..458bfe0137f5 100644
--- a/include/linux/sunrpc/xprtsock.h
+++ b/include/linux/sunrpc/xprtsock.h
@@ -31,15 +31,16 @@ struct sock_xprt {
* State of TCP reply receive
*/
struct {
- __be32 fraghdr,
+ struct {
+ __be32 fraghdr,
xid,
calldir;
+ } __attribute__((packed));

u32 offset,
len;

- unsigned long copied,
- flags;
+ unsigned long copied;
} recv;

/*
@@ -76,21 +77,9 @@ struct sock_xprt {
void (*old_error_report)(struct sock *);
};

-/*
- * TCP receive state flags
- */
-#define TCP_RCV_LAST_FRAG (1UL << 0)
-#define TCP_RCV_COPY_FRAGHDR (1UL << 1)
-#define TCP_RCV_COPY_XID (1UL << 2)
-#define TCP_RCV_COPY_DATA (1UL << 3)
-#define TCP_RCV_READ_CALLDIR (1UL << 4)
-#define TCP_RCV_COPY_CALLDIR (1UL << 5)
-
/*
* TCP RPC flags
*/
-#define TCP_RPC_REPLY (1UL << 6)
-
#define XPRT_SOCK_CONNECTING 1U
#define XPRT_SOCK_DATA_READY (2)
#define XPRT_SOCK_UPD_TIMEOUT (3)
diff --git a/include/trace/events/sunrpc.h b/include/trace/events/sunrpc.h
index 0aa347194e0f..19e08d12696c 100644
--- a/include/trace/events/sunrpc.h
+++ b/include/trace/events/sunrpc.h
@@ -497,16 +497,6 @@ TRACE_EVENT(xs_tcp_data_ready,
__get_str(port), __entry->err, __entry->total)
);

-#define rpc_show_sock_xprt_flags(flags) \
- __print_flags(flags, "|", \
- { TCP_RCV_LAST_FRAG, "TCP_RCV_LAST_FRAG" }, \
- { TCP_RCV_COPY_FRAGHDR, "TCP_RCV_COPY_FRAGHDR" }, \
- { TCP_RCV_COPY_XID, "TCP_RCV_COPY_XID" }, \
- { TCP_RCV_COPY_DATA, "TCP_RCV_COPY_DATA" }, \
- { TCP_RCV_READ_CALLDIR, "TCP_RCV_READ_CALLDIR" }, \
- { TCP_RCV_COPY_CALLDIR, "TCP_RCV_COPY_CALLDIR" }, \
- { TCP_RPC_REPLY, "TCP_RPC_REPLY" })
-
TRACE_EVENT(xs_tcp_data_recv,
TP_PROTO(struct sock_xprt *xs),

@@ -516,7 +506,6 @@ TRACE_EVENT(xs_tcp_data_recv,
__string(addr, xs->xprt.address_strings[RPC_DISPLAY_ADDR])
__string(port, xs->xprt.address_strings[RPC_DISPLAY_PORT])
__field(u32, xid)
- __field(unsigned long, flags)
__field(unsigned long, copied)
__field(unsigned int, reclen)
__field(unsigned long, offset)
@@ -526,15 +515,13 @@ TRACE_EVENT(xs_tcp_data_recv,
__assign_str(addr, xs->xprt.address_strings[RPC_DISPLAY_ADDR]);
__assign_str(port, xs->xprt.address_strings[RPC_DISPLAY_PORT]);
__entry->xid = be32_to_cpu(xs->recv.xid);
- __entry->flags = xs->recv.flags;
__entry->copied = xs->recv.copied;
__entry->reclen = xs->recv.len;
__entry->offset = xs->recv.offset;
),

- TP_printk("peer=[%s]:%s xid=0x%08x flags=%s copied=%lu reclen=%u offset=%lu",
+ TP_printk("peer=[%s]:%s xid=0x%08x copied=%lu reclen=%u offset=%lu",
__get_str(addr), __get_str(port), __entry->xid,
- rpc_show_sock_xprt_flags(__entry->flags),
__entry->copied, __entry->reclen, __entry->offset)
);

diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index f16406228ead..5269ad98bb08 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -47,13 +47,13 @@
#include <net/checksum.h>
#include <net/udp.h>
#include <net/tcp.h>
+#include <linux/bvec.h>
+#include <linux/uio.h>

#include <trace/events/sunrpc.h>

#include "sunrpc.h"

-#define RPC_TCP_READ_CHUNK_SZ (3*512*1024)
-
static void xs_close(struct rpc_xprt *xprt);
static void xs_tcp_set_socket_timeouts(struct rpc_xprt *xprt,
struct socket *sock);
@@ -325,6 +325,320 @@ static void xs_free_peer_addresses(struct rpc_xprt *xprt)
}
}

+static size_t
+xs_alloc_sparse_pages(struct xdr_buf *buf, size_t want, gfp_t gfp)
+{
+ size_t i, n;
+
+ if (!(buf->flags & XDRBUF_SPARSE_PAGES))
+ return want;
+ if (want > buf->page_len)
+ want = buf->page_len;
+ n = (buf->page_base + want + PAGE_SIZE - 1) >> PAGE_SHIFT;
+ for (i = 0; i < n; i++) {
+ if (buf->pages[i])
+ continue;
+ buf->bvec[i].bv_page = buf->pages[i] = alloc_page(gfp);
+ if (!buf->pages[i]) {
+ buf->page_len = (i * PAGE_SIZE) - buf->page_base;
+ return buf->page_len;
+ }
+ }
+ return want;
+}
+
+static ssize_t
+xs_sock_recvmsg(struct socket *sock, struct msghdr *msg, int flags, size_t seek)
+{
+ ssize_t ret;
+ if (seek != 0)
+ iov_iter_advance(&msg->msg_iter, seek);
+ ret = sock_recvmsg(sock, msg, flags);
+ return ret > 0 ? ret + seek : ret;
+}
+
+static ssize_t
+xs_read_kvec(struct socket *sock, struct msghdr *msg, int flags,
+ struct kvec *kvec, size_t count, size_t seek)
+{
+ iov_iter_kvec(&msg->msg_iter, READ | ITER_KVEC, kvec, 1, count);
+ return xs_sock_recvmsg(sock, msg, flags, seek);
+}
+
+static ssize_t
+xs_read_bvec(struct socket *sock, struct msghdr *msg, int flags,
+ struct bio_vec *bvec, unsigned long nr, size_t count,
+ size_t seek)
+{
+ iov_iter_bvec(&msg->msg_iter, READ | ITER_BVEC, bvec, nr, count);
+ return xs_sock_recvmsg(sock, msg, flags, seek);
+}
+
+static ssize_t
+xs_read_discard(struct socket *sock, struct msghdr *msg, int flags,
+ size_t count)
+{
+ struct kvec kvec = { 0 };
+ return xs_read_kvec(sock, msg, flags | MSG_TRUNC, &kvec, count, 0);
+}
+
+static ssize_t
+xs_read_xdr_buf(struct socket *sock, struct msghdr *msg, int flags,
+ struct xdr_buf *buf, size_t count, size_t seek, size_t *read)
+{
+ size_t want, seek_init = seek, offset = 0;
+ ssize_t ret;
+
+ if (seek < buf->head[0].iov_len) {
+ want = min_t(size_t, count, buf->head[0].iov_len);
+ ret = xs_read_kvec(sock, msg, flags, &buf->head[0], want, seek);
+ if (ret <= 0)
+ goto sock_err;
+ offset += ret;
+ if (offset == count || msg->msg_flags & (MSG_EOR|MSG_TRUNC))
+ goto out;
+ if (ret != want)
+ goto eagain;
+ seek = 0;
+ } else {
+ seek -= buf->head[0].iov_len;
+ offset += buf->head[0].iov_len;
+ }
+ if (buf->page_len && seek < buf->page_len) {
+ want = min_t(size_t, count - offset, buf->page_len);
+ want = xs_alloc_sparse_pages(buf, want, GFP_NOWAIT);
+ ret = xs_read_bvec(sock, msg, flags, buf->bvec,
+ xdr_buf_pagecount(buf),
+ want + buf->page_base,
+ seek + buf->page_base);
+ if (ret <= 0)
+ goto sock_err;
+ offset += ret;
+ if (offset == count || msg->msg_flags & (MSG_EOR|MSG_TRUNC))
+ goto out;
+ if (ret != want)
+ goto eagain;
+ seek = 0;
+ } else {
+ seek -= buf->page_len;
+ offset += buf->page_len;
+ }
+ if (buf->tail[0].iov_len && seek < buf->tail[0].iov_len) {
+ want = min_t(size_t, count - offset, buf->tail[0].iov_len);
+ ret = xs_read_kvec(sock, msg, flags, &buf->tail[0], want, seek);
+ if (ret <= 0)
+ goto sock_err;
+ offset += ret;
+ if (offset == count || msg->msg_flags & (MSG_EOR|MSG_TRUNC))
+ goto out;
+ if (ret != want)
+ goto eagain;
+ } else
+ offset += buf->tail[0].iov_len;
+ ret = -EMSGSIZE;
+ msg->msg_flags |= MSG_TRUNC;
+out:
+ *read = offset - seek_init;
+ return ret;
+eagain:
+ ret = -EAGAIN;
+ goto out;
+sock_err:
+ offset += seek;
+ goto out;
+}
+
+static void
+xs_read_header(struct sock_xprt *transport, struct xdr_buf *buf)
+{
+ if (!transport->recv.copied) {
+ if (buf->head[0].iov_len >= transport->recv.offset)
+ memcpy(buf->head[0].iov_base,
+ &transport->recv.xid,
+ transport->recv.offset);
+ transport->recv.copied = transport->recv.offset;
+ }
+}
+
+static bool
+xs_read_stream_request_done(struct sock_xprt *transport)
+{
+ return transport->recv.fraghdr & cpu_to_be32(RPC_LAST_STREAM_FRAGMENT);
+}
+
+static ssize_t
+xs_read_stream_request(struct sock_xprt *transport, struct msghdr *msg,
+ int flags, struct rpc_rqst *req)
+{
+ struct xdr_buf *buf = &req->rq_private_buf;
+ size_t want, read;
+ ssize_t ret;
+
+ xs_read_header(transport, buf);
+
+ want = transport->recv.len - transport->recv.offset;
+ ret = xs_read_xdr_buf(transport->sock, msg, flags, buf,
+ transport->recv.copied + want, transport->recv.copied,
+ &read);
+ transport->recv.offset += read;
+ transport->recv.copied += read;
+ if (transport->recv.offset == transport->recv.len) {
+ if (xs_read_stream_request_done(transport))
+ msg->msg_flags |= MSG_EOR;
+ return transport->recv.copied;
+ }
+
+ switch (ret) {
+ case -EMSGSIZE:
+ return transport->recv.copied;
+ case 0:
+ return -ESHUTDOWN;
+ default:
+ if (ret < 0)
+ return ret;
+ }
+ return -EAGAIN;
+}
+
+static size_t
+xs_read_stream_headersize(bool isfrag)
+{
+ if (isfrag)
+ return sizeof(__be32);
+ return 3 * sizeof(__be32);
+}
+
+static ssize_t
+xs_read_stream_header(struct sock_xprt *transport, struct msghdr *msg,
+ int flags, size_t want, size_t seek)
+{
+ struct kvec kvec = {
+ .iov_base = &transport->recv.fraghdr,
+ .iov_len = want,
+ };
+ return xs_read_kvec(transport->sock, msg, flags, &kvec, want, seek);
+}
+
+#if defined(CONFIG_SUNRPC_BACKCHANNEL)
+static ssize_t
+xs_read_stream_call(struct sock_xprt *transport, struct msghdr *msg, int flags)
+{
+ struct rpc_xprt *xprt = &transport->xprt;
+ struct rpc_rqst *req;
+ ssize_t ret;
+
+ /* Look up and lock the request corresponding to the given XID */
+ req = xprt_lookup_bc_request(xprt, transport->recv.xid);
+ if (!req) {
+ printk(KERN_WARNING "Callback slot table overflowed\n");
+ return -ESHUTDOWN;
+ }
+
+ ret = xs_read_stream_request(transport, msg, flags, req);
+ if (msg->msg_flags & (MSG_EOR|MSG_TRUNC))
+ xprt_complete_bc_request(req, ret);
+
+ return ret;
+}
+#else /* CONFIG_SUNRPC_BACKCHANNEL */
+static ssize_t
+xs_read_stream_call(struct sock_xprt *transport, struct msghdr *msg, int flags)
+{
+ return -ESHUTDOWN;
+}
+#endif /* CONFIG_SUNRPC_BACKCHANNEL */
+
+static ssize_t
+xs_read_stream_reply(struct sock_xprt *transport, struct msghdr *msg, int flags)
+{
+ struct rpc_xprt *xprt = &transport->xprt;
+ struct rpc_rqst *req;
+ ssize_t ret = 0;
+
+ /* Look up and lock the request corresponding to the given XID */
+ spin_lock(&xprt->queue_lock);
+ req = xprt_lookup_rqst(xprt, transport->recv.xid);
+ if (!req) {
+ msg->msg_flags |= MSG_TRUNC;
+ goto out;
+ }
+ xprt_pin_rqst(req);
+ spin_unlock(&xprt->queue_lock);
+
+ ret = xs_read_stream_request(transport, msg, flags, req);
+
+ spin_lock(&xprt->queue_lock);
+ if (msg->msg_flags & (MSG_EOR|MSG_TRUNC))
+ xprt_complete_rqst(req->rq_task, ret);
+ xprt_unpin_rqst(req);
+out:
+ spin_unlock(&xprt->queue_lock);
+ return ret;
+}
+
+static ssize_t
+xs_read_stream(struct sock_xprt *transport, int flags)
+{
+ struct msghdr msg = { 0 };
+ size_t want, read = 0;
+ ssize_t ret = 0;
+
+ if (transport->recv.len == 0) {
+ want = xs_read_stream_headersize(transport->recv.copied != 0);
+ ret = xs_read_stream_header(transport, &msg, flags, want,
+ transport->recv.offset);
+ if (ret <= 0)
+ goto out_err;
+ transport->recv.offset = ret;
+ if (ret != want) {
+ ret = -EAGAIN;
+ goto out_err;
+ }
+ transport->recv.len = be32_to_cpu(transport->recv.fraghdr) &
+ RPC_FRAGMENT_SIZE_MASK;
+ transport->recv.offset -= sizeof(transport->recv.fraghdr);
+ read = ret;
+ }
+
+ switch (be32_to_cpu(transport->recv.calldir)) {
+ case RPC_CALL:
+ ret = xs_read_stream_call(transport, &msg, flags);
+ break;
+ case RPC_REPLY:
+ ret = xs_read_stream_reply(transport, &msg, flags);
+ }
+ if (msg.msg_flags & MSG_TRUNC) {
+ transport->recv.calldir = cpu_to_be32(-1);
+ transport->recv.copied = -1;
+ }
+ if (ret < 0)
+ goto out_err;
+ read += ret;
+ if (transport->recv.offset < transport->recv.len) {
+ ret = xs_read_discard(transport->sock, &msg, flags,
+ transport->recv.len - transport->recv.offset);
+ if (ret <= 0)
+ goto out_err;
+ transport->recv.offset += ret;
+ read += ret;
+ }
+ if (xs_read_stream_request_done(transport)) {
+ trace_xs_tcp_data_recv(transport);
+ transport->recv.copied = 0;
+ }
+ transport->recv.offset = 0;
+ transport->recv.len = 0;
+ return read;
+out_err:
+ switch (ret) {
+ case 0:
+ case -ESHUTDOWN:
+ xprt_force_disconnect(&transport->xprt);
+ return -ESHUTDOWN;
+ }
+ return ret;
+}
+
#define XS_SENDMSG_FLAGS (MSG_DONTWAIT | MSG_NOSIGNAL)

static int xs_send_kvec(struct socket *sock, struct sockaddr *addr, int addrlen, struct kvec *vec, unsigned int base, int more)
@@ -484,6 +798,12 @@ static int xs_nospace(struct rpc_rqst *req)
return ret;
}

+static void
+xs_stream_prepare_request(struct rpc_rqst *req)
+{
+ req->rq_task->tk_status = xdr_alloc_bvec(&req->rq_rcv_buf, GFP_NOIO);
+}
+
/*
* Determine if the previous message in the stream was aborted before it
* could complete transmission.
@@ -1157,263 +1477,7 @@ static void xs_tcp_force_close(struct rpc_xprt *xprt)
xprt_force_disconnect(xprt);
}

-static inline void xs_tcp_read_fraghdr(struct rpc_xprt *xprt, struct xdr_skb_reader *desc)
-{
- struct sock_xprt *transport = container_of(xprt, struct sock_xprt, xprt);
- size_t len, used;
- char *p;
-
- p = ((char *) &transport->recv.fraghdr) + transport->recv.offset;
- len = sizeof(transport->recv.fraghdr) - transport->recv.offset;
- used = xdr_skb_read_bits(desc, p, len);
- transport->recv.offset += used;
- if (used != len)
- return;
-
- transport->recv.len = ntohl(transport->recv.fraghdr);
- if (transport->recv.len & RPC_LAST_STREAM_FRAGMENT)
- transport->recv.flags |= TCP_RCV_LAST_FRAG;
- else
- transport->recv.flags &= ~TCP_RCV_LAST_FRAG;
- transport->recv.len &= RPC_FRAGMENT_SIZE_MASK;
-
- transport->recv.flags &= ~TCP_RCV_COPY_FRAGHDR;
- transport->recv.offset = 0;
-
- /* Sanity check of the record length */
- if (unlikely(transport->recv.len < 8)) {
- dprintk("RPC: invalid TCP record fragment length\n");
- xs_tcp_force_close(xprt);
- return;
- }
- dprintk("RPC: reading TCP record fragment of length %d\n",
- transport->recv.len);
-}
-
-static void xs_tcp_check_fraghdr(struct sock_xprt *transport)
-{
- if (transport->recv.offset == transport->recv.len) {
- transport->recv.flags |= TCP_RCV_COPY_FRAGHDR;
- transport->recv.offset = 0;
- if (transport->recv.flags & TCP_RCV_LAST_FRAG) {
- transport->recv.flags &= ~TCP_RCV_COPY_DATA;
- transport->recv.flags |= TCP_RCV_COPY_XID;
- transport->recv.copied = 0;
- }
- }
-}
-
-static inline void xs_tcp_read_xid(struct sock_xprt *transport, struct xdr_skb_reader *desc)
-{
- size_t len, used;
- char *p;
-
- len = sizeof(transport->recv.xid) - transport->recv.offset;
- dprintk("RPC: reading XID (%zu bytes)\n", len);
- p = ((char *) &transport->recv.xid) + transport->recv.offset;
- used = xdr_skb_read_bits(desc, p, len);
- transport->recv.offset += used;
- if (used != len)
- return;
- transport->recv.flags &= ~TCP_RCV_COPY_XID;
- transport->recv.flags |= TCP_RCV_READ_CALLDIR;
- transport->recv.copied = 4;
- dprintk("RPC: reading %s XID %08x\n",
- (transport->recv.flags & TCP_RPC_REPLY) ? "reply for"
- : "request with",
- ntohl(transport->recv.xid));
- xs_tcp_check_fraghdr(transport);
-}
-
-static inline void xs_tcp_read_calldir(struct sock_xprt *transport,
- struct xdr_skb_reader *desc)
-{
- size_t len, used;
- u32 offset;
- char *p;
-
- /*
- * We want transport->recv.offset to be 8 at the end of this routine
- * (4 bytes for the xid and 4 bytes for the call/reply flag).
- * When this function is called for the first time,
- * transport->recv.offset is 4 (after having already read the xid).
- */
- offset = transport->recv.offset - sizeof(transport->recv.xid);
- len = sizeof(transport->recv.calldir) - offset;
- dprintk("RPC: reading CALL/REPLY flag (%zu bytes)\n", len);
- p = ((char *) &transport->recv.calldir) + offset;
- used = xdr_skb_read_bits(desc, p, len);
- transport->recv.offset += used;
- if (used != len)
- return;
- transport->recv.flags &= ~TCP_RCV_READ_CALLDIR;
- /*
- * We don't yet have the XDR buffer, so we will write the calldir
- * out after we get the buffer from the 'struct rpc_rqst'
- */
- switch (ntohl(transport->recv.calldir)) {
- case RPC_REPLY:
- transport->recv.flags |= TCP_RCV_COPY_CALLDIR;
- transport->recv.flags |= TCP_RCV_COPY_DATA;
- transport->recv.flags |= TCP_RPC_REPLY;
- break;
- case RPC_CALL:
- transport->recv.flags |= TCP_RCV_COPY_CALLDIR;
- transport->recv.flags |= TCP_RCV_COPY_DATA;
- transport->recv.flags &= ~TCP_RPC_REPLY;
- break;
- default:
- dprintk("RPC: invalid request message type\n");
- xs_tcp_force_close(&transport->xprt);
- }
- xs_tcp_check_fraghdr(transport);
-}
-
-static inline void xs_tcp_read_common(struct rpc_xprt *xprt,
- struct xdr_skb_reader *desc,
- struct rpc_rqst *req)
-{
- struct sock_xprt *transport =
- container_of(xprt, struct sock_xprt, xprt);
- struct xdr_buf *rcvbuf;
- size_t len;
- ssize_t r;
-
- rcvbuf = &req->rq_private_buf;
-
- if (transport->recv.flags & TCP_RCV_COPY_CALLDIR) {
- /*
- * Save the RPC direction in the XDR buffer
- */
- memcpy(rcvbuf->head[0].iov_base + transport->recv.copied,
- &transport->recv.calldir,
- sizeof(transport->recv.calldir));
- transport->recv.copied += sizeof(transport->recv.calldir);
- transport->recv.flags &= ~TCP_RCV_COPY_CALLDIR;
- }
-
- len = desc->count;
- if (len > transport->recv.len - transport->recv.offset)
- desc->count = transport->recv.len - transport->recv.offset;
- r = xdr_partial_copy_from_skb(rcvbuf, transport->recv.copied,
- desc, xdr_skb_read_bits);
-
- if (desc->count) {
- /* Error when copying to the receive buffer,
- * usually because we weren't able to allocate
- * additional buffer pages. All we can do now
- * is turn off TCP_RCV_COPY_DATA, so the request
- * will not receive any additional updates,
- * and time out.
- * Any remaining data from this record will
- * be discarded.
- */
- transport->recv.flags &= ~TCP_RCV_COPY_DATA;
- dprintk("RPC: XID %08x truncated request\n",
- ntohl(transport->recv.xid));
- dprintk("RPC: xprt = %p, recv.copied = %lu, "
- "recv.offset = %u, recv.len = %u\n",
- xprt, transport->recv.copied,
- transport->recv.offset, transport->recv.len);
- return;
- }
-
- transport->recv.copied += r;
- transport->recv.offset += r;
- desc->count = len - r;
-
- dprintk("RPC: XID %08x read %zd bytes\n",
- ntohl(transport->recv.xid), r);
- dprintk("RPC: xprt = %p, recv.copied = %lu, recv.offset = %u, "
- "recv.len = %u\n", xprt, transport->recv.copied,
- transport->recv.offset, transport->recv.len);
-
- if (transport->recv.copied == req->rq_private_buf.buflen)
- transport->recv.flags &= ~TCP_RCV_COPY_DATA;
- else if (transport->recv.offset == transport->recv.len) {
- if (transport->recv.flags & TCP_RCV_LAST_FRAG)
- transport->recv.flags &= ~TCP_RCV_COPY_DATA;
- }
-}
-
-/*
- * Finds the request corresponding to the RPC xid and invokes the common
- * tcp read code to read the data.
- */
-static inline int xs_tcp_read_reply(struct rpc_xprt *xprt,
- struct xdr_skb_reader *desc)
-{
- struct sock_xprt *transport =
- container_of(xprt, struct sock_xprt, xprt);
- struct rpc_rqst *req;
-
- dprintk("RPC: read reply XID %08x\n", ntohl(transport->recv.xid));
-
- /* Find and lock the request corresponding to this xid */
- spin_lock(&xprt->queue_lock);
- req = xprt_lookup_rqst(xprt, transport->recv.xid);
- if (!req) {
- dprintk("RPC: XID %08x request not found!\n",
- ntohl(transport->recv.xid));
- spin_unlock(&xprt->queue_lock);
- return -1;
- }
- xprt_pin_rqst(req);
- spin_unlock(&xprt->queue_lock);
-
- xs_tcp_read_common(xprt, desc, req);
-
- spin_lock(&xprt->queue_lock);
- if (!(transport->recv.flags & TCP_RCV_COPY_DATA))
- xprt_complete_rqst(req->rq_task, transport->recv.copied);
- xprt_unpin_rqst(req);
- spin_unlock(&xprt->queue_lock);
- return 0;
-}
-
#if defined(CONFIG_SUNRPC_BACKCHANNEL)
-/*
- * Obtains an rpc_rqst previously allocated and invokes the common
- * tcp read code to read the data. The result is placed in the callback
- * queue.
- * If we're unable to obtain the rpc_rqst we schedule the closing of the
- * connection and return -1.
- */
-static int xs_tcp_read_callback(struct rpc_xprt *xprt,
- struct xdr_skb_reader *desc)
-{
- struct sock_xprt *transport =
- container_of(xprt, struct sock_xprt, xprt);
- struct rpc_rqst *req;
-
- /* Look up the request corresponding to the given XID */
- req = xprt_lookup_bc_request(xprt, transport->recv.xid);
- if (req == NULL) {
- printk(KERN_WARNING "Callback slot table overflowed\n");
- xprt_force_disconnect(xprt);
- return -1;
- }
-
- dprintk("RPC: read callback XID %08x\n", ntohl(req->rq_xid));
- xs_tcp_read_common(xprt, desc, req);
-
- if (!(transport->recv.flags & TCP_RCV_COPY_DATA))
- xprt_complete_bc_request(req, transport->recv.copied);
-
- return 0;
-}
-
-static inline int _xs_tcp_read_data(struct rpc_xprt *xprt,
- struct xdr_skb_reader *desc)
-{
- struct sock_xprt *transport =
- container_of(xprt, struct sock_xprt, xprt);
-
- return (transport->recv.flags & TCP_RPC_REPLY) ?
- xs_tcp_read_reply(xprt, desc) :
- xs_tcp_read_callback(xprt, desc);
-}
-
static int xs_tcp_bc_up(struct svc_serv *serv, struct net *net)
{
int ret;
@@ -1429,106 +1493,14 @@ static size_t xs_tcp_bc_maxpayload(struct rpc_xprt *xprt)
{
return PAGE_SIZE;
}
-#else
-static inline int _xs_tcp_read_data(struct rpc_xprt *xprt,
- struct xdr_skb_reader *desc)
-{
- return xs_tcp_read_reply(xprt, desc);
-}
#endif /* CONFIG_SUNRPC_BACKCHANNEL */

-/*
- * Read data off the transport. This can be either an RPC_CALL or an
- * RPC_REPLY. Relay the processing to helper functions.
- */
-static void xs_tcp_read_data(struct rpc_xprt *xprt,
- struct xdr_skb_reader *desc)
-{
- struct sock_xprt *transport =
- container_of(xprt, struct sock_xprt, xprt);
-
- if (_xs_tcp_read_data(xprt, desc) == 0)
- xs_tcp_check_fraghdr(transport);
- else {
- /*
- * The transport_lock protects the request handling.
- * There's no need to hold it to update the recv.flags.
- */
- transport->recv.flags &= ~TCP_RCV_COPY_DATA;
- }
-}
-
-static inline void xs_tcp_read_discard(struct sock_xprt *transport, struct xdr_skb_reader *desc)
-{
- size_t len;
-
- len = transport->recv.len - transport->recv.offset;
- if (len > desc->count)
- len = desc->count;
- desc->count -= len;
- desc->offset += len;
- transport->recv.offset += len;
- dprintk("RPC: discarded %zu bytes\n", len);
- xs_tcp_check_fraghdr(transport);
-}
-
-static int xs_tcp_data_recv(read_descriptor_t *rd_desc, struct sk_buff *skb, unsigned int offset, size_t len)
-{
- struct rpc_xprt *xprt = rd_desc->arg.data;
- struct sock_xprt *transport = container_of(xprt, struct sock_xprt, xprt);
- struct xdr_skb_reader desc = {
- .skb = skb,
- .offset = offset,
- .count = len,
- };
- size_t ret;
-
- dprintk("RPC: xs_tcp_data_recv started\n");
- do {
- trace_xs_tcp_data_recv(transport);
- /* Read in a new fragment marker if necessary */
- /* Can we ever really expect to get completely empty fragments? */
- if (transport->recv.flags & TCP_RCV_COPY_FRAGHDR) {
- xs_tcp_read_fraghdr(xprt, &desc);
- continue;
- }
- /* Read in the xid if necessary */
- if (transport->recv.flags & TCP_RCV_COPY_XID) {
- xs_tcp_read_xid(transport, &desc);
- continue;
- }
- /* Read in the call/reply flag */
- if (transport->recv.flags & TCP_RCV_READ_CALLDIR) {
- xs_tcp_read_calldir(transport, &desc);
- continue;
- }
- /* Read in the request data */
- if (transport->recv.flags & TCP_RCV_COPY_DATA) {
- xs_tcp_read_data(xprt, &desc);
- continue;
- }
- /* Skip over any trailing bytes on short reads */
- xs_tcp_read_discard(transport, &desc);
- } while (desc.count);
- ret = len - desc.count;
- if (ret < rd_desc->count)
- rd_desc->count -= ret;
- else
- rd_desc->count = 0;
- trace_xs_tcp_data_recv(transport);
- dprintk("RPC: xs_tcp_data_recv done\n");
- return ret;
-}
-
static void xs_tcp_data_receive(struct sock_xprt *transport)
{
struct rpc_xprt *xprt = &transport->xprt;
struct sock *sk;
- read_descriptor_t rd_desc = {
- .arg.data = xprt,
- };
- unsigned long total = 0;
- int read = 0;
+ size_t read = 0;
+ ssize_t ret = 0;

restart:
mutex_lock(&transport->recv_mutex);
@@ -1536,18 +1508,12 @@ static void xs_tcp_data_receive(struct sock_xprt *transport)
if (sk == NULL)
goto out;

- /* We use rd_desc to pass struct xprt to xs_tcp_data_recv */
for (;;) {
- rd_desc.count = RPC_TCP_READ_CHUNK_SZ;
- lock_sock(sk);
- read = tcp_read_sock(sk, &rd_desc, xs_tcp_data_recv);
- if (rd_desc.count != 0 || read < 0) {
- clear_bit(XPRT_SOCK_DATA_READY, &transport->sock_state);
- release_sock(sk);
+ clear_bit(XPRT_SOCK_DATA_READY, &transport->sock_state);
+ ret = xs_read_stream(transport, MSG_DONTWAIT | MSG_NOSIGNAL);
+ if (ret < 0)
break;
- }
- release_sock(sk);
- total += read;
+ read += ret;
if (need_resched()) {
mutex_unlock(&transport->recv_mutex);
cond_resched();
@@ -1558,7 +1524,7 @@ static void xs_tcp_data_receive(struct sock_xprt *transport)
queue_work(xprtiod_workqueue, &transport->recv_worker);
out:
mutex_unlock(&transport->recv_mutex);
- trace_xs_tcp_data_ready(xprt, read, total);
+ trace_xs_tcp_data_ready(xprt, ret, read);
}

static void xs_tcp_data_receive_workfn(struct work_struct *work)
@@ -2380,7 +2346,6 @@ static int xs_tcp_finish_connecting(struct rpc_xprt *xprt, struct socket *sock)
transport->recv.offset = 0;
transport->recv.len = 0;
transport->recv.copied = 0;
- transport->recv.flags = TCP_RCV_COPY_FRAGHDR | TCP_RCV_COPY_XID;
transport->xmit.offset = 0;

/* Tell the socket layer to start connecting... */
@@ -2802,6 +2767,7 @@ static const struct rpc_xprt_ops xs_tcp_ops = {
.connect = xs_connect,
.buf_alloc = rpc_malloc,
.buf_free = rpc_free,
+ .prepare_request = xs_stream_prepare_request,
.send_request = xs_tcp_send_request,
.set_retrans_timeout = xprt_set_retrans_timeout_def,
.close = xs_tcp_shutdown,
--
2.17.1

2018-09-17 18:31:55

by Trond Myklebust

[permalink] [raw]
Subject: [PATCH v3 37/44] SUNRPC: Convert the xprt->sending queue back to an ordinary wait queue

We no longer need priority semantics on the xprt->sending queue, because
the order in which tasks are sent is now dictated by their position in
the send queue.
Note that the backlog queue remains a priority queue, meaning that
slot resources are still managed in order of task priority.

Signed-off-by: Trond Myklebust <[email protected]>
---
net/sunrpc/xprt.c | 20 +++-----------------
1 file changed, 3 insertions(+), 17 deletions(-)

diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index 051638d5b39c..d1a67e97e7d3 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -192,7 +192,6 @@ static void xprt_clear_locked(struct rpc_xprt *xprt)
int xprt_reserve_xprt(struct rpc_xprt *xprt, struct rpc_task *task)
{
struct rpc_rqst *req = task->tk_rqstp;
- int priority;

if (test_and_set_bit(XPRT_LOCKED, &xprt->state)) {
if (task == xprt->snd_task)
@@ -212,13 +211,7 @@ int xprt_reserve_xprt(struct rpc_xprt *xprt, struct rpc_task *task)
task->tk_pid, xprt);
task->tk_timeout = RPC_IS_SOFT(task) ? req->rq_timeout : 0;
task->tk_status = -EAGAIN;
- if (req == NULL)
- priority = RPC_PRIORITY_LOW;
- else if (!req->rq_ntrans)
- priority = RPC_PRIORITY_NORMAL;
- else
- priority = RPC_PRIORITY_HIGH;
- rpc_sleep_on_priority(&xprt->sending, task, NULL, priority);
+ rpc_sleep_on(&xprt->sending, task, NULL);
return 0;
}
EXPORT_SYMBOL_GPL(xprt_reserve_xprt);
@@ -260,7 +253,6 @@ xprt_test_and_clear_congestion_window_wait(struct rpc_xprt *xprt)
int xprt_reserve_xprt_cong(struct rpc_xprt *xprt, struct rpc_task *task)
{
struct rpc_rqst *req = task->tk_rqstp;
- int priority;

if (test_and_set_bit(XPRT_LOCKED, &xprt->state)) {
if (task == xprt->snd_task)
@@ -283,13 +275,7 @@ int xprt_reserve_xprt_cong(struct rpc_xprt *xprt, struct rpc_task *task)
dprintk("RPC: %5u failed to lock transport %p\n", task->tk_pid, xprt);
task->tk_timeout = RPC_IS_SOFT(task) ? req->rq_timeout : 0;
task->tk_status = -EAGAIN;
- if (req == NULL)
- priority = RPC_PRIORITY_LOW;
- else if (!req->rq_ntrans)
- priority = RPC_PRIORITY_NORMAL;
- else
- priority = RPC_PRIORITY_HIGH;
- rpc_sleep_on_priority(&xprt->sending, task, NULL, priority);
+ rpc_sleep_on(&xprt->sending, task, NULL);
return 0;
}
EXPORT_SYMBOL_GPL(xprt_reserve_xprt_cong);
@@ -1795,7 +1781,7 @@ static void xprt_init(struct rpc_xprt *xprt, struct net *net)

rpc_init_wait_queue(&xprt->binding, "xprt_binding");
rpc_init_wait_queue(&xprt->pending, "xprt_pending");
- rpc_init_priority_wait_queue(&xprt->sending, "xprt_sending");
+ rpc_init_wait_queue(&xprt->sending, "xprt_sending");
rpc_init_priority_wait_queue(&xprt->backlog, "xprt_backlog");

xprt_init_xid(xprt);
--
2.17.1

2018-09-17 18:31:56

by Trond Myklebust

[permalink] [raw]
Subject: [PATCH v3 39/44] SUNRPC: Add a bvec array to struct xdr_buf for use with iov_iter()

Add a bvec array to struct xdr_buf, and have the client allocate it
when we need to receive data into pages.
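
A sketch of how the array is consumed (assuming the xdr_alloc_bvec() and
xdr_buf_pagecount() helpers added below): once every page is mirrored
into a bio_vec, the whole page array can be handed to the iov_iter
machinery in a single call.

	int err = xdr_alloc_bvec(&req->rq_rcv_buf, GFP_NOIO);
	struct iov_iter iter;

	if (err)
		return err;
	iov_iter_bvec(&iter, READ | ITER_BVEC, req->rq_rcv_buf.bvec,
		      xdr_buf_pagecount(&req->rq_rcv_buf),
		      req->rq_rcv_buf.page_len);
	/* iter can now be passed to sock_recvmsg() to land data
	 * directly in the receive pages */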

Signed-off-by: Trond Myklebust <[email protected]>
---
include/linux/sunrpc/xdr.h | 7 +++++++
include/linux/sunrpc/xprt.h | 2 ++
net/sunrpc/clnt.c | 4 +++-
net/sunrpc/xdr.c | 34 ++++++++++++++++++++++++++++++++++
net/sunrpc/xprt.c | 17 +++++++++++++++++
5 files changed, 63 insertions(+), 1 deletion(-)

diff --git a/include/linux/sunrpc/xdr.h b/include/linux/sunrpc/xdr.h
index 431829233392..745587132a87 100644
--- a/include/linux/sunrpc/xdr.h
+++ b/include/linux/sunrpc/xdr.h
@@ -18,6 +18,7 @@
#include <asm/unaligned.h>
#include <linux/scatterlist.h>

+struct bio_vec;
struct rpc_rqst;

/*
@@ -52,6 +53,7 @@ struct xdr_buf {
struct kvec head[1], /* RPC header + non-page data */
tail[1]; /* Appended after page data */

+ struct bio_vec *bvec;
struct page ** pages; /* Array of pages */
unsigned int page_base, /* Start of page data */
page_len, /* Length of page data */
@@ -70,6 +72,8 @@ xdr_buf_init(struct xdr_buf *buf, void *start, size_t len)
buf->head[0].iov_base = start;
buf->head[0].iov_len = len;
buf->tail[0].iov_len = 0;
+ buf->bvec = NULL;
+ buf->pages = NULL;
buf->page_len = 0;
buf->flags = 0;
buf->len = 0;
@@ -116,6 +120,9 @@ __be32 *xdr_decode_netobj(__be32 *p, struct xdr_netobj *);
void xdr_inline_pages(struct xdr_buf *, unsigned int,
struct page **, unsigned int, unsigned int);
void xdr_terminate_string(struct xdr_buf *, const u32);
+size_t xdr_buf_pagecount(struct xdr_buf *buf);
+int xdr_alloc_bvec(struct xdr_buf *buf, gfp_t gfp);
+void xdr_free_bvec(struct xdr_buf *buf);

static inline __be32 *xdr_encode_array(__be32 *p, const void *s, unsigned int len)
{
diff --git a/include/linux/sunrpc/xprt.h b/include/linux/sunrpc/xprt.h
index 9be399020dab..a4ab4f8d9140 100644
--- a/include/linux/sunrpc/xprt.h
+++ b/include/linux/sunrpc/xprt.h
@@ -141,6 +141,7 @@ struct rpc_xprt_ops {
void (*connect)(struct rpc_xprt *xprt, struct rpc_task *task);
int (*buf_alloc)(struct rpc_task *task);
void (*buf_free)(struct rpc_task *task);
+ void (*prepare_request)(struct rpc_rqst *req);
int (*send_request)(struct rpc_rqst *req);
void (*set_retrans_timeout)(struct rpc_task *task);
void (*timer)(struct rpc_xprt *xprt, struct rpc_task *task);
@@ -343,6 +344,7 @@ int xprt_reserve_xprt_cong(struct rpc_xprt *xprt, struct rpc_task *task);
void xprt_alloc_slot(struct rpc_xprt *xprt, struct rpc_task *task);
void xprt_free_slot(struct rpc_xprt *xprt,
struct rpc_rqst *req);
+void xprt_request_prepare(struct rpc_rqst *req);
bool xprt_prepare_transmit(struct rpc_task *task);
void xprt_request_enqueue_transmit(struct rpc_task *task);
void xprt_request_enqueue_receive(struct rpc_task *task);
diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index 0c4b2e7d791f..ae3b8145da35 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -1753,6 +1753,8 @@ rpc_xdr_encode(struct rpc_task *task)

task->tk_status = rpcauth_wrap_req(task, encode, req, p,
task->tk_msg.rpc_argp);
+ if (task->tk_status == 0)
+ xprt_request_prepare(req);
}

/*
@@ -1768,7 +1770,7 @@ call_encode(struct rpc_task *task)
/* Did the encode result in an error condition? */
if (task->tk_status != 0) {
/* Was the error nonfatal? */
- if (task->tk_status == -EAGAIN)
+ if (task->tk_status == -EAGAIN || task->tk_status == -ENOMEM)
rpc_delay(task, HZ >> 4);
else
rpc_exit(task, task->tk_status);
diff --git a/net/sunrpc/xdr.c b/net/sunrpc/xdr.c
index 30afbd236656..2bbb8d38d2bf 100644
--- a/net/sunrpc/xdr.c
+++ b/net/sunrpc/xdr.c
@@ -15,6 +15,7 @@
#include <linux/errno.h>
#include <linux/sunrpc/xdr.h>
#include <linux/sunrpc/msg_prot.h>
+#include <linux/bvec.h>

/*
* XDR functions for basic NFS types
@@ -128,6 +129,39 @@ xdr_terminate_string(struct xdr_buf *buf, const u32 len)
}
EXPORT_SYMBOL_GPL(xdr_terminate_string);

+size_t
+xdr_buf_pagecount(struct xdr_buf *buf)
+{
+ if (!buf->page_len)
+ return 0;
+ return (buf->page_base + buf->page_len + PAGE_SIZE - 1) >> PAGE_SHIFT;
+}
+
+int
+xdr_alloc_bvec(struct xdr_buf *buf, gfp_t gfp)
+{
+ size_t i, n = xdr_buf_pagecount(buf);
+
+ if (n != 0 && buf->bvec == NULL) {
+ buf->bvec = kmalloc_array(n, sizeof(buf->bvec[0]), gfp);
+ if (!buf->bvec)
+ return -ENOMEM;
+ for (i = 0; i < n; i++) {
+ buf->bvec[i].bv_page = buf->pages[i];
+ buf->bvec[i].bv_len = PAGE_SIZE;
+ buf->bvec[i].bv_offset = 0;
+ }
+ }
+ return 0;
+}
+
+void
+xdr_free_bvec(struct xdr_buf *buf)
+{
+ kfree(buf->bvec);
+ buf->bvec = NULL;
+}
+
void
xdr_inline_pages(struct xdr_buf *xdr, unsigned int offset,
struct page **pages, unsigned int base, unsigned int len)
diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index d1a67e97e7d3..547519f25878 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -1262,6 +1262,22 @@ xprt_request_dequeue_transmit(struct rpc_task *task)
spin_unlock(&xprt->queue_lock);
}

+/**
+ * xprt_request_prepare - prepare an encoded request for transport
+ * @req: pointer to rpc_rqst
+ *
+ * Calls into the transport layer to do whatever is needed to prepare
+ * the request for transmission or receive.
+ */
+void
+xprt_request_prepare(struct rpc_rqst *req)
+{
+ struct rpc_xprt *xprt = req->rq_xprt;
+
+ if (xprt->ops->prepare_request)
+ xprt->ops->prepare_request(req);
+}
+
/**
* xprt_request_need_retransmit - Test if a task needs retransmission
* @task: pointer to rpc_task
@@ -1726,6 +1742,7 @@ void xprt_release(struct rpc_task *task)
if (req->rq_buffer)
xprt->ops->buf_free(task);
xprt_inject_disconnect(xprt);
+ xdr_free_bvec(&req->rq_rcv_buf);
if (req->rq_cred != NULL)
put_rpccred(req->rq_cred);
task->tk_rqstp = NULL;
--
2.17.1

2018-09-18 02:14:23

by Trond Myklebust

[permalink] [raw]
Subject: Re: [PATCH v3 40/44] SUNRPC: Simplify TCP receive code by switching to using iterators

On Mon, 2018-09-17 at 09:03 -0400, Trond Myklebust wrote:
> Most of this code should also be reusable with other socket types.
> 
> Signed-off-by: Trond Myklebust <[email protected]>
> ---
>  include/linux/sunrpc/xprtsock.h |  19 +-
>  include/trace/events/sunrpc.h   |  15 +-
>  net/sunrpc/xprtsock.c           | 694 +++++++++++++++++-----------------
>  3 files changed, 335 insertions(+), 393 deletions(-)
> 
[...]
> +	if (buf->page_len && seek < buf->page_len) {
> +		want = min_t(size_t, count - offset, buf->page_len);
> +		want = xs_alloc_sparse_pages(buf, want, GFP_NOWAIT);
> +		ret = xs_read_bvec(sock, msg, flags, buf->bvec,
> +				xdr_buf_pagecount(buf),
> +				want + buf->page_base,
> +				seek + buf->page_base);
> +		if (ret <= 0)
> +			goto sock_err;
> +		offset += ret;

There is a bug here that has been fixed up in the linux-nfs.org testing
branch.

[...]
> +	if (transport->recv.offset < transport->recv.len) {
> +		ret = xs_read_discard(transport->sock, &msg, flags,
> +				transport->recv.len - transport->recv.offset);
> +		if (ret <= 0)
> +			goto out_err;
> +		transport->recv.offset += ret;
> +		read += ret;

...and another bug here.

[...]
-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]
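
As a reading aid for the quoted receive path: xs_read_stream() is driven by
the 4-byte RPC record marking header at the head of each stream fragment.
Two helpers in the spirit of that code (the helper names are illustrative;
the constants are the real ones from include/linux/sunrpc/msg_prot.h):

/* The top bit of the record marker flags the last fragment of an RPC
 * message; the low 31 bits carry the fragment length. */
static bool rpc_frag_is_last(__be32 fraghdr)
{
	return (fraghdr & cpu_to_be32(RPC_LAST_STREAM_FRAGMENT)) != 0;
}

static u32 rpc_frag_size(__be32 fraghdr)
{
	return be32_to_cpu(fraghdr) & RPC_FRAGMENT_SIZE_MASK;
}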

2018-09-19 02:35:49

by Anna Schumaker

[permalink] [raw]
Subject: Re: [PATCH v3 15/44] SUNRPC: Refactor xprt_transmit() to remove the reply queue code

Hi Trond,

I'm seeing this crash while running cthon tests (on any NFS version) after
applying this patch:

[ 50.780104] general protection fault: 0000 [#1] PREEMPT SMP PTI
[ 50.780796] CPU: 0 PID: 384 Comm: kworker/u5:1 Not tainted 4.19.0-rc4-ANNA+
#7455
[ 50.781601] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 50.782232] Workqueue: xprtiod xs_tcp_data_receive_workfn [sunrpc]
[ 50.782911] RIP: 0010:xprt_lookup_rqst+0x2c/0x150 [sunrpc]
[ 50.783510] Code: 48 8d 97 58 04 00 00 41 54 49 89 fc 55 89 f5 53 48 8b 87 58
04 00 00 48 39 c2 74 26 48 8d 98 48 ff ff ff 3b 70 e0 75 07 eb 3f <39> 68 e0 74
3a 48 8b 83 b8 00 00 00 48 8d 98 48 ff ff ff 48 39 c2
[ 50.785501] RSP: 0018:ffffc90000bebd60 EFLAGS: 00010202
[ 50.786090] RAX: dead000000000100 RBX: dead000000000048 RCX: 0000000000000051
[ 50.786853] RDX: ffff8800b915dc58 RSI: 000000005a1c5631 RDI: ffff8800b915d800
[ 50.787616] RBP: 000000005a1c5631 R08: 0000000000000000 R09: 00646f6974727078
[ 50.788380] R10: 8080808080808080 R11: 00000000000ee5f3 R12: ffff8800b915d800
[ 50.789153] R13: ffff8800b915dc18 R14: ffff8800b915d800 R15: ffffffffa03265b4
[ 50.789930] FS: 0000000000000000(0000) GS:ffff8800bca00000(0000)
knlGS:0000000000000000
[ 50.790797] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 50.791416] CR2: 00007f9b670538b0 CR3: 000000000200a001 CR4: 00000000001606f0
[ 50.792182] Call Trace:
[ 50.792471] xs_tcp_data_recv+0x3a6/0x780 [sunrpc]
[ 50.792993] ? __switch_to_asm+0x34/0x70
[ 50.793426] ? xs_tcp_check_fraghdr.part.1+0x40/0x40 [sunrpc]
[ 50.794047] tcp_read_sock+0x93/0x1b0
[ 50.794447] ? __switch_to_asm+0x40/0x70
[ 50.794879] xs_tcp_data_receive_workfn+0xb2/0x190 [sunrpc]
[ 50.795482] process_one_work+0x1e6/0x3c0
[ 50.795928] worker_thread+0x28/0x3c0
[ 50.796337] ? process_one_work+0x3c0/0x3c0
[ 50.796814] kthread+0x10d/0x130
[ 50.797170] ? kthread_park+0x80/0x80
[ 50.797570] ret_from_fork+0x35/0x40
[ 50.797961] Modules linked in: nfsv3 rpcsec_gss_krb5 nfsv4 nfs fscache
cfg80211 rpcrdma rfkill crct10dif_pclmul crc32_pclmul crc32c_intel
ghash_clmulni_intel joydev pcbc mousedev aesni_intel psmouse aes_x86_64 evdev
crypto_simd cryptd input_leds glue_helper led_class mac_hid pcspkr intel_agp
intel_gtt i2c_piix4 nfsd button auth_rpcgss nfs_acl lockd grace sunrpc
sch_fq_codel ip_tables x_tables ata_generic pata_acpi ata_piix serio_raw
uhci_hcd atkbd ehci_pci libps2 ehci_hcd libata usbcore usb_common i8042 floppy
serio scsi_mod xfs virtio_balloon virtio_net net_failover failover virtio_pci
virtio_blk virtio_ring virtio


Cheers,
Anna

On Mon, 2018-09-17 at 09:03 -0400, Trond Myklebust wrote:
> Separate out the action of adding a request to the reply queue so that the
> backchannel code can simply skip calling it altogether.
>
> Signed-off-by: Trond Myklebust <[email protected]>
> ---
> include/linux/sunrpc/xprt.h       |   1 +
> net/sunrpc/backchannel_rqst.c     |   1 -
> net/sunrpc/clnt.c                 |   5 ++
> net/sunrpc/xprt.c                 | 126 +++++++++++++++++++-----------
> net/sunrpc/xprtrdma/backchannel.c |   1 -
> 5 files changed, 88 insertions(+), 46 deletions(-)
>
> diff --git a/include/linux/sunrpc/xprt.h b/include/linux/sunrpc/xprt.h
> index c25d0a5fda69..0250294c904a 100644
> --- a/include/linux/sunrpc/xprt.h
> +++ b/include/linux/sunrpc/xprt.h
> @@ -334,6 +334,7 @@ void xprt_free_slot(struct rpc_xprt
> *xprt,
> struct rpc_rqst *req);
> void xprt_lock_and_alloc_slot(struct rpc_xprt *xprt, struct
> rpc_task *task);
> bool xprt_prepare_transmit(struct rpc_task *task);
> +void xprt_request_enqueue_receive(struct rpc_task *task);
> void xprt_transmit(struct rpc_task *task);
> void xprt_end_transmit(struct rpc_task *task);
> int xprt_adjust_timeout(struct rpc_rqst *req);
> diff --git a/net/sunrpc/backchannel_rqst.c b/net/sunrpc/backchannel_rqst.c
> index 3c15a99b9700..fa5ba6ed3197 100644
> --- a/net/sunrpc/backchannel_rqst.c
> +++ b/net/sunrpc/backchannel_rqst.c
> @@ -91,7 +91,6 @@ struct rpc_rqst *xprt_alloc_bc_req(struct rpc_xprt *xprt,
> gfp_t gfp_flags)
> return NULL;
>
> req->rq_xprt = xprt;
> - INIT_LIST_HEAD(&req->rq_list);
> INIT_LIST_HEAD(&req->rq_bc_list);
>
> /* Preallocate one XDR receive buffer */
> diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
> index a858366cd15d..414966273a3f 100644
> --- a/net/sunrpc/clnt.c
> +++ b/net/sunrpc/clnt.c
> @@ -1962,6 +1962,11 @@ call_transmit(struct rpc_task *task)
> return;
> }
> }
> +
> + /* Add task to reply queue before transmission to avoid races */
> + if (rpc_reply_expected(task))
> + xprt_request_enqueue_receive(task);
> +
> if (!xprt_prepare_transmit(task))
> return;
> task->tk_action = call_transmit_status;
> diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
> index 6e3d4b4ee79e..d8f870b5dd46 100644
> --- a/net/sunrpc/xprt.c
> +++ b/net/sunrpc/xprt.c
> @@ -888,6 +888,61 @@ static void xprt_wait_on_pinned_rqst(struct rpc_rqst
> *req)
> wait_var_event(&req->rq_pin, !xprt_is_pinned_rqst(req));
> }
>
> +static bool
> +xprt_request_data_received(struct rpc_task *task)
> +{
> + return !test_bit(RPC_TASK_NEED_RECV, &task->tk_runstate) &&
> + READ_ONCE(task->tk_rqstp->rq_reply_bytes_recvd) != 0;
> +}
> +
> +static bool
> +xprt_request_need_enqueue_receive(struct rpc_task *task, struct rpc_rqst
> *req)
> +{
> + return !xprt_request_data_received(task);
> +}
> +
> +/**
> + * xprt_request_enqueue_receive - Add a request to the receive queue
> + * @task: RPC task
> + *
> + */
> +void
> +xprt_request_enqueue_receive(struct rpc_task *task)
> +{
> + struct rpc_rqst *req = task->tk_rqstp;
> + struct rpc_xprt *xprt = req->rq_xprt;
> +
> + if (!xprt_request_need_enqueue_receive(task, req))
> + return;
> + spin_lock(&xprt->queue_lock);
> +
> + /* Update the softirq receive buffer */
> + memcpy(&req->rq_private_buf, &req->rq_rcv_buf,
> + sizeof(req->rq_private_buf));
> +
> + /* Add request to the receive list */
> + list_add_tail(&req->rq_list, &xprt->recv);
> + set_bit(RPC_TASK_NEED_RECV, &task->tk_runstate);
> + spin_unlock(&xprt->queue_lock);
> +
> + xprt_reset_majortimeo(req);
> + /* Turn off autodisconnect */
> + del_singleshot_timer_sync(&xprt->timer);
> +}
> +
> +/**
> + * xprt_request_dequeue_receive_locked - Remove a request from the receive
> queue
> + * @task: RPC task
> + *
> + * Caller must hold xprt->queue_lock.
> + */
> +static void
> +xprt_request_dequeue_receive_locked(struct rpc_task *task)
> +{
> + if (test_and_clear_bit(RPC_TASK_NEED_RECV, &task->tk_runstate))
> + list_del(&task->tk_rqstp->rq_list);
> +}
> +
> /**
> * xprt_update_rtt - Update RPC RTT statistics
> * @task: RPC request that recently completed
> @@ -927,24 +982,16 @@ void xprt_complete_rqst(struct rpc_task *task, int
> copied)
>
> xprt->stat.recvs++;
>
> - list_del_init(&req->rq_list);
> req->rq_private_buf.len = copied;
> /* Ensure all writes are done before we update */
> /* req->rq_reply_bytes_recvd */
> smp_wmb();
> req->rq_reply_bytes_recvd = copied;
> - clear_bit(RPC_TASK_NEED_RECV, &task->tk_runstate);
> + xprt_request_dequeue_receive_locked(task);
> rpc_wake_up_queued_task(&xprt->pending, task);
> }
> EXPORT_SYMBOL_GPL(xprt_complete_rqst);
>
> -static bool
> -xprt_request_data_received(struct rpc_task *task)
> -{
> - return !test_bit(RPC_TASK_NEED_RECV, &task->tk_runstate) &&
> - task->tk_rqstp->rq_reply_bytes_recvd != 0;
> -}
> -
> static void xprt_timer(struct rpc_task *task)
> {
> struct rpc_rqst *req = task->tk_rqstp;
> @@ -1018,32 +1065,15 @@ void xprt_transmit(struct rpc_task *task)
>
> dprintk("RPC: %5u xprt_transmit(%u)\n", task->tk_pid, req->rq_slen);
>
> - if (!req->rq_reply_bytes_recvd) {
> -
> + if (!req->rq_bytes_sent) {
> + if (xprt_request_data_received(task))
> + return;
> /* Verify that our message lies in the RPCSEC_GSS window */
> - if (!req->rq_bytes_sent && rpcauth_xmit_need_reencode(task)) {
> + if (rpcauth_xmit_need_reencode(task)) {
> task->tk_status = -EBADMSG;
> return;
> }
> -
> - if (list_empty(&req->rq_list) && rpc_reply_expected(task)) {
> - /*
> - * Add to the list only if we're expecting a reply
> - */
> - /* Update the softirq receive buffer */
> - memcpy(&req->rq_private_buf, &req->rq_rcv_buf,
> - sizeof(req->rq_private_buf));
> - /* Add request to the receive list */
> - spin_lock(&xprt->queue_lock);
> - list_add_tail(&req->rq_list, &xprt->recv);
> - set_bit(RPC_TASK_NEED_RECV, &task->tk_runstate);
> - spin_unlock(&xprt->queue_lock);
> - xprt_reset_majortimeo(req);
> - /* Turn off autodisconnect */
> - del_singleshot_timer_sync(&xprt->timer);
> - }
> - } else if (xprt_request_data_received(task) && !req->rq_bytes_sent)
> - return;
> + }
>
> connect_cookie = xprt->connect_cookie;
> status = xprt->ops->send_request(task);
> @@ -1285,7 +1315,6 @@ xprt_request_init(struct rpc_task *task)
> struct rpc_xprt *xprt = task->tk_xprt;
> struct rpc_rqst *req = task->tk_rqstp;
>
> - INIT_LIST_HEAD(&req->rq_list);
> req->rq_timeout = task->tk_client->cl_timeout->to_initval;
> req->rq_task = task;
> req->rq_xprt = xprt;
> @@ -1355,6 +1384,26 @@ void xprt_retry_reserve(struct rpc_task *task)
> xprt_do_reserve(xprt, task);
> }
>
> +static void
> +xprt_request_dequeue_all(struct rpc_task *task, struct rpc_rqst *req)
> +{
> + struct rpc_xprt *xprt = req->rq_xprt;
> +
> + if (test_bit(RPC_TASK_NEED_RECV, &task->tk_runstate) ||
> + xprt_is_pinned_rqst(req)) {
> + spin_lock(&xprt->queue_lock);
> + xprt_request_dequeue_receive_locked(task);
> + while (xprt_is_pinned_rqst(req)) {
> + set_bit(RPC_TASK_MSG_PIN_WAIT, &task->tk_runstate);
> + spin_unlock(&xprt->queue_lock);
> + xprt_wait_on_pinned_rqst(req);
> + spin_lock(&xprt->queue_lock);
> + clear_bit(RPC_TASK_MSG_PIN_WAIT, &task->tk_runstate);
> + }
> + spin_unlock(&xprt->queue_lock);
> + }
> +}
> +
> /**
> * xprt_release - release an RPC request slot
> * @task: task which is finished with the slot
> @@ -1379,18 +1428,7 @@ void xprt_release(struct rpc_task *task)
> task->tk_ops->rpc_count_stats(task, task->tk_calldata);
> else if (task->tk_client)
> rpc_count_iostats(task, task->tk_client->cl_metrics);
> - spin_lock(&xprt->queue_lock);
> - if (!list_empty(&req->rq_list)) {
> - list_del_init(&req->rq_list);
> - if (xprt_is_pinned_rqst(req)) {
> - set_bit(RPC_TASK_MSG_PIN_WAIT, &req->rq_task-
> >tk_runstate);
> - spin_unlock(&xprt->queue_lock);
> - xprt_wait_on_pinned_rqst(req);
> - spin_lock(&xprt->queue_lock);
> - clear_bit(RPC_TASK_MSG_PIN_WAIT, &req->rq_task-
> >tk_runstate);
> - }
> - }
> - spin_unlock(&xprt->queue_lock);
> + xprt_request_dequeue_all(task, req);
> spin_lock_bh(&xprt->transport_lock);
> xprt->ops->release_xprt(xprt, task);
> if (xprt->ops->release_request)
> diff --git a/net/sunrpc/xprtrdma/backchannel.c
> b/net/sunrpc/xprtrdma/backchannel.c
> index 90adeff4c06b..ed58761e6b23 100644
> --- a/net/sunrpc/xprtrdma/backchannel.c
> +++ b/net/sunrpc/xprtrdma/backchannel.c
> @@ -51,7 +51,6 @@ static int rpcrdma_bc_setup_reqs(struct rpcrdma_xprt
> *r_xprt,
> rqst = &req->rl_slot;
>
> rqst->rq_xprt = xprt;
> - INIT_LIST_HEAD(&rqst->rq_list);
> INIT_LIST_HEAD(&rqst->rq_bc_list);
> __set_bit(RPC_BC_PA_IN_USE, &rqst->rq_bc_pa_state);
> spin_lock_bh(&xprt->bc_pa_lock);
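
The behavioural core of the patch is the call_transmit() hunk above: the
request is now queued for receive before the bytes are handed to the
transport. Condensed from the quoted hunks (no new API here), the ordering
that avoids the race looks like:

	/* Make the request findable by xprt_lookup_rqst() *before*
	 * transmission; on a fast network the reply can arrive while
	 * the sender is still inside the transmit path. */
	if (rpc_reply_expected(task))
		xprt_request_enqueue_receive(task);
	if (!xprt_prepare_transmit(task))
		return;
	task->tk_action = call_transmit_status;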

2018-09-19 21:26:48

by Trond Myklebust

[permalink] [raw]
Subject: Re: [PATCH v3 15/44] SUNRPC: Refactor xprt_transmit() to remove the reply queue code

On Tue, 2018-09-18 at 17:01 -0400, Anna Schumaker wrote:
> Hi Trond,
> 
> I'm seeing this crash while running cthon tests (on any NFS version)
> after applying this patch:
> 
> [   50.780104] general protection fault: 0000 [#1] PREEMPT SMP PTI
[...]
> [   50.782911] RIP: 0010:xprt_lookup_rqst+0x2c/0x150 [sunrpc]
[...]

Thanks for finding that! It looks like the definition of
xprt_request_need_enqueue_receive() was incorrect so I've pushed out a
fixed version to the 'testing' branch.

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]

2018-09-19 23:09:26

by Anna Schumaker

[permalink] [raw]
Subject: Re: [PATCH v3 15/44] SUNRPC: Refactor xprt_transmit() to remove the reply queue code

On Wed, 2018-09-19 at 15:48 +0000, Trond Myklebust wrote:
> On Tue, 2018-09-18 at 17:01 -0400, Anna Schumaker wrote:
> > Hi Trond,
> >
> > I'm seeing this crash while running cthon tests (on any NFS version)
> > after
> > applying this patch:
> >
> > [ 50.780104] general protection fault: 0000 [#1] PREEMPT SMP PTI
> > [ 50.780796] CPU: 0 PID: 384 Comm: kworker/u5:1 Not tainted 4.19.0-
> > rc4-ANNA+
> > #7455
> > [ 50.781601] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> > [ 50.782232] Workqueue: xprtiod xs_tcp_data_receive_workfn [sunrpc]
> > [ 50.782911] RIP: 0010:xprt_lookup_rqst+0x2c/0x150 [sunrpc]
> > [ 50.783510] Code: 48 8d 97 58 04 00 00 41 54 49 89 fc 55 89 f5 53
> > 48 8b 87 58
> > 04 00 00 48 39 c2 74 26 48 8d 98 48 ff ff ff 3b 70 e0 75 07 eb 3f
> > <39> 68 e0 74
> > 3a 48 8b 83 b8 00 00 00 48 8d 98 48 ff ff ff 48 39 c2
> > [ 50.785501] RSP: 0018:ffffc90000bebd60 EFLAGS: 00010202
> > [ 50.786090] RAX: dead000000000100 RBX: dead000000000048 RCX:
> > 0000000000000051
> > [ 50.786853] RDX: ffff8800b915dc58 RSI: 000000005a1c5631 RDI:
> > ffff8800b915d800
> > [ 50.787616] RBP: 000000005a1c5631 R08: 0000000000000000 R09:
> > 00646f6974727078
> > [ 50.788380] R10: 8080808080808080 R11: 00000000000ee5f3 R12:
> > ffff8800b915d800
> > [ 50.789153] R13: ffff8800b915dc18 R14: ffff8800b915d800 R15:
> > ffffffffa03265b4
> > [ 50.789930] FS: 0000000000000000(0000) GS:ffff8800bca00000(0000)
> > knlGS:0000000000000000
> > [ 50.790797] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 50.791416] CR2: 00007f9b670538b0 CR3: 000000000200a001 CR4:
> > 00000000001606f0
> > [ 50.792182] Call Trace:
> > [ 50.792471] xs_tcp_data_recv+0x3a6/0x780 [sunrpc]
> > [ 50.792993] ? __switch_to_asm+0x34/0x70
> > [ 50.793426] ? xs_tcp_check_fraghdr.part.1+0x40/0x40 [sunrpc]
> > [ 50.794047] tcp_read_sock+0x93/0x1b0
> > [ 50.794447] ? __switch_to_asm+0x40/0x70
> > [ 50.794879] xs_tcp_data_receive_workfn+0xb2/0x190 [sunrpc]
> > [ 50.795482] process_one_work+0x1e6/0x3c0
> > [ 50.795928] worker_thread+0x28/0x3c0
> > [ 50.796337] ? process_one_work+0x3c0/0x3c0
> > [ 50.796814] kthread+0x10d/0x130
> > [ 50.797170] ? kthread_park+0x80/0x80
> > [ 50.797570] ret_from_fork+0x35/0x40
> > [ 50.797961] Modules linked in: nfsv3 rpcsec_gss_krb5 nfsv4 nfs
> > fscache
> > cfg80211 rpcrdma rfkill crct10dif_pclmul crc32_pclmul crc32c_intel
> > ghash_clmulni_intel joydev pcbc mousedev aesni_intel psmouse
> > aes_x86_64 evdev
> > crypto_simd cryptd input_leds glue_helper led_class mac_hid pcspkr
> > intel_agp
> > intel_gtt i2c_piix4 nfsd button auth_rpcgss nfs_acl lockd grace
> > sunrpc
> > sch_fq_codel ip_tables x_tables ata_generic pata_acpi ata_piix
> > serio_raw
> > uhci_hcd atkbd ehci_pci libps2 ehci_hcd libata usbcore usb_common
> > i8042 floppy
> > serio scsi_mod xfs virtio_balloon virtio_net net_failover failover
> > virtio_pci
> > virtio_blk virtio_ring virtio
> >
>
> Thanks for finding that! It looks like the definition of
> xprt_request_need_enqueue_receive() was incorrect so I've pushed out a
> fixed version to the 'testing' branch.

The new version works for me, thanks!

Anna

>
> --
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> [email protected]
>
>

2018-11-09 11:19:34

by Catalin Marinas

[permalink] [raw]
Subject: Re: [PATCH v3 40/44] SUNRPC: Simplify TCP receive code by switching to using iterators

Hi Trond,

On Mon, Sep 17, 2018 at 09:03:31AM -0400, Trond Myklebust wrote:
> Most of this code should also be reusable with other socket types.
>
> Signed-off-by: Trond Myklebust <[email protected]>
> ---
> include/linux/sunrpc/xprtsock.h | 19 +-
> include/trace/events/sunrpc.h | 15 +-
> net/sunrpc/xprtsock.c | 694 +++++++++++++++-----------------
> 3 files changed, 335 insertions(+), 393 deletions(-)

With latest mainline (24ccea7e102d, it includes Al Viro's iov_iter
fixup) I started hitting some severe slowdown and systemd timeouts with
nfsroot on arm64 machines (physical or guests under KVM). Interestingly,
it only happens when the client kernel is configured with 64K pages, the
4K pages configuration runs fine. It also runs fine if I add rsize=65536
to the nfsroot= argument.
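
For concreteness, the workaround simply appends the option to the existing
argument, e.g.:

  nfsroot=<some-server>:/srv/nfs/debian-arm64,tcp,v4,rsize=65536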

Bisecting led me to commit 277e4ab7d530 ("SUNRPC: Simplify TCP receive
code by switching to using iterators"). Prior to this commit, it works
fine.

Some more info:

- defconfig with CONFIG_ARM64_64K_PAGES enabled

- kernel cmdline arg: nfsroot=<some-server>:/srv/nfs/debian-arm64,tcp,v4

- if it matters, the server is also an arm64 machine running 4.19 with
4K pages configuration

I haven't figured out what's wrong or even how to debug this as I'm not
familiar with the sunrpc code. Any suggestion?

Thanks.

--
Catalin

2018-11-29 19:28:52

by Cristian Marussi

[permalink] [raw]
Subject: Re: [PATCH v3 40/44] SUNRPC: Simplify TCP receive code by switching to using iterators

Hi Trond, Catalin

On 09/11/2018 11:19, Catalin Marinas wrote:
> Hi Trond,
>
> On Mon, Sep 17, 2018 at 09:03:31AM -0400, Trond Myklebust wrote:
>> Most of this code should also be reusable with other socket types.
>>
>> Signed-off-by: Trond Myklebust <[email protected]>
>> ---
>> include/linux/sunrpc/xprtsock.h | 19 +-
>> include/trace/events/sunrpc.h | 15 +-
>> net/sunrpc/xprtsock.c | 694 +++++++++++++++-----------------
>> 3 files changed, 335 insertions(+), 393 deletions(-)
>
> With latest mainline (24ccea7e102d, it includes Al Viro's iov_iter
> fixup) I started hitting some severe slowdown and systemd timeouts with
> nfsroot on arm64 machines (physical or guests under KVM). Interestingly,
> it only happens when the client kernel is configured with 64K pages, the
> 4K pages configuration runs fine. It also runs fine if I add rsize=65536
> to the nfsroot= argument.
>
> Bisecting led me to commit 277e4ab7d530 ("SUNRPC: Simplify TCP receive
> code by switching to using iterators"). Prior to this commit, it works
> fine.
>
> Some more info:
>
> - defconfig with CONFIG_ARM64_64K_PAGES enabled
>
> - kernel cmdline arg: nfsroot=<some-server>:/srv/nfs/debian-arm64,tcp,v4
>
> - if it matters, the server is also an arm64 machine running 4.19 with
> 4K pages configuration
>
> I haven't figured out what's wrong or even how to debug this as I'm not
> familiar with the sunrpc code. Any suggestion?
>
> Thanks.
>

I've done a few experiments and observations on this, since it was seriously
impacting all forms of testing on arm64 with a 64K pages configuration.

I can confirm that the rsize=65536 workaround mentioned above by Catalin is
also effective for me, as is resetting back to before the commit mentioned in
the subject.

For the tests below I instead used:

- linus arm64 v4.20-rc1 64K pages
+ "Debug Lockups and Hangs" Enabled
- hw Juno-r2
- fully NFS mounted rootfs (Debian 9)
- NO rsize workaround
- NFS Client config(nfsstat -m)
Flags:
rw,relatime,vers=4.0,rsize=4096,wsize=4096,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=192.168.0.1,local_lock=none,addr=192.168.0.254

Observations:

1. despite some general boot slowdown (not so evident in my setup), I hit the
issue when simply trying to launch LTP or LKP tests (everything lives on the
NFS-mounted rootfs): the immediately observable behaviour is that the
application gets 'apparently' stuck straight away (no output whatsoever).
Waiting several seconds yields no progress or result, nor is any lockup
detected by the kernel. A good deal of effort is needed to kill the process at
this point, but it is feasible (many SIGSTOP + KILL), and the system comes
back alive.


2. running LKP again via 'strace', we can observe the process apparently
starting fine but then suddenly hanging multiple times at random points: first
on an execve() and then on some read() calls while trying to load its own file
components; each hang lasts approximately 30-45 seconds.

In LKP as an example:

$ strace lkp run ./dbench-100%.yaml

....
newfstatat(AT_FDCWD, "/opt/lkp-tests/bin/run-local", {st_mode=S_IFREG|0755,
st_size=4367, ...}, 0) = 0
faccessat(AT_FDCWD, "/opt/lkp-tests/bin/run-local", X_OK) = 0
execve("/opt/lkp-tests/bin/run-local", ["/opt/lkp-tests/bin/run-local",
"./dbench-100%.yaml"], [/* 12 vars */] <<< HANGGG
...
...
30-40 secs..
....
openat(AT_FDCWD, "/usr/lib/ruby/2.3.0/rubygems.rb",
O_RDONLY|O_NONBLOCK|O_CLOEXEC) = 7
fstat(7, {st_mode=S_IFREG|0644, st_size=33018, ...}) = 0
close(7) = 0
getuid() = 0
geteuid() = 0
getgid() = 0
getegid() = 0
openat(AT_FDCWD, "/usr/lib/ruby/2.3.0/rubygems.rb",
O_RDONLY|O_NONBLOCK|O_CLOEXEC) = 7
fcntl(7, F_SETFL, O_RDONLY) = 0
fstat(7, {st_mode=S_IFREG|0644, st_size=33018, ...}) = 0
fstat(7, {st_mode=S_IFREG|0644, st_size=33018, ...}) = 0
ioctl(7, TCGETS, 0xffffdd0895d8) = -1 ENOTTY (Inappropriate ioctl for device)
read(7, <<<HANGGSSS

....
~30-40 secs
....

"# frozen_string_literal: true\n# "..., 8192) = 8192
read(7, ") as the standard configuration "..., 8192) = 8192
brk(0xaaaaeea70000) = 0xaaaaeea70000
read(7, "ady been required, then we have "..., 8192) = 8192
read(7, "lf.user_home\n @user_home ||= "..., 8192) = 8192
read(7, " require \"rubygems/defaults/#"..., 8192) = 250
read(7, "", 8192) = 0
close(7) = 0
....
....

openat(AT_FDCWD, "/usr/lib/ruby/2.3.0/rubygems/specification.rb",
O_RDONLY|O_NONBLOCK|O_CLOEXEC) = 7
fstat(7, {st_mode=S_IFREG|0644, st_size=81998, ...}) = 0
close(7) = 0
getuid() = 0
geteuid() = 0
getgid() = 0
getegid() = 0
openat(AT_FDCWD, "/usr/lib/ruby/2.3.0/rubygems/specification.rb",
O_RDONLY|O_NONBLOCK|O_CLOEXEC) = 7
fcntl(7, F_SETFL, O_RDONLY) = 0
fstat(7, {st_mode=S_IFREG|0644, st_size=81998, ...}) = 0
fstat(7, {st_mode=S_IFREG|0644, st_size=81998, ...}) = 0
ioctl(7, TCGETS, 0xffffef390b38) = -1 ENOTTY (Inappropriate ioctl for device)
read(7, "# -*- coding: utf-8 -*-\n# frozen"..., 8192) = 8192
read(7, "e platform attribute appropriate"..., 8192) = 8192
read(7, "sealicense.com/.\n #\n # You sho"..., 8192) = 8192
read(7, " TODO: find all extraneous adds\n"..., 8192) = 8192
read(7, "rn a list of all outdated local "..., 8192) = 8192
read(7, " = authors.collect { |a"..., 8192) = 8192
read(7, "ends on.\n #\n # Use #add_depend"..., 8192) = 8192
read(7, "ns true you\n # probably want to"..., 8192) = 8192
read(7, <<<< HANGGGGG

Note that this last hang happens halfway through a file read!


3. Looking at the underlying network traffic with Wireshark, I could see that
the NFS packets do in fact stop flowing completely for 30-40s when all of the
above happens, but I cannot see any errors, timeouts or NFS retries. The same
happened when I tried to reduce the NFS timeo to 150 (15 secs) from the
original 600 (60 secs). The lack of retries was confirmed by the stats:
root@sqwt-ubuntu:/opt/lkp-tests# nfsstat -r
Client rpc stats:
calls retrans authrefrsh
16008 0 16019

The only notable thing is a routine TCP keepalive sent by the NFS client's TCP
stack during the 30/40 sec quiet window: the NFS data flow restarts only
another 10/12 secs after the keepalive is sent and ACKed by the server, so the
keepalive does not appear to be the trigger for the restart.


4. After waiting forever (minutes), I was finally able to see LKP complete
initialization and the dbench test run. Below are the results with and without
the rsize workaround:

DBENCH SANE RESULTS WITH WORKAROUND RSIZE=65536
------------------------------------------------
...
6 122759 5.62 MB/sec execute 599 sec latency 84.304 ms
6 cleanup 600 sec
0 cleanup 600 sec

Operation Count AvgLat MaxLat
----------------------------------------
NTCreateX 108397 11.100 358.921
Close 79762 16.243 406.213
Rename 4582 19.399 259.180
Unlink 21728 3.610 190.564
Qpathinfo 98367 1.773 289.554
Qfileinfo 17163 9.917 232.687
Qfsinfo 17903 2.130 216.804
Sfileinfo 8828 17.427 234.069
Find 37915 3.478 287.326
WriteX 53503 0.048 2.992
ReadX 169707 0.592 199.341
LockX 350 13.536 242.800
UnlockX 350 2.801 124.317
Flush 7548 20.248 229.864

Throughput 5.61504 MB/sec 6 clients 6 procs max_latency=406.225 ms


DBENCH RESULTS WITHOUT WORKAROUND
---------------------------------
...
6 111273 5.06 MB/sec execute 599 sec latency 4066.072 ms
6 cleanup 600 sec
3 cleanup 601 sec
0 cleanup 601 sec

Operation Count AvgLat MaxLat
----------------------------------------
NTCreateX 97674 12.244 13773.786
Close 71583 19.126 13774.836
Rename 4135 19.177 277.171
Unlink 19881 4.286 12842.441
Qpathinfo 88303 1.942 12923.151
Qfileinfo 15305 9.636 203.481
Qfsinfo 16305 1.801 227.405
Sfileinfo 7960 15.871 186.799
Find 34164 3.409 255.098
WriteX 48105 0.053 5.428
ReadX 152460 0.926 13759.913
LockX 314 7.562 53.131
UnlockX 314 1.847 47.083
Flush 6872 19.222 200.180

Throughput 5.06232 MB/sec 6 clients 6 procs max_latency=13774.850 ms

Then I also tried running:

./nfstest_io -d /mnt/t/data -v all -n 10 -r 3600

with WORKAROUND it took:
INFO: 2018-11-28 16:31:25.031280 TIME: 449 secs

WITHOUT:
INFO: 2018-11-28 17:55:39.438688 TIME: 1348 secs


5. All of the above slowness disappeared when I re-ran the same tests a second
time, presumably because NFS had cached everything locally during the first
run.


6. Reboot hangs similarly.


The fact that the traffic stops without triggering any NFS timeout and retry
makes me think that it is the egressing NFS RPC requests themselves that get
stuck somehow (but I'm far from being an NFS expert).

Any ideas or thoughts? Additional sensible test cases to run?
Or hints on where to look inside the NFS code, Trond?
(I'll re-test with a newer 4.20 RC and try to ftrace something in the next
few days.)

Thanks

Cristian

2018-11-29 19:56:14

by Trond Myklebust

[permalink] [raw]
Subject: Re: [PATCH v3 40/44] SUNRPC: Simplify TCP receive code by switching to using iterators

On Thu, 2018-11-29 at 19:28 +0000, Cristian Marussi wrote:
> Hi Trond, Catalin
>
> On 09/11/2018 11:19, Catalin Marinas wrote:
> > Hi Trond,
> >
> > On Mon, Sep 17, 2018 at 09:03:31AM -0400, Trond Myklebust wrote:
> > > Most of this code should also be reusable with other socket
> > > types.
> > >
> > > Signed-off-by: Trond Myklebust <[email protected]>
> > > ---
> > > include/linux/sunrpc/xprtsock.h | 19 +-
> > > include/trace/events/sunrpc.h | 15 +-
> > > net/sunrpc/xprtsock.c | 694 +++++++++++++++-----------
> > > ------
> > > 3 files changed, 335 insertions(+), 393 deletions(-)
> >
> > With latest mainline (24ccea7e102d, it includes Al Viro's iov_iter
> > fixup) I started hitting some severe slowdown and systemd timeouts
> > with
> > nfsroot on arm64 machines (physical or guests under KVM).
> > Interestingly,
> > it only happens when the client kernel is configured with 64K
> > pages, the
> > 4K pages configuration runs fine. It also runs fine if I add
> > rsize=65536
> > to the nfsroot= argument.
> >
> > Bisecting led me to commit 277e4ab7d530 ("SUNRPC: Simplify TCP
> > receive
> > code by switching to using iterators"). Prior to this commit, it
> > works
> > fine.
> >
> > Some more info:
> >
> > - defconfig with CONFIG_ARM64_64K_PAGES enabled
> >
> > - kernel cmdline arg: nfsroot=<some-server>:/srv/nfs/debian-
> > arm64,tcp,v4
> >
> > - if it matters, the server is also an arm64 machine running 4.19
> > with
> > 4K pages configuration
> >

Question to you both: when this happens, does /proc/*/stack show any of
the processes hanging in the socket or sunrpc code? If so, can you
please send me examples of those stack traces (i.e. the contents of
/proc/<pid>/stack for the processes that are hanging)

I'd be particularly interested if the processes in question are
related to the rpciod workqueue.

Thanks
Trond

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]


2018-11-30 16:19:10

by Cristian Marussi

[permalink] [raw]
Subject: Re: [PATCH v3 40/44] SUNRPC: Simplify TCP receive code by switching to using iterators

Hi

On 29/11/2018 19:56, Trond Myklebust wrote:
> On Thu, 2018-11-29 at 19:28 +0000, Cristian Marussi wrote:
>> Hi Trond, Catalin
[snip]
>
> Question to you both: when this happens, does /proc/*/stack show any of
> the processes hanging in the socket or sunrpc code? If so, can you
> please send me examples of those stack traces (i.e. the contents of
> /proc/<pid>/stack for the processes that are hanging)

(using a reverse shell since starting ssh causes a lot of pain and traffic)

Looking at NFS traffic holes (30-40 secs) to detect various client-side HANGS
----------------------------------------------------------------------------

root@sqwt-ubuntu:/opt/lkp-tests# nc -lk -e /bin/bash -s 192.168.0.1 -p 1235 &
root@sqwt-ubuntu:/opt/lkp-tests# lkp run ./dbench-100%.yaml

$ nc 192.168.0.1 1235

cat /proc/2833/cmdline
ruby/opt/lkp-tests/bin/run-local./dbench-100%.yaml

HANG CLOSE
----------
cat /proc/2833/stack
[<0>] __switch_to+0x6c/0x90
[<0>] rpc_wait_bit_killable+0x2c/0xb0
[<0>] __rpc_wait_for_completion_task+0x3c/0x48
[<0>] nfs4_do_close+0x1ec/0x2b0
[<0>] __nfs4_close+0x130/0x198
[<0>] nfs4_close_sync+0x34/0x40
[<0>] nfs4_close_context+0x40/0x50
[<0>] __put_nfs_open_context+0xac/0x118
[<0>] nfs_file_clear_open_context+0x38/0x58
[<0>] nfs_file_release+0x7c/0x90
[<0>] __fput+0x94/0x1c0
[<0>] ____fput+0x20/0x30
[<0>] task_work_run+0x98/0xb8
[<0>] do_notify_resume+0x2d0/0x318
[<0>] work_pending+0x8/0x10
[<0>] 0xffffffffffffffff

HANG READ
---------
cat /proc/2833/stack
[<0>] __switch_to+0x6c/0x90
[<0>] io_schedule+0x20/0x40
[<0>] wait_on_page_bit_killable+0x164/0x260
[<0>] generic_file_read_iter+0x1c4/0x820
[<0>] nfs_file_read+0xa4/0x108
[<0>] __vfs_read+0x120/0x170
[<0>] vfs_read+0x94/0x150
[<0>] ksys_read+0x6c/0xd8
[<0>] __arm64_sys_read+0x24/0x30
[<0>] el0_svc_handler+0x7c/0x118
[<0>] el0_svc+0x8/0xc
[<0>] 0xffffffffffffffff


HANG STAT
---------
cat /proc/2833/stack
[<0>] __switch_to+0x6c/0x90
[<0>] rpc_wait_bit_killable+0x2c/0xb0
[<0>] __rpc_execute+0x1cc/0x528
[<0>] rpc_execute+0xe4/0x1b0
[<0>] rpc_run_task+0x130/0x168
[<0>] nfs4_call_sync_sequence+0x80/0xc8
[<0>] _nfs4_proc_getattr+0xc8/0xf8
[<0>] nfs4_proc_getattr+0x88/0x1d8
[<0>] __nfs_revalidate_inode+0x1f8/0x468
[<0>] nfs_getattr+0x14c/0x420
[<0>] vfs_getattr_nosec+0x7c/0x98
[<0>] vfs_getattr+0x48/0x58
[<0>] vfs_statx+0xb4/0x118
[<0>] __se_sys_newfstatat+0x58/0x98
[<0>] __arm64_sys_newfstatat+0x24/0x30
[<0>] el0_svc_handler+0x7c/0x118
[<0>] el0_svc+0x8/0xc
[<0>] 0xffffffffffffffff

....


Looking at a straced lkp to detect HANGS
----------------------------------------

cat /proc/2878/cmdline
ruby/opt/lkp-tests/bin/run-local./dbench-100%.yaml

HANG READ
----------
cat /proc/2878/stack
[<0>] __switch_to+0x6c/0x90
[<0>] io_schedule+0x20/0x40
[<0>] wait_on_page_bit_killable+0x164/0x260
[<0>] generic_file_read_iter+0x1c4/0x820
[<0>] nfs_file_read+0xa4/0x108
[<0>] __vfs_read+0x120/0x170
[<0>] vfs_read+0x94/0x150
[<0>] ksys_read+0x6c/0xd8
[<0>] __arm64_sys_read+0x24/0x30
[<0>] el0_svc_handler+0x7c/0x118
[<0>] el0_svc+0x8/0xc
[<0>] 0xffffffffffffffff

...

cat /proc/2878/status
Name: ruby
Umask: 0022
State: D (disk sleep)
Tgid: 2878
Ngid: 0
Pid: 2878
PPid: 2876
TracerPid: 2876
Uid: 0 0 0 0
Gid: 0 0 0 0
FDSize: 64
Groups:
NStgid: 2878
NSpid: 2878
NSpgid: 2876
NSsid: 2822
VmPeak: 24192 kB
VmSize: 24192 kB
VmLck: 0 kB
VmPin: 0 kB
VmHWM: 13376 kB
VmRSS: 13376 kB
RssAnon: 8768 kB
RssFile: 4608 kB
RssShmem: 0 kB
VmData: 9792 kB
VmStk: 8192 kB
VmExe: 64 kB
VmLib: 5888 kB
VmPTE: 320 kB
VmSwap: 0 kB
HugetlbPages: 0 kB
CoreDumping: 0
Threads: 2
SigQ: 0/7534
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: 0000000000000000
SigCgt: 00000001c2007e4f
CapInh: 0000000000000000
CapPrm: 0000003fffffffff
CapEff: 0000003fffffffff
CapBnd: 0000003fffffffff
CapAmb: 0000000000000000
NoNewPrivs: 0
Seccomp: 0
Speculation_Store_Bypass: unknown
Cpus_allowed: 3f
Cpus_allowed_list: 0-5
Mems_allowed: 1
Mems_allowed_list: 0
voluntary_ctxt_switches: 7547
nonvoluntary_ctxt_switches: 564


Thanks

Cristian

2018-11-30 19:31:51

by Trond Myklebust

[permalink] [raw]
Subject: Re: [PATCH v3 40/44] SUNRPC: Simplify TCP receive code by switching to using iterators

On Fri, 2018-11-30 at 16:19 +0000, Cristian Marussi wrote:
> Hi
>
> On 29/11/2018 19:56, Trond Myklebust wrote:
> > On Thu, 2018-11-29 at 19:28 +0000, Cristian Marussi wrote:
> > > Hi Trond, Catalin
> [snip]
> > Question to you both: when this happens, does /proc/*/stack show
> > any of
> > the processes hanging in the socket or sunrpc code? If so, can you
> > please send me examples of those stack traces (i.e. the contents of
> > /proc/<pid>/stack for the processes that are hanging)
>
> (using a reverse shell since starting ssh causes a lot of pain and
> traffic)
>
> Looking at NFS traffic holes(30-40 secs) to detect Client side
> various HANGS
> -------------------------------------------------------------------
> ---------
>
> root@sqwt-ubuntu:/opt/lkp-tests# nc -lk -e /bin/bash -s 192.168.0.1
> -p 1235 &
> root@sqwt-ubuntu:/opt/lkp-tests# lkp run ./dbench-100%.yaml
>
> $ nc 192.168.0.1 1235
>
> cat /proc/2833/cmdline
> ruby/opt/lkp-tests/bin/run-local./dbench-100%.yaml
>
> HANG CLOSE
> ----------
> cat /proc/2833/stack
> [<0>] __switch_to+0x6c/0x90
> [<0>] rpc_wait_bit_killable+0x2c/0xb0
> [<0>] __rpc_wait_for_completion_task+0x3c/0x48
> [<0>] nfs4_do_close+0x1ec/0x2b0
> [<0>] __nfs4_close+0x130/0x198
> [<0>] nfs4_close_sync+0x34/0x40
> [<0>] nfs4_close_context+0x40/0x50
> [<0>] __put_nfs_open_context+0xac/0x118
> [<0>] nfs_file_clear_open_context+0x38/0x58
> [<0>] nfs_file_release+0x7c/0x90
> [<0>] __fput+0x94/0x1c0
> [<0>] ____fput+0x20/0x30
> [<0>] task_work_run+0x98/0xb8
> [<0>] do_notify_resume+0x2d0/0x318
> [<0>] work_pending+0x8/0x10
> [<0>] 0xffffffffffffffff
>
> HANG READ
> ---------
> cat /proc/2833/stack
> [<0>] __switch_to+0x6c/0x90
> [<0>] io_schedule+0x20/0x40
> [<0>] wait_on_page_bit_killable+0x164/0x260
> [<0>] generic_file_read_iter+0x1c4/0x820
> [<0>] nfs_file_read+0xa4/0x108
> [<0>] __vfs_read+0x120/0x170
> [<0>] vfs_read+0x94/0x150
> [<0>] ksys_read+0x6c/0xd8
> [<0>] __arm64_sys_read+0x24/0x30
> [<0>] el0_svc_handler+0x7c/0x118
> [<0>] el0_svc+0x8/0xc
> [<0>] 0xffffffffffffffff
>
>
> HANG STAT
> ---------
> cat /proc/2833/stack
> [<0>] __switch_to+0x6c/0x90
> [<0>] rpc_wait_bit_killable+0x2c/0xb0
> [<0>] __rpc_execute+0x1cc/0x528
> [<0>] rpc_execute+0xe4/0x1b0
> [<0>] rpc_run_task+0x130/0x168
> [<0>] nfs4_call_sync_sequence+0x80/0xc8
> [<0>] _nfs4_proc_getattr+0xc8/0xf8
> [<0>] nfs4_proc_getattr+0x88/0x1d8
> [<0>] __nfs_revalidate_inode+0x1f8/0x468
> [<0>] nfs_getattr+0x14c/0x420
> [<0>] vfs_getattr_nosec+0x7c/0x98
> [<0>] vfs_getattr+0x48/0x58
> [<0>] vfs_statx+0xb4/0x118
> [<0>] __se_sys_newfstatat+0x58/0x98
> [<0>] __arm64_sys_newfstatat+0x24/0x30
> [<0>] el0_svc_handler+0x7c/0x118
> [<0>] el0_svc+0x8/0xc
> [<0>] 0xffffffffffffffff
>
> ....

Is there anything else blocked in the RPC layer? The above are all
standard tasks waiting for the rpciod/xprtiod workqueues to complete
the calls to the server.

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]


2018-12-02 16:45:34

by Trond Myklebust

[permalink] [raw]
Subject: Re: [PATCH v3 40/44] SUNRPC: Simplify TCP receive code by switching to using iterators

On Fri, 2018-11-30 at 14:31 -0500, Trond Myklebust wrote:
> On Fri, 2018-11-30 at 16:19 +0000, Cristian Marussi wrote:
> > Hi
> >
> > On 29/11/2018 19:56, Trond Myklebust wrote:
> > > On Thu, 2018-11-29 at 19:28 +0000, Cristian Marussi wrote:
> > > > Hi Trond, Catalin
> > [snip]
> > > Question to you both: when this happens, does /proc/*/stack show
> > > any of
> > > the processes hanging in the socket or sunrpc code? If so, can
> > > you
> > > please send me examples of those stack traces (i.e. the contents
> > > of
> > > /proc/<pid>/stack for the processes that are hanging)
> >
> > (using a reverse shell since starting ssh causes a lot of pain and
> > traffic)
> >
> > Looking at NFS traffic holes(30-40 secs) to detect Client side
> > various HANGS
> > -------------------------------------------------------------------
> >

Hi Cristian and Catalin

Chuck and I have identified a few issues that might have an effect on
the hangs you report. Could you please give the linux-next branch in my
repository on git.linux-nfs.org (
https://git.linux-nfs.org/?p=trondmy/linux-nfs.git;a=shortlog;h=refs/heads/linux-next
) a try?

git pull git://git.linux-nfs.org/projects/trondmy/linux-nfs.git linux-next

Thanks!
Trond
--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]


2018-12-03 11:46:03

by Catalin Marinas

[permalink] [raw]
Subject: Re: [PATCH v3 40/44] SUNRPC: Simplify TCP receive code by switching to using iterators

Hi Trond,

On Sun, Dec 02, 2018 at 04:44:49PM +0000, Trond Myklebust wrote:
> On Fri, 2018-11-30 at 14:31 -0500, Trond Myklebust wrote:
> > On Fri, 2018-11-30 at 16:19 +0000, Cristian Marussi wrote:
> > > On 29/11/2018 19:56, Trond Myklebust wrote:
> > > > On Thu, 2018-11-29 at 19:28 +0000, Cristian Marussi wrote:
> > > > Question to you both: when this happens, does /proc/*/stack show
> > > > any of the processes hanging in the socket or sunrpc code? If
> > > > so, can you please send me examples of those stack traces (i.e.
> > > > the contents of /proc/<pid>/stack for the processes that are
> > > > hanging)
> > >
> > > (using a reverse shell since starting ssh causes a lot of pain and
> > > traffic)
> > >
> > > Looking at NFS traffic holes(30-40 secs) to detect Client side
> > > various HANGS
>
> Chuck and I have identified a few issues that might have an effect on
> the hangs you report. Could you please give the linux-next branch in my
> repository on git.linux-nfs.org (
> https://git.linux-nfs.org/?p=trondmy/linux-nfs.git;a=shortlog;h=refs/heads/linux-next
> ) a try?
>
> git pull git://git.linux-nfs.org/projects/trondmy/linux-nfs.git linux-next

I tried, unfortunately there's no difference for me (I merged the above
branch on top of 4.20-rc5).

--
Catalin

2018-12-03 11:54:03

by Cristian Marussi

[permalink] [raw]
Subject: Re: [PATCH v3 40/44] SUNRPC: Simplify TCP receive code by switching to using iterators

Hi

On 03/12/2018 11:45, Catalin Marinas wrote:
> Hi Trond,
>
> On Sun, Dec 02, 2018 at 04:44:49PM +0000, Trond Myklebust wrote:
>> On Fri, 2018-11-30 at 14:31 -0500, Trond Myklebust wrote:
>>> On Fri, 2018-11-30 at 16:19 +0000, Cristian Marussi wrote:
>>>> On 29/11/2018 19:56, Trond Myklebust wrote:
>>>>> On Thu, 2018-11-29 at 19:28 +0000, Cristian Marussi wrote:
>>>>> Question to you both: when this happens, does /proc/*/stack show
>>>>> any of the processes hanging in the socket or sunrpc code? If
>>>>> so, can you please send me examples of those stack traces (i.e.
>>>>> the contents of /proc/<pid>/stack for the processes that are
>>>>> hanging)
>>>>
>>>> (using a reverse shell since starting ssh causes a lot of pain and
>>>> traffic)
>>>>
>>>> Looking at NFS traffic holes(30-40 secs) to detect Client side
>>>> various HANGS
>>
>> Chuck and I have identified a few issues that might have an effect on
>> the hangs you report. Could you please give the linux-next branch in my
>> repository on git.linux-nfs.org (
>> https://git.linux-nfs.org/?p=trondmy/linux-nfs.git;a=shortlog;h=refs/heads/linux-next
>> ) a try?
>>
>> git pull git://git.linux-nfs.org/projects/trondmy/linux-nfs.git linux-next
>
> I tried, unfortunately there's no difference for me (I merged the above
> branch on top of 4.20-rc5).
>

same for me. Issue still there.

Besides, I saw some differences in the dbench results which I use for testing.

From the dbench output (comparing with the previous mail) it seems that the
Unlink and Qpathinfo MaxLat values have normalized.

Operation Count AvgLat MaxLat
----------------------------------------
NTCreateX 90820 13.613 13855.620
Close 66565 18.075 13853.289
Rename 3845 23.668 326.642
Unlink 18450 4.581 186.062
Qpathinfo 82068 2.677 280.203
Qfileinfo 14235 10.357 176.373
Qfsinfo 15156 2.822 242.794
Sfileinfo 7400 17.018 240.546
Find 31812 5.988 277.332
WriteX 44735 0.155 14.685
ReadX 141872 0.741 13817.870
LockX 288 10.558 96.179
UnlockX 288 3.307 57.939
Flush 6389 20.427 187.429


> Is there anything else blocked in the RPC layer? The above are all
> standard tasks waiting for the rpciod/xprtiod workqueues to complete
> the calls to the server.
cat /proc/692/stack
[<0>] __switch_to+0x6c/0x90
[<0>] rescuer_thread+0x2e8/0x360
[<0>] kthread+0x134/0x138
[<0>] ret_from_fork+0x10/0x1c
[<0>] 0xffffffffffffffff

I am now trying to collect more evidence by ftracing during the quiet/stuck
period until the restart happens.

Thanks

Cristian

2018-12-03 18:54:49

by Cristian Marussi

[permalink] [raw]
Subject: Re: [PATCH v3 40/44] SUNRPC: Simplify TCP receive code by switching to using iterators

Hi

On 03/12/2018 11:53, Cristian Marussi wrote:
> Hi
>
[snip]
> same for me. Issue still there.
>
> Beside I saw some differences in the dbench result which I used for testing.
>
> From the dbench (comparing with previous mail) it seems that
> Unlink and Qpathinfo MaxLat has normalized.
>
> Operation Count AvgLat MaxLat
> ----------------------------------------
> NTCreateX 90820 13.613 13855.620
> Close 66565 18.075 13853.289
> Rename 3845 23.668 326.642
> Unlink 18450 4.581 186.062
> Qpathinfo 82068 2.677 280.203
> Qfileinfo 14235 10.357 176.373
> Qfsinfo 15156 2.822 242.794
> Sfileinfo 7400 17.018 240.546
> Find 31812 5.988 277.332
> WriteX 44735 0.155 14.685
> ReadX 141872 0.741 13817.870
> LockX 288 10.558 96.179
> UnlockX 288 3.307 57.939
> Flush 6389 20.427 187.429
>
>
>> Is there anything else blocked in the RPC layer? The above are all
>> standard tasks waiting for the rpciod/xprtiod workqueues to complete
>> the calls to the server.
> cat /proc/692/stack
> [<0>] __switch_to+0x6c/0x90
> [<0>] rescuer_thread+0x2e8/0x360
> [<0>] kthread+0x134/0x138
> [<0>] ret_from_fork+0x10/0x1c
> [<0>] 0xffffffffffffffff
>
> I was now trying to collect more evidence ftracing during the quiet-stuck-period
> till the restart happens.
>

Attached to this mail is a 3-sec ftrace function-graph trace taken during the
quiet/stalled period of an 'lkp run dbench', issued directly from the console
(no ssh or netcat shell traffic).

The ftrace filter was pre-set as:

set_ftrace_filter: nfs* rpc* xprt* tcp*

and tracing was started once no traffic was observed flowing in Wireshark.

This uses ARM64 64K pages on the Linux NFS linux-next branch, as in the
previous mail this morning.

Thanks

Cristian



Attachments:
nfs_64k_stuck_ftrace_filtered_3secs_stalled.txt (56.23 kB)

2018-12-27 19:22:04

by Chuck Lever III

[permalink] [raw]
Subject: Re: [PATCH v3 26/44] SUNRPC: Improve latency for interactive tasks



> On Sep 17, 2018, at 9:03 AM, Trond Myklebust <[email protected]> wrote:
>
> One of the intentions with the priority queues was to ensure that no
> single process can hog the transport. The field task->tk_owner therefore
> identifies the RPC call's origin, and is intended to allow the RPC layer
> to organise queues for fairness.
> This commit therefore modifies the transmit queue to group requests
> by task->tk_owner, and ensures that we round robin among those groups.
>
> Signed-off-by: Trond Myklebust <[email protected]>
> ---
> include/linux/sunrpc/xprt.h | 1 +
> net/sunrpc/xprt.c | 27 ++++++++++++++++++++++++---
> 2 files changed, 25 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/sunrpc/xprt.h b/include/linux/sunrpc/xprt.h
> index 8c2bb078f00c..e377620b9744 100644
> --- a/include/linux/sunrpc/xprt.h
> +++ b/include/linux/sunrpc/xprt.h
> @@ -89,6 +89,7 @@ struct rpc_rqst {
> };
>
> struct list_head rq_xmit; /* Send queue */
> + struct list_head rq_xmit2; /* Send queue */
>
> void *rq_buffer; /* Call XDR encode buffer */
> size_t rq_callsize;
> diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
> index 35f5df367591..3e68f35f71f6 100644
> --- a/net/sunrpc/xprt.c
> +++ b/net/sunrpc/xprt.c
> @@ -1052,12 +1052,21 @@ xprt_request_need_enqueue_transmit(struct rpc_task *task, struct rpc_rqst *req)
> void
> xprt_request_enqueue_transmit(struct rpc_task *task)
> {
> - struct rpc_rqst *req = task->tk_rqstp;
> + struct rpc_rqst *pos, *req = task->tk_rqstp;
> struct rpc_xprt *xprt = req->rq_xprt;
>
> if (xprt_request_need_enqueue_transmit(task, req)) {
> spin_lock(&xprt->queue_lock);
> + list_for_each_entry(pos, &xprt->xmit_queue, rq_xmit) {
> + if (pos->rq_task->tk_owner != task->tk_owner)
> + continue;
> + list_add_tail(&req->rq_xmit2, &pos->rq_xmit2);
> + INIT_LIST_HEAD(&req->rq_xmit);
> + goto out;
> + }
> list_add_tail(&req->rq_xmit, &xprt->xmit_queue);
> + INIT_LIST_HEAD(&req->rq_xmit2);
> +out:
> set_bit(RPC_TASK_NEED_XMIT, &task->tk_runstate);
> spin_unlock(&xprt->queue_lock);
> }
> @@ -1073,8 +1082,20 @@ xprt_request_enqueue_transmit(struct rpc_task *task)
> static void
> xprt_request_dequeue_transmit_locked(struct rpc_task *task)
> {
> - if (test_and_clear_bit(RPC_TASK_NEED_XMIT, &task->tk_runstate))
> - list_del(&task->tk_rqstp->rq_xmit);
> + struct rpc_rqst *req = task->tk_rqstp;
> +
> + if (!test_and_clear_bit(RPC_TASK_NEED_XMIT, &task->tk_runstate))
> + return;
> + if (!list_empty(&req->rq_xmit)) {
> + list_del(&req->rq_xmit);
> + if (!list_empty(&req->rq_xmit2)) {
> + struct rpc_rqst *next = list_first_entry(&req->rq_xmit2,
> + struct rpc_rqst, rq_xmit2);
> + list_del(&req->rq_xmit2);
> + list_add_tail(&next->rq_xmit, &next->rq_xprt->xmit_queue);
> + }
> + } else
> + list_del(&req->rq_xmit2);
> }
>
> /**
> --
> 2.17.1

Hi Trond-

I've chased down a couple of remaining regressions with the v4.20 NFS client,
and they seem to be rooted in this commit.

When using sec=krb5, krb5i, or krb5p I found that multi-threaded workloads
trigger a lot of server-side disconnects. This is with TCP and RDMA transports.
An instrumented server shows that the client is under-running the GSS sequence
number window. I monitored the order in which GSS sequence numbers appear on
the wire, and after this commit, the sequence numbers are wildly misordered.
If I revert the hunk in xprt_request_enqueue_transmit, the problem goes away.

I also found that reverting that hunk results in a 3-4% improvement in fio
IOPS rates, as well as improvement in average and maximum latency as reported
by fio.
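
For reference, reverting that hunk effectively restores the original, simple
enqueue. A minimal sketch, reconstructed from the diff above (untested, just
to show what I benchmarked against):

void
xprt_request_enqueue_transmit(struct rpc_task *task)
{
	struct rpc_rqst *req = task->tk_rqstp;
	struct rpc_xprt *xprt = req->rq_xprt;

	if (xprt_request_need_enqueue_transmit(task, req)) {
		spin_lock(&xprt->queue_lock);
		/* append in submission order; no per-owner grouping */
		list_add_tail(&req->rq_xmit, &xprt->xmit_queue);
		set_bit(RPC_TASK_NEED_XMIT, &task->tk_runstate);
		spin_unlock(&xprt->queue_lock);
	}
}

With this, requests hit the wire in roughly the order their GSS sequence
numbers were assigned, which would explain why the window under-run
disappears.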


--
Chuck Lever




2018-12-27 22:14:22

by Trond Myklebust

[permalink] [raw]
Subject: Re: [PATCH v3 26/44] SUNRPC: Improve latency for interactive tasks



> On Dec 27, 2018, at 20:21, Chuck Lever <[email protected]> wrote:
>
> Hi Trond-
>
> I've chased down a couple of remaining regressions with the v4.20 NFS client,
> and they seem to be rooted in this commit.
>
> When using sec=krb5, krb5i, or krb5p I found that multi-threaded workloads
> trigger a lot of server-side disconnects. This is with TCP and RDMA transports.
> An instrumented server shows that the client is under-running the GSS sequence
> number window. I monitored the order in which GSS sequence numbers appear on
> the wire, and after this commit, the sequence numbers are wildly misordered.
> If I revert the hunk in xprt_request_enqueue_transmit, the problem goes away.
>
> I also found that reverting that hunk results in a 3-4% improvement in fio
> IOPS rates, as well as improvement in average and maximum latency as reported
> by fio.
>

Hmm… Provided the sequence numbers still lie within the window, then why would the order matter?

Cheers
Trond


_________________________________
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]

2018-12-27 22:34:50

by Chuck Lever III

[permalink] [raw]
Subject: Re: [PATCH v3 26/44] SUNRPC: Improve latency for interactive tasks


> On Dec 27, 2018, at 5:14 PM, Trond Myklebust <[email protected]> wrote:
>
>
>
>> On Dec 27, 2018, at 20:21, Chuck Lever <[email protected]> wrote:
>>
>> Hi Trond-
>>
>> I've chased down a couple of remaining regressions with the v4.20 NFS client,
>> and they seem to be rooted in this commit.
>>
>> When using sec=krb5, krb5i, or krb5p I found that multi-threaded workloads
>> trigger a lot of server-side disconnects. This is with TCP and RDMA transports.
>> An instrumented server shows that the client is under-running the GSS sequence
>> number window. I monitored the order in which GSS sequence numbers appear on
>> the wire, and after this commit, the sequence numbers are wildly misordered.
>> If I revert the hunk in xprt_request_enqueue_transmit, the problem goes away.
>>
>> I also found that reverting that hunk results in a 3-4% improvement in fio
>> IOPS rates, as well as improvement in average and maximum latency as reported
>> by fio.
>>
>
> Hmm… Provided the sequence numbers still lie within the window, then why would the order matter?

The misordering is so bad that one request is delayed long enough to
fall outside the window. The new “need re-encode” logic does not
trigger.
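
To illustrate (a toy userspace sketch of the sequence-window semantics, not
kernel code; the window size of 128 is just for illustration):

#include <stdio.h>
#include <stdbool.h>

#define WINDOW 128

static unsigned int seq_max;	/* highest sequence number the server has seen */

static bool server_accepts(unsigned int seq)
{
	if (seq > seq_max)
		seq_max = seq;
	/* anything more than WINDOW behind the highest seen is discarded */
	return seq + WINDOW > seq_max;
}

int main(void)
{
	unsigned int seq;

	/* sequence numbers 1..199 hit the wire first; the request that
	 * was assigned sequence number 0 is transmitted last */
	for (seq = 1; seq < 200; seq++)
		server_accepts(seq);
	printf("delayed request accepted? %s\n",
	       server_accepts(0) ? "yes" : "no");	/* prints "no" */
	return 0;
}

One request delayed behind more than a window's worth of its successors is
enough, and that is what shows up as the server-side disconnects.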


> Cheers
> Trond
>
>
> _________________________________
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> [email protected]
>


2018-12-31 18:09:14

by Trond Myklebust

[permalink] [raw]
Subject: Re: [PATCH v3 26/44] SUNRPC: Improve latency for interactive tasks

On Thu, 2018-12-27 at 17:34 -0500, Chuck Lever wrote:
> > On Dec 27, 2018, at 5:14 PM, Trond Myklebust <
> > [email protected]> wrote:
> >
> >
> >
> > > On Dec 27, 2018, at 20:21, Chuck Lever <[email protected]>
> > > wrote:
> > >
> > > Hi Trond-
> > >
> > > I've chased down a couple of remaining regressions with the v4.20
> > > NFS client,
> > > and they seem to be rooted in this commit.
> > >
> > > When using sec=krb5, krb5i, or krb5p I found that multi-threaded
> > > workloads
> > > trigger a lot of server-side disconnects. This is with TCP and
> > > RDMA transports.
> > > An instrumented server shows that the client is under-running the
> > > GSS sequence
> > > number window. I monitored the order in which GSS sequence
> > > numbers appear on
> > > the wire, and after this commit, the sequence numbers are wildly
> > > misordered.
> > > If I revert the hunk in xprt_request_enqueue_transmit, the
> > > problem goes away.
> > >
> > > I also found that reverting that hunk results in a 3-4%
> > > improvement in fio
> > > IOPS rates, as well as improvement in average and maximum latency
> > > as reported
> > > by fio.
> > >
> >
> > Hmm… Provided the sequence numbers still lie within the window,
> > then why would the order matter?
>
> The misordering is so bad that one request is delayed long enough to
> fall outside the window. The new “need re-encode” logic does not
> trigger.
>

That's weird. I can't see anything wrong with need re-encode at this
point. Do the window sizes agree on the client and the server?

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]


2018-12-31 18:44:35

by Chuck Lever III

[permalink] [raw]
Subject: Re: [PATCH v3 26/44] SUNRPC: Improve latency for interactive tasks



> On Dec 31, 2018, at 1:09 PM, Trond Myklebust <[email protected]> wrote:
>
> On Thu, 2018-12-27 at 17:34 -0500, Chuck Lever wrote:
>>> On Dec 27, 2018, at 5:14 PM, Trond Myklebust <
>>> [email protected]> wrote:
>>>
>>>
>>>
>>>> On Dec 27, 2018, at 20:21, Chuck Lever <[email protected]>
>>>> wrote:
>>>>
>>>> Hi Trond-
>>>>
>>>> I've chased down a couple of remaining regressions with the v4.20
>>>> NFS client,
>>>> and they seem to be rooted in this commit.
>>>>
>>>> When using sec=krb5, krb5i, or krb5p I found that multi-threaded
>>>> workloads
>>>> trigger a lot of server-side disconnects. This is with TCP and
>>>> RDMA transports.
>>>> An instrumented server shows that the client is under-running the
>>>> GSS sequence
>>>> number window. I monitored the order in which GSS sequence
>>>> numbers appear on
>>>> the wire, and after this commit, the sequence numbers are wildly
>>>> misordered.
>>>> If I revert the hunk in xprt_request_enqueue_transmit, the
>>>> problem goes away.
>>>>
>>>> I also found that reverting that hunk results in a 3-4%
>>>> improvement in fio
>>>> IOPS rates, as well as improvement in average and maximum latency
>>>> as reported
>>>> by fio.
>>>>
>>>
>>> Hmm… Provided the sequence numbers still lie within the window,
>>> then why would the order matter?
>>
>> The misordering is so bad that one request is delayed long enough to
>> fall outside the window. The new “need re-encode” logic does not
>> trigger.
>>
>
> That's weird. I can't see anything wrong with need re-encode at this
> point.

I don't think there is anything wrong with it, it looks like it's
not called in this case.


> Do the window sizes agree on the client and the server?

Yes, both are 128. I also tried with 64 on the client side and 128
on the server side. That reduces the frequency of disconnects, but
does not eliminate them.

I'm not clear what problem the logic in xprt_request_enqueue_transmit
is trying to address. It seems to me that the initial, simple
implementation of this function is entirely adequate..?


--
Chuck Lever




2018-12-31 18:59:49

by Trond Myklebust

[permalink] [raw]
Subject: Re: [PATCH v3 26/44] SUNRPC: Improve latency for interactive tasks

On Mon, 2018-12-31 at 13:44 -0500, Chuck Lever wrote:
> > On Dec 31, 2018, at 1:09 PM, Trond Myklebust <
> > [email protected]> wrote:
> >
> > On Thu, 2018-12-27 at 17:34 -0500, Chuck Lever wrote:
> > > > On Dec 27, 2018, at 5:14 PM, Trond Myklebust <
> > > > [email protected]> wrote:
> > > >
> > > >
> > > >
> > > > > On Dec 27, 2018, at 20:21, Chuck Lever <
> > > > > [email protected]>
> > > > > wrote:
> > > > >
> > > > > Hi Trond-
> > > > >
> > > > > I've chased down a couple of remaining regressions with the
> > > > > v4.20
> > > > > NFS client,
> > > > > and they seem to be rooted in this commit.
> > > > >
> > > > > When using sec=krb5, krb5i, or krb5p I found that multi-
> > > > > threaded
> > > > > workloads
> > > > > trigger a lot of server-side disconnects. This is with TCP
> > > > > and
> > > > > RDMA transports.
> > > > > An instrumented server shows that the client is under-running
> > > > > the
> > > > > GSS sequence
> > > > > number window. I monitored the order in which GSS sequence
> > > > > numbers appear on
> > > > > the wire, and after this commit, the sequence numbers are
> > > > > wildly
> > > > > misordered.
> > > > > If I revert the hunk in xprt_request_enqueue_transmit, the
> > > > > problem goes away.
> > > > >
> > > > > I also found that reverting that hunk results in a 3-4%
> > > > > improvement in fio
> > > > > IOPS rates, as well as improvement in average and maximum
> > > > > latency
> > > > > as reported
> > > > > by fio.
> > > > >
> > > >
> > > > Hmm… Provided the sequence numbers still lie within the window,
> > > > then why would the order matter?
> > >
> > > The misordering is so bad that one request is delayed long enough
> > > to
> > > fall outside the window. The new “need re-encode” logic does not
> > > trigger.
> > >
> >
> > That's weird. I can't see anything wrong with need re-encode at
> > this
> > point.
>
> I don't think there is anything wrong with it, it looks like it's
> not called in this case.

So you are saying that the call to rpcauth_xmit_need_reencode() is
triggering the EBADMSG, but that this fails to cause a re-encode of the
message?

>
> > Do the window sizes agree on the client and the server?
>
> Yes, both are 128. I also tried with 64 on the client side and 128
> on the server side. That reduces the frequency of disconnects, but
> does not eliminate them.
>
> I'm not clear what problem the logic in xprt_request_enqueue_transmit
> is trying to address. It seems to me that the initial, simple
> implementation of this function is entirely adequate..?

I agree that the fair queueing code could result in a reordering that
could screw up the RPCSEC_GSS sequencing. However, we do expect the
need reencode stuff to catch that.

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]


2018-12-31 19:09:16

by Chuck Lever III

[permalink] [raw]
Subject: Re: [PATCH v3 26/44] SUNRPC: Improve latency for interactive tasks



> On Dec 31, 2018, at 1:59 PM, Trond Myklebust <[email protected]> wrote:
>
> On Mon, 2018-12-31 at 13:44 -0500, Chuck Lever wrote:
>>> On Dec 31, 2018, at 1:09 PM, Trond Myklebust <
>>> [email protected]> wrote:
>>>
>>> On Thu, 2018-12-27 at 17:34 -0500, Chuck Lever wrote:
>>>>> On Dec 27, 2018, at 5:14 PM, Trond Myklebust <
>>>>> [email protected]> wrote:
>>>>>
>>>>>
>>>>>
>>>>>> On Dec 27, 2018, at 20:21, Chuck Lever <
>>>>>> [email protected]>
>>>>>> wrote:
>>>>>>
>>>>>> Hi Trond-
>>>>>>
>>>>>> I've chased down a couple of remaining regressions with the
>>>>>> v4.20
>>>>>> NFS client,
>>>>>> and they seem to be rooted in this commit.
>>>>>>
>>>>>> When using sec=krb5, krb5i, or krb5p I found that multi-
>>>>>> threaded
>>>>>> workloads
>>>>>> trigger a lot of server-side disconnects. This is with TCP
>>>>>> and
>>>>>> RDMA transports.
>>>>>> An instrumented server shows that the client is under-running
>>>>>> the
>>>>>> GSS sequence
>>>>>> number window. I monitored the order in which GSS sequence
>>>>>> numbers appear on
>>>>>> the wire, and after this commit, the sequence numbers are
>>>>>> wildly
>>>>>> misordered.
>>>>>> If I revert the hunk in xprt_request_enqueue_transmit, the
>>>>>> problem goes away.
>>>>>>
>>>>>> I also found that reverting that hunk results in a 3-4%
>>>>>> improvement in fio
>>>>>> IOPS rates, as well as improvement in average and maximum
>>>>>> latency
>>>>>> as reported
>>>>>> by fio.
>>>>>>
>>>>>
>>>>> Hmm… Provided the sequence numbers still lie within the window,
>>>>> then why would the order matter?
>>>>
>>>> The misordering is so bad that one request is delayed long enough
>>>> to
>>>> fall outside the window. The new “need re-encode” logic does not
>>>> trigger.
>>>>
>>>
>>> That's weird. I can't see anything wrong with need re-encode at
>>> this
>>> point.
>>
>> I don't think there is anything wrong with it, it looks like it's
>> not called in this case.
>
> So you are saying that the call to rpcauth_xmit_need_reencode() is
> triggering the EBADMSG, but that this fails to cause a re-encode of the
> message?

No, I think what's going on is that the need_reencode happens when the
RPC is enqueued, and is successful.

But xprt_request_enqueue_transmit places the RPC somewhere in the middle
of xmit_queue. xmit_queue is long enough that more than 128 requests are
before the enqueued request.
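
A sketch of the queue shape I believe we end up with (this is my reading of
the grouping hunk, with owners A-F each streaming requests; the numbers are
GSS sequence numbers in assignment order):

  xmit_queue:  A(1) -> B(2) -> C(3) -> D(4) -> E(5) -> F(6)
                 |       |
       rq_xmit2: A(7)    B(8)
                 A(13)   B(14)
                 ...     ...

When A(1) is dequeued, A(7) is rotated to the tail of xmit_queue, behind every
other owner's pending requests, so with enough owners and queue depth a
request can end up behind well over 128 later-assigned sequence numbers.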


>>> Do the window sizes agree on the client and the server?
>>
>> Yes, both are 128. I also tried with 64 on the client side and 128
>> on the server side. That reduces the frequency of disconnects, but
>> does not eliminate them.
>>
>> I'm not clear what problem the logic in xprt_request_enqueue_transmit
>> is trying to address. It seems to me that the initial, simple
>> implementation of this function is entirely adequate..?
>
> I agree that the fair queueing code could result in a reordering that
> could screw up the RPCSEC_GSS sequencing. However, we do expect the
> need reencode stuff to catch that.

--
Chuck Lever




2018-12-31 19:18:51

by Trond Myklebust

[permalink] [raw]
Subject: Re: [PATCH v3 26/44] SUNRPC: Improve latency for interactive tasks

On Mon, 2018-12-31 at 14:09 -0500, Chuck Lever wrote:
> > On Dec 31, 2018, at 1:59 PM, Trond Myklebust <
> > [email protected]> wrote:
> >
> > On Mon, 2018-12-31 at 13:44 -0500, Chuck Lever wrote:
> > > > On Dec 31, 2018, at 1:09 PM, Trond Myklebust <
> > > > [email protected]> wrote:
> > > >
> > > > On Thu, 2018-12-27 at 17:34 -0500, Chuck Lever wrote:
> > > > > > On Dec 27, 2018, at 5:14 PM, Trond Myklebust <
> > > > > > [email protected]> wrote:
> > > > > >
> > > > > >
> > > > > >
> > > > > > > On Dec 27, 2018, at 20:21, Chuck Lever <
> > > > > > > [email protected]>
> > > > > > > wrote:
> > > > > > >
> > > > > > > Hi Trond-
> > > > > > >
> > > > > > > I've chased down a couple of remaining regressions with
> > > > > > > the
> > > > > > > v4.20
> > > > > > > NFS client,
> > > > > > > and they seem to be rooted in this commit.
> > > > > > >
> > > > > > > When using sec=krb5, krb5i, or krb5p I found that multi-
> > > > > > > threaded
> > > > > > > workloads
> > > > > > > trigger a lot of server-side disconnects. This is with
> > > > > > > TCP
> > > > > > > and
> > > > > > > RDMA transports.
> > > > > > > An instrumented server shows that the client is under-
> > > > > > > running
> > > > > > > the
> > > > > > > GSS sequence
> > > > > > > number window. I monitored the order in which GSS
> > > > > > > sequence
> > > > > > > numbers appear on
> > > > > > > the wire, and after this commit, the sequence numbers are
> > > > > > > wildly
> > > > > > > misordered.
> > > > > > > If I revert the hunk in xprt_request_enqueue_transmit,
> > > > > > > the
> > > > > > > problem goes away.
> > > > > > >
> > > > > > > I also found that reverting that hunk results in a 3-4%
> > > > > > > improvement in fio
> > > > > > > IOPS rates, as well as improvement in average and maximum
> > > > > > > latency
> > > > > > > as reported
> > > > > > > by fio.
> > > > > > >
> > > > > >
> > > > > > Hmm… Provided the sequence numbers still lie within the
> > > > > > window,
> > > > > > then why would the order matter?
> > > > >
> > > > > The misordering is so bad that one request is delayed long
> > > > > enough
> > > > > to
> > > > > fall outside the window. The new “need re-encode” logic does
> > > > > not
> > > > > trigger.
> > > > >
> > > >
> > > > That's weird. I can't see anything wrong with need re-encode at
> > > > this
> > > > point.
> > >
> > > I don't think there is anything wrong with it, it looks like it's
> > > not called in this case.
> >
> > So you are saying that the call to rpcauth_xmit_need_reencode() is
> > triggering the EBADMSG, but that this fails to cause a re-encode of
> > the
> > message?
>
> No, I think what's going on is that the need_reencode happens when
> the
> RPC is enqueued, and is successful.
>
> But xprt_request_enqueue_transmit places the RPC somewhere in the
> middle
> of xmit_queue. xmit_queue is long enough that more than 128 requests
> are
> before the enqueued request.

The test for rpcauth_xmit_need_reencode() happens when we call
xprt_request_transmit() to actually put the RPC call on the wire. The
enqueue order should not be able to defeat that test.

Hmm... Is it perhaps the test for req->rq_bytes_sent that is failing
because this is a retransmission after a disconnect/reconnect that
didn't trigger a re-encode?

> > > > Do the window sizes agree on the client and the server?
> > >
> > > Yes, both are 128. I also tried with 64 on the client side and
> > > 128
> > > on the server side. That reduces the frequency of disconnects,
> > > but
> > > does not eliminate them.
> > >
> > > I'm not clear what problem the logic in
> > > xprt_request_enqueue_transmit
> > > is trying to address. It seems to me that the initial, simple
> > > implementation of this function is entirely adequate..?
> >
> > I agree that the fair queueing code could result in a reordering
> > that
> > could screw up the RPCSEC_GSS sequencing. However, we do expect the
> > need reencode stuff to catch that.

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]


2018-12-31 19:21:33

by Trond Myklebust

[permalink] [raw]
Subject: Re: [PATCH v3 26/44] SUNRPC: Improve latency for interactive tasks

On Mon, 2018-12-31 at 19:18 +0000, Trond Myklebust wrote:
> On Mon, 2018-12-31 at 14:09 -0500, Chuck Lever wrote:
> > > On Dec 31, 2018, at 1:59 PM, Trond Myklebust <
> > > [email protected]> wrote:
> > >
> > > On Mon, 2018-12-31 at 13:44 -0500, Chuck Lever wrote:
> > > > > On Dec 31, 2018, at 1:09 PM, Trond Myklebust <
> > > > > [email protected]> wrote:
> > > > >
> > > > > On Thu, 2018-12-27 at 17:34 -0500, Chuck Lever wrote:
> > > > > > > On Dec 27, 2018, at 5:14 PM, Trond Myklebust <
> > > > > > > [email protected]> wrote:
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > > On Dec 27, 2018, at 20:21, Chuck Lever <
> > > > > > > > [email protected]>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > Hi Trond-
> > > > > > > >
> > > > > > > > I've chased down a couple of remaining regressions with
> > > > > > > > the
> > > > > > > > v4.20
> > > > > > > > NFS client,
> > > > > > > > and they seem to be rooted in this commit.
> > > > > > > >
> > > > > > > > When using sec=krb5, krb5i, or krb5p I found that
> > > > > > > > multi-
> > > > > > > > threaded
> > > > > > > > workloads
> > > > > > > > trigger a lot of server-side disconnects. This is with
> > > > > > > > TCP
> > > > > > > > and
> > > > > > > > RDMA transports.
> > > > > > > > An instrumented server shows that the client is under-
> > > > > > > > running
> > > > > > > > the
> > > > > > > > GSS sequence
> > > > > > > > number window. I monitored the order in which GSS
> > > > > > > > sequence
> > > > > > > > numbers appear on
> > > > > > > > the wire, and after this commit, the sequence numbers
> > > > > > > > are
> > > > > > > > wildly
> > > > > > > > misordered.
> > > > > > > > If I revert the hunk in xprt_request_enqueue_transmit,
> > > > > > > > the
> > > > > > > > problem goes away.
> > > > > > > >
> > > > > > > > I also found that reverting that hunk results in a 3-4%
> > > > > > > > improvement in fio
> > > > > > > > IOPS rates, as well as improvement in average and
> > > > > > > > maximum
> > > > > > > > latency
> > > > > > > > as reported
> > > > > > > > by fio.
> > > > > > > >
> > > > > > >
> > > > > > > Hmm… Provided the sequence numbers still lie within the
> > > > > > > window,
> > > > > > > then why would the order matter?
> > > > > >
> > > > > > The misordering is so bad that one request is delayed long
> > > > > > enough
> > > > > > to
> > > > > > fall outside the window. The new “need re-encode” logic
> > > > > > does
> > > > > > not
> > > > > > trigger.
> > > > > >
> > > > >
> > > > > That's weird. I can't see anything wrong with need re-encode
> > > > > at
> > > > > this
> > > > > point.
> > > >
> > > > I don't think there is anything wrong with it, it looks like
> > > > it's
> > > > not called in this case.
> > >
> > > So you are saying that the call to rpcauth_xmit_need_reencode()
> > > is
> > > triggering the EBADMSG, but that this fails to cause a re-encode
> > > of
> > > the
> > > message?
> >
> > No, I think what's going on is that the need_reencode happens when
> > the
> > RPC is enqueued, and is successful.
> >
> > But xprt_request_enqueue_transmit places the RPC somewhere in the
> > middle
> > of xmit_queue. xmit_queue is long enough that more than 128
> > requests
> > are
> > before the enqueued request.
>
> The test for rpcauth_xmit_need_reencode() happens when we call
> xprt_request_transmit() to actually put the RPC call on the wire. The
> enqueue order should not be able to defeat that test.
>
> Hmm... Is it perhaps the test for req->rq_bytes_sent that is failing
> because this is a retransmission after a disconnect/reconnect that
> didn't trigger a re-encode?

Actually, it might be worth a try to move the test for
rpcauth_xmit_need_reencode() outside the enclosing test for
req->rq_bytes_sent, as that is just a minor optimisation.
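
Something like the following control-flow change (a minimal userspace mock
with stubbed bodies, just to show the shape of what I mean; the names mirror
the kernel code, but this is not the actual patch):

#include <stdbool.h>
#include <stdio.h>

struct rpc_rqst { unsigned int rq_bytes_sent; };
struct rpc_task { struct rpc_rqst *tk_rqstp; };

/* stub: in the kernel this asks the auth layer whether the request's
 * GSS sequence number has fallen outside the window */
static bool rpcauth_xmit_need_reencode(struct rpc_task *task)
{
	(void)task;
	return true;	/* pretend the window was missed */
}

/* current shape: the re-encode check sits under the rq_bytes_sent
 * test, so a partially transmitted request skips it */
static int transmit_current(struct rpc_task *task)
{
	if (task->tk_rqstp->rq_bytes_sent == 0 &&
	    rpcauth_xmit_need_reencode(task))
		return -1;	/* stand-in for -EBADMSG -> re-encode */
	return 0;
}

/* proposed: run the check on every transmission attempt */
static int transmit_proposed(struct rpc_task *task)
{
	if (rpcauth_xmit_need_reencode(task))
		return -1;
	return 0;
}

int main(void)
{
	struct rpc_rqst req = { .rq_bytes_sent = 100 };	/* partial send */
	struct rpc_task task = { .tk_rqstp = &req };

	printf("current=%d proposed=%d\n",
	       transmit_current(&task), transmit_proposed(&task));
	return 0;	/* prints "current=0 proposed=-1" */
}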

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]


2019-01-02 18:18:15

by Chuck Lever III

[permalink] [raw]
Subject: Re: [PATCH v3 26/44] SUNRPC: Improve latency for interactive tasks



> On Dec 31, 2018, at 2:21 PM, Trond Myklebust <[email protected]> wrote:
>
> On Mon, 2018-12-31 at 19:18 +0000, Trond Myklebust wrote:
>> On Mon, 2018-12-31 at 14:09 -0500, Chuck Lever wrote:
>>>> On Dec 31, 2018, at 1:59 PM, Trond Myklebust <
>>>> [email protected]> wrote:
>>>>
>>>> On Mon, 2018-12-31 at 13:44 -0500, Chuck Lever wrote:
>>>>>> On Dec 31, 2018, at 1:09 PM, Trond Myklebust <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>> On Thu, 2018-12-27 at 17:34 -0500, Chuck Lever wrote:
>>>>>>>> On Dec 27, 2018, at 5:14 PM, Trond Myklebust <
>>>>>>>> [email protected]> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> On Dec 27, 2018, at 20:21, Chuck Lever <
>>>>>>>>> [email protected]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> Hi Trond-
>>>>>>>>>
>>>>>>>>> I've chased down a couple of remaining regressions with
>>>>>>>>> the
>>>>>>>>> v4.20
>>>>>>>>> NFS client,
>>>>>>>>> and they seem to be rooted in this commit.
>>>>>>>>>
>>>>>>>>> When using sec=krb5, krb5i, or krb5p I found that
>>>>>>>>> multi-
>>>>>>>>> threaded
>>>>>>>>> workloads
>>>>>>>>> trigger a lot of server-side disconnects. This is with
>>>>>>>>> TCP
>>>>>>>>> and
>>>>>>>>> RDMA transports.
>>>>>>>>> An instrumented server shows that the client is under-
>>>>>>>>> running
>>>>>>>>> the
>>>>>>>>> GSS sequence
>>>>>>>>> number window. I monitored the order in which GSS
>>>>>>>>> sequence
>>>>>>>>> numbers appear on
>>>>>>>>> the wire, and after this commit, the sequence numbers
>>>>>>>>> are
>>>>>>>>> wildly
>>>>>>>>> misordered.
>>>>>>>>> If I revert the hunk in xprt_request_enqueue_transmit,
>>>>>>>>> the
>>>>>>>>> problem goes away.
>>>>>>>>>
>>>>>>>>> I also found that reverting that hunk results in a 3-4%
>>>>>>>>> improvement in fio
>>>>>>>>> IOPS rates, as well as improvement in average and
>>>>>>>>> maximum
>>>>>>>>> latency
>>>>>>>>> as reported
>>>>>>>>> by fio.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Hmm… Provided the sequence numbers still lie within the
>>>>>>>> window,
>>>>>>>> then why would the order matter?
>>>>>>>
>>>>>>> The misordering is so bad that one request is delayed long
>>>>>>> enough
>>>>>>> to
>>>>>>> fall outside the window. The new “need re-encode” logic
>>>>>>> does
>>>>>>> not
>>>>>>> trigger.
>>>>>>>
>>>>>>
>>>>>> That's weird. I can't see anything wrong with need re-encode
>>>>>> at
>>>>>> this
>>>>>> point.
>>>>>
>>>>> I don't think there is anything wrong with it, it looks like
>>>>> it's
>>>>> not called in this case.
>>>>
>>>> So you are saying that the call to rpcauth_xmit_need_reencode()
>>>> is
>>>> triggering the EBADMSG, but that this fails to cause a re-encode
>>>> of
>>>> the
>>>> message?
>>>
>>> No, I think what's going on is that the need_reencode happens when
>>> the
>>> RPC is enqueued, and is successful.
>>>
>>> But xprt_request_enqueue_transmit places the RPC somewhere in the
>>> middle
>>> of xmit_queue. xmit_queue is long enough that more than 128
>>> requests
>>> are
>>> before the enqueued request.
>>
>> The test for rpcauth_xmit_need_reencode() happens when we call
>> xprt_request_transmit() to actually put the RPC call on the wire. The
>> enqueue order should not be able to defeat that test.
>>
>> Hmm... Is it perhaps the test for req->rq_bytes_sent that is failing
>> because this is a retransmission after a disconnect/reconnect that
>> didn't trigger a re-encode?
>
> Actually, it might be worth a try to move the test for
> rpcauth_xmit_need_reencode() outside the enclosing test for
> req->rq_bytes_sent, as that test is just a minor optimisation.

Perhaps that's the case for TCP, but RPCs sent via xprtrdma never set
req->rq_bytes_sent to a non-zero value. The body of the "if" statement
is always executed for those RPCs.


--
Chuck Lever




2019-01-02 18:45:55

by Trond Myklebust

[permalink] [raw]
Subject: Re: [PATCH v3 26/44] SUNRPC: Improve latency for interactive tasks

On Wed, 2019-01-02 at 13:17 -0500, Chuck Lever wrote:
> > On Dec 31, 2018, at 2:21 PM, Trond Myklebust <
> > [email protected]> wrote:
> >
> > On Mon, 2018-12-31 at 19:18 +0000, Trond Myklebust wrote:
> > > On Mon, 2018-12-31 at 14:09 -0500, Chuck Lever wrote:
> > > > > On Dec 31, 2018, at 1:59 PM, Trond Myklebust <
> > > > > [email protected]> wrote:
> > > > >
> > > > >
> > > The test for rpcauth_xmit_need_reencode() happens when we call
> > > xprt_request_transmit() to actually put the RPC call on the wire.
> > > The
> > > enqueue order should not be able to defeat that test.
> > >
> > > Hmm... Is it perhaps the test for req->rq_bytes_sent that is
> > > failing
> > > because this is a retransmission after a disconnect/reconnect
> > > that
> > > didn't trigger a re-encode?
> >
> > Actually, it might be worth a try to move the test for
> > rpcauth_xmit_need_reencode() outside the enclosing test for
> > req->rq_bytes_sent, as that test is just a minor optimisation.
>
> Perhaps that's the case for TCP, but RPCs sent via xprtrdma never set
> req->rq_bytes_sent to a non-zero value. The body of the "if"
> statement
> is always executed for those RPCs.
>

Then the question is what is defeating the call to
rpcauth_xmit_need_reencode() in xprt_request_transmit() and causing it
not to trigger in the misordered cases?

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]


2019-01-02 18:51:14

by Chuck Lever III

[permalink] [raw]
Subject: Re: [PATCH v3 26/44] SUNRPC: Improve latency for interactive tasks



> On Jan 2, 2019, at 1:45 PM, Trond Myklebust <[email protected]> wrote:
>
> On Wed, 2019-01-02 at 13:17 -0500, Chuck Lever wrote:
>>> On Dec 31, 2018, at 2:21 PM, Trond Myklebust <
>>> [email protected]> wrote:
>>>
>>> On Mon, 2018-12-31 at 19:18 +0000, Trond Myklebust wrote:
>>>> On Mon, 2018-12-31 at 14:09 -0500, Chuck Lever wrote:
>>>>>> On Dec 31, 2018, at 1:59 PM, Trond Myklebust <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>
>>>> The test for rpcauth_xmit_need_reencode() happens when we call
>>>> xprt_request_transmit() to actually put the RPC call on the wire.
>>>> The
>>>> enqueue order should not be able to defeat that test.
>>>>
>>>> Hmm... Is it perhaps the test for req->rq_bytes_sent that is
>>>> failing
>>>> because this is a retransmission after a disconnect/reconnect
>>>> that
>>>> didn't trigger a re-encode?
>>>
>>> Actually, it might be worth a try to move the test for
>>> rpcauth_xmit_need_reencode() outside the enclosing test for
>>> req->rq_bytes_sent, as that test is just a minor optimisation.
>>
>> Perhaps that's the case for TCP, but RPCs sent via xprtrdma never set
>> req->rq_bytes_sent to a non-zero value. The body of the "if"
>> statement
>> is always executed for those RPCs.
>>
>
> Then the question is what is defeating the call to
> rpcauth_xmit_need_reencode() in xprt_request_transmit() and causing it
> not to trigger in the misordered cases?

Here's a sample RPC/RDMA case.

My instrumented server reports this:

Jan 2 13:29:00 klimt kernel: gss_check_seq_num: dropped: seq_num=141220 sd->sd_max=141360


ftrace log on the client shows this:

kworker/u28:12-2191 [004] 194.048534: rpcgss_need_reencode: task:1761@5 xid=0x88f4f47c rq_seqno=141220 seq_xmit=141336 reencode unneeded
kworker/u28:12-2191 [004] 194.048534: xprt_transmit: task:1761@5 xid=0x88f4f47c seqno=141220 status=-57
kworker/u28:12-2191 [004] 194.048534: rpc_task_run_action: task:1779@5 flags=ASYNC runstate=RUNNING|ACTIVE|NEED_XMIT|NEED_RECV status=-57 action=call_transmit_status
kworker/u28:12-2191 [004] 194.048535: rpc_task_run_action: task:1779@5 flags=ASYNC runstate=RUNNING|ACTIVE|NEED_XMIT|NEED_RECV status=0 action=call_transmit
kworker/u28:12-2191 [004] 194.048535: rpc_task_sleep: task:1779@5 flags=ASYNC runstate=RUNNING|ACTIVE|NEED_XMIT|NEED_RECV status=-11 timeout=0 queue=xprt_sending


kworker/u28:12-2191 [004] 194.048552: rpcgss_need_reencode: task:1761@5 xid=0x88f4f47c rq_seqno=141220 seq_xmit=141336 reencode unneeded
kworker/u28:12-2191 [004] 194.048557: xprt_transmit: task:1761@5 xid=0x88f4f47c seqno=141220 status=0


kworker/u28:12-2191 [004] 194.048559: rpcgss_need_reencode: task:1902@5 xid=0x14f5f47c rq_seqno=141360 seq_xmit=141336 reencode unneeded
kworker/u28:12-2191 [004] 194.048563: xprt_transmit: task:1902@5 xid=0x14f5f47c seqno=141360 status=0


Note that first need_reencode: the sequence numbers show that the xmit
queue has been significantly re-ordered. The request being transmitted is
already very close to the lower end of the GSS sequence number window.

The server then re-ordered these two slightly because the first one had
some Read chunks that needed to be pulled over, while the second was pure inline
and therefore could be processed immediately. That is enough to force the
first one outside the GSS sequence number window.

I haven't looked closely at the pathology of the TCP case.
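
For reference, the window arithmetic behind that drop, as a small
stand-alone model (GSS_SEQ_WIN and seq_in_window() are assumed names;
the real check lives in the server's gss_check_seq_num()):

#include <stdbool.h>
#include <stdio.h>

#define GSS_SEQ_WIN 128  /* assumed window size */

/* A sequence number is dropped once it trails the highest number the
 * server has seen (sd_max) by a full window or more. */
static bool seq_in_window(unsigned int seq_num, unsigned int sd_max)
{
        return seq_num + GSS_SEQ_WIN > sd_max;
}

int main(void)
{
        /* The numbers from the log above: 141360 - 141220 = 140 >= 128,
         * so the request is dropped. */
        printf("in window: %s\n",
               seq_in_window(141220, 141360) ? "yes" : "no");
        return 0;
}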


--
Chuck Lever




2019-01-02 18:57:16

by Trond Myklebust

[permalink] [raw]
Subject: Re: [PATCH v3 26/44] SUNRPC: Improve latency for interactive tasks

On Wed, 2019-01-02 at 13:51 -0500, Chuck Lever wrote:
> > On Jan 2, 2019, at 1:45 PM, Trond Myklebust <
> > [email protected]> wrote:
> >
> > On Wed, 2019-01-02 at 13:17 -0500, Chuck Lever wrote:
> > > > On Dec 31, 2018, at 2:21 PM, Trond Myklebust <
> > > > [email protected]> wrote:
> > > >
> > > > On Mon, 2018-12-31 at 19:18 +0000, Trond Myklebust wrote:
> > > > > On Mon, 2018-12-31 at 14:09 -0500, Chuck Lever wrote:
> > > > > > > On Dec 31, 2018, at 1:59 PM, Trond Myklebust <
> > > > > > > [email protected]> wrote:
> > > > > > >
> > > > > > >
> > > > > The test for rpcauth_xmit_need_reencode() happens when we
> > > > > call
> > > > > xprt_request_transmit() to actually put the RPC call on the
> > > > > wire.
> > > > > The
> > > > > enqueue order should not be able to defeat that test.
> > > > >
> > > > > Hmm... Is it perhaps the test for req->rq_bytes_sent that is
> > > > > failing
> > > > > because this is a retransmission after a disconnect/reconnect
> > > > > that
> > > > > didn't trigger a re-encode?
> > > >
> > > > Actually, it might be worth a try to move the test for
> > > > rpcauth_xmit_need_reencode() outside the enclosing test for
> > > > req->rq_bytes_sent, as that test is just a minor optimisation.
> > >
> > > Perhaps that's the case for TCP, but RPCs sent via xprtrdma never
> > > set
> > > req->rq_bytes_sent to a non-zero value. The body of the "if"
> > > statement
> > > is always executed for those RPCs.
> > >
> >
> > Then the question is what is defeating the call to
> > rpcauth_xmit_need_reencode() in xprt_request_transmit() and causing
> > it
> > not to trigger in the misordered cases?
>
> Here's a sample RPC/RDMA case.
>
> My instrumented server reports this:
>
> Jan 2 13:29:00 klimt kernel: gss_check_seq_num: dropped:
> seq_num=141220 sd->sd_max=141360
>
>
> ftrace log on the client shows this:
>
> kworker/u28:12-2191 [004] 194.048534: rpcgss_need_reencode:
> task:1761@5 xid=0x88f4f47c rq_seqno=141220 seq_xmit=141336 reencode
> unneeded
> kworker/u28:12-2191 [004] 194.048534:
> xprt_transmit: task:1761@5 xid=0x88f4f47c seqno=141220
> status=-57
> kworker/u28:12-2191 [004] 194.048534:
> rpc_task_run_action: task:1779@5 flags=ASYNC
> runstate=RUNNING|ACTIVE|NEED_XMIT|NEED_RECV status=-57
> action=call_transmit_status
> kworker/u28:12-2191 [004] 194.048535:
> rpc_task_run_action: task:1779@5 flags=ASYNC
> runstate=RUNNING|ACTIVE|NEED_XMIT|NEED_RECV status=0
> action=call_transmit
> kworker/u28:12-2191 [004] 194.048535:
> rpc_task_sleep: task:1779@5 flags=ASYNC
> runstate=RUNNING|ACTIVE|NEED_XMIT|NEED_RECV status=-11 timeout=0
> queue=xprt_sending
>
>
> kworker/u28:12-2191 [004] 194.048552: rpcgss_need_reencode:
> task:1761@5 xid=0x88f4f47c rq_seqno=141220 seq_xmit=141336 reencode
> unneeded
> kworker/u28:12-2191 [004] 194.048557:
> xprt_transmit: task:1761@5 xid=0x88f4f47c seqno=141220
> status=0
>
>
> kworker/u28:12-2191 [004] 194.048559: rpcgss_need_reencode:
> task:1902@5 xid=0x14f5f47c rq_seqno=141360 seq_xmit=141336 reencode
> unneeded
> kworker/u28:12-2191 [004] 194.048563:
> xprt_transmit: task:1902@5 xid=0x14f5f47c seqno=141360
> status=0
>
>
> Note that first need_reencode: the sequence numbers show that the
> xmit
> queue has been significantly re-ordered. The request being
> transmitted is
> already very close to the lower end of the GSS sequence number
> window.
>
> The server then re-ordered these two slightly because the first one
> had
> some Read chunks that needed to be pulled over, while the second was pure
> inline
> and therefore could be processed immediately. That is enough to force
> the
> first one outside the GSS sequence number window.
>
> I haven't looked closely at the pathology of the TCP case.

Wait a minute... That's not OK. The client can't be expected to take
into account reordering that happens on the server side.


--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]


2019-01-02 19:07:26

by Trond Myklebust

[permalink] [raw]
Subject: Re: [PATCH v3 26/44] SUNRPC: Improve latency for interactive tasks

On Wed, 2019-01-02 at 13:57 -0500, Trond Myklebust wrote:
> On Wed, 2019-01-02 at 13:51 -0500, Chuck Lever wrote:
> > > On Jan 2, 2019, at 1:45 PM, Trond Myklebust <
> > > [email protected]> wrote:
> > >
> > > On Wed, 2019-01-02 at 13:17 -0500, Chuck Lever wrote:
> > > > > On Dec 31, 2018, at 2:21 PM, Trond Myklebust <
> > > > > [email protected]> wrote:
> > > > >
> > > > > On Mon, 2018-12-31 at 19:18 +0000, Trond Myklebust wrote:
> > > > > > On Mon, 2018-12-31 at 14:09 -0500, Chuck Lever wrote:
> > > > > > > > On Dec 31, 2018, at 1:59 PM, Trond Myklebust <
> > > > > > > > [email protected]> wrote:
> > > > > > > >
> > > > > > > >
> > > > > > The test for rpcauth_xmit_need_reencode() happens when we
> > > > > > call
> > > > > > xprt_request_transmit() to actually put the RPC call on the
> > > > > > wire.
> > > > > > The
> > > > > > enqueue order should not be able to defeat that test.
> > > > > >
> > > > > > Hmm... Is it perhaps the test for req->rq_bytes_sent that
> > > > > > is
> > > > > > failing
> > > > > > because this is a retransmission after a
> > > > > > disconnect/reconnect
> > > > > > that
> > > > > > didn't trigger a re-encode?
> > > > >
> > > > > Actually, it might be worth a try to move the test for
> > > > > rpcauth_xmit_need_reencode() outside the enclosing test for
> > > > > req->rq_bytes_sent, as that test is just a minor optimisation.
> > > >
> > > > Perhaps that's the case for TCP, but RPCs sent via xprtrdma
> > > > never
> > > > set
> > > > req->rq_bytes_sent to a non-zero value. The body of the "if"
> > > > statement
> > > > is always executed for those RPCs.
> > > >
> > >
> > > Then the question is what is defeating the call to
> > > rpcauth_xmit_need_reencode() in xprt_request_transmit() and
> > > causing
> > > it
> > > not to trigger in the misordered cases?
> >
> > Here's a sample RPC/RDMA case.
> >
> > My instrumented server reports this:
> >
> > Jan 2 13:29:00 klimt kernel: gss_check_seq_num: dropped:
> > seq_num=141220 sd->sd_max=141360
> >
> >
> > ftrace log on the client shows this:
> >
> > kworker/u28:12-2191 [004] 194.048534: rpcgss_need_reencode:
> > task:1761@5 xid=0x88f4f47c rq_seqno=141220 seq_xmit=141336 reencode
> > unneeded
> > kworker/u28:12-2191 [004] 194.048534:
> > xprt_transmit: task:1761@5 xid=0x88f4f47c seqno=141220
> > status=-57
> > kworker/u28:12-2191 [004] 194.048534:
> > rpc_task_run_action: task:1779@5 flags=ASYNC
> > runstate=RUNNING|ACTIVE|NEED_XMIT|NEED_RECV status=-57
> > action=call_transmit_status
> > kworker/u28:12-2191 [004] 194.048535:
> > rpc_task_run_action: task:1779@5 flags=ASYNC
> > runstate=RUNNING|ACTIVE|NEED_XMIT|NEED_RECV status=0
> > action=call_transmit
> > kworker/u28:12-2191 [004] 194.048535:
> > rpc_task_sleep: task:1779@5 flags=ASYNC
> > runstate=RUNNING|ACTIVE|NEED_XMIT|NEED_RECV status=-11 timeout=0
> > queue=xprt_sending
> >
> >
> > kworker/u28:12-2191 [004] 194.048552: rpcgss_need_reencode:
> > task:1761@5 xid=0x88f4f47c rq_seqno=141220 seq_xmit=141336 reencode
> > unneeded
> > kworker/u28:12-2191 [004] 194.048557:
> > xprt_transmit: task:1761@5 xid=0x88f4f47c seqno=141220
> > status=0
> >
> >
> > kworker/u28:12-2191 [004] 194.048559: rpcgss_need_reencode:
> > task:1902@5 xid=0x14f5f47c rq_seqno=141360 seq_xmit=141336 reencode
> > unneeded
> > kworker/u28:12-2191 [004] 194.048563:
> > xprt_transmit: task:1902@5 xid=0x14f5f47c seqno=141360
> > status=0
> >
> >
> > Note that first need_reencode: the sequence numbers show that the
> > xmit
> > queue has been significantly re-ordered. The request being
> > transmitted is
> > already very close to the lower end of the GSS sequence number
> > window.
> >
> > The server then re-ordered these two slightly because the first
> > one
> > had
> > some Read chunks that needed to be pulled over, while the second was pure
> > inline
> > and therefore could be processed immediately. That is enough to
> > force
> > the
> > first one outside the GSS sequence number window.
> >
> > I haven't looked closely at the pathology of the TCP case.
>
> Wait a minute... That's not OK. The client can't be expected to take
> into account reordering that happens on the server side.

If that's the case, then we would need to halt transmission as soon as
we hit the RPCSEC_GSS window edge. Off the cuff, I'm not sure how to do
that, since those windows are per session (i.e. per user).
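
Very roughly, the check would have to look something like this sketch,
where seq_min (the lowest sequence number still waiting to be sent on
the same GSS context) is exactly the per-session state we don't track
today:

#include <stdbool.h>
#include <stdio.h>

#define GSS_SEQ_WIN 128  /* assumed window size */

/* Hold back any request whose sequence number would run a full window
 * ahead of the oldest request still queued on the same context. */
static bool hit_window_edge(unsigned int rq_seqno, unsigned int seq_min)
{
        return rq_seqno >= seq_min + GSS_SEQ_WIN;
}

int main(void)
{
        /* With seqno 141220 still queued, 141360 would be held back:
         * 141360 - 141220 = 140 >= 128. */
        printf("hold back: %s\n",
               hit_window_edge(141360, 141220) ? "yes" : "no");
        return 0;
}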

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]


2019-01-02 19:08:10

by Chuck Lever III

[permalink] [raw]
Subject: Re: [PATCH v3 26/44] SUNRPC: Improve latency for interactive tasks



> On Jan 2, 2019, at 1:57 PM, Trond Myklebust <[email protected]> wrote:
>
> On Wed, 2019-01-02 at 13:51 -0500, Chuck Lever wrote:
>>> On Jan 2, 2019, at 1:45 PM, Trond Myklebust <
>>> [email protected]> wrote:
>>>
>>> On Wed, 2019-01-02 at 13:17 -0500, Chuck Lever wrote:
>>>>> On Dec 31, 2018, at 2:21 PM, Trond Myklebust <
>>>>> [email protected]> wrote:
>>>>>
>>>>> On Mon, 2018-12-31 at 19:18 +0000, Trond Myklebust wrote:
>>>>>> On Mon, 2018-12-31 at 14:09 -0500, Chuck Lever wrote:
>>>>>>>> On Dec 31, 2018, at 1:59 PM, Trond Myklebust <
>>>>>>>> [email protected]> wrote:
>>>>>>>>
>>>>>>>>
>>>>>> The test for rpcauth_xmit_need_reencode() happens when we
>>>>>> call
>>>>>> xprt_request_transmit() to actually put the RPC call on the
>>>>>> wire.
>>>>>> The
>>>>>> enqueue order should not be able to defeat that test.
>>>>>>
>>>>>> Hmm... Is it perhaps the test for req->rq_bytes_sent that is
>>>>>> failing
>>>>>> because this is a retransmission after a disconnect/reconnect
>>>>>> that
>>>>>> didn't trigger a re-encode?
>>>>>
>>>>> Actually, it might be worth a try to move the test for
>>>>> rpcauth_xmit_need_reencode() outside the enclosing test for
>>>>> req->rq_bytes_sent, as that test is just a minor optimisation.
>>>>
>>>> Perhaps that's the case for TCP, but RPCs sent via xprtrdma never
>>>> set
>>>> req->rq_bytes_sent to a non-zero value. The body of the "if"
>>>> statement
>>>> is always executed for those RPCs.
>>>>
>>>
>>> Then the question is what is defeating the call to
>>> rpcauth_xmit_need_reencode() in xprt_request_transmit() and causing
>>> it
>>> not to trigger in the misordered cases?
>>
>> Here's a sample RPC/RDMA case.
>>
>> My instrumented server reports this:
>>
>> Jan 2 13:29:00 klimt kernel: gss_check_seq_num: dropped:
>> seq_num=141220 sd->sd_max=141360
>>
>>
>> ftrace log on the client shows this:
>>
>> kworker/u28:12-2191 [004] 194.048534: rpcgss_need_reencode:
>> task:1761@5 xid=0x88f4f47c rq_seqno=141220 seq_xmit=141336 reencode
>> unneeded
>> kworker/u28:12-2191 [004] 194.048534:
>> xprt_transmit: task:1761@5 xid=0x88f4f47c seqno=141220
>> status=-57
>> kworker/u28:12-2191 [004] 194.048534:
>> rpc_task_run_action: task:1779@5 flags=ASYNC
>> runstate=RUNNING|ACTIVE|NEED_XMIT|NEED_RECV status=-57
>> action=call_transmit_status
>> kworker/u28:12-2191 [004] 194.048535:
>> rpc_task_run_action: task:1779@5 flags=ASYNC
>> runstate=RUNNING|ACTIVE|NEED_XMIT|NEED_RECV status=0
>> action=call_transmit
>> kworker/u28:12-2191 [004] 194.048535:
>> rpc_task_sleep: task:1779@5 flags=ASYNC
>> runstate=RUNNING|ACTIVE|NEED_XMIT|NEED_RECV status=-11 timeout=0
>> queue=xprt_sending
>>
>>
>> kworker/u28:12-2191 [004] 194.048552: rpcgss_need_reencode:
>> task:1761@5 xid=0x88f4f47c rq_seqno=141220 seq_xmit=141336 reencode
>> unneeded
>> kworker/u28:12-2191 [004] 194.048557:
>> xprt_transmit: task:1761@5 xid=0x88f4f47c seqno=141220
>> status=0
>>
>>
>> kworker/u28:12-2191 [004] 194.048559: rpcgss_need_reencode:
>> task:1902@5 xid=0x14f5f47c rq_seqno=141360 seq_xmit=141336 reencode
>> unneeded
>> kworker/u28:12-2191 [004] 194.048563:
>> xprt_transmit: task:1902@5 xid=0x14f5f47c seqno=141360
>> status=0
>>
>>
>> Note that first need_reencode: the sequence numbers show that the
>> xmit
>> queue has been significantly re-ordered. The request being
>> transmitted is
>> already very close to the lower end of the GSS sequence number
>> window.
>>
>> The server then re-ordered these two slightly because the first one
>> had
>> some Read chunks that needed to be pulled over, while the second was pure
>> inline
>> and therefore could be processed immediately. That is enough to force
>> the
>> first one outside the GSS sequence number window.
>>
>> I haven't looked closely at the pathology of the TCP case.
>
> Wait a minute... That's not OK. The client can't be expected to take
> into account reordering that happens on the server side.

Conversely, the client can't assume the transport and the server don't
re-order. This does not appear to be a problem for the v4.19 client:
I don't see disconnect storms with that client.


--
Chuck Lever




2019-01-02 19:14:02

by Trond Myklebust

[permalink] [raw]
Subject: Re: [PATCH v3 26/44] SUNRPC: Improve latency for interactive tasks

On Wed, 2019-01-02 at 14:08 -0500, Chuck Lever wrote:
> > On Jan 2, 2019, at 1:57 PM, Trond Myklebust <
> > [email protected]> wrote:
> >
> > On Wed, 2019-01-02 at 13:51 -0500, Chuck Lever wrote:
> > > > On Jan 2, 2019, at 1:45 PM, Trond Myklebust <
> > > > [email protected]> wrote:
> > > >
> > > > On Wed, 2019-01-02 at 13:17 -0500, Chuck Lever wrote:
> > > > > > On Dec 31, 2018, at 2:21 PM, Trond Myklebust <
> > > > > > [email protected]> wrote:
> > > > > >
> > > > > > On Mon, 2018-12-31 at 19:18 +0000, Trond Myklebust wrote:
> > > > > > > On Mon, 2018-12-31 at 14:09 -0500, Chuck Lever wrote:
> > > > > > > > > On Dec 31, 2018, at 1:59 PM, Trond Myklebust <
> > > > > > > > > [email protected]> wrote:
> > > > > > > > >
> > > > > > > > >
> > > > > > > The test for rpcauth_xmit_need_reencode() happens when we
> > > > > > > call
> > > > > > > xprt_request_transmit() to actually put the RPC call on
> > > > > > > the
> > > > > > > wire.
> > > > > > > The
> > > > > > > enqueue order should not be able to defeat that test.
> > > > > > >
> > > > > > > Hmm... Is it perhaps the test for req->rq_bytes_sent that
> > > > > > > is
> > > > > > > failing
> > > > > > > because this is a retransmission after a
> > > > > > > disconnect/reconnect
> > > > > > > that
> > > > > > > didn't trigger a re-encode?
> > > > > >
> > > > > > Actually, it might be worth a try to move the test for
> > > > > > rpcauth_xmit_need_reencode() outside the enclosing test for
> > > > > > req->rq_bytes_sent, as that test is just a minor optimisation.
> > > > >
> > > > > Perhaps that's the case for TCP, but RPCs sent via xprtrdma
> > > > > never
> > > > > set
> > > > > req->rq_bytes_sent to a non-zero value. The body of the "if"
> > > > > statement
> > > > > is always executed for those RPCs.
> > > > >
> > > >
> > > > Then the question is what is defeating the call to
> > > > rpcauth_xmit_need_reencode() in xprt_request_transmit() and
> > > > causing
> > > > it
> > > > not to trigger in the misordered cases?
> > >
> > > Here's a sample RPC/RDMA case.
> > >
> > > My instrumented server reports this:
> > >
> > > Jan 2 13:29:00 klimt kernel: gss_check_seq_num: dropped:
> > > seq_num=141220 sd->sd_max=141360
> > >
> > >
> > > ftrace log on the client shows this:
> > >
> > > kworker/u28:12-2191 [004] 194.048534: rpcgss_need_reencode:
> > > task:1761@5 xid=0x88f4f47c rq_seqno=141220 seq_xmit=141336
> > > reencode
> > > unneeded
> > > kworker/u28:12-2191 [004] 194.048534:
> > > xprt_transmit: task:1761@5 xid=0x88f4f47c seqno=141220
> > > status=-57
> > > kworker/u28:12-2191 [004] 194.048534:
> > > rpc_task_run_action: task:1779@5 flags=ASYNC
> > > runstate=RUNNING|ACTIVE|NEED_XMIT|NEED_RECV status=-57
> > > action=call_transmit_status
> > > kworker/u28:12-2191 [004] 194.048535:
> > > rpc_task_run_action: task:1779@5 flags=ASYNC
> > > runstate=RUNNING|ACTIVE|NEED_XMIT|NEED_RECV status=0
> > > action=call_transmit
> > > kworker/u28:12-2191 [004] 194.048535:
> > > rpc_task_sleep: task:1779@5 flags=ASYNC
> > > runstate=RUNNING|ACTIVE|NEED_XMIT|NEED_RECV status=-11 timeout=0
> > > queue=xprt_sending
> > >
> > >
> > > kworker/u28:12-2191 [004] 194.048552: rpcgss_need_reencode:
> > > task:1761@5 xid=0x88f4f47c rq_seqno=141220 seq_xmit=141336
> > > reencode
> > > unneeded
> > > kworker/u28:12-2191 [004] 194.048557:
> > > xprt_transmit: task:1761@5 xid=0x88f4f47c seqno=141220
> > > status=0
> > >
> > >
> > > kworker/u28:12-2191 [004] 194.048559: rpcgss_need_reencode:
> > > task:1902@5 xid=0x14f5f47c rq_seqno=141360 seq_xmit=141336
> > > reencode
> > > unneeded
> > > kworker/u28:12-2191 [004] 194.048563:
> > > xprt_transmit: task:1902@5 xid=0x14f5f47c seqno=141360
> > > status=0
> > >
> > >
> > > Note that first need_reencode: the sequence numbers show that the
> > > xmit
> > > queue has been significantly re-ordered. The request being
> > > transmitted is
> > > already very close to the lower end of the GSS sequence number
> > > window.
> > >
> > > The server then re-ordered these two slightly because the first
> > > one
> > > had
> > > some Read chunks that needed to be pulled over, while the second was pure
> > > inline
> > > and therefore could be processed immediately. That is enough to
> > > force
> > > the
> > > first one outside the GSS sequence number window.
> > >
> > > I haven't looked closely at the pathology of the TCP case.
> >
> > Wait a minute... That's not OK. The client can't be expected to
> > take
> > into account reordering that happens on the server side.
>
> Conversely, the client can't assume the transport and the server
> don't
> re-order. This does not appear to be a problem for the v4.19 client:
> I don't see disconnect storms with that client.

There is absolutely nothing stopping it from happening in 4.19. It's
just very unlikely because the stream is strictly ordered on the client
side. So the misordering on the server would have to be pretty extreme.

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]


2019-01-02 19:28:37

by Trond Myklebust

[permalink] [raw]
Subject: Re: [PATCH v3 26/44] SUNRPC: Improve latency for interactive tasks

On Wed, 2019-01-02 at 14:06 -0500, Trond Myklebust wrote:
> On Wed, 2019-01-02 at 13:57 -0500, Trond Myklebust wrote:
> > On Wed, 2019-01-02 at 13:51 -0500, Chuck Lever wrote:
> > > > On Jan 2, 2019, at 1:45 PM, Trond Myklebust <
> > > > [email protected]> wrote:
> > > >
> > > > On Wed, 2019-01-02 at 13:17 -0500, Chuck Lever wrote:
> > > > > > On Dec 31, 2018, at 2:21 PM, Trond Myklebust <
> > > > > > [email protected]> wrote:
> > > > > >
> > > > > > On Mon, 2018-12-31 at 19:18 +0000, Trond Myklebust wrote:
> > > > > > > On Mon, 2018-12-31 at 14:09 -0500, Chuck Lever wrote:
> > > > > > > > > On Dec 31, 2018, at 1:59 PM, Trond Myklebust <
> > > > > > > > > [email protected]> wrote:
> > > > > > > > >
> > > > > > > > >
> > > > > > > The test for rpcauth_xmit_need_reencode() happens when we
> > > > > > > call
> > > > > > > xprt_request_transmit() to actually put the RPC call on
> > > > > > > the
> > > > > > > wire.
> > > > > > > The
> > > > > > > enqueue order should not be able to defeat that test.
> > > > > > >
> > > > > > > Hmm... Is it perhaps the test for req->rq_bytes_sent that
> > > > > > > is
> > > > > > > failing
> > > > > > > because this is a retransmission after a
> > > > > > > disconnect/reconnect
> > > > > > > that
> > > > > > > didn't trigger a re-encode?
> > > > > >
> > > > > > Actually, it might be worth a try to move the test for
> > > > > > rpcauth_xmit_need_reencode() outside the enclosing test for
> > > > > > req->rq_bytes_sent, as that test is just a minor optimisation.
> > > > >
> > > > > Perhaps that's the case for TCP, but RPCs sent via xprtrdma
> > > > > never
> > > > > set
> > > > > req->rq_bytes_sent to a non-zero value. The body of the "if"
> > > > > statement
> > > > > is always executed for those RPCs.
> > > > >
> > > >
> > > > Then the question is what is defeating the call to
> > > > rpcauth_xmit_need_reencode() in xprt_request_transmit() and
> > > > causing
> > > > it
> > > > not to trigger in the misordered cases?
> > >
> > > Here's a sample RPC/RDMA case.
> > >
> > > My instrumented server reports this:
> > >
> > > Jan 2 13:29:00 klimt kernel: gss_check_seq_num: dropped:
> > > seq_num=141220 sd->sd_max=141360
> > >
> > >
> > > ftrace log on the client shows this:
> > >
> > > kworker/u28:12-2191 [004] 194.048534: rpcgss_need_reencode:
> > > task:1761@5 xid=0x88f4f47c rq_seqno=141220 seq_xmit=141336
> > > reencode
> > > unneeded
> > > kworker/u28:12-2191 [004] 194.048534:
> > > xprt_transmit: task:1761@5 xid=0x88f4f47c seqno=141220
> > > status=-57
> > > kworker/u28:12-2191 [004] 194.048534:
> > > rpc_task_run_action: task:1779@5 flags=ASYNC
> > > runstate=RUNNING|ACTIVE|NEED_XMIT|NEED_RECV status=-57
> > > action=call_transmit_status
> > > kworker/u28:12-2191 [004] 194.048535:
> > > rpc_task_run_action: task:1779@5 flags=ASYNC
> > > runstate=RUNNING|ACTIVE|NEED_XMIT|NEED_RECV status=0
> > > action=call_transmit
> > > kworker/u28:12-2191 [004] 194.048535:
> > > rpc_task_sleep: task:1779@5 flags=ASYNC
> > > runstate=RUNNING|ACTIVE|NEED_XMIT|NEED_RECV status=-11 timeout=0
> > > queue=xprt_sending
> > >
> > >
> > > kworker/u28:12-2191 [004] 194.048552: rpcgss_need_reencode:
> > > task:1761@5 xid=0x88f4f47c rq_seqno=141220 seq_xmit=141336
> > > reencode
> > > unneeded
> > > kworker/u28:12-2191 [004] 194.048557:
> > > xprt_transmit: task:1761@5 xid=0x88f4f47c seqno=141220
> > > status=0
> > >
> > >
> > > kworker/u28:12-2191 [004] 194.048559: rpcgss_need_reencode:
> > > task:1902@5 xid=0x14f5f47c rq_seqno=141360 seq_xmit=141336
> > > reencode
> > > unneeded
> > > kworker/u28:12-2191 [004] 194.048563:
> > > xprt_transmit: task:1902@5 xid=0x14f5f47c seqno=141360
> > > status=0
> > >
> > >
> > > Note that first need_reencode: the sequence numbers show that the
> > > xmit
> > > queue has been significantly re-ordered. The request being
> > > transmitted is
> > > already very close to the lower end of the GSS sequence number
> > > window.
> > >
> > > The server then re-ordered these two slightly because the first
> > > one
> > > had
> > > some Read chunks that needed to be pulled over, while the second was pure
> > > inline
> > > and therefore could be processed immediately. That is enough to
> > > force
> > > the
> > > first one outside the GSS sequence number window.
> > >
> > > I haven't looked closely at the pathology of the TCP case.
> >
> > Wait a minute... That's not OK. The client can't be expected to
> > take
> > into account reordering that happens on the server side.
>
> If that's the case, then we would need to halt transmission as soon
> as
> we hit the RPCSEC_GSS window edge. Off the cuff, I'm not sure how to
> do
> that, since those windows are per session (i.e. per user).

So here is something we probably could do: modify
xprt_request_enqueue_transmit() to order the list in req->rq_xmit2 by
req->rq_seqno. Since task->tk_owner is actually a pid, that's not
a perfect solution, but we could further mitigate by modifying
gss_xmit_need_reencode() to only allow transmission of requests that
are within 2/3 of the window.
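
Something along these lines, as a stand-alone model of both pieces
(struct req, enqueue_sorted() and may_transmit() are illustrative
stand-ins for rpc_rqst, the rq_xmit2 list handling and
gss_xmit_need_reencode()):

#include <stdbool.h>
#include <stdio.h>

#define GSS_SEQ_WIN 128  /* assumed window size */

struct req {
        unsigned int rq_seqno;
        struct req *next;
};

/* Piece 1: keep the rq_xmit2 chain sorted by rq_seqno on enqueue. */
static void enqueue_sorted(struct req **head, struct req *new)
{
        while (*head && (*head)->rq_seqno < new->rq_seqno)
                head = &(*head)->next;
        new->next = *head;
        *head = new;
}

/* Piece 2: refuse to transmit (and so force a re-encode of) any
 * request no longer within 2/3 of the window behind the highest
 * sequence number transmitted so far (seq_xmit). */
static bool may_transmit(unsigned int rq_seqno, unsigned int seq_xmit)
{
        return rq_seqno + (2 * GSS_SEQ_WIN) / 3 > seq_xmit;
}

int main(void)
{
        struct req a = { .rq_seqno = 141360 }, b = { .rq_seqno = 141220 };
        struct req *head = NULL;

        enqueue_sorted(&head, &a);
        enqueue_sorted(&head, &b);
        printf("first in queue: %u\n", head->rq_seqno);  /* 141220 */

        /* With the trace's numbers, seqno 141220 against seq_xmit
         * 141336 would now be refused: 141336 - 141220 = 116 > 85. */
        printf("may transmit: %s\n",
               may_transmit(141220, 141336) ? "yes" : "no");
        return 0;
}
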
--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]


2019-01-02 19:33:44

by Chuck Lever III

[permalink] [raw]
Subject: Re: [PATCH v3 26/44] SUNRPC: Improve latency for interactive tasks



> On Jan 2, 2019, at 2:24 PM, Trond Myklebust <[email protected]> wrote:
>
> On Wed, 2019-01-02 at 14:06 -0500, Trond Myklebust wrote:
>> On Wed, 2019-01-02 at 13:57 -0500, Trond Myklebust wrote:
>>> On Wed, 2019-01-02 at 13:51 -0500, Chuck Lever wrote:
>>>>> On Jan 2, 2019, at 1:45 PM, Trond Myklebust <
>>>>> [email protected]> wrote:
>>>>>
>>>>> On Wed, 2019-01-02 at 13:17 -0500, Chuck Lever wrote:
>>>>>>> On Dec 31, 2018, at 2:21 PM, Trond Myklebust <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>> On Mon, 2018-12-31 at 19:18 +0000, Trond Myklebust wrote:
>>>>>>>> On Mon, 2018-12-31 at 14:09 -0500, Chuck Lever wrote:
>>>>>>>>>> On Dec 31, 2018, at 1:59 PM, Trond Myklebust <
>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>> The test for rpcauth_xmit_need_reencode() happens when we
>>>>>>>> call
>>>>>>>> xprt_request_transmit() to actually put the RPC call on
>>>>>>>> the
>>>>>>>> wire.
>>>>>>>> The
>>>>>>>> enqueue order should not be able to defeat that test.
>>>>>>>>
>>>>>>>> Hmm... Is it perhaps the test for req->rq_bytes_sent that
>>>>>>>> is
>>>>>>>> failing
>>>>>>>> because this is a retransmission after a
>>>>>>>> disconnect/reconnect
>>>>>>>> that
>>>>>>>> didn't trigger a re-encode?
>>>>>>>
>>>>>>> Actually, it might be worth a try to move the test for
>>>>>>> rpcauth_xmit_need_reencode() outside the enclosing test for
>>>>>>> req->rq_bytes_sent, as that test is just a minor optimisation.
>>>>>>
>>>>>> Perhaps that's the case for TCP, but RPCs sent via xprtrdma
>>>>>> never
>>>>>> set
>>>>>> req->rq_bytes_sent to a non-zero value. The body of the "if"
>>>>>> statement
>>>>>> is always executed for those RPCs.
>>>>>>
>>>>>
>>>>> Then the question is what is defeating the call to
>>>>> rpcauth_xmit_need_reencode() in xprt_request_transmit() and
>>>>> causing
>>>>> it
>>>>> not to trigger in the misordered cases?
>>>>
>>>> Here's a sample RPC/RDMA case.
>>>>
>>>> My instrumented server reports this:
>>>>
>>>> Jan 2 13:29:00 klimt kernel: gss_check_seq_num: dropped:
>>>> seq_num=141220 sd->sd_max=141360
>>>>
>>>>
>>>> ftrace log on the client shows this:
>>>>
>>>> kworker/u28:12-2191 [004] 194.048534: rpcgss_need_reencode:
>>>> task:1761@5 xid=0x88f4f47c rq_seqno=141220 seq_xmit=141336
>>>> reencode
>>>> unneeded
>>>> kworker/u28:12-2191 [004] 194.048534:
>>>> xprt_transmit: task:1761@5 xid=0x88f4f47c seqno=141220
>>>> status=-57
>>>> kworker/u28:12-2191 [004] 194.048534:
>>>> rpc_task_run_action: task:1779@5 flags=ASYNC
>>>> runstate=RUNNING|ACTIVE|NEED_XMIT|NEED_RECV status=-57
>>>> action=call_transmit_status
>>>> kworker/u28:12-2191 [004] 194.048535:
>>>> rpc_task_run_action: task:1779@5 flags=ASYNC
>>>> runstate=RUNNING|ACTIVE|NEED_XMIT|NEED_RECV status=0
>>>> action=call_transmit
>>>> kworker/u28:12-2191 [004] 194.048535:
>>>> rpc_task_sleep: task:1779@5 flags=ASYNC
>>>> runstate=RUNNING|ACTIVE|NEED_XMIT|NEED_RECV status=-11 timeout=0
>>>> queue=xprt_sending
>>>>
>>>>
>>>> kworker/u28:12-2191 [004] 194.048552: rpcgss_need_reencode:
>>>> task:1761@5 xid=0x88f4f47c rq_seqno=141220 seq_xmit=141336
>>>> reencode
>>>> unneeded
>>>> kworker/u28:12-2191 [004] 194.048557:
>>>> xprt_transmit: task:1761@5 xid=0x88f4f47c seqno=141220
>>>> status=0
>>>>
>>>>
>>>> kworker/u28:12-2191 [004] 194.048559: rpcgss_need_reencode:
>>>> task:1902@5 xid=0x14f5f47c rq_seqno=141360 seq_xmit=141336
>>>> reencode
>>>> unneeded
>>>> kworker/u28:12-2191 [004] 194.048563:
>>>> xprt_transmit: task:1902@5 xid=0x14f5f47c seqno=141360
>>>> status=0
>>>>
>>>>
>>>> Note that first need_reencode: the sequence numbers show that the
>>>> xmit
>>>> queue has been significantly re-ordered. The request being
>>>> transmitted is
>>>> already very close to the lower end of the GSS sequence number
>>>> window.
>>>>
>>>> The server then re-ordered these two slightly because the first
>>>> one
>>>> had
>>>> some Read chunks that needed to be pulled over, while the second was pure
>>>> inline
>>>> and therefore could be processed immediately. That is enough to
>>>> force
>>>> the
>>>> first one outside the GSS sequence number window.
>>>>
>>>> I haven't looked closely at the pathology of the TCP case.
>>>
>>> Wait a minute... That's not OK. The client can't be expected to
>>> take
>>> into account reordering that happens on the server side.
>>
>> If that's the case, then we would need to halt transmission as soon
>> as
>> we hit the RPCSEC_GSS window edge. Off the cuff, I'm not sure how to
>> do
>> that, since those windows are per session (i.e. per user).
>
> So here is something we probably could do: modify
> xprt_request_enqueue_transmit() to order the list in req->rq_xmit2 by
> req->rq_seqno.

Why not add "&& !req->rq_seqno" to the third arm? Calls are already
enqueued in sequence number order.
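
That is, roughly (a toy model of the if/else chain in
xprt_request_enqueue_transmit(); the first two arms are simplified
stand-ins, and only the extra !rq_seqno test is the point):

#include <stdbool.h>
#include <stdio.h>

enum placement { AT_HEAD, WITH_OWNER, AT_TAIL };

struct req_model {
        bool rq_cong;           /* carries a congestion credit */
        bool is_swapper;        /* swap I/O task */
        unsigned int rq_seqno;  /* non-zero once RPCSEC_GSS assigns one */
};

static enum placement place(const struct req_model *req)
{
        if (req->rq_cong || req->is_swapper)
                return AT_HEAD;
        /* Extra test: requests that carry a GSS sequence number skip
         * the per-owner grouping and stay at the tail, i.e. in the
         * order their sequence numbers were assigned. */
        if (!req->rq_seqno)
                return WITH_OWNER;
        return AT_TAIL;
}

int main(void)
{
        struct req_model gss = { .rq_seqno = 141220 };
        printf("GSS request at tail: %s\n",
               place(&gss) == AT_TAIL ? "yes" : "no");
        return 0;
}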


> Since task->tk_owner is actually a pid, that's not
> a perfect solution, but we could further mitigate by modifying
> gss_xmit_need_reencode() to only allow transmission of requests that
> are within 2/3 of the window.
> --
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> [email protected]

--
Chuck Lever