2019-01-02 22:54:23

by Trond Myklebust

[permalink] [raw]
Subject: [PATCH 0/4] bugfixes for RPCSEC_GSS client support

These patches address a couple of issues that are mainly affecting our
RPCSEC_GSS client support.

Trond Myklebust (4):
SUNRPC: Ensure rq_bytes_sent is reset before request transmission
SUNRPC: Fix the RPCSEC_GSS sequence semantics after request
re-encoding
SUNRPC: Allow for some re-ordering of RPCSEC_GSS requests on the
server
SUNRPC: Ensure we respect the RPCSEC_GSS sequence number limit

net/sunrpc/auth_gss/auth_gss.c | 22 +++++++++++++++-------
net/sunrpc/clnt.c | 20 ++++++++++++--------
net/sunrpc/xprt.c | 7 +++++++
3 files changed, 34 insertions(+), 15 deletions(-)

--
2.20.1



2019-01-02 22:54:23

by Trond Myklebust

[permalink] [raw]
Subject: [PATCH 1/4] SUNRPC: Ensure rq_bytes_sent is reset before request transmission

When we resend a request, ensure that the 'rq_bytes_sent' is reset
to zero.

Signed-off-by: Trond Myklebust <[email protected]>
---
net/sunrpc/clnt.c | 1 -
net/sunrpc/xprt.c | 1 +
2 files changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index 24cbddc44c88..2189fbc4c570 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -1738,7 +1738,6 @@ rpc_xdr_encode(struct rpc_task *task)
xdr_buf_init(&req->rq_rcv_buf,
req->rq_rbuffer,
req->rq_rcvsize);
- req->rq_bytes_sent = 0;

p = rpc_encode_header(task);
if (p == NULL) {
diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index 73547d17d3c6..9075ae150ae5 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -1151,6 +1151,7 @@ xprt_request_enqueue_transmit(struct rpc_task *task)
struct rpc_xprt *xprt = req->rq_xprt;

if (xprt_request_need_enqueue_transmit(task, req)) {
+ req->rq_bytes_sent = 0;
spin_lock(&xprt->queue_lock);
/*
* Requests that carry congestion control credits are added
--
2.20.1


2019-01-02 22:54:24

by Trond Myklebust

[permalink] [raw]
Subject: [PATCH 2/4] SUNRPC: Fix the RPCSEC_GSS sequence semantics after request re-encoding

When we have to re-encode the RPCSEC_GSS request because it has fallen
outside the GSS sequence window, we want to avoid moving it to the front
of the transmission queue, since that moves the entire sequence window
forward, and may result in more requests falling out.

Signed-off-by: Trond Myklebust <[email protected]>
---
net/sunrpc/xprt.c | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index 9075ae150ae5..040c45c68fb3 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -1322,6 +1322,12 @@ xprt_request_transmit(struct rpc_rqst *req, struct rpc_task *snd_task)
}
/* Verify that our message lies in the RPCSEC_GSS window */
if (rpcauth_xmit_need_reencode(task)) {
+ if (xprt->ops->release_request &&
+ xprt_request_retransmit_after_disconnect(task)) {
+ spin_lock_bh(&xprt->transport_lock);
+ xprt->ops->release_request(task);
+ spin_unlock_bh(&xprt->transport_lock);
+ }
status = -EBADMSG;
goto out_dequeue;
}
--
2.20.1


2019-01-02 22:54:26

by Trond Myklebust

[permalink] [raw]
Subject: [PATCH 3/4] SUNRPC: Allow for some re-ordering of RPCSEC_GSS requests on the server

Strictly speaking, the client is probably not supposed to transmit more
requests than are allowed for in the RPCSEC_GSS window. However since
we do, try to ensure that we allow for some re-ordering by the server.

Signed-off-by: Trond Myklebust <[email protected]>
---
net/sunrpc/auth_gss/auth_gss.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/net/sunrpc/auth_gss/auth_gss.c b/net/sunrpc/auth_gss/auth_gss.c
index ba765473d1f0..dc7124d45c8a 100644
--- a/net/sunrpc/auth_gss/auth_gss.c
+++ b/net/sunrpc/auth_gss/auth_gss.c
@@ -2029,9 +2029,6 @@ gss_xmit_need_reencode(struct rpc_task *task)
if (!ctx)
return true;

- if (gss_seq_is_newer(req->rq_seqno, READ_ONCE(ctx->gc_seq)))
- goto out;
-
seq_xmit = READ_ONCE(ctx->gc_seq_xmit);
while (gss_seq_is_newer(req->rq_seqno, seq_xmit)) {
u32 tmp = seq_xmit;
@@ -2043,7 +2040,12 @@ gss_xmit_need_reencode(struct rpc_task *task)
}
}

- win = ctx->gc_win;
+ /*
+ * Ensure the request is within 3/4 of the RPCSEC_GSS sequence
+ * window so that we allow for some re-ordering of the requests
+ * by the server.
+ */
+ win = 3 * ((ctx->gc_win + 3) >> 2);
if (win > 0)
ret = !gss_seq_is_newer(req->rq_seqno, seq_xmit - win);
out:
--
2.20.1


2019-01-02 22:54:26

by Trond Myklebust

[permalink] [raw]
Subject: [PATCH 4/4] SUNRPC: Ensure we respect the RPCSEC_GSS sequence number limit

According to RFC2203, the RPCSEC_GSS sequence numbers are bounded to
an upper limit of MAXSEQ = 0x80000000. Ensure that we handle that
correctly.

Signed-off-by: Trond Myklebust <[email protected]>
---
net/sunrpc/auth_gss/auth_gss.c | 12 +++++++++---
net/sunrpc/clnt.c | 19 ++++++++++++-------
2 files changed, 21 insertions(+), 10 deletions(-)

diff --git a/net/sunrpc/auth_gss/auth_gss.c b/net/sunrpc/auth_gss/auth_gss.c
index dc7124d45c8a..96c1f12e791e 100644
--- a/net/sunrpc/auth_gss/auth_gss.c
+++ b/net/sunrpc/auth_gss/auth_gss.c
@@ -1563,8 +1563,10 @@ gss_marshal(struct rpc_task *task, __be32 *p)
cred_len = p++;

spin_lock(&ctx->gc_seq_lock);
- req->rq_seqno = ctx->gc_seq++;
+ req->rq_seqno = (ctx->gc_seq < MAXSEQ) ? ctx->gc_seq++ : MAXSEQ;
spin_unlock(&ctx->gc_seq_lock);
+ if (req->rq_seqno == MAXSEQ)
+ goto out_expired;

*p++ = htonl((u32) RPC_GSS_VERSION);
*p++ = htonl((u32) ctx->gc_proc);
@@ -1586,14 +1588,18 @@ gss_marshal(struct rpc_task *task, __be32 *p)
mic.data = (u8 *)(p + 1);
maj_stat = gss_get_mic(ctx->gc_gss_ctx, &verf_buf, &mic);
if (maj_stat == GSS_S_CONTEXT_EXPIRED) {
- clear_bit(RPCAUTH_CRED_UPTODATE, &cred->cr_flags);
+ goto out_expired;
} else if (maj_stat != 0) {
- printk("gss_marshal: gss_get_mic FAILED (%d)\n", maj_stat);
+ pr_warn("gss_marshal: gss_get_mic FAILED (%d)\n", maj_stat);
+ task->tk_status = -EIO;
goto out_put_ctx;
}
p = xdr_encode_opaque(p, NULL, mic.len);
gss_put_ctx(ctx);
return p;
+out_expired:
+ clear_bit(RPCAUTH_CRED_UPTODATE, &cred->cr_flags);
+ task->tk_status = -EKEYEXPIRED;
out_put_ctx:
gss_put_ctx(ctx);
return NULL;
diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index 2189fbc4c570..1ee04e0ec4bc 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -1740,11 +1740,8 @@ rpc_xdr_encode(struct rpc_task *task)
req->rq_rcvsize);

p = rpc_encode_header(task);
- if (p == NULL) {
- printk(KERN_INFO "RPC: couldn't encode RPC header, exit EIO\n");
- rpc_exit(task, -EIO);
+ if (p == NULL)
return;
- }

encode = task->tk_msg.rpc_proc->p_encode;
if (encode == NULL)
@@ -1769,10 +1766,17 @@ call_encode(struct rpc_task *task)
/* Did the encode result in an error condition? */
if (task->tk_status != 0) {
/* Was the error nonfatal? */
- if (task->tk_status == -EAGAIN || task->tk_status == -ENOMEM)
+ switch (task->tk_status) {
+ case -EAGAIN:
+ case -ENOMEM:
rpc_delay(task, HZ >> 4);
- else
+ break;
+ case -EKEYEXPIRED:
+ task->tk_action = call_refresh;
+ break;
+ default:
rpc_exit(task, task->tk_status);
+ }
return;
}

@@ -2334,7 +2338,8 @@ rpc_encode_header(struct rpc_task *task)
*p++ = htonl(clnt->cl_vers); /* program version */
*p++ = htonl(task->tk_msg.rpc_proc->p_proc); /* procedure */
p = rpcauth_marshcred(task, p);
- req->rq_slen = xdr_adjust_iovec(&req->rq_svec[0], p);
+ if (p)
+ req->rq_slen = xdr_adjust_iovec(&req->rq_svec[0], p);
return p;
}

--
2.20.1


2019-01-03 15:29:39

by Chuck Lever III

[permalink] [raw]
Subject: Re: [PATCH 1/4] SUNRPC: Ensure rq_bytes_sent is reset before request transmission

Hi Trond-

I was curious about this one because yesterday I saw evidence (for
other reasons) that rq_bytes_sent wasn't always zeroed when it should
be.


> On Jan 2, 2019, at 5:53 PM, Trond Myklebust <[email protected]> wrote:
>
> When we resend a request, ensure that the 'rq_bytes_sent' is reset
> to zero.
>
> Signed-off-by: Trond Myklebust <[email protected]>
> ---
> net/sunrpc/clnt.c | 1 -
> net/sunrpc/xprt.c | 1 +
> 2 files changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
> index 24cbddc44c88..2189fbc4c570 100644
> --- a/net/sunrpc/clnt.c
> +++ b/net/sunrpc/clnt.c
> @@ -1738,7 +1738,6 @@ rpc_xdr_encode(struct rpc_task *task)
> xdr_buf_init(&req->rq_rcv_buf,
> req->rq_rbuffer,
> req->rq_rcvsize);
> - req->rq_bytes_sent = 0;

I agree this line is not sufficient, and it should be moved.
Not every retransmission requires a re-encode. However, the
patch description should explain that, and it probably needs
a Fixes: tag.

Can you now also remove the same line from xprt_request_init
and xprt_init_bc_request ?

Also, I notice that UDP does not touch rq_bytes_sent. Since
RDMA also does not use rq_bytes_sent, maybe the same line
can be removed from xprtrdma/transport.c and
xprtrdma/backchannel.c ?


> p = rpc_encode_header(task);
> if (p == NULL) {
> diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
> index 73547d17d3c6..9075ae150ae5 100644
> --- a/net/sunrpc/xprt.c
> +++ b/net/sunrpc/xprt.c
> @@ -1151,6 +1151,7 @@ xprt_request_enqueue_transmit(struct rpc_task *task)
> struct rpc_xprt *xprt = req->rq_xprt;
>
> if (xprt_request_need_enqueue_transmit(task, req)) {
> + req->rq_bytes_sent = 0;
> spin_lock(&xprt->queue_lock);
> /*
> * Requests that carry congestion control credits are added

So I'm not convinced this covers every case. I need some
time to investigate.


--
Chuck Lever




2019-01-03 16:05:38

by Trond Myklebust

[permalink] [raw]
Subject: Re: [PATCH 1/4] SUNRPC: Ensure rq_bytes_sent is reset before request transmission

On Thu, 2019-01-03 at 10:29 -0500, Chuck Lever wrote:
> Hi Trond-
>
> I was curious about this one because yesterday I saw evidence (for
> other reasons) that rq_bytes_sent wasn't always zeroed when it should
> be.
>
>
> > On Jan 2, 2019, at 5:53 PM, Trond Myklebust <[email protected]>
> > wrote:
> >
> > When we resend a request, ensure that the 'rq_bytes_sent' is reset
> > to zero.
> >
> > Signed-off-by: Trond Myklebust <[email protected]>
> > ---
> > net/sunrpc/clnt.c | 1 -
> > net/sunrpc/xprt.c | 1 +
> > 2 files changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
> > index 24cbddc44c88..2189fbc4c570 100644
> > --- a/net/sunrpc/clnt.c
> > +++ b/net/sunrpc/clnt.c
> > @@ -1738,7 +1738,6 @@ rpc_xdr_encode(struct rpc_task *task)
> > xdr_buf_init(&req->rq_rcv_buf,
> > req->rq_rbuffer,
> > req->rq_rcvsize);
> > - req->rq_bytes_sent = 0;
>
> I agree this line is not sufficient, and it should be moved.
> Not every retransmission requires a re-encode. However, the
> patch description should explain that, and it probably needs
> a Fixes: tag.
>
> Can you now also remove the same line from xprt_request_init
> and xprt_init_bc_request ?
>
> Also, I notice that UDP does not touch rq_bytes_sent. Since
> RDMA also does not use rq_bytes_sent, maybe the same line
> can be removed from xprtrdma/transport.c and
> xprtrdma/backchannel.c ?

Sure.

So please note that rq_bytes_sent == 0 no longer means "this request
needs to be retransmitted" and we no longer test for it in
net/sunrpc/clnt.c. We do still have a couple of tests of rq_bytes_sent
in net/sunrpc/xprt.c and net/sunrpc/xprtsock.c, but those are more
about checking if a transmission of that request is currently in
progress, in which case we don't want to queue anything in front of it
on the transmission queue, and we don't want to abort the transmission
unless we also close the socket.

The intention now is that if we know the request needs retransmission
(due to a transport connection loss or a timeout), then we just add it
to the transmission queue.


> > p = rpc_encode_header(task);
> > if (p == NULL) {
> > diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
> > index 73547d17d3c6..9075ae150ae5 100644
> > --- a/net/sunrpc/xprt.c
> > +++ b/net/sunrpc/xprt.c
> > @@ -1151,6 +1151,7 @@ xprt_request_enqueue_transmit(struct rpc_task
> > *task)
> > struct rpc_xprt *xprt = req->rq_xprt;
> >
> > if (xprt_request_need_enqueue_transmit(task, req)) {
> > + req->rq_bytes_sent = 0;
> > spin_lock(&xprt->queue_lock);
> > /*
> > * Requests that carry congestion control credits are
> > added
>
> So I'm not convinced this covers every case. I need some
> time to investigate.

It should normally cover all cases. As I said, the only remaining tests
are in xprt.c and xprtsock.c

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]


2019-01-03 16:25:18

by Tom Talpey

[permalink] [raw]
Subject: Re: [PATCH 1/4] SUNRPC: Ensure rq_bytes_sent is reset before request transmission

On 1/3/2019 11:05 AM, Trond Myklebust wrote:
> On Thu, 2019-01-03 at 10:29 -0500, Chuck Lever wrote:
>> Hi Trond-
>>
>> I was curious about this one because yesterday I saw evidence (for
>> other reasons) that rq_bytes_sent wasn't always zeroed when it should
>> be.
>>
>>
>>> On Jan 2, 2019, at 5:53 PM, Trond Myklebust <[email protected]>
>>> wrote:
>>>
>>> When we resend a request, ensure that the 'rq_bytes_sent' is reset
>>> to zero.
>>>
>>> Signed-off-by: Trond Myklebust <[email protected]>
>>> ---
>>> net/sunrpc/clnt.c | 1 -
>>> net/sunrpc/xprt.c | 1 +
>>> 2 files changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
>>> index 24cbddc44c88..2189fbc4c570 100644
>>> --- a/net/sunrpc/clnt.c
>>> +++ b/net/sunrpc/clnt.c
>>> @@ -1738,7 +1738,6 @@ rpc_xdr_encode(struct rpc_task *task)
>>> xdr_buf_init(&req->rq_rcv_buf,
>>> req->rq_rbuffer,
>>> req->rq_rcvsize);
>>> - req->rq_bytes_sent = 0;
>>
>> I agree this line is not sufficient, and it should be moved.
>> Not every retransmission requires a re-encode. However, the
>> patch description should explain that, and it probably needs
>> a Fixes: tag.
>>
>> Can you now also remove the same line from xprt_request_init
>> and xprt_init_bc_request ?
>>
>> Also, I notice that UDP does not touch rq_bytes_sent. Since
>> RDMA also does not use rq_bytes_sent, maybe the same line
>> can be removed from xprtrdma/transport.c and
>> xprtrdma/backchannel.c ?
>
> Sure.
>
> So please note that rq_bytes_sent == 0 no longer means "this request
> needs to be retransmitted" and we no longer test for it in
> net/sunrpc/clnt.c. We do still have a couple of tests of rq_bytes_sent
> in net/sunrpc/xprt.c and net/sunrpc/xprtsock.c, but those are more
> about checking if a transmission of that request is currently in
> progress, in which case we don't want to queue anything in front of it
> on the transmission queue, and we don't want to abort the transmission
> unless we also close the socket.

I think rq_bytes_sent is all about managing sends atomically. On stream
transports (which allow buffering partial segments), it would be fatal
to allow intermingling. On datagram transports, it's a non-issue since
no sends are ever partial.

IOW, couldn't rq_bytes_sent simply be a boolean?

Tom.

> The intention now is that if we know the request needs retransmission
> (due to a transport connection loss or a timeout), then we just add it
> to the transmission queue.
>
>
>>> p = rpc_encode_header(task);
>>> if (p == NULL) {
>>> diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
>>> index 73547d17d3c6..9075ae150ae5 100644
>>> --- a/net/sunrpc/xprt.c
>>> +++ b/net/sunrpc/xprt.c
>>> @@ -1151,6 +1151,7 @@ xprt_request_enqueue_transmit(struct rpc_task
>>> *task)
>>> struct rpc_xprt *xprt = req->rq_xprt;
>>>
>>> if (xprt_request_need_enqueue_transmit(task, req)) {
>>> + req->rq_bytes_sent = 0;
>>> spin_lock(&xprt->queue_lock);
>>> /*
>>> * Requests that carry congestion control credits are
>>> added
>>
>> So I'm not convinced this covers every case. I need some
>> time to investigate.
>
> It should normally cover all cases. As I said, the only remaining tests
> are in xprt.c and xprtsock.c
>

2019-01-03 16:27:27

by Trond Myklebust

[permalink] [raw]
Subject: Re: [PATCH 1/4] SUNRPC: Ensure rq_bytes_sent is reset before request transmission

On Thu, 2019-01-03 at 11:17 -0500, Tom Talpey wrote:
> On 1/3/2019 11:05 AM, Trond Myklebust wrote:
> > On Thu, 2019-01-03 at 10:29 -0500, Chuck Lever wrote:
> > > Hi Trond-
> > >
> > > I was curious about this one because yesterday I saw evidence
> > > (for
> > > other reasons) that rq_bytes_sent wasn't always zeroed when it
> > > should
> > > be.
> > >
> > >
> > > > On Jan 2, 2019, at 5:53 PM, Trond Myklebust <[email protected]>
> > > > wrote:
> > > >
> > > > When we resend a request, ensure that the 'rq_bytes_sent' is
> > > > reset
> > > > to zero.
> > > >
> > > > Signed-off-by: Trond Myklebust <[email protected]
> > > > >
> > > > ---
> > > > net/sunrpc/clnt.c | 1 -
> > > > net/sunrpc/xprt.c | 1 +
> > > > 2 files changed, 1 insertion(+), 1 deletion(-)
> > > >
> > > > diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
> > > > index 24cbddc44c88..2189fbc4c570 100644
> > > > --- a/net/sunrpc/clnt.c
> > > > +++ b/net/sunrpc/clnt.c
> > > > @@ -1738,7 +1738,6 @@ rpc_xdr_encode(struct rpc_task *task)
> > > > xdr_buf_init(&req->rq_rcv_buf,
> > > > req->rq_rbuffer,
> > > > req->rq_rcvsize);
> > > > - req->rq_bytes_sent = 0;
> > >
> > > I agree this line is not sufficient, and it should be moved.
> > > Not every retransmission requires a re-encode. However, the
> > > patch description should explain that, and it probably needs
> > > a Fixes: tag.
> > >
> > > Can you now also remove the same line from xprt_request_init
> > > and xprt_init_bc_request ?
> > >
> > > Also, I notice that UDP does not touch rq_bytes_sent. Since
> > > RDMA also does not use rq_bytes_sent, maybe the same line
> > > can be removed from xprtrdma/transport.c and
> > > xprtrdma/backchannel.c ?
> >
> > Sure.
> >
> > So please note that rq_bytes_sent == 0 no longer means "this
> > request
> > needs to be retransmitted" and we no longer test for it in
> > net/sunrpc/clnt.c. We do still have a couple of tests of
> > rq_bytes_sent
> > in net/sunrpc/xprt.c and net/sunrpc/xprtsock.c, but those are more
> > about checking if a transmission of that request is currently in
> > progress, in which case we don't want to queue anything in front of
> > it
> > on the transmission queue, and we don't want to abort the
> > transmission
> > unless we also close the socket.
>
> I think rq_bytes_sent is all about managing sends atomically. On
> stream
> transports (which allow buffering partial segments), it would be
> fatal
> to allow intermingling. On datagram transports, it's a non-issue
> since
> no sends are ever partial.
>
> IOW, couldn't rq_bytes_sent simply be a boolean?

Sends can be partial for TCP and AF_LOCAL because the stream socket
operations are non-blocking.

Strictly speaking, though, we probably could replace rq_bytes_sent with
a boolean that represents "transmission in progress" since the TCP
layer itself now tracks how many bytes have been transmitted for the
request being transmitted. i.e. the boolean would be set by
xs_local_send_request() and xs_tcp_send_request(), and cleared by those
same functions once the transmission is complete. Meh...


> Tom.
>
> > The intention now is that if we know the request needs
> > retransmission
> > (due to a transport connection loss or a timeout), then we just add
> > it
> > to the transmission queue.
> >
> >
> > > > p = rpc_encode_header(task);
> > > > if (p == NULL) {
> > > > diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
> > > > index 73547d17d3c6..9075ae150ae5 100644
> > > > --- a/net/sunrpc/xprt.c
> > > > +++ b/net/sunrpc/xprt.c
> > > > @@ -1151,6 +1151,7 @@ xprt_request_enqueue_transmit(struct
> > > > rpc_task
> > > > *task)
> > > > struct rpc_xprt *xprt = req->rq_xprt;
> > > >
> > > > if (xprt_request_need_enqueue_transmit(task, req)) {
> > > > + req->rq_bytes_sent = 0;
> > > > spin_lock(&xprt->queue_lock);
> > > > /*
> > > > * Requests that carry congestion control
> > > > credits are
> > > > added
> > >
> > > So I'm not convinced this covers every case. I need some
> > > time to investigate.
> >
> > It should normally cover all cases. As I said, the only remaining
> > tests
> > are in xprt.c and xprtsock.c
> >
--
Trond Myklebust
CTO, Hammerspace Inc
4300 El Camino Real, Suite 105
Los Altos, CA 94022
http://www.hammer.space

2019-01-03 16:39:29

by Chuck Lever III

[permalink] [raw]
Subject: Re: [PATCH 1/4] SUNRPC: Ensure rq_bytes_sent is reset before request transmission



> On Jan 3, 2019, at 11:17 AM, Tom Talpey <[email protected]> wrote:
>
> On 1/3/2019 11:05 AM, Trond Myklebust wrote:
>> On Thu, 2019-01-03 at 10:29 -0500, Chuck Lever wrote:
>>> Hi Trond-
>>>
>>> I was curious about this one because yesterday I saw evidence (for
>>> other reasons) that rq_bytes_sent wasn't always zeroed when it should
>>> be.
>>>
>>>
>>>> On Jan 2, 2019, at 5:53 PM, Trond Myklebust <[email protected]>
>>>> wrote:
>>>>
>>>> When we resend a request, ensure that the 'rq_bytes_sent' is reset
>>>> to zero.
>>>>
>>>> Signed-off-by: Trond Myklebust <[email protected]>
>>>> ---
>>>> net/sunrpc/clnt.c | 1 -
>>>> net/sunrpc/xprt.c | 1 +
>>>> 2 files changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
>>>> index 24cbddc44c88..2189fbc4c570 100644
>>>> --- a/net/sunrpc/clnt.c
>>>> +++ b/net/sunrpc/clnt.c
>>>> @@ -1738,7 +1738,6 @@ rpc_xdr_encode(struct rpc_task *task)
>>>> xdr_buf_init(&req->rq_rcv_buf,
>>>> req->rq_rbuffer,
>>>> req->rq_rcvsize);
>>>> - req->rq_bytes_sent = 0;
>>>
>>> I agree this line is not sufficient, and it should be moved.
>>> Not every retransmission requires a re-encode. However, the
>>> patch description should explain that, and it probably needs
>>> a Fixes: tag.
>>>
>>> Can you now also remove the same line from xprt_request_init
>>> and xprt_init_bc_request ?
>>>
>>> Also, I notice that UDP does not touch rq_bytes_sent. Since
>>> RDMA also does not use rq_bytes_sent, maybe the same line
>>> can be removed from xprtrdma/transport.c and
>>> xprtrdma/backchannel.c ?
>> Sure.
>> So please note that rq_bytes_sent == 0 no longer means "this request
>> needs to be retransmitted" and we no longer test for it in
>> net/sunrpc/clnt.c. We do still have a couple of tests of rq_bytes_sent
>> in net/sunrpc/xprt.c and net/sunrpc/xprtsock.c, but those are more
>> about checking if a transmission of that request is currently in
>> progress, in which case we don't want to queue anything in front of it
>> on the transmission queue, and we don't want to abort the transmission
>> unless we also close the socket.
>
> I think rq_bytes_sent is all about managing sends atomically. On stream
> transports (which allow buffering partial segments), it would be fatal to allow intermingling. On datagram transports, it's a non-issue since
> no sends are ever partial.
>
> IOW, couldn't rq_bytes_sent simply be a boolean?

I read somewhere recently that a boolean would take up as much space
as a u32 in rpc_rqst. Not sure it saves much.

I would be interested in removing rq_bytes_sent from generic paths,
as a minor optimization. It seems to be something that stream
transports need, but the others don't.


> Tom.
>
>> The intention now is that if we know the request needs retransmission
>> (due to a transport connection loss or a timeout), then we just add it
>> to the transmission queue.
>>>> p = rpc_encode_header(task);
>>>> if (p == NULL) {
>>>> diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
>>>> index 73547d17d3c6..9075ae150ae5 100644
>>>> --- a/net/sunrpc/xprt.c
>>>> +++ b/net/sunrpc/xprt.c
>>>> @@ -1151,6 +1151,7 @@ xprt_request_enqueue_transmit(struct rpc_task
>>>> *task)
>>>> struct rpc_xprt *xprt = req->rq_xprt;
>>>>
>>>> if (xprt_request_need_enqueue_transmit(task, req)) {
>>>> + req->rq_bytes_sent = 0;
>>>> spin_lock(&xprt->queue_lock);
>>>> /*
>>>> * Requests that carry congestion control credits are
>>>> added
>>>
>>> So I'm not convinced this covers every case. I need some
>>> time to investigate.
>> It should normally cover all cases. As I said, the only remaining tests
>> are in xprt.c and xprtsock.c

--
Chuck Lever




2019-01-03 16:41:26

by Chuck Lever III

[permalink] [raw]
Subject: Re: [PATCH 1/4] SUNRPC: Ensure rq_bytes_sent is reset before request transmission



> On Jan 3, 2019, at 11:05 AM, Trond Myklebust <[email protected]> wrote:
>
> On Thu, 2019-01-03 at 10:29 -0500, Chuck Lever wrote:
>> Hi Trond-
>>
>> I was curious about this one because yesterday I saw evidence (for
>> other reasons) that rq_bytes_sent wasn't always zeroed when it should
>> be.
>>
>>
>>> On Jan 2, 2019, at 5:53 PM, Trond Myklebust <[email protected]>
>>> wrote:
>>>
>>> When we resend a request, ensure that the 'rq_bytes_sent' is reset
>>> to zero.
>>>
>>> Signed-off-by: Trond Myklebust <[email protected]>
>>> ---
>>> net/sunrpc/clnt.c | 1 -
>>> net/sunrpc/xprt.c | 1 +
>>> 2 files changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
>>> index 24cbddc44c88..2189fbc4c570 100644
>>> --- a/net/sunrpc/clnt.c
>>> +++ b/net/sunrpc/clnt.c
>>> @@ -1738,7 +1738,6 @@ rpc_xdr_encode(struct rpc_task *task)
>>> xdr_buf_init(&req->rq_rcv_buf,
>>> req->rq_rbuffer,
>>> req->rq_rcvsize);
>>> - req->rq_bytes_sent = 0;
>>
>> I agree this line is not sufficient, and it should be moved.
>> Not every retransmission requires a re-encode. However, the
>> patch description should explain that, and it probably needs
>> a Fixes: tag.
>>
>> Can you now also remove the same line from xprt_request_init
>> and xprt_init_bc_request ?
>>
>> Also, I notice that UDP does not touch rq_bytes_sent. Since
>> RDMA also does not use rq_bytes_sent, maybe the same line
>> can be removed from xprtrdma/transport.c and
>> xprtrdma/backchannel.c ?
>
> Sure.
>
> So please note that rq_bytes_sent == 0 no longer means "this request
> needs to be retransmitted" and we no longer test for it in
> net/sunrpc/clnt.c. We do still have a couple of tests of rq_bytes_sent
> in net/sunrpc/xprt.c and net/sunrpc/xprtsock.c, but those are more
> about checking if a transmission of that request is currently in
> progress, in which case we don't want to queue anything in front of it
> on the transmission queue, and we don't want to abort the transmission
> unless we also close the socket.
>
> The intention now is that if we know the request needs retransmission
> (due to a transport connection loss or a timeout), then we just add it
> to the transmission queue.
>
>
>>> p = rpc_encode_header(task);
>>> if (p == NULL) {
>>> diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
>>> index 73547d17d3c6..9075ae150ae5 100644
>>> --- a/net/sunrpc/xprt.c
>>> +++ b/net/sunrpc/xprt.c
>>> @@ -1151,6 +1151,7 @@ xprt_request_enqueue_transmit(struct rpc_task
>>> *task)
>>> struct rpc_xprt *xprt = req->rq_xprt;
>>>
>>> if (xprt_request_need_enqueue_transmit(task, req)) {
>>> + req->rq_bytes_sent = 0;
>>> spin_lock(&xprt->queue_lock);
>>> /*
>>> * Requests that carry congestion control credits are
>>> added
>>
>> So I'm not convinced this covers every case. I need some
>> time to investigate.
>
> It should normally cover all cases. As I said, the only remaining tests
> are in xprt.c and xprtsock.c

In the patch I have that removes xprt::tsh_size, I'm using rq_bytes_sent
to figure out when to insert a record marker. Every once in a while, it
sticks in a record marker where it shouldn't.


--
Chuck Lever




2019-01-03 17:15:35

by Chuck Lever III

[permalink] [raw]
Subject: Re: [PATCH 0/4] bugfixes for RPCSEC_GSS client support



> On Jan 2, 2019, at 5:53 PM, Trond Myklebust <[email protected]> wrote:
>
> These patches address a couple of issues that are mainly affecting our
> RPCSEC_GSS client support.
>
> Trond Myklebust (4):
> SUNRPC: Ensure rq_bytes_sent is reset before request transmission
> SUNRPC: Fix the RPCSEC_GSS sequence semantics after request
> re-encoding
> SUNRPC: Allow for some re-ordering of RPCSEC_GSS requests on the
> server
> SUNRPC: Ensure we respect the RPCSEC_GSS sequence number limit

I tested with only 1 and 2 applied, no change.

I tested with only 1, 2, and 3 applied, and the disconnect rate
was reduced did not go to zero, similar to my test where the
client's window was 64 and the server's was 128.


> net/sunrpc/auth_gss/auth_gss.c | 22 +++++++++++++++-------
> net/sunrpc/clnt.c | 20 ++++++++++++--------
> net/sunrpc/xprt.c | 7 +++++++
> 3 files changed, 34 insertions(+), 15 deletions(-)
>
> --
> 2.20.1
>

--
Chuck Lever




2019-01-03 18:11:59

by Trond Myklebust

[permalink] [raw]
Subject: Re: [PATCH 1/4] SUNRPC: Ensure rq_bytes_sent is reset before request transmission

On Thu, 2019-01-03 at 11:41 -0500, Chuck Lever wrote:
> > On Jan 3, 2019, at 11:05 AM, Trond Myklebust <
> > [email protected]> wrote:
> >
> > On Thu, 2019-01-03 at 10:29 -0500, Chuck Lever wrote:
> > > Hi Trond-
> > >
> > > I was curious about this one because yesterday I saw evidence
> > > (for
> > > other reasons) that rq_bytes_sent wasn't always zeroed when it
> > > should
> > > be.
> > >
> > >
> > > > On Jan 2, 2019, at 5:53 PM, Trond Myklebust <[email protected]>
> > > > wrote:
> > > >
> > > > When we resend a request, ensure that the 'rq_bytes_sent' is
> > > > reset
> > > > to zero.
> > > >
> > > > Signed-off-by: Trond Myklebust <[email protected]
> > > > >
> > > > ---
> > > > net/sunrpc/clnt.c | 1 -
> > > > net/sunrpc/xprt.c | 1 +
> > > > 2 files changed, 1 insertion(+), 1 deletion(-)
> > > >
> > > > diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
> > > > index 24cbddc44c88..2189fbc4c570 100644
> > > > --- a/net/sunrpc/clnt.c
> > > > +++ b/net/sunrpc/clnt.c
> > > > @@ -1738,7 +1738,6 @@ rpc_xdr_encode(struct rpc_task *task)
> > > > xdr_buf_init(&req->rq_rcv_buf,
> > > > req->rq_rbuffer,
> > > > req->rq_rcvsize);
> > > > - req->rq_bytes_sent = 0;
> > >
> > > I agree this line is not sufficient, and it should be moved.
> > > Not every retransmission requires a re-encode. However, the
> > > patch description should explain that, and it probably needs
> > > a Fixes: tag.
> > >
> > > Can you now also remove the same line from xprt_request_init
> > > and xprt_init_bc_request ?
> > >
> > > Also, I notice that UDP does not touch rq_bytes_sent. Since
> > > RDMA also does not use rq_bytes_sent, maybe the same line
> > > can be removed from xprtrdma/transport.c and
> > > xprtrdma/backchannel.c ?
> >
> > Sure.
> >
> > So please note that rq_bytes_sent == 0 no longer means "this
> > request
> > needs to be retransmitted" and we no longer test for it in
> > net/sunrpc/clnt.c. We do still have a couple of tests of
> > rq_bytes_sent
> > in net/sunrpc/xprt.c and net/sunrpc/xprtsock.c, but those are more
> > about checking if a transmission of that request is currently in
> > progress, in which case we don't want to queue anything in front of
> > it
> > on the transmission queue, and we don't want to abort the
> > transmission
> > unless we also close the socket.
> >
> > The intention now is that if we know the request needs
> > retransmission
> > (due to a transport connection loss or a timeout), then we just add
> > it
> > to the transmission queue.
> >
> >
> > > > p = rpc_encode_header(task);
> > > > if (p == NULL) {
> > > > diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
> > > > index 73547d17d3c6..9075ae150ae5 100644
> > > > --- a/net/sunrpc/xprt.c
> > > > +++ b/net/sunrpc/xprt.c
> > > > @@ -1151,6 +1151,7 @@ xprt_request_enqueue_transmit(struct
> > > > rpc_task
> > > > *task)
> > > > struct rpc_xprt *xprt = req->rq_xprt;
> > > >
> > > > if (xprt_request_need_enqueue_transmit(task, req)) {
> > > > + req->rq_bytes_sent = 0;
> > > > spin_lock(&xprt->queue_lock);
> > > > /*
> > > > * Requests that carry congestion control
> > > > credits are
> > > > added
> > >
> > > So I'm not convinced this covers every case. I need some
> > > time to investigate.
> >
> > It should normally cover all cases. As I said, the only remaining
> > tests
> > are in xprt.c and xprtsock.c
>
> In the patch I have that removes xprt::tsh_size, I'm using
> rq_bytes_sent
> to figure out when to insert a record marker. Every once in a while,
> it
> sticks in a record marker where it shouldn't.

I'm not sure I understand. req->rq_bytes_sent is expected to be zero
when we call rpc_encode_header(), which is where we call
xprt_skip_transport_header().

If you want to remove tsh_size, then I would suggest replacing
xprt_skip_transport_header() with an operation in struct rpc_xprt_ops
that returns a pointer to the first word in the send buffer that is not
reserved for transport use.

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]