Commit 3832591e6fa5 ("SUNRPC: Handle connection issues correctly on
the back channel") intended to make backchannel RPCs fail
immediately when there is no forward channel connection. What's
currently happening is, when the forward channel connection goes
away, backchannel operations are causing hard loops because
call_transmit_status's SOFTCONN logic ignores ENOTCONN.
To avoid changing the behavior of call_transmit_status in the
forward direction, make backchannel RPCs return an error other than
ENOTCONN when they fail.
Signed-off-by: Chuck Lever <[email protected]>
---
net/sunrpc/xprtrdma/svc_rdma_backchannel.c | 15 ++++++++++-----
net/sunrpc/xprtsock.c | 6 ++++--
2 files changed, 14 insertions(+), 7 deletions(-)
I'm playing with this fix. It seems to be required in order to get
Kerberos mounts to work under load with NFSv4.1 and later on RDMA.
If there are no objections, I can carry this for v5.7-rc ...
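For reference, the hard loop arises in the error switch at the bottom of
call_transmit_status(). Here is an abridged sketch of the relevant cases,
paraphrased from net/sunrpc/clnt.c of this era (not a verbatim copy):
-ENOTCONN sits in the retry group below the RPC_IS_SOFTCONN() check, so a
SOFTCONN task retries it forever, while -EHOSTUNREACH is caught by the
check and terminates the task.

/* Abridged sketch of call_transmit_status(), net/sunrpc/clnt.c circa
 * v5.6; several cases omitted, not verbatim.
 */
static void
call_transmit_status(struct rpc_task *task)
{
	task->tk_action = call_status;

	switch (task->tk_status) {
	case -ECONNREFUSED:
	case -EHOSTDOWN:
	case -ENETDOWN:
	case -EHOSTUNREACH:
	case -ENETUNREACH:
	case -EPERM:
		if (RPC_IS_SOFTCONN(task)) {
			/* SOFTCONN tasks terminate here */
			rpc_call_rpcerror(task, task->tk_status);
			return;
		}
		/* fall through */
	case -ECONNRESET:
	case -ECONNABORTED:
	case -EADDRINUSE:
	case -ENOTCONN:		/* retried even for SOFTCONN tasks */
	case -EPIPE:
		task->tk_action = call_bind;
		task->tk_status = 0;
		break;
	}
}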
diff --git a/net/sunrpc/xprtrdma/svc_rdma_backchannel.c b/net/sunrpc/xprtrdma/svc_rdma_backchannel.c
index d510a3a15d4b..b8a72d7fbcc2 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_backchannel.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_backchannel.c
@@ -207,11 +207,16 @@ rpcrdma_bc_send_request(struct svcxprt_rdma *rdma, struct rpc_rqst *rqst)
drop_connection:
dprintk("svcrdma: failed to send bc call\n");
- return -ENOTCONN;
+ return -EHOSTUNREACH;
}
-/* Send an RPC call on the passive end of a transport
- * connection.
+/**
+ * xprt_rdma_bc_send_request - send an RPC backchannel Call
+ * @rqst: RPC Call in rq_snd_buf
+ *
+ * Returns:
+ * %0 if the RPC message has been sent
+ * %-EHOSTUNREACH if the Call could not be sent
*/
static int
xprt_rdma_bc_send_request(struct rpc_rqst *rqst)
@@ -225,11 +230,11 @@ xprt_rdma_bc_send_request(struct rpc_rqst *rqst)
mutex_lock(&sxprt->xpt_mutex);
- ret = -ENOTCONN;
+ ret = -EHOSTUNREACH;
rdma = container_of(sxprt, struct svcxprt_rdma, sc_xprt);
if (!test_bit(XPT_DEAD, &sxprt->xpt_flags)) {
ret = rpcrdma_bc_send_request(rdma, rqst);
- if (ret == -ENOTCONN)
+ if (ret < 0)
svc_close_xprt(sxprt);
}
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index 17cb902e5153..92a358fd2ff0 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -2543,7 +2543,9 @@ static int bc_sendto(struct rpc_rqst *req)
req->rq_xtime = ktime_get();
err = xprt_sock_sendmsg(transport->sock, &msg, xdr, 0, marker, &sent);
xdr_free_bvec(xdr);
- if (err < 0 || sent != (xdr->len + sizeof(marker)))
+ if (err < 0)
+ return -EHOSTUNREACH;
+ if (sent != (xdr->len + sizeof(marker)))
return -EAGAIN;
return sent;
}
@@ -2567,7 +2569,7 @@ static int bc_send_request(struct rpc_rqst *req)
*/
mutex_lock(&xprt->xpt_mutex);
if (test_bit(XPT_DEAD, &xprt->xpt_flags))
- len = -ENOTCONN;
+ len = -EHOSTUNREACH;
else
len = bc_sendto(req);
mutex_unlock(&xprt->xpt_mutex);
On Fri, 2020-04-03 at 15:42 -0400, Chuck Lever wrote:
> Commit 3832591e6fa5 ("SUNRPC: Handle connection issues correctly on
> the back channel") intended to make backchannel RPCs fail
> immediately when there is no forward channel connection. What's
> currently happening is, when the forward channel connection
> away, backchannel operations are causing hard loops because
> call_transmit_status's SOFTCONN logic ignores ENOTCONN.
Won't RPC_TASK_NOCONNECT do the right thing? It should cause the
request to exit with an ENOTCONN error when it hits call_connect().
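For context, call_connect() handles that flag roughly as follows; this is
an abridged sketch of net/sunrpc/clnt.c from this era, not verbatim. Note
that the xprt_connected() fast path runs before the NOCONNECT check:

static void
call_connect(struct rpc_task *task)
{
	struct rpc_xprt *xprt = task->tk_rqstp->rq_xprt;

	if (xprt_connected(xprt)) {
		/* fast path: the NOCONNECT check below is skipped */
		task->tk_action = call_transmit;
		return;
	}

	task->tk_action = call_connect_status;
	if (task->tk_status < 0)
		return;
	if (task->tk_flags & RPC_TASK_NOCONNECT) {
		rpc_call_rpcerror(task, -ENOTCONN);
		return;
	}
	if (!xprt_prepare_transmit(task))
		return;
	xprt_connect(task);
}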
--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]
Hi Trond, thanks for the look!
> On Apr 3, 2020, at 4:00 PM, Trond Myklebust <[email protected]> wrote:
>
> On Fri, 2020-04-03 at 15:42 -0400, Chuck Lever wrote:
>> Commit 3832591e6fa5 ("SUNRPC: Handle connection issues correctly on
>> the back channel") intended to make backchannel RPCs fail
>> immediately when there is no forward channel connection. What's
>> currently happening is, when the forward channel connection
>> away, backchannel operations are causing hard loops because
>> call_transmit_status's SOFTCONN logic ignores ENOTCONN.
>
> Won't RPC_TASK_NOCONNECT do the right thing? It should cause the
> request to exit with an ENOTCONN error when it hits call_connect().
OK, so does that mean SOFTCONN is entirely the wrong semantic here?
Was RPC_TASK_NOCONNECT available when 3832591e6fa5 was merged?
--
Chuck Lever
> On Apr 3, 2020, at 4:05 PM, Chuck Lever <[email protected]> wrote:
>
> Hi Trond, thanks for the look!
>
>> On Apr 3, 2020, at 4:00 PM, Trond Myklebust <[email protected]> wrote:
>>
>> On Fri, 2020-04-03 at 15:42 -0400, Chuck Lever wrote:
>>> Commit 3832591e6fa5 ("SUNRPC: Handle connection issues correctly on
>>> the back channel") intended to make backchannel RPCs fail
>>> immediately when there is no forward channel connection. What's
>>> currently happening is, when the forward channel connection
>>> away, backchannel operations are causing hard loops because
>>> call_transmit_status's SOFTCONN logic ignores ENOTCONN.
>>
>> Won't RPC_TASK_NOCONNECT do the right thing? It should cause the
>> request to exit with an ENOTCONN error when it hits call_connect().
>
> OK, so does that mean SOFTCONN is entirely the wrong semantic here?
>
> Was RPC_TASK_NOCONNECT available when 3832591e6fa5 was merged?
It turns out 3832591e6fa5 is not related. It's 58255a4e3ce5 that added
RPC_TASK_SOFTCONN on NFSv4 callback Calls.
However, the server uses nfsd4_run_cb_work() for both NFSv4.0 and
NFSv4.1 callbacks. IMO a fix for this will have to take care that
RPC_TASK_NOCONNECT is not set on NFSv4.0 callback tasks.
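One hedged sketch of how that could look at the tail of
nfsd4_run_cb_work(): the minorversion test and the ops names follow the
nfsd code of this era, but treat it as illustrative, not a tested patch.

	int flags = RPC_TASK_SOFT;

	/* Assumption: NFSv4.0 callbacks must be able to establish
	 * their own connection, so they keep SOFTCONN; only
	 * session-based (NFSv4.1+) backchannels fail fast.
	 */
	if (clp->cl_minorversion)
		flags |= RPC_TASK_NOCONNECT;
	else
		flags |= RPC_TASK_SOFTCONN;

	rpc_call_async(clnt, &cb->cb_msg, flags,
		       cb->cb_ops ? &nfsd4_cb_ops : &nfsd4_cb_probe_ops, cb);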
Unfortunately the looping behavior also appears to be related to the
RPC_IS_SIGNALLED bug I reported earlier this week. The CB request is
signalled as it is firing up, and then drops into a hard loop.
nfsd-1986 [001] 123.028240: svc_drop: addr=192.168.2.51:52077 xid=0x489a88f6 flags=RQ_SECURE|RQ_USEDEFERRAL|RQ_SPLICE_OK|RQ_BUSY
kworker/u8:12-442 [003] 123.028242: rpc_task_signalled: task:47@3 flags=ASYNC|DYNAMIC|SOFT|NOCONNECT runstate=RUNNING|ACTIVE status=0 action=rpc_prepare_task
kworker/u8:10-440 [000] 123.028289: rpc_task_run_action: task:47@3 flags=ASYNC|DYNAMIC|SOFT|NOCONNECT runstate=RUNNING|ACTIVE|SIGNALLED status=0 action=rpc_prepare_task
kworker/u8:10-440 [000] 123.028289: rpc_task_run_action: task:47@3 flags=ASYNC|DYNAMIC|SOFT|NOCONNECT runstate=RUNNING|ACTIVE|SIGNALLED status=0 action=call_start
kworker/u8:10-440 [000] 123.028291: rpc_request: task:47@3 nfs4_cbv1 CB_RECALL (async)
kworker/u8:10-440 [000] 123.028291: rpc_task_run_action: task:47@3 flags=ASYNC|DYNAMIC|SOFT|NOCONNECT runstate=RUNNING|ACTIVE|SIGNALLED status=0 action=call_reserve
kworker/u8:10-440 [000] 123.028298: rpc_task_run_action: task:47@3 flags=ASYNC|DYNAMIC|SOFT|NOCONNECT runstate=RUNNING|ACTIVE|SIGNALLED status=0 action=call_reserveresult
kworker/u8:10-440 [000] 123.028299: rpc_task_run_action: task:47@3 flags=ASYNC|DYNAMIC|SOFT|NOCONNECT runstate=RUNNING|ACTIVE|SIGNALLED status=0 action=call_refresh
kworker/u8:10-440 [000] 123.028324: rpc_task_run_action: task:47@3 flags=ASYNC|DYNAMIC|SOFT|NOCONNECT runstate=RUNNING|ACTIVE|SIGNALLED status=0 action=call_refreshresult
kworker/u8:10-440 [000] 123.028324: rpc_task_run_action: task:47@3 flags=ASYNC|DYNAMIC|SOFT|NOCONNECT runstate=RUNNING|ACTIVE|SIGNALLED status=0 action=call_allocate
kworker/u8:10-440 [000] 123.028340: rpc_task_run_action: task:47@3 flags=ASYNC|DYNAMIC|SOFT|NOCONNECT runstate=RUNNING|ACTIVE|SIGNALLED status=0 action=call_encode
kworker/u8:10-440 [000] 123.028356: xprt_enq_xmit: task:47@3 xid=0x8b95e935 seqno=0 stage=4
kworker/u8:10-440 [000] 123.028357: rpc_task_run_action: task:47@3 flags=ASYNC|DYNAMIC|SOFT|NOCONNECT runstate=RUNNING|ACTIVE|NEED_XMIT|NEED_RECV|SIGNALLED status=0 action=call_transmit
kworker/u8:10-440 [000] 123.028363: xprt_reserve_cong: task:47@3 snd_task:47 cong=0 cwnd=256
kworker/u8:10-440 [000] 123.028365: xprt_transmit: task:47@3 xid=0x8b95e935 seqno=0 status=-512
kworker/u8:10-440 [000] 123.028368: xprt_release_cong: task:47@3 snd_task:4294967295 cong=0 cwnd=256
kworker/u8:10-440 [000] 123.028368: rpc_task_run_action: task:47@3 flags=ASYNC|DYNAMIC|SOFT|NOCONNECT runstate=RUNNING|ACTIVE|NEED_RECV|SIGNALLED status=0 action=call_transmit_status
kworker/u8:10-440 [000] 123.028371: rpc_task_sleep: task:47@3 flags=ASYNC|DYNAMIC|SOFT|NOCONNECT runstate=RUNNING|ACTIVE|NEED_RECV|SIGNALLED status=0 timeout=0 queue=xprt_pending
kworker/u8:10-440 [000] 123.028376: rpc_task_wakeup: task:47@3 flags=ASYNC|DYNAMIC|SOFT|NOCONNECT runstate=RUNNING|QUEUED|ACTIVE|NEED_RECV|SIGNALLED status=-107 timeout=9000 queue=xprt_pending
kworker/u8:10-440 [000] 123.028377: rpc_task_run_action: task:47@3 flags=ASYNC|DYNAMIC|SOFT|NOCONNECT runstate=RUNNING|ACTIVE|NEED_RECV|SIGNALLED status=-107 action=xprt_timer
kworker/u8:10-440 [000] 123.028377: rpc_task_run_action: task:47@3 flags=ASYNC|DYNAMIC|SOFT|NOCONNECT runstate=RUNNING|ACTIVE|NEED_RECV|SIGNALLED status=-107 action=call_status
kworker/u8:10-440 [000] 123.028378: rpc_call_status: task:47@3 status=-107
kworker/u8:10-440 [000] 123.028378: rpc_task_run_action: task:47@3 flags=ASYNC|DYNAMIC|SOFT|NOCONNECT runstate=RUNNING|ACTIVE|NEED_RECV|SIGNALLED status=0 action=call_encode
kworker/u8:10-440 [000] 123.028380: xprt_enq_xmit: task:47@3 xid=0x8b95e935 seqno=0 stage=4
kworker/u8:10-440 [000] 123.028380: rpc_task_run_action: task:47@3 flags=ASYNC|DYNAMIC|SOFT|NOCONNECT runstate=RUNNING|ACTIVE|NEED_XMIT|NEED_RECV|SIGNALLED status=0 action=call_transmit
kworker/u8:10-440 [000] 123.028381: xprt_reserve_cong: task:47@3 snd_task:47 cong=0 cwnd=256
kworker/u8:10-440 [000] 123.028381: xprt_transmit: task:47@3 xid=0x8b95e935 seqno=0 status=-512
kworker/u8:10-440 [000] 123.028382: xprt_release_cong: task:47@3 snd_task:4294967295 cong=0 cwnd=256
kworker/u8:10-440 [000] 123.028382: rpc_task_run_action: task:47@3 flags=ASYNC|DYNAMIC|SOFT|NOCONNECT runstate=RUNNING|ACTIVE|NEED_RECV|SIGNALLED status=0 action=call_transmit_status
If we want to avoid the early RPC_IS_SIGNALLED check in the scheduler,
I guess an alternative would be to have call_transmit_status() check for
RPC_TASK_SIGNALLED. That's not as broad a fix, but it would address
both the NULL and the CB looping cases, IIUC.
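Roughly like this, untested; RPC_SIGNALLED() and the -ERESTARTSYS value
mirror what rpc_signal_task() already uses:

	/* In call_transmit_status(): bail out of the retry group when
	 * the task has been signalled, instead of requeueing forever.
	 */
	case -ECONNRESET:
	case -ECONNABORTED:
	case -EADDRINUSE:
	case -ENOTCONN:
	case -EPIPE:
		if (RPC_SIGNALLED(task)) {
			rpc_call_rpcerror(task, -ERESTARTSYS);
			return;
		}
		task->tk_action = call_bind;
		task->tk_status = 0;
		break;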
--
Chuck Lever
On Fri, 2020-04-03 at 17:46 -0400, Chuck Lever wrote:
> > On Apr 3, 2020, at 4:05 PM, Chuck Lever <[email protected]>
> > wrote:
> >
> > Hi Trond, thanks for the look!
> >
> > > On Apr 3, 2020, at 4:00 PM, Trond Myklebust <
> > > [email protected]> wrote:
> > >
> > > On Fri, 2020-04-03 at 15:42 -0400, Chuck Lever wrote:
> > > > Commit 3832591e6fa5 ("SUNRPC: Handle connection issues
> > > > correctly on
> > > > the back channel") intended to make backchannel RPCs fail
> > > > immediately when there is no forward channel connection. What's
> > > > currently happening is, when the forward channel connection
> > > > goes
> > > > away, backchannel operations are causing hard loops because
> > > > call_transmit_status's SOFTCONN logic ignores ENOTCONN.
> > >
> > > Won't RPC_TASK_NOCONNECT do the right thing? It should cause the
> > > request to exit with an ENOTCONN error when it hits
> > > call_connect().
> >
> > OK, so does that mean SOFTCONN is entirely the wrong semantic here?
> >
> > Was RPC_TASK_NOCONNECT available when 3832591e6fa5 was merged?
>
> It turns out 3832591e6fa5 is not related. It's 58255a4e3ce5 that
> added
> RPC_TASK_SOFTCONN on NFSv4 callback Calls.
>
> However, the server uses nfsd4_run_cb_work() for both NFSv4.0 and
> NFSv4.1 callbacks. IMO a fix for this will have to take care that
> RPC_TASK_NOCONNECT is not set on NFSv4.0 callback tasks.
Possibly, but don't we really want to let such a NFSv4.0 request fail
and send another CB_NULL? There is already version-specific code in
nfsd4_process_cb_update() to set up the callback client.
> Unfortunately the looping behavior also appears to be related to the
> RPC_IS_SIGNALLED bug I reported earlier this week. The CB request is
> signalled as it is firing up, and then drops into a hard loop.
>
Hmm... Is there any reason why xprt_rdma_alloc_slot() should not be
checking if the transport is being shut down? After all, if the
transport is being destroyed here, then there is no point in allocating
a slot for that specific RDMA transport (it's not as if it can be
reused elsewhere).
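Something like the following, say. This is a sketch only: the
XPRT_CLOSING test and the error value are assumptions, not a tested
patch.

static void
xprt_rdma_alloc_slot(struct rpc_xprt *xprt, struct rpc_task *task)
{
	struct rpcrdma_xprt *r_xprt = rpcx_to_rdmax(xprt);
	struct rpcrdma_req *req;

	/* Hypothetical: refuse a slot while the transport is being
	 * torn down, rather than queuing a request that can never
	 * be transmitted.
	 */
	if (test_bit(XPRT_CLOSING, &xprt->state)) {
		task->tk_status = -ENOTCONN;
		return;
	}

	req = rpcrdma_buffer_get(&r_xprt->rx_buf);
	if (!req)
		goto out_sleep;
	task->tk_rqstp = &req->rl_slot;
	task->tk_status = 0;
	return;

out_sleep:
	set_bit(XPRT_CONGESTED, &xprt->state);
	rpc_sleep_on(&xprt->backlog, task, NULL);
	task->tk_status = -EAGAIN;
}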
--
Trond Myklebust
CTO, Hammerspace Inc
4300 El Camino Real, Suite 105
Los Altos, CA 94022
http://www.hammer.space
> On Apr 3, 2020, at 6:43 PM, Trond Myklebust <[email protected]> wrote:
>
> On Fri, 2020-04-03 at 17:46 -0400, Chuck Lever wrote:
>>> On Apr 3, 2020, at 4:05 PM, Chuck Lever <[email protected]>
>>> wrote:
>>>
>>> Hi Trond, thanks for the look!
>>>
>>>> On Apr 3, 2020, at 4:00 PM, Trond Myklebust <
>>>> [email protected]> wrote:
>>>>
>>>> On Fri, 2020-04-03 at 15:42 -0400, Chuck Lever wrote:
>>>>> Commit 3832591e6fa5 ("SUNRPC: Handle connection issues
>>>>> correctly on
>>>>> the back channel") intended to make backchannel RPCs fail
>>>>> immediately when there is no forward channel connection. What's
>>>>> currently happening is, when the forward channel connection
>>>>> goes
>>>>> away, backchannel operations are causing hard loops because
>>>>> call_transmit_status's SOFTCONN logic ignores ENOTCONN.
>>>>
>>>> Won't RPC_TASK_NOCONNECT do the right thing? It should cause the
>>>> request to exit with an ENOTCONN error when it hits
>>>> call_connect().
>>>
>>> OK, so does that mean SOFTCONN is entirely the wrong semantic here?
>>>
>>> Was RPC_TASK_NOCONNECT available when 3832591e6fa5 was merged?
>>
>> It turns out 3832591e6fa5 is not related. It's 58255a4e3ce5 that
>> added
>> RPC_TASK_SOFTCONN on NFSv4 callback Calls.
>>
>> However, the server uses nfsd4_run_cb_work() for both NFSv4.0 and
>> NFSv4.1 callbacks. IMO a fix for this will have to take care that
>> RPC_TASK_NOCONNECT is not set on NFSv4.0 callback tasks.
>
> Possibly, but don't we really want to let such a NFSv4.0 request fail
> and send another CB_NULL? There is already version-specific code in
> nfsd4_process_cb_update() to set up the callback client.
A not unreasonable conclusion. But it's hard to test the NFSv4.0 case,
since it's instability on the forward channel that is tickling this
problem. The NFSv4.0 callback connection is not affected by that.
Maybe Bruce has a thought? Otherwise we can try an unconditional
NOCONNECT for now. RPC_TASK_NOCONNECT was added three years after
58255a4e3ce5, fwiw...
>> Unfortunately the looping behavior also appears to be related to the
>> RPC_IS_SIGNALLED bug I reported earlier this week. The CB request is
>> signalled as it is firing up, and then drops into a hard loop.
>>
>
> Hmm... Is there any reason why xprt_rdma_alloc_slot() should not be
> checking if the transport is being shut down? After all, if the
> transport is being destroyed here, then there is no point in allocating
> a slot for that specific RDMA transport (it's not as if it can be
> reused elsewhere).
The trace for the NULL rqst I posted a couple days ago shows the
signal occurring after xprt_enq_xmit, which is well past the
call_reserve step.
I don't see any special SOFTCONN-related logic in call_reserveresult;
I suspect call_reserve is not currently set up to handle this case.
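For reference, here is an abridged sketch of call_reserveresult() from
this era (not verbatim). Any status other than -ENOMEM or -EAGAIN simply
terminates the task; nothing here is SOFTCONN- or shutdown-aware:

static void
call_reserveresult(struct rpc_task *task)
{
	int status = task->tk_status;

	task->tk_status = 0;
	if (status >= 0) {
		if (task->tk_rqstp) {
			task->tk_action = call_refresh;
			return;
		}
		rpc_call_rpcerror(task, -EIO);
		return;
	}

	switch (status) {
	case -ENOMEM:
		rpc_delay(task, HZ >> 2);
		/* fall through */
	case -EAGAIN:	/* woken up; retry */
		task->tk_action = call_retry_reserve;
		return;
	default:
		rpc_call_rpcerror(task, status);
	}
}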
--
Chuck Lever
On Fri, Apr 03, 2020 at 07:11:12PM -0400, Chuck Lever wrote:
>
>
> > On Apr 3, 2020, at 6:43 PM, Trond Myklebust <[email protected]> wrote:
> >
> > On Fri, 2020-04-03 at 17:46 -0400, Chuck Lever wrote:
> >>> On Apr 3, 2020, at 4:05 PM, Chuck Lever <[email protected]>
> >>> wrote:
> >>>
> >>> Hi Trond, thanks for the look!
> >>>
> >>>> On Apr 3, 2020, at 4:00 PM, Trond Myklebust <
> >>>> [email protected]> wrote:
> >>>>
> >>>> On Fri, 2020-04-03 at 15:42 -0400, Chuck Lever wrote:
> >>>>> Commit 3832591e6fa5 ("SUNRPC: Handle connection issues
> >>>>> correctly on
> >>>>> the back channel") intended to make backchannel RPCs fail
> >>>>> immediately when there is no forward channel connection. What's
> >>>>> currently happening is, when the forward channel connection
> >>>>> goes
> >>>>> away, backchannel operations are causing hard loops because
> >>>>> call_transmit_status's SOFTCONN logic ignores ENOTCONN.
> >>>>
> >>>> Won't RPC_TASK_NOCONNECT do the right thing? It should cause the
> >>>> request to exit with an ENOTCONN error when it hits
> >>>> call_connect().
> >>>
> >>> OK, so does that mean SOFTCONN is entirely the wrong semantic here?
> >>>
> >>> Was RPC_TASK_NOCONNECT available when 3832591e6fa5 was merged?
> >>
> >> It turns out 3832591e6fa5 is not related. It's 58255a4e3ce5 that
> >> added
> >> RPC_TASK_SOFTCONN on NFSv4 callback Calls.
> >>
> >> However, the server uses nfsd4_run_cb_work() for both NFSv4.0 and
> >> NFSv4.1 callbacks. IMO a fix for this will have to take care that
> >> RPC_TASK_NOCONNECT is not set on NFSv4.0 callback tasks.
> >
> > Possibly, but don't we really want to let such a NFSv4.0 request fail
> > and send another CB_NULL? There is already version-specific code in
> > nfsd4_process_cb_update() to set up the callback client.
>
> A not unreasonable conclusion. But it's hard to test the NFSv4.0 case,
> since it's instability on the forward channel that is tickling this
> problem. The NFSv4.0 callback connection is not affected by that.
>
> Maybe Bruce has a thought? Otherwise we can try an unconditional
> NOCONNECT for now. RPC_TASK_NOCONNECT was added three years after
> 58255a4e3ce5, fwiw...
I'm not really following the details here, but it makes sense to me that
we'd want nfsd to find out about lost connections as soon as possible, so it
can try to use another connection or inform the client (with session
flags or NFS4ERR_CB_PATH_DOWN) promptly. So NOCONNECT makes sense to
me.
--b.
> On Apr 3, 2020, at 7:11 PM, Chuck Lever <[email protected]> wrote:
>
>
>
>> On Apr 3, 2020, at 6:43 PM, Trond Myklebust <[email protected]> wrote:
>>
>> On Fri, 2020-04-03 at 17:46 -0400, Chuck Lever wrote:
>>>> On Apr 3, 2020, at 4:05 PM, Chuck Lever <[email protected]>
>>>> wrote:
>>>>
>>>> Hi Trond, thanks for the look!
>>>>
>>>>> On Apr 3, 2020, at 4:00 PM, Trond Myklebust <
>>>>> [email protected]> wrote:
>>>>>
>>>>> On Fri, 2020-04-03 at 15:42 -0400, Chuck Lever wrote:
>>>>>> Commit 3832591e6fa5 ("SUNRPC: Handle connection issues
>>>>>> correctly on
>>>>>> the back channel") intended to make backchannel RPCs fail
>>>>>> immediately when there is no forward channel connection. What's
>>>>>> currently happening is, when the forward channel connection
>>>>>> goes
>>>>>> away, backchannel operations are causing hard loops because
>>>>>> call_transmit_status's SOFTCONN logic ignores ENOTCONN.
>>>>>
>>>>> Won't RPC_TASK_NOCONNECT do the right thing? It should cause the
>>>>> request to exit with an ENOTCONN error when it hits
>>>>> call_connect().
>>>>
>>>> OK, so does that mean SOFTCONN is entirely the wrong semantic here?
>>>>
>>>> Was RPC_TASK_NOCONNECT available when 3832591e6fa5 was merged?
>>>
>>> It turns out 3832591e6fa5 is not related. It's 58255a4e3ce5 that
>>> added
>>> RPC_TASK_SOFTCONN on NFSv4 callback Calls.
>>>
>>> However, the server uses nfsd4_run_cb_work() for both NFSv4.0 and
>>> NFSv4.1 callbacks. IMO a fix for this will have to take care that
>>> RPC_TASK_NOCONNECT is not set on NFSv4.0 callback tasks.
>>
>> Possibly, but don't we really want to let such a NFSv4.0 request fail
>> and send another CB_NULL? There is already version-specific code in
>> nfsd4_process_cb_update() to set up the callback client.
>
> A not unreasonable conclusion. But it's hard to test the NFSv4.0 case,
> since it's instability on the forward channel that is tickling this
> problem. The NFSv4.0 callback connection is not affected by that.
>
> Maybe Bruce has a thought? Otherwise we can try an unconditional
> NOCONNECT for now. RPC_TASK_NOCONNECT was added three years after
> 58255a4e3ce5, fwiw...
I confirmed that the NFSv4.0 callback path does not tolerate using the
RPC_TASK_NOCONNECT flag in nfsd4_run_cb_work(). After replacing
SOFTCONN with NOCONNECT, the NFSv4.0 mount operates without using
delegation.
setup_callback_client creates the callback rpc_clnt with NOPING. The
current callback mechanism depends on the next RPC Call to initiate
connection establishment.
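(The relevant bit of setup_callback_client, abridged from
fs/nfsd/nfs4callback.c of this era; most fields omitted, so treat this
as a reminder rather than a full listing:)

	struct rpc_create_args args = {
		.program	= &cb_program,
		.flags		= (RPC_CLNT_CREATE_NOPING |
				   RPC_CLNT_CREATE_QUIET),
		/* address, timeout, and authflavor fields omitted */
	};
	struct rpc_clnt *client = rpc_create(&args);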
Setting NOCONNECT by itself is still not enough to prevent a soft
lockup, btw. The rpc_xprt for the backchannel is still marked
connected, so the NOCONNECT check in call_connect is skipped entirely
on subsequent retransmits. My fix now includes some new code to
ensure that the backchannel rpc_xprt is marked closed by
svc_delete_xprt.
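The shape of that new code is roughly the following; this is a
hypothetical sketch, so the helper name and hook placement are
illustrative and the posted patch may differ:

/* Called from svc_delete_xprt(): mark the paired backchannel
 * rpc_xprt disconnected, so that call_connect()'s xprt_connected()
 * fast path no longer hides the NOCONNECT check.
 */
static void svc_xprt_close_bc(struct svc_xprt *xprt)
{
	struct rpc_xprt *bc_xprt = xprt->xpt_bc_xprt;

	if (bc_xprt)
		xprt_disconnect_done(bc_xprt);
}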
I'll post the updated patch soon.
--
Chuck Lever