2020-04-01 19:39:25

by Chuck Lever III

[permalink] [raw]
Subject: [PATCH RFC] sunrpc: Ensure signalled RPC tasks exit

If an RPC task is signaled while it is running and the transport is
not connected, it will never sleep and never be terminated. This can
happen when a RPC transport is shut down: the remaining tasks are
signalled, but the transport is disconnected.

Signed-off-by: Chuck Lever <[email protected]>
---
net/sunrpc/sched.c | 14 ++++++--------
1 file changed, 6 insertions(+), 8 deletions(-)

Interested in comments and suggestions.

Nearly every time my NFS/RDMA client unmounts when using krb5, the
umount hangs (killably). I tracked it down to an NFSv3 NULL request
that is signalled but loops and does not exit.


diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
index 55e900255b0c..905c31f75593 100644
--- a/net/sunrpc/sched.c
+++ b/net/sunrpc/sched.c
@@ -905,6 +905,12 @@ static void __rpc_execute(struct rpc_task *task)
trace_rpc_task_run_action(task, do_action);
do_action(task);

+ if (RPC_SIGNALLED(task)) {
+ task->tk_rpc_status = -ERESTARTSYS;
+ rpc_exit(task, -ERESTARTSYS);
+ break;
+ }
+
/*
* Lockless check for whether task is sleeping or not.
*/
@@ -912,14 +918,6 @@ static void __rpc_execute(struct rpc_task *task)
continue;

/*
- * Signalled tasks should exit rather than sleep.
- */
- if (RPC_SIGNALLED(task)) {
- task->tk_rpc_status = -ERESTARTSYS;
- rpc_exit(task, -ERESTARTSYS);
- }
-
- /*
* The queue->lock protects against races with
* rpc_make_runnable().
*


2020-04-01 21:56:36

by Trond Myklebust

[permalink] [raw]
Subject: Re: [PATCH RFC] sunrpc: Ensure signalled RPC tasks exit

On Wed, 2020-04-01 at 15:37 -0400, Chuck Lever wrote:
> If an RPC task is signaled while it is running and the transport is
> not connected, it will never sleep and never be terminated. This can
> happen when a RPC transport is shut down: the remaining tasks are
> signalled, but the transport is disconnected.
>
> Signed-off-by: Chuck Lever <[email protected]>
> ---
> net/sunrpc/sched.c | 14 ++++++--------
> 1 file changed, 6 insertions(+), 8 deletions(-)
>
> Interested in comments and suggestions.
>
> Nearly every time my NFS/RDMA client unmounts when using krb5, the
> umount hangs (killably). I tracked it down to an NFSv3 NULL request
> that is signalled but loops and does not exit.
>
>
> diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
> index 55e900255b0c..905c31f75593 100644
> --- a/net/sunrpc/sched.c
> +++ b/net/sunrpc/sched.c
> @@ -905,6 +905,12 @@ static void __rpc_execute(struct rpc_task *task)
> trace_rpc_task_run_action(task, do_action);
> do_action(task);
>
> + if (RPC_SIGNALLED(task)) {
> + task->tk_rpc_status = -ERESTARTSYS;
> + rpc_exit(task, -ERESTARTSYS);
> + break;
> + }
> +

Hmm... I'd really prefer to avoid this kind of check in the tight loop.
Why is this NULL request looping?

> /*
> * Lockless check for whether task is sleeping or not.
> */
> @@ -912,14 +918,6 @@ static void __rpc_execute(struct rpc_task *task)
> continue;
>
> /*
> - * Signalled tasks should exit rather than sleep.
> - */
> - if (RPC_SIGNALLED(task)) {
> - task->tk_rpc_status = -ERESTARTSYS;
> - rpc_exit(task, -ERESTARTSYS);
> - }
> -
> - /*
> * The queue->lock protects against races with
> * rpc_make_runnable().
> *
>
--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]


2020-04-02 15:07:40

by Chuck Lever III

[permalink] [raw]
Subject: Re: [PATCH RFC] sunrpc: Ensure signalled RPC tasks exit



> On Apr 1, 2020, at 5:55 PM, Trond Myklebust <[email protected]> wrote:
>
> On Wed, 2020-04-01 at 15:37 -0400, Chuck Lever wrote:
>> If an RPC task is signaled while it is running and the transport is
>> not connected, it will never sleep and never be terminated. This can
>> happen when a RPC transport is shut down: the remaining tasks are
>> signalled, but the transport is disconnected.
>>
>> Signed-off-by: Chuck Lever <[email protected]>
>> ---
>> net/sunrpc/sched.c | 14 ++++++--------
>> 1 file changed, 6 insertions(+), 8 deletions(-)
>>
>> Interested in comments and suggestions.
>>
>> Nearly every time my NFS/RDMA client unmounts when using krb5, the
>> umount hangs (killably). I tracked it down to an NFSv3 NULL request
>> that is signalled but loops and does not exit.
>>
>>
>> diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
>> index 55e900255b0c..905c31f75593 100644
>> --- a/net/sunrpc/sched.c
>> +++ b/net/sunrpc/sched.c
>> @@ -905,6 +905,12 @@ static void __rpc_execute(struct rpc_task *task)
>> trace_rpc_task_run_action(task, do_action);
>> do_action(task);
>>
>> + if (RPC_SIGNALLED(task)) {
>> + task->tk_rpc_status = -ERESTARTSYS;
>> + rpc_exit(task, -ERESTARTSYS);
>> + break;
>> + }
>> +
>
> Hmm... I'd really prefer to avoid this kind of check in the tight loop.

Certainly performance is a concern, but compared to the indirect
function call directly preceding this check, it is small potatoes.
Adding an "unlikely()" to the RPC_SIGNALLED() macro might help
mitigate the execution pipeline bubble it would cause.


> Why is this NULL request looping?

Here's an ftrace log I captured a couple of days ago.

umount.nfs-7540 [004] 938.823064: rpc_task_begin: task:18647@1 flags=ASYNC|DYNAMIC|SOFT runstate=ACTIVE status=0 action=(nil)s
kworker/u25:0-6092 [000] 938.823073: rpc_task_run_action: task:18647@1 flags=ASYNC|DYNAMIC|SOFT runstate=RUNNING|ACTIVE status=0 action=call_start
kworker/u25:0-6092 [000] 938.823075: rpc_request: task:18647@1 nfsv3 NULL (async)
kworker/u25:0-6092 [000] 938.823076: rpc_task_run_action: task:18647@1 flags=ASYNC|DYNAMIC|SOFT runstate=RUNNING|ACTIVE status=0 action=call_reserve
kworker/u25:0-6092 [000] 938.823078: rpc_task_run_action: task:18647@1 flags=ASYNC|DYNAMIC|SOFT runstate=RUNNING|ACTIVE status=0 action=call_reserveresult
kworker/u25:0-6092 [000] 938.823079: rpc_task_run_action: task:18647@1 flags=ASYNC|DYNAMIC|SOFT runstate=RUNNING|ACTIVE status=0 action=call_refresh
kworker/u25:0-6092 [000] 938.823080: rpc_task_run_action: task:18647@1 flags=ASYNC|DYNAMIC|SOFT runstate=RUNNING|ACTIVE status=0 action=call_refreshresult
kworker/u25:0-6092 [000] 938.823081: rpc_task_run_action: task:18647@1 flags=ASYNC|DYNAMIC|SOFT runstate=RUNNING|ACTIVE status=0 action=call_allocate
kworker/u25:0-6092 [000] 938.823084: xprtrdma_op_allocate: task:18647@1 req=0xffff8884691ca000 (1624, 60)
kworker/u25:0-6092 [000] 938.823085: rpc_task_run_action: task:18647@1 flags=ASYNC|DYNAMIC|SOFT runstate=RUNNING|ACTIVE status=0 action=call_encode
kworker/u25:0-6092 [000] 938.823088: rpcgss_seqno: task:18647@1 xid=0xfd7e5fb8 seqno=21981
kworker/u25:0-6092 [000] 938.823117: xprt_enq_xmit: task:18647@1 xid=0xfd7e5fb8 seqno=21981 stage=4
kworker/u25:0-6092 [000] 938.823118: rpc_task_run_action: task:18647@1 flags=ASYNC|DYNAMIC|SOFT runstate=RUNNING|ACTIVE|NEED_XMIT|NEED_RECV status=0 action=call_transmit
umount.nfs-7540 [004] 938.823119: rpc_task_signalled: task:18647@1 flags=ASYNC|DYNAMIC|SOFT runstate=RUNNING|ACTIVE|NEED_XMIT|NEED_RECV status=0 action=call_transmit_status
kworker/u25:0-6092 [000] 938.823119: xprt_reserve_cong: task:18647@1 snd_task:18647 cong=0 cwnd=16384
kworker/u25:0-6092 [000] 938.823121: xprt_transmit: task:18647@1 xid=0xfd7e5fb8 seqno=21981 status=-512
kworker/u25:0-6092 [000] 938.823127: xprt_release_cong: task:18647@1 snd_task:4294967295 cong=0 cwnd=16384
kworker/u25:0-6092 [000] 938.823127: rpc_task_run_action: task:18647@1 flags=ASYNC|DYNAMIC|SOFT runstate=RUNNING|ACTIVE|NEED_RECV|SIGNALLED status=0 action=call_transmit_status
kworker/u25:0-6092 [000] 938.823129: rpc_task_sleep: task:18647@1 flags=ASYNC|DYNAMIC|SOFT runstate=RUNNING|ACTIVE|NEED_RECV|SIGNALLED status=0 timeout=0 queue=xprt_pending
kworker/u25:0-6092 [000] 938.823132: rpc_task_wakeup: task:18647@1 flags=ASYNC|DYNAMIC|SOFT runstate=RUNNING|QUEUED|ACTIVE|NEED_RECV|SIGNALLED status=-107 timeout=60000 queue=xprt_pending
kworker/u25:0-6092 [000] 938.823133: rpc_task_run_action: task:18647@1 flags=ASYNC|DYNAMIC|SOFT runstate=RUNNING|ACTIVE|NEED_RECV|SIGNALLED status=-107 action=xprt_timer
kworker/u25:0-6092 [000] 938.823133: rpc_task_run_action: task:18647@1 flags=ASYNC|DYNAMIC|SOFT runstate=RUNNING|ACTIVE|NEED_RECV|SIGNALLED status=-107 action=call_status
kworker/u25:0-6092 [000] 938.823136: rpc_call_status: task:18647@1 status=-107
kworker/u25:0-6092 [000] 938.823137: rpc_task_run_action: task:18647@1 flags=ASYNC|DYNAMIC|SOFT runstate=RUNNING|ACTIVE|NEED_RECV|SIGNALLED status=0 action=call_encode
kworker/u25:0-6092 [000] 938.823138: rpcgss_seqno: task:18647@1 xid=0xfd7e5fb8 seqno=21982
kworker/u25:0-6092 [000] 938.823146: xprt_enq_xmit: task:18647@1 xid=0xfd7e5fb8 seqno=21982 stage=4
kworker/u25:0-6092 [000] 938.823147: rpc_task_run_action: task:18647@1 flags=ASYNC|DYNAMIC|SOFT runstate=RUNNING|ACTIVE|NEED_XMIT|NEED_RECV|SIGNALLED status=0 action=call_transmit
kworker/u25:0-6092 [000] 938.823147: xprt_reserve_cong: task:18647@1 snd_task:18647 cong=0 cwnd=16384
kworker/u25:0-6092 [000] 938.823148: xprt_transmit: task:18647@1 xid=0xfd7e5fb8 seqno=21982 status=-512
kworker/u25:0-6092 [000] 938.823149: xprt_release_cong: task:18647@1 snd_task:4294967295 cong=0 cwnd=16384
kworker/u25:0-6092 [000] 938.823149: rpc_task_run_action: task:18647@1 flags=ASYNC|DYNAMIC|SOFT runstate=RUNNING|ACTIVE|NEED_RECV|SIGNALLED status=0 action=call_transmit_status
kworker/u25:0-6092 [000] 938.823150: rpc_task_sleep: task:18647@1 flags=ASYNC|DYNAMIC|SOFT runstate=RUNNING|ACTIVE|NEED_RECV|SIGNALLED status=0 timeout=0 queue=xprt_pending
kworker/u25:0-6092 [000] 938.823151: rpc_task_wakeup: task:18647@1 flags=ASYNC|DYNAMIC|SOFT runstate=RUNNING|QUEUED|ACTIVE|NEED_RECV|SIGNALLED status=-107 timeout=120000 queue=xprt_pending
kworker/u25:0-6092 [000] 938.823152: rpc_task_run_action: task:18647@1 flags=ASYNC|DYNAMIC|SOFT runstate=RUNNING|ACTIVE|NEED_RECV|SIGNALLED status=-107 action=xprt_timer
kworker/u25:0-6092 [000] 938.823152: rpc_task_run_action: task:18647@1 flags=ASYNC|DYNAMIC|SOFT runstate=RUNNING|ACTIVE|NEED_RECV|SIGNALLED status=-107 action=call_status
kworker/u25:0-6092 [000] 938.823153: rpc_call_status: task:18647@1 status=-107

And here, the looping continues.

My analysis is that the signal occurs while the task is running/active
in a section of code that continues to iterate on reconnecting without
checking for a signal. Note that the tail of xprt_transmit will not set
tk_status in this case, so it remains set to zero upon entry to
call_transmit_status.

I like the scheduler fix because I don't believe we can easily tell if
there might be other such critical sections in the client's state
machine where a loop might occur with no exit condition based on a
signal.

This fix closes the door on all such bugs.


--
Chuck Lever