2021-07-14 15:50:54

by Chuck Lever

[permalink] [raw]
Subject: [PATCH RFC 0/4] Ensure RPC_TASK_NORTO is disabled for select operations

This is a set of patches I've been toying with to get better
responsiveness from a client when a transport remains connected but
the server is not returning RPC replies.

The approach I've taken is to disable RPC_TASK_NO_RETRANS_TIMEOUT
for a few particular operations to enable them to time out even
though the connection is still operational. It could be
appropriate to take this approach for any idempotent operation
that cannot be killed with a ^C.

The test is simply to mount the addled server with NFSv4.1, sec=sys,
and TCP. Both sides happen to have keytabs, so Kerberos is used to
protect lease management operations.

mount.nfs-1181 [004] 65.367559: rpc_request: task:1@1 nfsv4 NULL (sync)
mount.nfs-1181 [004] 65.369735: rpc_task_end: task:1@1 flags=NULLCREDS|DYNAMIC|SOFT|SOFTCONN|SENT|CRED_NOREF runstate=RUNNING|ACTIVE status=0 action=rpc_exit_task
mount.nfs-1181 [004] 65.369830: rpc_request: task:2@1 nfsv4 EXCHANGE_ID (sync)
mount.nfs-1181 [004] 65.616381: rpc_task_end: task:2@1 flags=DYNAMIC|NO_ROUND_ROBIN|SOFT|SENT|TIMEOUT|NORTO|CRED_NOREF runstate=RUNNING|ACTIVE status=0 action=rpc_exit_task
192.168.2.55-ma-1196 [009] 65.616467: rpc_request: task:3@1 nfsv4 EXCHANGE_ID (sync)
192.168.2.55-ma-1196 [009] 65.616706: rpc_task_end: task:3@1 flags=DYNAMIC|NO_ROUND_ROBIN|SOFT|SENT|TIMEOUT|NORTO|CRED_NOREF runstate=RUNNING|ACTIVE status=0 action=rpc_exit_task
192.168.2.55-ma-1196 [009] 65.616722: rpc_request: task:4@1 nfsv4 CREATE_SESSION (sync)
192.168.2.55-ma-1196 [009] 65.616963: rpc_task_end: task:4@1 flags=DYNAMIC|NO_ROUND_ROBIN|SOFT|SENT|TIMEOUT|NORTO|CRED_NOREF runstate=RUNNING|ACTIVE status=0 action=rpc_exit_task
192.168.2.55-ma-1196 [009] 65.616995: rpc_request: task:5@1 nfsv4 RECLAIM_COMPLETE (sync)
192.168.2.55-ma-1196 [009] 65.618621: rpc_task_end: task:5@1 flags=DYNAMIC|NO_ROUND_ROBIN|SOFT|SENT|NORTO|CRED_NOREF runstate=RUNNING|ACTIVE status=0 action=rpc_exit_task
mount.nfs-1181 [004] 65.618642: rpc_request: task:6@2 nfsv4 LOOKUP_ROOT (sync)
mount.nfs-1181 [004] 65.619940: rpc_task_end: task:6@2 flags=MOVEABLE|DYNAMIC|SENT|NORTO|CRED_NOREF runstate=RUNNING|ACTIVE status=0 action=rpc_exit_task
mount.nfs-1181 [004] 65.619951: rpc_request: task:7@2 nfsv4 SERVER_CAPS (sync)
mount.nfs-1181 [004] 65.620041: rpc_task_end: task:7@2 flags=MOVEABLE|DYNAMIC|SENT|NORTO|CRED_NOREF runstate=RUNNING|ACTIVE status=0 action=rpc_exit_task
mount.nfs-1181 [004] 65.620049: rpc_request: task:8@2 nfsv4 FSINFO (sync)

Server stops replying.

kworker/u26:0-47 [009] 70.944662: rpc_request: task:9@1 nfsv4 SEQUENCE (async)

Client wonders what's up.

mount.nfs-1181 [004] 76.367642: rpc_task_end: task:8@2 flags=MOVEABLE|DYNAMIC|SENT|NORTO|CRED_NOREF runstate=RUNNING|ACTIVE|NEED_RECV|SIGNALLED status=-512 action=rpc_exit_task

User ^C's the mount.nfs command.

kworker/u25:10-213 [000] 251.165539: rpc_task_end: task:9@1 flags=ASYNC|MOVEABLE|DYNAMIC|SOFT|SENT|TIMEOUT runstate=RUNNING|ACTIVE|NEED_RECV status=-110 action=rpc_exit_task
kworker/u25:10-213 [000] 251.165553: rpc_request: task:10@1 nfsv4 DESTROY_SESSION (sync)

Client attempts to tear down the nfs_client since there are no more
mounts of that server and the lease renewal query has finally timed
out.

kworker/u25:10-213 [001] 431.386315: rpc_task_end: task:10@1 flags=DYNAMIC|NO_ROUND_ROBIN|SOFT|SENT|TIMEOUT|CRED_NOREF runstate=RUNNING|ACTIVE|NEED_RECV status=-110 action=rpc_exit_task
kworker/u25:10-213 [001] 431.386339: rpc_request: task:11@1 nfsv4 DESTROY_CLIENTID (sync)
kworker/u25:10-213 [001] 611.607170: rpc_task_end: task:11@1 flags=DYNAMIC|NO_ROUND_ROBIN|SOFT|SENT|TIMEOUT|CRED_NOREF runstate=RUNNING|ACTIVE|NEED_RECV status=-110 action=rpc_exit_task
kworker/u25:1-1205 [000] 611.607253: rpc_request: task:12@1 nfsv4 NULL (async)

Client sends GSS_DESTROY_CTX for the lease management GSS context.

kworker/u25:1-1205 [000] 791.827916: rpc_task_end: task:12@1 flags=ASYNC|NULLCREDS|DYNAMIC|SOFT|SOFTCONN|SENT runstate=RUNNING|ACTIVE|NEED_RECV status=-5 action=rpc_exit_task
kworker/0:2-215 [000] 791.828351: xprt_destroy: peer=[192.168.2.55]:2049 state=LOCKED|CONNECTED|BOUND

It's not until task:12@1 exits that the client can finally complete
the destruction of the RPC transport and the nfs_client. It takes
about 12 minutes from the ^C before the client's state is finally
completely cleaned up.

NB: For NFS/RDMA mounts, during that time the client can't complete
a system shutdown because the rpc_xprt holds hardware resources that
block driver removal.

---

Chuck Lever (4):
NFS: Unset RPC_TASK_NO_RETRANS_TIMEOUT for async lease renewal
NFS: Unset RPC_TASK_NO_RETRANS_TIMEOUT for session/clientid destruction
SUNRPC: Refactor rpc_ping()
SUNRPC: Unset RPC_TASK_NO_RETRANS_TIMEOUT for NULL RPCs


fs/nfs/nfs4client.c | 1 +
fs/nfs/nfs4proc.c | 9 +++++++++
net/sunrpc/clnt.c | 33 ++++++++++++++++++++++++---------
3 files changed, 34 insertions(+), 9 deletions(-)

--
Chuck Lever


2021-07-14 15:50:55

by Chuck Lever

[permalink] [raw]
Subject: [PATCH RFC 4/4] SUNRPC: Unset RPC_TASK_NO_RETRANS_TIMEOUT for NULL RPCs

In some rare failure modes, the server is actually reading the
transport, but then just dropping the requests on the floor.
TCP_USER_TIMEOUT cannot detect that case.

Prevent such a stuck server from pinning client resources
indefinitely by ensuring that certain idempotent requests
(such as NULL) can time out even if the connection is still
operational.

Otherwise rpc_bind_new_program(), gss_destroy_cred(), or
rpc_clnt_test_and_add_xprt() can wait forever.

Signed-off-by: Chuck Lever <[email protected]>
---
net/sunrpc/clnt.c | 15 ++++++++++++++-
1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index ca2000d8cf64..d34737a8a68a 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -2694,6 +2694,18 @@ static const struct rpc_procinfo rpcproc_null = {
.p_decode = rpcproc_decode_null,
};

+static void
+rpc_null_call_prepare(struct rpc_task *task, void *data)
+{
+ task->tk_flags &= ~RPC_TASK_NO_RETRANS_TIMEOUT;
+ rpc_call_start(task);
+}
+
+static const struct rpc_call_ops rpc_null_ops = {
+ .rpc_call_prepare = rpc_null_call_prepare,
+ .rpc_call_done = rpc_default_callback,
+};
+
static
struct rpc_task *rpc_call_null_helper(struct rpc_clnt *clnt,
struct rpc_xprt *xprt, struct rpc_cred *cred, int flags,
@@ -2707,7 +2719,7 @@ struct rpc_task *rpc_call_null_helper(struct rpc_clnt *clnt,
.rpc_xprt = xprt,
.rpc_message = &msg,
.rpc_op_cred = cred,
- .callback_ops = (ops != NULL) ? ops : &rpc_default_ops,
+ .callback_ops = ops ?: &rpc_null_ops,
.callback_data = data,
.flags = flags | RPC_TASK_SOFT | RPC_TASK_SOFTCONN |
RPC_TASK_NULLCREDS,
@@ -2758,6 +2770,7 @@ static void rpc_cb_add_xprt_release(void *calldata)
}

static const struct rpc_call_ops rpc_cb_add_xprt_call_ops = {
+ .rpc_call_prepare = rpc_null_call_prepare,
.rpc_call_done = rpc_cb_add_xprt_done,
.rpc_release = rpc_cb_add_xprt_release,
};


2021-07-14 15:51:00

by Chuck Lever

[permalink] [raw]
Subject: [PATCH RFC 2/4] NFS: Unset RPC_TASK_NO_RETRANS_TIMEOUT for session/clientid destruction

In some rare failure modes, the server is actually reading the
transport, but then just dropping the requests on the floor.
TCP_USER_TIMEOUT cannot detect that case.

Prevent such a stuck server from pinning client resources
indefinitely by ensuring that session and client ID clean-up can
time out even if the connection is still operational.

Signed-off-by: Chuck Lever <[email protected]>
---
fs/nfs/nfs4client.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/fs/nfs/nfs4client.c b/fs/nfs/nfs4client.c
index 28431acd1230..c5032f784ac0 100644
--- a/fs/nfs/nfs4client.c
+++ b/fs/nfs/nfs4client.c
@@ -281,6 +281,7 @@ static void nfs4_destroy_callback(struct nfs_client *clp)

static void nfs4_shutdown_client(struct nfs_client *clp)
{
+ clp->cl_rpcclient->cl_noretranstimeo = 0;
if (__test_and_clear_bit(NFS_CS_RENEWD, &clp->cl_res_state))
nfs4_kill_renewd(clp);
clp->cl_mvops->shutdown_client(clp);


2021-07-14 15:51:01

by Chuck Lever

[permalink] [raw]
Subject: [PATCH RFC 3/4] SUNRPC: Refactor rpc_ping()

Make it use the rpc_null_call_helper() so that it can share the
new rpc_call_ops structure to be introduced in the next patch.

Signed-off-by: Chuck Lever <[email protected]>
---
net/sunrpc/clnt.c | 24 +++++++++++++-----------
1 file changed, 13 insertions(+), 11 deletions(-)

diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index 8b4de70e8ead..ca2000d8cf64 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -2694,17 +2694,6 @@ static const struct rpc_procinfo rpcproc_null = {
.p_decode = rpcproc_decode_null,
};

-static int rpc_ping(struct rpc_clnt *clnt)
-{
- struct rpc_message msg = {
- .rpc_proc = &rpcproc_null,
- };
- int err;
- err = rpc_call_sync(clnt, &msg, RPC_TASK_SOFT | RPC_TASK_SOFTCONN |
- RPC_TASK_NULLCREDS);
- return err;
-}
-
static
struct rpc_task *rpc_call_null_helper(struct rpc_clnt *clnt,
struct rpc_xprt *xprt, struct rpc_cred *cred, int flags,
@@ -2733,6 +2722,19 @@ struct rpc_task *rpc_call_null(struct rpc_clnt *clnt, struct rpc_cred *cred, int
}
EXPORT_SYMBOL_GPL(rpc_call_null);

+static int rpc_ping(struct rpc_clnt *clnt)
+{
+ struct rpc_task *task;
+ int status;
+
+ task = rpc_call_null_helper(clnt, NULL, NULL, 0, NULL, NULL);
+ if (IS_ERR(task))
+ return PTR_ERR(task);
+ status = task->tk_status;
+ rpc_put_task(task);
+ return status;
+}
+
struct rpc_cb_add_xprt_calldata {
struct rpc_xprt_switch *xps;
struct rpc_xprt *xprt;


2021-07-14 15:53:20

by Chuck Lever

[permalink] [raw]
Subject: [PATCH RFC 1/4] NFS: Unset RPC_TASK_NO_RETRANS_TIMEOUT for async lease renewal

In some rare failure modes, the server is actually reading the
transport, but then just dropping the requests on the floor.
TCP_USER_TIMEOUT cannot detect that case.

Prevent such a stuck server from pinning client resources
indefinitely by ensuring that async lease renewal requests can time
out even if the connection is still operational.

Signed-off-by: Chuck Lever <[email protected]>
---
fs/nfs/nfs4proc.c | 9 +++++++++
1 file changed, 9 insertions(+)

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index e1214bb6b7ee..346217f6a00b 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -5612,6 +5612,12 @@ struct nfs4_renewdata {
* nfs4_proc_async_renew(): This is not one of the nfs_rpc_ops; it is a special
* standalone procedure for queueing an asynchronous RENEW.
*/
+static void nfs4_renew_prepare(struct rpc_task *task, void *calldata)
+{
+ task->tk_flags &= ~RPC_TASK_NO_RETRANS_TIMEOUT;
+ rpc_call_start(task);
+}
+
static void nfs4_renew_release(void *calldata)
{
struct nfs4_renewdata *data = calldata;
@@ -5650,6 +5656,7 @@ static void nfs4_renew_done(struct rpc_task *task, void *calldata)
}

static const struct rpc_call_ops nfs4_renew_ops = {
+ .rpc_call_prepare = nfs4_renew_prepare,
.rpc_call_done = nfs4_renew_done,
.rpc_release = nfs4_renew_release,
};
@@ -9219,6 +9226,8 @@ static void nfs41_sequence_prepare(struct rpc_task *task, void *data)
struct nfs4_sequence_args *args;
struct nfs4_sequence_res *res;

+ task->tk_flags &= ~RPC_TASK_NO_RETRANS_TIMEOUT;
+
args = task->tk_msg.rpc_argp;
res = task->tk_msg.rpc_resp;



2021-07-14 15:59:37

by Trond Myklebust

[permalink] [raw]
Subject: Re: [PATCH RFC 2/4] NFS: Unset RPC_TASK_NO_RETRANS_TIMEOUT for session/clientid destruction

On Wed, 2021-07-14 at 11:50 -0400, Chuck Lever wrote:
> In some rare failure modes, the server is actually reading the
> transport, but then just dropping the requests on the floor.
> TCP_USER_TIMEOUT cannot detect that case.
>
> Prevent such a stuck server from pinning client resources
> indefinitely by ensuring that session and client ID clean-up can
> time out even if the connection is still operational.
>
> Signed-off-by: Chuck Lever <[email protected]>
> ---
>  fs/nfs/nfs4client.c |    1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/fs/nfs/nfs4client.c b/fs/nfs/nfs4client.c
> index 28431acd1230..c5032f784ac0 100644
> --- a/fs/nfs/nfs4client.c
> +++ b/fs/nfs/nfs4client.c
> @@ -281,6 +281,7 @@ static void nfs4_destroy_callback(struct
> nfs_client *clp)
>  
>  static void nfs4_shutdown_client(struct nfs_client *clp)
>  {
> +       clp->cl_rpcclient->cl_noretranstimeo = 0;
>         if (__test_and_clear_bit(NFS_CS_RENEWD, &clp->cl_res_state))
>                 nfs4_kill_renewd(clp);
>         clp->cl_mvops->shutdown_client(clp);
>
>

I can't see how this will help. Again, I suggest we rather turn off the
retransmission default for the RPC calls where the server can drop
stuff on the floor. That's really only the RPCSEC_GSS control
messages. 

Anything else is covered by the NFSv4 blanket ban on dropping RPC
calls.

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]


2021-07-14 16:01:13

by Trond Myklebust

[permalink] [raw]
Subject: Re: [PATCH RFC 1/4] NFS: Unset RPC_TASK_NO_RETRANS_TIMEOUT for async lease renewal

On Wed, 2021-07-14 at 11:50 -0400, Chuck Lever wrote:
> In some rare failure modes, the server is actually reading the
> transport, but then just dropping the requests on the floor.
> TCP_USER_TIMEOUT cannot detect that case.
>
> Prevent such a stuck server from pinning client resources
> indefinitely by ensuring that async lease renewal requests can time
> out even if the connection is still operational.
>
> Signed-off-by: Chuck Lever <[email protected]>
> ---
>  fs/nfs/nfs4proc.c |    9 +++++++++
>  1 file changed, 9 insertions(+)
>
> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> index e1214bb6b7ee..346217f6a00b 100644
> --- a/fs/nfs/nfs4proc.c
> +++ b/fs/nfs/nfs4proc.c
> @@ -5612,6 +5612,12 @@ struct nfs4_renewdata {
>   * nfs4_proc_async_renew(): This is not one of the nfs_rpc_ops; it
> is a special
>   * standalone procedure for queueing an asynchronous RENEW.
>   */
> +static void nfs4_renew_prepare(struct rpc_task *task, void
> *calldata)
> +{
> +       task->tk_flags &= ~RPC_TASK_NO_RETRANS_TIMEOUT;
> +       rpc_call_start(task);
> +}
> +
>  static void nfs4_renew_release(void *calldata)
>  {
>         struct nfs4_renewdata *data = calldata;
> @@ -5650,6 +5656,7 @@ static void nfs4_renew_done(struct rpc_task
> *task, void *calldata)
>  }
>  
>  static const struct rpc_call_ops nfs4_renew_ops = {
> +       .rpc_call_prepare = nfs4_renew_prepare,
>         .rpc_call_done = nfs4_renew_done,
>         .rpc_release = nfs4_renew_release,
>  };
> @@ -9219,6 +9226,8 @@ static void nfs41_sequence_prepare(struct
> rpc_task *task, void *data)
>         struct nfs4_sequence_args *args;
>         struct nfs4_sequence_res *res;
>  
> +       task->tk_flags &= ~RPC_TASK_NO_RETRANS_TIMEOUT;
> +
>         args = task->tk_msg.rpc_argp;
>         res = task->tk_msg.rpc_resp;
>  
>
>

This isn't necessary. The server isn't allowed to drop these calls on
the floor.

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]


2021-07-14 17:56:24

by Chuck Lever

[permalink] [raw]
Subject: Re: [PATCH RFC 1/4] NFS: Unset RPC_TASK_NO_RETRANS_TIMEOUT for async lease renewal



> On Jul 14, 2021, at 12:00 PM, Trond Myklebust <[email protected]> wrote:
>
> On Wed, 2021-07-14 at 11:50 -0400, Chuck Lever wrote:
>> In some rare failure modes, the server is actually reading the
>> transport, but then just dropping the requests on the floor.
>> TCP_USER_TIMEOUT cannot detect that case.
>>
>> Prevent such a stuck server from pinning client resources
>> indefinitely by ensuring that async lease renewal requests can time
>> out even if the connection is still operational.
>>
>> Signed-off-by: Chuck Lever <[email protected]>
>> ---
>> fs/nfs/nfs4proc.c | 9 +++++++++
>> 1 file changed, 9 insertions(+)
>>
>> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
>> index e1214bb6b7ee..346217f6a00b 100644
>> --- a/fs/nfs/nfs4proc.c
>> +++ b/fs/nfs/nfs4proc.c
>> @@ -5612,6 +5612,12 @@ struct nfs4_renewdata {
>> * nfs4_proc_async_renew(): This is not one of the nfs_rpc_ops; it
>> is a special
>> * standalone procedure for queueing an asynchronous RENEW.
>> */
>> +static void nfs4_renew_prepare(struct rpc_task *task, void
>> *calldata)
>> +{
>> + task->tk_flags &= ~RPC_TASK_NO_RETRANS_TIMEOUT;
>> + rpc_call_start(task);
>> +}
>> +
>> static void nfs4_renew_release(void *calldata)
>> {
>> struct nfs4_renewdata *data = calldata;
>> @@ -5650,6 +5656,7 @@ static void nfs4_renew_done(struct rpc_task
>> *task, void *calldata)
>> }
>>
>> static const struct rpc_call_ops nfs4_renew_ops = {
>> + .rpc_call_prepare = nfs4_renew_prepare,
>> .rpc_call_done = nfs4_renew_done,
>> .rpc_release = nfs4_renew_release,
>> };
>> @@ -9219,6 +9226,8 @@ static void nfs41_sequence_prepare(struct
>> rpc_task *task, void *data)
>> struct nfs4_sequence_args *args;
>> struct nfs4_sequence_res *res;
>>
>> + task->tk_flags &= ~RPC_TASK_NO_RETRANS_TIMEOUT;
>> +
>> args = task->tk_msg.rpc_argp;
>> res = task->tk_msg.rpc_resp;
>
> This isn't necessary. The server isn't allowed to drop these calls on
> the floor.

Again, a server bug, a misconfiguration, a dependence on an
inoperative network service (like DNS), a weird crash, or even
a malicious server can pin client resources. This is plainly a
denial of service. Clients cannot depend on "the server is not
allowed to" if they are to be considered secure.

I'm not suggesting that the Linux client should make a heroic
effort to operate normally when a server behaves like this. I
_am_ suggesting that the client should protect itself and its
users by not pinning its own resources when a server is
behaving bizarrely.

I can drop 1/4 & 2/4 for the moment, but I don't agree that
3/4 & 4/4 alone are adequate to resolve the denial of service.

----

Now with regard to the spec requirement, I believe it might
be under-specified. It's at least problematic.

This is from RFC 8881 Section 2.9.2, and is referring in
particular to non-NULL NFSv4 operations:

> A replier MUST NOT silently drop a request, even if the request is a retry. (The silent drop behavior of RPCSEC_GSS [4] does not apply because this behavior happens at the RPCSEC_GSS layer, a lower layer in the request processing.) Instead, the replier SHOULD return an appropriate error (see Section 2.10.6.1), or it MAY disconnect the connection.


It states that the server cannot drop a request, but the text
does /not/ literally mandate that the only signal for a lost
request is connection loss -- that part is only a MAY.

Further the use of SHOULD suggests that a session error is the
preferred mechanism for signaling such a loss. The use of MAY
indicates that connection termination is permissible but not
preferred.

Moreover, the text is not careful to state that these two options
are exhaustive. It leaves open the possibility for other server
responses in this situation. (I recognize that might not have
been the intent of the authors, but that's certainly how it
reads a decade later).

Let's not even get into the carve-out for RPCSEC_GSS silent drops.

Thus I don't think we can lean on the current spec to ensure that
a server will always signal the loss of a reply by disconnecting,
especially in cases where there is no session. Further down in
ss2.9.2, we have this one-sentence paragraph:

> In addition, as described in Section 2.10.6.2, while a session is active, the NFSv4.1 requester MUST NOT stop waiting for a reply.


So what about when there is no active session? For example, what
about EXCHANGE_ID, CREATE_SESSION, BIND_CONN_TO_SESSION, and
DESTROY_CLIENTID ? I would argue that a high quality client
implementation will not wait indefinitely for the completion of
these operations.


--
Chuck Lever



2021-07-14 18:29:34

by Trond Myklebust

[permalink] [raw]
Subject: Re: [PATCH RFC 1/4] NFS: Unset RPC_TASK_NO_RETRANS_TIMEOUT for async lease renewal

On Wed, 2021-07-14 at 17:53 +0000, Chuck Lever III wrote:
>
>
> > On Jul 14, 2021, at 12:00 PM, Trond Myklebust
> > <[email protected]> wrote:
> >
> > On Wed, 2021-07-14 at 11:50 -0400, Chuck Lever wrote:
> > > In some rare failure modes, the server is actually reading the
> > > transport, but then just dropping the requests on the floor.
> > > TCP_USER_TIMEOUT cannot detect that case.
> > >
> > > Prevent such a stuck server from pinning client resources
> > > indefinitely by ensuring that async lease renewal requests can
> > > time
> > > out even if the connection is still operational.
> > >
> > > Signed-off-by: Chuck Lever <[email protected]>
> > > ---
> > >  fs/nfs/nfs4proc.c |    9 +++++++++
> > >  1 file changed, 9 insertions(+)
> > >
> > > diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> > > index e1214bb6b7ee..346217f6a00b 100644
> > > --- a/fs/nfs/nfs4proc.c
> > > +++ b/fs/nfs/nfs4proc.c
> > > @@ -5612,6 +5612,12 @@ struct nfs4_renewdata {
> > >   * nfs4_proc_async_renew(): This is not one of the nfs_rpc_ops;
> > > it
> > > is a special
> > >   * standalone procedure for queueing an asynchronous RENEW.
> > >   */
> > > +static void nfs4_renew_prepare(struct rpc_task *task, void
> > > *calldata)
> > > +{
> > > +       task->tk_flags &= ~RPC_TASK_NO_RETRANS_TIMEOUT;
> > > +       rpc_call_start(task);
> > > +}
> > > +
> > >  static void nfs4_renew_release(void *calldata)
> > >  {
> > >         struct nfs4_renewdata *data = calldata;
> > > @@ -5650,6 +5656,7 @@ static void nfs4_renew_done(struct rpc_task
> > > *task, void *calldata)
> > >  }
> > >  
> > >  static const struct rpc_call_ops nfs4_renew_ops = {
> > > +       .rpc_call_prepare = nfs4_renew_prepare,
> > >         .rpc_call_done = nfs4_renew_done,
> > >         .rpc_release = nfs4_renew_release,
> > >  };
> > > @@ -9219,6 +9226,8 @@ static void nfs41_sequence_prepare(struct
> > > rpc_task *task, void *data)
> > >         struct nfs4_sequence_args *args;
> > >         struct nfs4_sequence_res *res;
> > >  
> > > +       task->tk_flags &= ~RPC_TASK_NO_RETRANS_TIMEOUT;
> > > +
> > >         args = task->tk_msg.rpc_argp;
> > >         res = task->tk_msg.rpc_resp;
> >
> > This isn't necessary. The server isn't allowed to drop these calls
> > on
> > the floor.
>
> Again, a server bug, a misconfiguration, a dependence on an
> inoperative network service (like DNS), a weird crash, or even
> a malicious server can pin client resources. This is plainly a
> denial of service. Clients cannot depend on "the server is not
> allowed to" if they are to be considered secure.
>
> I'm not suggesting that the Linux client should make a heroic
> effort to operate normally when a server behaves like this. I
> _am_ suggesting that the client should protect itself and its
> users by not pinning its own resources when a server is
> behaving bizarrely.
>
> I can drop 1/4 & 2/4 for the moment, but I don't agree that
> 3/4 & 4/4 alone are adequate to resolve the denial of service.
>
> ----
>
> Now with regard to the spec requirement, I believe it might
> be under-specified. It's at least problematic.
>
> This is from RFC 8881 Section 2.9.2, and is referring in
> particular to non-NULL NFSv4 operations:
>
> > A replier MUST NOT silently drop a request, even if the request is
> > a retry. (The silent drop behavior of RPCSEC_GSS [4] does not apply
> > because this behavior happens at the RPCSEC_GSS layer, a lower
> > layer in the request processing.) Instead, the replier SHOULD
> > return an appropriate error (see Section 2.10.6.1), or it MAY
> > disconnect the connection.
>
>
> It states that the server cannot drop a request, but the text
> does /not/ literally mandate that the only signal for a lost
> request is connection loss -- that part is only a MAY.

That's irrelevant. The normative behaviour is that requests MUST NOT be
dropped. Since no options are listed other than replying or dropping
the connection, I'm not seeing how else the server can meet the
requirement. The SHOULD and MAY are just there to emphasise which is
the preferred behaviour.

>
> Further the use of SHOULD suggests that a session error is the
> preferred mechanism for signaling such a loss. The use of MAY
> indicates that connection termination is permissible but not
> preferred.
>
> Moreover, the text is not careful to state that these two options
> are exhaustive. It leaves open the possibility for other server
> responses in this situation. (I recognize that might not have
> been the intent of the authors, but that's certainly how it
> reads a decade later).
>
> Let's not even get into the carve-out for RPCSEC_GSS silent drops.
>
> Thus I don't think we can lean on the current spec to ensure that
> a server will always signal the loss of a reply by disconnecting,
> especially in cases where there is no session. Further down in
> ss2.9.2, we have this one-sentence paragraph:
>
> > In addition, as described in Section 2.10.6.2, while a session is
> > active, the NFSv4.1 requester MUST NOT stop waiting for a reply.
>
>
> So what about when there is no active session? For example, what
> about EXCHANGE_ID, CREATE_SESSION, BIND_CONN_TO_SESSION, and
> DESTROY_CLIENTID ? I would argue that a high quality client
> implementation will not wait indefinitely for the completion of
> these operations.
>

Sorry, but no! We're NOT working under the assumption that servers are
broken. We fix what is necessary in order to make the client spec-
adherent. That is all.

If the server is up, and is reading the socket, then we assume it is a
NFSv4 server, and we assume that it is not broken. If it starts getting
picky about which operations is will or will not reply to, then it is
by definition broken because it is in violation of the normative
behaviour for non-NULL NFSv4 operations.

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]