2024-04-25 10:49:48

by Dan Aloni

Subject: [PATCH] sunrpc: fix NFSACL RPC retry on soft mount

Until 1b63a75180c6 ('SUNRPC: Refactor rpc_clone_client()'), in 2012,
`cl_timeout` was copied to the new client so that all mount parameters
propagated to NFSACL clients. Since that change, if the following mount
options are given:

soft,timeo=50,retrans=16,vers=3

The resultant NFSACL client receives:

cl_softrtry: 1
cl_timeout: to_initval=60000, to_maxval=60000, to_increment=0, to_retries=2, to_exponential=0

With these values, NFSACL operations are not retried during transient
network outages on a soft mount. Instead, a getacl call fails after
60 seconds with EIO.
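
To see why these values give up after 60 seconds, here is a minimal
user-space sketch of a major-timeout calculation, loosely modeled on what
the SUNRPC transport code does. The struct and function names are
hypothetical stand-ins, not kernel API. Values are treated as milliseconds,
which assumes HZ=1000, and the figures used for the mount in the usage note
are an assumption about how timeo=50,retrans=16 would translate, not
something stated in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the kernel's struct rpc_timeout. */
struct timeout_sketch {
	unsigned long to_initval;	/* first retransmit timeout */
	unsigned long to_maxval;	/* upper bound on the major timeout */
	unsigned long to_increment;	/* linear backoff step */
	unsigned int to_retries;	/* retransmits before a major timeout */
	int to_exponential;		/* non-zero: double instead of add */
};

/*
 * Time until a "major" timeout. On a soft mount the RPC is failed
 * once this much time has passed without a reply.
 */
unsigned long major_timeout(const struct timeout_sketch *to)
{
	unsigned long major;

	assert(to != NULL);
	major = to->to_initval;
	if (to->to_exponential)
		major <<= to->to_retries;
	else
		major += to->to_increment * to->to_retries;
	/* Clamp to the configured maximum. */
	if (major > to->to_maxval || major == 0)
		major = to->to_maxval;
	return major;
}
```

Under this sketch, the NFSACL values above (to_initval=60000, to_maxval=60000,
to_increment=0, to_retries=2) yield a major timeout equal to the initial
timeout, so the first timeout is already the final one. A linear setup such
as to_initval=5000, to_increment=5000, to_retries=16, to_maxval=85000 would
instead allow retransmits over 85 seconds.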

The simple fix is to pass the existing client's `cl_timeout` as the new
client timeout.
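
The pattern behind the fix can be sketched in plain C as follows. This is an
illustration only: every type and function name here is a hypothetical
stand-in, and the fallback logic is a simplification, not the kernel's
actual client-creation path. It shows a create-arguments struct whose
timeout pointer, when left NULL, causes the new client to fall back to a
default instead of inheriting the parent's settings.

```c
#include <stddef.h>

/* Hypothetical, trimmed-down timeout description. */
struct timeo_sketch {
	unsigned long to_initval;
	unsigned int to_retries;
};

/* Hypothetical creation arguments; NULL timeout means "use the default". */
struct create_args_sketch {
	const struct timeo_sketch *timeout;
};

struct client_sketch {
	const struct timeo_sketch *cl_timeout;
};

/* Assumed default, chosen to match the values seen on the NFSACL client. */
const struct timeo_sketch default_timeo = { 60000, 2 };

void client_init(struct client_sketch *clnt,
		 const struct create_args_sketch *args)
{
	clnt->cl_timeout = args->timeout ? args->timeout : &default_timeo;
}

/* Before the fix: the timeout is not passed on, so the default wins. */
void bind_new_program_before(const struct client_sketch *old,
			     struct client_sketch *clnt)
{
	struct create_args_sketch args = { .timeout = NULL };

	(void)old;	/* parent settings are ignored */
	client_init(clnt, &args);
}

/* After the fix: the parent's timeout is handed to the new client. */
void bind_new_program_after(const struct client_sketch *old,
			    struct client_sketch *clnt)
{
	struct create_args_sketch args = { .timeout = old->cl_timeout };

	client_init(clnt, &args);
}
```

With a parent client whose timeout carries 16 retries, the "before" variant
ends up with the default of 2 retries, while the "after" variant keeps the
parent's 16.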

Cc: Chuck Lever <[email protected]>
Cc: Benjamin Coddington <[email protected]>
Link: https://lore.kernel.org/all/[email protected]/T/
Fixes: 1b63a75180c6 ('SUNRPC: Refactor rpc_clone_client()')
Signed-off-by: Dan Aloni <[email protected]>
---
 net/sunrpc/clnt.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index cda0935a68c9..07ffd4ee695a 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -1068,6 +1068,7 @@ struct rpc_clnt *rpc_bind_new_program(struct rpc_clnt *old,
 		.version = vers,
 		.authflavor = old->cl_auth->au_flavor,
 		.cred = old->cl_cred,
+		.timeout = old->cl_timeout,
 	};
 	struct rpc_clnt *clnt;
 	int err;
--
2.39.3



2024-04-25 11:05:12

by Benjamin Coddington

Subject: Re: [PATCH] sunrpc: fix NFSACL RPC retry on soft mount

On 25 Apr 2024, at 6:49, Dan Aloni wrote:

> Until 1b63a75180c6 ('SUNRPC: Refactor rpc_clone_client()'), in 2012,
> `cl_timeout` was copied to the new client so that all mount parameters
> propagated to NFSACL clients. Since that change, if the following mount
> options are given:
>
> soft,timeo=50,retrans=16,vers=3
>
> The resultant NFSACL client receives:
>
> cl_softrtry: 1
> cl_timeout: to_initval=60000, to_maxval=60000, to_increment=0, to_retries=2, to_exponential=0
>
> With these values, NFSACL operations are not retried during transient
> network outages on a soft mount. Instead, a getacl call fails after
> 60 seconds with EIO.
>
> The simple fix is to pass the existing client's `cl_timeout` as the new
> client timeout.
>
> Cc: Chuck Lever <[email protected]>
> Cc: Benjamin Coddington <[email protected]>
> Link: https://lore.kernel.org/all/[email protected]/T/
> Fixes: 1b63a75180c6 ('SUNRPC: Refactor rpc_clone_client()')
> Signed-off-by: Dan Aloni <[email protected]>

This also affects the local rpcbind client, and makes the change in
6b996476f364 ("sunrpc: honor rpc_task's timeout value in rpcb_create()")
redundant. Just an observation, thanks for fixing this!

Reviewed-by: Benjamin Coddington <[email protected]>

Ben

> ---
>  net/sunrpc/clnt.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
> index cda0935a68c9..07ffd4ee695a 100644
> --- a/net/sunrpc/clnt.c
> +++ b/net/sunrpc/clnt.c
> @@ -1068,6 +1068,7 @@ struct rpc_clnt *rpc_bind_new_program(struct rpc_clnt *old,
>  		.version = vers,
>  		.authflavor = old->cl_auth->au_flavor,
>  		.cred = old->cl_cred,
> +		.timeout = old->cl_timeout,
>  	};
>  	struct rpc_clnt *clnt;
>  	int err;
> --
> 2.39.3