LinuxLists.cc - [PATCH v2] nfs: don't retry detect_trunking with RPC_AUTH

2013-11-13 14:08:30

Subject: [PATCH v2] nfs: don't retry detect_trunking with RPC_AUTH_UNIX more than once

Currently, when we try to mount and get back NFS4ERR_CLID_IN_USE or
NFS4ERR_WRONGSEC, we create a new rpc_clnt and then try the call again.
There is no guarantee that doing so will work however, so we can end up
retrying the call in an infinite loop.

Worse yet, we create the new client using rpc_clone_client_set_auth,
which creates the new client as a child of the old one. Thus, we can end
up with a *very* long lineage of rpc_clnts. When we go to put all of the
references to them, we can end up with a long call chain that can smash
the stack as each rpc_free_client() call can recurse back into itself.

This patch fixes this by simply ensuring that the SETCLIENTID call will
only be retried in this situation if the last attempt did not use
RPC_AUTH_UNIX.

Note too that with this change, we don't need the (i > 2) check in the
-EACCES case since we now have a more reliable test as to whether we
should reattempt.

Cc: [email protected] # v3.10+
Cc: Chuck Lever <[email protected]>
Tested-by/Acked-by: Weston Andros Adamson <[email protected]>
Signed-off-by: Jeff Layton <[email protected]>
---
fs/nfs/nfs4state.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index c8e729d..6f04706 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -2093,10 +2093,15 @@ again:
nfs4_root_machine_cred(clp);
goto again;
}
- if (i > 2)
+ if (clnt->cl_auth->au_flavor == RPC_AUTH_UNIX)
break;
case -NFS4ERR_CLID_INUSE:
case -NFS4ERR_WRONGSEC:
+ /* No point in retrying if we already used RPC_AUTH_UNIX */
+ if (clnt->cl_auth->au_flavor == RPC_AUTH_UNIX) {
+ status = -EPERM;
+ break;
+ }
clnt = rpc_clone_client_set_auth(clnt, RPC_AUTH_UNIX);
if (IS_ERR(clnt)) {
status = PTR_ERR(clnt);
--
1.8.3.1

2013-11-13 20:48:05

by Weston Andros Adamson

[permalink] [raw]

Subject: Re: [PATCH v2] nfs: don't retry detect_trunking with RPC_AUTH_UNIX more than once

I tested v2 and everything looks good.

Also, it seems like I?m no longer able to reproduce the "NFS: nfs4_discover_server_trunking unhandled error -512. Exiting with error EIO? dmesg.

-dros

On Nov 13, 2013, at 9:08 AM, Jeff Layton <[email protected]> wrote:

> Currently, when we try to mount and get back NFS4ERR_CLID_IN_USE or
> NFS4ERR_WRONGSEC, we create a new rpc_clnt and then try the call again.
> There is no guarantee that doing so will work however, so we can end up
> retrying the call in an infinite loop.
>
> Worse yet, we create the new client using rpc_clone_client_set_auth,
> which creates the new client as a child of the old one. Thus, we can end
> up with a *very* long lineage of rpc_clnts. When we go to put all of the
> references to them, we can end up with a long call chain that can smash
> the stack as each rpc_free_client() call can recurse back into itself.
>
> This patch fixes this by simply ensuring that the SETCLIENTID call will
> only be retried in this situation if the last attempt did not use
> RPC_AUTH_UNIX.
>
> Note too that with this change, we don't need the (i > 2) check in the
> -EACCES case since we now have a more reliable test as to whether we
> should reattempt.
>
> Cc: [email protected] # v3.10+
> Cc: Chuck Lever <[email protected]>
> Tested-by/Acked-by: Weston Andros Adamson <[email protected]>
> Signed-off-by: Jeff Layton <[email protected]>
> ---
> fs/nfs/nfs4state.c | 7 ++++++-
> 1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
> index c8e729d..6f04706 100644
> --- a/fs/nfs/nfs4state.c
> +++ b/fs/nfs/nfs4state.c
> @@ -2093,10 +2093,15 @@ again:
> nfs4_root_machine_cred(clp);
> goto again;
> }
> - if (i > 2)
> + if (clnt->cl_auth->au_flavor == RPC_AUTH_UNIX)
> break;
> case -NFS4ERR_CLID_INUSE:
> case -NFS4ERR_WRONGSEC:
> + /* No point in retrying if we already used RPC_AUTH_UNIX */
> + if (clnt->cl_auth->au_flavor == RPC_AUTH_UNIX) {
> + status = -EPERM;
> + break;
> + }
> clnt = rpc_clone_client_set_auth(clnt, RPC_AUTH_UNIX);
> if (IS_ERR(clnt)) {
> status = PTR_ERR(clnt);
> --
> 1.8.3.1
>

2013-11-13 21:12:12

by Jeff Layton

[permalink] [raw]

Subject: Re: [PATCH v2] nfs: don't retry detect_trunking with RPC_AUTH_UNIX more than once

On Wed, 13 Nov 2013 20:48:04 +0000
Weston Andros Adamson <[email protected]> wrote:

> I tested v2 and everything looks good.
>

Thanks for testing it.

> Also, it seems like I?m no longer able to reproduce the "NFS: nfs4_discover_server_trunking unhandled error -512. Exiting with error EIO? dmesg.
>
> -dros
>

Interesting. I suspect that's due to the fact that we give up retrying
quickly enough now that you can't signal it fast enough there. It's
still probably reasonable to add in handling for -ERESTARTSYS in that
switch statement however. I imagine that it's still possible to race in
with a signal at the right time and trigger that message.

> On Nov 13, 2013, at 9:08 AM, Jeff Layton <[email protected]> wrote:
>
> > Currently, when we try to mount and get back NFS4ERR_CLID_IN_USE or
> > NFS4ERR_WRONGSEC, we create a new rpc_clnt and then try the call again.
> > There is no guarantee that doing so will work however, so we can end up
> > retrying the call in an infinite loop.
> >
> > Worse yet, we create the new client using rpc_clone_client_set_auth,
> > which creates the new client as a child of the old one. Thus, we can end
> > up with a *very* long lineage of rpc_clnts. When we go to put all of the
> > references to them, we can end up with a long call chain that can smash
> > the stack as each rpc_free_client() call can recurse back into itself.
> >
> > This patch fixes this by simply ensuring that the SETCLIENTID call will
> > only be retried in this situation if the last attempt did not use
> > RPC_AUTH_UNIX.
> >
> > Note too that with this change, we don't need the (i > 2) check in the
> > -EACCES case since we now have a more reliable test as to whether we
> > should reattempt.
> >
> > Cc: [email protected] # v3.10+
> > Cc: Chuck Lever <[email protected]>
> > Tested-by/Acked-by: Weston Andros Adamson <[email protected]>
> > Signed-off-by: Jeff Layton <[email protected]>
> > ---
> > fs/nfs/nfs4state.c | 7 ++++++-
> > 1 file changed, 6 insertions(+), 1 deletion(-)
> >
> > diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
> > index c8e729d..6f04706 100644
> > --- a/fs/nfs/nfs4state.c
> > +++ b/fs/nfs/nfs4state.c
> > @@ -2093,10 +2093,15 @@ again:
> > nfs4_root_machine_cred(clp);
> > goto again;
> > }
> > - if (i > 2)
> > + if (clnt->cl_auth->au_flavor == RPC_AUTH_UNIX)
> > break;
> > case -NFS4ERR_CLID_INUSE:
> > case -NFS4ERR_WRONGSEC:
> > + /* No point in retrying if we already used RPC_AUTH_UNIX */
> > + if (clnt->cl_auth->au_flavor == RPC_AUTH_UNIX) {
> > + status = -EPERM;
> > + break;
> > + }
> > clnt = rpc_clone_client_set_auth(clnt, RPC_AUTH_UNIX);
> > if (IS_ERR(clnt)) {
> > status = PTR_ERR(clnt);
> > --
> > 1.8.3.1
> >
>

--
Jeff Layton <[email protected]>

2013-11-13 21:33:37

by Weston Andros Adamson

[permalink] [raw]

Subject: Re: [PATCH v2] nfs: don't retry detect_trunking with RPC_AUTH_UNIX more than once

On Nov 13, 2013, at 4:12 PM, Jeff Layton <[email protected]> wrote:

> On Wed, 13 Nov 2013 20:48:04 +0000
> Weston Andros Adamson <[email protected]> wrote:
>
>> I tested v2 and everything looks good.
>>
>
> Thanks for testing it.
>
>> Also, it seems like I?m no longer able to reproduce the "NFS: nfs4_discover_server_trunking unhandled error -512. Exiting with error EIO? dmesg.
>>
>> -dros
>>
>
> Interesting. I suspect that's due to the fact that we give up retrying
> quickly enough now that you can't signal it fast enough there. It's
> still probably reasonable to add in handling for -ERESTARTSYS in that
> switch statement however. I imagine that it's still possible to race in
> with a signal at the right time and trigger that message.
>

I just realized that I?ve been testing with a locally modified gssd - with the fd leak fix I posted yesterday. This might have something to do with the gssd hang that when ^C?d will generate the line in dmesg.

-dros

>> On Nov 13, 2013, at 9:08 AM, Jeff Layton <[email protected]> wrote:
>>
>>> Currently, when we try to mount and get back NFS4ERR_CLID_IN_USE or
>>> NFS4ERR_WRONGSEC, we create a new rpc_clnt and then try the call again.
>>> There is no guarantee that doing so will work however, so we can end up
>>> retrying the call in an infinite loop.
>>>
>>> Worse yet, we create the new client using rpc_clone_client_set_auth,
>>> which creates the new client as a child of the old one. Thus, we can end
>>> up with a *very* long lineage of rpc_clnts. When we go to put all of the
>>> references to them, we can end up with a long call chain that can smash
>>> the stack as each rpc_free_client() call can recurse back into itself.
>>>
>>> This patch fixes this by simply ensuring that the SETCLIENTID call will
>>> only be retried in this situation if the last attempt did not use
>>> RPC_AUTH_UNIX.
>>>
>>> Note too that with this change, we don't need the (i > 2) check in the
>>> -EACCES case since we now have a more reliable test as to whether we
>>> should reattempt.
>>>
>>> Cc: [email protected] # v3.10+
>>> Cc: Chuck Lever <[email protected]>
>>> Tested-by/Acked-by: Weston Andros Adamson <[email protected]>
>>> Signed-off-by: Jeff Layton <[email protected]>
>>> ---
>>> fs/nfs/nfs4state.c | 7 ++++++-
>>> 1 file changed, 6 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
>>> index c8e729d..6f04706 100644
>>> --- a/fs/nfs/nfs4state.c
>>> +++ b/fs/nfs/nfs4state.c
>>> @@ -2093,10 +2093,15 @@ again:
>>> nfs4_root_machine_cred(clp);
>>> goto again;
>>> }
>>> - if (i > 2)
>>> + if (clnt->cl_auth->au_flavor == RPC_AUTH_UNIX)
>>> break;
>>> case -NFS4ERR_CLID_INUSE:
>>> case -NFS4ERR_WRONGSEC:
>>> + /* No point in retrying if we already used RPC_AUTH_UNIX */
>>> + if (clnt->cl_auth->au_flavor == RPC_AUTH_UNIX) {
>>> + status = -EPERM;
>>> + break;
>>> + }
>>> clnt = rpc_clone_client_set_auth(clnt, RPC_AUTH_UNIX);
>>> if (IS_ERR(clnt)) {
>>> status = PTR_ERR(clnt);
>>> --
>>> 1.8.3.1
>>>
>>
>
>
> --
> Jeff Layton <[email protected]>