From: Weston Andros Adamson <dros@netapp.com>
To: Jeff Layton <jlayton@redhat.com>
CC: "Myklebust, Trond" <Trond.Myklebust@netapp.com>,
        linux-nfs list <linux-nfs@vger.kernel.org>,
        Chuck Lever <chuck.lever@oracle.com>
Subject: Re: [PATCH v2] nfs: don't retry detect_trunking with RPC_AUTH_UNIX
 more than once
Date: Wed, 13 Nov 2013 21:33:36 +0000
Message-ID: <72CA86AD-5543-43C8-A736-3D4118B0824B@netapp.com>
References: <1384351701-29572-1-git-send-email-jlayton@redhat.com>
 <EBCF8527-3031-4BFE-9559-9CAF50C4633D@netapp.com>
 <20131113161207.76c216fe@tlielax.poochiereds.net>
In-Reply-To: <20131113161207.76c216fe@tlielax.poochiereds.net>
Content-Type: text/plain; charset="Windows-1252"
MIME-Version: 1.0
Sender: linux-nfs-owner@vger.kernel.org


On Nov 13, 2013, at 4:12 PM, Jeff Layton <jlayton@redhat.com> wrote:

> On Wed, 13 Nov 2013 20:48:04 +0000
> Weston Andros Adamson <dros@netapp.com> wrote:
> 
>> I tested v2 and everything looks good.
>> 
> 
> Thanks for testing it.
> 
>> Also, it seems like I?m no longer able to reproduce the "NFS: nfs4_discover_server_trunking unhandled error -512. Exiting with error EIO? dmesg.
>> 
>> -dros
>> 
> 
> Interesting. I suspect that's due to the fact that we give up retrying
> quickly enough now that you can't signal it fast enough there. It's
> still probably reasonable to add in handling for -ERESTARTSYS in that
> switch statement however. I imagine that it's still possible to race in
> with a signal at the right time and trigger that message.
> 

I just realized that I?ve been testing with a locally modified gssd - with the fd leak fix I posted yesterday.  This might have something to do with the gssd hang that when ^C?d will generate the line in dmesg.

-dros


>> On Nov 13, 2013, at 9:08 AM, Jeff Layton <jlayton@redhat.com> wrote:
>> 
>>> Currently, when we try to mount and get back NFS4ERR_CLID_IN_USE or
>>> NFS4ERR_WRONGSEC, we create a new rpc_clnt and then try the call again.
>>> There is no guarantee that doing so will work however, so we can end up
>>> retrying the call in an infinite loop.
>>> 
>>> Worse yet, we create the new client using rpc_clone_client_set_auth,
>>> which creates the new client as a child of the old one. Thus, we can end
>>> up with a *very* long lineage of rpc_clnts. When we go to put all of the
>>> references to them, we can end up with a long call chain that can smash
>>> the stack as each rpc_free_client() call can recurse back into itself.
>>> 
>>> This patch fixes this by simply ensuring that the SETCLIENTID call will
>>> only be retried in this situation if the last attempt did not use
>>> RPC_AUTH_UNIX.
>>> 
>>> Note too that with this change, we don't need the (i > 2) check in the
>>> -EACCES case since we now have a more reliable test as to whether we
>>> should reattempt.
>>> 
>>> Cc: stable@vger.kernel.org # v3.10+
>>> Cc: Chuck Lever <chuck.lever@oracle.com>
>>> Tested-by/Acked-by: Weston Andros Adamson <dros@netapp.com>
>>> Signed-off-by: Jeff Layton <jlayton@redhat.com>
>>> ---
>>> fs/nfs/nfs4state.c | 7 ++++++-
>>> 1 file changed, 6 insertions(+), 1 deletion(-)
>>> 
>>> diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
>>> index c8e729d..6f04706 100644
>>> --- a/fs/nfs/nfs4state.c
>>> +++ b/fs/nfs/nfs4state.c
>>> @@ -2093,10 +2093,15 @@ again:
>>> 			nfs4_root_machine_cred(clp);
>>> 			goto again;
>>> 		}
>>> -		if (i > 2)
>>> +		if (clnt->cl_auth->au_flavor == RPC_AUTH_UNIX)
>>> 			break;
>>> 	case -NFS4ERR_CLID_INUSE:
>>> 	case -NFS4ERR_WRONGSEC:
>>> +		/* No point in retrying if we already used RPC_AUTH_UNIX */
>>> +		if (clnt->cl_auth->au_flavor == RPC_AUTH_UNIX) {
>>> +			status = -EPERM;
>>> +			break;
>>> +		}
>>> 		clnt = rpc_clone_client_set_auth(clnt, RPC_AUTH_UNIX);
>>> 		if (IS_ERR(clnt)) {
>>> 			status = PTR_ERR(clnt);
>>> -- 
>>> 1.8.3.1
>>> 
>> 
> 
> 
> -- 
> Jeff Layton <jlayton@redhat.com>