Date: Wed, 26 Aug 2015 16:09:40 -0400
From: "J. Bruce Fields"
To: Ulrich Gemkow
Cc: linux-nfs@vger.kernel.org
Subject: Re: NFSv4 mount fails on Sun Solaris 10 after reboot of client
Message-ID: <20150826200940.GE4161@fieldses.org>
In-Reply-To: <201508262154.24455.ulrich.gemkow@ikr.uni-stuttgart.de>

On Wed, Aug 26, 2015 at 09:54:22PM +0200, Ulrich Gemkow wrote:
> Hello Bruce,
>
> On Tuesday 25 August 2015 23:54:56 J. Bruce Fields wrote:
> > The SERVERFAULT is on SETCLIENTID_CONFIRM.
> >
> > In nfsd4_setclientid_confirm():
> >
> > 	conf = find_confirmed_client(clid, false, nn);
> > 	unconf = find_unconfirmed_client(clid, false, nn);
> > 	/*
> > 	 * We try hard to give out unique clientid's, so if we get an
> > 	 * attempt to confirm the same clientid with a different cred,
> > 	 * there's a bug somewhere.  Let's charitably assume it's our
> > 	 * bug.
> > 	 */
> > 	status = nfserr_serverfault;
> > 	if (unconf && !same_creds(&unconf->cl_cred, &rqstp->rq_cred))
> > 		goto out;
> > 	if (conf && !same_creds(&conf->cl_cred, &rqstp->rq_cred))
> > 		goto out;
> >
> > The SETCLIENTID and SETCLIENTID_CONFIRM are done with identical
> > auth_unix creds.
> >
> > The clientid that we're looking up there was returned from the previous
> > SETCLIENTID, generated by this logic:
> >
> > 	if (conf && same_verf(&conf->cl_verifier, &clverifier))
> > 		/* case 1: probable callback update */
> > 		copy_clid(new, conf);
> > 	else /* case 4 (new client) or cases 2, 3 (client reboot): */
> > 		gen_clid(new, nn);
> >
> > So it should be a brand new clientid, unless the client was reusing the old
> > verifier.
> >
> > So perhaps the client is sending the SETCLIENTID with a verifier set to what it
> > used on the previous boot?  That sounds like a client bug.  The Linux
> > client uses a timestamp for the verifier; it looks like the Solaris client
> > might too.  Is there some reason the clock on this client isn't
> > advancing on reboot?
>
> Thank you for the analysis. But the clock of the client advances
> regularly, as one would expect.

OK, thanks for checking that.

> The client is SPARC Solaris 10 with the latest patches
> applied - I cannot believe that this client has such a
> basic NFS bug.

To confirm or deny my hypothesis, I think what we want is a longer
capture that gets the failing SETCLIENTID_CONFIRM (as seen in the
previous capture) but also shows what clientid the client was using
before the reboot.  So the ideal would be something like:

	- start the capture
	- mount
	- create a file (I just want to make sure the client does at
	  least one open)
	- reboot the client
	- mount again, see the failure
	- stop the capture

> Can you think of any kind of server configuration bug
> (as said, vanilla Linux 4.1.6) that causes this error?
> The NFS server startup system is "self-made"...

I can't think of any offhand; if there's a server-side problem here I'd
suspect the code before the configuration.

--b.
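To illustrate the verifier hypothesis discussed above, here is a minimal sketch in C. It is a hypothetical, heavily simplified model of the SETCLIENTID clientid choice quoted from nfsd, not the nfsd source itself: `client_rec_t`, `choose_clid()`, `verf_from_time()` and the global counter are illustrative names only. It assumes, as described for the Linux client, that the verifier is derived from a boot timestamp, and that a non-NULL `conf` means a confirmed record already exists for this client.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model, not nfsd code: an NFSv4 verifier is an opaque 8 bytes. */
typedef struct {
    uint8_t data[8];
} verifier_t;

/* Hypothetical confirmed-client record: the verifier it was created with
 * and the clientid the server handed out for it. */
typedef struct {
    verifier_t verf;
    uint64_t   clid;
} client_rec_t;

/* Stand-in for gen_clid(): each call yields a never-before-used clientid. */
static uint64_t next_clid = 1;

static bool same_verf(const verifier_t *a, const verifier_t *b)
{
    return memcmp(a->data, b->data, sizeof a->data) == 0;
}

/* Returns the clientid a SETCLIENTID would hand back in this model. */
static uint64_t choose_clid(const client_rec_t *conf, const verifier_t *verf)
{
    if (conf && same_verf(&conf->verf, verf))
        return conf->clid;   /* case 1: probable callback update, reuse old id */
    return next_clid++;      /* cases 2-4: new client or reboot, fresh id */
}

/* Assumption: the client builds its verifier from a boot timestamp. */
static verifier_t verf_from_time(uint32_t sec, uint32_t nsec)
{
    verifier_t v;
    memcpy(v.data, &sec, sizeof sec);
    memcpy(v.data + 4, &nsec, sizeof nsec);
    return v;
}
```

In this model, a client whose verifier changes on every boot always takes the gen_clid() branch after a reboot; only a client that resends its previous boot's verifier lands in the copy_clid() branch and gets its old clientid back.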