Return-Path:
Received: from smtp-o-3.desy.de ([131.169.56.156]:37640 "EHLO smtp-o-3.desy.de"
        rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
        id S1751563AbbH0Gny (ORCPT ); Thu, 27 Aug 2015 02:43:54 -0400
Received: from smtp-map-3.desy.de (smtp-map-3.desy.de [131.169.56.68])
        by smtp-o-3.desy.de (DESY-O-3) with ESMTP id BD23D280738
        for ; Thu, 27 Aug 2015 08:43:52 +0200 (CEST)
Received: from ZITSWEEP1.win.desy.de (zitsweep1.win.desy.de [131.169.97.95])
        by smtp-map-3.desy.de (DESY_MAP_3) with ESMTP id AE75F1271
        for ; Thu, 27 Aug 2015 08:43:52 +0200 (MEST)
Date: Thu, 27 Aug 2015 08:43:51 +0200 (CEST)
From: "Mkrtchyan, Tigran"
To: "J. Bruce Fields"
Cc: Ulrich Gemkow , linux-nfs@vger.kernel.org
Message-ID: <824431189.4182121.1440657831497.JavaMail.zimbra@desy.de>
In-Reply-To: <20150825215456.GF8579@fieldses.org>
References: <201508241452.57718.ulrich.gemkow@ikr.uni-stuttgart.de>
        <20150824201401.GA401@fieldses.org>
        <201508251928.06201.ulrich.gemkow@ikr.uni-stuttgart.de>
        <20150825215456.GF8579@fieldses.org>
Subject: Re: NFSv4 mount fails on Sun Solaris 10 after reboot of client
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Sender: linux-nfs-owner@vger.kernel.org
List-ID:

----- Original Message -----
> From: "J. Bruce Fields"
> To: "Ulrich Gemkow"
> Cc: linux-nfs@vger.kernel.org
> Sent: Tuesday, August 25, 2015 11:54:56 PM
> Subject: Re: NFSv4 mount fails on Sun Solaris 10 after reboot of client

> On Tue, Aug 25, 2015 at 07:28:03PM +0200, Ulrich Gemkow wrote:
>> Hello Bruce,
>>
>> On Monday 24 August 2015 22:14:01 J. Bruce Fields wrote:
>> > On Mon, Aug 24, 2015 at 02:52:55PM +0200, Ulrich Gemkow wrote:
>> > > we have a weird problem with Linux NFSv4.0 Server (Vanilla
>> > > Kernel 4.1.6) and a Sun Solaris 10 client (all patches applied):
>> > >
>> > > When mounting a share on the Solaris client and then rebooting
>> > > the client without unmounting the share first, after the reboot
>> > > every attempt to mount the share again gives an I/O error on
>> > > the client and the mount fails.
>> > >
>> > > After a long time (several hours) the v4 mount suddenly works
>> > > again.
>> > >
>> > > Mounting a share with vers=2 works always even in times when
>> > > the v4 mount fails.
>> > >
>> > > So it seems the Linux NFSv4 server holds a state for the client
>> > > which prevents the re-mounting of the share and gives the
>> > > I/O-error on the client.
>> > >
>> > > We use NFSv4 without idmapd.
>> > >
>> > > Is there any tip how to debug or solve this?
>> >
>> > Best is probably to get a packet trace. So something like:
>> >
>> >     tcpdump -s0 -iem0 -wtmp.pcap
>> >
>> > and then try the client mount, then kill the tcpdump after the mount
>> > fails, and send us tmp.pcap. (And/or take a look at tmp.pcap yourself
>> > with wireshark. The interesting question is what kind of error the
>> > server is returning when the client tries the mount after reboot.)
>>
>> Thank you for your reply. The tcpdump is attached, the relevant
>> packets are 49..52. The error seems to be a SERVERFAULT. Can you
>> see more from the dump?
>>
>> Thanks again and best regards
>
> The SERVERFAULT is on SETCLIENTID_CONFIRM.
>
> In nfsd4_setclientid_confirm():
>
>         conf = find_confirmed_client(clid, false, nn);
>         unconf = find_unconfirmed_client(clid, false, nn);
>         /*
>          * We try hard to give out unique clientid's, so if we get an
>          * attempt to confirm the same clientid with a different cred,
>          * there's a bug somewhere. Let's charitably assume it's our
>          * bug.
>          */
>         status = nfserr_serverfault;
>         if (unconf && !same_creds(&unconf->cl_cred, &rqstp->rq_cred))
>                 goto out;
>         if (conf && !same_creds(&conf->cl_cred, &rqstp->rq_cred))
>                 goto out;
>
> The SETCLIENTID and SETCLIENTID_CONFIRM are done with identical
> auth_unix creds.
>
> The clientid that we're looking up there was returned from the previous
> SETCLIENTID, generated by this logic:
>
>         if (conf && same_verf(&conf->cl_verifier, &clverifier))
>                 /* case 1: probable callback update */
>                 copy_clid(new, conf);
>         else /* case 4 (new client) or cases 2, 3 (client reboot): */
>                 gen_clid(new, nn);
>
> So it should be a brand new clientid, unless the client was reusing the old
> verifier.
>
> So perhaps the client is sending the SETCLIENTID with a verifier set to what it
> used on the previous boot? That sounds like a client bug. The linux
> client uses a timestamp for the verifier, looks like the Solaris client
> might too. Is there some reason the clock on this client isn't
> advancing on reboot?

Probably NFS4ERR_STALE_CLIENTID is a better error code for this scenario.

Tigran.

>
> --b.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
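
For anyone following the verifier argument quoted above, here is a minimal sketch of just the clientid-selection branch, written as a standalone toy model. The names model_client, model_server, model_setclientid and model_confirm are hypothetical and exist only for this illustration. Verifiers and clientids are reduced to plain integers, and the credential checks from nfsd4_setclientid_confirm() are deliberately left out. It is not the nfsd implementation and should not be read as a diagnosis of the reported failure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for a per-client record; NOT the kernel's struct nfs4_client. */
struct model_client {
	bool     confirmed;  /* true once a confirm has been recorded */
	uint64_t verifier;   /* boot verifier the client sent in SETCLIENTID */
	uint64_t clientid;   /* shorthand id the server handed back */
};

/* Toy server state: one known client plus a counter for fresh ids. */
struct model_server {
	struct model_client conf;  /* the confirmed record, valid if conf.confirmed */
	uint64_t next_clientid;    /* next unused clientid (assumption of this model) */
};

/*
 * Mirrors the quoted branch: reuse the existing clientid only when a
 * confirmed record exists and the verifier is unchanged; otherwise
 * hand out a fresh clientid.
 */
uint64_t model_setclientid(struct model_server *srv, uint64_t verifier)
{
	assert(srv != NULL);
	if (srv->conf.confirmed && srv->conf.verifier == verifier)
		return srv->conf.clientid;  /* case 1: probable callback update */
	return srv->next_clientid++;        /* new client or client reboot */
}

/* Record a successful confirm; credential checks are omitted in this model. */
void model_confirm(struct model_server *srv, uint64_t verifier, uint64_t clientid)
{
	assert(srv != NULL);
	srv->conf.confirmed = true;
	srv->conf.verifier = verifier;
	srv->conf.clientid = clientid;
}
```

In this model, once a record has been confirmed for some verifier, a later model_setclientid() call with the same verifier gets the old clientid back, while any other verifier gets a fresh one. That is the behaviour the quoted reasoning relies on: a client whose verifier does not change across a reboot would be handed its pre-reboot clientid again.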