Received: from userp2120.oracle.com ([156.151.31.85]:54284 "EHLO userp2120.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754918AbeFYRDS (ORCPT ); Mon, 25 Jun 2018 13:03:18 -0400
Subject: Re: [PATCH 2/2] nfsd: return ENOSPC if unable to allocate a session slot
To: Chuck Lever, Trond Myklebust
Cc: Bruce Fields, Linux NFS Mailing List
References: <1529598933-16506-1-git-send-email-manjunath.b.patil@oracle.com> <1529598933-16506-2-git-send-email-manjunath.b.patil@oracle.com> <20180622175416.GA7119@fieldses.org> <148E65CF-D3D4-4E43-A190-822C5F7824B9@gmail.com> <180d25ce5474539f15a84a23258d15c71ec11ad9.camel@hammerspace.com> <1131E2BE-162D-45BB-BC24-49097733ACC3@gmail.com>
From: Manjunath Patil
Message-ID: <3ab9ddf4-f51a-12f0-8d33-256c2bded552@oracle.com>
Date: Mon, 25 Jun 2018 10:03:10 -0700
MIME-Version: 1.0
In-Reply-To: <1131E2BE-162D-45BB-BC24-49097733ACC3@gmail.com>
Content-Type: text/plain; charset=utf-8; format=flowed
Sender: linux-nfs-owner@vger.kernel.org

On 6/25/2018 8:39 AM, Chuck Lever wrote:
>
>> On Jun 24, 2018, at 9:56 AM, Trond Myklebust wrote:
>>
>> On Sat, 2018-06-23 at 15:00 -0400, Chuck Lever wrote:
>>>> On Jun 22, 2018, at 6:31 PM, Trond Myklebust wrote:
>>>>
>>>> On Fri, 2018-06-22 at 17:49 -0400, Chuck Lever wrote:
>>>>> Hi Bruce-
>>>>>
>>>>>
>>>>>> On Jun 22, 2018, at 1:54 PM, J. Bruce Fields wrote:
>>>>>>
>>>>>> On Thu, Jun 21, 2018 at 04:35:33PM +0000, Manjunath Patil wrote:
>>>>>>> Presently nfserr_jukebox is being returned by nfsd for create_session
>>>>>>> request if server is unable to allocate a session slot. This may be
>>>>>>> treated as NFS4ERR_DELAY by the clients and which may continue to re-try
>>>>>>> create_session in loop leading NFSv4.1+ mounts in hung state. nfsd
>>>>>>> should return nfserr_nospc in this case as per rfc5661(section-18.36.4
>>>>>>> subpoint 4. Session creation).
>>>>>> I don't think the spec actually gives us an error that we can use to say
>>>>>> a CREATE_SESSION failed permanently for lack of resources.
>>>>> The current situation is that the server replies NFS4ERR_DELAY,
>>>>> and the client retries indefinitely. The goal is to let the
>>>>> client choose whether it wants to try the CREATE_SESSION again,
>>>>> try a different NFS version, or fail the mount request.
>>>>>
>>>>> Bill and I both looked at this section of RFC 5661. It seems to
>>>>> us that the use of NFS4ERR_NOSPC is appropriate and unambiguous
>>>>> in this situation, and it is an allowed status for the
>>>>> CREATE_SESSION operation. NFS4ERR_DELAY OTOH is not helpful.
>>>> There are a range of errors which we may need to handle by destroying
>>>> the session, and then creating a new one (mainly the ones where the
>>>> client and server slot handling get out of sync). That's why returning
>>>> NFS4ERR_NOSPC in response to CREATE_SESSION is unhelpful, and is why
>>>> the only sane response by the client will be to treat it as a temporary
>>>> error.
>>>> IOW: these patches will not be acceptable, even with a rewrite, as they
>>>> are based on a flawed assumption.
>>> Fair enough. We're not attached to any particular solution/fix.
>>>
>>> So let's take "recovery of an active mount" out of the picture
>>> for a moment.
>>>
>>> The narrow problem is behavioral: during initial contact with an
>>> unfamiliar server, the server can hold off a client indefinitely
>>> by sending NFS4ERR_DELAY for example until another client unmounts.
>>> We want to find a way to allow clients to make progress when a
>>> server is short of resources.
>>>
>>> It appears that the mount(2) system call does not return as long
>>> as the server is still returning NFS4ERR_DELAY. Possibly user
>>> space is never given an opportunity to stop retrying, and thus
>>> mount.nfs gets stuck.
>>>
>>> It appears that DELAY is OK for EXCHANGE_ID too.
>>> So if a server decides to return DELAY to EXCHANGE_ID, I wonder if our
>>> client's trunking detection would be hamstrung by one bad server...
>> The 'mount' program has the 'retry' option in order to set a timeout
>> for the mount operation itself. Is that option not working correctly?
> Manjunath will need to confirm that, but my understanding is that
> mount.nfs is not regaining control when the server returns DELAY
> to CREATE_SESSION. My conclusion was that mount(2) is not returning.
>
Yes, this is true. Even with retry set, the mount call blocks on the
client side indefinitely. On the wire I can see CREATE_SESSION and
NFS4ERR_DELAY exchanges happening continuously. I am not sure about the
effects, but an NFSv4.0 mount to the same server at this moment succeeds.

More information:
...
2144  09:54:32.473054 write(1, "mount.nfs: trying text-based opt"..., 113) = 113 <0.000337>
2144  09:54:32.473468 mount("10.211.47.123:/exports", "/NFSMNT", "nfs", 0, "retry=1,vers=4,minorversion=1,ad"...
2143  09:56:42.253947 <... wait4 resumed> 0x7fffb2e13ec8, 0, NULL) = ? ERESTARTSYS (To be restarted if SA_RESTART is set) <129.800036>
2143  09:56:42.254142 --- SIGINT {si_signo=SIGINT, si_code=SI_KERNEL} ---
...

The client mount call hangs here -
[] nfs_wait_client_init_complete+0x52/0xc0 [nfs]
[] nfs41_discover_server_trunking+0x6d/0xb0 [nfsv4]
[] nfs4_discover_server_trunking+0x82/0x2e0 [nfsv4]
[] nfs4_init_client+0x136/0x300 [nfsv4]
[] nfs_get_client+0x24f/0x2f0 [nfs]
[] nfs4_set_client+0x9f/0xf0 [nfsv4]
[] nfs4_create_server+0x13e/0x3b0 [nfsv4]
[] nfs4_remote_mount+0x32/0x60 [nfsv4]
[] mount_fs+0x3e/0x180
[] vfs_kern_mount+0x6b/0x110
[] nfs_do_root_mount+0x86/0xc0 [nfsv4]
[] nfs4_try_mount+0x44/0xc0 [nfsv4]
[] nfs_fs_mount+0x4cb/0xda0 [nfs]
[] mount_fs+0x3e/0x180
[] vfs_kern_mount+0x6b/0x110
[] do_mount+0x251/0xcf0
[] SyS_mount+0xa2/0x110
[] tracesys_phase2+0x6d/0x72
[] 0xffffffffffffffff

I have a setup to reproduce this. If you need any more info, please let
me know.

-Thanks,
Manjunath
>> If so, we should definitely fix that.
> My recollection is that mount.nfs polls, it does not set a timer
> signal. So it will call mount(2) repeatedly until either "retry"
> minutes has passed, or mount(2) succeeds. I don't think it will
> deal with mount(2) not returning, but I could be wrong about that.
>
> My preference would be to make the kernel more reliable (ie mount(2)
> fails immediately in this case). That gives mount.nfs some time to
> try other things (like, try the original mount again after a few
> moments, or fall back to NFSv4.0, or fail).
>
> We don't want mount.nfs to wait for the full retry= while doing
> nothing else. That would make this particular failure mode behave
> differently than all the other modes we have had, historically, IIUC.
>
> Also, I agree with Bruce that the server should make CREATE_SESSION
> less likely to fail. That would also benefit state recovery.
>
>
>> We might also want to look into making it take values < 1 minute. That
>> could be accomplished either by extending the syntax of the 'retry'
>> option (e.g.: 'retry=:') or by adding a new option
>> (e.g. 'sretry=').
>>
>> It would then be up to the caller of mount to decide the policy of what
>> to do after a timeout.
> I agree that the caller of mount(2) should be allowed to provide the
> policy.
>
>
>> Renegotiation downward to NFSv3 might be an
>> option, but it's not something that most people want to do in the case
>> where there are lots of clients competing for resources since that's
>> precisely the regime where the NFSv3 DRC scheme breaks down (lots of
>> disconnections, combined with a high turnover of DRC slots).
> --
> Chuck Lever
> chucklever@gmail.com
>
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
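
For illustration only, here is a minimal Python sketch of the polling
behavior Chuck describes above (call mount(2) repeatedly until "retry"
minutes have passed or the call succeeds). It is not the mount.nfs
source. The names poll_mount, attempt and pause are hypothetical, and
the 10-second pause between attempts is an assumption. It is meant to
show why a timeout enforced by polling cannot help when a single
attempt never returns.

```python
import time


def poll_mount(attempt, retry_minutes, now=time.monotonic,
               sleep=time.sleep, pause=10.0):
    """Poll attempt() until it succeeds or retry_minutes have elapsed.

    attempt is a hypothetical stand-in for one mount(2) call and must
    return True on success, False on failure. Returns a tuple of
    (succeeded, number_of_attempts).
    """
    deadline = now() + retry_minutes * 60
    attempts = 0
    while True:
        attempts += 1
        # No timer signal is armed here. The deadline is examined only
        # after attempt() returns, so an attempt that blocks forever
        # means the loop never regains control.
        if attempt():
            return True, attempts
        if now() >= deadline:
            return False, attempts
        sleep(pause)
```

If attempt() fails promptly, the loop can honor retry= and the caller
can then decide the policy (try again, fall back to NFSv4.0, or fail).
If attempt() never returns, as in the trace above, the deadline check
is never reached.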