Return-Path: linux-nfs-owner@vger.kernel.org Received: from mail-vc0-f181.google.com ([209.85.220.181]:55899 "EHLO mail-vc0-f181.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753369AbbAUXAX convert rfc822-to-8bit (ORCPT ); Wed, 21 Jan 2015 18:00:23 -0500 Received: by mail-vc0-f181.google.com with SMTP id id10so3501286vcb.12 for ; Wed, 21 Jan 2015 15:00:22 -0800 (PST) MIME-Version: 1.0 In-Reply-To: References: Date: Wed, 21 Jan 2015 18:00:22 -0500 Message-ID: Subject: Re: race between inode eviction with delegation and opening same file From: Trond Myklebust To: Olga Kornievskaia Cc: linux-nfs Content-Type: text/plain; charset=UTF-8 Sender: linux-nfs-owner@vger.kernel.org List-ID: On Wed, Jan 21, 2015 at 5:48 PM, Trond Myklebust wrote: > Hi Olga, > > On Wed, Jan 21, 2015 at 5:29 PM, Olga Kornievskaia wrote: >> There is a race between return of a delegation (due to resource >> constraints and inode getting evicted) and an open for the same file >> because…. >> >> In nfs_inode_return_delegation_noreclaim() we detach that delegation >> from the inode first (nfs_inode_detach_delegation()) and then we send >> a delegreturn. In between of nfs_inode_detach_delegation() and >> nfs_do_return_delegation() an open request comes and it doesn’t see >> any delegations for this inode and thus it sends an open request. At >> the same time eviction thread is working on doing a delegreturn for >> this inode. The server gets request for the open first and returns a >> delegation for it (note: server will issue the same delegation stateid >> but different seqnum as the one currently being returned by the >> client). Then the server processes a delegreturn for this file (from >> its perspective delegation stateid is no longer valid). Yet, on the >> client side as it processes a reply from the server, it adds a “new” >> delegation to the inode and proceeds to use it for an IO request later >> which errors in BAD_STATEID. >> >> I don't see how we can make detaching the delegation and the return of >> it atomic. Instead I propose the following solution: >> I propose to examine the delegation we get on open. If the sequence >> number is not 0 and we don't have a delegation already then we must >> have experienced this race. Therefore, we should return the delegation >> (and ignore if this errors, as the server thinks there is no more >> delegation). Something like this. Disclaimer: it doesn't seem to be >> the actual fix (as it hangs the machine) so I'm still missing >> something .... >> >> >> diff --git a/fs/nfs/delegation.c b/fs/nfs/delegation.c >> index b76166b..01bba63 100644 >> --- a/fs/nfs/delegation.c >> +++ b/fs/nfs/delegation.c >> @@ -368,5 +368,12 @@ int nfs_inode_set_delegation(struct inode *inode, >> struct rpc_cred *cred, struct >> old_delegation, clp); >> if (freeme == NULL) >> goto out; >> + } else { >> + if (be32_to_cpu(delegation->stateid.seqid) > 0) { >> + nfs_do_return_delegation(inode, delegation, 0); >> + goto out; >> + } >> } >> list_add_rcu(&delegation->super_list, &server->delegations); >> nfsi->delegation_state = delegation->type; > > Shouldn't the seqid always be non-zero? AFAIK, RFC5661 reserves seqid > == 0 for the special case where the client is asking the server to > 'please substitute the most recent value' (see section 8.2.2). > Normally, the server is therefore supposed to start with seqid == 1, > and increment so that when the seqid wraps around, it skips over the > value '0'. BTW: We should probably add in a check to protect against servers that return seqid == 0. I suspect that could cause some really nasty behaviour if/when the client is trying to close a file or delegreturn while there is an on-the-wire OPEN. > Also, please note that nfs_do_return_delegation() may sleep, so you > cannot call it while holding a spinlock. -- Trond Myklebust Linux NFS client maintainer, PrimaryData trond.myklebust@primarydata.com