Date: Mon, 23 Apr 2012 08:00:12 -0400
From: Jeff Layton
To: Miklos Szeredi
Cc: Malahal Naineni, Steve Dickson, linux-fsdevel@vger.kernel.org,
 linux-nfs@vger.kernel.org, linux-kernel@vger.kernel.org,
 viro@zeniv.linux.org.uk, hch@infradead.org, michael.brantley@deshaw.com,
 sven.breuner@itwm.fraunhofer.de, chuck.lever@oracle.com,
 pstaubach@exagrid.com, bfields@fieldses.org, trond.myklebust@fys.uio.no,
 rees@umich.edu
Subject: Re: [PATCH RFC v3] vfs: make fstatat retry once on ESTALE errors from getattr call
Message-ID: <20120423080012.7c23ef24@tlielax.poochiereds.net>
References: <1334316311-22331-1-git-send-email-jlayton@redhat.com>
 <1334749927-26138-1-git-send-email-jlayton@redhat.com>
 <20120420104055.511e15bc@tlielax.poochiereds.net>
 <4F91C49D.8070908@RedHat.com>
 <20120420203725.GA3512@us.ibm.com>
 <20120420171314.73801874@corrin.poochiereds.net>

On Sun, 22 Apr 2012 07:40:57 +0200
Miklos Szeredi wrote:

> On Fri, Apr 20, 2012 at 11:13 PM, Jeff Layton wrote:
> > On Fri, 20 Apr 2012 15:37:26 -0500
> > Malahal Naineni wrote:
> >
> >> Steve Dickson [SteveD@redhat.com] wrote:
> >> > > 2) if we assume that it is fairly representative of one, how can we
> >> > > achieve retrying indefinitely with NFS, or at least some large finite
> >> > > amount?
> >> > The amount of looping would be pure speculation. If the problem can
> >> > not be handled by one simple retry I would say we simply pass the
> >> > error up to the app... It's an application issue...
> >>
> >> As someone said, ESTALE is an incorrect errno for a path based call.
> >> How about turning ESTALE into ENOENT after a retry or few retries?
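For illustration only, here is a minimal Python sketch of the two proposals quoted above: retry a path-based operation once on ESTALE, forcing revalidation on the second attempt (assuming the retry uses LOOKUP_REVAL, as Miklos's reply implies), and then either pass the error up to the application (Steve's suggestion) or map a still-stale result to ENOENT (Malahal's). The callable `op`, its `reval` flag, and the helper names `retry_once_on_estale` and `make_fake_op` are hypothetical; real VFS code is C and works on dentries, not path strings.

```python
import errno


def retry_once_on_estale(op, path, map_to_enoent=False):
    """Run op(path, reval), retrying once with reval=True on ESTALE.

    op is a hypothetical callable returning 0 or a negative errno, in the
    kernel's style; reval=True stands in for a lookup with LOOKUP_REVAL.
    """
    err = op(path, False)
    if err != -errno.ESTALE:
        return err
    # Second and final attempt: ask the lookup to revalidate the path.
    err = op(path, True)
    if err == -errno.ESTALE and map_to_enoent:
        # Malahal's proposal: report a still-stale handle as ENOENT.
        err = -errno.ENOENT
    # Otherwise (Steve's proposal) the error, stale or not, goes to the caller.
    return err


def make_fake_op(stale_calls):
    """Hypothetical stand-in op: stale for the first stale_calls calls."""
    calls = []

    def op(path, reval):
        calls.append(reval)
        return -errno.ESTALE if len(calls) <= stale_calls else 0

    return op, calls


# Stale on the first try only: the revalidating retry succeeds.
op, calls = make_fake_op(1)
print(retry_once_on_estale(op, "/mnt/nfs/file"), calls)  # 0 [False, True]
```

As the reply that follows points out, the ENOENT mapping is misleading when a file is repeatedly renamed over: the name never disappears from the server's namespace, yet both attempts can still see a stale handle.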
> >>
> >
> > It's not really the same thing. One could envision an application
> > that's repeatedly renaming a new file on top of another one. The file
> > is never missing from the namespace of the server, but you could still
> > end up getting an ESTALE.
> >
> > That would break other atomicity guarantees in an even worse way, IMO...
>
> For directory operations ESTALE *is* equivalent to ENOENT if already
> retrying with LOOKUP_REVAL. Think about it. Atomic replacement by
> another directory with rename(2) is not an excuse here actually.
> Local filesystems too can end up with IS_DEAD directory after lookup
> in that case.
>

Doesn't that violate POSIX? rename(2) is supposed to be atomic, and I
can't see any exception to that for directories. It seems like it ought
to be possible to eliminate that race for other filesystems as well, by
turning those cases into an ESTALE return and retrying again.

> For non directories we basically have getattr and setattr. NFSv4 can
> handle both without retries if we supply the name instead of the
> handle (i.e. i_op->getattr_by_name, i_op->setattr_by_name). Other
> protocols can do whatever they want, exponential backoff with limited
> number of retries, whatever.
>
> No looping required in the VFS.
>

Per-name operations for things like getattr and setattr would be nice.
They would also make things cleaner on CIFS, since there'd be less
conversion from dentry to path.

Note that we would still need to do a lookup, since these operations
have to update the inode attributes too. But that would likely avoid
the ESTALE problems with NFS, since we could retry within the lower fs
itself.

That said, there's more than just those two operations involved. We'd
need similar ones for other inode operations too:

    readlink
    permission
    *xattr

...and probably others, especially if you want to allow for more
resilient create/delete in the face of a stale directory.

-- 
Jeff Layton
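Miklos's alternative keeps any looping out of the VFS and leaves the retry policy to each filesystem, for example exponential backoff with a bounded number of retries. Below is a minimal Python sketch of such a policy, assuming a hypothetical filesystem-internal callable `op` that returns 0 or a negative errno. The name `retry_with_backoff` and the default limits are illustrative choices, not values proposed in this thread.

```python
import errno
import time


def retry_with_backoff(op, max_tries=4, base_delay=0.01, sleep=time.sleep):
    """Retry op() on ESTALE, doubling the delay between attempts.

    op is a hypothetical callable returning 0 or a negative errno. Any
    result other than -ESTALE is returned at once; after max_tries
    attempts the last error (still -ESTALE) is returned to the caller.
    """
    delay = base_delay
    err = op()
    for _ in range(max_tries - 1):
        if err != -errno.ESTALE:
            break
        sleep(delay)   # back off before the next over-the-wire attempt
        delay *= 2     # exponential growth: base, 2*base, 4*base, ...
        err = op()
    return err


# The first two attempts see a stale handle; the third succeeds.
attempts = iter([-errno.ESTALE, -errno.ESTALE, 0])
print(retry_with_backoff(lambda: next(attempts)))  # 0
```

Passing `sleep` in as a parameter keeps the sketch testable without real delays. The retry limit is what bounds how long a caller can be held up by a handle that stays stale.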