Return-Path: linux-nfs-owner@vger.kernel.org
Received: from mx1.redhat.com ([209.132.183.28]:20315 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756060Ab2DQNKq (ORCPT ); Tue, 17 Apr 2012 09:10:46 -0400
Message-ID: <4F8D580B.7060104@RedHat.com>
Date: Tue, 17 Apr 2012 07:46:19 -0400
From: Steve Dickson
MIME-Version: 1.0
To: Jeff Layton
CC: "Myklebust, Trond" , Bernd Schubert , Malahal Naineni , "linux-nfs@vger.kernel.org" , "linux-fsdevel@vger.kernel.org" , "linux-kernel@vger.kernel.org" , "pstaubach@exagrid.com" , "miklos@szeredi.hu" , "viro@ZenIV.linux.org.uk" , "hch@infradead.org" , "michael.brantley@deshaw.com" , "sven.breuner@itwm.fraunhofer.de"
Subject: Re: [PATCH RFC] vfs: make fstatat retry on ESTALE errors from getattr call
References: <1334316311-22331-1-git-send-email-jlayton@redhat.com> <20120413150518.GA1987@us.ibm.com> <20120413114236.0e557e01@tlielax.poochiereds.net> <4F8B1B7B.3040304@itwm.fraunhofer.de> <20120416073655.7cdb90cf@corrin.poochiereds.net> <4F8C3036.2030702@itwm.fraunhofer.de> <20120416134642.1754cd3e@corrin.poochiereds.net> <1334604785.2879.23.camel@lade.trondhjem.org> <20120416154322.0d95e435@corrin.poochiereds.net> <1334607906.2879.36.camel@lade.trondhjem.org> <20120416190548.2463d1d0@corrin.poochiereds.net>
In-Reply-To: <20120416190548.2463d1d0@corrin.poochiereds.net>
Content-Type: text/plain; charset=ISO-8859-1
Sender: linux-nfs-owner@vger.kernel.org
List-ID:

On 04/16/2012 07:05 PM, Jeff Layton wrote:
> On Mon, 16 Apr 2012 20:25:06 +0000
> "Myklebust, Trond" wrote:
>
>> On Mon, 2012-04-16 at 15:43 -0400, Jeff Layton wrote:
>>> On Mon, 16 Apr 2012 19:33:05 +0000
>>> "Myklebust, Trond" wrote:
>>>
>>>> On Mon, 2012-04-16 at 13:46 -0400, Jeff Layton wrote:
>>>>> The question about looping indefinitely really comes down to:
>>>>>
>>>>> 1) is a persistent ESTALE in conjunction with a successful lookup a
>>>>> situation that we expect to be temporary. i.e. will the admin at some
>>>>> point be able to do something about it? If not, then there's no point
>>>>> in continuing to retry. Again, this is a situation that *really* should
>>>>> not happen if the filesystem is doing the right thing.
>>>>>
>>>>> 2) If the admin can't do anything about it, is it reasonable to expect
>>>>> that users can send a fatal signal to hung applications if this
>>>>> situation occurs.
>>>>>
>>>>> We expect that that's ok in other situations to resolve hung
>>>>> applications, so I'm not sure I understand why it wouldn't be
>>>>> acceptable here...
>>>>
>>>> There are definitely potentially persistent pathological situations that
>>>> the filesystem can't do anything about. If the point of origin for your
>>>> pathname (for instance your current directory in the case of a relative
>>>> pathname) is stale, then no amount of looping is going to help you to
>>>> recover.
>>>>
>>>
>>> Ok -- Peter pretty much said something similar. Retrying indefinitely
>>> when the lookup returns ESTALE probably won't help. I'm ok with
>>> basically letting the VFS continue to do what it does there already. If
>>> it gets an ESTALE, it tries again with LOOKUP_REVAL set and then gives
>>> up if that doesn't work.
>>>
>>> If however, the operation itself keeps returning ESTALE, are we OK to
>>> retry indefinitely assuming that we'll break out of the loop on fatal
>>> signals?
>>>
>>> For example, something like the v2 patch I sent a little while ago?
>>
>>
>> Won't something like fstatat(AT_FDCWD, "", &stat, AT_EMPTY_PATH) risk
>> looping forever there, or am I missing something?
>>
>
> To make sure I understand, that should be "shortcut" for a lookup of the
> cwd?
>
> So I guess the concern is that you'd do the above and get a successful
> lookup since you're just going to get back the cwd. At that point,
> you'd attempt the getattr and get ESTALE back. Then, you'd redo the
> lookup with LOOKUP_REVAL set -- but since we're operating on the
> cwd, we don't have a way to redo the lookup since we don't have a
> pathname that we can look up again...
>
> So yeah, I guess if you're sitting in a stale directory, something like
> that could loop eternally.
>
> Do you think the proposed check for fatal_signal_pending is enough to
> mitigate such a problem? Or do we need to limit the number of retries
> to address those sorts of loops?
>
I think your version 2 patch is definitely more clever than v1, but
I think Trond's point is that no matter what type of looping is done,
or how long it's done, it's going to fail... So why loop at all?

Again, I think the safest and simplest way, from the VFS point of view,
is to do the looping once if the file system has registered for it
through the fs_flags. This will catch 99% of the issues, failing only
on the 1% of corner cases...

steved.