Date: Sun, 17 May 2015 13:03:30 +1000
From: NeilBrown
To: Al Viro
Cc: Linus Torvalds, Andreas Dilger, Dave Chinner, Linux Kernel Mailing List, linux-fsdevel, Christoph Hellwig
Subject: Re: [RFC][PATCHSET v3] non-recursive pathname resolution & RCU symlinks
Message-ID: <20150517130330.6807b9f4@notabene.brown>
In-Reply-To: <20150516054626.GS7232@ZenIV.linux.org.uk>

On Sat, 16 May 2015 06:46:26 +0100 Al Viro wrote:

> On Sat, May 16, 2015 at 02:45:27PM +1000, NeilBrown wrote:
>
> > Yes, I've looked lately :-)
> > I think that all of RCU-walk, and probably some of REF-walk, should happen
> > before the filesystem gets to see anything.
> > But once you hit a non-positive dentry or the parent of the target name, I'd
> > rather hand over to the FS.
>
> ... and be ready to get it back when the sucker runs into a symlink.
> Unless you want to handle _those_ in NFS somehow (including an absolute one
> starting with /sys/, etc.).

Certainly - when a symlink or mountpoint is found, the filesystem stops.
Mountpoints should rarely be hit, because the path to a mountpoint will
usually be stable enough for RCU-walk to find it....

Thinks: I wonder what happens when a mounted-on NFS directory is deleted on the
server...

Automount points would be handled completely by the filesystem. It would
mount something and then return saying "Look, I found a mount point - do you
want to handle it for me?".

>
> > NFSv4 has the ability to look up multiple components in a single LOOKUP call.
> > VFS doesn't give it a chance to try because it wants to go step-by-step, and
> > wants each entry in the cache to have an inode etc.
>
> Do tell, how do we deal with .. afterwards if we leave the intermediate ones
> without inodes? We _could_ feed multi-component requests to filesystems
> (and NFSv4 isn't the first one to handle that - 9p had been there a lot
> earlier), but then you get to
> * populate all of them with inodes
> * be damn careful to avoid multiple dentries for the same directory
>   inode

NFS directories already need to be revalidated occasionally. Having a dentry
in "unknown" state just means a revalidation is that much more likely.

Suppose I cd into a directory, then rename the directory on the server. What
happens? What should happen? I could make a case that the NFS client should
look up ".." on the server and rebuild the path upwards.

There is a (to me) really key point here. Local filesystems use the dcache
for correctness. It prevents concurrent directory renames from creating
loops and it ensures that only one file of a given name exists in each
directory. Remote filesystems don't use it for correctness. For them it is
simply an optimisation.
So getting upset about directories with multiple dentries, or directories
that aren't connected to the root, matters a great deal for local
filesystems and is largely irrelevant for network filesystems. A local
filesystem needs the cache to remain consistent with storage. A network
filesystem cannot possibly ensure that the cache is consistent with storage;
it just needs to be able to notice the more offensive inconsistencies
reasonably quickly, and repair them.

> Look, creating those suckers isn't the worst part; you need to be ready for
> e.g. mount(2) or pathname resolution playing with the ones you'd created.
> It's not fs-private data structure; pathname resolution might very well span
> many filesystem types.

Any pathname lookup which touched these dentries would call d_revalidate()
(or similar), which could get the inode etc. if it was really needed.

>
> Worse, you get to deal with several multi-component requests jumping into
> fs at the same place. With responses arriving a bit afterwards, and guess
> what? Those requests happen to share bits and pieces of prefixes. Oh,
> and one of them is a rename. Dealing with just the final components isn't
> a problem; you'll need to deal with directory tree in all its fscking glory.
> In a way that wouldn't be in too incestuous relationship with the pathwalking
> logics in VFS and, by that proxy, such in all other fs types.
>
> In particular, "unknown" for intermediate nodes is a recipe for really
> nasty mess. If the path can rejoin the known universe several components
> later...
>
> Dealing with multi-component lookups isn't impossible and might be a good
> idea, but only if all intermediates are populated. What information does
> NFSv4 multi-component lookup give you? 9p one gives an array of FIDs,
> one per component, and that is best used as multi-component revalidate
> on hot dcache...

I was remembering RFC 3010, in which a LOOKUP took a "pathname4", which was
an array of "component4".
It could just return the filehandle and attributes of the final target.
RFC 3530 and later revised that so LOOKUP gets a single "component4". That
just means that it is easy to get the attributes if you want them.

I'm not really saying that multiple component lookups are a good idea, or
that doing the lookup and not getting the intermediate attributes is a
sensible approach. What I'm really pointing out is that the current dcache
imposes a particular model very strongly on filesystems, and I'm far from
convinced that that is a good idea.

NeilBrown