Date: Sun, 17 May 2015 14:47:47 +1000
From: NeilBrown
To: Linus Torvalds
Cc: Al Viro, Andreas Dilger, Dave Chinner, Linux Kernel Mailing List, linux-fsdevel, Christoph Hellwig
Subject: Re: [RFC][PATCHSET v3] non-recursive pathname resolution & RCU symlinks
Message-ID: <20150517144747.2a69846c@notabene.brown>

On Sat, 16 May 2015 21:04:34 -0700 Linus Torvalds wrote:

> On Sat, May 16, 2015 at 8:48 PM, Linus Torvalds wrote:
> >
> > Sorry, but that really is how it is. NFS isn't special enough for some
> > badly designed lookup models to matter one whit.
>
> Btw, it's not just about performance, although the whole "we can do
> cached lookups without ever having to get the filesystem involved" is a
> big deal.
>
> It's about getting fundamental concepts like mount points etc right,
> it's about all those things that the filesystem really doesn't
> know about, and _cannot_ sanely know about.
>
> It's now about things like overlayfs etc, all those things.
>
> So the filesystem really isn't in control. Never will be. The
> filesystem is at the mercy of (extended) unix semantics that are
> bigger than the filesystem.
>
> This is true of IO too. The filesystem does have a bit more
> flexibility, but in the end, you have to do the readpage thing,
> because it's the only way you'll get mmap. The filesystem isn't really
> in control there either, there are strict rules for what it has to do
> in order to have reasonable coherent mmap semantics.
>
> So the vfs layer often does have a "library" approach, because
> filesystems may do things in very different ways. But at the same
> time, the vfs layer really *is* in control, because it's the vfs layer
> that enforces certain basic semantics. So the dcache very much isn't
> just some "slave cache" that you choose to use and is at the control of
> the filesystem. Like the page cache, you don't get a choice, because
> you aren't in charge.

Last I checked, sysfs doesn't use the page cache. Of course, sysfs is a
special case (but then, aren't we all -- deep down).

I like the page cache. Really do. It provides useful services and lots of
'generic' helpers. You can use the 'generic' versions directly, or wrap them
in a little bit of extra code, or rewrite them completely. It's lovely.

>
> When somebody does a lookup of a filename, it is not a "pass this
> filename to the filesystem". It very much *is* a
> component-by-component lookup. And in the *vast* majority of the
> cases, the cached lookup when you don't even get asked is absolutely
> the right thing to do, and doing anything else wouldn't just be wrong,
> it would be completely and utterly stupid.

I think you must have been reading someone else's emails, not mine.

I'm totally there with the cached lookups. They are awesome. Don't want
anything else. But when the cache doesn't have the answer - what then?
The filesystem is most likely to know how to fill the cache most
efficiently.

I remember hunting after some problem a while ago. I don't remember the
exact details but it was related to when NFS is asked to perform permission
checks on the way to opening something. I'm pretty sure it involved
atomic_open() as a key part. Anyway, the code is/was very hairy and seemed
to be convoluted in order to try to meet everybody's needs at once.

Having one piece of code that tries to handle the subtle details for all
filesystems is, I think, a mistake. Certainly have a block of code that does
the 'easy, local filesystem' version. But don't try to combine the
necessarily-different NFS version into the same block of code. It becomes
(nearly) unreadable.

I know there are interesting complex cases for open: O_EXCL and trailing
symlinks and things certainly make it interesting. But pretending that all
filesystems can be squashed into the one mould is just a pretence.

NeilBrown

>
> And the fact that somebody doesn't understand that, and has designed
> bad extensions to do multi-component lookup, isn't actually an
> argument against the dcache. It's just an argument for "people make
> bad interfaces because they hack things up and don't understand
> things".
>
> Linus
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/