Date: Sat, 16 May 2015 14:45:27 +1000
From: NeilBrown
To: Al Viro
Cc: Linus Torvalds, Andreas Dilger, Dave Chinner, Linux Kernel Mailing List, linux-fsdevel, Christoph Hellwig
Subject: Re: [RFC][PATCHSET v3] non-recursive pathname resolution & RCU symlinks
Message-ID: <20150516144527.20b89194@notabene.brown>
In-Reply-To: <20150516014718.GO7232@ZenIV.linux.org.uk>

On Sat, 16 May 2015 02:47:18 +0100 Al Viro wrote:

> On Sat, May 16, 2015 at 11:25:03AM +1000, NeilBrown wrote:
> > But surely those things can be managed with a spinlock.
> >
> > I think a big part of the problem is that the VFS tries to control
> > filesystems rather than provide services to them.
>
> What with being the thing syscalls talk to for sending the requests to
> filesystems...  Do you really want to push the pathname resolution into
> fs code?  You've looked at it lately, right?

Yes, I've looked lately :-)

I think that all of RCU-walk, and probably some of REF-walk, should happen
before the filesystem gets to see anything.  But once you hit a non-positive
dentry or the parent of the target name, I'd rather hand over to the FS.

NFSv4 has the ability to look up multiple components in a single LOOKUP call.
The VFS doesn't give it a chance to try because it wants to go step-by-step,
and wants each entry in the cache to have an inode etc.

The earlier the filesystem gets control, the less completely-general the VFS
needs to be.

> > I'm not convinced that serialising 'lookup' calls is vital.  If two threads
> > find a 'not-validated' dentry, and both try to look up the inode, they
> > will both ultimately get the same struct_inode from the icache, and will both
> > succeed in connecting it to the dentry.  Obviously it would be better to
> > avoid two concurrent NFS "LOOKUP" requests, but that is a problem for NFS to
> > solve.  I suspect that using d_fsdata to point to a pending LOOKUP request
> > would allow the "second" thread to wait for that request to finish.  Other
> > filesystems would take a completely different approach.
>
> See upthread regarding multiple negative dentries with the same name and fun
> consequences thereof.  There might be _NO_ inode.  At all.  dcache has a large
> negative component and without it you'd get really fucked on NFS as soon
> as you try to compile anything.  Shitloads of headers, looked up in a lot of
> directories.  Most of the lookups ending up negative.  We really do need that
> stuff...

Of course negative dentries are important, and having multiple would be
unfortunate.  I don't suggest that for a moment.

I'm suggesting three different states for a dentry: positive, negative, and
"don't know".  "Don't know" is a new state that isn't currently allowed.
While a filesystem is performing 'lookup', doing its own locking or not, the
dentry would be "don't know".
Anything that needed to know would block somewhere in the filesystem code, on
whatever lock or waitqueue the filesystem developer felt was appropriate: on
i_mutex if generic_foo() was in use.

If NFSv4 did a multi-component lookup, the intermediate dentries would be
"don't know" even while they had children.  For local filesystems, that sort
of thing would never happen.  For NFS - which has to allow for random changes
on the server anyway - it is just part of the game.

NeilBrown
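
As a rough illustration of the three-state idea above (positive, negative,
"don't know"), here is a minimal single-threaded sketch in plain userspace C.
It is not kernel code and not a proposed patch: every name in it (toy_dentry,
toy_dstate, toy_resolve, toy_fs_lookup) is hypothetical, and the point where a
real filesystem would block concurrent callers on its own lock or waitqueue is
only marked with a comment.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical three-state model; none of these names exist in the kernel. */
enum toy_dstate { TOY_UNKNOWN, TOY_NEGATIVE, TOY_POSITIVE };

struct toy_dentry {
	const char *name;
	enum toy_dstate state;
	unsigned long ino;	/* meaningful only when state == TOY_POSITIVE */
	int lookups;		/* how many times the filesystem was asked */
};

/* Filesystem-supplied lookup: returns an inode number, or 0 for "no such name". */
typedef unsigned long (*toy_lookup_fn)(const char *name);

/*
 * Resolve a dentry to positive or negative.  A real filesystem would make
 * concurrent callers wait here, on whatever lock or waitqueue it chose; this
 * sketch simply performs the lookup once and caches the answer.
 */
static enum toy_dstate toy_resolve(struct toy_dentry *d, toy_lookup_fn lookup)
{
	assert(d != NULL && lookup != NULL);

	if (d->state == TOY_UNKNOWN) {
		unsigned long ino = lookup(d->name);

		d->lookups++;
		d->ino = ino;
		d->state = ino ? TOY_POSITIVE : TOY_NEGATIVE;
	}
	return d->state;
}

/* Stand-in for a server round trip: only "stdio.h" exists (assumed data). */
static unsigned long toy_fs_lookup(const char *name)
{
	return strcmp(name, "stdio.h") == 0 ? 42 : 0;
}
```

Calling toy_resolve() twice on a name that does not exist asks the toy
filesystem only once; the second call is answered from the cached negative
state, which is the behaviour the negative-dentry argument quoted above
depends on.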