Date: Sat, 16 May 2015 02:38:53 +0100
From: Al Viro
To: Linus Torvalds
Cc: NeilBrown, Andreas Dilger, Dave Chinner, Linux Kernel Mailing List,
    linux-fsdevel, Christoph Hellwig
Subject: Re: [RFC][PATCHSET v3] non-recursive pathname resolution & RCU symlinks
Message-ID: <20150516013853.GN7232@ZenIV.linux.org.uk>
References: <20150505052205.GS889@ZenIV.linux.org.uk>
    <20150511180650.GA4147@ZenIV.linux.org.uk>
    <20150513222533.GA24192@ZenIV.linux.org.uk>
    <20150514033040.GF7232@ZenIV.linux.org.uk>
    <20150514112304.GT15721@dastard>
    <20150516093022.51e1464e@notabene.brown>

On Fri, May 15, 2015 at 05:45:56PM -0700, Linus Torvalds wrote:
> Al, do you have any ideas? Personally, I've wanted to make i_mutex a
> rwsem for a long time, but right now pretty much everything uses it
> for exclusion. For example, filename lookup is clearly just reading
> the directory, so it should take a rwsem for reading, right? No. Not
> the way it is done now. Filename lookup wants the directory inode
> exclusively because that guarantees that we create just one dentry and
> call the filesystem ->lookup only once on that dentry.

rwsem by itself won't do us much good there.  Look: for multiple lookups
on the same existing entry we could try to teach d_splice_alias() to cope,
etc.  But what happens when a bunch of processes look for the same
nonexistent entry?
And no, "who cares about fuckloads of negatives with the same name" isn't
a good answer - suppose we do mkdir() after that.  OK, so we'll find a
negative dentry in dcache.  And tell the filesystem to create the sucker.
Done.  Made it positive.  Now, do we hunt down all _other_ negative
dentries for it?  Or never keep negative ones at all?  Or slap some kind of
->d_revalidate() there to catch all negative dentries created before the
last mkdir/creat/mknod/symlink/link in the given parent?

One possibility would be a new dentry state - "being looked up".  Hashed,
treated as "fall out of RCU mode" for lazy pathwalk purposes, and places
where we call ->lookup() would (while still holding ->i_mutex on parent
shared) wait for that state to end.  Places where we call ->d_revalidate()
(with or without ->i_mutex on parent) would also wait on those.

It would need a careful analysis of tree-walkers, though.  Doable, but
there might be dragons.  In case of e.g. ceph - swamp ones, with mirror
in the line of sight...
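
For illustration only, here is a toy user-space model of the "being looked
up" state described above, written in Python rather than kernel C.  Every
name in it (Dentry, Directory, fs_lookup, hash_lock, demo) is made up for
this sketch and is not kernel API.  It does not model RCU pathwalk,
->d_revalidate() or the shared/exclusive ->i_mutex on the parent; it only
shows the core idea that the first lookup hashes an in-lookup dentry and
everybody else waits on it, so the filesystem is asked once and only one
negative dentry ever exists for a given name.

```python
import threading
import time


class Dentry:
    """Toy dentry: hashed while "being looked up", then positive or negative."""

    def __init__(self, name):
        self.name = name
        self.inode = None                # None == negative once lookup has ended
        self.done = threading.Event()    # set when the "being looked up" state ends


class Directory:
    """Toy parent directory with its own tiny dcache (all names hypothetical)."""

    def __init__(self, contents):
        self.contents = dict(contents)   # name -> inode number, the "on-disk" state
        self.dcache = {}                 # name -> Dentry, i.e. hashed dentries
        self.hash_lock = threading.Lock()    # protects dcache, not the slow lookup
        self.count_lock = threading.Lock()
        self.fs_lookups = 0              # how many times the "filesystem" was asked

    def fs_lookup(self, name):
        """Stand-in for the filesystem's ->lookup(); deliberately slow."""
        with self.count_lock:
            self.fs_lookups += 1
        time.sleep(0.05)
        return self.contents.get(name)

    def lookup(self, name):
        with self.hash_lock:
            dentry = self.dcache.get(name)
            owner = dentry is None
            if owner:
                # First one in: hash a dentry that is still "being looked up".
                dentry = Dentry(name)
                self.dcache[name] = dentry
        if owner:
            try:
                dentry.inode = self.fs_lookup(name)
            finally:
                dentry.done.set()        # end of the in-lookup state
        else:
            dentry.done.wait()           # everybody else waits for that state to end
        return dentry

    def mkdir(self, name, inode):
        """Turn the single negative dentry positive (parent locking not modelled)."""
        dentry = self.lookup(name)
        with self.hash_lock:
            if dentry.inode is not None:
                raise FileExistsError(name)
            self.contents[name] = inode
            dentry.inode = inode
        return dentry


def demo(nthreads=8):
    """Have several threads look up the same nonexistent name at once."""
    d = Directory({"present": 1})
    results = []

    def worker():
        results.append(d.lookup("missing"))

    threads = [threading.Thread(target=worker) for _ in range(nthreads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return d, results


if __name__ == "__main__":
    d, results = demo()
    print("filesystem lookups:", d.fs_lookups)
    print("distinct dentries:", len({id(r) for r in results}))
```

With a single shared dentry per name, a later mkdir() in this model has
nothing to hunt down: there are no _other_ negatives to find.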