Date: Sat, 16 May 2015 04:16:47 +0100
From: Al Viro
To: Linus Torvalds
Cc: NeilBrown, Andreas Dilger, Dave Chinner, Linux Kernel Mailing List,
	linux-fsdevel, Christoph Hellwig
Subject: Re: [RFC][PATCHSET v3] non-recursive pathname resolution & RCU symlinks
Message-ID: <20150516031647.GQ7232@ZenIV.linux.org.uk>
References: <20150514033040.GF7232@ZenIV.linux.org.uk>
	<20150514112304.GT15721@dastard>
	<20150516093022.51e1464e@notabene.brown>
	<20150516112503.2f970573@notabene.brown>
	<20150516015540.GP7232@ZenIV.linux.org.uk>

On Fri, May 15, 2015 at 07:23:11PM -0700, Linus Torvalds wrote:

> For filesystems that say that they are ok with, make lookup_slow()
> (and *only* lookup_slow for now) instead take the rwsem for reading,
> but in addition to that, take a hashed mutex.
>
> By "hashed mutex", I mean having a smallish table of mutexes (say,
> 1024), and just creating a hash based on the name-hash and the parent
> pointer. That way we can avoid all the issues with adding a new lock
> to the dentry itself, or having to allocate a new child dentry just
> for the lock. It *could* cause some cross-directory serialization due
> to hash collisions, but that shouldn't be noticeable if the hash is of
> a reasonable size and quality.

What for? All we need is a flag, waitqueue and being woken up when the
flag gets cleared.
So let's just use the queue of parent's ->i_mutex and explicitly
kick it when removing dentry flag. We *are* holding a reference on
parent (we need that to hold that sucker shared, after all), so it's
not going away under us...

I'm all for gradual transformations, but in this case I suspect that
doing it on per-fs basis isn't the best way to do it; gradual massage
of code using dcache lookups or walking the lists of children in
filesystems (fortunately, it's fairly rare these days, and we only need
to care about the code checking if such a beast is hashed; d_alloc()
already places new dentry on the list of children) would seem to be a
better approach. We'd also need to audit fs/dcache.c tree-walking-related
code itself, but that's much more limited.
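
For illustration only, the "flag, waitqueue, wakeup when the flag gets
cleared" idea can be sketched in userspace C with pthreads. This is an
analogy under stated assumptions, not kernel code: struct parent,
struct child, the in_lookup flag and every function name below are
hypothetical, and a mutex plus condition variable stands in for the wait
queue of the parent's ->i_mutex.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for a parent directory: a single wait queue
 * shared by all of its children, so no per-child lock is needed. */
struct parent {
    pthread_mutex_t lock;   /* protects the children's in_lookup flags */
    pthread_cond_t  waitq;  /* shared queue that waiters sleep on */
};

/* Hypothetical stand-in for a dentry carrying an "in lookup" flag. */
struct child {
    struct parent *parent;  /* kept valid by the caller while in use */
    bool in_lookup;
};

static void parent_init(struct parent *p)
{
    pthread_mutex_init(&p->lock, NULL);
    pthread_cond_init(&p->waitq, NULL);
}

/* Mark the child as being looked up. */
static void child_start_lookup(struct child *c, struct parent *p)
{
    c->parent = p;
    pthread_mutex_lock(&p->lock);
    c->in_lookup = true;
    pthread_mutex_unlock(&p->lock);
}

/* Clear the flag and explicitly kick the parent's queue. */
static void child_finish_lookup(struct child *c)
{
    struct parent *p = c->parent;

    pthread_mutex_lock(&p->lock);
    assert(c->in_lookup);
    c->in_lookup = false;
    /* Waiters for different children share the queue: wake them all. */
    pthread_cond_broadcast(&p->waitq);
    pthread_mutex_unlock(&p->lock);
}

/* Sleep on the parent's queue until this child's flag is cleared. */
static void child_wait_lookup(struct child *c)
{
    struct parent *p = c->parent;

    pthread_mutex_lock(&p->lock);
    while (c->in_lookup)    /* re-check: the wakeup may be for a sibling */
        pthread_cond_wait(&p->waitq, &p->lock);
    pthread_mutex_unlock(&p->lock);
}

/* Thread entry point that completes a lookup on the given child. */
static void *finisher_thread(void *arg)
{
    child_finish_lookup(arg);
    return NULL;
}
```

Since the queue is per parent rather than per child, a wakeup can be
meant for a sibling; that is why the sketch wakes every waiter and each
waiter re-checks its own flag before returning.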