From: Linus Torvalds
To: NeilBrown
Cc: Andreas Dilger, Dave Chinner, Al Viro, Linux Kernel Mailing List, linux-fsdevel, Christoph Hellwig
Date: Fri, 15 May 2015 17:45:56 -0700
Subject: Re: [RFC][PATCHSET v3] non-recursive pathname resolution & RCU symlinks
In-Reply-To: <20150516093022.51e1464e@notabene.brown>

On Fri, May 15, 2015 at 4:30 PM, NeilBrown wrote:
>
> .. and I've been wondering what to do about i_mutex and NFS. I've had
> customer reports of slowness in creating files that seems to be due to
> i_mutex on the directory being held over the whole 'create' RPC, so only one
> of those can be in flight at the one time.
> "make -j" on a large source directory can easily want to create lots of
> "*.o" files at "the same time".
>
> And NFS doesn't need i_mutex at all because the server will provide the
> needed guarantees.

So i_mutex on a directory is probably the nastiest lock we have in the
fs layer.
It's used for several different half-related things:

 - serialize filename creation/deletion

   This is partly for the benefit of the filesystem itself (and not
   helpful for NFS, as you note), but it's also very much about making
   sure we have uniqueness guarantees at the VFS layer too.

   So even with NFS, it's not just "the server provides the needed
   guarantees", because some of the guarantees are really client-local.

   For example, simply that we only ever have one single dentry for a
   particular name, and that we only ever have one active lookup per
   dentry. Those things happen independently of - and before - the
   server even sees the operation.

   So the whole local directory tree consistency ends up depending on
   this.

 - readdir().

   This is mostly to make it hard for filesystems to do the wrong thing
   when there is concurrent file creation.

   I suspect readdir could fairly easily push the i_mutex down from the
   caller and into the filesystem, and then filesystems might narrow
   down the use (or even get rid of it). The initial patch might even be
   automated with coccinelle.

   However, rather few loads actually have a lot of readdir() activity,
   and samba is probably the only major one. I've seen benchmarks where
   it matters, but they are rare (and I haven't seen one in literally
   years).

So the readdir case could probably be at least relaxed fairly easily.
But the thing that tends to hurt on more loads is, as you note, the
filename lookup/creation/movement case. And that's much harder to fix.

Al, do you have any ideas? Personally, I've wanted to make i_mutex a
rwsem for a long time, but right now pretty much everything uses it for
exclusion.

For example, filename lookup is clearly just reading the directory, so
it should take a rwsem for reading, right?

No. Not the way it is done now. Filename lookup wants the directory
inode exclusively because that guarantees that we create just one
dentry and call the filesystem ->lookup only once on that dentry.
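
To make the "one dentry, one ->lookup" point concrete, here is a minimal
user-space sketch in Python. It is a toy model and not kernel code: the
Directory class, its dentries dict, the _fs_lookup() stand-in and the
concurrent_lookups() helper are all hypothetical names made up for
illustration. The only thing it tries to show is that doing the cache
check and the creation under one exclusive lock means concurrent lookups
of the same name share a single object and reach the "filesystem" once.

```python
import threading


class Directory:
    """Toy user-space model of a directory. Hypothetical, not kernel code."""

    def __init__(self):
        # Stands in for the directory's i_mutex (an exclusive lock).
        self.i_mutex = threading.Lock()
        # Hypothetical per-directory cache: name -> dentry-like object.
        self.dentries = {}
        # Counts how often the stand-in for ->lookup was called.
        self.fs_lookup_calls = 0

    def _fs_lookup(self, name):
        # Stand-in for the filesystem's ->lookup method (assumed, simplified).
        self.fs_lookup_calls += 1
        return {"name": name}

    def lookup(self, name):
        # The cache check and the creation happen under the same exclusive
        # lock, so no second thread can observe the miss in between.
        with self.i_mutex:
            dentry = self.dentries.get(name)
            if dentry is None:
                dentry = self._fs_lookup(name)
                self.dentries[name] = dentry
            return dentry


def concurrent_lookups(directory, name, nthreads=8):
    """Look up the same name from several threads and collect the results."""
    results = [None] * nthreads

    def worker(i):
        results[i] = directory.lookup(name)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(nthreads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

If the lock in lookup() were only held shared for the check, two threads
could both see the miss and both call _fs_lookup(), which is the
situation the exclusive lock rules out in this model.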

Again, there tend to be no simple benchmarks or loads that people care
about that show this. Most of the time it's fairly hard to see.

                  Linus