Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S934987AbbEPBrI (ORCPT ); Fri, 15 May 2015 21:47:08 -0400 Received: from mail-ie0-f178.google.com ([209.85.223.178]:35982 "EHLO mail-ie0-f178.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S934523AbbEPBrF (ORCPT ); Fri, 15 May 2015 21:47:05 -0400 MIME-Version: 1.0 In-Reply-To: <20150516112503.2f970573@notabene.brown> References: <20150505052205.GS889@ZenIV.linux.org.uk> <20150511180650.GA4147@ZenIV.linux.org.uk> <20150513222533.GA24192@ZenIV.linux.org.uk> <20150514033040.GF7232@ZenIV.linux.org.uk> <20150514112304.GT15721@dastard> <20150516093022.51e1464e@notabene.brown> <20150516112503.2f970573@notabene.brown> Date: Fri, 15 May 2015 18:47:04 -0700 X-Google-Sender-Auth: t5Q35d0T0eQ5AGx39oR_V7FtYbM Message-ID: Subject: Re: [RFC][PATCHSET v3] non-recursive pathname resolution & RCU symlinks From: Linus Torvalds To: NeilBrown Cc: Andreas Dilger , Dave Chinner , Al Viro , Linux Kernel Mailing List , linux-fsdevel , Christoph Hellwig Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3216 Lines: 72 On Fri, May 15, 2015 at 6:25 PM, NeilBrown wrote: >> >> For example, simply that we only ever have one single dentry for a >> particular name, and that we only ever have one active lookup per >> dentry. Those things happen independently of - and before - the server >> even sees the operation. > > But surely those things can be managed with a spinlock. Some of them could. But no, not in general. Exactly because of that "only one lookup of this particular dentry". The lookup sleeps, so when another thread tries to look up the same dentry, it needs to hit a sleeping lock. And since at that point we don't have the new dentry yet (much less the inode we're looking up), it is in the only place we *do* have: the inode of the directory we're looking up. Now, maybe we could solve it with a new sleeping lock in the dentry itself. Maybe we could allocate the new dentry early, add it to the directory the usual way, but mark it as being "not ready" (so that d_lookup() wouldn't use it). And have the sleeping lock be a new sleeping lock in the dentry. But that would be a very big change. And I suspect Al has tons of reasons why it would have various problems. We do rely on the i_mutex a fair amount.. > I think a big part of the problem is that the VFS tries to control > filesystems rather than provide services to them. Well, most of the time we do aim to provice services (eg the page cache generally works that way). But the dentry layer really is pretty important. And it's a *huge* performance win. Yes, yes, you don't see it when it works, you only see the bad cases, and there is no way in hell we can let the low-level filesystem control the dentry tree. Also, remember: we do have something like 65 different filesystems. It's a *big* deal to make them all work "well enough". Making the locking rules unambiguous is pretty primary. Making one particular filesystem perform well is important to _you_, but in the big picture... >> - readdir(). This is mostly to make it hard for filesystems to do the >> wrong thing when there is concurrent file creation. > > and makes it impossible for them to do the right thing too? Well, possibly. As mentioned, the readdir case might actually be pretty trivial, but it just hasn't come up often. I've literally only ever seen it for samba. I'd take a tested patch. Hint, hint. >> Again, there tend to be no simple benchmarks or loads that people care >> about that show this. Most of the time it's fairly hard to see. > > Which "this"? Readdir? I haven't come across one (NFS READDIR issues are > all about when to use READDIR_PLUS and when not to). > Or create-contention-on-i_mutex? The load my customer had was 'make -j60' on > a many-cpu machine. Both/either. The problem is that the "make -j60" case seems to depend on the filesystem having bad latency - it certainly doesn't show up on a local filesystem. Maybe that could be simulated some way.. Linus -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/