Subject: Re: [RFC][PATCHSET v3] non-recursive pathname resolution & RCU symlinks
From: Andreas Dilger
Date: Fri, 15 May 2015 15:15:48 -0600
To: Dave Chinner
Cc: Linus Torvalds, Al Viro, Linux Kernel Mailing List, linux-fsdevel,
    Christoph Hellwig, Neil Brown
In-Reply-To: <20150514112304.GT15721@dastard>
References: <20150505052205.GS889@ZenIV.linux.org.uk>
    <20150511180650.GA4147@ZenIV.linux.org.uk>
    <20150513222533.GA24192@ZenIV.linux.org.uk>
    <20150514033040.GF7232@ZenIV.linux.org.uk>
    <20150514112304.GT15721@dastard>
X-Mailing-List: linux-kernel@vger.kernel.org

On May 14, 2015, at 5:23 AM, Dave Chinner wrote:
>
> On Wed, May 13, 2015 at 08:52:59PM -0700, Linus Torvalds wrote:
>> On Wed, May 13, 2015 at 8:30 PM, Al Viro wrote:
>>>
>>> Maybe... I'd like to see the profiles, TBH - especially getxattr() and
>>> access() frequency on various loads. Sure, make(1) and cc(1) really care
>>> about stat() very much, but I wouldn't be surprised if something like
>>> httpd or samba would be hitting getxattr() a lot...
>>
>> So I haven't seen samba profiles in ages, but iirc we have more
>> serious problems than trying to speed up basic filename lookup.
>>
>> At least long long ago, inode semaphore contention was a big deal,
>> largely due to readdir().
>
> It still is - it's the prime reason people still need to create
> hashed directory structures so that they can get concurrency in
> directory operations. IMO, concurrency in directory operations is a
> more important problem to solve than worrying about readdir speed;
> in large filesystems readdir and lookup are IO bound operations and
> so everything serialises on the IO as it's done with the i_mutex
> held....

We've had a patch[*] to add ext4 parallel directory operations in
Lustre for a few years, that adds separate locks for each internal
tree and leaf block instead of using i_mutex, so it scales as the
size of the directory grows. This definitely improved many-threaded
directory create/lookup/unlink performance (rename still uses a
single lock).

Since it is used only directly on the server within the kernel there
is no VFS interface for it. The last time that I brought VFS
integration up with Al a year or two ago he thought the complexity
was very high and not worth the effort, but maybe with increasing
thread counts on clients it is time to look into it?

Cheers, Andreas

[*] For anyone interested, the latest version of the ext4 patch is at:
http://git.whamcloud.com/fs/lustre-release.git/blob/ac8a7f3d22ef007bcf62bfb4c6ed747cec6e874e:/ldiskfs/kernel_patches/patches/rhel7/ext4-pdirop.patch

>> And readdir() itself, for that matter - we have no good vfs-level
>> readdir caching, so it all ends up serialized on the inode
>> semaphore, and it all goes all the way into the filesystem to get
>> the readdir data. And at least for ext4, readdir()
>> is slow anyway, because it doesn't use the page cache, it uses
>> that good old buffer cache, because of how ext4 does metadata
>> journaling etc.
>
> IIRC, ext4 readdir is not slow because of the use of the buffer
> cache, it's slow because of the way it hashes dirents across blocks
> on disk. i.e. it has locality issues, not a caching problem.
>
>> Having readdir() caching at the VFS layer would likely be a really
>> good thing, but it's hard. It *might* be worth looking at the nfs4
>> code to see if we could possibly move some of that code into the vfs
>> layer, but the answer is likely "no", or at least "that's incredibly
>> painful".
>
> Maybe I'm missing something - what operation would be sped up by
> caching readdir data? Are you trying to optimise the ->lookup that
> tends to follow readdir by caching individual dirents? Or something
> else?
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html

Cheers, Andreas
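
Illustration (not part of the patch): the difference between one
directory-wide lock and the finer-grained locking described above can
be sketched in userspace. This is only a toy model under stated
assumptions. The class name ShardedDirectory, the nbuckets parameter,
and the use of a CRC32 hash bucket as a stand-in for an ext4 leaf
block are all hypothetical; the real patch locks htree index and leaf
blocks inside the kernel and is considerably more involved. Rename is
not modelled here.

```python
import threading
import zlib


class ShardedDirectory:
    """Toy directory with one lock per hash bucket.

    Each bucket is a hypothetical stand-in for a leaf block: operations
    on names that hash to different buckets never contend on the same
    lock, unlike a single directory-wide mutex.
    """

    def __init__(self, nbuckets=16):
        self._locks = [threading.Lock() for _ in range(nbuckets)]
        self._buckets = [{} for _ in range(nbuckets)]

    def _index(self, name):
        # Map a name to its bucket; CRC32 is an arbitrary choice here.
        return zlib.crc32(name.encode("utf-8")) % len(self._buckets)

    def create(self, name, ino):
        i = self._index(name)
        with self._locks[i]:  # only this bucket is serialised
            if name in self._buckets[i]:
                raise FileExistsError(name)
            self._buckets[i][name] = ino

    def lookup(self, name):
        i = self._index(name)
        with self._locks[i]:
            return self._buckets[i].get(name)

    def unlink(self, name):
        i = self._index(name)
        with self._locks[i]:
            if name not in self._buckets[i]:
                raise FileNotFoundError(name)
            del self._buckets[i][name]
```

Threads that create, look up, or unlink names in different buckets
take different locks, so contention is expected to fall as the bucket
count grows. CPython's global interpreter lock means this shows lock
granularity only, not a real speedup.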