Subject: Re: [GIT PULL] adaptive spinning mutexes
From: Chris Mason
To: Linus Torvalds
Cc: Ingo Molnar, Matthew Wilcox, Peter Zijlstra, "Paul E. McKenney", Gregory Haskins, Andi Kleen, Andrew Morton, Linux Kernel Mailing List, linux-fsdevel, linux-btrfs, Thomas Gleixner, Nick Piggin, Peter Morreale, Sven Dietrich, Dmitry Adamushko, Johannes Weiner
Date: Thu, 15 Jan 2009 16:04:18 -0500
Message-Id: <1232053458.8269.139.camel@think.oraclecorp.com>

On Thu, 2009-01-15 at 12:13 -0800, Linus Torvalds wrote:
> 
> On Thu, 15 Jan 2009, Chris Mason wrote:
> 
> > On Thu, 2009-01-15 at 10:16 -0800, Linus Torvalds wrote:
> > > 
> > > Umm.
> > > Except if you wrote the code nicely and used spinlocks, you wouldn't
> > > hold the lock over all those unnecessary and complex operations.
> > 
> > While this is true, there are examples of places we should expect
> > speedups for this today.
> 
> Sure. There are cases where we do have to use sleeping things, because the
> code is generic and really can't control what lower levels do, and those
> lower levels have to be able to sleep.
> 
> So:
> 
> > Concurrent file creation/deletion in a single dir will often find things
> > hot in cache and not have to block anywhere (mail spools).
> 
> The inode->i_mutex thing really does need to use a mutex, and spinning
> will help. Of course, it should only help when you really have lots of
> concurrent create/delete/readdir in the same directory, and that hopefully
> is a very rare load in real life, but hey, it's a valid one.
> 

Mail server deletes are the common case I can think of. Mail server
creates end up bound by fsync in my testing here, so it doesn't quite
show up unless your IO subsystem is much faster than your kernel.

> > Concurrent O_DIRECT aio writes to the same file, where i_mutex is
> > dropped early on.
> 
> Won't the actual IO costs generally dominate in everything but trivial
> benchmarks?

Benchmarks in the past on large IO rigs have shown it. It isn't
something you're going to see on a regular-sized server, but for AIO +
O_DIRECT, the pieces that hold i_mutex can be seen in real life.

[ re: pipes, OK, I don't know of any realistic pipe benchmarks, but I'll
run one if people can suggest it ]

I ran a mail server delivery simulation (fs_mark), and then my standard
dbench, 4k creates, and 4k stats runs on ext3 and ext4.

ext3 had very similar scores with spinning on and off.
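For readers following along, "spinning on and off" refers to whether a contended mutex waits by polling briefly before going to sleep. The kernel patch under discussion spins while the lock owner is still running on another CPU; user space cannot observe that, so the sketch below approximates the policy with a fixed spin budget on top of POSIX `pthread_mutex_trylock`/`pthread_mutex_lock`. It is a minimal illustration, not the kernel implementation: `struct adaptive_mutex`, `SPIN_BUDGET`, `run_demo` and the other helpers are hypothetical names, and the budget value is an arbitrary assumption.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical tuning knob: failed trylock attempts before we give up
 * spinning and block. The kernel patch instead spins while the owner is
 * running on another CPU; a fixed budget is a user-space approximation. */
#define SPIN_BUDGET 1000

struct adaptive_mutex {
    pthread_mutex_t inner;
};

static void adaptive_mutex_init(struct adaptive_mutex *am)
{
    int rc = pthread_mutex_init(&am->inner, NULL);
    assert(rc == 0);
}

static void adaptive_mutex_destroy(struct adaptive_mutex *am)
{
    pthread_mutex_destroy(&am->inner);
}

static void adaptive_mutex_lock(struct adaptive_mutex *am)
{
    /* Optimistic phase: poll without sleeping, betting that the holder's
     * critical section is short (e.g. everything is hot in cache). */
    for (int i = 0; i < SPIN_BUDGET; i++) {
        if (pthread_mutex_trylock(&am->inner) == 0)
            return;
    }
    /* Pessimistic phase: the holder is taking a while (perhaps it blocked),
     * so stop burning CPU and sleep until the lock is released. */
    pthread_mutex_lock(&am->inner);
}

static void adaptive_mutex_unlock(struct adaptive_mutex *am)
{
    pthread_mutex_unlock(&am->inner);
}

/* Demo: several threads increment a shared counter under the lock. */
#define NTHREADS 4
#define ITERS 100000

static struct adaptive_mutex demo_lock;
static long counter;

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITERS; i++) {
        adaptive_mutex_lock(&demo_lock);
        counter++;
        adaptive_mutex_unlock(&demo_lock);
    }
    return NULL;
}

/* Runs the demo and returns the final counter value. */
static long run_demo(void)
{
    pthread_t tids[NTHREADS];

    counter = 0;
    adaptive_mutex_init(&demo_lock);
    for (int i = 0; i < NTHREADS; i++) {
        int rc = pthread_create(&tids[i], NULL, worker, NULL);
        assert(rc == 0);
    }
    for (int i = 0; i < NTHREADS; i++) {
        int rc = pthread_join(tids[i], NULL);
        assert(rc == 0);
    }
    adaptive_mutex_destroy(&demo_lock);
    return counter;
}
```

Calling `run_demo()` from a `main` should return `NTHREADS * ITERS` (400000), since every increment happens under the lock. The trade-off is the one debated in the thread: spinning avoids the sleep/wakeup cost when hold times are short, while the fallback to blocking keeps waiters from burning CPU when the holder itself has to sleep.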
ext4 scored the same on mail server delivery (apparently fsync bound).
For the others:

dbench no spin: 1970MB/s
dbench spin:    3036MB/s

4k creates (files/sec, higher is better):

            avg      median   std      high      low
no spin     418.91   346.12   306.37   2025.80   338.11
spin        406.09   344.27   179.95   1249.78   336.10

Somehow the spinning mutex made ext4 more fair.

The 4k create test had 50 procs doing file creates in parallel, but each
proc was operating in its own directory, so they shouldn't actually be
competing for any of the VFS mutexes. The stat run was the same
patched/unpatched, probably for the same reason.

None of this is very earth-shattering; apparently my canned benchmarks
for btrfs pain points don't apply that well to ext3/4.

-chris
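One way to put numbers on the dbench gain and the "more fair" remark is to compare relative changes and the spread of the create rate. The sketch below hard-codes the figures reported above; the coefficient of variation (std / avg) is a metric chosen here for illustration, not one used in the post, and `struct create_stats`, `relative_change`, `coeff_of_variation` and `report` are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>

/* Figures copied from the 4k create results above; the layout is ours. */
struct create_stats {
    double avg, median, std, high, low;
};

static const struct create_stats no_spin = {418.91, 346.12, 306.37, 2025.80, 338.11};
static const struct create_stats spin    = {406.09, 344.27, 179.95, 1249.78, 336.10};

/* Fractional change from `before` to `after` (0.10 means +10%). */
static double relative_change(double before, double after)
{
    assert(before != 0.0);
    return (after - before) / before;
}

/* Coefficient of variation: standard deviation relative to the mean.
 * A lower value means the create rate was more uniform over the run. */
static double coeff_of_variation(const struct create_stats *s)
{
    return s->std / s->avg;
}

/* Prints the derived comparisons for dbench and the 4k create runs. */
static void report(void)
{
    printf("dbench throughput change: %+.1f%%\n",
           100.0 * relative_change(1970.0, 3036.0));
    printf("4k creates avg change:    %+.1f%%\n",
           100.0 * relative_change(no_spin.avg, spin.avg));
    printf("4k creates CV: no spin %.3f, spin %.3f\n",
           coeff_of_variation(&no_spin), coeff_of_variation(&spin));
}
```

Calling `report()` prints a dbench change of +54.1%, a 4k create average change of -3.1%, and a coefficient of variation of 0.731 without spinning versus 0.443 with it. The average create rate barely moves while the spread (and the peak) shrinks, which is consistent with the observation that spinning evened things out rather than raising throughput.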