Date: Wed, 14 Jan 2009 18:32:58 +0100
Subject: Re: [PATCH -v9][RFC] mutex: implement adaptive spinning
From: Dmitry Adamushko
To: Chris Mason
Cc: Peter Zijlstra, Linus Torvalds, Ingo Molnar, "Paul E. McKenney",
    Gregory Haskins, Matthew Wilcox, Andi Kleen, Andrew Morton,
    Linux Kernel Mailing List, linux-fsdevel, linux-btrfs,
    Thomas Gleixner, Nick Piggin, Peter Morreale, Sven Dietrich

2009/1/14 Chris Mason:
> On Wed, 2009-01-14 at 12:18 +0100, Dmitry Adamushko wrote:
>> 2009/1/14 Chris Mason:
>> > On Tue, 2009-01-13 at 18:21 +0100, Peter Zijlstra wrote:
>> >> On Tue, 2009-01-13 at 08:49 -0800, Linus Torvalds wrote:
>> >> >
>> >> > So do a v10, and ask people to test.
>> >>
>> >> ---
>> >> Subject: mutex: implement adaptive spinning
>> >> From: Peter Zijlstra
>> >> Date: Mon Jan 12 14:01:47 CET 2009
>> >>
>> >> Change mutex contention behaviour such that it will sometimes busy wait on
>> >> acquisition - moving its behaviour closer to that of spinlocks.
>> >>
>> >
>> > I've spent a bunch of time on this one, and noticed earlier today that I
>> > still had bits of CONFIG_FTRACE compiling. I wasn't actually tracing
>> > anything, but it seems to have had a big performance hit.
>> >
>> > The bad news is the simple spin got much much faster, dbench 50 coming
>> > in at 1282MB/s instead of 580MB/s. (Other benchmarks give similar
>> > results.)
>> >
>> > v10 is better than not spinning, but it's in the 5-10% range. So, I've
>> > been trying to find ways to close the gap, just to understand exactly
>> > where it is different.
>> >
>> > If I take out:
>> >         /*
>> >          * If there are pending waiters, join them.
>> >          */
>> >         if (!list_empty(&lock->wait_list))
>> >                 break;
>> >
>> > v10 pops dbench 50 up to 1800MB/s. The other tests soundly beat my
>> > spinning and aren't less fair. But clearly this isn't a good solution.
>> >
>> > I tried a few variations, like only checking the wait list once before
>> > looping, which helps some. Are there other suggestions on better tuning
>> > options?
>>
>> (some thoughts/speculations)
>>
>> Perhaps for highly-contended mutexes the spinning implementation may
>> quickly degrade [*] to the non-spinning one (i.e. the current
>> sleep-wait mutex) and then just stay in this state until a moment
>> when there are no waiters [**] -- i.e.
>> list_empty(&lock->wait_list) == 1 -- and waiters can start spinning
>> again.
>
> It is actually ok if the highly contended mutexes don't degrade as long
> as they are highly contended and the holder isn't likely to schedule.

Yes, my point was that they likely do fall back (degrade) to the
wait-on-the-list (non-spinning) behaviour with dbench, and that's why the
performance numbers are similar (5-10% IIRC) to those of the 'normal'
mutex.

And the thing is that for highly contended mutexes it's not easy to
switch back to the (fast) spinning wait -- just because most of the time
there is someone waiting for the lock (and if this 'someone' is not a
spinner, none of the new mutex_lock() requests can do
busy-waiting/spinning). A toy model of this "stay degraded" effect is
sketched below [1].

But whatever, without the list_empty() check it's not relevant any more.

>>
>> what may trigger [*]:
>>
>> (1) obviously, an owner scheduling out.
>>
>> Even if it happens rarely (otherwise, it's not a target scenario for
>> our optimization), due to [**] it may take quite some time until
>> waiters are able to spin again.
>>
>> Let's say waiters (almost) never block (and possibly such cases
>> would be better off just using a spinlock after some refactoring, if
>> possible).
>>
>> (2) need_resched() is triggered for one of the waiters.
>>
>> (3) !owner && rt_task(p)
>>
>> quite unlikely, but possible (there are 2 race windows).
>>
>> Of course, the question is whether it really takes a noticeable amount
>> of time to get out of the [**] state.
>> I'd imagine it can be the case for highly-contended locks.
>>
>> If this is the case indeed, then which of (1), (2), (3) gets triggered
>> the most?
>
> Sorry, I don't have stats on that.
>
>>
>> Have you tried removing the need_resched() checks? That way we'd kind
>> of emulate real spinlocks here.
>
> Unfortunately, the need_resched() checks deal with a few of the ugly
> corners. They are more important without the waiter list check.
> Basically, if we spun without the need_resched() checks, the process
> that wants to unlock might not be able to schedule back in.

Yeah, that's the "owner == NULL" case: the owner was preempted in the
fast path right after taking the lock and before calling
mutex_set_owner().

btw., I wonder... would an additional preempt_disable/enable() in the
fast path harm that much? We could avoid the preemption scenario above
(see the schematic sketch below [2]).

> -chris

--
Best regards,
Dmitry Adamushko
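
[1] A minimal user-space toy model of the "degrade and stay degraded"
effect discussed above. This is NOT the kernel mutex code: every name
here (toy_mutex, owner_running, nr_waiters, toy_lock, ...) is made up
for illustration, and the nr_waiters counter merely stands in for
!list_empty(&lock->wait_list). The point it shows: once one task has
queued, every later locker takes the "join pending waiters" exit as
well, so nobody spins again until the wait queue drains completely.
Build with something like: gcc -O2 -pthread toy_adaptive.c

	/* toy_adaptive.c -- illustrative only, not the kernel patch */
	#include <pthread.h>
	#include <sched.h>
	#include <stdatomic.h>
	#include <stdbool.h>
	#include <stdio.h>

	struct toy_mutex {
		atomic_int locked;        /* 0 = free, 1 = held                         */
		atomic_int owner_running; /* stand-in for "lock owner is on a CPU"      */
		atomic_int nr_waiters;    /* stand-in for !list_empty(&lock->wait_list) */
	};

	static bool toy_trylock(struct toy_mutex *m)
	{
		int zero = 0;
		return atomic_compare_exchange_strong(&m->locked, &zero, 1);
	}

	static void toy_lock(struct toy_mutex *m)
	{
		/* spin phase */
		for (;;) {
			if (toy_trylock(m))
				return;
			/*
			 * "If there are pending waiters, join them."  Once one
			 * task has queued, every later locker takes this exit
			 * too, so the lock stays in the slow path until
			 * nr_waiters drops to 0 -- the [**] state above.
			 */
			if (atomic_load(&m->nr_waiters) > 0)
				break;
			/* owner not on a CPU -> it won't release soon, stop spinning */
			if (!atomic_load(&m->owner_running))
				break;
		}

		/* slow phase: in the kernel this is the wait_list + schedule() path */
		atomic_fetch_add(&m->nr_waiters, 1);
		while (!toy_trylock(m))
			sched_yield();   /* the real code sleeps instead of yielding */
		atomic_fetch_sub(&m->nr_waiters, 1);
	}

	static void toy_unlock(struct toy_mutex *m)
	{
		atomic_store(&m->locked, 0);
	}

	static struct toy_mutex lock = { .owner_running = 1 };
	static long counter;

	static void *worker(void *arg)
	{
		(void)arg;
		for (int i = 0; i < 100000; i++) {
			toy_lock(&lock);
			counter++;
			toy_unlock(&lock);
		}
		return NULL;
	}

	int main(void)
	{
		pthread_t t[4];

		for (int i = 0; i < 4; i++)
			pthread_create(&t[i], NULL, worker, NULL);
		for (int i = 0; i < 4; i++)
			pthread_join(t[i], NULL);

		printf("counter = %ld (expect 400000)\n", counter);
		return 0;
	}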
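
[2] Schematic only, not a compilable patch: the helper names below
(might_sleep, __mutex_fastpath_lock, mutex_set_owner, preempt_disable/
enable) are the mainline mutex fast-path helpers as I remember them, and
the exact placement may well differ from the -v10 patch. It just marks
where the "owner == NULL" window sits and where the extra
preempt_disable()/preempt_enable() pair suggested above would close it:

	mutex_lock(lock):
		might_sleep();
		preempt_disable();      /* <- the extra pair being suggested */
		__mutex_fastpath_lock(&lock->count, __mutex_lock_slowpath);
		/*
		 * Without the preempt_disable(), a preemption right here --
		 * after the count has been grabbed but before the owner field
		 * is set -- leaves spinners looking at owner == NULL while
		 * the lock is in fact held by a task that is not running.
		 * That is the window the !owner && (need_resched() ||
		 * rt_task()) checks have to catch.
		 */
		mutex_set_owner(lock);
		preempt_enable();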