Date: Thu, 7 Jan 2010 09:49:45 -0800 (PST)
From: Linus Torvalds
To: Peter Zijlstra
cc: Christoph Lameter, Arjan van de Ven, "Paul E. McKenney",
    KAMEZAWA Hiroyuki, "linux-kernel@vger.kernel.org",
    "linux-mm@kvack.org", "minchan.kim@gmail.com", "hugh.dickins",
    Nick Piggin, Ingo Molnar
Subject: Re: [RFC][PATCH 6/8] mm: handle_speculative_fault()

On Thu, 7 Jan 2010, Linus Torvalds wrote:
>
> Well, I have yet to hear a realistic scenario of _how_ to do it all
> speculatively in the first place, at least not without horribly subtle
> complexity issues. So I'd really rather see how far we can possibly get by
> just improving mmap_sem.

For an example of this: it's entirely possible that one avenue of mmap_sem
improvement would be to look at the _writer_ side, and see how that can be
improved. An example of where we've done that is in madvise(): we used to
always take it for writing (because _some_ madvise versions needed the
exclusive access).
Once the versions that don't need exclusive access took it for reading
instead, some operations suddenly got way more scalable, and work in the
presence of concurrent page faults.

And quite frankly, I'd _much_ rather look at that kind of simple and
logically fairly straightforward solution, instead of doing the whole
speculative page fault work.

For example: there's no real reason why we take mmap_sem for writing when
extending an existing vma. And while 'brk()' is a very old-fashioned way of
doing memory management, it's still quite common. So rather than looking at
subtle lockless algorithms, why not look at doing the common cases of an
extending brk? Make that one take the mmap_sem for _reading_, and then do
the extending of the brk area with a simple cmpxchg or something?

And "extending brk" is actually a lot more common than shrinking it, and is
common for exactly the kind of workloads that are often nasty right now
(threaded allocators with lots and lots of smallish allocations).

The thing is, I can pretty much _guarantee_ that the speculative page fault
is going to end up doing a lot of nasty stuff that still needs
almost-global locking, and it's likely to be more complicated and slower
for the single-threaded case (you end up needing refcounts, a new "local"
lock or something).

Sure, moving to a per-vma lock can help, but it doesn't help a lot. It
doesn't help AT ALL for the single-threaded case, and for the
multi-threaded case I will bet you that a _lot_ of cases will have one very
hot vma - the regular data vma that gets shared for normal malloc() etc.

So I'm personally rather doubtful about the whole speculative work. It's a
fair amount of complexity without any really obvious upside. Yes, the
mmap_sem can be very annoying, but nobody can really honestly claim that
we've really optimized it all that much.
			Linus