From: Eric Dumazet
To: Andi Kleen
Subject: Re: [RFC] NUMA futex hashing
Date: Tue, 8 Aug 2006 12:10:39 +0200
Cc: Ravikiran G Thirumalai, "Shai Fultheim (Shai@scalex86.org)",
    pravin b shelar, linux-kernel@vger.kernel.org
References: <20060808070708.GA3931@localhost.localdomain>
Message-Id: <200608081210.40334.dada1@cosmosbay.com>

On Tuesday 08 August 2006 11:57, Andi Kleen wrote:
> Ravikiran G Thirumalai writes:
> > The current futex hash scheme is not the best for NUMA. The futex hash
> > table is an array of struct futex_hash_bucket, which is just a spinlock
> > and a list_head -- this means multiple spinlocks on the same cacheline,
> > and on NUMA machines, on the same internode cacheline. If futexes of two
> > unrelated threads running on two different nodes happen to hash onto
> > adjacent hash buckets, or onto buckets on the same internode cacheline,
> > the internode cacheline bounces between the nodes.
>
> When I did some testing with an (arguably far too lock-intensive)
> benchmark on a bigger box, I got most bouncing cycles not in the futex
> locks themselves, but in the down_read on the mm semaphore.

This is true even with a normal application (not a biased benchmark),
as oprofile shows: mmap_sem is the killer.

We could have a special case for PRIVATE futexes: they don't need to be
chained in a global table, only in a process-private table.

The POSIX thread API already lets an application tell glibc/the kernel
that a mutex/futex has process scope.

For these private futexes, I think we would not need to take
down_read(mmap_sem) at all, only a lock (or a few locks) protecting the
process-private table. A rough sketch of the idea follows.
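Something like this, purely as an illustration (every name below is
invented, nothing of the sort exists in the kernel today): the table
hangs off the process, and lookup is by user virtual address alone, so
there is no vma lookup and no mmap_sem.

#include <linux/hash.h>
#include <linux/list.h>
#include <linux/spinlock.h>

#define PRIV_FUTEX_HASH_BITS	8
#define PRIV_FUTEX_HASH_SIZE	(1 << PRIV_FUTEX_HASH_BITS)

/* Hypothetical per-process table for PRIVATE futexes. */
struct private_futex_table {
	spinlock_t	 lock;		/* or one lock per bucket */
	struct list_head buckets[PRIV_FUTEX_HASH_SIZE];
};

/* Key on the user virtual address only: within a single mm it is
 * unique, so we never have to resolve it to an (inode, offset) pair
 * and therefore never have to take mmap_sem. */
static struct list_head *priv_futex_bucket(struct private_futex_table *t,
					   unsigned long uaddr)
{
	return &t->buckets[hash_long(uaddr, PRIV_FUTEX_HASH_BITS)];
}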
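On the userland side the scope information is already expressible
through the standard pthread attribute API. A minimal example
(PTHREAD_PROCESS_PRIVATE is the default, but spelling it out is exactly
the hint glibc could turn into a private-futex call):

#include <pthread.h>

int init_private_mutex(pthread_mutex_t *mtx)
{
	pthread_mutexattr_t attr;
	int ret;

	ret = pthread_mutexattr_init(&attr);
	if (ret)
		return ret;
	/* PTHREAD_PROCESS_PRIVATE is the default, but stating it
	 * explicitly records that this mutex never crosses an
	 * address-space boundary. */
	pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_PRIVATE);
	ret = pthread_mutex_init(mtx, &attr);
	pthread_mutexattr_destroy(&attr);
	return ret;
}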
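And as an aside on the false-sharing point Kiran raises above:
independently of any private-table scheme, padding each bucket with the
existing ____cacheline_aligned_in_smp annotation would at least stop
two adjacent bucket locks from sharing an (internode) cacheline, at the
cost of a bigger table:

/* Sketch of a possible layout change: one bucket per cacheline. */
struct futex_hash_bucket {
	spinlock_t	 lock;
	struct list_head chain;
} ____cacheline_aligned_in_smp;

Eric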