Date: Fri, 10 Jan 2014 23:37:24 -0800
From: "Paul E. McKenney"
To: Davidlohr Bueso
Cc: linux-kernel@vger.kernel.org, mingo@kernel.org, dvhart@linux.intel.com,
	peterz@infradead.org, tglx@linutronix.de, efault@gmx.de, jeffm@suse.com,
	torvalds@linux-foundation.org, jason.low2@hp.com, Waiman.Long@hp.com,
	tom.vaden@hp.com, scott.norton@hp.com, aswin@hp.com
Subject: Re: [PATCH v5 2/4] futex: Larger hash table
Message-ID: <20140111073724.GA10038@linux.vnet.ibm.com>
Reply-To: paulmck@linux.vnet.ibm.com
References: <1388675120-8017-1-git-send-email-davidlohr@hp.com>
	<1388675120-8017-3-git-send-email-davidlohr@hp.com>
In-Reply-To: <1388675120-8017-3-git-send-email-davidlohr@hp.com>

On Thu, Jan 02, 2014 at 07:05:18AM -0800, Davidlohr Bueso wrote:
> From: Davidlohr Bueso
>
> Currently, the futex global hash table suffers from its fixed, smallish
> (for today's standards) size of 256 entries, as well as its lack of NUMA
> awareness. Large systems, using many futexes, can be prone to high amounts
> of collisions; where these futexes hash to the same bucket and lead to
> extra contention on the same hb->lock. Furthermore, cacheline bouncing is a
> reality when we have multiple hb->locks residing on the same cacheline and
> different futexes hash to adjacent buckets.
>
> This patch keeps the current static size of 16 entries for small systems,
> or otherwise, 256 * ncpus (or larger as we need to round the number to a
> power of 2). Note that this number of CPUs accounts for all CPUs that can
> ever be available in the system, taking into consideration things like
> hotplugging. While we do impose extra overhead at bootup by making the hash
> table larger, this is a one time thing, and does not shadow the benefits
> of this patch.
>
> Furthermore, as suggested by tglx, by cache aligning the hash buckets we can
> avoid access across cacheline boundaries and also avoid massive cache line
> bouncing if multiple cpus are hammering away at different hash buckets which
> happen to reside in the same cache line.
>
> Also, similar to other core kernel components (pid, dcache, tcp), by using
> alloc_large_system_hash() we benefit from its NUMA awareness and thus the
> table is distributed among the nodes instead of in a single one.
>
> For a custom microbenchmark that pounds on the uaddr hashing -- making the wait
> path fail at futex_wait_setup() returning -EWOULDBLOCK for large amounts of
> futexes, we can see the following benefits on a 80-core, 8-socket 1TB server:
>
> +---------+--------------------+------------------------+-----------------------+-------------------------------+
> | threads | baseline (ops/sec) | aligned-only (ops/sec) | large table (ops/sec) | large table+aligned (ops/sec) |
> +---------+--------------------+------------------------+-----------------------+-------------------------------+
> |     512 |              32426 |         50531 (+55.8%) |      255274 (+687.2%) |              292553 (+802.2%) |
> |     256 |              65360 |         99588 (+52.3%) |      443563 (+578.6%) |              508088 (+677.3%) |
> |     128 |             125635 |        200075 (+59.2%) |      742613 (+491.1%) |              835452 (+564.9%) |
> |      80 |             193559 |        323425 (+67.1%) |     1028147 (+431.1%) |             1130304 (+483.9%) |
> |      64 |             247667 |        443740 (+79.1%) |      997300 (+302.6%) |             1145494 (+362.5%) |
> |      32 |             628412 |        721401 (+14.7%) |       965996 (+53.7%) |              1122115 (+78.5%) |
> +---------+--------------------+------------------------+-----------------------+-------------------------------+
>
> Cc: Ingo Molnar
> Reviewed-by: Darren Hart
> Acked-by: Peter Zijlstra
> Cc: Thomas Gleixner
> Cc: Paul E. McKenney
> Cc: Mike Galbraith
> Cc: Jeff Mahoney
> Cc: Linus Torvalds
> Cc: Scott Norton
> Cc: Tom Vaden
> Cc: Aswin Chandramouleeswaran
> Reviewed-by: Waiman Long
> Reviewed-and-tested-by: Jason Low
> Signed-off-by: Davidlohr Bueso

Reviewed-by: Paul E. McKenney

> ---
>  kernel/futex.c | 26 +++++++++++++++++++-------
>  1 file changed, 19 insertions(+), 7 deletions(-)
>
> diff --git a/kernel/futex.c b/kernel/futex.c
> index 085f5fa..577481d 100644
> --- a/kernel/futex.c
> +++ b/kernel/futex.c
> @@ -63,6 +63,7 @@
>  #include
>  #include
>  #include
> +#include
>
>  #include
>
> @@ -70,8 +71,6 @@
>
>  int __read_mostly futex_cmpxchg_enabled;
>
> -#define FUTEX_HASHBITS (CONFIG_BASE_SMALL ? 4 : 8)
> -
>  /*
>   * Futex flags used to encode options to functions and preserve them across
>   * restarts.
> @@ -149,9 +148,11 @@ static const struct futex_q futex_q_init = {
>  struct futex_hash_bucket {
>  	spinlock_t lock;
>  	struct plist_head chain;
> -};
> +} ____cacheline_aligned_in_smp;
>
> -static struct futex_hash_bucket futex_queues[1<
> +static unsigned long __read_mostly futex_hashsize;
> +
> +static struct futex_hash_bucket *futex_queues;
>
>  /*
>   * We hash on the keys returned from get_futex_key (see below).
> @@ -161,7 +162,7 @@ static struct futex_hash_bucket *hash_futex(union futex_key *key)
>  	u32 hash = jhash2((u32*)&key->both.word,
>  			  (sizeof(key->both.word)+sizeof(key->both.ptr))/4,
>  			  key->both.offset);
> -	return &futex_queues[hash & ((1 << FUTEX_HASHBITS)-1)];
> +	return &futex_queues[hash & (futex_hashsize - 1)];
>  }
>
>  /*
> @@ -2719,7 +2720,18 @@ SYSCALL_DEFINE6(futex, u32 __user *, uaddr, int, op, u32, val,
>  static int __init futex_init(void)
>  {
>  	u32 curval;
> -	int i;
> +	unsigned long i;
> +
> +#if CONFIG_BASE_SMALL
> +	futex_hashsize = 16;
> +#else
> +	futex_hashsize = roundup_pow_of_two(256 * num_possible_cpus());
> +#endif
> +
> +	futex_queues = alloc_large_system_hash("futex", sizeof(*futex_queues),
> +					       futex_hashsize, 0,
> +					       futex_hashsize < 256 ? HASH_SMALL : 0,
> +					       NULL, NULL, futex_hashsize, futex_hashsize);
>
>  	/*
>  	 * This will fail and we want it. Some arch implementations do
> @@ -2734,7 +2746,7 @@ static int __init futex_init(void)
>  	if (cmpxchg_futex_value_locked(&curval, NULL, 0, 0) == -EFAULT)
>  		futex_cmpxchg_enabled = 1;
>
> -	for (i = 0; i < ARRAY_SIZE(futex_queues); i++) {
> +	for (i = 0; i < futex_hashsize; i++) {
>  		plist_head_init(&futex_queues[i].chain);
>  		spin_lock_init(&futex_queues[i].lock);
>  	}
> --
> 1.8.1.4
>
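
For anyone following the sizing arithmetic in the patch above, here is a minimal userspace sketch of the two calculations it relies on: rounding 256 * ncpus up to a power of two, and picking a bucket by masking the hash with (size - 1). The names sketch_roundup_pow_of_two(), sketch_hashsize() and sketch_bucket_index() are hypothetical stand-ins for illustration, not kernel functions, and the loop-based rounding is only an assumed simplification of what the kernel's roundup_pow_of_two() computes.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical userspace stand-in for roundup_pow_of_two().
 * Assumes n is well below ULONG_MAX / 2, so the shift cannot overflow.
 */
unsigned long sketch_roundup_pow_of_two(unsigned long n)
{
	unsigned long p = 1;

	while (p < n)
		p <<= 1;
	return p;
}

/* 256 buckets per possible CPU, rounded up to a power of two. */
unsigned long sketch_hashsize(unsigned long possible_cpus)
{
	return sketch_roundup_pow_of_two(256 * possible_cpus);
}

/*
 * Masking with (hashsize - 1) selects a bucket without a modulo.
 * This is only valid when hashsize is a power of two, which is why
 * the table size is rounded up in the first place.
 */
unsigned long sketch_bucket_index(uint32_t hash, unsigned long hashsize)
{
	assert(hashsize != 0 && (hashsize & (hashsize - 1)) == 0);
	return hash & (hashsize - 1);
}
```

Under these assumptions, sketch_hashsize(80) gives 32768 buckets for the 80-core test machine, since 256 * 80 = 20480 rounds up to the next power of two.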