Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1753491AbaGVIsw (ORCPT ); Tue, 22 Jul 2014 04:48:52 -0400
Received: from casper.infradead.org ([85.118.1.10]:59188 "EHLO
        casper.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1751861AbaGVIst (ORCPT );
        Tue, 22 Jul 2014 04:48:49 -0400
Date: Tue, 22 Jul 2014 10:48:42 +0200
From: Peter Zijlstra
To: Thomas Gleixner
Cc: Steven Rostedt, Darren Hart, Andy Lutomirski, Andi Kleen,
        Waiman Long, Ingo Molnar, Davidlohr Bueso, Heiko Carstens,
        "linux-kernel@vger.kernel.org", Linux API,
        "linux-doc@vger.kernel.org", Jason Low, Scott J Norton, Robert Haas
Subject: Re: [RFC PATCH 0/5] futex: introduce an optimistic spinning futex
Message-ID: <20140722084842.GZ3935@laptop>
References: <20140721212740.GS3935@laptop>
        <20140721213457.46623e2f@gandalf.local.home>
        <20140722074719.GV3935@laptop>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
User-Agent: Mutt/1.5.21 (2012-12-30)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Jul 22, 2014 at 10:39:17AM +0200, Thomas Gleixner wrote:
> On Tue, 22 Jul 2014, Peter Zijlstra wrote:
> > Anyway, there is one big fail in the entire futex stack that we 'need'
> > to sort some day and that is NUMA. Some people (again database people)
> > explicitly do not use futexes and instead use sysvsem because of this.
> >
> > The problem with numa futexes is that because they're vaddr based there
> > is no (persistent) node information. You always end up having to fall
> > back to looking in all nodes before you can guarantee there is no
> > matching futex.
> >
> > One way to achieve it is by extending the futex value to include a node
> > number, but that's obviously a complete ABI break.
> > Then again, it should be pretty straight fwd, since the node number
> > doesn't need to be part of the actual atomic update part, just part of
> > the userspace storage.
>
> So you want per node hash buckets, right? Fair enough, but how do you
> make sure, that no thread/process on a different node is fiddling with
> that "node bound" futex as well?

You don't and that should work just as well, just slower. But since the
node id is in the futex 'value' we'll always end up in the right
node-hash, even if it's a remote one.

So yes, per node hashes, and a persistent futex->node map.

And before people start talking about mempol and using that to bind
memory to nodes and such, remember that private futexes do not have a
vma lookup and therefore mempols are impossible to use.
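
To make the per-node hash idea concrete, here is a rough userspace model
in Python. It is only a sketch under stated assumptions, not kernel code
and not a proposed ABI: the names NumaFutex, NodeHashes, NR_NODES and
wake_without_node are all made up for illustration, and the "hash" is a
plain dictionary keyed by address. It shows the node id living next to
the futex word (outside the atomically updated part) and selecting the
table directly, versus having to probe every node when no node id is
stored.

```python
from collections import defaultdict
from dataclasses import dataclass

NR_NODES = 4  # assumed node count, for illustration only


@dataclass
class NumaFutex:
    """Hypothetical userspace layout: the futex word plus a node id."""
    addr: int   # stands in for the futex's virtual address
    value: int  # the word userspace updates atomically
    node: int   # stored next to the word, not part of the atomic update


class NodeHashes:
    """One waiter table per node, keyed by futex address."""

    def __init__(self, nr_nodes=NR_NODES):
        self.tables = [defaultdict(list) for _ in range(nr_nodes)]

    def wait(self, futex, expected, waiter):
        # Loosely mirrors a futex wait: only queue if the value matches.
        if futex.value != expected:
            return False
        self.tables[futex.node][futex.addr].append(waiter)
        return True

    def wake(self, futex, count=1):
        # The node id read from the futex picks the table, so a caller on
        # any node (local or remote) ends up in the same node-hash.
        queue = self.tables[futex.node].get(futex.addr, [])
        woken = queue[:count]
        del queue[:count]
        return woken

    def wake_without_node(self, addr, count=1):
        # With no persistent node information, this sketch probes every
        # node's table before it can say whether a waiter exists.
        woken, probed = [], 0
        for table in self.tables:
            probed += 1
            queue = table.get(addr, [])
            while queue and len(woken) < count:
                woken.append(queue.pop(0))
        return woken, probed
```

In this model a thread on a different node calling wake() on the same
futex still reaches the right table, since the table is chosen from the
stored node id and not from where the caller runs; the only cost the
model hides is that the access is remote.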