Date: Tue, 22 Jul 2014 22:52:39 +0200 (CEST)
From: Thomas Gleixner
To: Waiman Long
cc: Peter Zijlstra, Steven Rostedt, Darren Hart, Andy Lutomirski,
    Andi Kleen, Ingo Molnar, Davidlohr Bueso, Heiko Carstens,
    "linux-kernel@vger.kernel.org", Linux API,
    "linux-doc@vger.kernel.org", Jason Low, Scott J Norton, Robert Haas
Subject: Re: [RFC PATCH 0/5] futex: introduce an optimistic spinning futex

On Tue, 22 Jul 2014, Waiman Long wrote:

> On 07/22/2014 05:59 AM, Thomas Gleixner wrote:
> > On Tue, 22 Jul 2014, Peter Zijlstra wrote:
> > > On Tue, Jul 22, 2014 at 10:39:17AM +0200, Thomas Gleixner wrote:
> > > > On Tue, 22 Jul 2014, Peter Zijlstra wrote:
> > > > > Anyway, there is one big fail in the entire futex stack that we 'need'
> > > > > to sort some day and that is NUMA. Some people (again database people)
> > > > > explicitly do not use futexes and instead use sysvsem because of this.
> > > > >
> > > > > The problem with numa futexes is that because they're vaddr based
> > > > > there is no (persistent) node information.
> > > > > You always end up having to fall back to looking in all nodes before
> > > > > you can guarantee there is no matching futex.
> > > > >
> > > > > One way to achieve it is by extending the futex value to include a
> > > > > node number, but that's obviously a complete ABI break. Then again,
> > > > > it should be pretty straight fwd, since the node number doesn't need
> > > > > to be part of the actual atomic update part, just part of the
> > > > > userspace storage.
> > > > So you want per node hash buckets, right? Fair enough, but how do you
> > > > make sure, that no thread/process on a different node is fiddling with
> > > > that "node bound" futex as well?
> > > You don't and that should work just as well, just slower. But since the
> > > node id is in the futex 'value' we'll always end up in the right
> > > node-hash, even if its a remote one.
> > >
> > > So yes, per node hashes, and a persistent futex->node map.
> > Which works fine as long as you only have the futex_q on the stack of
> > the blocked task. If user space is lying to you, then you just end up
> > with a bunch of threads sleeping forever. Who cares?
> >
> > But if you create independent kernel state, which we have with
> > pi_state and which you need for finegrained locking and further
> > spinning fun, you open up another can of worms. Simply because this
> > would enable rogue user space to create multiple instances of the
> > kernel internal state. I can predict the CVEs resulting from that
> > even without using a crystal ball.
> >
> > Thanks,
> >
> >     tglx
>
> I think NUMA futex, if implemented, is a completely independent piece that
> have no direct relationship with optimistic spinning futex. It should be a
> separate patch and not mixing with optimistic spinning patch which will only
> make the latter one more complicated.

Bullshit.
Of course it handles separate issues, but Peter is completely right
that the NUMA aspect is a far bigger issue than the optimistic spinning
stuff.

Do you have an idea what the costs of cross node memory access and
cacheline bouncing are? Obviously not, as your only interest seems to
be to slap optimistic spinning onto every place which deals with
locking.

And if you had tried to read _AND_ understand the discussion above, you
might have noticed that providing NUMA awareness requires a lot of the
functionality which is needed for optimistic spinning as well.

But no, you did not even take the time to think about it, you just
claim that it makes your optimistic stuff more complicated.

Just get it, there is a world outside of optimistic spinning.

Thanks,

    tglx