Message-ID: <53CEC8AC.7020700@hp.com>
Date: Tue, 22 Jul 2014 16:25:16 -0400
From: Waiman Long
To: Thomas Gleixner
CC: Peter Zijlstra, Steven Rostedt, Darren Hart, Andy Lutomirski,
    Andi Kleen, Ingo Molnar, Davidlohr Bueso, Heiko Carstens,
    "linux-kernel@vger.kernel.org", Linux API,
    "linux-doc@vger.kernel.org", Jason Low, Scott J Norton, Robert Haas
Subject: Re: [RFC PATCH 0/5] futex: introduce an optimistic spinning futex
References: <20140721212740.GS3935@laptop>
    <20140721213457.46623e2f@gandalf.local.home>
    <20140722074719.GV3935@laptop> <20140722084842.GZ3935@laptop>

On 07/22/2014 05:59 AM, Thomas Gleixner wrote:
> On Tue, 22 Jul 2014, Peter Zijlstra wrote:
>> On Tue, Jul 22, 2014 at 10:39:17AM +0200, Thomas Gleixner wrote:
>>> On Tue, 22 Jul 2014, Peter Zijlstra wrote:
>>>> Anyway, there is one big fail in the entire futex stack that we 'need'
>>>> to sort some day and that is NUMA. Some people (again database people)
>>>> explicitly do not use futexes and instead use sysvsem because of this.
>>>>
>>>> The problem with numa futexes is that because they're vaddr based there
>>>> is no (persistent) node information. You always end up having to fall
>>>> back to looking in all nodes before you can guarantee there is no
>>>> matching futex.
>>>>
>>>> One way to achieve it is by extending the futex value to include a node
>>>> number, but that's obviously a complete ABI break. Then again, it should
>>>> be pretty straight fwd, since the node number doesn't need to be part of
>>>> the actual atomic update part, just part of the userspace storage.
>>> So you want per node hash buckets, right? Fair enough, but how do you
>>> make sure, that no thread/process on a different node is fiddling with
>>> that "node bound" futex as well?
>> You don't and that should work just as well, just slower. But since the
>> node id is in the futex 'value' we'll always end up in the right
>> node-hash, even if its a remote one.
>>
>> So yes, per node hashes, and a persistent futex->node map.
> Which works fine as long as you only have the futex_q on the stack of
> the blocked task. If user space is lying to you, then you just end up
> with a bunch of threads sleeping forever. Who cares?
>
> But if you create independent kernel state, which we have with
> pi_state and which you need for finegrained locking and further
> spinning fun, you open up another can of worms. Simply because this
> would enable rogue user space to create multiple instances of the
> kernel internal state. I can predict the CVEs resulting from that
> even without using a crystal ball.
>
> Thanks,
>
> tglx

I think a NUMA futex, if implemented, is a completely independent piece
that has no direct relationship with the optimistic spinning futex. It
should be a separate patch and not be mixed with the optimistic spinning
patch, which would only make the latter more complicated.

-Longman