Subject: Re: [RFC PATCH 0/5] futex: introduce an optimistic spinning futex
From: Davidlohr Bueso
To: Steven Rostedt
Cc: Thomas Gleixner, Darren Hart, Andy Lutomirski, Peter Zijlstra,
    Andi Kleen, Waiman Long, Ingo Molnar, Heiko Carstens,
    linux-kernel@vger.kernel.org, Linux API, linux-doc@vger.kernel.org,
    Jason Low, Scott J Norton, Robert Haas
Date: Mon, 21 Jul 2014 20:06:19 -0700

On Mon, 2014-07-21 at 21:34 -0400, Steven Rostedt wrote:
> On Tue, 22 Jul 2014 03:01:00 +0200 (CEST)
> Thomas Gleixner wrote:
>
> > On Mon, 21 Jul 2014, Darren Hart wrote:
> > > On 7/21/14, 14:47, "Thomas Gleixner" wrote:
> > >
> > > > On Mon, 21 Jul 2014, Andy Lutomirski wrote:
> > > > > On Mon, Jul 21, 2014 at 2:27 PM, Peter Zijlstra wrote:
> > > > > > All this is predicated on the fact that syscalls are 'expensive'.
> > > > > > Weren't syscalls only 100s of cycles? All this bitmap mucking is far
> > > > > > more expensive due to cacheline misses, which due to the size of the
> > > > > > things is almost guaranteed.
> > > > >
> > > > > 120 - 300 cycles for me, unless tracing happens, and I'm working on
> > > > > reducing the incidence of tracing.
> > > >
> > > > So it's a non-issue indeed and definitely not worth the trouble of
> > > > that extra storage, the scheduler overhead, etc.
> > > >
> > > > Summary: Once you can't take it atomically in user space, you've lost
> > > >          anyway. And we are better off doing the magic spinning in
> > > >          the kernel, where we have all the information accessible already.
>
> I just want to point out that I was having a very nice conversation
> with Robert Haas (Cc'd) in Napa Valley at Linux Collaboration about
> this very topic. Robert is a PostgreSQL developer who told me that they
> implement their spin locks completely in userspace (no futex, just raw
> spinning on shared memory). This is because sleeping on contention of a
> futex has been shown to be very expensive in their benchmarks.

Well sure: hashing, plist handling, wakeups, etc. (not to mention
dirtying cachelines). The "fast" in futex comes only from the fact that
there is no jump to kernel space when the lock is uncontended, e.g.
after a successful cmpxchg in user space. Once futex(2) is actually
called, performance is quite a different story.

> His work is
> not a micro benchmark but for a very popular database where locking is
> crucial.
>
> I was telling Robert that if futexes get optimistic spinning, he should
> reconsider their use of userspace spinlocks in favor of this, because
> I'm pretty sure that they will see a great improvement.

Agreed, provided the critical regions are correctly sized for spinning
locks.

> Now Robert will be the best one to answer if the system call is indeed
> more expensive than doing full spins in userspace. If the spin is done
> in the kernel and they still get better performance by just spinning
> blindly in userspace even if the owner is asleep, I think we will have
> our answer.
>
> Note, I believe they only care about shared threads, and this
> optimistic spinning does not need to be something done between
> processes.

With the pi_state there would be no such distinction.

Thanks,
Davidlohr