Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1162108AbbKEQjO (ORCPT ); Thu, 5 Nov 2015 11:39:14 -0500 Received: from bombadil.infradead.org ([198.137.202.9]:37250 "EHLO bombadil.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1033379AbbKEQjM (ORCPT ); Thu, 5 Nov 2015 11:39:12 -0500 Date: Thu, 5 Nov 2015 17:39:02 +0100 From: Peter Zijlstra To: Waiman Long Cc: Ingo Molnar , Thomas Gleixner , "H. Peter Anvin" , x86@kernel.org, linux-kernel@vger.kernel.org, Scott J Norton , Douglas Hatch , Davidlohr Bueso Subject: Re: [PATCH tip/locking/core v9 2/6] locking/qspinlock: prefetch next node cacheline Message-ID: <20151105163902.GF3604@twins.programming.kicks-ass.net> References: <1446247597-61863-1-git-send-email-Waiman.Long@hpe.com> <1446247597-61863-3-git-send-email-Waiman.Long@hpe.com> <20151102163626.GU3604@twins.programming.kicks-ass.net> <563B7E98.4030708@hpe.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <563B7E98.4030708@hpe.com> User-Agent: Mutt/1.5.21 (2012-12-30) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 2010 Lines: 53 On Thu, Nov 05, 2015 at 11:06:48AM -0500, Waiman Long wrote: > >How does it affect IVB-EX (which you were testing earlier IIRC)? > > My testing on IVB-EX indicated that if the critical section is really short, > the change may actually slow thing a bit in some cases. However, when the > critical section is long enough that the prefetch overhead can be hidden > within the lock acquisition loop, there will be a performance boost. > >>@@ -426,6 +437,15 @@ queue: > >> cpu_relax(); > >> > >> /* > >>+ * If the next pointer is defined, we are not tail anymore. > >>+ * In this case, claim the spinlock& release the MCS lock. > >>+ */ > >>+ if (next) { > >>+ set_locked(lock); > >>+ goto mcs_unlock; > >>+ } > >>+ > >>+ /* > >> * claim the lock: > >> * > >> * n,0,0 -> 0,0,1 : lock, uncontended > >>@@ -458,6 +478,7 @@ queue: > >> while (!(next = READ_ONCE(node->next))) > >> cpu_relax(); > >> > >>+mcs_unlock: > >> arch_mcs_spin_unlock_contended(&next->locked); > >> pv_kick_node(lock, next); > >> > >This however appears an independent optimization. Is it worth it? Would > >we not already have observed a val != tail in this case? At which point > >we're just adding extra code for no gain. > > > >That is, if we observe @next, must we then not also observe val != tail? > > Observing next implies val != tail, but the reverse may not be true. The > branch is done before we observe val != tail. Yes, it is an optimization to > avoid reading node->next again if we have already observed next. I have > observed a very minor performance boost with that change without the > prefetch. This is all good information to have in the Changelog. And since these are two independent changes, two patches would have been the right format. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/