From: Waiman Long <Waiman.Long@hpe.com>
Date: Thu, 05 Nov 2015 11:52:43 -0500
To: Peter Zijlstra
CC: Ingo Molnar, Thomas Gleixner, "H. Peter Anvin", x86@kernel.org,
    linux-kernel@vger.kernel.org, Scott J Norton, Douglas Hatch,
    Davidlohr Bueso
Subject: Re: [PATCH tip/locking/core v9 2/6] locking/qspinlock: prefetch next node cacheline
Message-ID: <563B895B.3090101@hpe.com>
In-Reply-To: <20151105163902.GF3604@twins.programming.kicks-ass.net>

On 11/05/2015 11:39 AM, Peter Zijlstra wrote:
> On Thu, Nov 05, 2015 at 11:06:48AM -0500, Waiman Long wrote:
>
>>> How does it affect IVB-EX (which you were testing earlier IIRC)?
>> My testing on IVB-EX indicated that if the critical section is really
>> short, the change may actually slow things down a bit in some cases.
>> However, when the critical section is long enough that the prefetch
>> overhead can be hidden within the lock acquisition loop, there will be
>> a performance boost.
>>>> @@ -426,6 +437,15 @@ queue:
>>>>  		cpu_relax();
>>>>
>>>>  	/*
>>>> +	 * If the next pointer is defined, we are not tail anymore.
>>>> +	 * In this case, claim the spinlock & release the MCS lock.
>>>> +	 */
>>>> +	if (next) {
>>>> +		set_locked(lock);
>>>> +		goto mcs_unlock;
>>>> +	}
>>>> +
>>>> +	/*
>>>>  	 * claim the lock:
>>>>  	 *
>>>>  	 * n,0,0 -> 0,0,1 : lock, uncontended
>>>> @@ -458,6 +478,7 @@ queue:
>>>>  	while (!(next = READ_ONCE(node->next)))
>>>>  		cpu_relax();
>>>>
>>>> +mcs_unlock:
>>>>  	arch_mcs_spin_unlock_contended(&next->locked);
>>>>  	pv_kick_node(lock, next);
>>>>
>>> This however appears an independent optimization. Is it worth it? Would
>>> we not already have observed a val != tail in this case? At which point
>>> we're just adding extra code for no gain.
>>>
>>> That is, if we observe @next, must we then not also observe val != tail?
>> Observing next implies val != tail, but the reverse may not be true. The
>> branch is done before we observe val != tail. Yes, it is an optimization
>> to avoid reading node->next again if we have already observed next. I
>> have observed a very minor performance boost with that change without
>> the prefetch.
> This is all good information to have in the changelog. And since these
> are two independent changes, two patches would have been the right
> format.

Yep, I will separate it into two patches and include the additional
information in the changelog.

Cheers,
Longman