Subject: Re: [PATCH v3 2/7] locking/pvqspinlock: Add pending bit support
From: Waiman Long
Date: Mon, 27 Jul 2015 13:30:51 -0400
To: Davidlohr Bueso
CC: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, "H. Peter Anvin",
    x86@kernel.org, linux-kernel@vger.kernel.org, Scott J Norton,
    Douglas Hatch

On 07/26/2015 08:56 PM, Davidlohr Bueso wrote:
> On Wed, 2015-07-22 at 16:12 -0400, Waiman Long wrote:
>> Like the native qspinlock, using the pending bit when it is lightly
>> loaded to acquire the lock is faster than going through the PV queuing
>> process, which is even slower than the native queuing process. It also
>> avoids loading two additional cachelines (the MCS and PV nodes).
>>
>> This patch adds pending bit support for PV qspinlock. The pending
>> bit code has a smaller spin threshold (1<<10). It will fall back to
>> the queuing method if it cannot acquire the lock within a certain
>> time limit.
>
> Can we infer that this new spin threshold is the metric to detect these
> "light loads"? If so, I cannot help but wonder if there is some more
> straightforward/ad-hoc way of detecting this, ie some pv_<> function.
> That would also save a lot of time as it would not be time based.
> Although it might be a more costly call altogether, I dunno.

I used the term "light load" to refer to the condition that at most 2
competing threads are trying to acquire the lock. In that case, the
pending code will be used. Once there are 3 or more competing threads,
it will switch back to the regular queuing code. It is the same
mechanism used in the native code. The only difference is the addition
of a loop counter to make sure that the thread won't spend too much
time spinning.
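Just to illustrate the idea in isolation (this is only a rough
user-space sketch, not the kernel code from the patch): a toy lock word
with a locked byte and a pending bit, and a trylock that spins at most
a fixed budget before telling the caller to queue. The names LOCKED_VAL,
PENDING_VAL, SPIN_BUDGET and pending_trylock() are invented for the
sketch; only the overall shape follows the hunks quoted below.

/*
 * Toy user-space illustration of a bounded pending-bit trylock.
 * NOT the kernel's pv_pending_lock(); constants and helpers here
 * are made up for the example.
 */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

#define LOCKED_VAL	0x01U		/* lock held by an owner            */
#define PENDING_VAL	0x100U		/* one waiter spinning as "next"    */
#define LOCKED_MASK	0xffU		/* low byte = locked state          */
#define SPIN_BUDGET	(1 << 10)	/* bounded spin, as in the patch    */

static _Atomic unsigned int lockword;	/* 0 = unlocked */

/*
 * Try to take the lock via the pending bit, spinning at most
 * SPIN_BUDGET times in total.  Returns true if the lock was taken,
 * false if the caller should fall back to a queuing slow path
 * (not shown here).
 */
static bool pending_trylock(void)
{
	unsigned int val, new;
	int loop = SPIN_BUDGET;

	val = atomic_load(&lockword);

	/* trylock || pending: take the lock if free, else grab the pending bit */
	for (;;) {
		if (val & ~LOCKED_MASK)
			return false;		/* pending bit already taken -> queue */
		new = LOCKED_VAL;
		if (val == new)
			new |= PENDING_VAL;	/* lock held, become the "next" waiter */
		if (atomic_compare_exchange_strong(&lockword, &val, new))
			break;
		/* CAS failure refreshed val with the current lock word */
		if (loop-- <= 0)
			return false;
	}

	if (new == LOCKED_VAL)
		return true;			/* lock was free, we own it */

	/* We hold the pending bit: wait (bounded) for the owner to go away. */
	while ((atomic_load(&lockword) & LOCKED_MASK) && loop-- > 0)
		;				/* cpu_relax() in kernel terms */

	if (!(atomic_load(&lockword) & LOCKED_MASK)) {
		/* pending -> locked hand-over (the patch uses clear_pending_set_locked()) */
		atomic_store(&lockword, LOCKED_VAL);
		return true;
	}

	/* Budget exhausted: release the 1-slot wait queue before queuing. */
	atomic_fetch_and(&lockword, ~PENDING_VAL);
	return false;
}

int main(void)
{
	printf("fast path %s\n",
	       pending_trylock() ? "took the lock" : "would queue");
	return 0;
}

The real pv_pending_lock() quoted below differs in the lock word layout
and in using clear_pending()/clear_pending_set_locked(), but the
bounded-spin structure is the same.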
> Some comments about this 'loop' threshold.
>
>> +static int pv_pending_lock(struct qspinlock *lock, u32 val)
>> +{
>> +	int loop = PENDING_SPIN_THRESHOLD;
>> +	u32 new, old;
>> +
>> +	/*
>> +	 * wait for in-progress pending->locked hand-overs
>> +	 */
>> +	if (val == _Q_PENDING_VAL) {
>> +		while (((val = atomic_read(&lock->val)) == _Q_PENDING_VAL) &&
>> +			loop--)
>> +			cpu_relax();
>> +	}
>> +
>> +	/*
>> +	 * trylock || pending
>> +	 */
>> +	for (;;) {
>> +		if (val & ~_Q_LOCKED_MASK)
>> +			goto queue;
>> +		new = _Q_LOCKED_VAL;
>> +		if (val == new)
>> +			new |= _Q_PENDING_VAL;
>> +		old = atomic_cmpxchg(&lock->val, val, new);
>> +		if (old == val)
>> +			break;
>> +		if (loop-- <= 0)
>> +			goto queue;
>> +	}
>
> So I'm not clear about the semantics of what (should) occur when the
> threshold is exhausted. In the trylock/pending loop above, you
> immediately return 0, indicating we want to queue. Ok, but below:

This is in the lock slowpath, so it can't return a lock failure.

>> +
>> +	if (new == _Q_LOCKED_VAL)
>> +		goto gotlock;
>> +	/*
>> +	 * We are pending, wait for the owner to go away.
>> +	 */
>> +	while (((val = smp_load_acquire(&lock->val.counter)) & _Q_LOCKED_MASK)
>> +		&& (loop-- > 0))
>> +		cpu_relax();
>> +
>> +	if (!(val & _Q_LOCKED_MASK)) {
>> +		clear_pending_set_locked(lock);
>> +		goto gotlock;
>> +	}
>> +	/*
>> +	 * Clear the pending bit and fall back to queuing
>> +	 */
>> +	clear_pending(lock);
>
> ... you call clear_pending before returning. Is this intentional? Smells
> fishy.

The pending bit acts as a 1-slot waiting queue. So if the vCPU needs to
fall back to regular queuing, it has to clear the bit first.

>
> And basically afaict all this chunk of code does is spin until loop is
> exhausted, and break out when we get the lock. Ie, something like this
> is a lot cleaner:
>
>	while (loop--) {
>		/*
>		 * We are pending, wait for the owner to go away.
>		 */
>		val = smp_load_acquire(&lock->val.counter);
>		if (!(val & _Q_LOCKED_MASK)) {
>			clear_pending_set_locked(lock);
>			goto gotlock;
>		}
>
>		cpu_relax();
>	}
>
>	/*
>	 * Clear the pending bit and fall back to queuing
>	 */
>	clear_pending(lock);
>

Yes, we could change the loop to that. I was just following the same
logic as the native code.

Cheers,
Longman