Subject: Re: [PATCH v3 2/7] locking/pvqspinlock: Add pending bit support
From: Waiman Long
Date: Mon, 27 Jul 2015 13:30:51 -0400
To: Davidlohr Bueso
CC: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, "H. Peter Anvin",
    x86@kernel.org, linux-kernel@vger.kernel.org, Scott J Norton,
    Douglas Hatch

On 07/26/2015 08:56 PM, Davidlohr Bueso wrote:
> On Wed, 2015-07-22 at 16:12 -0400, Waiman Long wrote:
>> Like the native qspinlock, using the pending bit when it is lightly
>> loaded to acquire the lock is faster than going through the PV queuing
>> process, which is even slower than the native queuing process. It also
>> avoids loading two additional cachelines (the MCS and PV nodes).
>>
>> This patch adds pending bit support for PV qspinlock. The pending
>> bit code has a smaller spin threshold (1<<10). It will fall back to
>> the queuing method if it cannot acquire the lock within a certain
>> time limit.
>
> Can we infer that this new spin threshold is the metric to detect these
> "light loads"? If so, I cannot help but wonder if there is some more
> straightforward/ad-hoc way of detecting this, ie some pv_<> function.
> That would also save a lot of time as it would not be time based.
> Although it might be a more costly call altogether, I dunno.

I used the term "light load" to refer to the condition that at most 2
competing threads are trying to acquire the lock. In that case, the
pending code will be used. Once there are 3 or more competing threads,
it will switch back to the regular queuing code. It is the same
mechanism used in the native code. The only difference is the addition
of a loop counter to make sure that the thread won't spend too much
time spinning.
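Just to illustrate the idea in isolation (this is only a rough
user-space sketch, not the kernel code from the patch): a toy lock word
with a locked byte and a pending bit, and a trylock that spins at most
a fixed budget before telling the caller to queue. The names LOCKED_VAL,
PENDING_VAL, SPIN_BUDGET and pending_trylock() are invented for the
sketch; only the overall shape follows the hunks quoted below.

/*
 * Toy user-space illustration of a bounded pending-bit trylock.
 * NOT the kernel's pv_pending_lock(); constants and helpers here
 * are made up for the example.
 */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

#define LOCKED_VAL	0x01U		/* lock held by an owner            */
#define PENDING_VAL	0x100U		/* one waiter spinning as "next"    */
#define LOCKED_MASK	0xffU		/* low byte = locked state          */
#define SPIN_BUDGET	(1 << 10)	/* bounded spin, as in the patch    */

static _Atomic unsigned int lockword;	/* 0 = unlocked */

/*
 * Try to take the lock via the pending bit, spinning at most
 * SPIN_BUDGET times in total.  Returns true if the lock was taken,
 * false if the caller should fall back to a queuing slow path
 * (not shown here).
 */
static bool pending_trylock(void)
{
	unsigned int val, new;
	int loop = SPIN_BUDGET;

	val = atomic_load(&lockword);

	/* trylock || pending: take the lock if free, else grab the pending bit */
	for (;;) {
		if (val & ~LOCKED_MASK)
			return false;		/* pending bit already taken -> queue */
		new = LOCKED_VAL;
		if (val == new)
			new |= PENDING_VAL;	/* lock held, become the "next" waiter */
		if (atomic_compare_exchange_strong(&lockword, &val, new))
			break;
		/* CAS failure refreshed val with the current lock word */
		if (loop-- <= 0)
			return false;
	}

	if (new == LOCKED_VAL)
		return true;			/* lock was free, we own it */

	/* We hold the pending bit: wait (bounded) for the owner to go away. */
	while ((atomic_load(&lockword) & LOCKED_MASK) && loop-- > 0)
		;				/* cpu_relax() in kernel terms */

	if (!(atomic_load(&lockword) & LOCKED_MASK)) {
		/* pending -> locked hand-over (the patch uses clear_pending_set_locked()) */
		atomic_store(&lockword, LOCKED_VAL);
		return true;
	}

	/* Budget exhausted: release the 1-slot wait queue before queuing. */
	atomic_fetch_and(&lockword, ~PENDING_VAL);
	return false;
}

int main(void)
{
	printf("fast path %s\n",
	       pending_trylock() ? "took the lock" : "would queue");
	return 0;
}

The real pv_pending_lock() quoted below differs in the lock word layout
and in using clear_pending()/clear_pending_set_locked(), but the
bounded-spin structure is the same.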
> Some comments about this 'loop' threshold.
>
>> +static int pv_pending_lock(struct qspinlock *lock, u32 val)
>> +{
>> +	int loop = PENDING_SPIN_THRESHOLD;
>> +	u32 new, old;
>> +
>> +	/*
>> +	 * wait for in-progress pending->locked hand-overs
>> +	 */
>> +	if (val == _Q_PENDING_VAL) {
>> +		while (((val = atomic_read(&lock->val)) == _Q_PENDING_VAL) &&
>> +			loop--)
>> +			cpu_relax();
>> +	}
>> +
>> +	/*
>> +	 * trylock || pending
>> +	 */
>> +	for (;;) {
>> +		if (val & ~_Q_LOCKED_MASK)
>> +			goto queue;
>> +		new = _Q_LOCKED_VAL;
>> +		if (val == new)
>> +			new |= _Q_PENDING_VAL;
>> +		old = atomic_cmpxchg(&lock->val, val, new);
>> +		if (old == val)
>> +			break;
>> +		if (loop-- <= 0)
>> +			goto queue;
>> +	}
>
> So I'm not clear about the semantics of what (should) occur when the
> threshold is exhausted. In the trylock/pending loop above, you
> immediately return 0, indicating we want to queue. Ok, but below:

This is in the lock slowpath, so it can't return a lock failure.

>> +
>> +	if (new == _Q_LOCKED_VAL)
>> +		goto gotlock;
>> +	/*
>> +	 * We are pending, wait for the owner to go away.
>> +	 */
>> +	while (((val = smp_load_acquire(&lock->val.counter)) & _Q_LOCKED_MASK)
>> +		&& (loop-- > 0))
>> +		cpu_relax();
>> +
>> +	if (!(val & _Q_LOCKED_MASK)) {
>> +		clear_pending_set_locked(lock);
>> +		goto gotlock;
>> +	}
>> +	/*
>> +	 * Clear the pending bit and fall back to queuing
>> +	 */
>> +	clear_pending(lock);
>
> ... you call clear_pending before returning. Is this intentional? Smells
> fishy.

The pending bit acts as a 1-slot waiting queue. So if the vCPU needs to
fall back to regular queuing, it has to clear the bit first.

>
> And basically afaict all this chunk of code does is spin until loop is
> exhausted, and break out when we get the lock. Ie, something like this
> is a lot cleaner:
>
>	while (loop--) {
>		/*
>		 * We are pending, wait for the owner to go away.
>		 */
>		val = smp_load_acquire(&lock->val.counter);
>		if (!(val & _Q_LOCKED_MASK)) {
>			clear_pending_set_locked(lock);
>			goto gotlock;
>		}
>
>		cpu_relax();
>	}
>
>	/*
>	 * Clear the pending bit and fall back to queuing
>	 */
>	clear_pending(lock);
>

Yes, we could change the loop to that. I was just following the same
logic as the native code.

Cheers,
Longman