2014-06-11 10:26:18

by Peter Zijlstra

Subject: Re: [PATCH v11 06/16] qspinlock: prolong the stay in the pending bit path

On Fri, May 30, 2014 at 11:43:52AM -0400, Waiman Long wrote:
> ---
> kernel/locking/qspinlock.c | 18 ++++++++++++++++--
> 1 files changed, 16 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c
> index fc7fd8c..7f10758 100644
> --- a/kernel/locking/qspinlock.c
> +++ b/kernel/locking/qspinlock.c
> @@ -233,11 +233,25 @@ void queue_spin_lock_slowpath(struct qspinlock *lock, u32 val)
>  	 */
>  	for (;;) {
>  		/*
> -		 * If we observe any contention; queue.
> +		 * If we observe that the queue is not empty or both
> +		 * the pending and lock bits are set, queue
>  		 */
> -		if (val & ~_Q_LOCKED_MASK)
> +		if ((val & _Q_TAIL_MASK) ||
> +		    (val == (_Q_LOCKED_VAL|_Q_PENDING_VAL)))
>  			goto queue;
> 
> +		if (val == _Q_PENDING_VAL) {
> +			/*
> +			 * Pending bit is set, but not the lock bit.
> +			 * Assuming that the pending bit holder is going to
> +			 * set the lock bit and clear the pending bit soon,
> +			 * it is better to wait than to exit at this point.
> +			 */
> +			cpu_relax();
> +			val = atomic_read(&lock->val);
> +			continue;
> +		}
> +
>  		new = _Q_LOCKED_VAL;
>  		if (val == new)
>  			new |= _Q_PENDING_VAL;


So, again, you just posted a new version without replying to the
previous discussion; so let me try again, what's wrong with the proposal
here:

lkml.kernel.org/r/[email protected]




2014-06-11 21:22:35

by Waiman Long

Subject: Re: [PATCH v11 06/16] qspinlock: prolong the stay in the pending bit path


On 6/11/2014 6:26 AM, Peter Zijlstra wrote:
> On Fri, May 30, 2014 at 11:43:52AM -0400, Waiman Long wrote:
>> ---
>> kernel/locking/qspinlock.c | 18 ++++++++++++++++--
>> 1 files changed, 16 insertions(+), 2 deletions(-)
>>
>> diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c
>> index fc7fd8c..7f10758 100644
>> --- a/kernel/locking/qspinlock.c
>> +++ b/kernel/locking/qspinlock.c
>> @@ -233,11 +233,25 @@ void queue_spin_lock_slowpath(struct qspinlock *lock, u32 val)
>>  	 */
>>  	for (;;) {
>>  		/*
>> -		 * If we observe any contention; queue.
>> +		 * If we observe that the queue is not empty or both
>> +		 * the pending and lock bits are set, queue
>>  		 */
>> -		if (val & ~_Q_LOCKED_MASK)
>> +		if ((val & _Q_TAIL_MASK) ||
>> +		    (val == (_Q_LOCKED_VAL|_Q_PENDING_VAL)))
>>  			goto queue;
>>
>> +		if (val == _Q_PENDING_VAL) {
>> +			/*
>> +			 * Pending bit is set, but not the lock bit.
>> +			 * Assuming that the pending bit holder is going to
>> +			 * set the lock bit and clear the pending bit soon,
>> +			 * it is better to wait than to exit at this point.
>> +			 */
>> +			cpu_relax();
>> +			val = atomic_read(&lock->val);
>> +			continue;
>> +		}
>> +
>>  		new = _Q_LOCKED_VAL;
>>  		if (val == new)
>>  			new |= _Q_PENDING_VAL;
>
> So, again, you just posted a new version without replying to the
> previous discussion; so let me try again, what's wrong with the proposal
> here:
>
> lkml.kernel.org/r/[email protected]
>
>

I thought I had answered you before; maybe the message was lost or the
answer was not complete. Anyway, I will try to respond to your question
again here.

> Wouldn't something like:
>
> while (atomic_read(&lock->val) == _Q_PENDING_VAL)
> cpu_relax();
>
> before the cmpxchg loop have gotten you all this?

That is not exactly the same. The loop will exit if other bits are set or
the pending bit is cleared. In that case, we will need to do the same check
at the beginning of the for loop in order to avoid doing an extra cmpxchg
that is not necessary.


> I just tried this on my code and I cannot see a difference.

As I said before, I did see a difference with that change. I think it
depends on the CPU chip used for testing. I ran my test on a 10-core
Westmere-EX chip, running my microbenchmark on different pairs of cores
within the same chip. It produces results that vary from 779.5ms up to
1192ms. Without that patch, the lowest value I can get is still close to
800ms, but the highest can be up to 1800ms or so. So I believe it is just
a matter of timing that you did not observe on your test machine.

-Longman

2014-06-12 06:00:45

by Peter Zijlstra

Subject: Re: [PATCH v11 06/16] qspinlock: prolong the stay in the pending bit path

On Wed, Jun 11, 2014 at 05:22:28PM -0400, Long, Wai Man wrote:

> >> @@ -233,11 +233,25 @@ void queue_spin_lock_slowpath(struct qspinlock *lock, u32 val)
> >>  	 */
> >>  	for (;;) {
> >>  		/*
> >> -		 * If we observe any contention; queue.
> >> +		 * If we observe that the queue is not empty or both
> >> +		 * the pending and lock bits are set, queue
> >>  		 */
> >> -		if (val & ~_Q_LOCKED_MASK)
> >> +		if ((val & _Q_TAIL_MASK) ||
> >> +		    (val == (_Q_LOCKED_VAL|_Q_PENDING_VAL)))
> >>  			goto queue;
> >> +		if (val == _Q_PENDING_VAL) {
> >> +			/*
> >> +			 * Pending bit is set, but not the lock bit.
> >> +			 * Assuming that the pending bit holder is going to
> >> +			 * set the lock bit and clear the pending bit soon,
> >> +			 * it is better to wait than to exit at this point.
> >> +			 */
> >> +			cpu_relax();
> >> +			val = atomic_read(&lock->val);
> >> +			continue;
> >> +		}
> >> +
> >>  		new = _Q_LOCKED_VAL;
> >>  		if (val == new)
> >>  			new |= _Q_PENDING_VAL;

> >Wouldn't something like:
> >
> > while (atomic_read(&lock->val) == _Q_PENDING_VAL)
> > cpu_relax();
> >
> >before the cmpxchg loop have gotten you all this?
>
> That is not exactly the same. The loop will exit if other bits are set or
> the pending bit is cleared. In that case, we will need to do the same check
> at the beginning of the for loop in order to avoid doing an extra cmpxchg
> that is not necessary.

If other bits get set, we should stop poking at the pending bit and get
queued. The only transition we want to wait for is: 0,1,0 -> 0,0,1.

What extra unneeded cmpxchg() is there? If we have two CPUs waiting in
this loop for the pending bit to go away, then both will attempt to grab
the now-free pending bit; one will lose and get queued.

There's no avoiding that contention.




2014-06-12 20:55:04

by Waiman Long

Subject: Re: [PATCH v11 06/16] qspinlock: prolong the stay in the pending bit path

On 06/12/2014 02:00 AM, Peter Zijlstra wrote:
> On Wed, Jun 11, 2014 at 05:22:28PM -0400, Long, Wai Man wrote:
>
>>>> @@ -233,11 +233,25 @@ void queue_spin_lock_slowpath(struct qspinlock *lock, u32 val)
>>>>  	 */
>>>>  	for (;;) {
>>>>  		/*
>>>> -		 * If we observe any contention; queue.
>>>> +		 * If we observe that the queue is not empty or both
>>>> +		 * the pending and lock bits are set, queue
>>>>  		 */
>>>> -		if (val & ~_Q_LOCKED_MASK)
>>>> +		if ((val & _Q_TAIL_MASK) ||
>>>> +		    (val == (_Q_LOCKED_VAL|_Q_PENDING_VAL)))
>>>>  			goto queue;
>>>> +		if (val == _Q_PENDING_VAL) {
>>>> +			/*
>>>> +			 * Pending bit is set, but not the lock bit.
>>>> +			 * Assuming that the pending bit holder is going to
>>>> +			 * set the lock bit and clear the pending bit soon,
>>>> +			 * it is better to wait than to exit at this point.
>>>> +			 */
>>>> +			cpu_relax();
>>>> +			val = atomic_read(&lock->val);
>>>> +			continue;
>>>> +		}
>>>> +
>>>>  		new = _Q_LOCKED_VAL;
>>>>  		if (val == new)
>>>>  			new |= _Q_PENDING_VAL;
>>> Wouldn't something like:
>>>
>>> while (atomic_read(&lock->val) == _Q_PENDING_VAL)
>>> cpu_relax();
>>>
>>> before the cmpxchg loop have gotten you all this?
>> That is not exactly the same. The loop will exit if other bits are set or
>> the pending bit is cleared. In that case, we will need to do the same check
>> at the beginning of the for loop in order to avoid doing an extra cmpxchg
>> that is not necessary.
> If other bits get set, we should stop poking at the pending bit and get
> queued. The only transition we want to wait for is: 0,1,0 -> 0,0,1.
>
> What extra unneeded cmpxchg() is there? If we have two CPUs waiting in
> this loop for the pending bit to go away, then both will attempt to grab
> the now-free pending bit; one will lose and get queued.
>
> There's no avoiding that contention.

If two tasks see the pending bit go away and try to grab it with
cmpxchg, there is no way we can avoid the contention. However, if
somehow the pending bit holder gets the lock and another task sets the
pending bit before the current task, the spinlock value will become
_Q_PENDING_VAL|_Q_LOCKED_VAL. The while loop will end and the code will
blindly try to do a cmpxchg unless we check for this case beforehand.
This is what my code does by going back to the beginning of the for loop.

-Longman

2014-06-15 13:13:10

by Peter Zijlstra

Subject: Re: [PATCH v11 06/16] qspinlock: prolong the stay in the pending bit path

On Thu, Jun 12, 2014 at 04:54:52PM -0400, Waiman Long wrote:
> If two tasks see the pending bit go away and try to grab it with cmpxchg,
> there is no way we can avoid the contention. However, if somehow the
> pending bit holder gets the lock and another task sets the pending bit
> before the current task, the spinlock value will become
> _Q_PENDING_VAL|_Q_LOCKED_VAL. The while loop will end and the code will
> blindly try to do a cmpxchg unless we check for this case beforehand. This
> is what my code does by going back to the beginning of the for loop.

There is already a test for that; see the goto queue below:

---

	/*
	 * wait for in-progress pending->locked hand-overs
	 *
	 * 0,1,0 -> 0,0,1
	 */
	if (val == _Q_PENDING_VAL) {
		while ((val = atomic_read(&lock->val)) == _Q_PENDING_VAL)
			cpu_relax();
	}

	/*
	 * trylock || pending
	 *
	 * 0,0,0 -> 0,0,1 ; trylock
	 * 0,0,1 -> 0,1,1 ; pending
	 */
	for (;;) {
		/*
		 * If we observe any contention; queue.
		 */
		if (val & ~_Q_LOCKED_MASK)
			goto queue;

		new = _Q_LOCKED_VAL;
		if (val == new)
			new |= _Q_PENDING_VAL;

		old = atomic_cmpxchg(&lock->val, val, new);
		if (old == val)
			break;

		val = old;
	}