2013-04-21 21:13:32

by Rik van Riel

Subject: Re: Preemptable Ticket Spinlock

On 04/20/2013 06:12 PM, Jiannan Ouyang wrote:
> Hello Everyone,
>
> I recently came up with a spinlock algorithm that can adapt to
> preemption, which you may be interested in. The intuition is to
> downgrade a fair lock to an unfair lock automatically upon preemption,
> and preserve the fairness otherwise. It is a guest side optimization,
> and can be used as a complementary technique to host side optimizations
> like co-scheduling and Pause-Loop Exiting.
>
> In my experiments, it improves VM performance by 5.32X on average, when
> running on a non-paravirtual VMM, and by 7.91X when running on a VMM
> that supports a paravirtual locking interface (using a pv preemptable
> ticket spinlock), when executing a set of microbenchmarks as well as a
> realistic e-commerce benchmark.
>
> A detailed algorithm description can be found in my VEE 2013 paper,
> Preemptable Ticket Spinlocks: Improving Consolidated Performance in the
> Cloud
> Jiannan Ouyang, John R. Lange
> ouyang,[email protected]
> University of Pittsburgh
> http://people.cs.pitt.edu/~ouyang/files/publication/preemptable_lock-ouyang-vee13.pdf

Your algorithm is very clever, and very promising.

However, it does increase the size of the struct spinlock, and adds
an additional atomic operation to spin_unlock, neither of which I
suspect are necessary.

If we always incremented the ticket number by 2 (instead of 1), then
we could use the lower bit of the ticket number as the spinlock.

If we do NOT run virtualized, we simply increment the ticket by 2
in spin_unlock, and the code can remain otherwise the same.

If we do run virtualized, we take that spinlock after acquiring
the ticket (or timing out), just like in your code. In the
virtualized spin_unlock, we can then release the spinlock and
increment the ticket in one operation: by simply increasing the
ticket by 1.

In other words, we should be able to keep the overhead of this
to an absolute minimum, and keep spin_unlock to be always the
same cost it is today.

--
All rights reversed


2013-04-21 23:07:55

by Jiannan Ouyang

Subject: Re: Preemptable Ticket Spinlock

On Sun, Apr 21, 2013 at 5:12 PM, Rik van Riel <[email protected]> wrote:
> Your algorithm is very clever, and very promising.
>
> However, it does increase the size of the struct spinlock, and adds
> an additional atomic operation to spin_unlock, neither of which I
> suspect are necessary.
>
> If we always incremented the ticket number by 2 (instead of 1), then
> we could use the lower bit of the ticket number as the spinlock.
>
> If we do NOT run virtualized, we simply increment the ticket by 2
> in spin_unlock, and the code can remain otherwise the same.
>
> If we do run virtualized, we take that spinlock after acquiring
> the ticket (or timing out), just like in your code. In the
> virtualized spin_unlock, we can then release the spinlock and
> increment the ticket in one operation: by simply increasing the
> ticket by 1.
>
> In other words, we should be able to keep the overhead of this
> to an absolute minimum, and keep spin_unlock to be always the
> same cost it is today.
>
> --
> All rights reversed

Hi Rik,

Thanks for your feedback.

Yes, I agree with you:
- increasing the size of struct spinlock is unnecessary
- your idea of utilizing the lower bit to save one atomic operation
in unlock is cool!

I can come up with an updated patch soon.

--Jiannan

2013-04-22 05:56:35

by Raghavendra K T

Subject: Re: Preemptable Ticket Spinlock

On 04/22/2013 04:37 AM, Jiannan Ouyang wrote:
> On Sun, Apr 21, 2013 at 5:12 PM, Rik van Riel <[email protected]> wrote:
>> Your algorithm is very clever, and very promising.
>>
>> However, it does increase the size of the struct spinlock, and adds
>> an additional atomic operation to spin_unlock, neither of which I
>> suspect are necessary.
>>
>> If we always incremented the ticket number by 2 (instead of 1), then
>> we could use the lower bit of the ticket number as the spinlock.
>>
>> If we do NOT run virtualized, we simply increment the ticket by 2
>> in spin_unlock, and the code can remain otherwise the same.
>>
>> If we do run virtualized, we take that spinlock after acquiring
>> the ticket (or timing out), just like in your code. In the
>> virtualized spin_unlock, we can then release the spinlock and
>> increment the ticket in one operation: by simply increasing the
>> ticket by 1.
>>
>> In other words, we should be able to keep the overhead of this
>> to an absolute minimum, and keep spin_unlock to be always the
>> same cost it is today.
>>
>> --
>> All rights reversed
>
> Hi Rik,
>
> Thanks for your feedback.
>
> Yes, I agree with you:
> - increasing the size of struct spinlock is unnecessary
> - your idea of utilizing the lower bit to save one atomic operation
> in unlock is cool!
>

Yes, +1. It is indeed a cool idea. Thanks to Jeremy.. and as Rik already
mentioned, it would also prevent the side effects of increasing the
lock size. (It reminds me of my thought of encoding the vcpuid in the
lock for pv spinlocks.)

> I can come up with an updated patch soon.
>
> --Jiannan
>
>

2013-04-22 11:51:08

by Peter Zijlstra

Subject: Re: Preemptable Ticket Spinlock

On Sun, 2013-04-21 at 17:12 -0400, Rik van Riel wrote:
>
> If we always incremented the ticket number by 2 (instead of 1), then
> we could use the lower bit of the ticket number as the spinlock.

ISTR that paravirt ticket locks already do that and use the lsb to
indicate the unlock needs to perform wakeups.

Also, since all of this is virt nonsense, shouldn't it live in the
paravirt ticket lock code and leave the native code as is?

2013-04-22 12:52:46

by Rik van Riel

Subject: Re: Preemptable Ticket Spinlock

On 04/22/2013 07:51 AM, Peter Zijlstra wrote:
> On Sun, 2013-04-21 at 17:12 -0400, Rik van Riel wrote:
>>
>> If we always incremented the ticket number by 2 (instead of 1), then
>> we could use the lower bit of the ticket number as the spinlock.
>
> ISTR that paravirt ticket locks already do that and use the lsb to
> indicate the unlock needs to perform wakeups.
>
> Also, since all of this is virt nonsense, shouldn't it live in the
> paravirt ticket lock code and leave the native code as is?

Sure, but that is still no reason not to have the virt
implementation be as fast as possible, and share the same
data type as the non-virt implementation.

Also, is it guaranteed that the native spin_lock code has
not been called yet before we switch over to the paravirt
functions?

If the native spin_lock code has been called already at
that time, the native code would still need to be modified
to increment the ticket number by 2, so we end up with a
compatible value in each spin lock's .tickets field, and
prevent a deadlock after we switch over to the paravirt
variant.

--
All rights reversed

2013-04-22 19:49:29

by Peter Zijlstra

Subject: Re: Preemptable Ticket Spinlock

On Mon, 2013-04-22 at 08:52 -0400, Rik van Riel wrote:
> On 04/22/2013 07:51 AM, Peter Zijlstra wrote:
> > On Sun, 2013-04-21 at 17:12 -0400, Rik van Riel wrote:
> >>
> >> If we always incremented the ticket number by 2 (instead of 1), then
> >> we could use the lower bit of the ticket number as the spinlock.
> >
> > ISTR that paravirt ticket locks already do that and use the lsb to
> > indicate the unlock needs to perform wakeups.
> >
> > Also, since all of this is virt nonsense, shouldn't it live in the
> > paravirt ticket lock code and leave the native code as is?
>
> Sure, but that is still no reason not to have the virt
> implementation be as fast as possible, and share the same
> data type as the non-virt implementation.

It has to share the same data-type..

> Also, is it guaranteed that the native spin_lock code has
> not been called yet before we switch over to the paravirt
> functions?
>
> If the native spin_lock code has been called already at
> that time, the native code would still need to be modified
> to increment the ticket number by 2, so we end up with a
> compatible value in each spin lock's .tickets field, and
> prevent a deadlock after we switch over to the paravirt
> variant.

I thought the stuff already made it upstream, but apparently not; the
latest posting I'm aware of is here:

https://lkml.org/lkml/2012/5/2/105

That stuff changes the normal ticket increment as well..

2013-04-22 19:57:05

by Rik van Riel

Subject: Re: Preemptable Ticket Spinlock

On 04/22/2013 03:49 PM, Peter Zijlstra wrote:
> On Mon, 2013-04-22 at 08:52 -0400, Rik van Riel wrote:

>> If the native spin_lock code has been called already at
>> that time, the native code would still need to be modified
>> to increment the ticket number by 2, so we end up with a
>> compatible value in each spin lock's .tickets field, and
>> prevent a deadlock after we switch over to the paravirt
>> variant.
>
> I thought the stuff already made it upstream, but apparently not; the
> latest posting I'm aware of is here:
>
> https://lkml.org/lkml/2012/5/2/105
>
> That stuff changes the normal ticket increment as well..

Jiannan,

It looks like the patch above could make a good patch
1 (or 2) in your patch series :)

--
All rights reversed

2013-04-22 20:05:33

by Jiannan Ouyang

Subject: Re: Preemptable Ticket Spinlock

On Mon, Apr 22, 2013 at 3:56 PM, Rik van Riel <[email protected]> wrote:
> On 04/22/2013 03:49 PM, Peter Zijlstra wrote:
>>
>> On Mon, 2013-04-22 at 08:52 -0400, Rik van Riel wrote:
>
>
>>> If the native spin_lock code has been called already at
>>> that time, the native code would still need to be modified
>>> to increment the ticket number by 2, so we end up with a
>>> compatible value in each spin lock's .tickets field, and
>>> prevent a deadlock after we switch over to the paravirt
>>> variant.
>>
>>
>> I thought the stuff already made it upstream, but apparently not; the
>> latest posting I'm aware of is here:
>>
>> https://lkml.org/lkml/2012/5/2/105
>>
>> That stuff changes the normal ticket increment as well..
>
>
> Jiannan,
>
> It looks like the patch above could make a good patch
> 1 (or 2) in your patch series :)
>
> --
> All rights reversed

Yes.
I'm going to move my code, updated with Rik's suggestions, to paravirt
ops based on Jeremy's patch.
I'll post a new patch series soon.

Thanks to everyone for the great feedback!
--Jiannan

2013-04-22 20:08:22

by Peter Zijlstra

Subject: Re: Preemptable Ticket Spinlock

On Mon, 2013-04-22 at 15:56 -0400, Rik van Riel wrote:
> On 04/22/2013 03:49 PM, Peter Zijlstra wrote:
> > On Mon, 2013-04-22 at 08:52 -0400, Rik van Riel wrote:
>
> >> If the native spin_lock code has been called already at
> >> that time, the native code would still need to be modified
> >> to increment the ticket number by 2, so we end up with a
> >> compatible value in each spin lock's .tickets field, and
> >> prevent a deadlock after we switch over to the paravirt
> >> variant.
> >
> > I thought the stuff already made it upstream, but apparently not; the
> > latest posting I'm aware of is here:
> >
> > https://lkml.org/lkml/2012/5/2/105
> >
> > That stuff changes the normal ticket increment as well..
>
> Jiannan,
>
> It looks like the patch above could make a good patch
> 1 (or 2) in your patch series :)

I much prefer the entire series from Jeremy since it maintains the
ticket semantics and doesn't degrade the lock to unfair under
contention.

Now I suppose there's a reason it's not been merged yet, and I suspect
it's the !paravirt hotpath impact which wasn't rightly justified or
somesuch, so maybe someone can work on that or so.. dunno.

2013-04-22 20:32:32

by Rik van Riel

Subject: Re: Preemptable Ticket Spinlock

On 04/22/2013 04:08 PM, Peter Zijlstra wrote:
> On Mon, 2013-04-22 at 15:56 -0400, Rik van Riel wrote:
>> On 04/22/2013 03:49 PM, Peter Zijlstra wrote:
>>> On Mon, 2013-04-22 at 08:52 -0400, Rik van Riel wrote:
>>
>>>> If the native spin_lock code has been called already at
>>>> that time, the native code would still need to be modified
>>>> to increment the ticket number by 2, so we end up with a
>>>> compatible value in each spin lock's .tickets field, and
>>>> prevent a deadlock after we switch over to the paravirt
>>>> variant.
>>>
>>> I thought the stuff already made it upstream, but apparently not; the
>>> latest posting I'm aware of is here:
>>>
>>> https://lkml.org/lkml/2012/5/2/105
>>>
>>> That stuff changes the normal ticket increment as well..
>>
>> Jiannan,
>>
>> It looks like the patch above could make a good patch
>> 1 (or 2) in your patch series :)
>
> I much prefer the entire series from Jeremy since it maintains the
> ticket semantics and doesn't degrade the lock to unfair under
> contention.
>
> Now I suppose there's a reason its not been merged yet and I suspect
> its !paravirt hotpath impact which wasn't rightly justified or somesuch
> so maybe someone can work on that or so.. dunno.

IIRC one of the reasons was that the performance improvement wasn't
as obvious. Rescheduling VCPUs takes a fair amount of time, quite
probably more than the typical hold time of a spinlock.

--
All rights reversed

2013-04-22 20:44:41

by Peter Zijlstra

Subject: Re: Preemptable Ticket Spinlock

On Mon, 2013-04-22 at 16:32 -0400, Rik van Riel wrote:
>
> IIRC one of the reasons was that the performance improvement wasn't
> as obvious. Rescheduling VCPUs takes a fair amount of time, quite
> probably more than the typical hold time of a spinlock.

IIRC it would spin for a while before blocking..

/me goes re-read some of that thread...

Ah, it's because PLE is curing most of it.. on !PLE it had huge gains,
but apparently nobody cares about !PLE hardware anymore :-)

2013-04-22 20:46:24

by Jiannan Ouyang

Subject: Re: Preemptable Ticket Spinlock

On Mon, Apr 22, 2013 at 4:08 PM, Peter Zijlstra <[email protected]> wrote:

>
> I much prefer the entire series from Jeremy since it maintains the
> ticket semantics and doesn't degrade the lock to unfair under
> contention.
>
> Now I suppose there's a reason its not been merged yet and I suspect
> its !paravirt hotpath impact which wasn't rightly justified or somesuch
> so maybe someone can work on that or so.. dunno.
>
>

In my paper, I compared preemptable-lock and pv_lock on KVM from
Raghu and Jeremy.
Results show that:
- preemptable-lock improves performance significantly without paravirt support
- preemptable-lock can also be paravirtualized, which outperforms
pv_lock, especially when overcommitted by 3x or more
- pv-preemptable-lock has much less performance variance compared to
pv_lock, because it adapts to preemption within the VM
rather than using rescheduling, which increases VM interference

It would still be very interesting to conduct more experiments to
compare these two, to see if the fairness enforced by pv_lock is
mandatory, and if preemptable-lock outperforms pv_lock in most cases,
and how they work with PLE.

--Jiannan

2013-04-22 20:48:40

by Peter Zijlstra

Subject: Re: Preemptable Ticket Spinlock

On Mon, 2013-04-22 at 22:44 +0200, Peter Zijlstra wrote:
> On Mon, 2013-04-22 at 16:32 -0400, Rik van Riel wrote:
> >
> > IIRC one of the reasons was that the performance improvement wasn't
> > as obvious. Rescheduling VCPUs takes a fair amount of time, quite
> > probably more than the typical hold time of a spinlock.
>
> IIRC it would spin for a while before blocking..
>
> /me goes re-read some of that thread...
>
> Ah, its because PLE is curing most of it.. !PLE it had huge gains but
> apparently nobody cares about !PLE hardware anymore :-)

Hmm.. it looked like under light overcommit the paravirt ticket lock
still had some gain (~10%) and of course it brings the fairness thing
which is always good.

I can only imagine the mess unfair + vcpu preemption can bring to guest
tasks.

2013-04-22 20:50:30

by Rik van Riel

Subject: Re: Preemptable Ticket Spinlock

On 04/22/2013 04:46 PM, Jiannan Ouyang wrote:

> It would still be very interesting to conduct more experiments to
> compare these two, to see if the fairness enforced by pv_lock is
> mandatory, and if preemptable-lock outperforms pv_lock in most cases,
> and how they work with PLE.

Given the fairly high cost of rescheduling a VCPU (which is likely
to include an IPI), versus the short hold time of most spinlocks,
I have the strong suspicion that your approach would win.

The fairness is only compromised in a limited way and in certain
circumstances, so I am not too worried about that.

--
All rights reversed

2013-04-22 20:50:45

by Jiannan Ouyang

Subject: Re: Preemptable Ticket Spinlock

On Mon, Apr 22, 2013 at 4:44 PM, Peter Zijlstra <[email protected]> wrote:
> On Mon, 2013-04-22 at 16:32 -0400, Rik van Riel wrote:
>>
>> IIRC one of the reasons was that the performance improvement wasn't
>> as obvious. Rescheduling VCPUs takes a fair amount of time, quite
>> probably more than the typical hold time of a spinlock.
>
> IIRC it would spin for a while before blocking..
>
> /me goes re-read some of that thread...
>
> Ah, its because PLE is curing most of it.. !PLE it had huge gains but
> apparently nobody cares about !PLE hardware anymore :-)
>

For now, I don't know how well it works with PLE. But I think it
should save the VMEXIT time on PLE machines.

2013-04-22 20:51:13

by Rik van Riel

Subject: Re: Preemptable Ticket Spinlock

On 04/22/2013 04:48 PM, Peter Zijlstra wrote:

> Hmm.. it looked like under light overcommit the paravirt ticket lock
> still had some gain (~10%) and of course it brings the fairness thing
> which is always good.
>
> I can only imagine the mess unfair + vcpu preemption can bring to guest
> tasks.

If you think unfairness + vcpu preemption is bad, you haven't
tried full fairness + vcpu preemption :)

--
All rights reversed

2013-04-22 20:54:42

by Vinod, Chegu

Subject: Re: Preemptable Ticket Spinlock

On 4/22/2013 1:50 PM, Jiannan Ouyang wrote:
> On Mon, Apr 22, 2013 at 4:44 PM, Peter Zijlstra <[email protected]> wrote:
>> On Mon, 2013-04-22 at 16:32 -0400, Rik van Riel wrote:
>>> IIRC one of the reasons was that the performance improvement wasn't
>>> as obvious. Rescheduling VCPUs takes a fair amount of time, quite
>>> probably more than the typical hold time of a spinlock.
>> IIRC it would spin for a while before blocking..
>>
>> /me goes re-read some of that thread...
>>
>> Ah, its because PLE is curing most of it.. !PLE it had huge gains but
>> apparently nobody cares about !PLE hardware anymore :-)
>>
> For now, I don't know how well it works with PLE. But I think it
> should save the VMEXIT time on PLE machines.
>
Thanks for sharing your patch. I am waiting for your v2 patch(es) and
will then send you any review feedback. I am hoping to verify your
changes on a large box (PLE enabled) and get back to you with some data...

Thanks
Vinod

2013-04-22 20:55:46

by Peter Zijlstra

Subject: Re: Preemptable Ticket Spinlock

On Mon, 2013-04-22 at 16:46 -0400, Jiannan Ouyang wrote:
> On Mon, Apr 22, 2013 at 4:08 PM, Peter Zijlstra <[email protected]> wrote:
>
> >
> > I much prefer the entire series from Jeremy since it maintains the
> > ticket semantics and doesn't degrade the lock to unfair under
> > contention.
> >
> > Now I suppose there's a reason its not been merged yet and I suspect
> > its !paravirt hotpath impact which wasn't rightly justified or somesuch
> > so maybe someone can work on that or so.. dunno.
> >
> >
>
> In my paper, I compared preemptable-lock and pv_lock on KVM from
> Raghu and Jeremy.

Which pv_lock? The current pv spinlock mess is basically the old unfair
thing. The later patch series I referred to earlier implemented a
paravirt ticket lock, which should perform much better under overcommit.

> Results show that:
> - preemptable-lock improves performance significantly without paravirt support

But completely wrecks our native spinlock implementation so that's not
going to happen of course ;-)

> - preemptable-lock can also be paravirtualized, which outperforms
> pv_lock, especially when overcommited by 3 or more

See above..

> - pv-preemptable-lock has much less performance variance compared to
> pv_lock, because it adapts to preemption within the VM
> rather than using rescheduling, which increases VM interference

I would say it has a _much_ worse worst case (and thus worse variance)
than the paravirt ticket implementation from Jeremy. While the full
paravirt ticket lock results in vcpu scheduling, it does maintain
fairness.

If you drop strict fairness you can end up in unbounded starvation
cases and those are very ugly indeed.

> It would still be very interesting to conduct more experiments to
> compare these two, to see if the fairness enforced by pv_lock is
> mandatory, and if preemptable-lock outperforms pv_lock in most cases,
> and how they work with PLE.

Be more specific; pv_lock as currently upstream is a trainwreck, mostly
done because pure ticket spinning plus vcpu preemption is even worse.

2013-04-22 21:02:03

by Peter Zijlstra

Subject: Re: Preemptable Ticket Spinlock

On Mon, 2013-04-22 at 16:49 -0400, Rik van Riel wrote:
> Given the fairly high cost of rescheduling a VCPU (which is likely
> to include an IPI), versus the short hold time of most spinlocks,
> I have the strong suspicion that your approach would win.

https://lkml.org/lkml/2012/5/2/101

If you schedule too often your SPIN_THRESHOLD is far too low.

Anyway.. performance can't be that bad, otherwise Jeremy wouldn't have
spent as much time on it as he did.

2013-04-22 21:31:17

by Jiannan Ouyang

Subject: Re: Preemptable Ticket Spinlock

On Mon, Apr 22, 2013 at 4:55 PM, Peter Zijlstra <[email protected]> wrote:

>
> Which pv_lock? The current pv spinlock mess is basically the old unfair
> thing. The later patch series I referred to earlier implemented a
> paravirt ticket lock, that should perform much better under overcommit.
>

Yes, it is a paravirt *ticket* spinlock. I got the patch from
Raghavendra K T through email
http://lwn.net/Articles/495597/

2013-04-22 21:57:09

by Andi Kleen

Subject: Re: Preemptable Ticket Spinlock

Rik van Riel <[email protected]> writes:
>
> If we always incremented the ticket number by 2 (instead of 1), then
> we could use the lower bit of the ticket number as the spinlock.

Spinning on a single bit is very inefficient, as you need to do a
trylock in a loop, which is very unfriendly to the MESI state protocol.
It's much better to have at least three states and allow
spinning-while-reading-only.

This is typically very visible on systems with >2S.

-Andi

--
[email protected] -- Speaking for myself only

2013-04-22 23:08:29

by Rik van Riel

Subject: Re: Preemptable Ticket Spinlock

On 04/22/2013 04:55 PM, Peter Zijlstra wrote:
> On Mon, 2013-04-22 at 16:46 -0400, Jiannan Ouyang wrote:

>> - pv-preemptable-lock has much less performance variance compared to
>> pv_lock, because it adapts to preemption within the VM
>> rather than using rescheduling, which increases VM interference
>
> I would say it has a _much_ worse worst case (and thus worse variance)
> than the paravirt ticket implementation from Jeremy. While full
> paravirt ticket lock results in vcpu scheduling it does maintain
> fairness.
>
> If you drop strict fairness you can end up in unbounded starvation
> cases and those are very ugly indeed.

If needed, Jiannan's scheme could easily be bounded to prevent
infinite starvation. For example, we could allow only the first
8 CPUs in line to jump the queue.

However, given the way that virtual CPUs get scheduled in and
out all the time, I suspect starvation is not a worry, and we
will not need the additional complexity to deal with it.

You may want to play around with virtualization a bit, to get
a feel for how things work in virt land.

--
All rights reversed

2013-04-22 23:14:10

by Rik van Riel

Subject: Re: Preemptable Ticket Spinlock

On 04/22/2013 05:56 PM, Andi Kleen wrote:
> Rik van Riel <[email protected]> writes:
>>
>> If we always incremented the ticket number by 2 (instead of 1), then
>> we could use the lower bit of the ticket number as the spinlock.
>
> Spinning on a single bit is very inefficient, as you need to do
> try lock in a loop which is very unfriendly to the MESI state protocol.
> It's much better to have at least three states and allow
> spinning-while-reading-only.
>
> This is typically very visible on systems with >2S.

Absolutely, the spinning should be read-only, until the CPU
sees that the desired bit is clear. MESI-friendly spinning
is essential.

--
All rights reversed

2013-04-23 01:39:46

by Raghavendra K T

Subject: Re: Preemptable Ticket Spinlock

On 04/23/2013 01:19 AM, Peter Zijlstra wrote:
> On Mon, 2013-04-22 at 08:52 -0400, Rik van Riel wrote:
>> On 04/22/2013 07:51 AM, Peter Zijlstra wrote:
>>> On Sun, 2013-04-21 at 17:12 -0400, Rik van Riel wrote:
>>>>
>>>> If we always incremented the ticket number by 2 (instead of 1), then
>>>> we could use the lower bit of the ticket number as the spinlock.
>>>
>>> ISTR that paravirt ticket locks already do that and use the lsb to
>>> indicate the unlock needs to perform wakeups.
>>>
>>> Also, since all of this is virt nonsense, shouldn't it live in the
>>> paravirt ticket lock code and leave the native code as is?
>>
>> Sure, but that is still no reason not to have the virt
>> implementation be as fast as possible, and share the same
>> data type as the non-virt implementation.
>
> It has to share the same data-type..
>
>> Also, is it guaranteed that the native spin_lock code has
>> not been called yet before we switch over to the paravirt
>> functions?
>>
>> If the native spin_lock code has been called already at
>> that time, the native code would still need to be modified
>> to increment the ticket number by 2, so we end up with a
>> compatible value in each spin lock's .tickets field, and
>> prevent a deadlock after we switch over to the paravirt
>> variant.
>
> I thought the stuff already made it upstream, but apparently not; the
> latest posting I'm aware of is here:
>
> https://lkml.org/lkml/2012/5/2/105
>
> That stuff changes the normal ticket increment as well..
>

The pv-ticket spinlock went on hold after Avi acked it, because:

though on non-PLE we get a huge advantage, on PLE machines the benefit
was not as impressive (~10%, as you stated in the email chain) compared
to the complexity of the patches.
So Avi suggested trying PLE improvements first, and those are going
upstream:

https://lkml.org/lkml/2012/7/18/247
https://lkml.org/lkml/2013/1/22/104
https://lkml.org/lkml/2013/2/6/345 (on the way in the kvm tree)

Current status of PV spinlock:
I have rebased the pv spinlock patches and am experimenting with the
latest kernel. I have Gleb's irq delivery incorporated into the patch
series, but I am thinking about whether I can improve some guest-side
logic in unlock. I will probably set up a github branch and post the
link soon.

2013-04-23 04:59:55

by Raghavendra K T

Subject: Re: Preemptable Ticket Spinlock

On 04/23/2013 02:31 AM, Peter Zijlstra wrote:
> On Mon, 2013-04-22 at 16:49 -0400, Rik van Riel wrote:
>> Given the fairly high cost of rescheduling a VCPU (which is likely
>> to include an IPI), versus the short hold time of most spinlocks,
>> I have the strong suspicion that your approach would win.
>
> https://lkml.org/lkml/2012/5/2/101
>
> If you schedule too often your SPIN_THRESHOLD is far too low.
>
> Anyway.. performance can't be that bad, otherwise Jeremy wouldn't have
> spent as much time on it as he did.

When I experimented last time, the ideal SPIN_THRESHOLD for a PLE
machine was around 4k-8k. Jeremy's experiments were on a non-PLE machine
AFAIK, where the 2k threshold complemented the PLE feature in a nice way.

2013-04-23 05:58:06

by Gleb Natapov

Subject: Re: Preemptable Ticket Spinlock

On Mon, Apr 22, 2013 at 07:08:06PM -0400, Rik van Riel wrote:
> On 04/22/2013 04:55 PM, Peter Zijlstra wrote:
> >On Mon, 2013-04-22 at 16:46 -0400, Jiannan Ouyang wrote:
>
> >>- pv-preemptable-lock has much less performance variance compared to
> >>pv_lock, because it adapts to preemption within the VM
> >> rather than using rescheduling, which increases VM interference
> >
> >I would say it has a _much_ worse worst case (and thus worse variance)
> >than the paravirt ticket implementation from Jeremy. While full
> >paravirt ticket lock results in vcpu scheduling it does maintain
> >fairness.
> >
> >If you drop strict fairness you can end up in unbounded starvation
> >cases and those are very ugly indeed.
>
> If needed, Jiannan's scheme could easily be bounded to prevent
> infinite starvation. For example, we could allow only the first
> 8 CPUs in line to jump the queue.
>
> However, given the way that virtual CPUs get scheduled in and
> out all the time, I suspect starvation is not a worry, and we
> will not need the additional complexity to deal with it.
>
FWIW RHEL6 uses an unfair spinlock when it runs as a guest. We never got
reports about problems due to this at any scale.

> You may want to play around with virtualization a bit, to get
> a feel for how things work in virt land.
>
> --
> All rights reversed

--
Gleb.

2013-05-30 11:52:02

by Raghavendra K T

Subject: Re: Preemptable Ticket Spinlock

On 04/23/2013 07:12 AM, Raghavendra K T wrote:
> On 04/23/2013 01:19 AM, Peter Zijlstra wrote:
>> On Mon, 2013-04-22 at 08:52 -0400, Rik van Riel wrote:
>>> On 04/22/2013 07:51 AM, Peter Zijlstra wrote:
>>>> On Sun, 2013-04-21 at 17:12 -0400, Rik van Riel wrote:
>>>>>
>>>>> If we always incremented the ticket number by 2 (instead of 1), then
>>>>> we could use the lower bit of the ticket number as the spinlock.
>>>>
>>>> ISTR that paravirt ticket locks already do that and use the lsb to
>>>> indicate the unlock needs to perform wakeups.
>>>>
>>>> Also, since all of this is virt nonsense, shouldn't it live in the
>>>> paravirt ticket lock code and leave the native code as is?
>>>
>>> Sure, but that is still no reason not to have the virt
>>> implementation be as fast as possible, and share the same
>>> data type as the non-virt implementation.
>>
>> It has to share the same data-type..
>>
>>> Also, is it guaranteed that the native spin_lock code has
>>> not been called yet before we switch over to the paravirt
>>> functions?
>>>
>>> If the native spin_lock code has been called already at
>>> that time, the native code would still need to be modified
>>> to increment the ticket number by 2, so we end up with a
>>> compatible value in each spin lock's .tickets field, and
>>> prevent a deadlock after we switch over to the paravirt
>>> variant.
>>
>> I thought the stuff already made it upstream, but apparently not; the
>> latest posting I'm aware of is here:
>>
>> https://lkml.org/lkml/2012/5/2/105
>>
>> That stuff changes the normal ticket increment as well..
>>
>
> The pv-ticket spinlock went on hold after Avi acked it, because:
>
> though on non-PLE we get a huge advantage, on PLE machines the benefit
> was not as impressive (~10%, as you stated in the email chain) compared
> to the complexity of the patches.
> So Avi suggested trying PLE improvements first, and those are going
> upstream:
>
> https://lkml.org/lkml/2012/7/18/247
> https://lkml.org/lkml/2013/1/22/104
> https://lkml.org/lkml/2013/2/6/345 (on the way in the kvm tree)
>
> Current status of PV spinlock:
> I have rebased the pv spinlock patches and am experimenting with the
> latest kernel. I have Gleb's irq delivery incorporated into the patch
> series, but I am thinking about whether I can improve some guest-side
> logic in unlock. I will probably set up a github branch and post the
> link soon.

Sorry for the late reply.

Here is the branch with the pv spinlock V9 version on github, rebased
to 3.10-rc:

https://github.com/ktraghavendra/linux/tree/pvspinlock_v9

I am planning to post a formal email in a separate thread with a link
to this branch (instead of spamming the list with 19 patches).

Main changes w.r.t. v8 are:
- Changed spin_threshold to 32k to avoid excess halt exits that were
causing undercommit degradation (after the PLE handler improvement).
- Added kvm_irq_delivery_to_apic (suggested by Gleb)
- Optimized the halt exit path to use the PLE handler

2013-05-30 20:14:39

by Thomas Gleixner

Subject: Re: Preemptable Ticket Spinlock

On Thu, 30 May 2013, Raghavendra K T wrote:
> Here is the branch with the pv spinlock V9 version on github, rebased
> to 3.10-rc:
>
> https://github.com/ktraghavendra/linux/tree/pvspinlock_v9
>
> I am planning to post a formal email in a separate thread with a link
> to this branch (instead of spamming the list with 19 patches).

19 patches is not really spam if you compare it to the total number of
mails per day on LKML.

The git tree is nice for people who want to test stuff easily, but if
you want people to review and comment patches, then please use mail.

Thanks,

tglx