2012-05-02 05:40:04

by Xiao Guangrong

Subject: Re: [PATCH v4 06/10] KVM: MMU: fast path of handling guest page fault

On 04/29/2012 04:50 PM, Takuya Yoshikawa wrote:

> On Fri, 27 Apr 2012 11:52:13 -0300
> Marcelo Tosatti <[email protected]> wrote:
>
>> Yes but the objective you are aiming for is to read and write sptes
>> without mmu_lock. That is, i am not talking about this patch.
>> Please read carefully the two examples i gave (separated by "example)").
>
> The real objective is not still clear.
>
> The ~10% improvement reported before was on macro benchmarks during live
> migration. At least, that optimization was the initial objective.
>
> But at some point, the objective suddenly changed to "lock-less" without
> understanding what introduced the original improvement.
>
> Was the problem really mmu_lock contention?
>


Takuya, I am so tired of arguing the advantages of lockless write-protect
and lockless O(1) dirty-log again and again.

> If the path being introduced by this patch is really fast, isn't it
> possible to achieve the same improvement still using mmu_lock?
>
>
> Note: During live migration, the fact that the guest gets faulted is
> itself a limitation. We could easily see noticeable slowdown of a
> program even if it runs only between two GET_DIRTY_LOGs.
>


Obviously not.

It depends on what the guest is doing. From my autotest runs, it is easy
to see that the huge improvement is on bench-migration, not pure-migration.

>
>> The rules for code under mmu_lock should be:
>>
>> 1) Spte updates under mmu lock must always be atomic and
>> with locked instructions.
>> 2) Spte values must be read once, and appropriate action
>> must be taken when writing them back in case their value
>> has changed (remote TLB flush might be required).
>
> Although I am not certain about what will be really needed in the
> final form, if this kind of maybe-needed-overhead is going to be
> added little by little, I worry about possible regression.


Well, would you suggest that Linus reject all patches and stop
all discussion because of "possible regression"?
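For reference, rule 2) quoted above amounts to roughly the sketch below
(illustrative helper name and simplified logic, not the code proposed in
this series):

	/*
	 * Sketch only: read the spte once, derive the new value from that
	 * snapshot, and write it back with a locked cmpxchg.  If the cmpxchg
	 * fails the spte changed underneath us and we retry; if the old value
	 * was writable, remote TLBs may still hold the stale translation and
	 * must be flushed.
	 */
	static void spte_clear_bits_atomic(struct kvm *kvm, u64 *sptep, u64 clear_bits)
	{
		u64 old, new;

		do {
			old = ACCESS_ONCE(*sptep);	/* read the spte only once */
			new = old & ~clear_bits;
		} while (cmpxchg64(sptep, old, new) != old);

		if (old & PT_WRITABLE_MASK)
			kvm_flush_remote_tlbs(kvm);	/* stale writable mapping possible */
	}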


2012-05-02 21:15:41

by Marcelo Tosatti

Subject: Re: [PATCH v4 06/10] KVM: MMU: fast path of handling guest page fault

On Wed, May 02, 2012 at 01:39:51PM +0800, Xiao Guangrong wrote:
> On 04/29/2012 04:50 PM, Takuya Yoshikawa wrote:
>
> > On Fri, 27 Apr 2012 11:52:13 -0300
> > Marcelo Tosatti <[email protected]> wrote:
> >
> >> Yes but the objective you are aiming for is to read and write sptes
> >> without mmu_lock. That is, i am not talking about this patch.
> >> Please read carefully the two examples i gave (separated by "example)").
> >
> > The real objective is not still clear.
> >
> > The ~10% improvement reported before was on macro benchmarks during live
> > migration. At least, that optimization was the initial objective.
> >
> > But at some point, the objective suddenly changed to "lock-less" without
> > understanding what introduced the original improvement.
> >
> > Was the problem really mmu_lock contention?
> >
>
>
> Takuya, I am so tired of arguing the advantages of lockless write-protect
> and lockless O(1) dirty-log again and again.

His point is valid: there is a lack of understanding of the details of
the improvement.

Did you see the pahole output on struct kvm? Apparently mmu_lock is
sharing a cacheline with the read-intensive memslots pointer. It would be
interesting to see what the effects of cacheline-aligning mmu_lock are.
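Something along these lines, for illustration only (the field names mirror
struct kvm, but this is not the actual definition):

	/* keep the read-mostly memslots pointer and the contended
	 * mmu_lock on different cache lines */
	struct kvm_layout_sketch {
		struct kvm_memslots *memslots;	/* read on every fault */
		/* ... other read-mostly fields ... */

		spinlock_t mmu_lock ____cacheline_aligned_in_smp;
	};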

> > If the path being introduced by this patch is really fast, isn't it
> > possible to achieve the same improvement still using mmu_lock?
> >
> >
> > Note: During live migration, the fact that the guest gets faulted is
> > itself a limitation. We could easily see noticeable slowdown of a
> > program even if it runs only between two GET_DIRTY_LOGs.
> >
>
>
> Obviously not.
>
> It depends on what the guest is doing. From my autotest runs, it is easy
> to see that the huge improvement is on bench-migration, not pure-migration.
>
> >
> >> The rules for code under mmu_lock should be:
> >>
> >> 1) Spte updates under mmu lock must always be atomic and
> >> with locked instructions.
> >> 2) Spte values must be read once, and appropriate action
> >> must be taken when writing them back in case their value
> >> has changed (remote TLB flush might be required).
> >
> > Although I am not certain about what will be really needed in the
> > final form, if this kind of maybe-needed-overhead is going to be
> > added little by little, I worry about possible regression.
>
>
> Well, would you suggest that Linus reject all patches and stop
> all discussion because of "possible regression"?

2012-05-03 00:16:06

by Takuya Yoshikawa

Subject: Re: [PATCH v4 06/10] KVM: MMU: fast path of handling guest page fault

On Wed, 02 May 2012 13:39:51 +0800
Xiao Guangrong <[email protected]> wrote:

> > Was the problem really mmu_lock contention?

> Takuya, I am so tired of arguing the advantages of lockless write-protect
> and lockless O(1) dirty-log again and again.

You are missing my point. Please do not take my comments as an objection
to your whole work: why do you feel that way?

I thought that your new fast-page-fault path was fast and would improve
guest performance during dirty logging.

So with this v4, you might get a similar result even before dropping
mmu_lock (without 07/10?), if the problem Marcelo explained was not there.


Of course there is a problem of mmu_lock contention. What I am suggesting
is to split that problem and do measurement separately so that part of
your work can be merged soon.

Your guest size and workload were too small to make get_dirty hold mmu_lock
for a long time. If you want to show the real value of lock-less, you need
to do another measurement.


But this is your work and it's up to you. Although I was thinking of
helping with your measurements, I cannot do that knowing that you would
not welcome my help.


> > Although I am not certain about what will be really needed in the
> > final form, if this kind of maybe-needed-overhead is going to be
> > added little by little, I worry about possible regression.

> Well, would you suggest that Linus reject all patches and stop
> all discussion because of "possible regression"?

My concern was about Marcelo's examples, not your current implementation.
If you can show explicitly what will be needed in the final form,
I do not have any concern.


Sorry for disturbing.

Thanks,
Takuya

2012-05-03 12:09:37

by Xiao Guangrong

Subject: Re: [PATCH v4 06/10] KVM: MMU: fast path of handling guest page fault

On 05/03/2012 05:10 AM, Marcelo Tosatti wrote:

> On Wed, May 02, 2012 at 01:39:51PM +0800, Xiao Guangrong wrote:
>> On 04/29/2012 04:50 PM, Takuya Yoshikawa wrote:
>>
>>> On Fri, 27 Apr 2012 11:52:13 -0300
>>> Marcelo Tosatti <[email protected]> wrote:
>>>
>>>> Yes but the objective you are aiming for is to read and write sptes
>>>> without mmu_lock. That is, i am not talking about this patch.
>>>> Please read carefully the two examples i gave (separated by "example)").
>>>
>>> The real objective is not still clear.
>>>
>>> The ~10% improvement reported before was on macro benchmarks during live
>>> migration. At least, that optimization was the initial objective.
>>>
>>> But at some point, the objective suddenly changed to "lock-less" without
>>> understanding what introduced the original improvement.
>>>
>>> Was the problem really mmu_lock contention?
>>>
>>
>>
>> Takuya, I am so tired of arguing the advantages of lockless write-protect
>> and lockless O(1) dirty-log again and again.
>
> His point is valid: there is a lack of understanding of the details of
> the improvement.
>


Actually, the improvement from lockless is that it lets the vcpus run in
parallel as much as possible.

From the test results, lockless gains little improvement for unix-migration;
in this case, the vcpus are almost idle (at least not busy).

The large improvement is on dbench-migration; in this case, all vcpus are
busy accessing memory that is write-protected by dirty-log. If you enable
the page-fault/fast-page-fault tracepoints, you can see a huge number of
page faults from different vcpus during the migration.
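The idea of the fast path, very roughly (illustrative names and simplified
checks, not the exact code in this patch), is:

	/*
	 * Sketch only: if the write fault is caused purely by dirty-log write
	 * protection, restore the writable bit with a locked cmpxchg and never
	 * take mmu_lock, so vcpus faulting on different pages do not serialize
	 * on the lock.  The real patch has more checks (e.g. that the spte is
	 * host-writable); they are omitted here.
	 */
	static bool fast_fix_write_protect(struct kvm_vcpu *vcpu, gfn_t gfn, u64 *sptep)
	{
		u64 spte = ACCESS_ONCE(*sptep);		/* read the spte once */

		if (!is_shadow_present_pte(spte) || (spte & PT_WRITABLE_MASK))
			return false;			/* not our case, go slow path */

		/* if another path changed the spte meanwhile, the cmpxchg
		 * fails and we fall back to the slow path under mmu_lock. */
		if (cmpxchg64(sptep, spte, spte | PT_WRITABLE_MASK) != spte)
			return false;

		mark_page_dirty(vcpu->kvm, gfn);
		return true;
	}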

> Did you see the pahole output on struct kvm? Apparently mmu_lock is
> sharing a cacheline with the read-intensive memslots pointer. It would be
> interesting to see what the effects of cacheline-aligning mmu_lock are.
>


Yes, I saw that. In my test .config, I have
CONFIG_DEBUG_SPINLOCK/CONFIG_DEBUG_LOCK_ALLOC enabled, so mmu_lock is not
sharing a cacheline with memslots. That means it was not a problem during my test.
(BTW, pahole does not work on my box; it shows:
......
DW_AT_<0x3c>=0x19
DW_AT_<0x3c>=0x19
DW_AT_<0x3c>=0x19
die__process_function: DW_TAG_INVALID (0x4109) @ <0x12886> not handled!
)

If we reorganize 'struct kvm', I guess it is good for KVM in general, but it
will not improve migration much. :)

2012-05-03 12:13:54

by Avi Kivity

Subject: Re: [PATCH v4 06/10] KVM: MMU: fast path of handling guest page fault

On 05/03/2012 03:09 PM, Xiao Guangrong wrote:
> Actually, the improvement from lockless is that it lets the vcpus run in
> parallel as much as possible.
>
> From the test results, lockless gains little improvement for unix-migration;
> in this case, the vcpus are almost idle (at least not busy).
>
> The large improvement is on dbench-migration; in this case, all vcpus are
> busy accessing memory that is write-protected by dirty-log. If you enable
> the page-fault/fast-page-fault tracepoints, you can see a huge number of
> page faults from different vcpus during the migration.
>

We can kill the page faults completely by using A/D bits in shadow page
tables. The latest version of the Intel SDM defines A/D bits for EPT,
and NPT has had them from day one.

Of course this comes at a cost: instead of traversing a bitmap we have to
read all the sptes, which are 64x as large. But it's probably worthwhile
for large guests, and perhaps also for smaller ones.
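Roughly, get_dirty would then look like this (a sketch with a hypothetical
helper and dirty-bit mask, not existing KVM code):

	/*
	 * Sketch only: instead of write protection + faults, walk the sptes
	 * backing a slot and harvest the hardware dirty bit.  EPT defines the
	 * dirty bit at bit 9, NPT at bit 6; a single mask is assumed here.
	 */
	#define SPTE_HW_DIRTY_MASK	(1ull << 9)	/* EPT dirty-bit position */

	static void harvest_dirty_from_sptes(u64 *sptes, unsigned long nr_sptes,
					     unsigned long *dirty_bitmap)
	{
		unsigned long i;

		for (i = 0; i < nr_sptes; i++) {
			u64 spte = ACCESS_ONCE(sptes[i]);

			if (!(spte & SPTE_HW_DIRTY_MASK))
				continue;

			/* clear the dirty bit atomically and record the page;
			 * a TLB flush is still needed before the bit can be
			 * trusted again. */
			if (cmpxchg64(&sptes[i], spte, spte & ~SPTE_HW_DIRTY_MASK) == spte)
				__set_bit(i, dirty_bitmap);
		}
	}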

--
error compiling committee.c: too many arguments to function

2012-05-03 12:23:27

by Xiao Guangrong

Subject: Re: [PATCH v4 06/10] KVM: MMU: fast path of handling guest page fault

On 05/03/2012 08:15 AM, Takuya Yoshikawa wrote:

> On Wed, 02 May 2012 13:39:51 +0800
> Xiao Guangrong <[email protected]> wrote:
>
>>> Was the problem really mmu_lock contention?
>
>> Takuya, I am so tired of arguing the advantages of lockless write-protect
>> and lockless O(1) dirty-log again and again.
>
> You are missing my point. Please do not take my comments as an objection
> to your whole work: why do you feel that way?
>


Takuya, I am sorry, please forgive my rudeness! My English is so poor
that it is easy for me to misunderstand the mail. :(

> I thought that your new fast-page-fault path was fast and would improve
> guest performance during dirty logging.
>
> So with this v4, you might get a similar result even before dropping
> mmu_lock (without 07/10?), if the problem Marcelo explained was not there.
>


Actually, the improvement is larger than v2/v3 if EPT is enabled, but it is
lower with EPT disabled. This is because the fast check (the rmap.WRITABLE bit)
was dropped to make review easier.

>
> Of course there is a problem of mmu_lock contention. What I am suggesting
> is to split that problem and do measurement separately so that part of
> your work can be merged soon.
>
> Your guest size and workload were too small to make get_dirty hold mmu_lock
> for a long time. If you want to show the real value of lock-less, you need
> to do another measurement.
>
>
> But this is your work and it's up to you. Although I was thinking of
> helping with your measurements, I cannot do that knowing that you would
> not welcome my help.
>


Of course, any measurement is appreciated!

>
>>> Although I am not certain about what will be really needed in the
>>> final form, if this kind of maybe-needed-overhead is going to be
>>> added little by little, I worry about possible regression.
>
>> Well, would you suggest that Linus reject all patches and stop
>> all discussion because of "possible regression"?
>
> My concern was about Marcelo's examples, not your current implementation.
> If you can show explicitly what will be needed in the final form,
> I do not have any concern.
>
>
> Sorry for disturbing.


Sorry again.

2012-05-03 12:40:54

by Takuya Yoshikawa

Subject: Re: [PATCH v4 06/10] KVM: MMU: fast path of handling guest page fault

On Thu, 03 May 2012 20:23:18 +0800
Xiao Guangrong <[email protected]> wrote:

> Takuya, I am sorry, please forgive my rudeness! My English is so poor
> that it is easy for me to misunderstand the mail. :(

Me too, I am not good at reading/speaking English!
No problem.

> > But this is your work and it's up to you. Although I was thinking of
> > helping with your measurements, I cannot do that knowing that you would
> > not welcome my help.

> Of course, any measurement is appreciated!

I am interested in checking the improvement with my own eyes.

> > Sorry for disturbing.

> Sorry again.

No problem, really.
Let's discuss more from now on!

What I am really worried about is the lack of review in this area.
I am not particularly good at MMU things.

We should collect more comments if possible from other developers.

Thanks,
Takuya