2013-05-22 19:56:14

by Xiao Guangrong

Subject: [PATCH v7 00/11] KVM: MMU: fast zap all shadow pages

Changelog:
V7:
1): separate some optimizations into two patches (do not reuse obsolete
pages; collapse TLB flushes), as suggested by Marcelo.

2): rebase on Gleb's diff which reduces KVM_REQ_MMU_RELOAD when the root
page is being zapped.

3): remove the call to kvm_mmu_zap_all when patching a hypercall, as
investigated by Gleb.

4): drop the patch which deleted the page from the hash list at "prepare"
time, since that can break walks based on the hash list.

5): rename kvm_mmu_invalidate_all_pages to kvm_mmu_invalidate_zap_all_pages.

6): introduce kvm_mmu_prepare_zap_obsolete_page, which is used to zap
obsolete pages so that TLB flushes can be collapsed.

V6:
1): walk the active list in reverse to skip newly created pages, based
on the comments from Gleb and Paolo.

2): completely replace kvm_mmu_zap_all with kvm_mmu_invalidate_all_pages
based on Gleb's comments.

3): improve the parameters of kvm_mmu_invalidate_all_pages based on
Gleb's comments.

4): rename kvm_mmu_invalidate_memslot_pages to kvm_mmu_invalidate_all_pages
5): rename zap_invalid_pages to kvm_zap_obsolete_pages

V5:
1): rename is_valid_sp to is_obsolete_sp
2): use the lock-break technique to zap all old pages instead of only the
pages linked on the invalid slot's rmap, as suggested by Marcelo.
3): trace invalid pages and kvm_mmu_invalidate_memslot_pages()
4): rename kvm_mmu_invalid_memslot_pages to kvm_mmu_invalidate_memslot_pages
according to Takuya's comments.

V4:
1): drop unmapping the invalid rmap outside of mmu-lock and use the
lock-break technique instead. Thanks to Gleb's comments.

2): no need to handle invalid-gen pages specially, since the page table is
always switched by KVM_REQ_MMU_RELOAD. Thanks to Marcelo's comments.

V3:
completely redesign the algorithm, please see below.

V2:
- do not reset n_requested_mmu_pages and n_max_mmu_pages
- batch free root shadow pages to reduce vcpu notification and mmu-lock
contention
- remove the first patch that introduced kvm->arch.mmu_cache, since in this
version we only 'memset zero' the hashtable rather than all mmu cache
members
- remove the unnecessary kvm_reload_remote_mmus after kvm_mmu_zap_all

* Issue
The current kvm_mmu_zap_all is really slow - it holds mmu-lock while it
walks and zaps all shadow pages one by one, and it also needs to zap every
guest page's rmap and every shadow page's parent spte list. Things become
particularly bad if the guest uses more memory or vcpus. It does not scale
well.

* Idea
KVM maintains a global mmu generation-number which is stored in
kvm->arch.mmu_valid_gen, and every shadow page records the current global
generation-number in sp->mmu_valid_gen when it is created.

When KVM needs to zap all shadow page sptes, it simply increases the global
generation-number and then reloads the root shadow pages on all vcpus. Each
vcpu will create a new shadow page table according to the current
generation-number, which ensures the old pages are no longer used.

Then the invalid-gen pages (sp->mmu_valid_gen != kvm->arch.mmu_valid_gen)
are zapped using the lock-break technique.
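
For illustration, the core of the scheme ends up looking roughly like the
sketch below (this is only a sketch; the real code is introduced by the
patches in this series):

static bool is_obsolete_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
{
        /* pages created before the last invalidation are obsolete */
        return unlikely(sp->mmu_valid_gen != kvm->arch.mmu_valid_gen);
}

void kvm_mmu_invalidate_zap_all_pages(struct kvm *kvm)
{
        spin_lock(&kvm->mmu_lock);
        /* every existing shadow page becomes obsolete */
        kvm->arch.mmu_valid_gen++;

        /* force all vcpus onto new roots so no obsolete page stays in use */
        kvm_reload_remote_mmus(kvm);

        /* zap the obsolete pages with the lock-break technique */
        kvm_zap_obsolete_pages(kvm);
        spin_unlock(&kvm->mmu_lock);
}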

Gleb Natapov (1):
KVM: MMU: reduce KVM_REQ_MMU_RELOAD when root page is zapped

Xiao Guangrong (10):
KVM: x86: drop calling kvm_mmu_zap_all in emulator_fix_hypercall
KVM: MMU: drop unnecessary kvm_reload_remote_mmus
KVM: MMU: fast invalidate all pages
KVM: MMU: zap pages in batch
KVM: x86: use the fast way to invalidate all pages
KVM: MMU: show mmu_valid_gen in shadow page related tracepoints
KVM: MMU: add tracepoint for kvm_mmu_invalidate_all_pages
KVM: MMU: do not reuse the obsolete page
KVM: MMU: introduce kvm_mmu_prepare_zap_obsolete_page
KVM: MMU: collapse TLB flushes when zap all pages

arch/x86/include/asm/kvm_host.h | 2 +
arch/x86/kvm/mmu.c | 134 ++++++++++++++++++++++++++++++++++++---
arch/x86/kvm/mmu.h | 1 +
arch/x86/kvm/mmutrace.h | 42 +++++++++---
arch/x86/kvm/x86.c | 16 +----
5 files changed, 162 insertions(+), 33 deletions(-)

--
1.7.7.6


2013-05-22 19:56:24

by Xiao Guangrong

Subject: [PATCH v7 07/11] KVM: MMU: add tracepoint for kvm_mmu_invalidate_all_pages

It is useful for debugging and development
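
The tracepoint can be enabled at run time like any other kvmmmu event, for
example (assuming debugfs is mounted at /sys/kernel/debug):

  echo 1 > /sys/kernel/debug/tracing/events/kvmmmu/kvm_mmu_invalidate_zap_all_pages/enable
  cat /sys/kernel/debug/tracing/trace_pipe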

Signed-off-by: Xiao Guangrong <[email protected]>
---
arch/x86/kvm/mmu.c | 1 +
arch/x86/kvm/mmutrace.h | 20 ++++++++++++++++++++
2 files changed, 21 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index c010ace..3a3e6c5 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -4245,6 +4245,7 @@ restart:
void kvm_mmu_invalidate_zap_all_pages(struct kvm *kvm)
{
spin_lock(&kvm->mmu_lock);
+ trace_kvm_mmu_invalidate_zap_all_pages(kvm);
kvm->arch.mmu_valid_gen++;

kvm_zap_obsolete_pages(kvm);
diff --git a/arch/x86/kvm/mmutrace.h b/arch/x86/kvm/mmutrace.h
index 697f466..eb444dd 100644
--- a/arch/x86/kvm/mmutrace.h
+++ b/arch/x86/kvm/mmutrace.h
@@ -276,6 +276,26 @@ TRACE_EVENT(
__spte_satisfied(old_spte), __spte_satisfied(new_spte)
)
);
+
+TRACE_EVENT(
+ kvm_mmu_invalidate_zap_all_pages,
+ TP_PROTO(struct kvm *kvm),
+ TP_ARGS(kvm),
+
+ TP_STRUCT__entry(
+ __field(unsigned long, mmu_valid_gen)
+ __field(unsigned int, mmu_used_pages)
+ ),
+
+ TP_fast_assign(
+ __entry->mmu_valid_gen = kvm->arch.mmu_valid_gen;
+ __entry->mmu_used_pages = kvm->arch.n_used_mmu_pages;
+ ),
+
+ TP_printk("kvm-mmu-valid-gen %lx used_pages %x",
+ __entry->mmu_valid_gen, __entry->mmu_used_pages
+ )
+);
#endif /* _TRACE_KVMMMU_H */

#undef TRACE_INCLUDE_PATH
--
1.7.7.6

2013-05-22 19:56:32

by Xiao Guangrong

Subject: [PATCH v7 11/11] KVM: MMU: reduce KVM_REQ_MMU_RELOAD when root page is zapped

From: Gleb Natapov <[email protected]>

Quote Gleb's mail:
| why don't we check for sp->role.invalid in
| kvm_mmu_prepare_zap_page before calling kvm_reload_remote_mmus()?

and

| Actually we can add check for is_obsolete_sp() there too since
| kvm_mmu_invalidate_all_pages() already calls kvm_reload_remote_mmus()
| after incrementing mmu_valid_gen.

[ Xiao: add some comments and the check of is_obsolete_sp() ]

Signed-off-by: Gleb Natapov <[email protected]>
Signed-off-by: Xiao Guangrong <[email protected]>
---
arch/x86/kvm/mmu.c | 8 +++++++-
1 files changed, 7 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 5e34056..055d675 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2097,7 +2097,13 @@ __kvm_mmu_prepare_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp,
kvm_mod_used_mmu_pages(kvm, -1);
} else {
list_move(&sp->link, &kvm->arch.active_mmu_pages);
- kvm_reload_remote_mmus(kvm);
+
+ /*
+ * The obsolete pages can not be used on any vcpus.
+ * See the comments in kvm_mmu_invalidate_zap_all_pages().
+ */
+ if (!sp->role.invalid && !is_obsolete_sp(kvm, sp))
+ kvm_reload_remote_mmus(kvm);
}

sp->role.invalid = 1;
--
1.7.7.6

2013-05-22 19:56:30

by Xiao Guangrong

Subject: [PATCH v7 10/11] KVM: MMU: collapse TLB flushes when zap all pages

kvm_zap_obsolete_pages uses the lock-break technique to zap pages;
currently it flushes the TLB every time it breaks the lock.

We can instead reload the mmu on all vcpus right after updating the
generation number, so that the obsolete pages are not in use on any vcpu;
after that we do not need to flush the TLB every time obsolete pages
are zapped.

Note: kvm_mmu_commit_zap_page is still needed before freeing the pages,
since other vcpus may be doing lockless shadow page walks.

Signed-off-by: Xiao Guangrong <[email protected]>
---
arch/x86/kvm/mmu.c | 32 ++++++++++++++++++++++----------
1 files changed, 22 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index e676356..5e34056 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -4237,8 +4237,6 @@ static void kvm_zap_obsolete_pages(struct kvm *kvm)
restart:
list_for_each_entry_safe_reverse(sp, node,
&kvm->arch.active_mmu_pages, link) {
- int ret;
-
/*
* No obsolete page exists before new created page since
* active_mmu_pages is the FIFO list.
@@ -4254,21 +4252,24 @@ restart:
if (sp->role.invalid)
continue;

+ /*
+ * Need not flush tlb since we only zap the sp with invalid
+ * generation number.
+ */
if (batch >= BATCH_ZAP_PAGES &&
- (need_resched() || spin_needbreak(&kvm->mmu_lock))) {
+ cond_resched_lock(&kvm->mmu_lock)) {
batch = 0;
- kvm_mmu_commit_zap_page(kvm, &invalid_list);
- cond_resched_lock(&kvm->mmu_lock);
goto restart;
}

- ret = kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list);
- batch += ret;
-
- if (ret)
- goto restart;
+ batch += kvm_mmu_prepare_zap_obsolete_page(kvm, sp,
+ &invalid_list);
}

+ /*
+ * Should flush tlb before free page tables since lockless-walking
+ * may use the pages.
+ */
kvm_mmu_commit_zap_page(kvm, &invalid_list);
}

@@ -4287,6 +4288,17 @@ void kvm_mmu_invalidate_zap_all_pages(struct kvm *kvm)
trace_kvm_mmu_invalidate_zap_all_pages(kvm);
kvm->arch.mmu_valid_gen++;

+ /*
+ * Notify all vcpus to reload its shadow page table
+ * and flush TLB. Then all vcpus will switch to new
+ * shadow page table with the new mmu_valid_gen.
+ *
+ * Note: we should do this under the protection of
+ * mmu-lock, otherwise, vcpu would purge shadow page
+ * but miss tlb flush.
+ */
+ kvm_reload_remote_mmus(kvm);
+
kvm_zap_obsolete_pages(kvm);
spin_unlock(&kvm->mmu_lock);
}
--
1.7.7.6

2013-05-22 19:57:04

by Xiao Guangrong

Subject: [PATCH v7 08/11] KVM: MMU: do not reuse the obsolete page

The obsolete page will be zapped soon; do not reuse it, so as to reduce
future page faults.

Signed-off-by: Xiao Guangrong <[email protected]>
---
arch/x86/kvm/mmu.c | 3 +++
1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 3a3e6c5..9b57faa 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1869,6 +1869,9 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
role.quadrant = quadrant;
}
for_each_gfn_sp(vcpu->kvm, sp, gfn) {
+ if (is_obsolete_sp(vcpu->kvm, sp))
+ continue;
+
if (!need_sync && sp->unsync)
need_sync = true;

--
1.7.7.6

2013-05-22 19:57:34

by Xiao Guangrong

Subject: [PATCH v7 09/11] KVM: MMU: introduce kvm_mmu_prepare_zap_obsolete_page

It is only used to zap obsolete pages. Since an obsolete page will not be
used again, we need not spend time finding its unsync children. Also, we
delete the page from the shadow page cache so that the page is completely
isolated after calling this function.

A later patch will use it to collapse TLB flushes.

Signed-off-by: Xiao Guangrong <[email protected]>
---
arch/x86/kvm/mmu.c | 46 +++++++++++++++++++++++++++++++++++++++++-----
1 files changed, 41 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 9b57faa..e676356 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1466,7 +1466,7 @@ static inline void kvm_mod_used_mmu_pages(struct kvm *kvm, int nr)
static void kvm_mmu_free_page(struct kvm_mmu_page *sp)
{
ASSERT(is_empty_shadow_page(sp->spt));
- hlist_del(&sp->hash_link);
+ hlist_del_init(&sp->hash_link);
list_del(&sp->link);
free_page((unsigned long)sp->spt);
if (!sp->role.direct)
@@ -2069,14 +2069,19 @@ static int mmu_zap_unsync_children(struct kvm *kvm,
return zapped;
}

-static int kvm_mmu_prepare_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp,
- struct list_head *invalid_list)
+static int
+__kvm_mmu_prepare_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp,
+ bool zap_unsync_children,
+ struct list_head *invalid_list)
{
- int ret;
+ int ret = 0;

trace_kvm_mmu_prepare_zap_page(sp);
++kvm->stat.mmu_shadow_zapped;
- ret = mmu_zap_unsync_children(kvm, sp, invalid_list);
+
+ if (likely(zap_unsync_children))
+ ret = mmu_zap_unsync_children(kvm, sp, invalid_list);
+
kvm_mmu_page_unlink_children(kvm, sp);
kvm_mmu_unlink_parents(kvm, sp);

@@ -2099,6 +2104,37 @@ static int kvm_mmu_prepare_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp,
return ret;
}

+/*
+ * The obsolete page will not be used, we need not spend time to find
+ * its unsync children out. Also, we delete the page from shadow page
+ * cache so that the page is completely isolated after call this
+ * function.
+ *
+ * Note: if we use this function in for_each_gfn_xxx macros, we should
+ * re-walk the list when it successfully zaps one page.
+ */
+static int
+kvm_mmu_prepare_zap_obsolete_page(struct kvm *kvm, struct kvm_mmu_page *sp,
+ struct list_head *invalid_list)
+{
+ int ret;
+
+ WARN_ON(!is_obsolete_sp(kvm, sp));
+
+ ret = __kvm_mmu_prepare_zap_page(kvm, sp, false, invalid_list);
+ if (ret)
+ hlist_del_init(&sp->hash_link);
+
+ WARN_ON(ret > 1);
+ return ret;
+}
+
+static int kvm_mmu_prepare_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp,
+ struct list_head *invalid_list)
+{
+ return __kvm_mmu_prepare_zap_page(kvm, sp, true, invalid_list);
+}
+
static void kvm_mmu_commit_zap_page(struct kvm *kvm,
struct list_head *invalid_list)
{
--
1.7.7.6

2013-05-22 19:56:20

by Xiao Guangrong

Subject: [PATCH v7 04/11] KVM: MMU: zap pages in batch

Zap at least 10 pages before releasing mmu-lock to reduce the overhead
caused by re-acquiring the lock.

After this patch, kvm_zap_obsolete_pages can always make forward progress,
so update the comments accordingly.

[ It improves kernel building by 0.6% ~ 1% ]

Signed-off-by: Xiao Guangrong <[email protected]>
---
arch/x86/kvm/mmu.c | 35 +++++++++++------------------------
1 files changed, 11 insertions(+), 24 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index f302540..688e755 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -4203,14 +4203,18 @@ restart:
spin_unlock(&kvm->mmu_lock);
}

+#define BATCH_ZAP_PAGES 10
static void kvm_zap_obsolete_pages(struct kvm *kvm)
{
struct kvm_mmu_page *sp, *node;
LIST_HEAD(invalid_list);
+ int batch = 0;

restart:
list_for_each_entry_safe_reverse(sp, node,
&kvm->arch.active_mmu_pages, link) {
+ int ret;
+
/*
* No obsolete page exists before new created page since
* active_mmu_pages is the FIFO list.
@@ -4219,28 +4223,6 @@ restart:
break;

/*
- * Do not repeatedly zap a root page to avoid unnecessary
- * KVM_REQ_MMU_RELOAD, otherwise we may not be able to
- * progress:
- * vcpu 0 vcpu 1
- * call vcpu_enter_guest():
- * 1): handle KVM_REQ_MMU_RELOAD
- * and require mmu-lock to
- * load mmu
- * repeat:
- * 1): zap root page and
- * send KVM_REQ_MMU_RELOAD
- *
- * 2): if (cond_resched_lock(mmu-lock))
- *
- * 2): hold mmu-lock and load mmu
- *
- * 3): see KVM_REQ_MMU_RELOAD bit
- * on vcpu->requests is set
- * then return 1 to call
- * vcpu_enter_guest() again.
- * goto repeat;
- *
* Since we are reversely walking the list and the invalid
* list will be moved to the head, skip the invalid page
* can help us to avoid the infinity list walking.
@@ -4248,13 +4230,18 @@ restart:
if (sp->role.invalid)
continue;

- if (need_resched() || spin_needbreak(&kvm->mmu_lock)) {
+ if (batch >= BATCH_ZAP_PAGES &&
+ (need_resched() || spin_needbreak(&kvm->mmu_lock))) {
+ batch = 0;
kvm_mmu_commit_zap_page(kvm, &invalid_list);
cond_resched_lock(&kvm->mmu_lock);
goto restart;
}

- if (kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list))
+ ret = kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list);
+ batch += ret;
+
+ if (ret)
goto restart;
}

--
1.7.7.6

2013-05-22 19:57:53

by Xiao Guangrong

Subject: [PATCH v7 06/11] KVM: MMU: show mmu_valid_gen in shadow page related tracepoints

Show sp->mmu_valid_gen

Signed-off-by: Xiao Guangrong <[email protected]>
---
arch/x86/kvm/mmutrace.h | 22 ++++++++++++----------
1 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/mmutrace.h b/arch/x86/kvm/mmutrace.h
index b8f6172..697f466 100644
--- a/arch/x86/kvm/mmutrace.h
+++ b/arch/x86/kvm/mmutrace.h
@@ -7,16 +7,18 @@
#undef TRACE_SYSTEM
#define TRACE_SYSTEM kvmmmu

-#define KVM_MMU_PAGE_FIELDS \
- __field(__u64, gfn) \
- __field(__u32, role) \
- __field(__u32, root_count) \
+#define KVM_MMU_PAGE_FIELDS \
+ __field(unsigned long, mmu_valid_gen) \
+ __field(__u64, gfn) \
+ __field(__u32, role) \
+ __field(__u32, root_count) \
__field(bool, unsync)

-#define KVM_MMU_PAGE_ASSIGN(sp) \
- __entry->gfn = sp->gfn; \
- __entry->role = sp->role.word; \
- __entry->root_count = sp->root_count; \
+#define KVM_MMU_PAGE_ASSIGN(sp) \
+ __entry->mmu_valid_gen = sp->mmu_valid_gen; \
+ __entry->gfn = sp->gfn; \
+ __entry->role = sp->role.word; \
+ __entry->root_count = sp->root_count; \
__entry->unsync = sp->unsync;

#define KVM_MMU_PAGE_PRINTK() ({ \
@@ -28,8 +30,8 @@
\
role.word = __entry->role; \
\
- trace_seq_printf(p, "sp gfn %llx %u%s q%u%s %s%s" \
- " %snxe root %u %s%c", \
+ trace_seq_printf(p, "sp gen %lx gfn %llx %u%s q%u%s %s%s" \
+ " %snxe root %u %s%c", __entry->mmu_valid_gen, \
__entry->gfn, role.level, \
role.cr4_pae ? " pae" : "", \
role.quadrant, \
--
1.7.7.6

2013-05-22 19:58:17

by Xiao Guangrong

Subject: [PATCH v7 05/11] KVM: x86: use the fast way to invalidate all pages

Replace kvm_mmu_zap_all with kvm_mmu_invalidate_zap_all_pages

Signed-off-by: Xiao Guangrong <[email protected]>
---
arch/x86/kvm/mmu.c | 15 ---------------
arch/x86/kvm/x86.c | 4 ++--
2 files changed, 2 insertions(+), 17 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 688e755..c010ace 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -4188,21 +4188,6 @@ void kvm_mmu_slot_remove_write_access(struct kvm *kvm, int slot)
spin_unlock(&kvm->mmu_lock);
}

-void kvm_mmu_zap_all(struct kvm *kvm)
-{
- struct kvm_mmu_page *sp, *node;
- LIST_HEAD(invalid_list);
-
- spin_lock(&kvm->mmu_lock);
-restart:
- list_for_each_entry_safe(sp, node, &kvm->arch.active_mmu_pages, link)
- if (kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list))
- goto restart;
-
- kvm_mmu_commit_zap_page(kvm, &invalid_list);
- spin_unlock(&kvm->mmu_lock);
-}
-
#define BATCH_ZAP_PAGES 10
static void kvm_zap_obsolete_pages(struct kvm *kvm)
{
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 3758ff9..15e10f7 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7066,13 +7066,13 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,

void kvm_arch_flush_shadow_all(struct kvm *kvm)
{
- kvm_mmu_zap_all(kvm);
+ kvm_mmu_invalidate_zap_all_pages(kvm);
}

void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
struct kvm_memory_slot *slot)
{
- kvm_arch_flush_shadow_all(kvm);
+ kvm_mmu_invalidate_zap_all_pages(kvm);
}

int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu)
--
1.7.7.6

2013-05-22 19:56:17

by Xiao Guangrong

Subject: [PATCH v7 02/11] KVM: MMU: drop unnecessary kvm_reload_remote_mmus

It is the responsibility of kvm_mmu_zap_all to keep the mmu and the TLBs
consistent. The reload is also unnecessary after zapping all mmio sptes,
since no mmio spte lives in the root shadow page and an mmio spte cannot
be cached in the TLB.

Signed-off-by: Xiao Guangrong <[email protected]>
---
arch/x86/kvm/x86.c | 5 +----
1 files changed, 1 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 6739b1d..3758ff9 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7060,16 +7060,13 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
* If memory slot is created, or moved, we need to clear all
* mmio sptes.
*/
- if ((change == KVM_MR_CREATE) || (change == KVM_MR_MOVE)) {
+ if ((change == KVM_MR_CREATE) || (change == KVM_MR_MOVE))
kvm_mmu_zap_mmio_sptes(kvm);
- kvm_reload_remote_mmus(kvm);
- }
}

void kvm_arch_flush_shadow_all(struct kvm *kvm)
{
kvm_mmu_zap_all(kvm);
- kvm_reload_remote_mmus(kvm);
}

void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
--
1.7.7.6

2013-05-22 19:58:47

by Xiao Guangrong

Subject: [PATCH v7 03/11] KVM: MMU: fast invalidate all pages

The current kvm_mmu_zap_all is really slow - it holds mmu-lock while it
walks and zaps all shadow pages one by one, and it also needs to zap every
guest page's rmap and every shadow page's parent spte list. Things become
particularly bad if the guest uses more memory or vcpus. It does not scale
well.

In this patch, we introduce a faster way to invalidate all shadow pages.
KVM maintains a global mmu generation-number which is stored in
kvm->arch.mmu_valid_gen, and every shadow page records the current global
generation-number in sp->mmu_valid_gen when it is created.

When KVM needs to zap all shadow page sptes, it simply increases the global
generation-number and then reloads the root shadow pages on all vcpus. Each
vcpu will create a new shadow page table according to the current
generation-number, which ensures the old pages are no longer used.
Then the obsolete pages (sp->mmu_valid_gen != kvm->arch.mmu_valid_gen)
are zapped using the lock-break technique.

Signed-off-by: Xiao Guangrong <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 2 +
arch/x86/kvm/mmu.c | 84 +++++++++++++++++++++++++++++++++++++++
arch/x86/kvm/mmu.h | 1 +
3 files changed, 87 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 3741c65..bff7d46 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -222,6 +222,7 @@ struct kvm_mmu_page {
int root_count; /* Currently serving as active root */
unsigned int unsync_children;
unsigned long parent_ptes; /* Reverse mapping for parent_pte */
+ unsigned long mmu_valid_gen;
DECLARE_BITMAP(unsync_child_bitmap, 512);

#ifdef CONFIG_X86_32
@@ -529,6 +530,7 @@ struct kvm_arch {
unsigned int n_requested_mmu_pages;
unsigned int n_max_mmu_pages;
unsigned int indirect_shadow_pages;
+ unsigned long mmu_valid_gen;
struct hlist_head mmu_page_hash[KVM_NUM_MMU_PAGES];
/*
* Hash table of struct kvm_mmu_page.
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index f8ca2f3..f302540 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1838,6 +1838,11 @@ static void clear_sp_write_flooding_count(u64 *spte)
__clear_sp_write_flooding_count(sp);
}

+static bool is_obsolete_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
+{
+ return unlikely(sp->mmu_valid_gen != kvm->arch.mmu_valid_gen);
+}
+
static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
gfn_t gfn,
gva_t gaddr,
@@ -1900,6 +1905,7 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,

account_shadowed(vcpu->kvm, gfn);
}
+ sp->mmu_valid_gen = vcpu->kvm->arch.mmu_valid_gen;
init_shadow_page_table(sp);
trace_kvm_mmu_get_page(sp, true);
return sp;
@@ -2070,8 +2076,10 @@ static int kvm_mmu_prepare_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp,
ret = mmu_zap_unsync_children(kvm, sp, invalid_list);
kvm_mmu_page_unlink_children(kvm, sp);
kvm_mmu_unlink_parents(kvm, sp);
+
if (!sp->role.invalid && !sp->role.direct)
unaccount_shadowed(kvm, sp->gfn);
+
if (sp->unsync)
kvm_unlink_unsync_page(kvm, sp);
if (!sp->root_count) {
@@ -4195,6 +4203,82 @@ restart:
spin_unlock(&kvm->mmu_lock);
}

+static void kvm_zap_obsolete_pages(struct kvm *kvm)
+{
+ struct kvm_mmu_page *sp, *node;
+ LIST_HEAD(invalid_list);
+
+restart:
+ list_for_each_entry_safe_reverse(sp, node,
+ &kvm->arch.active_mmu_pages, link) {
+ /*
+ * No obsolete page exists before new created page since
+ * active_mmu_pages is the FIFO list.
+ */
+ if (!is_obsolete_sp(kvm, sp))
+ break;
+
+ /*
+ * Do not repeatedly zap a root page to avoid unnecessary
+ * KVM_REQ_MMU_RELOAD, otherwise we may not be able to
+ * progress:
+ * vcpu 0 vcpu 1
+ * call vcpu_enter_guest():
+ * 1): handle KVM_REQ_MMU_RELOAD
+ * and require mmu-lock to
+ * load mmu
+ * repeat:
+ * 1): zap root page and
+ * send KVM_REQ_MMU_RELOAD
+ *
+ * 2): if (cond_resched_lock(mmu-lock))
+ *
+ * 2): hold mmu-lock and load mmu
+ *
+ * 3): see KVM_REQ_MMU_RELOAD bit
+ * on vcpu->requests is set
+ * then return 1 to call
+ * vcpu_enter_guest() again.
+ * goto repeat;
+ *
+ * Since we are reversely walking the list and the invalid
+ * list will be moved to the head, skip the invalid page
+ * can help us to avoid the infinity list walking.
+ */
+ if (sp->role.invalid)
+ continue;
+
+ if (need_resched() || spin_needbreak(&kvm->mmu_lock)) {
+ kvm_mmu_commit_zap_page(kvm, &invalid_list);
+ cond_resched_lock(&kvm->mmu_lock);
+ goto restart;
+ }
+
+ if (kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list))
+ goto restart;
+ }
+
+ kvm_mmu_commit_zap_page(kvm, &invalid_list);
+}
+
+/*
+ * Fast invalidate all shadow pages and use lock-break technique
+ * to zap obsolete pages.
+ *
+ * It's required when memslot is being deleted or VM is being
+ * destroyed, in these cases, we should ensure that KVM MMU does
+ * not use any resource of the being-deleted slot or all slots
+ * after calling the function.
+ */
+void kvm_mmu_invalidate_zap_all_pages(struct kvm *kvm)
+{
+ spin_lock(&kvm->mmu_lock);
+ kvm->arch.mmu_valid_gen++;
+
+ kvm_zap_obsolete_pages(kvm);
+ spin_unlock(&kvm->mmu_lock);
+}
+
void kvm_mmu_zap_mmio_sptes(struct kvm *kvm)
{
struct kvm_mmu_page *sp, *node;
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 2adcbc2..922bfae 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -97,4 +97,5 @@ static inline bool permission_fault(struct kvm_mmu *mmu, unsigned pte_access,
return (mmu->permissions[pfec >> 1] >> pte_access) & 1;
}

+void kvm_mmu_invalidate_zap_all_pages(struct kvm *kvm);
#endif
--
1.7.7.6

2013-05-22 19:56:15

by Xiao Guangrong

Subject: [PATCH v7 01/11] KVM: x86: drop calling kvm_mmu_zap_all in emulator_fix_hypercall

Quote Gleb's mail:

| Back then kvm->lock protected memslot access so code like:
|
| mutex_lock(&vcpu->kvm->lock);
| kvm_mmu_zap_all(vcpu->kvm);
| mutex_unlock(&vcpu->kvm->lock);
|
| which is what 7aa81cc0 does was enough to guaranty that no vcpu will
| run while code is patched. This is no longer the case and
| mutex_lock(&vcpu->kvm->lock); is gone from that code path long time ago,
| so now kvm_mmu_zap_all() there is useless and the code is incorrect.

So we drop it; the underlying problem will be fixed later

Signed-off-by: Xiao Guangrong <[email protected]>
---
arch/x86/kvm/x86.c | 7 -------
1 files changed, 0 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8d28810..6739b1d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5523,13 +5523,6 @@ static int emulator_fix_hypercall(struct x86_emulate_ctxt *ctxt)
char instruction[3];
unsigned long rip = kvm_rip_read(vcpu);

- /*
- * Blow out the MMU to ensure that no other VCPU has an active mapping
- * to ensure that the updated hypercall appears atomically across all
- * VCPUs.
- */
- kvm_mmu_zap_all(vcpu->kvm);
-
kvm_x86_ops->patch_hypercall(vcpu, instruction);

return emulator_write_emulated(ctxt, rip, instruction, 3, NULL);
--
1.7.7.6

2013-05-23 05:57:33

by Gleb Natapov

Subject: Re: [PATCH v7 09/11] KVM: MMU: introduce kvm_mmu_prepare_zap_obsolete_page

On Thu, May 23, 2013 at 03:55:58AM +0800, Xiao Guangrong wrote:
> It is only used to zap the obsolete page. Since the obsolete page
> will not be used, we need not spend time to find its unsync children
> out. Also, we delete the page from shadow page cache so that the page
> is completely isolated after call this function.
>
> The later patch will use it to collapse tlb flushes
>
> Signed-off-by: Xiao Guangrong <[email protected]>
> ---
> arch/x86/kvm/mmu.c | 46 +++++++++++++++++++++++++++++++++++++++++-----
> 1 files changed, 41 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 9b57faa..e676356 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -1466,7 +1466,7 @@ static inline void kvm_mod_used_mmu_pages(struct kvm *kvm, int nr)
> static void kvm_mmu_free_page(struct kvm_mmu_page *sp)
> {
> ASSERT(is_empty_shadow_page(sp->spt));
> - hlist_del(&sp->hash_link);
> + hlist_del_init(&sp->hash_link);
Why do you need hlist_del_init() here? Why not move it into
kvm_mmu_prepare_zap_page(), like we discussed here:
https://patchwork.kernel.org/patch/2580351/, instead of doing
it differently for obsolete and non-obsolete pages?

> list_del(&sp->link);
> free_page((unsigned long)sp->spt);
> if (!sp->role.direct)
> @@ -2069,14 +2069,19 @@ static int mmu_zap_unsync_children(struct kvm *kvm,
> return zapped;
> }
>
> -static int kvm_mmu_prepare_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp,
> - struct list_head *invalid_list)
> +static int
> +__kvm_mmu_prepare_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp,
> + bool zap_unsync_children,
> + struct list_head *invalid_list)
> {
> - int ret;
> + int ret = 0;
>
> trace_kvm_mmu_prepare_zap_page(sp);
> ++kvm->stat.mmu_shadow_zapped;
> - ret = mmu_zap_unsync_children(kvm, sp, invalid_list);
> +
> + if (likely(zap_unsync_children))
> + ret = mmu_zap_unsync_children(kvm, sp, invalid_list);
> +
> kvm_mmu_page_unlink_children(kvm, sp);
> kvm_mmu_unlink_parents(kvm, sp);
>
> @@ -2099,6 +2104,37 @@ static int kvm_mmu_prepare_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp,
> return ret;
> }
>
> +/*
> + * The obsolete page will not be used, we need not spend time to find
> + * its unsync children out. Also, we delete the page from shadow page
> + * cache so that the page is completely isolated after call this
> + * function.
> + *
> + * Note: if we use this function in for_each_gfn_xxx macros, we should
> + * re-walk the list when it successfully zaps one page.
> + */
> +static int
> +kvm_mmu_prepare_zap_obsolete_page(struct kvm *kvm, struct kvm_mmu_page *sp,
> + struct list_head *invalid_list)
> +{
> + int ret;
> +
> + WARN_ON(!is_obsolete_sp(kvm, sp));
> +
> + ret = __kvm_mmu_prepare_zap_page(kvm, sp, false, invalid_list);
> + if (ret)
> + hlist_del_init(&sp->hash_link);
Why is hlist_del() not enough?

> +
> + WARN_ON(ret > 1);
> + return ret;
> +}
> +
> +static int kvm_mmu_prepare_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp,
> + struct list_head *invalid_list)
> +{
> + return __kvm_mmu_prepare_zap_page(kvm, sp, true, invalid_list);
> +}
> +
> static void kvm_mmu_commit_zap_page(struct kvm *kvm,
> struct list_head *invalid_list)
> {
> --
> 1.7.7.6

--
Gleb.

2013-05-23 06:12:54

by Gleb Natapov

Subject: Re: [PATCH v7 10/11] KVM: MMU: collapse TLB flushes when zap all pages

On Thu, May 23, 2013 at 03:55:59AM +0800, Xiao Guangrong wrote:
> kvm_zap_obsolete_pages uses lock-break technique to zap pages,
> it will flush tlb every time when it does lock-break
>
> We can reload mmu on all vcpus after updating the generation
> number so that the obsolete pages are not used on any vcpus,
> after that we do not need to flush tlb when obsolete pages
> are zapped
>
> Note: kvm_mmu_commit_zap_page is still needed before free
> the pages since other vcpus may be doing locklessly shadow
> page walking
>
Since obsolete pages are not accessible to lockless page walking after all
roots are reloaded, I do not understand why the additional tlb flush is
needed. Also, why should a tlb flush prevent lockless walking from using
the page? Making the page unreachable from root_hpa does that, no?

> Signed-off-by: Xiao Guangrong <[email protected]>
> ---
> arch/x86/kvm/mmu.c | 32 ++++++++++++++++++++++----------
> 1 files changed, 22 insertions(+), 10 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index e676356..5e34056 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -4237,8 +4237,6 @@ static void kvm_zap_obsolete_pages(struct kvm *kvm)
> restart:
> list_for_each_entry_safe_reverse(sp, node,
> &kvm->arch.active_mmu_pages, link) {
> - int ret;
> -
> /*
> * No obsolete page exists before new created page since
> * active_mmu_pages is the FIFO list.
> @@ -4254,21 +4252,24 @@ restart:
> if (sp->role.invalid)
> continue;
>
> + /*
> + * Need not flush tlb since we only zap the sp with invalid
> + * generation number.
> + */
> if (batch >= BATCH_ZAP_PAGES &&
> - (need_resched() || spin_needbreak(&kvm->mmu_lock))) {
> + cond_resched_lock(&kvm->mmu_lock)) {
> batch = 0;
> - kvm_mmu_commit_zap_page(kvm, &invalid_list);
> - cond_resched_lock(&kvm->mmu_lock);
> goto restart;
> }
>
> - ret = kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list);
> - batch += ret;
> -
> - if (ret)
> - goto restart;
> + batch += kvm_mmu_prepare_zap_obsolete_page(kvm, sp,
> + &invalid_list);
> }
>
> + /*
> + * Should flush tlb before free page tables since lockless-walking
> + * may use the pages.
> + */
> kvm_mmu_commit_zap_page(kvm, &invalid_list);
> }
>
> @@ -4287,6 +4288,17 @@ void kvm_mmu_invalidate_zap_all_pages(struct kvm *kvm)
> trace_kvm_mmu_invalidate_zap_all_pages(kvm);
> kvm->arch.mmu_valid_gen++;
>
> + /*
> + * Notify all vcpus to reload its shadow page table
> + * and flush TLB. Then all vcpus will switch to new
> + * shadow page table with the new mmu_valid_gen.
> + *
> + * Note: we should do this under the protection of
> + * mmu-lock, otherwise, vcpu would purge shadow page
> + * but miss tlb flush.
> + */
> + kvm_reload_remote_mmus(kvm);
> +
> kvm_zap_obsolete_pages(kvm);
> spin_unlock(&kvm->mmu_lock);
> }
> --
> 1.7.7.6

--
Gleb.

2013-05-23 06:13:23

by Xiao Guangrong

Subject: Re: [PATCH v7 09/11] KVM: MMU: introduce kvm_mmu_prepare_zap_obsolete_page

On 05/23/2013 01:57 PM, Gleb Natapov wrote:
> On Thu, May 23, 2013 at 03:55:58AM +0800, Xiao Guangrong wrote:
>> It is only used to zap the obsolete page. Since the obsolete page
>> will not be used, we need not spend time to find its unsync children
>> out. Also, we delete the page from shadow page cache so that the page
>> is completely isolated after call this function.
>>
>> The later patch will use it to collapse tlb flushes
>>
>> Signed-off-by: Xiao Guangrong <[email protected]>
>> ---
>> arch/x86/kvm/mmu.c | 46 +++++++++++++++++++++++++++++++++++++++++-----
>> 1 files changed, 41 insertions(+), 5 deletions(-)
>>
>> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
>> index 9b57faa..e676356 100644
>> --- a/arch/x86/kvm/mmu.c
>> +++ b/arch/x86/kvm/mmu.c
>> @@ -1466,7 +1466,7 @@ static inline void kvm_mod_used_mmu_pages(struct kvm *kvm, int nr)
>> static void kvm_mmu_free_page(struct kvm_mmu_page *sp)
>> {
>> ASSERT(is_empty_shadow_page(sp->spt));
>> - hlist_del(&sp->hash_link);
>> + hlist_del_init(&sp->hash_link);
> Why do you need hlist_del_init() here? Why not move it into

Because the entry may be deleted twice. We use it like this:

kvm_mmu_prepare_zap_obsolete_page(page, list);
kvm_mmu_commit_zap_page(list);
kvm_mmu_free_page(page);

The first deletion happens in kvm_mmu_prepare_zap_obsolete_page(page), which
has already removed the page from the hash list.

> kvm_mmu_prepare_zap_page() like we discussed it here:
> https://patchwork.kernel.org/patch/2580351/ instead of doing
> it differently for obsolete and non obsolete pages?

That can break hash-list walking: we would have to rescan the
hash list every time a page is prepare-zapped.

I mentioned it in the changelog:

4): drop the patch which deleted the page from the hash list at "prepare"
time, since that can break walks based on the hash list.
>
>> list_del(&sp->link);
>> free_page((unsigned long)sp->spt);
>> if (!sp->role.direct)
>> @@ -2069,14 +2069,19 @@ static int mmu_zap_unsync_children(struct kvm *kvm,
>> return zapped;
>> }
>>
>> -static int kvm_mmu_prepare_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp,
>> - struct list_head *invalid_list)
>> +static int
>> +__kvm_mmu_prepare_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp,
>> + bool zap_unsync_children,
>> + struct list_head *invalid_list)
>> {
>> - int ret;
>> + int ret = 0;
>>
>> trace_kvm_mmu_prepare_zap_page(sp);
>> ++kvm->stat.mmu_shadow_zapped;
>> - ret = mmu_zap_unsync_children(kvm, sp, invalid_list);
>> +
>> + if (likely(zap_unsync_children))
>> + ret = mmu_zap_unsync_children(kvm, sp, invalid_list);
>> +
>> kvm_mmu_page_unlink_children(kvm, sp);
>> kvm_mmu_unlink_parents(kvm, sp);
>>
>> @@ -2099,6 +2104,37 @@ static int kvm_mmu_prepare_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp,
>> return ret;
>> }
>>
>> +/*
>> + * The obsolete page will not be used, we need not spend time to find
>> + * its unsync children out. Also, we delete the page from shadow page
>> + * cache so that the page is completely isolated after call this
>> + * function.
>> + *
>> + * Note: if we use this function in for_each_gfn_xxx macros, we should
>> + * re-walk the list when it successfully zaps one page.
>> + */
>> +static int
>> +kvm_mmu_prepare_zap_obsolete_page(struct kvm *kvm, struct kvm_mmu_page *sp,
>> + struct list_head *invalid_list)
>> +{
>> + int ret;
>> +
>> + WARN_ON(!is_obsolete_sp(kvm, sp));
>> +
>> + ret = __kvm_mmu_prepare_zap_page(kvm, sp, false, invalid_list);
>> + if (ret)
>> + hlist_del_init(&sp->hash_link);
> Why hlist_del() is not enough?

Because the entry will be deleted again in kvm_mmu_free_page().
I am not sure if there is a better way to do this.

2013-05-23 06:18:46

by Gleb Natapov

Subject: Re: [PATCH v7 09/11] KVM: MMU: introduce kvm_mmu_prepare_zap_obsolete_page

On Thu, May 23, 2013 at 02:13:06PM +0800, Xiao Guangrong wrote:
> On 05/23/2013 01:57 PM, Gleb Natapov wrote:
> > On Thu, May 23, 2013 at 03:55:58AM +0800, Xiao Guangrong wrote:
> >> It is only used to zap the obsolete page. Since the obsolete page
> >> will not be used, we need not spend time to find its unsync children
> >> out. Also, we delete the page from shadow page cache so that the page
> >> is completely isolated after call this function.
> >>
> >> The later patch will use it to collapse tlb flushes
> >>
> >> Signed-off-by: Xiao Guangrong <[email protected]>
> >> ---
> >> arch/x86/kvm/mmu.c | 46 +++++++++++++++++++++++++++++++++++++++++-----
> >> 1 files changed, 41 insertions(+), 5 deletions(-)
> >>
> >> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> >> index 9b57faa..e676356 100644
> >> --- a/arch/x86/kvm/mmu.c
> >> +++ b/arch/x86/kvm/mmu.c
> >> @@ -1466,7 +1466,7 @@ static inline void kvm_mod_used_mmu_pages(struct kvm *kvm, int nr)
> >> static void kvm_mmu_free_page(struct kvm_mmu_page *sp)
> >> {
> >> ASSERT(is_empty_shadow_page(sp->spt));
> >> - hlist_del(&sp->hash_link);
> >> + hlist_del_init(&sp->hash_link);
> > Why do you need hlist_del_init() here? Why not move it into
>
> Since the hlist will be double freed. We will it like this:
>
> kvm_mmu_prepare_zap_obsolete_page(page, list);
> kvm_mmu_commit_zap_page(list);
> kvm_mmu_free_page(page);
>
> The first place is kvm_mmu_prepare_zap_obsolete_page(page), which have
> deleted the hash list.
>
> > kvm_mmu_prepare_zap_page() like we discussed it here:
> > https://patchwork.kernel.org/patch/2580351/ instead of doing
> > it differently for obsolete and non obsolete pages?
>
> It is can break the hash-list walking: we should rescan the
> hash list once the page is prepared-ly zapped.
>
> I mentioned it in the changelog:
>
> 4): drop the patch which deleted page from hash list at the "prepare"
> time since it can break the walk based on hash list.
Can you elaborate on how this can happen?

--
Gleb.

2013-05-23 06:27:12

by Xiao Guangrong

Subject: Re: [PATCH v7 10/11] KVM: MMU: collapse TLB flushes when zap all pages

On 05/23/2013 02:12 PM, Gleb Natapov wrote:
> On Thu, May 23, 2013 at 03:55:59AM +0800, Xiao Guangrong wrote:
>> kvm_zap_obsolete_pages uses lock-break technique to zap pages,
>> it will flush tlb every time when it does lock-break
>>
>> We can reload mmu on all vcpus after updating the generation
>> number so that the obsolete pages are not used on any vcpus,
>> after that we do not need to flush tlb when obsolete pages
>> are zapped
>>
>> Note: kvm_mmu_commit_zap_page is still needed before free
>> the pages since other vcpus may be doing locklessly shadow
>> page walking
>>
> Since obsolete pages are not accessible for lockless page walking after
> reload of all roots I do not understand why additional tlb flush is

kvm_reload_remote_mmus() forces vcpus to leave guest mode, but if a vcpu is
not running in guest mode, it does nothing except set the request bit. So
that vcpu can still start a lockless page walk after kvm_reload_remote_mmus()
has returned on another vcpu.

Like this scenario:

    VCPU 0                              VCPU 1
                                        exit when it encounters #PF

    kvm_reload_remote_mmus() {
        set vcpu1->request bit;

        do not send IPI due to
        vcpu 1 not running in
        guest mode
                                        call page-fault handler then
                                        go lockless walking !!!
        return
    }


> needed. Also why tlb flush should prevent lockless-walking from using
> the page? Making page unreachable from root_hpa does that, no?

Lockless walking disables interrupts and sets the vcpu state to
READING_SHADOW_PAGE_TABLES; that state is treated like guest mode, so
kvm_flush_remote_tlbs() will send an IPI to such a vcpu.
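
(For reference, the lockless walk entry in mmu.c looks roughly like the
sketch below; this is a paraphrase, not an exact quote of the source:)

static void walk_shadow_page_lockless_begin(struct kvm_vcpu *vcpu)
{
        /*
         * Keep the vcpu out of OUTSIDE_GUEST_MODE while it walks the
         * shadow page tables, so kvm_flush_remote_tlbs() sends an IPI
         * to it and, because interrupts are disabled here, has to wait
         * until the walk has finished before the IPI is handled.
         */
        local_irq_disable();
        vcpu->mode = READING_SHADOW_PAGE_TABLES;
        /* make the mode change visible before any spte is read */
        smp_mb();
}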

2013-05-23 06:32:08

by Xiao Guangrong

Subject: Re: [PATCH v7 09/11] KVM: MMU: introduce kvm_mmu_prepare_zap_obsolete_page

On 05/23/2013 02:18 PM, Gleb Natapov wrote:
> On Thu, May 23, 2013 at 02:13:06PM +0800, Xiao Guangrong wrote:
>> On 05/23/2013 01:57 PM, Gleb Natapov wrote:
>>> On Thu, May 23, 2013 at 03:55:58AM +0800, Xiao Guangrong wrote:
>>>> It is only used to zap the obsolete page. Since the obsolete page
>>>> will not be used, we need not spend time to find its unsync children
>>>> out. Also, we delete the page from shadow page cache so that the page
>>>> is completely isolated after call this function.
>>>>
>>>> The later patch will use it to collapse tlb flushes
>>>>
>>>> Signed-off-by: Xiao Guangrong <[email protected]>
>>>> ---
>>>> arch/x86/kvm/mmu.c | 46 +++++++++++++++++++++++++++++++++++++++++-----
>>>> 1 files changed, 41 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
>>>> index 9b57faa..e676356 100644
>>>> --- a/arch/x86/kvm/mmu.c
>>>> +++ b/arch/x86/kvm/mmu.c
>>>> @@ -1466,7 +1466,7 @@ static inline void kvm_mod_used_mmu_pages(struct kvm *kvm, int nr)
>>>> static void kvm_mmu_free_page(struct kvm_mmu_page *sp)
>>>> {
>>>> ASSERT(is_empty_shadow_page(sp->spt));
>>>> - hlist_del(&sp->hash_link);
>>>> + hlist_del_init(&sp->hash_link);
>>> Why do you need hlist_del_init() here? Why not move it into
>>
>> Since the hlist will be double freed. We will it like this:
>>
>> kvm_mmu_prepare_zap_obsolete_page(page, list);
>> kvm_mmu_commit_zap_page(list);
>> kvm_mmu_free_page(page);
>>
>> The first place is kvm_mmu_prepare_zap_obsolete_page(page), which have
>> deleted the hash list.
>>
>>> kvm_mmu_prepare_zap_page() like we discussed it here:
>>> https://patchwork.kernel.org/patch/2580351/ instead of doing
>>> it differently for obsolete and non obsolete pages?
>>
>> It is can break the hash-list walking: we should rescan the
>> hash list once the page is prepared-ly zapped.
>>
>> I mentioned it in the changelog:
>>
>> 4): drop the patch which deleted page from hash list at the "prepare"
>> time since it can break the walk based on hash list.
> Can you elaborate on how this can happen?

Here is an example:

int kvm_mmu_unprotect_page(struct kvm *kvm, gfn_t gfn)
{
        struct kvm_mmu_page *sp;
        LIST_HEAD(invalid_list);
        int r;

        pgprintk("%s: looking for gfn %llx\n", __func__, gfn);
        r = 0;
        spin_lock(&kvm->mmu_lock);
        for_each_gfn_indirect_valid_sp(kvm, sp, gfn) {
                pgprintk("%s: gfn %llx role %x\n", __func__, gfn,
                         sp->role.word);
                r = 1;
                kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list);
        }
        kvm_mmu_commit_zap_page(kvm, &invalid_list);
        spin_unlock(&kvm->mmu_lock);

        return r;
}

It works fine today since kvm_mmu_prepare_zap_page does not touch the hash
list. If we deleted the hlist entry in kvm_mmu_prepare_zap_page(), this kind
of code would have to be changed to:

restart:
        for_each_gfn_indirect_valid_sp(kvm, sp, gfn) {
                pgprintk("%s: gfn %llx role %x\n", __func__, gfn,
                         sp->role.word);
                r = 1;
                if (kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list))
                        goto restart;
        }
        kvm_mmu_commit_zap_page(kvm, &invalid_list);

2013-05-23 07:24:55

by Gleb Natapov

Subject: Re: [PATCH v7 10/11] KVM: MMU: collapse TLB flushes when zap all pages

On Thu, May 23, 2013 at 02:26:57PM +0800, Xiao Guangrong wrote:
> On 05/23/2013 02:12 PM, Gleb Natapov wrote:
> > On Thu, May 23, 2013 at 03:55:59AM +0800, Xiao Guangrong wrote:
> >> kvm_zap_obsolete_pages uses lock-break technique to zap pages,
> >> it will flush tlb every time when it does lock-break
> >>
> >> We can reload mmu on all vcpus after updating the generation
> >> number so that the obsolete pages are not used on any vcpus,
> >> after that we do not need to flush tlb when obsolete pages
> >> are zapped
> >>
> >> Note: kvm_mmu_commit_zap_page is still needed before free
> >> the pages since other vcpus may be doing locklessly shadow
> >> page walking
> >>
> > Since obsolete pages are not accessible for lockless page walking after
> > reload of all roots I do not understand why additional tlb flush is
>
> kvm_reload_remote_mmus() forces vcpus to leave guest mode, but if the
> vcpu is not running on guest mode, it does nothing except set the request
> bit. So, the vcpu can go lockless page walking after kvm_reload_remote_mmus()
> return on other vcpu.
>
> Like this scenario:
>
> VCPU 0 VCPU 1
> exit when it encounters #PF
>
> kvm_reload_remote_mmus(){
> set vcpu1->request bit;
>
> do not send IPI due to
> vcpu 1 not running on guest mode
>
> call page-fault handler then go lockless walking !!!
> return
> }
>
>
> > needed. Also why tlb flush should prevent lockless-walking from using
> > the page? Making page unreachable from root_hpa does that, no?
>
> lockless-walking disables the interrupt and makes the vcpu state as
> READING_SHADOW_PAGE_TABLES, this state is treated as GUEST_MODE,
> kvm_flush_remote_tlbs() should send IPI to this vcpu in this case.

kvm_flush_remote_tlbs() uses the same make_all_cpus_request() as
kvm_reload_remote_mmus() does, so why can't the same scenario you describe
above happen with kvm_flush_remote_tlbs()?

--
Gleb.

2013-05-23 07:37:15

by Gleb Natapov

Subject: Re: [PATCH v7 09/11] KVM: MMU: introduce kvm_mmu_prepare_zap_obsolete_page

On Thu, May 23, 2013 at 02:31:47PM +0800, Xiao Guangrong wrote:
> On 05/23/2013 02:18 PM, Gleb Natapov wrote:
> > On Thu, May 23, 2013 at 02:13:06PM +0800, Xiao Guangrong wrote:
> >> On 05/23/2013 01:57 PM, Gleb Natapov wrote:
> >>> On Thu, May 23, 2013 at 03:55:58AM +0800, Xiao Guangrong wrote:
> >>>> It is only used to zap the obsolete page. Since the obsolete page
> >>>> will not be used, we need not spend time to find its unsync children
> >>>> out. Also, we delete the page from shadow page cache so that the page
> >>>> is completely isolated after call this function.
> >>>>
> >>>> The later patch will use it to collapse tlb flushes
> >>>>
> >>>> Signed-off-by: Xiao Guangrong <[email protected]>
> >>>> ---
> >>>> arch/x86/kvm/mmu.c | 46 +++++++++++++++++++++++++++++++++++++++++-----
> >>>> 1 files changed, 41 insertions(+), 5 deletions(-)
> >>>>
> >>>> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> >>>> index 9b57faa..e676356 100644
> >>>> --- a/arch/x86/kvm/mmu.c
> >>>> +++ b/arch/x86/kvm/mmu.c
> >>>> @@ -1466,7 +1466,7 @@ static inline void kvm_mod_used_mmu_pages(struct kvm *kvm, int nr)
> >>>> static void kvm_mmu_free_page(struct kvm_mmu_page *sp)
> >>>> {
> >>>> ASSERT(is_empty_shadow_page(sp->spt));
> >>>> - hlist_del(&sp->hash_link);
> >>>> + hlist_del_init(&sp->hash_link);
> >>> Why do you need hlist_del_init() here? Why not move it into
> >>
> >> Since the hlist will be double freed. We will it like this:
> >>
> >> kvm_mmu_prepare_zap_obsolete_page(page, list);
> >> kvm_mmu_commit_zap_page(list);
> >> kvm_mmu_free_page(page);
> >>
> >> The first place is kvm_mmu_prepare_zap_obsolete_page(page), which have
> >> deleted the hash list.
> >>
> >>> kvm_mmu_prepare_zap_page() like we discussed it here:
> >>> https://patchwork.kernel.org/patch/2580351/ instead of doing
> >>> it differently for obsolete and non obsolete pages?
> >>
> >> It is can break the hash-list walking: we should rescan the
> >> hash list once the page is prepared-ly zapped.
> >>
> >> I mentioned it in the changelog:
> >>
> >> 4): drop the patch which deleted page from hash list at the "prepare"
> >> time since it can break the walk based on hash list.
> > Can you elaborate on how this can happen?
>
> There is a example:
>
> int kvm_mmu_unprotect_page(struct kvm *kvm, gfn_t gfn)
> {
> struct kvm_mmu_page *sp;
> LIST_HEAD(invalid_list);
> int r;
>
> pgprintk("%s: looking for gfn %llx\n", __func__, gfn);
> r = 0;
> spin_lock(&kvm->mmu_lock);
> for_each_gfn_indirect_valid_sp(kvm, sp, gfn) {
> pgprintk("%s: gfn %llx role %x\n", __func__, gfn,
> sp->role.word);
> r = 1;
> kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list);
> }
> kvm_mmu_commit_zap_page(kvm, &invalid_list);
> spin_unlock(&kvm->mmu_lock);
>
> return r;
> }
>
> It works fine since kvm_mmu_prepare_zap_page does not touch the hash list.
> If we delete hlist in kvm_mmu_prepare_zap_page(), this kind of codes should
> be changed to:
>
> restart:
> for_each_gfn_indirect_valid_sp(kvm, sp, gfn) {
> pgprintk("%s: gfn %llx role %x\n", __func__, gfn,
> sp->role.word);
> r = 1;
> if (kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list))
> goto restart;
> }
> kvm_mmu_commit_zap_page(kvm, &invalid_list);
>
Hmm, yes. So let's leave it as is and either always commit invalid_list
before releasing the lock in kvm_zap_obsolete_pages(), or skip obsolete
pages while walking the hash table. The former is clearer, I think.

--
Gleb.

2013-05-23 07:37:29

by Xiao Guangrong

Subject: Re: [PATCH v7 10/11] KVM: MMU: collapse TLB flushes when zap all pages

On 05/23/2013 03:24 PM, Gleb Natapov wrote:
> On Thu, May 23, 2013 at 02:26:57PM +0800, Xiao Guangrong wrote:
>> On 05/23/2013 02:12 PM, Gleb Natapov wrote:
>>> On Thu, May 23, 2013 at 03:55:59AM +0800, Xiao Guangrong wrote:
>>>> kvm_zap_obsolete_pages uses lock-break technique to zap pages,
>>>> it will flush tlb every time when it does lock-break
>>>>
>>>> We can reload mmu on all vcpus after updating the generation
>>>> number so that the obsolete pages are not used on any vcpus,
>>>> after that we do not need to flush tlb when obsolete pages
>>>> are zapped
>>>>
>>>> Note: kvm_mmu_commit_zap_page is still needed before free
>>>> the pages since other vcpus may be doing locklessly shadow
>>>> page walking
>>>>
>>> Since obsolete pages are not accessible for lockless page walking after
>>> reload of all roots I do not understand why additional tlb flush is
>>
>> kvm_reload_remote_mmus() forces vcpus to leave guest mode, but if the
>> vcpu is not running on guest mode, it does nothing except set the request
>> bit. So, the vcpu can go lockless page walking after kvm_reload_remote_mmus()
>> return on other vcpu.
>>
>> Like this scenario:
>>
>> VCPU 0 VCPU 1
>> exit when it encounters #PF
>>
>> kvm_reload_remote_mmus(){
>> set vcpu1->request bit;
>>
>> do not send IPI due to
>> vcpu 1 not running on guest mode
>>
>> call page-fault handler then go lockless walking !!!
>> return
>> }
>>
>>
>>> needed. Also why tlb flush should prevent lockless-walking from using
>>> the page? Making page unreachable from root_hpa does that, no?
>>
>> lockless-walking disables the interrupt and makes the vcpu state as
>> READING_SHADOW_PAGE_TABLES, this state is treated as GUEST_MODE,
>> kvm_flush_remote_tlbs() should send IPI to this vcpu in this case.
>
> kvm_flush_remote_tlbs() uses the same make_all_cpus_request() as
> kvm_reload_remote_mmus() does, so why the same scenario you describe
> above cannot happen with kvm_flush_remote_tlbs()?


After calling kvm_flush_remote_tlbs(), the page still exists under
vcpu->root, so we cannot guarantee that it is not being used by another
vcpu.

But before calling kvm_mmu_commit_zap_page(), the page has already been
removed from the vcpu's page tables, so after kvm_flush_remote_tlbs() we
can ensure that other vcpus cannot find these pages.
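
(Roughly, kvm_mmu_commit_zap_page() does the following; again a paraphrase,
not the exact source:)

static void kvm_mmu_commit_zap_page(struct kvm *kvm,
                                    struct list_head *invalid_list)
{
        struct kvm_mmu_page *sp, *nsp;

        if (list_empty(invalid_list))
                return;

        /*
         * The flush waits for every vcpu that is in guest mode or in a
         * lockless walk (READING_SHADOW_PAGE_TABLES), so after it
         * returns no vcpu can still be touching these page tables.
         */
        kvm_flush_remote_tlbs(kvm);

        list_for_each_entry_safe(sp, nsp, invalid_list, link)
                kvm_mmu_free_page(sp);
}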

2013-05-23 07:39:03

by Xiao Guangrong

Subject: Re: [PATCH v7 10/11] KVM: MMU: collapse TLB flushes when zap all pages

On 05/23/2013 03:37 PM, Xiao Guangrong wrote:
> On 05/23/2013 03:24 PM, Gleb Natapov wrote:
>> On Thu, May 23, 2013 at 02:26:57PM +0800, Xiao Guangrong wrote:
>>> On 05/23/2013 02:12 PM, Gleb Natapov wrote:
>>>> On Thu, May 23, 2013 at 03:55:59AM +0800, Xiao Guangrong wrote:
>>>>> kvm_zap_obsolete_pages uses lock-break technique to zap pages,
>>>>> it will flush tlb every time when it does lock-break
>>>>>
>>>>> We can reload mmu on all vcpus after updating the generation
>>>>> number so that the obsolete pages are not used on any vcpus,
>>>>> after that we do not need to flush tlb when obsolete pages
>>>>> are zapped
>>>>>
>>>>> Note: kvm_mmu_commit_zap_page is still needed before free
>>>>> the pages since other vcpus may be doing locklessly shadow
>>>>> page walking
>>>>>
>>>> Since obsolete pages are not accessible for lockless page walking after
>>>> reload of all roots I do not understand why additional tlb flush is
>>>
>>> kvm_reload_remote_mmus() forces vcpus to leave guest mode, but if the
>>> vcpu is not running on guest mode, it does nothing except set the request
>>> bit. So, the vcpu can go lockless page walking after kvm_reload_remote_mmus()
>>> return on other vcpu.
>>>
>>> Like this scenario:
>>>
>>> VCPU 0 VCPU 1
>>> exit when it encounters #PF
>>>
>>> kvm_reload_remote_mmus(){
>>> set vcpu1->request bit;
>>>
>>> do not send IPI due to
>>> vcpu 1 not running on guest mode
>>>
>>> call page-fault handler then go lockless walking !!!
>>> return
>>> }
>>>
>>>
>>>> needed. Also why tlb flush should prevent lockless-walking from using
>>>> the page? Making page unreachable from root_hpa does that, no?
>>>
>>> lockless-walking disables the interrupt and makes the vcpu state as
>>> READING_SHADOW_PAGE_TABLES, this state is treated as GUEST_MODE,
>>> kvm_flush_remote_tlbs() should send IPI to this vcpu in this case.
>>
>> kvm_flush_remote_tlbs() uses the same make_all_cpus_request() as
>> kvm_reload_remote_mmus() does, so why the same scenario you describe
>> above cannot happen with kvm_flush_remote_tlbs()?
>
>
> After call kvm_flush_remote_tlbs(), the page still exists on vcpu->root,

Sorry, should be kvm_reload_remote_mmus() here.

> so we can not protect the page is being used by other vcpu.
>
> But before call kvm_mmu_commit_zap_page(), the page has been deleted from
> vcpu's page table, after call kvm_flush_remote_tlbs(), we can ensure that
> other vcpus can not find these pages.
>
>

2013-05-23 07:50:29

by Xiao Guangrong

Subject: Re: [PATCH v7 09/11] KVM: MMU: introduce kvm_mmu_prepare_zap_obsolete_page

On 05/23/2013 03:37 PM, Gleb Natapov wrote:
> On Thu, May 23, 2013 at 02:31:47PM +0800, Xiao Guangrong wrote:
>> On 05/23/2013 02:18 PM, Gleb Natapov wrote:
>>> On Thu, May 23, 2013 at 02:13:06PM +0800, Xiao Guangrong wrote:
>>>> On 05/23/2013 01:57 PM, Gleb Natapov wrote:
>>>>> On Thu, May 23, 2013 at 03:55:58AM +0800, Xiao Guangrong wrote:
>>>>>> It is only used to zap the obsolete page. Since the obsolete page
>>>>>> will not be used, we need not spend time to find its unsync children
>>>>>> out. Also, we delete the page from shadow page cache so that the page
>>>>>> is completely isolated after call this function.
>>>>>>
>>>>>> The later patch will use it to collapse tlb flushes
>>>>>>
>>>>>> Signed-off-by: Xiao Guangrong <[email protected]>
>>>>>> ---
>>>>>> arch/x86/kvm/mmu.c | 46 +++++++++++++++++++++++++++++++++++++++++-----
>>>>>> 1 files changed, 41 insertions(+), 5 deletions(-)
>>>>>>
>>>>>> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
>>>>>> index 9b57faa..e676356 100644
>>>>>> --- a/arch/x86/kvm/mmu.c
>>>>>> +++ b/arch/x86/kvm/mmu.c
>>>>>> @@ -1466,7 +1466,7 @@ static inline void kvm_mod_used_mmu_pages(struct kvm *kvm, int nr)
>>>>>> static void kvm_mmu_free_page(struct kvm_mmu_page *sp)
>>>>>> {
>>>>>> ASSERT(is_empty_shadow_page(sp->spt));
>>>>>> - hlist_del(&sp->hash_link);
>>>>>> + hlist_del_init(&sp->hash_link);
>>>>> Why do you need hlist_del_init() here? Why not move it into
>>>>
>>>> Since the hlist will be double freed. We will it like this:
>>>>
>>>> kvm_mmu_prepare_zap_obsolete_page(page, list);
>>>> kvm_mmu_commit_zap_page(list);
>>>> kvm_mmu_free_page(page);
>>>>
>>>> The first place is kvm_mmu_prepare_zap_obsolete_page(page), which have
>>>> deleted the hash list.
>>>>
>>>>> kvm_mmu_prepare_zap_page() like we discussed it here:
>>>>> https://patchwork.kernel.org/patch/2580351/ instead of doing
>>>>> it differently for obsolete and non obsolete pages?
>>>>
>>>> It is can break the hash-list walking: we should rescan the
>>>> hash list once the page is prepared-ly zapped.
>>>>
>>>> I mentioned it in the changelog:
>>>>
>>>> 4): drop the patch which deleted page from hash list at the "prepare"
>>>> time since it can break the walk based on hash list.
>>> Can you elaborate on how this can happen?
>>
>> There is a example:
>>
>> int kvm_mmu_unprotect_page(struct kvm *kvm, gfn_t gfn)
>> {
>> struct kvm_mmu_page *sp;
>> LIST_HEAD(invalid_list);
>> int r;
>>
>> pgprintk("%s: looking for gfn %llx\n", __func__, gfn);
>> r = 0;
>> spin_lock(&kvm->mmu_lock);
>> for_each_gfn_indirect_valid_sp(kvm, sp, gfn) {
>> pgprintk("%s: gfn %llx role %x\n", __func__, gfn,
>> sp->role.word);
>> r = 1;
>> kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list);
>> }
>> kvm_mmu_commit_zap_page(kvm, &invalid_list);
>> spin_unlock(&kvm->mmu_lock);
>>
>> return r;
>> }
>>
>> It works fine since kvm_mmu_prepare_zap_page does not touch the hash list.
>> If we delete hlist in kvm_mmu_prepare_zap_page(), this kind of codes should
>> be changed to:
>>
>> restart:
>> for_each_gfn_indirect_valid_sp(kvm, sp, gfn) {
>> pgprintk("%s: gfn %llx role %x\n", __func__, gfn,
>> sp->role.word);
>> r = 1;
>> if (kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list))
>> goto restart;
>> }
>> kvm_mmu_commit_zap_page(kvm, &invalid_list);
>>
> Hmm, yes. So lets leave it as is and always commit invalid_list before

So, you mean we should drop this patch and the patch
"KVM: MMU: collapse TLB flushes when zap all pages"?

But this patch introduces very little new code; most of it reuses
the code of __kvm_mmu_prepare_zap_page...

Furthermore, though it is maybe not related to this patch, I do not think
calling mmu_zap_unsync_children() in kvm_mmu_prepare_zap_page() is necessary,
but I need to test that very carefully. Why not keep
kvm_mmu_prepare_zap_obsolete_page as a first step? :(

> releasing lock in kvm_zap_obsolete_pages() or skip obsolete pages while
> walking hash table. Former is clearer I think.
>
> --
> Gleb.
>
>
>

2013-05-23 07:56:29

by Gleb Natapov

[permalink] [raw]
Subject: Re: [PATCH v7 10/11] KVM: MMU: collapse TLB flushes when zap all pages

On Thu, May 23, 2013 at 03:38:49PM +0800, Xiao Guangrong wrote:
> On 05/23/2013 03:37 PM, Xiao Guangrong wrote:
> > On 05/23/2013 03:24 PM, Gleb Natapov wrote:
> >> On Thu, May 23, 2013 at 02:26:57PM +0800, Xiao Guangrong wrote:
> >>> On 05/23/2013 02:12 PM, Gleb Natapov wrote:
> >>>> On Thu, May 23, 2013 at 03:55:59AM +0800, Xiao Guangrong wrote:
> >>>>> kvm_zap_obsolete_pages uses lock-break technique to zap pages,
> >>>>> it will flush tlb every time when it does lock-break
> >>>>>
> >>>>> We can reload mmu on all vcpus after updating the generation
> >>>>> number so that the obsolete pages are not used on any vcpus,
> >>>>> after that we do not need to flush tlb when obsolete pages
> >>>>> are zapped
> >>>>>
> >>>>> Note: kvm_mmu_commit_zap_page is still needed before free
> >>>>> the pages since other vcpus may be doing locklessly shadow
> >>>>> page walking
> >>>>>
> >>>> Since obsolete pages are not accessible for lockless page walking after
> >>>> reload of all roots I do not understand why additional tlb flush is
> >>>
> >>> kvm_reload_remote_mmus() forces vcpus to leave guest mode, but if the
> >>> vcpu is not running on guest mode, it does nothing except set the request
> >>> bit. So, the vcpu can go lockless page walking after kvm_reload_remote_mmus()
> >>> return on other vcpu.
> >>>
> >>> Like this scenario:
> >>>
> >>> VCPU 0 VCPU 1
> >>> exit when it encounters #PF
> >>>
> >>> kvm_reload_remote_mmus(){
> >>> set vcpu1->request bit;
> >>>
> >>> do not send IPI due to
> >>> vcpu 1 not running on guest mode
> >>>
> >>> call page-fault handler then go lockless walking !!!
> >>> return
> >>> }
> >>>
> >>>
> >>>> needed. Also why tlb flush should prevent lockless-walking from using
> >>>> the page? Making page unreachable from root_hpa does that, no?
> >>>
> >>> lockless-walking disables the interrupt and makes the vcpu state as
> >>> READING_SHADOW_PAGE_TABLES, this state is treated as GUEST_MODE,
> >>> kvm_flush_remote_tlbs() should send IPI to this vcpu in this case.
> >>
> >> kvm_flush_remote_tlbs() uses the same make_all_cpus_request() as
> >> kvm_reload_remote_mmus() does, so why the same scenario you describe
> >> above cannot happen with kvm_flush_remote_tlbs()?
> >
> >
> > After call kvm_flush_remote_tlbs(), the page still exists on vcpu->root,
>
> Sorry, should be kvm_reload_remote_mmus() here.
>
> > so we can not protect the page is being used by other vcpu.
> >
> > But before call kvm_mmu_commit_zap_page(), the page has been deleted from
> > vcpu's page table, after call kvm_flush_remote_tlbs(), we can ensure that
> > other vcpus can not find these pages.
> >
Ah, I see, so the barrier is needed after the page is unlinked from the
vcpu->root hierarchy.
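
To make that ordering concrete, a minimal sketch of the commit path under
discussion (an illustration of the ordering only, not the exact upstream code;
the helper name commit_zap_sketch is invented here):

/* Sketch: why the TLB flush is the barrier for lockless walkers. */
static void commit_zap_sketch(struct kvm *kvm, struct list_head *invalid_list)
{
	struct kvm_mmu_page *sp, *nsp;

	if (list_empty(invalid_list))
		return;

	/*
	 * Every page on invalid_list was already unlinked from its parent
	 * sptes by the "prepare" step, so no walker can newly reach it
	 * through vcpu->root after this point.
	 */

	/*
	 * kvm_flush_remote_tlbs() IPIs vcpus that are in guest mode or in
	 * READING_SHADOW_PAGE_TABLES, so any lockless walk that started
	 * before the unlink has finished once this returns.
	 */
	kvm_flush_remote_tlbs(kvm);

	/* Only now is it safe to actually free the pages. */
	list_for_each_entry_safe(sp, nsp, invalid_list, link)
		kvm_mmu_free_page(sp);
}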

--
Gleb.

2013-05-23 08:09:31

by Gleb Natapov

[permalink] [raw]
Subject: Re: [PATCH v7 09/11] KVM: MMU: introduce kvm_mmu_prepare_zap_obsolete_page

On Thu, May 23, 2013 at 03:50:16PM +0800, Xiao Guangrong wrote:
> On 05/23/2013 03:37 PM, Gleb Natapov wrote:
> > On Thu, May 23, 2013 at 02:31:47PM +0800, Xiao Guangrong wrote:
> >> On 05/23/2013 02:18 PM, Gleb Natapov wrote:
> >>> On Thu, May 23, 2013 at 02:13:06PM +0800, Xiao Guangrong wrote:
> >>>> On 05/23/2013 01:57 PM, Gleb Natapov wrote:
> >>>>> On Thu, May 23, 2013 at 03:55:58AM +0800, Xiao Guangrong wrote:
> >>>>>> It is only used to zap the obsolete page. Since the obsolete page
> >>>>>> will not be used, we need not spend time to find its unsync children
> >>>>>> out. Also, we delete the page from shadow page cache so that the page
> >>>>>> is completely isolated after call this function.
> >>>>>>
> >>>>>> The later patch will use it to collapse tlb flushes
> >>>>>>
> >>>>>> Signed-off-by: Xiao Guangrong <[email protected]>
> >>>>>> ---
> >>>>>> arch/x86/kvm/mmu.c | 46 +++++++++++++++++++++++++++++++++++++++++-----
> >>>>>> 1 files changed, 41 insertions(+), 5 deletions(-)
> >>>>>>
> >>>>>> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> >>>>>> index 9b57faa..e676356 100644
> >>>>>> --- a/arch/x86/kvm/mmu.c
> >>>>>> +++ b/arch/x86/kvm/mmu.c
> >>>>>> @@ -1466,7 +1466,7 @@ static inline void kvm_mod_used_mmu_pages(struct kvm *kvm, int nr)
> >>>>>> static void kvm_mmu_free_page(struct kvm_mmu_page *sp)
> >>>>>> {
> >>>>>> ASSERT(is_empty_shadow_page(sp->spt));
> >>>>>> - hlist_del(&sp->hash_link);
> >>>>>> + hlist_del_init(&sp->hash_link);
> >>>>> Why do you need hlist_del_init() here? Why not move it into
> >>>>
> >>>> Since the hlist will be double freed. We will it like this:
> >>>>
> >>>> kvm_mmu_prepare_zap_obsolete_page(page, list);
> >>>> kvm_mmu_commit_zap_page(list);
> >>>> kvm_mmu_free_page(page);
> >>>>
> >>>> The first place is kvm_mmu_prepare_zap_obsolete_page(page), which have
> >>>> deleted the hash list.
> >>>>
> >>>>> kvm_mmu_prepare_zap_page() like we discussed it here:
> >>>>> https://patchwork.kernel.org/patch/2580351/ instead of doing
> >>>>> it differently for obsolete and non obsolete pages?
> >>>>
> >>>> It is can break the hash-list walking: we should rescan the
> >>>> hash list once the page is prepared-ly zapped.
> >>>>
> >>>> I mentioned it in the changelog:
> >>>>
> >>>> 4): drop the patch which deleted page from hash list at the "prepare"
> >>>> time since it can break the walk based on hash list.
> >>> Can you elaborate on how this can happen?
> >>
> >> There is a example:
> >>
> >> int kvm_mmu_unprotect_page(struct kvm *kvm, gfn_t gfn)
> >> {
> >> struct kvm_mmu_page *sp;
> >> LIST_HEAD(invalid_list);
> >> int r;
> >>
> >> pgprintk("%s: looking for gfn %llx\n", __func__, gfn);
> >> r = 0;
> >> spin_lock(&kvm->mmu_lock);
> >> for_each_gfn_indirect_valid_sp(kvm, sp, gfn) {
> >> pgprintk("%s: gfn %llx role %x\n", __func__, gfn,
> >> sp->role.word);
> >> r = 1;
> >> kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list);
> >> }
> >> kvm_mmu_commit_zap_page(kvm, &invalid_list);
> >> spin_unlock(&kvm->mmu_lock);
> >>
> >> return r;
> >> }
> >>
> >> It works fine since kvm_mmu_prepare_zap_page does not touch the hash list.
> >> If we delete hlist in kvm_mmu_prepare_zap_page(), this kind of codes should
> >> be changed to:
> >>
> >> restart:
> >> for_each_gfn_indirect_valid_sp(kvm, sp, gfn) {
> >> pgprintk("%s: gfn %llx role %x\n", __func__, gfn,
> >> sp->role.word);
> >> r = 1;
> >> if (kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list))
> >> goto restart;
> >> }
> >> kvm_mmu_commit_zap_page(kvm, &invalid_list);
> >>
> > Hmm, yes. So lets leave it as is and always commit invalid_list before
>
> So, you mean drop this patch and the patch of
> KVM: MMU: collapse TLB flushes when zap all pages?
>
We still want to add kvm_reload_remote_mmus() to
kvm_mmu_invalidate_zap_all_pages(). But yes, we disable a nice
optimization here. So maybe skipping obsolete pages while walking
the hashtable is the better solution.

> But, we only introduced less code in this patch, most of them is reusing
> the code of __kvm_mmu_prepare_zap_page...
>
> Furthermore, maybe not related to this patch, i do not think calling
> mmu_zap_unsync_children() in kvm_mmu_prepare_zap_page() is necessary,
> but i need to test it very carefully. Why not let
> kvm_mmu_prepare_zap_obsolete_page for the first step? :(

Yes, I want Marcelo's opinion on skipping mmu_zap_unsync_children() first.

> > releasing lock in kvm_zap_obsolete_pages() or skip obsolete pages while
> > walking hash table. Former is clearer I think.
> >
> > --
> > Gleb.
> >
> >
> >

--
Gleb.

2013-05-23 08:33:56

by Xiao Guangrong

[permalink] [raw]
Subject: Re: [PATCH v7 09/11] KVM: MMU: introduce kvm_mmu_prepare_zap_obsolete_page

On 05/23/2013 04:09 PM, Gleb Natapov wrote:
> On Thu, May 23, 2013 at 03:50:16PM +0800, Xiao Guangrong wrote:
>> On 05/23/2013 03:37 PM, Gleb Natapov wrote:
>>> On Thu, May 23, 2013 at 02:31:47PM +0800, Xiao Guangrong wrote:
>>>> On 05/23/2013 02:18 PM, Gleb Natapov wrote:
>>>>> On Thu, May 23, 2013 at 02:13:06PM +0800, Xiao Guangrong wrote:
>>>>>> On 05/23/2013 01:57 PM, Gleb Natapov wrote:
>>>>>>> On Thu, May 23, 2013 at 03:55:58AM +0800, Xiao Guangrong wrote:
>>>>>>>> It is only used to zap the obsolete page. Since the obsolete page
>>>>>>>> will not be used, we need not spend time to find its unsync children
>>>>>>>> out. Also, we delete the page from shadow page cache so that the page
>>>>>>>> is completely isolated after call this function.
>>>>>>>>
>>>>>>>> The later patch will use it to collapse tlb flushes
>>>>>>>>
>>>>>>>> Signed-off-by: Xiao Guangrong <[email protected]>
>>>>>>>> ---
>>>>>>>> arch/x86/kvm/mmu.c | 46 +++++++++++++++++++++++++++++++++++++++++-----
>>>>>>>> 1 files changed, 41 insertions(+), 5 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
>>>>>>>> index 9b57faa..e676356 100644
>>>>>>>> --- a/arch/x86/kvm/mmu.c
>>>>>>>> +++ b/arch/x86/kvm/mmu.c
>>>>>>>> @@ -1466,7 +1466,7 @@ static inline void kvm_mod_used_mmu_pages(struct kvm *kvm, int nr)
>>>>>>>> static void kvm_mmu_free_page(struct kvm_mmu_page *sp)
>>>>>>>> {
>>>>>>>> ASSERT(is_empty_shadow_page(sp->spt));
>>>>>>>> - hlist_del(&sp->hash_link);
>>>>>>>> + hlist_del_init(&sp->hash_link);
>>>>>>> Why do you need hlist_del_init() here? Why not move it into
>>>>>>
>>>>>> Since the hlist will be double freed. We will it like this:
>>>>>>
>>>>>> kvm_mmu_prepare_zap_obsolete_page(page, list);
>>>>>> kvm_mmu_commit_zap_page(list);
>>>>>> kvm_mmu_free_page(page);
>>>>>>
>>>>>> The first place is kvm_mmu_prepare_zap_obsolete_page(page), which have
>>>>>> deleted the hash list.
>>>>>>
>>>>>>> kvm_mmu_prepare_zap_page() like we discussed it here:
>>>>>>> https://patchwork.kernel.org/patch/2580351/ instead of doing
>>>>>>> it differently for obsolete and non obsolete pages?
>>>>>>
>>>>>> It is can break the hash-list walking: we should rescan the
>>>>>> hash list once the page is prepared-ly zapped.
>>>>>>
>>>>>> I mentioned it in the changelog:
>>>>>>
>>>>>> 4): drop the patch which deleted page from hash list at the "prepare"
>>>>>> time since it can break the walk based on hash list.
>>>>> Can you elaborate on how this can happen?
>>>>
>>>> There is a example:
>>>>
>>>> int kvm_mmu_unprotect_page(struct kvm *kvm, gfn_t gfn)
>>>> {
>>>> struct kvm_mmu_page *sp;
>>>> LIST_HEAD(invalid_list);
>>>> int r;
>>>>
>>>> pgprintk("%s: looking for gfn %llx\n", __func__, gfn);
>>>> r = 0;
>>>> spin_lock(&kvm->mmu_lock);
>>>> for_each_gfn_indirect_valid_sp(kvm, sp, gfn) {
>>>> pgprintk("%s: gfn %llx role %x\n", __func__, gfn,
>>>> sp->role.word);
>>>> r = 1;
>>>> kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list);
>>>> }
>>>> kvm_mmu_commit_zap_page(kvm, &invalid_list);
>>>> spin_unlock(&kvm->mmu_lock);
>>>>
>>>> return r;
>>>> }
>>>>
>>>> It works fine since kvm_mmu_prepare_zap_page does not touch the hash list.
>>>> If we delete hlist in kvm_mmu_prepare_zap_page(), this kind of codes should
>>>> be changed to:
>>>>
>>>> restart:
>>>> for_each_gfn_indirect_valid_sp(kvm, sp, gfn) {
>>>> pgprintk("%s: gfn %llx role %x\n", __func__, gfn,
>>>> sp->role.word);
>>>> r = 1;
>>>> if (kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list))
>>>> goto restart;
>>>> }
>>>> kvm_mmu_commit_zap_page(kvm, &invalid_list);
>>>>
>>> Hmm, yes. So lets leave it as is and always commit invalid_list before
>>
>> So, you mean drop this patch and the patch of
>> KVM: MMU: collapse TLB flushes when zap all pages?
>>
> We still want to add kvm_reload_remote_mmus() to
> kvm_mmu_invalidate_zap_all_pages(). But yes, we disable a nice
> optimization here. So may be skipping obsolete pages while walking
> hashtable is better solution.

Okay.

I will update this patch and the later one.

>
>> But, we only introduced less code in this patch, most of them is reusing
>> the code of __kvm_mmu_prepare_zap_page...
>>
>> Furthermore, maybe not related to this patch, i do not think calling
>> mmu_zap_unsync_children() in kvm_mmu_prepare_zap_page() is necessary,
>> but i need to test it very carefully. Why not let
>> kvm_mmu_prepare_zap_obsolete_page for the first step? :(
>
> Yes, I want Marcelo opinion on skipping mmu_zap_unsync_children() first.

Okay. Thank you!

2013-05-23 11:14:11

by Xiao Guangrong

[permalink] [raw]
Subject: Re: [PATCH v7 09/11] KVM: MMU: introduce kvm_mmu_prepare_zap_obsolete_page

On 05/23/2013 04:09 PM, Gleb Natapov wrote:
> On Thu, May 23, 2013 at 03:50:16PM +0800, Xiao Guangrong wrote:
>> On 05/23/2013 03:37 PM, Gleb Natapov wrote:
>>> On Thu, May 23, 2013 at 02:31:47PM +0800, Xiao Guangrong wrote:
>>>> On 05/23/2013 02:18 PM, Gleb Natapov wrote:
>>>>> On Thu, May 23, 2013 at 02:13:06PM +0800, Xiao Guangrong wrote:
>>>>>> On 05/23/2013 01:57 PM, Gleb Natapov wrote:
>>>>>>> On Thu, May 23, 2013 at 03:55:58AM +0800, Xiao Guangrong wrote:
>>>>>>>> It is only used to zap the obsolete page. Since the obsolete page
>>>>>>>> will not be used, we need not spend time to find its unsync children
>>>>>>>> out. Also, we delete the page from shadow page cache so that the page
>>>>>>>> is completely isolated after call this function.
>>>>>>>>
>>>>>>>> The later patch will use it to collapse tlb flushes
>>>>>>>>
>>>>>>>> Signed-off-by: Xiao Guangrong <[email protected]>
>>>>>>>> ---
>>>>>>>> arch/x86/kvm/mmu.c | 46 +++++++++++++++++++++++++++++++++++++++++-----
>>>>>>>> 1 files changed, 41 insertions(+), 5 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
>>>>>>>> index 9b57faa..e676356 100644
>>>>>>>> --- a/arch/x86/kvm/mmu.c
>>>>>>>> +++ b/arch/x86/kvm/mmu.c
>>>>>>>> @@ -1466,7 +1466,7 @@ static inline void kvm_mod_used_mmu_pages(struct kvm *kvm, int nr)
>>>>>>>> static void kvm_mmu_free_page(struct kvm_mmu_page *sp)
>>>>>>>> {
>>>>>>>> ASSERT(is_empty_shadow_page(sp->spt));
>>>>>>>> - hlist_del(&sp->hash_link);
>>>>>>>> + hlist_del_init(&sp->hash_link);
>>>>>>> Why do you need hlist_del_init() here? Why not move it into
>>>>>>
>>>>>> Since the hlist will be double freed. We will it like this:
>>>>>>
>>>>>> kvm_mmu_prepare_zap_obsolete_page(page, list);
>>>>>> kvm_mmu_commit_zap_page(list);
>>>>>> kvm_mmu_free_page(page);
>>>>>>
>>>>>> The first place is kvm_mmu_prepare_zap_obsolete_page(page), which have
>>>>>> deleted the hash list.
>>>>>>
>>>>>>> kvm_mmu_prepare_zap_page() like we discussed it here:
>>>>>>> https://patchwork.kernel.org/patch/2580351/ instead of doing
>>>>>>> it differently for obsolete and non obsolete pages?
>>>>>>
>>>>>> It is can break the hash-list walking: we should rescan the
>>>>>> hash list once the page is prepared-ly zapped.
>>>>>>
>>>>>> I mentioned it in the changelog:
>>>>>>
>>>>>> 4): drop the patch which deleted page from hash list at the "prepare"
>>>>>> time since it can break the walk based on hash list.
>>>>> Can you elaborate on how this can happen?
>>>>
>>>> There is a example:
>>>>
>>>> int kvm_mmu_unprotect_page(struct kvm *kvm, gfn_t gfn)
>>>> {
>>>> struct kvm_mmu_page *sp;
>>>> LIST_HEAD(invalid_list);
>>>> int r;
>>>>
>>>> pgprintk("%s: looking for gfn %llx\n", __func__, gfn);
>>>> r = 0;
>>>> spin_lock(&kvm->mmu_lock);
>>>> for_each_gfn_indirect_valid_sp(kvm, sp, gfn) {
>>>> pgprintk("%s: gfn %llx role %x\n", __func__, gfn,
>>>> sp->role.word);
>>>> r = 1;
>>>> kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list);
>>>> }
>>>> kvm_mmu_commit_zap_page(kvm, &invalid_list);
>>>> spin_unlock(&kvm->mmu_lock);
>>>>
>>>> return r;
>>>> }
>>>>
>>>> It works fine since kvm_mmu_prepare_zap_page does not touch the hash list.
>>>> If we delete hlist in kvm_mmu_prepare_zap_page(), this kind of codes should
>>>> be changed to:
>>>>
>>>> restart:
>>>> for_each_gfn_indirect_valid_sp(kvm, sp, gfn) {
>>>> pgprintk("%s: gfn %llx role %x\n", __func__, gfn,
>>>> sp->role.word);
>>>> r = 1;
>>>> if (kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list))
>>>> goto restart;
>>>> }
>>>> kvm_mmu_commit_zap_page(kvm, &invalid_list);
>>>>
>>> Hmm, yes. So lets leave it as is and always commit invalid_list before
>>
>> So, you mean drop this patch and the patch of
>> KVM: MMU: collapse TLB flushes when zap all pages?
>>
> We still want to add kvm_reload_remote_mmus() to
> kvm_mmu_invalidate_zap_all_pages(). But yes, we disable a nice
> optimization here. So may be skipping obsolete pages while walking
> hashtable is better solution.

I am willing to go this way instead, but it looks worse than this
patch:

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 9b57faa..810410c 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1466,7 +1466,7 @@ static inline void kvm_mod_used_mmu_pages(struct kvm *kvm, int nr)
static void kvm_mmu_free_page(struct kvm_mmu_page *sp)
{
ASSERT(is_empty_shadow_page(sp->spt));
- hlist_del(&sp->hash_link);
+ hlist_del_init(&sp->hash_link);
list_del(&sp->link);
free_page((unsigned long)sp->spt);
if (!sp->role.direct)
@@ -1648,14 +1648,20 @@ static int kvm_mmu_prepare_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp,
static void kvm_mmu_commit_zap_page(struct kvm *kvm,
struct list_head *invalid_list);

+static bool is_obsolete_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
+{
+ return unlikely(sp->mmu_valid_gen != kvm->arch.mmu_valid_gen);
+}
+
#define for_each_gfn_sp(_kvm, _sp, _gfn) \
hlist_for_each_entry(_sp, \
&(_kvm)->arch.mmu_page_hash[kvm_page_table_hashfn(_gfn)], hash_link) \
- if ((_sp)->gfn != (_gfn)) {} else
+ if ((_sp)->gfn != (_gfn) || is_obsolete_sp(_kvm, _sp)) {} else

#define for_each_gfn_indirect_valid_sp(_kvm, _sp, _gfn) \
for_each_gfn_sp(_kvm, _sp, _gfn) \
- if ((_sp)->role.direct || (_sp)->role.invalid) {} else
+ if ((_sp)->role.direct || \
+ (_sp)->role.invalid || is_obsolete_sp(_kvm, _sp)) {} else

/* @sp->gfn should be write-protected at the call site */
static int __kvm_sync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
@@ -1838,11 +1844,6 @@ static void clear_sp_write_flooding_count(u64 *spte)
__clear_sp_write_flooding_count(sp);
}

-static bool is_obsolete_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
-{
- return unlikely(sp->mmu_valid_gen != kvm->arch.mmu_valid_gen);
-}
-
static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
gfn_t gfn,
gva_t gaddr,
@@ -2085,11 +2086,15 @@ static int kvm_mmu_prepare_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp,

if (sp->unsync)
kvm_unlink_unsync_page(kvm, sp);
+
if (!sp->root_count) {
/* Count self */
ret++;
list_move(&sp->link, invalid_list);
kvm_mod_used_mmu_pages(kvm, -1);
+
+ if (unlikely(is_obsolete_sp(kvm, sp)))
+ hlist_del_init(&sp->hash_link);
} else {
list_move(&sp->link, &kvm->arch.active_mmu_pages);
kvm_reload_remote_mmus(kvm);

isn't it?

2013-05-23 12:40:17

by Gleb Natapov

[permalink] [raw]
Subject: Re: [PATCH v7 09/11] KVM: MMU: introduce kvm_mmu_prepare_zap_obsolete_page

On Thu, May 23, 2013 at 07:13:58PM +0800, Xiao Guangrong wrote:
> On 05/23/2013 04:09 PM, Gleb Natapov wrote:
> > On Thu, May 23, 2013 at 03:50:16PM +0800, Xiao Guangrong wrote:
> >> On 05/23/2013 03:37 PM, Gleb Natapov wrote:
> >>> On Thu, May 23, 2013 at 02:31:47PM +0800, Xiao Guangrong wrote:
> >>>> On 05/23/2013 02:18 PM, Gleb Natapov wrote:
> >>>>> On Thu, May 23, 2013 at 02:13:06PM +0800, Xiao Guangrong wrote:
> >>>>>> On 05/23/2013 01:57 PM, Gleb Natapov wrote:
> >>>>>>> On Thu, May 23, 2013 at 03:55:58AM +0800, Xiao Guangrong wrote:
> >>>>>>>> It is only used to zap the obsolete page. Since the obsolete page
> >>>>>>>> will not be used, we need not spend time to find its unsync children
> >>>>>>>> out. Also, we delete the page from shadow page cache so that the page
> >>>>>>>> is completely isolated after call this function.
> >>>>>>>>
> >>>>>>>> The later patch will use it to collapse tlb flushes
> >>>>>>>>
> >>>>>>>> Signed-off-by: Xiao Guangrong <[email protected]>
> >>>>>>>> ---
> >>>>>>>> arch/x86/kvm/mmu.c | 46 +++++++++++++++++++++++++++++++++++++++++-----
> >>>>>>>> 1 files changed, 41 insertions(+), 5 deletions(-)
> >>>>>>>>
> >>>>>>>> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> >>>>>>>> index 9b57faa..e676356 100644
> >>>>>>>> --- a/arch/x86/kvm/mmu.c
> >>>>>>>> +++ b/arch/x86/kvm/mmu.c
> >>>>>>>> @@ -1466,7 +1466,7 @@ static inline void kvm_mod_used_mmu_pages(struct kvm *kvm, int nr)
> >>>>>>>> static void kvm_mmu_free_page(struct kvm_mmu_page *sp)
> >>>>>>>> {
> >>>>>>>> ASSERT(is_empty_shadow_page(sp->spt));
> >>>>>>>> - hlist_del(&sp->hash_link);
> >>>>>>>> + hlist_del_init(&sp->hash_link);
> >>>>>>> Why do you need hlist_del_init() here? Why not move it into
> >>>>>>
> >>>>>> Since the hlist will be double freed. We will it like this:
> >>>>>>
> >>>>>> kvm_mmu_prepare_zap_obsolete_page(page, list);
> >>>>>> kvm_mmu_commit_zap_page(list);
> >>>>>> kvm_mmu_free_page(page);
> >>>>>>
> >>>>>> The first place is kvm_mmu_prepare_zap_obsolete_page(page), which have
> >>>>>> deleted the hash list.
> >>>>>>
> >>>>>>> kvm_mmu_prepare_zap_page() like we discussed it here:
> >>>>>>> https://patchwork.kernel.org/patch/2580351/ instead of doing
> >>>>>>> it differently for obsolete and non obsolete pages?
> >>>>>>
> >>>>>> It is can break the hash-list walking: we should rescan the
> >>>>>> hash list once the page is prepared-ly zapped.
> >>>>>>
> >>>>>> I mentioned it in the changelog:
> >>>>>>
> >>>>>> 4): drop the patch which deleted page from hash list at the "prepare"
> >>>>>> time since it can break the walk based on hash list.
> >>>>> Can you elaborate on how this can happen?
> >>>>
> >>>> There is a example:
> >>>>
> >>>> int kvm_mmu_unprotect_page(struct kvm *kvm, gfn_t gfn)
> >>>> {
> >>>> struct kvm_mmu_page *sp;
> >>>> LIST_HEAD(invalid_list);
> >>>> int r;
> >>>>
> >>>> pgprintk("%s: looking for gfn %llx\n", __func__, gfn);
> >>>> r = 0;
> >>>> spin_lock(&kvm->mmu_lock);
> >>>> for_each_gfn_indirect_valid_sp(kvm, sp, gfn) {
> >>>> pgprintk("%s: gfn %llx role %x\n", __func__, gfn,
> >>>> sp->role.word);
> >>>> r = 1;
> >>>> kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list);
> >>>> }
> >>>> kvm_mmu_commit_zap_page(kvm, &invalid_list);
> >>>> spin_unlock(&kvm->mmu_lock);
> >>>>
> >>>> return r;
> >>>> }
> >>>>
> >>>> It works fine since kvm_mmu_prepare_zap_page does not touch the hash list.
> >>>> If we delete hlist in kvm_mmu_prepare_zap_page(), this kind of codes should
> >>>> be changed to:
> >>>>
> >>>> restart:
> >>>> for_each_gfn_indirect_valid_sp(kvm, sp, gfn) {
> >>>> pgprintk("%s: gfn %llx role %x\n", __func__, gfn,
> >>>> sp->role.word);
> >>>> r = 1;
> >>>> if (kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list))
> >>>> goto restart;
> >>>> }
> >>>> kvm_mmu_commit_zap_page(kvm, &invalid_list);
> >>>>
> >>> Hmm, yes. So lets leave it as is and always commit invalid_list before
> >>
> >> So, you mean drop this patch and the patch of
> >> KVM: MMU: collapse TLB flushes when zap all pages?
> >>
> > We still want to add kvm_reload_remote_mmus() to
> > kvm_mmu_invalidate_zap_all_pages(). But yes, we disable a nice
> > optimization here. So may be skipping obsolete pages while walking
> > hashtable is better solution.
>
> I am willing to use this way instead, but it looks worse than this
> patch:
>
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 9b57faa..810410c 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -1466,7 +1466,7 @@ static inline void kvm_mod_used_mmu_pages(struct kvm *kvm, int nr)
> static void kvm_mmu_free_page(struct kvm_mmu_page *sp)
> {
> ASSERT(is_empty_shadow_page(sp->spt));
> - hlist_del(&sp->hash_link);
> + hlist_del_init(&sp->hash_link);
Why not drop this

> list_del(&sp->link);
> free_page((unsigned long)sp->spt);
> if (!sp->role.direct)
> @@ -1648,14 +1648,20 @@ static int kvm_mmu_prepare_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp,
> static void kvm_mmu_commit_zap_page(struct kvm *kvm,
> struct list_head *invalid_list);
>
> +static bool is_obsolete_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
> +{
> + return unlikely(sp->mmu_valid_gen != kvm->arch.mmu_valid_gen);
> +}
> +
> #define for_each_gfn_sp(_kvm, _sp, _gfn) \
> hlist_for_each_entry(_sp, \
> &(_kvm)->arch.mmu_page_hash[kvm_page_table_hashfn(_gfn)], hash_link) \
> - if ((_sp)->gfn != (_gfn)) {} else
> + if ((_sp)->gfn != (_gfn) || is_obsolete_sp(_kvm, _sp)) {} else
>
> #define for_each_gfn_indirect_valid_sp(_kvm, _sp, _gfn) \
> for_each_gfn_sp(_kvm, _sp, _gfn) \
> - if ((_sp)->role.direct || (_sp)->role.invalid) {} else
> + if ((_sp)->role.direct || \
> + (_sp)->role.invalid || is_obsolete_sp(_kvm, _sp)) {} else
>
> /* @sp->gfn should be write-protected at the call site */
> static int __kvm_sync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
> @@ -1838,11 +1844,6 @@ static void clear_sp_write_flooding_count(u64 *spte)
> __clear_sp_write_flooding_count(sp);
> }
>
> -static bool is_obsolete_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
> -{
> - return unlikely(sp->mmu_valid_gen != kvm->arch.mmu_valid_gen);
> -}
> -
> static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
> gfn_t gfn,
> gva_t gaddr,
> @@ -2085,11 +2086,15 @@ static int kvm_mmu_prepare_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp,
>
> if (sp->unsync)
> kvm_unlink_unsync_page(kvm, sp);
> +
> if (!sp->root_count) {
> /* Count self */
> ret++;
> list_move(&sp->link, invalid_list);
> kvm_mod_used_mmu_pages(kvm, -1);
> +
> + if (unlikely(is_obsolete_sp(kvm, sp)))
> + hlist_del_init(&sp->hash_link);
and this.

Since we check for obsolete pages while searching the hashtable, why delete
them here?

> } else {
> list_move(&sp->link, &kvm->arch.active_mmu_pages);
> kvm_reload_remote_mmus(kvm);
>
> isn't it?

--
Gleb.

2013-05-23 13:04:02

by Xiao Guangrong

[permalink] [raw]
Subject: Re: [PATCH v7 09/11] KVM: MMU: introduce kvm_mmu_prepare_zap_obsolete_page

On 05/23/2013 08:39 PM, Gleb Natapov wrote:
> On Thu, May 23, 2013 at 07:13:58PM +0800, Xiao Guangrong wrote:
>> On 05/23/2013 04:09 PM, Gleb Natapov wrote:
>>> On Thu, May 23, 2013 at 03:50:16PM +0800, Xiao Guangrong wrote:
>>>> On 05/23/2013 03:37 PM, Gleb Natapov wrote:
>>>>> On Thu, May 23, 2013 at 02:31:47PM +0800, Xiao Guangrong wrote:
>>>>>> On 05/23/2013 02:18 PM, Gleb Natapov wrote:
>>>>>>> On Thu, May 23, 2013 at 02:13:06PM +0800, Xiao Guangrong wrote:
>>>>>>>> On 05/23/2013 01:57 PM, Gleb Natapov wrote:
>>>>>>>>> On Thu, May 23, 2013 at 03:55:58AM +0800, Xiao Guangrong wrote:
>>>>>>>>>> It is only used to zap the obsolete page. Since the obsolete page
>>>>>>>>>> will not be used, we need not spend time to find its unsync children
>>>>>>>>>> out. Also, we delete the page from shadow page cache so that the page
>>>>>>>>>> is completely isolated after call this function.
>>>>>>>>>>
>>>>>>>>>> The later patch will use it to collapse tlb flushes
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Xiao Guangrong <[email protected]>
>>>>>>>>>> ---
>>>>>>>>>> arch/x86/kvm/mmu.c | 46 +++++++++++++++++++++++++++++++++++++++++-----
>>>>>>>>>> 1 files changed, 41 insertions(+), 5 deletions(-)
>>>>>>>>>>
>>>>>>>>>> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
>>>>>>>>>> index 9b57faa..e676356 100644
>>>>>>>>>> --- a/arch/x86/kvm/mmu.c
>>>>>>>>>> +++ b/arch/x86/kvm/mmu.c
>>>>>>>>>> @@ -1466,7 +1466,7 @@ static inline void kvm_mod_used_mmu_pages(struct kvm *kvm, int nr)
>>>>>>>>>> static void kvm_mmu_free_page(struct kvm_mmu_page *sp)
>>>>>>>>>> {
>>>>>>>>>> ASSERT(is_empty_shadow_page(sp->spt));
>>>>>>>>>> - hlist_del(&sp->hash_link);
>>>>>>>>>> + hlist_del_init(&sp->hash_link);
>>>>>>>>> Why do you need hlist_del_init() here? Why not move it into
>>>>>>>>
>>>>>>>> Since the hlist will be double freed. We will it like this:
>>>>>>>>
>>>>>>>> kvm_mmu_prepare_zap_obsolete_page(page, list);
>>>>>>>> kvm_mmu_commit_zap_page(list);
>>>>>>>> kvm_mmu_free_page(page);
>>>>>>>>
>>>>>>>> The first place is kvm_mmu_prepare_zap_obsolete_page(page), which have
>>>>>>>> deleted the hash list.
>>>>>>>>
>>>>>>>>> kvm_mmu_prepare_zap_page() like we discussed it here:
>>>>>>>>> https://patchwork.kernel.org/patch/2580351/ instead of doing
>>>>>>>>> it differently for obsolete and non obsolete pages?
>>>>>>>>
>>>>>>>> It is can break the hash-list walking: we should rescan the
>>>>>>>> hash list once the page is prepared-ly zapped.
>>>>>>>>
>>>>>>>> I mentioned it in the changelog:
>>>>>>>>
>>>>>>>> 4): drop the patch which deleted page from hash list at the "prepare"
>>>>>>>> time since it can break the walk based on hash list.
>>>>>>> Can you elaborate on how this can happen?
>>>>>>
>>>>>> There is a example:
>>>>>>
>>>>>> int kvm_mmu_unprotect_page(struct kvm *kvm, gfn_t gfn)
>>>>>> {
>>>>>> struct kvm_mmu_page *sp;
>>>>>> LIST_HEAD(invalid_list);
>>>>>> int r;
>>>>>>
>>>>>> pgprintk("%s: looking for gfn %llx\n", __func__, gfn);
>>>>>> r = 0;
>>>>>> spin_lock(&kvm->mmu_lock);
>>>>>> for_each_gfn_indirect_valid_sp(kvm, sp, gfn) {
>>>>>> pgprintk("%s: gfn %llx role %x\n", __func__, gfn,
>>>>>> sp->role.word);
>>>>>> r = 1;
>>>>>> kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list);
>>>>>> }
>>>>>> kvm_mmu_commit_zap_page(kvm, &invalid_list);
>>>>>> spin_unlock(&kvm->mmu_lock);
>>>>>>
>>>>>> return r;
>>>>>> }
>>>>>>
>>>>>> It works fine since kvm_mmu_prepare_zap_page does not touch the hash list.
>>>>>> If we delete hlist in kvm_mmu_prepare_zap_page(), this kind of codes should
>>>>>> be changed to:
>>>>>>
>>>>>> restart:
>>>>>> for_each_gfn_indirect_valid_sp(kvm, sp, gfn) {
>>>>>> pgprintk("%s: gfn %llx role %x\n", __func__, gfn,
>>>>>> sp->role.word);
>>>>>> r = 1;
>>>>>> if (kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list))
>>>>>> goto restart;
>>>>>> }
>>>>>> kvm_mmu_commit_zap_page(kvm, &invalid_list);
>>>>>>
>>>>> Hmm, yes. So lets leave it as is and always commit invalid_list before
>>>>
>>>> So, you mean drop this patch and the patch of
>>>> KVM: MMU: collapse TLB flushes when zap all pages?
>>>>
>>> We still want to add kvm_reload_remote_mmus() to
>>> kvm_mmu_invalidate_zap_all_pages(). But yes, we disable a nice
>>> optimization here. So may be skipping obsolete pages while walking
>>> hashtable is better solution.
>>
>> I am willing to use this way instead, but it looks worse than this
>> patch:
>>
>> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
>> index 9b57faa..810410c 100644
>> --- a/arch/x86/kvm/mmu.c
>> +++ b/arch/x86/kvm/mmu.c
>> @@ -1466,7 +1466,7 @@ static inline void kvm_mod_used_mmu_pages(struct kvm *kvm, int nr)
>> static void kvm_mmu_free_page(struct kvm_mmu_page *sp)
>> {
>> ASSERT(is_empty_shadow_page(sp->spt));
>> - hlist_del(&sp->hash_link);
>> + hlist_del_init(&sp->hash_link);
> Why not drop this
>
>> list_del(&sp->link);
>> free_page((unsigned long)sp->spt);
>> if (!sp->role.direct)
>> @@ -1648,14 +1648,20 @@ static int kvm_mmu_prepare_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp,
>> static void kvm_mmu_commit_zap_page(struct kvm *kvm,
>> struct list_head *invalid_list);
>>
>> +static bool is_obsolete_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
>> +{
>> + return unlikely(sp->mmu_valid_gen != kvm->arch.mmu_valid_gen);
>> +}
>> +
>> #define for_each_gfn_sp(_kvm, _sp, _gfn) \
>> hlist_for_each_entry(_sp, \
>> &(_kvm)->arch.mmu_page_hash[kvm_page_table_hashfn(_gfn)], hash_link) \
>> - if ((_sp)->gfn != (_gfn)) {} else
>> + if ((_sp)->gfn != (_gfn) || is_obsolete_sp(_kvm, _sp)) {} else
>>
>> #define for_each_gfn_indirect_valid_sp(_kvm, _sp, _gfn) \
>> for_each_gfn_sp(_kvm, _sp, _gfn) \
>> - if ((_sp)->role.direct || (_sp)->role.invalid) {} else
>> + if ((_sp)->role.direct || \
>> + (_sp)->role.invalid || is_obsolete_sp(_kvm, _sp)) {} else
>>
>> /* @sp->gfn should be write-protected at the call site */
>> static int __kvm_sync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
>> @@ -1838,11 +1844,6 @@ static void clear_sp_write_flooding_count(u64 *spte)
>> __clear_sp_write_flooding_count(sp);
>> }
>>
>> -static bool is_obsolete_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
>> -{
>> - return unlikely(sp->mmu_valid_gen != kvm->arch.mmu_valid_gen);
>> -}
>> -
>> static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
>> gfn_t gfn,
>> gva_t gaddr,
>> @@ -2085,11 +2086,15 @@ static int kvm_mmu_prepare_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp,
>>
>> if (sp->unsync)
>> kvm_unlink_unsync_page(kvm, sp);
>> +
>> if (!sp->root_count) {
>> /* Count self */
>> ret++;
>> list_move(&sp->link, invalid_list);
>> kvm_mod_used_mmu_pages(kvm, -1);
>> +
>> + if (unlikely(is_obsolete_sp(kvm, sp)))
>> + hlist_del_init(&sp->hash_link);
> and this.
>
> Since we check for obsolete while searching hashtable why delete it
> here?

In order to zap obsolete pages without a TLB flush, we should delete them from
the hash list at "prepare" time. Here, we only delete the obsolete pages, so
that the hashtable-walking functions, like kvm_mmu_unprotect_page(), can work
properly by skipping obsolete pages.

Also, kvm_mmu_prepare_zap_page() is recursive:
kvm_mmu_prepare_zap_page() -> mmu_zap_unsync_children() -> kvm_mmu_prepare_zap_page(),
so it seems to be the only place where this can be done. For example, code like
the following is not allowed in kvm_zap_obsolete_pages(), since it would miss
the pages zapped through the recursion:

if (kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list))
	hlist_del(&sp->hash_link);

Or did I miss your suggestion?

2013-05-23 15:58:09

by Gleb Natapov

[permalink] [raw]
Subject: Re: [PATCH v7 09/11] KVM: MMU: introduce kvm_mmu_prepare_zap_obsolete_page

On Thu, May 23, 2013 at 09:03:50PM +0800, Xiao Guangrong wrote:
> On 05/23/2013 08:39 PM, Gleb Natapov wrote:
> > On Thu, May 23, 2013 at 07:13:58PM +0800, Xiao Guangrong wrote:
> >> On 05/23/2013 04:09 PM, Gleb Natapov wrote:
> >>> On Thu, May 23, 2013 at 03:50:16PM +0800, Xiao Guangrong wrote:
> >>>> On 05/23/2013 03:37 PM, Gleb Natapov wrote:
> >>>>> On Thu, May 23, 2013 at 02:31:47PM +0800, Xiao Guangrong wrote:
> >>>>>> On 05/23/2013 02:18 PM, Gleb Natapov wrote:
> >>>>>>> On Thu, May 23, 2013 at 02:13:06PM +0800, Xiao Guangrong wrote:
> >>>>>>>> On 05/23/2013 01:57 PM, Gleb Natapov wrote:
> >>>>>>>>> On Thu, May 23, 2013 at 03:55:58AM +0800, Xiao Guangrong wrote:
> >>>>>>>>>> It is only used to zap the obsolete page. Since the obsolete page
> >>>>>>>>>> will not be used, we need not spend time to find its unsync children
> >>>>>>>>>> out. Also, we delete the page from shadow page cache so that the page
> >>>>>>>>>> is completely isolated after call this function.
> >>>>>>>>>>
> >>>>>>>>>> The later patch will use it to collapse tlb flushes
> >>>>>>>>>>
> >>>>>>>>>> Signed-off-by: Xiao Guangrong <[email protected]>
> >>>>>>>>>> ---
> >>>>>>>>>> arch/x86/kvm/mmu.c | 46 +++++++++++++++++++++++++++++++++++++++++-----
> >>>>>>>>>> 1 files changed, 41 insertions(+), 5 deletions(-)
> >>>>>>>>>>
> >>>>>>>>>> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> >>>>>>>>>> index 9b57faa..e676356 100644
> >>>>>>>>>> --- a/arch/x86/kvm/mmu.c
> >>>>>>>>>> +++ b/arch/x86/kvm/mmu.c
> >>>>>>>>>> @@ -1466,7 +1466,7 @@ static inline void kvm_mod_used_mmu_pages(struct kvm *kvm, int nr)
> >>>>>>>>>> static void kvm_mmu_free_page(struct kvm_mmu_page *sp)
> >>>>>>>>>> {
> >>>>>>>>>> ASSERT(is_empty_shadow_page(sp->spt));
> >>>>>>>>>> - hlist_del(&sp->hash_link);
> >>>>>>>>>> + hlist_del_init(&sp->hash_link);
> >>>>>>>>> Why do you need hlist_del_init() here? Why not move it into
> >>>>>>>>
> >>>>>>>> Since the hlist will be double freed. We will it like this:
> >>>>>>>>
> >>>>>>>> kvm_mmu_prepare_zap_obsolete_page(page, list);
> >>>>>>>> kvm_mmu_commit_zap_page(list);
> >>>>>>>> kvm_mmu_free_page(page);
> >>>>>>>>
> >>>>>>>> The first place is kvm_mmu_prepare_zap_obsolete_page(page), which have
> >>>>>>>> deleted the hash list.
> >>>>>>>>
> >>>>>>>>> kvm_mmu_prepare_zap_page() like we discussed it here:
> >>>>>>>>> https://patchwork.kernel.org/patch/2580351/ instead of doing
> >>>>>>>>> it differently for obsolete and non obsolete pages?
> >>>>>>>>
> >>>>>>>> It is can break the hash-list walking: we should rescan the
> >>>>>>>> hash list once the page is prepared-ly zapped.
> >>>>>>>>
> >>>>>>>> I mentioned it in the changelog:
> >>>>>>>>
> >>>>>>>> 4): drop the patch which deleted page from hash list at the "prepare"
> >>>>>>>> time since it can break the walk based on hash list.
> >>>>>>> Can you elaborate on how this can happen?
> >>>>>>
> >>>>>> There is a example:
> >>>>>>
> >>>>>> int kvm_mmu_unprotect_page(struct kvm *kvm, gfn_t gfn)
> >>>>>> {
> >>>>>> struct kvm_mmu_page *sp;
> >>>>>> LIST_HEAD(invalid_list);
> >>>>>> int r;
> >>>>>>
> >>>>>> pgprintk("%s: looking for gfn %llx\n", __func__, gfn);
> >>>>>> r = 0;
> >>>>>> spin_lock(&kvm->mmu_lock);
> >>>>>> for_each_gfn_indirect_valid_sp(kvm, sp, gfn) {
> >>>>>> pgprintk("%s: gfn %llx role %x\n", __func__, gfn,
> >>>>>> sp->role.word);
> >>>>>> r = 1;
> >>>>>> kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list);
> >>>>>> }
> >>>>>> kvm_mmu_commit_zap_page(kvm, &invalid_list);
> >>>>>> spin_unlock(&kvm->mmu_lock);
> >>>>>>
> >>>>>> return r;
> >>>>>> }
> >>>>>>
> >>>>>> It works fine since kvm_mmu_prepare_zap_page does not touch the hash list.
> >>>>>> If we delete hlist in kvm_mmu_prepare_zap_page(), this kind of codes should
> >>>>>> be changed to:
> >>>>>>
> >>>>>> restart:
> >>>>>> for_each_gfn_indirect_valid_sp(kvm, sp, gfn) {
> >>>>>> pgprintk("%s: gfn %llx role %x\n", __func__, gfn,
> >>>>>> sp->role.word);
> >>>>>> r = 1;
> >>>>>> if (kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list))
> >>>>>> goto restart;
> >>>>>> }
> >>>>>> kvm_mmu_commit_zap_page(kvm, &invalid_list);
> >>>>>>
> >>>>> Hmm, yes. So lets leave it as is and always commit invalid_list before
> >>>>
> >>>> So, you mean drop this patch and the patch of
> >>>> KVM: MMU: collapse TLB flushes when zap all pages?
> >>>>
> >>> We still want to add kvm_reload_remote_mmus() to
> >>> kvm_mmu_invalidate_zap_all_pages(). But yes, we disable a nice
> >>> optimization here. So may be skipping obsolete pages while walking
> >>> hashtable is better solution.
> >>
> >> I am willing to use this way instead, but it looks worse than this
> >> patch:
> >>
> >> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> >> index 9b57faa..810410c 100644
> >> --- a/arch/x86/kvm/mmu.c
> >> +++ b/arch/x86/kvm/mmu.c
> >> @@ -1466,7 +1466,7 @@ static inline void kvm_mod_used_mmu_pages(struct kvm *kvm, int nr)
> >> static void kvm_mmu_free_page(struct kvm_mmu_page *sp)
> >> {
> >> ASSERT(is_empty_shadow_page(sp->spt));
> >> - hlist_del(&sp->hash_link);
> >> + hlist_del_init(&sp->hash_link);
> > Why not drop this
> >
> >> list_del(&sp->link);
> >> free_page((unsigned long)sp->spt);
> >> if (!sp->role.direct)
> >> @@ -1648,14 +1648,20 @@ static int kvm_mmu_prepare_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp,
> >> static void kvm_mmu_commit_zap_page(struct kvm *kvm,
> >> struct list_head *invalid_list);
> >>
> >> +static bool is_obsolete_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
> >> +{
> >> + return unlikely(sp->mmu_valid_gen != kvm->arch.mmu_valid_gen);
> >> +}
> >> +
> >> #define for_each_gfn_sp(_kvm, _sp, _gfn) \
> >> hlist_for_each_entry(_sp, \
> >> &(_kvm)->arch.mmu_page_hash[kvm_page_table_hashfn(_gfn)], hash_link) \
> >> - if ((_sp)->gfn != (_gfn)) {} else
> >> + if ((_sp)->gfn != (_gfn) || is_obsolete_sp(_kvm, _sp)) {} else
> >>
> >> #define for_each_gfn_indirect_valid_sp(_kvm, _sp, _gfn) \
> >> for_each_gfn_sp(_kvm, _sp, _gfn) \
> >> - if ((_sp)->role.direct || (_sp)->role.invalid) {} else
> >> + if ((_sp)->role.direct || \
> >> + (_sp)->role.invalid || is_obsolete_sp(_kvm, _sp)) {} else
> >>
> >> /* @sp->gfn should be write-protected at the call site */
> >> static int __kvm_sync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
> >> @@ -1838,11 +1844,6 @@ static void clear_sp_write_flooding_count(u64 *spte)
> >> __clear_sp_write_flooding_count(sp);
> >> }
> >>
> >> -static bool is_obsolete_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
> >> -{
> >> - return unlikely(sp->mmu_valid_gen != kvm->arch.mmu_valid_gen);
> >> -}
> >> -
> >> static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
> >> gfn_t gfn,
> >> gva_t gaddr,
> >> @@ -2085,11 +2086,15 @@ static int kvm_mmu_prepare_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp,
> >>
> >> if (sp->unsync)
> >> kvm_unlink_unsync_page(kvm, sp);
> >> +
> >> if (!sp->root_count) {
> >> /* Count self */
> >> ret++;
> >> list_move(&sp->link, invalid_list);
> >> kvm_mod_used_mmu_pages(kvm, -1);
> >> +
> >> + if (unlikely(is_obsolete_sp(kvm, sp)))
> >> + hlist_del_init(&sp->hash_link);
> > and this.
> >
> > Since we check for obsolete while searching hashtable why delete it
> > here?
>
> In order to zap obsolete pages without tlb flush, we should delete them from
> hash list at the "prepare" time. Here, we only delete the obsolete pages so
> that the hashtable walking functions, like kvm_mmu_unprotect_page(), can work
> properly by skipping obsolete page.
>
Why do we have to delete them from the hash at "prepare" time? If the hash walk
ignores them, they are as good as deleted, no?

> And, kvm_mmu_prepare_zap_page() is a recursion function:
> kvm_mmu_prepare_zap_page() -> zap_unsync_children -> kvm_mmu_prepare_zap_page().
> It seems it is the only place to do this thing. For example, below code is not
> allowed in kvm_zap_obsolete_pages():
>
> if (kvm_mmu_prepare_zap_page(sp, list))
> hlist_del(sp->hlist);
>
> Or, i missed your suggestion?
My assumption is that we can leave obsolete shadow pages on the hashtable
till commit_zap time.

BTW, is it such a good idea to call kvm_mmu_commit_zap_page() once on all
obsolete pages? We basically loop over all of them under the lock
without a lock break.

--
Gleb.

2013-05-24 05:39:46

by Xiao Guangrong

[permalink] [raw]
Subject: Re: [PATCH v7 09/11] KVM: MMU: introduce kvm_mmu_prepare_zap_obsolete_page

On 05/23/2013 11:57 PM, Gleb Natapov wrote:
> On Thu, May 23, 2013 at 09:03:50PM +0800, Xiao Guangrong wrote:
>> On 05/23/2013 08:39 PM, Gleb Natapov wrote:
>>> On Thu, May 23, 2013 at 07:13:58PM +0800, Xiao Guangrong wrote:
>>>> On 05/23/2013 04:09 PM, Gleb Natapov wrote:
>>>>> On Thu, May 23, 2013 at 03:50:16PM +0800, Xiao Guangrong wrote:
>>>>>> On 05/23/2013 03:37 PM, Gleb Natapov wrote:
>>>>>>> On Thu, May 23, 2013 at 02:31:47PM +0800, Xiao Guangrong wrote:
>>>>>>>> On 05/23/2013 02:18 PM, Gleb Natapov wrote:
>>>>>>>>> On Thu, May 23, 2013 at 02:13:06PM +0800, Xiao Guangrong wrote:
>>>>>>>>>> On 05/23/2013 01:57 PM, Gleb Natapov wrote:
>>>>>>>>>>> On Thu, May 23, 2013 at 03:55:58AM +0800, Xiao Guangrong wrote:
>>>>>>>>>>>> It is only used to zap the obsolete page. Since the obsolete page
>>>>>>>>>>>> will not be used, we need not spend time to find its unsync children
>>>>>>>>>>>> out. Also, we delete the page from shadow page cache so that the page
>>>>>>>>>>>> is completely isolated after call this function.
>>>>>>>>>>>>
>>>>>>>>>>>> The later patch will use it to collapse tlb flushes
>>>>>>>>>>>>
>>>>>>>>>>>> Signed-off-by: Xiao Guangrong <[email protected]>
>>>>>>>>>>>> ---
>>>>>>>>>>>> arch/x86/kvm/mmu.c | 46 +++++++++++++++++++++++++++++++++++++++++-----
>>>>>>>>>>>> 1 files changed, 41 insertions(+), 5 deletions(-)
>>>>>>>>>>>>
>>>>>>>>>>>> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
>>>>>>>>>>>> index 9b57faa..e676356 100644
>>>>>>>>>>>> --- a/arch/x86/kvm/mmu.c
>>>>>>>>>>>> +++ b/arch/x86/kvm/mmu.c
>>>>>>>>>>>> @@ -1466,7 +1466,7 @@ static inline void kvm_mod_used_mmu_pages(struct kvm *kvm, int nr)
>>>>>>>>>>>> static void kvm_mmu_free_page(struct kvm_mmu_page *sp)
>>>>>>>>>>>> {
>>>>>>>>>>>> ASSERT(is_empty_shadow_page(sp->spt));
>>>>>>>>>>>> - hlist_del(&sp->hash_link);
>>>>>>>>>>>> + hlist_del_init(&sp->hash_link);
>>>>>>>>>>> Why do you need hlist_del_init() here? Why not move it into
>>>>>>>>>>
>>>>>>>>>> Since the hlist will be double freed. We will it like this:
>>>>>>>>>>
>>>>>>>>>> kvm_mmu_prepare_zap_obsolete_page(page, list);
>>>>>>>>>> kvm_mmu_commit_zap_page(list);
>>>>>>>>>> kvm_mmu_free_page(page);
>>>>>>>>>>
>>>>>>>>>> The first place is kvm_mmu_prepare_zap_obsolete_page(page), which have
>>>>>>>>>> deleted the hash list.
>>>>>>>>>>
>>>>>>>>>>> kvm_mmu_prepare_zap_page() like we discussed it here:
>>>>>>>>>>> https://patchwork.kernel.org/patch/2580351/ instead of doing
>>>>>>>>>>> it differently for obsolete and non obsolete pages?
>>>>>>>>>>
>>>>>>>>>> It is can break the hash-list walking: we should rescan the
>>>>>>>>>> hash list once the page is prepared-ly zapped.
>>>>>>>>>>
>>>>>>>>>> I mentioned it in the changelog:
>>>>>>>>>>
>>>>>>>>>> 4): drop the patch which deleted page from hash list at the "prepare"
>>>>>>>>>> time since it can break the walk based on hash list.
>>>>>>>>> Can you elaborate on how this can happen?
>>>>>>>>
>>>>>>>> There is a example:
>>>>>>>>
>>>>>>>> int kvm_mmu_unprotect_page(struct kvm *kvm, gfn_t gfn)
>>>>>>>> {
>>>>>>>> struct kvm_mmu_page *sp;
>>>>>>>> LIST_HEAD(invalid_list);
>>>>>>>> int r;
>>>>>>>>
>>>>>>>> pgprintk("%s: looking for gfn %llx\n", __func__, gfn);
>>>>>>>> r = 0;
>>>>>>>> spin_lock(&kvm->mmu_lock);
>>>>>>>> for_each_gfn_indirect_valid_sp(kvm, sp, gfn) {
>>>>>>>> pgprintk("%s: gfn %llx role %x\n", __func__, gfn,
>>>>>>>> sp->role.word);
>>>>>>>> r = 1;
>>>>>>>> kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list);
>>>>>>>> }
>>>>>>>> kvm_mmu_commit_zap_page(kvm, &invalid_list);
>>>>>>>> spin_unlock(&kvm->mmu_lock);
>>>>>>>>
>>>>>>>> return r;
>>>>>>>> }
>>>>>>>>
>>>>>>>> It works fine since kvm_mmu_prepare_zap_page does not touch the hash list.
>>>>>>>> If we delete hlist in kvm_mmu_prepare_zap_page(), this kind of codes should
>>>>>>>> be changed to:
>>>>>>>>
>>>>>>>> restart:
>>>>>>>> for_each_gfn_indirect_valid_sp(kvm, sp, gfn) {
>>>>>>>> pgprintk("%s: gfn %llx role %x\n", __func__, gfn,
>>>>>>>> sp->role.word);
>>>>>>>> r = 1;
>>>>>>>> if (kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list))
>>>>>>>> goto restart;
>>>>>>>> }
>>>>>>>> kvm_mmu_commit_zap_page(kvm, &invalid_list);
>>>>>>>>
>>>>>>> Hmm, yes. So lets leave it as is and always commit invalid_list before
>>>>>>
>>>>>> So, you mean drop this patch and the patch of
>>>>>> KVM: MMU: collapse TLB flushes when zap all pages?
>>>>>>
>>>>> We still want to add kvm_reload_remote_mmus() to
>>>>> kvm_mmu_invalidate_zap_all_pages(). But yes, we disable a nice
>>>>> optimization here. So may be skipping obsolete pages while walking
>>>>> hashtable is better solution.
>>>>
>>>> I am willing to use this way instead, but it looks worse than this
>>>> patch:
>>>>
>>>> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
>>>> index 9b57faa..810410c 100644
>>>> --- a/arch/x86/kvm/mmu.c
>>>> +++ b/arch/x86/kvm/mmu.c
>>>> @@ -1466,7 +1466,7 @@ static inline void kvm_mod_used_mmu_pages(struct kvm *kvm, int nr)
>>>> static void kvm_mmu_free_page(struct kvm_mmu_page *sp)
>>>> {
>>>> ASSERT(is_empty_shadow_page(sp->spt));
>>>> - hlist_del(&sp->hash_link);
>>>> + hlist_del_init(&sp->hash_link);
>>> Why not drop this
>>>
>>>> list_del(&sp->link);
>>>> free_page((unsigned long)sp->spt);
>>>> if (!sp->role.direct)
>>>> @@ -1648,14 +1648,20 @@ static int kvm_mmu_prepare_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp,
>>>> static void kvm_mmu_commit_zap_page(struct kvm *kvm,
>>>> struct list_head *invalid_list);
>>>>
>>>> +static bool is_obsolete_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
>>>> +{
>>>> + return unlikely(sp->mmu_valid_gen != kvm->arch.mmu_valid_gen);
>>>> +}
>>>> +
>>>> #define for_each_gfn_sp(_kvm, _sp, _gfn) \
>>>> hlist_for_each_entry(_sp, \
>>>> &(_kvm)->arch.mmu_page_hash[kvm_page_table_hashfn(_gfn)], hash_link) \
>>>> - if ((_sp)->gfn != (_gfn)) {} else
>>>> + if ((_sp)->gfn != (_gfn) || is_obsolete_sp(_kvm, _sp)) {} else
>>>>
>>>> #define for_each_gfn_indirect_valid_sp(_kvm, _sp, _gfn) \
>>>> for_each_gfn_sp(_kvm, _sp, _gfn) \
>>>> - if ((_sp)->role.direct || (_sp)->role.invalid) {} else
>>>> + if ((_sp)->role.direct || \
>>>> + (_sp)->role.invalid || is_obsolete_sp(_kvm, _sp)) {} else
>>>>
>>>> /* @sp->gfn should be write-protected at the call site */
>>>> static int __kvm_sync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
>>>> @@ -1838,11 +1844,6 @@ static void clear_sp_write_flooding_count(u64 *spte)
>>>> __clear_sp_write_flooding_count(sp);
>>>> }
>>>>
>>>> -static bool is_obsolete_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
>>>> -{
>>>> - return unlikely(sp->mmu_valid_gen != kvm->arch.mmu_valid_gen);
>>>> -}
>>>> -
>>>> static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
>>>> gfn_t gfn,
>>>> gva_t gaddr,
>>>> @@ -2085,11 +2086,15 @@ static int kvm_mmu_prepare_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp,
>>>>
>>>> if (sp->unsync)
>>>> kvm_unlink_unsync_page(kvm, sp);
>>>> +
>>>> if (!sp->root_count) {
>>>> /* Count self */
>>>> ret++;
>>>> list_move(&sp->link, invalid_list);
>>>> kvm_mod_used_mmu_pages(kvm, -1);
>>>> +
>>>> + if (unlikely(is_obsolete_sp(kvm, sp)))
>>>> + hlist_del_init(&sp->hash_link);
>>> and this.
>>>
>>> Since we check for obsolete while searching hashtable why delete it
>>> here?
>>
>> In order to zap obsolete pages without tlb flush, we should delete them from
>> hash list at the "prepare" time. Here, we only delete the obsolete pages so
>> that the hashtable walking functions, like kvm_mmu_unprotect_page(), can work
>> properly by skipping obsolete page.
>>
> Why we have to delete them from the hash at "prepare" time? I hash walk
> ignores them they are as good as deleted, no?
>
>> And, kvm_mmu_prepare_zap_page() is a recursion function:
>> kvm_mmu_prepare_zap_page() -> zap_unsync_children -> kvm_mmu_prepare_zap_page().
>> It seems it is the only place to do this thing. For example, below code is not
>> allowed in kvm_zap_obsolete_pages():
>>
>> if (kvm_mmu_prepare_zap_page(sp, list))
>> hlist_del(sp->hlist);
>>
>> Or, i missed your suggestion?
> My assumption is that we can leave obsolete shadow pages on hashtable
> till commit_zap time.

Ah, I see.

Yes, I agree with your idea. I think we can only skip obsolete-and-invalid
pages, since an obsolete-but-unzapped page still affects the mmu's behaviour;
for example, it can keep a page write-protected, so kvm_mmu_unprotect_page()
cannot work correctly if it skips unzapped obsolete pages.
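
As a rough sketch of that idea, adapting the for_each_gfn_sp() diff posted
earlier in this thread so that only obsolete-and-invalid pages are skipped
(the sp_is_obsolete_and_invalid helper is invented here for illustration;
this is not the final patch):

/* Skip only pages that are both obsolete and already prepared-zapped. */
#define sp_is_obsolete_and_invalid(_kvm, _sp)				\
	((_sp)->role.invalid && is_obsolete_sp(_kvm, _sp))

#define for_each_gfn_sp(_kvm, _sp, _gfn)				\
	hlist_for_each_entry(_sp,					\
	  &(_kvm)->arch.mmu_page_hash[kvm_page_table_hashfn(_gfn)], hash_link) \
		if ((_sp)->gfn != (_gfn) ||				\
		    sp_is_obsolete_and_invalid(_kvm, _sp)) {} else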

>
> BTW is it such a good idea to call kvm_mmu_commit_zap_page() once on all

If other choices are available, we can try.

> obsolete pages? We basically loop over all of them under the lock
> without lock break.

It seems not. :)
Since we have reloaded the mmu before zapping the obsolete pages, the mmu-lock
is easily contended. I did some simple tracking:

+ int num = 0;
restart:
list_for_each_entry_safe_reverse(sp, node,
&kvm->arch.active_mmu_pages, link) {
@@ -4265,6 +4265,7 @@ restart:
if (batch >= BATCH_ZAP_PAGES &&
cond_resched_lock(&kvm->mmu_lock)) {
batch = 0;
+ num++;
goto restart;
}

@@ -4277,6 +4278,7 @@ restart:
* may use the pages.
*/
kvm_mmu_commit_zap_page(kvm, &invalid_list);
+ printk("lock-break: %d.\n", num);
}

I read the PCI ROM while doing a kernel build in the guest, which
has 1G memory and 4 vcpus with EPT enabled; this is a normal
workload and a normal configuration.

# dmesg
[ 2338.759099] lock-break: 8.
[ 2339.732442] lock-break: 5.
[ 2340.904446] lock-break: 3.
[ 2342.513514] lock-break: 3.
[ 2343.452229] lock-break: 3.
[ 2344.981599] lock-break: 4.

Basically, we need to break many times.

2013-05-24 05:53:56

by Xiao Guangrong

[permalink] [raw]
Subject: Re: [PATCH v7 09/11] KVM: MMU: introduce kvm_mmu_prepare_zap_obsolete_page

On 05/24/2013 01:39 PM, Xiao Guangrong wrote:

>>> if (kvm_mmu_prepare_zap_page(sp, list))
>>> hlist_del(sp->hlist);
>>>
>>> Or, i missed your suggestion?
>> My assumption is that we can leave obsolete shadow pages on hashtable
>> till commit_zap time.
>
> Ah, i see.
>
> Yes, i agree with your idea. I think we can only skip the obsolete-and-invalid
> page since the obsolete-but-unzapped page still affects the mmu's behaviour,
> for example, it can cause page write-protect, kvm_mmu_unprotect_page()
> can not work by skipping unzapped-obsolete pages.

kvm_mmu_unprotect_page() can still work: we can skip obsolete pages too when
detecting whether a page needs to be write-protected, and it is easier to make
the page writable when zapping obsolete pages.

I will update it following your idea, sorry for the noise.

2013-05-24 20:23:28

by Marcelo Tosatti

[permalink] [raw]
Subject: Re: [PATCH v7 03/11] KVM: MMU: fast invalidate all pages

Hi Xiao,

On Thu, May 23, 2013 at 03:55:52AM +0800, Xiao Guangrong wrote:
> The current kvm_mmu_zap_all is really slow - it is holding mmu-lock to
> walk and zap all shadow pages one by one, also it need to zap all guest
> page's rmap and all shadow page's parent spte list. Particularly, things
> become worse if guest uses more memory or vcpus. It is not good for
> scalability
>
> In this patch, we introduce a faster way to invalidate all shadow pages.
> KVM maintains a global mmu invalid generation-number which is stored in
> kvm->arch.mmu_valid_gen and every shadow page stores the current global
> generation-number into sp->mmu_valid_gen when it is created
>
> When KVM need zap all shadow pages sptes, it just simply increase the
> global generation-number then reload root shadow pages on all vcpus.
> Vcpu will create a new shadow page table according to current kvm's
> generation-number. It ensures the old pages are not used any more.
> Then the obsolete pages (sp->mmu_valid_gen != kvm->arch.mmu_valid_gen)
> are zapped by using lock-break technique
>
> Signed-off-by: Xiao Guangrong <[email protected]>
> ---
> arch/x86/include/asm/kvm_host.h | 2 +
> arch/x86/kvm/mmu.c | 84 +++++++++++++++++++++++++++++++++++++++
> arch/x86/kvm/mmu.h | 1 +
> 3 files changed, 87 insertions(+), 0 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 3741c65..bff7d46 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -222,6 +222,7 @@ struct kvm_mmu_page {
> int root_count; /* Currently serving as active root */
> unsigned int unsync_children;
> unsigned long parent_ptes; /* Reverse mapping for parent_pte */
> + unsigned long mmu_valid_gen;
> DECLARE_BITMAP(unsync_child_bitmap, 512);
>
> #ifdef CONFIG_X86_32
> @@ -529,6 +530,7 @@ struct kvm_arch {
> unsigned int n_requested_mmu_pages;
> unsigned int n_max_mmu_pages;
> unsigned int indirect_shadow_pages;
> + unsigned long mmu_valid_gen;
> struct hlist_head mmu_page_hash[KVM_NUM_MMU_PAGES];
> /*
> * Hash table of struct kvm_mmu_page.
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index f8ca2f3..f302540 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -1838,6 +1838,11 @@ static void clear_sp_write_flooding_count(u64 *spte)
> __clear_sp_write_flooding_count(sp);
> }
>
> +static bool is_obsolete_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
> +{
> + return unlikely(sp->mmu_valid_gen != kvm->arch.mmu_valid_gen);
> +}
> +
> static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
> gfn_t gfn,
> gva_t gaddr,
> @@ -1900,6 +1905,7 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
>
> account_shadowed(vcpu->kvm, gfn);
> }
> + sp->mmu_valid_gen = vcpu->kvm->arch.mmu_valid_gen;
> init_shadow_page_table(sp);
> trace_kvm_mmu_get_page(sp, true);
> return sp;
> @@ -2070,8 +2076,10 @@ static int kvm_mmu_prepare_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp,
> ret = mmu_zap_unsync_children(kvm, sp, invalid_list);
> kvm_mmu_page_unlink_children(kvm, sp);
> kvm_mmu_unlink_parents(kvm, sp);
> +
> if (!sp->role.invalid && !sp->role.direct)
> unaccount_shadowed(kvm, sp->gfn);
> +
> if (sp->unsync)
> kvm_unlink_unsync_page(kvm, sp);
> if (!sp->root_count) {
> @@ -4195,6 +4203,82 @@ restart:
> spin_unlock(&kvm->mmu_lock);
> }
>
> +static void kvm_zap_obsolete_pages(struct kvm *kvm)
> +{
> + struct kvm_mmu_page *sp, *node;
> + LIST_HEAD(invalid_list);
> +
> +restart:
> + list_for_each_entry_safe_reverse(sp, node,
> + &kvm->arch.active_mmu_pages, link) {
> + /*
> + * No obsolete page exists before new created page since
> + * active_mmu_pages is the FIFO list.
> + */
> + if (!is_obsolete_sp(kvm, sp))
> + break;

Can you add a comment to list_add(x, active_mmu_pages) callsites
mentioning this case?

Because it'll break silently if people do list_add_tail().
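
For reference, a minimal sketch of what such a comment could look like at the
kvm_mmu_alloc_page() call site (the exact location and wording are assumptions,
not part of the posted patch):

	/*
	 * active_mmu_pages must stay in FIFO order (new pages at the head)
	 * so that kvm_zap_obsolete_pages() can stop its reverse walk at the
	 * first non-obsolete page.  Do not convert this to list_add_tail()
	 * without updating that logic.
	 */
	list_add(&sp->link, &vcpu->kvm->arch.active_mmu_pages);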

> + /*
> + * Do not repeatedly zap a root page to avoid unnecessary
> + * KVM_REQ_MMU_RELOAD, otherwise we may not be able to
> + * progress:
> + * vcpu 0 vcpu 1
> + * call vcpu_enter_guest():
> + * 1): handle KVM_REQ_MMU_RELOAD
> + * and require mmu-lock to
> + * load mmu
> + * repeat:
> + * 1): zap root page and
> + * send KVM_REQ_MMU_RELOAD
> + *
> + * 2): if (cond_resched_lock(mmu-lock))
> + *
> + * 2): hold mmu-lock and load mmu
> + *
> + * 3): see KVM_REQ_MMU_RELOAD bit
> + * on vcpu->requests is set
> + * then return 1 to call
> + * vcpu_enter_guest() again.
> + * goto repeat;
> + *
> + * Since we are reversely walking the list and the invalid
> + * list will be moved to the head, skip the invalid page
> + * can help us to avoid the infinity list walking.
> + */
> + if (sp->role.invalid)
> + continue;

But this allows completing (that is, returning) with pages that should
be zapped still present (even though they are invalid).

Is another pass needed at the end to take care of the invalid pages,
which at that point must have had their root_count decreased?

> +
> + if (need_resched() || spin_needbreak(&kvm->mmu_lock)) {
> + kvm_mmu_commit_zap_page(kvm, &invalid_list);
> + cond_resched_lock(&kvm->mmu_lock);
> + goto restart;
> + }
> +
> + if (kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list))
> + goto restart;
> + }
> +
> + kvm_mmu_commit_zap_page(kvm, &invalid_list);
> +}
> +
> +/*
> + * Fast invalidate all shadow pages and use lock-break technique
> + * to zap obsolete pages.
> + *
> + * It's required when memslot is being deleted or VM is being
> + * destroyed, in these cases, we should ensure that KVM MMU does
> + * not use any resource of the being-deleted slot or all slots
> + * after calling the function.
> + */
> +void kvm_mmu_invalidate_zap_all_pages(struct kvm *kvm)
> +{
> + spin_lock(&kvm->mmu_lock);
> + kvm->arch.mmu_valid_gen++;
> +
> + kvm_zap_obsolete_pages(kvm);
> + spin_unlock(&kvm->mmu_lock);
> +}
> +

Also this function should be serialized, that is, it should not allow
simultaneous kvm_mmu_invalidate_zap_all_pages calls. If that's so,
assert(mutex_is_locked(kvm->lock)) would help.

Probably fine to have simultaneous users, but not necessary AFAICS.
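
A minimal sketch of the suggested assertion (the lock choice is an assumption;
which lock actually serializes callers is settled in the follow-up discussion
below):

void kvm_mmu_invalidate_zap_all_pages(struct kvm *kvm)
{
	/* sketch: callers are expected to be serialized */
	WARN_ON(!mutex_is_locked(&kvm->lock));

	spin_lock(&kvm->mmu_lock);
	kvm->arch.mmu_valid_gen++;

	kvm_zap_obsolete_pages(kvm);
	spin_unlock(&kvm->mmu_lock);
}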

2013-05-24 20:34:51

by Marcelo Tosatti

[permalink] [raw]
Subject: Re: [PATCH v7 04/11] KVM: MMU: zap pages in batch

On Thu, May 23, 2013 at 03:55:53AM +0800, Xiao Guangrong wrote:
> Zap at lease 10 pages before releasing mmu-lock to reduce the overload
> caused by requiring lock
>
> After the patch, kvm_zap_obsolete_pages can forward progress anyway,
> so update the comments
>
> [ It improves kernel building 0.6% ~ 1% ]

Can you please describe the overload in more detail? Under what scenario
is kernel building improved?

2013-05-26 08:27:03

by Gleb Natapov

[permalink] [raw]
Subject: Re: [PATCH v7 03/11] KVM: MMU: fast invalidate all pages

On Fri, May 24, 2013 at 05:23:07PM -0300, Marcelo Tosatti wrote:
> Hi Xiao,
>
> On Thu, May 23, 2013 at 03:55:52AM +0800, Xiao Guangrong wrote:
> > The current kvm_mmu_zap_all is really slow - it is holding mmu-lock to
> > walk and zap all shadow pages one by one, also it need to zap all guest
> > page's rmap and all shadow page's parent spte list. Particularly, things
> > become worse if guest uses more memory or vcpus. It is not good for
> > scalability
> >
> > In this patch, we introduce a faster way to invalidate all shadow pages.
> > KVM maintains a global mmu invalid generation-number which is stored in
> > kvm->arch.mmu_valid_gen and every shadow page stores the current global
> > generation-number into sp->mmu_valid_gen when it is created
> >
> > When KVM need zap all shadow pages sptes, it just simply increase the
> > global generation-number then reload root shadow pages on all vcpus.
> > Vcpu will create a new shadow page table according to current kvm's
> > generation-number. It ensures the old pages are not used any more.
> > Then the obsolete pages (sp->mmu_valid_gen != kvm->arch.mmu_valid_gen)
> > are zapped by using lock-break technique
> >
> > Signed-off-by: Xiao Guangrong <[email protected]>
> > ---
> > arch/x86/include/asm/kvm_host.h | 2 +
> > arch/x86/kvm/mmu.c | 84 +++++++++++++++++++++++++++++++++++++++
> > arch/x86/kvm/mmu.h | 1 +
> > 3 files changed, 87 insertions(+), 0 deletions(-)
> >
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index 3741c65..bff7d46 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -222,6 +222,7 @@ struct kvm_mmu_page {
> > int root_count; /* Currently serving as active root */
> > unsigned int unsync_children;
> > unsigned long parent_ptes; /* Reverse mapping for parent_pte */
> > + unsigned long mmu_valid_gen;
> > DECLARE_BITMAP(unsync_child_bitmap, 512);
> >
> > #ifdef CONFIG_X86_32
> > @@ -529,6 +530,7 @@ struct kvm_arch {
> > unsigned int n_requested_mmu_pages;
> > unsigned int n_max_mmu_pages;
> > unsigned int indirect_shadow_pages;
> > + unsigned long mmu_valid_gen;
> > struct hlist_head mmu_page_hash[KVM_NUM_MMU_PAGES];
> > /*
> > * Hash table of struct kvm_mmu_page.
> > diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> > index f8ca2f3..f302540 100644
> > --- a/arch/x86/kvm/mmu.c
> > +++ b/arch/x86/kvm/mmu.c
> > @@ -1838,6 +1838,11 @@ static void clear_sp_write_flooding_count(u64 *spte)
> > __clear_sp_write_flooding_count(sp);
> > }
> >
> > +static bool is_obsolete_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
> > +{
> > + return unlikely(sp->mmu_valid_gen != kvm->arch.mmu_valid_gen);
> > +}
> > +
> > static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
> > gfn_t gfn,
> > gva_t gaddr,
> > @@ -1900,6 +1905,7 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
> >
> > account_shadowed(vcpu->kvm, gfn);
> > }
> > + sp->mmu_valid_gen = vcpu->kvm->arch.mmu_valid_gen;
> > init_shadow_page_table(sp);
> > trace_kvm_mmu_get_page(sp, true);
> > return sp;
> > @@ -2070,8 +2076,10 @@ static int kvm_mmu_prepare_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp,
> > ret = mmu_zap_unsync_children(kvm, sp, invalid_list);
> > kvm_mmu_page_unlink_children(kvm, sp);
> > kvm_mmu_unlink_parents(kvm, sp);
> > +
> > if (!sp->role.invalid && !sp->role.direct)
> > unaccount_shadowed(kvm, sp->gfn);
> > +
> > if (sp->unsync)
> > kvm_unlink_unsync_page(kvm, sp);
> > if (!sp->root_count) {
> > @@ -4195,6 +4203,82 @@ restart:
> > spin_unlock(&kvm->mmu_lock);
> > }
> >
> > +static void kvm_zap_obsolete_pages(struct kvm *kvm)
> > +{
> > + struct kvm_mmu_page *sp, *node;
> > + LIST_HEAD(invalid_list);
> > +
> > +restart:
> > + list_for_each_entry_safe_reverse(sp, node,
> > + &kvm->arch.active_mmu_pages, link) {
> > + /*
> > + * No obsolete page exists before new created page since
> > + * active_mmu_pages is the FIFO list.
> > + */
> > + if (!is_obsolete_sp(kvm, sp))
> > + break;
>
> Can you add a comment to list_add(x, active_mmu_pages) callsites
> mentioning this case?
>
> Because it'll break silently if people do list_add_tail().
>
> > + /*
> > + * Do not repeatedly zap a root page to avoid unnecessary
> > + * KVM_REQ_MMU_RELOAD, otherwise we may not be able to
> > + * progress:
> > + * vcpu 0 vcpu 1
> > + * call vcpu_enter_guest():
> > + * 1): handle KVM_REQ_MMU_RELOAD
> > + * and require mmu-lock to
> > + * load mmu
> > + * repeat:
> > + * 1): zap root page and
> > + * send KVM_REQ_MMU_RELOAD
> > + *
> > + * 2): if (cond_resched_lock(mmu-lock))
> > + *
> > + * 2): hold mmu-lock and load mmu
> > + *
> > + * 3): see KVM_REQ_MMU_RELOAD bit
> > + * on vcpu->requests is set
> > + * then return 1 to call
> > + * vcpu_enter_guest() again.
> > + * goto repeat;
> > + *
> > + * Since we are reversely walking the list and the invalid
> > + * list will be moved to the head, skip the invalid page
> > + * can help us to avoid the infinity list walking.
> > + */
> > + if (sp->role.invalid)
> > + continue;
>
> But this allows completing (that is returning), with page that should
> be zapped still present (even though its invalid).
>
> Is another pass needed at the end to take care of the invalid pages?
> Which at that point must have their root_count decreased.
>
It is no different from how it works now. Invalid pages can still be left
unzapped after zap_all() completes; they are ignored by all relevant code
paths.
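
For context, the regular gfn hash-list lookups already skip such pages; the
walk looks roughly like this (an approximation of the mmu.c macro of that
time, not quoted from this series):

#define for_each_gfn_indirect_valid_sp(_kvm, _sp, _gfn)			\
	for_each_gfn_sp(_kvm, _sp, _gfn)				\
		if ((_sp)->role.direct || (_sp)->role.invalid) {} else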

> > +
> > + if (need_resched() || spin_needbreak(&kvm->mmu_lock)) {
> > + kvm_mmu_commit_zap_page(kvm, &invalid_list);
> > + cond_resched_lock(&kvm->mmu_lock);
> > + goto restart;
> > + }
> > +
> > + if (kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list))
> > + goto restart;
> > + }
> > +
> > + kvm_mmu_commit_zap_page(kvm, &invalid_list);
> > +}
> > +
> > +/*
> > + * Fast invalidate all shadow pages and use lock-break technique
> > + * to zap obsolete pages.
> > + *
> > + * It's required when memslot is being deleted or VM is being
> > + * destroyed, in these cases, we should ensure that KVM MMU does
> > + * not use any resource of the being-deleted slot or all slots
> > + * after calling the function.
> > + */
> > +void kvm_mmu_invalidate_zap_all_pages(struct kvm *kvm)
> > +{
> > + spin_lock(&kvm->mmu_lock);
> > + kvm->arch.mmu_valid_gen++;
> > +
> > + kvm_zap_obsolete_pages(kvm);
> > + spin_unlock(&kvm->mmu_lock);
> > +}
> > +
>
> Also this function should be serialized, that is, should not allow
> simultaneous kvm_mmu_invalidate_zap_all_pages. If thats so
> assert(mutex_is_locked(kvm->lock)) would help.
>
> Probably fine to have simultaneous users, but not necessary AFAICS.
I raised this point on a previous patch submission. The function can be
executed simultaneously only during vmexit, when zap_all is called by mm
notifiers. During regular work the function is only called by slot
manipulation functions, and those are serialized by the slots lock.

--
Gleb.

2013-05-27 02:02:43

by Xiao Guangrong

[permalink] [raw]
Subject: Re: [PATCH v7 03/11] KVM: MMU: fast invalidate all pages

Hi Marcelo,

On 05/25/2013 04:23 AM, Marcelo Tosatti wrote:

>> +static void kvm_zap_obsolete_pages(struct kvm *kvm)
>> +{
>> + struct kvm_mmu_page *sp, *node;
>> + LIST_HEAD(invalid_list);
>> +
>> +restart:
>> + list_for_each_entry_safe_reverse(sp, node,
>> + &kvm->arch.active_mmu_pages, link) {
>> + /*
>> + * No obsolete page exists before new created page since
>> + * active_mmu_pages is the FIFO list.
>> + */
>> + if (!is_obsolete_sp(kvm, sp))
>> + break;
>
> Can you add a comment to list_add(x, active_mmu_pages) callsites
> mentioning this case?
>
> Because it'll break silently if people do list_add_tail().

Sure, I will do it in the next version.

And I totally agree with Gleb's points, which answer your other questions
about this patch.

Thank you all!

2013-05-27 02:20:25

by Xiao Guangrong

[permalink] [raw]
Subject: Re: [PATCH v7 04/11] KVM: MMU: zap pages in batch

On 05/25/2013 04:34 AM, Marcelo Tosatti wrote:
> On Thu, May 23, 2013 at 03:55:53AM +0800, Xiao Guangrong wrote:
>> Zap at lease 10 pages before releasing mmu-lock to reduce the overload
>> caused by requiring lock
>>
>> After the patch, kvm_zap_obsolete_pages can forward progress anyway,
>> so update the comments
>>
>> [ It improves kernel building 0.6% ~ 1% ]
>
> Can you please describe the overload in more detail? Under what scenario
> is kernel building improved?

Yes.

The scenario is: we do a kernel build in the guest and, meanwhile, repeatedly
read the PCI ROM once per second.

[
echo 1 > /sys/bus/pci/devices/0000\:00\:03.0/rom
cat /sys/bus/pci/devices/0000\:00\:03.0/rom > /dev/null
]

2013-05-27 22:29:17

by Marcelo Tosatti

[permalink] [raw]
Subject: Re: [PATCH v7 03/11] KVM: MMU: fast invalidate all pages

On Sun, May 26, 2013 at 11:26:49AM +0300, Gleb Natapov wrote:
> On Fri, May 24, 2013 at 05:23:07PM -0300, Marcelo Tosatti wrote:
> > Hi Xiao,
> >
> > On Thu, May 23, 2013 at 03:55:52AM +0800, Xiao Guangrong wrote:
> > > The current kvm_mmu_zap_all is really slow - it is holding mmu-lock to
> > > walk and zap all shadow pages one by one, also it need to zap all guest
> > > page's rmap and all shadow page's parent spte list. Particularly, things
> > > become worse if guest uses more memory or vcpus. It is not good for
> > > scalability
> > >
> > > In this patch, we introduce a faster way to invalidate all shadow pages.
> > > KVM maintains a global mmu invalid generation-number which is stored in
> > > kvm->arch.mmu_valid_gen and every shadow page stores the current global
> > > generation-number into sp->mmu_valid_gen when it is created
> > >
> > > When KVM need zap all shadow pages sptes, it just simply increase the
> > > global generation-number then reload root shadow pages on all vcpus.
> > > Vcpu will create a new shadow page table according to current kvm's
> > > generation-number. It ensures the old pages are not used any more.
> > > Then the obsolete pages (sp->mmu_valid_gen != kvm->arch.mmu_valid_gen)
> > > are zapped by using lock-break technique
> > >
> > > Signed-off-by: Xiao Guangrong <[email protected]>
> > > ---
> > > arch/x86/include/asm/kvm_host.h | 2 +
> > > arch/x86/kvm/mmu.c | 84 +++++++++++++++++++++++++++++++++++++++
> > > arch/x86/kvm/mmu.h | 1 +
> > > 3 files changed, 87 insertions(+), 0 deletions(-)
> > >
> > > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > > index 3741c65..bff7d46 100644
> > > --- a/arch/x86/include/asm/kvm_host.h
> > > +++ b/arch/x86/include/asm/kvm_host.h
> > > @@ -222,6 +222,7 @@ struct kvm_mmu_page {
> > > int root_count; /* Currently serving as active root */
> > > unsigned int unsync_children;
> > > unsigned long parent_ptes; /* Reverse mapping for parent_pte */
> > > + unsigned long mmu_valid_gen;
> > > DECLARE_BITMAP(unsync_child_bitmap, 512);
> > >
> > > #ifdef CONFIG_X86_32
> > > @@ -529,6 +530,7 @@ struct kvm_arch {
> > > unsigned int n_requested_mmu_pages;
> > > unsigned int n_max_mmu_pages;
> > > unsigned int indirect_shadow_pages;
> > > + unsigned long mmu_valid_gen;
> > > struct hlist_head mmu_page_hash[KVM_NUM_MMU_PAGES];
> > > /*
> > > * Hash table of struct kvm_mmu_page.
> > > diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> > > index f8ca2f3..f302540 100644
> > > --- a/arch/x86/kvm/mmu.c
> > > +++ b/arch/x86/kvm/mmu.c
> > > @@ -1838,6 +1838,11 @@ static void clear_sp_write_flooding_count(u64 *spte)
> > > __clear_sp_write_flooding_count(sp);
> > > }
> > >
> > > +static bool is_obsolete_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
> > > +{
> > > + return unlikely(sp->mmu_valid_gen != kvm->arch.mmu_valid_gen);
> > > +}
> > > +
> > > static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
> > > gfn_t gfn,
> > > gva_t gaddr,
> > > @@ -1900,6 +1905,7 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
> > >
> > > account_shadowed(vcpu->kvm, gfn);
> > > }
> > > + sp->mmu_valid_gen = vcpu->kvm->arch.mmu_valid_gen;
> > > init_shadow_page_table(sp);
> > > trace_kvm_mmu_get_page(sp, true);
> > > return sp;
> > > @@ -2070,8 +2076,10 @@ static int kvm_mmu_prepare_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp,
> > > ret = mmu_zap_unsync_children(kvm, sp, invalid_list);
> > > kvm_mmu_page_unlink_children(kvm, sp);
> > > kvm_mmu_unlink_parents(kvm, sp);
> > > +
> > > if (!sp->role.invalid && !sp->role.direct)
> > > unaccount_shadowed(kvm, sp->gfn);
> > > +
> > > if (sp->unsync)
> > > kvm_unlink_unsync_page(kvm, sp);
> > > if (!sp->root_count) {
> > > @@ -4195,6 +4203,82 @@ restart:
> > > spin_unlock(&kvm->mmu_lock);
> > > }
> > >
> > > +static void kvm_zap_obsolete_pages(struct kvm *kvm)
> > > +{
> > > + struct kvm_mmu_page *sp, *node;
> > > + LIST_HEAD(invalid_list);
> > > +
> > > +restart:
> > > + list_for_each_entry_safe_reverse(sp, node,
> > > + &kvm->arch.active_mmu_pages, link) {
> > > + /*
> > > + * No obsolete page exists before new created page since
> > > + * active_mmu_pages is the FIFO list.
> > > + */
> > > + if (!is_obsolete_sp(kvm, sp))
> > > + break;
> >
> > Can you add a comment to list_add(x, active_mmu_pages) callsites
> > mentioning this case?
> >
> > Because it'll break silently if people do list_add_tail().
> >
> > > + /*
> > > + * Do not repeatedly zap a root page to avoid unnecessary
> > > + * KVM_REQ_MMU_RELOAD, otherwise we may not be able to
> > > + * progress:
> > > + * vcpu 0 vcpu 1
> > > + * call vcpu_enter_guest():
> > > + * 1): handle KVM_REQ_MMU_RELOAD
> > > + * and require mmu-lock to
> > > + * load mmu
> > > + * repeat:
> > > + * 1): zap root page and
> > > + * send KVM_REQ_MMU_RELOAD
> > > + *
> > > + * 2): if (cond_resched_lock(mmu-lock))
> > > + *
> > > + * 2): hold mmu-lock and load mmu
> > > + *
> > > + * 3): see KVM_REQ_MMU_RELOAD bit
> > > + * on vcpu->requests is set
> > > + * then return 1 to call
> > > + * vcpu_enter_guest() again.
> > > + * goto repeat;
> > > + *
> > > + * Since we are reversely walking the list and the invalid
> > > + * list will be moved to the head, skip the invalid page
> > > + * can help us to avoid the infinity list walking.
> > > + */
> > > + if (sp->role.invalid)
> > > + continue;
> >
> > But this allows completing (that is returning), with page that should
> > be zapped still present (even though its invalid).
> >
> > Is another pass needed at the end to take care of the invalid pages?
> > Which at that point must have their root_count decreased.
> >
> It is not different from how it work now. Invalid page can still be not
> zapped after zap_all() completes. They are ignored by all relevant code
> paths.

It is different. kvm_mmu_zap_all() returns with 0 pages at
mmu_active_pages today.

It is probably worthwhile to maintain that behaviour (instead of relying on
every other code path to do the right thing).

> > Also this function should be serialized, that is, should not allow
> > simultaneous kvm_mmu_invalidate_zap_all_pages. If thats so
> > assert(mutex_is_locked(kvm->lock)) would help.
> >
> > Probably fine to have simultaneous users, but not necessary AFAICS.
> I raced this point on previous patch submission. The function can be
> executed simultaneously only during vmexit when zap_all is called by mm
> notifiers. During regular work the function is only called by slot
> manipulation functions and those are serialized by slot lock.

How about fix hypercall vs kvm_set_memory?

2013-05-27 22:59:35

by Xiao Guangrong

[permalink] [raw]
Subject: Re: [PATCH v7 03/11] KVM: MMU: fast invalidate all pages

On 05/27/2013 04:37 AM, Marcelo Tosatti wrote:
> On Sun, May 26, 2013 at 11:26:49AM +0300, Gleb Natapov wrote:
>> On Fri, May 24, 2013 at 05:23:07PM -0300, Marcelo Tosatti wrote:
>>> Hi Xiao,
>>>
>>> On Thu, May 23, 2013 at 03:55:52AM +0800, Xiao Guangrong wrote:
>>>> The current kvm_mmu_zap_all is really slow - it is holding mmu-lock to
>>>> walk and zap all shadow pages one by one, also it need to zap all guest
>>>> page's rmap and all shadow page's parent spte list. Particularly, things
>>>> become worse if guest uses more memory or vcpus. It is not good for
>>>> scalability
>>>>
>>>> In this patch, we introduce a faster way to invalidate all shadow pages.
>>>> KVM maintains a global mmu invalid generation-number which is stored in
>>>> kvm->arch.mmu_valid_gen and every shadow page stores the current global
>>>> generation-number into sp->mmu_valid_gen when it is created
>>>>
>>>> When KVM need zap all shadow pages sptes, it just simply increase the
>>>> global generation-number then reload root shadow pages on all vcpus.
>>>> Vcpu will create a new shadow page table according to current kvm's
>>>> generation-number. It ensures the old pages are not used any more.
>>>> Then the obsolete pages (sp->mmu_valid_gen != kvm->arch.mmu_valid_gen)
>>>> are zapped by using lock-break technique
>>>>
>>>> Signed-off-by: Xiao Guangrong <[email protected]>
>>>> ---
>>>> arch/x86/include/asm/kvm_host.h | 2 +
>>>> arch/x86/kvm/mmu.c | 84 +++++++++++++++++++++++++++++++++++++++
>>>> arch/x86/kvm/mmu.h | 1 +
>>>> 3 files changed, 87 insertions(+), 0 deletions(-)
>>>>
>>>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>>>> index 3741c65..bff7d46 100644
>>>> --- a/arch/x86/include/asm/kvm_host.h
>>>> +++ b/arch/x86/include/asm/kvm_host.h
>>>> @@ -222,6 +222,7 @@ struct kvm_mmu_page {
>>>> int root_count; /* Currently serving as active root */
>>>> unsigned int unsync_children;
>>>> unsigned long parent_ptes; /* Reverse mapping for parent_pte */
>>>> + unsigned long mmu_valid_gen;
>>>> DECLARE_BITMAP(unsync_child_bitmap, 512);
>>>>
>>>> #ifdef CONFIG_X86_32
>>>> @@ -529,6 +530,7 @@ struct kvm_arch {
>>>> unsigned int n_requested_mmu_pages;
>>>> unsigned int n_max_mmu_pages;
>>>> unsigned int indirect_shadow_pages;
>>>> + unsigned long mmu_valid_gen;
>>>> struct hlist_head mmu_page_hash[KVM_NUM_MMU_PAGES];
>>>> /*
>>>> * Hash table of struct kvm_mmu_page.
>>>> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
>>>> index f8ca2f3..f302540 100644
>>>> --- a/arch/x86/kvm/mmu.c
>>>> +++ b/arch/x86/kvm/mmu.c
>>>> @@ -1838,6 +1838,11 @@ static void clear_sp_write_flooding_count(u64 *spte)
>>>> __clear_sp_write_flooding_count(sp);
>>>> }
>>>>
>>>> +static bool is_obsolete_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
>>>> +{
>>>> + return unlikely(sp->mmu_valid_gen != kvm->arch.mmu_valid_gen);
>>>> +}
>>>> +
>>>> static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
>>>> gfn_t gfn,
>>>> gva_t gaddr,
>>>> @@ -1900,6 +1905,7 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
>>>>
>>>> account_shadowed(vcpu->kvm, gfn);
>>>> }
>>>> + sp->mmu_valid_gen = vcpu->kvm->arch.mmu_valid_gen;
>>>> init_shadow_page_table(sp);
>>>> trace_kvm_mmu_get_page(sp, true);
>>>> return sp;
>>>> @@ -2070,8 +2076,10 @@ static int kvm_mmu_prepare_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp,
>>>> ret = mmu_zap_unsync_children(kvm, sp, invalid_list);
>>>> kvm_mmu_page_unlink_children(kvm, sp);
>>>> kvm_mmu_unlink_parents(kvm, sp);
>>>> +
>>>> if (!sp->role.invalid && !sp->role.direct)
>>>> unaccount_shadowed(kvm, sp->gfn);
>>>> +
>>>> if (sp->unsync)
>>>> kvm_unlink_unsync_page(kvm, sp);
>>>> if (!sp->root_count) {
>>>> @@ -4195,6 +4203,82 @@ restart:
>>>> spin_unlock(&kvm->mmu_lock);
>>>> }
>>>>
>>>> +static void kvm_zap_obsolete_pages(struct kvm *kvm)
>>>> +{
>>>> + struct kvm_mmu_page *sp, *node;
>>>> + LIST_HEAD(invalid_list);
>>>> +
>>>> +restart:
>>>> + list_for_each_entry_safe_reverse(sp, node,
>>>> + &kvm->arch.active_mmu_pages, link) {
>>>> + /*
>>>> + * No obsolete page exists before new created page since
>>>> + * active_mmu_pages is the FIFO list.
>>>> + */
>>>> + if (!is_obsolete_sp(kvm, sp))
>>>> + break;
>>>
>>> Can you add a comment to list_add(x, active_mmu_pages) callsites
>>> mentioning this case?
>>>
>>> Because it'll break silently if people do list_add_tail().
>>>
>>>> + /*
>>>> + * Do not repeatedly zap a root page to avoid unnecessary
>>>> + * KVM_REQ_MMU_RELOAD, otherwise we may not be able to
>>>> + * progress:
>>>> + * vcpu 0 vcpu 1
>>>> + * call vcpu_enter_guest():
>>>> + * 1): handle KVM_REQ_MMU_RELOAD
>>>> + * and require mmu-lock to
>>>> + * load mmu
>>>> + * repeat:
>>>> + * 1): zap root page and
>>>> + * send KVM_REQ_MMU_RELOAD
>>>> + *
>>>> + * 2): if (cond_resched_lock(mmu-lock))
>>>> + *
>>>> + * 2): hold mmu-lock and load mmu
>>>> + *
>>>> + * 3): see KVM_REQ_MMU_RELOAD bit
>>>> + * on vcpu->requests is set
>>>> + * then return 1 to call
>>>> + * vcpu_enter_guest() again.
>>>> + * goto repeat;
>>>> + *
>>>> + * Since we are reversely walking the list and the invalid
>>>> + * list will be moved to the head, skip the invalid page
>>>> + * can help us to avoid the infinity list walking.
>>>> + */
>>>> + if (sp->role.invalid)
>>>> + continue;
>>>
>>> But this allows completing (that is returning), with page that should
>>> be zapped still present (even though its invalid).
>>>
>>> Is another pass needed at the end to take care of the invalid pages?
>>> Which at that point must have their root_count decreased.
>>>
>> It is not different from how it work now. Invalid page can still be not
>> zapped after zap_all() completes. They are ignored by all relevant code
>> paths.
>
> It is different. kvm_mmu_zap_all() returns with 0 pages at
> mmu_active_pages today.

I do not think so.

kvm_mmu_zap_all() also keeps role.root && role.invalid pages on mmu_active_pages;
such pages have to be zapped later, in vcpu context.

>
> It is probably worthwhile to maintain behaviour (instead of every other
> code path does the right thing).
>
>>> Also this function should be serialized, that is, should not allow
>>> simultaneous kvm_mmu_invalidate_zap_all_pages. If thats so
>>> assert(mutex_is_locked(kvm->lock)) would help.
>>>
>>> Probably fine to have simultaneous users, but not necessary AFAICS.
>> I raced this point on previous patch submission. The function can be
>> executed simultaneously only during vmexit when zap_all is called by mm
>> notifiers. During regular work the function is only called by slot
>> manipulation functions and those are serialized by slot lock.
>
> How about fix hypercall vs kvm_set_memory?

In this version, we have dropped the call to kvm_mmu_zap_all when patching
the hypercall, since it is useless.

(Patching the hypercall has been broken for a long time:
https://lkml.org/lkml/2013/5/22/388)

2013-05-28 13:24:47

by Marcelo Tosatti

[permalink] [raw]
Subject: Re: [PATCH v7 09/11] KVM: MMU: introduce kvm_mmu_prepare_zap_obsolete_page

On Thu, May 23, 2013 at 03:55:58AM +0800, Xiao Guangrong wrote:
> It is only used to zap the obsolete page. Since the obsolete page
> will not be used, we need not spend time to find its unsync children
> out. Also, we delete the page from shadow page cache so that the page
> is completely isolated after call this function.
>
> The later patch will use it to collapse tlb flushes
>
> Signed-off-by: Xiao Guangrong <[email protected]>
> ---
> arch/x86/kvm/mmu.c | 46 +++++++++++++++++++++++++++++++++++++++++-----
> 1 files changed, 41 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 9b57faa..e676356 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -1466,7 +1466,7 @@ static inline void kvm_mod_used_mmu_pages(struct kvm *kvm, int nr)
> static void kvm_mmu_free_page(struct kvm_mmu_page *sp)
> {
> ASSERT(is_empty_shadow_page(sp->spt));
> - hlist_del(&sp->hash_link);
> + hlist_del_init(&sp->hash_link);
> list_del(&sp->link);
> free_page((unsigned long)sp->spt);
> if (!sp->role.direct)
> @@ -2069,14 +2069,19 @@ static int mmu_zap_unsync_children(struct kvm *kvm,
> return zapped;
> }
>
> -static int kvm_mmu_prepare_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp,
> - struct list_head *invalid_list)
> +static int
> +__kvm_mmu_prepare_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp,
> + bool zap_unsync_children,
> + struct list_head *invalid_list)
> {
> - int ret;
> + int ret = 0;
>
> trace_kvm_mmu_prepare_zap_page(sp);
> ++kvm->stat.mmu_shadow_zapped;
> - ret = mmu_zap_unsync_children(kvm, sp, invalid_list);
> +
> + if (likely(zap_unsync_children))
> + ret = mmu_zap_unsync_children(kvm, sp, invalid_list);
> +

Why is this an important case to be optimized?

1) shadow is the uncommon, obsolete case.
2) mmu_zap_unsync_children has

if (parent->role.level == PT_PAGE_TABLE_LEVEL)
return 0;

So the large majority of pages are already optimized.

2013-05-28 13:24:53

by Marcelo Tosatti

[permalink] [raw]
Subject: Re: [PATCH v7 10/11] KVM: MMU: collapse TLB flushes when zap all pages

On Thu, May 23, 2013 at 03:55:59AM +0800, Xiao Guangrong wrote:
> kvm_zap_obsolete_pages uses lock-break technique to zap pages,
> it will flush tlb every time when it does lock-break
>
> We can reload mmu on all vcpus after updating the generation
> number so that the obsolete pages are not used on any vcpus,
> after that we do not need to flush tlb when obsolete pages
> are zapped

After that point batching is also not relevant anymore?


Still concerned about a similar case mentioned earlier:

"
Note the account for pages freed step after pages are actually
freed: as discussed with Takuya, having pages freed and freed page
accounting out of sync across mmu_lock is potentially problematic:
kvm->arch.n_used_mmu_pages and friends do not reflect reality which can
cause problems for SLAB freeing and page allocation throttling.
"

This is a real problem: if you decrease n_used_mmu_pages at
kvm_mmu_prepare_zap_page, but only actually free the pages later
at kvm_mmu_commit_zap_page, there is the possibility of allowing
a huge number of pages to be retained. There should be a maximum number
of pages on invalid_list.

(even higher possibility if you schedule without freeing pages reported
as released!).

> Note: kvm_mmu_commit_zap_page is still needed before free
> the pages since other vcpus may be doing locklessly shadow
> page walking
>
> Signed-off-by: Xiao Guangrong <[email protected]>
> ---
> arch/x86/kvm/mmu.c | 32 ++++++++++++++++++++++----------
> 1 files changed, 22 insertions(+), 10 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index e676356..5e34056 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -4237,8 +4237,6 @@ static void kvm_zap_obsolete_pages(struct kvm *kvm)
> restart:
> list_for_each_entry_safe_reverse(sp, node,
> &kvm->arch.active_mmu_pages, link) {
> - int ret;
> -
> /*
> * No obsolete page exists before new created page since
> * active_mmu_pages is the FIFO list.
> @@ -4254,21 +4252,24 @@ restart:
> if (sp->role.invalid)
> continue;
>
> + /*
> + * Need not flush tlb since we only zap the sp with invalid
> + * generation number.
> + */
> if (batch >= BATCH_ZAP_PAGES &&
> - (need_resched() || spin_needbreak(&kvm->mmu_lock))) {
> + cond_resched_lock(&kvm->mmu_lock)) {
> batch = 0;
> - kvm_mmu_commit_zap_page(kvm, &invalid_list);
> - cond_resched_lock(&kvm->mmu_lock);
> goto restart;
> }
>
> - ret = kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list);
> - batch += ret;
> -
> - if (ret)
> - goto restart;
> + batch += kvm_mmu_prepare_zap_obsolete_page(kvm, sp,
> + &invalid_list);
> }
>
> + /*
> + * Should flush tlb before free page tables since lockless-walking
> + * may use the pages.
> + */
> kvm_mmu_commit_zap_page(kvm, &invalid_list);
> }
>
> @@ -4287,6 +4288,17 @@ void kvm_mmu_invalidate_zap_all_pages(struct kvm *kvm)
> trace_kvm_mmu_invalidate_zap_all_pages(kvm);
> kvm->arch.mmu_valid_gen++;
>
> + /*
> + * Notify all vcpus to reload its shadow page table
> + * and flush TLB. Then all vcpus will switch to new
> + * shadow page table with the new mmu_valid_gen.
> + *
> + * Note: we should do this under the protection of
> + * mmu-lock, otherwise, vcpu would purge shadow page
> + * but miss tlb flush.
> + */
> + kvm_reload_remote_mmus(kvm);
> +
> kvm_zap_obsolete_pages(kvm);
> spin_unlock(&kvm->mmu_lock);
> }
> --
> 1.7.7.6

2013-05-28 13:25:19

by Marcelo Tosatti

[permalink] [raw]
Subject: Re: [PATCH v7 04/11] KVM: MMU: zap pages in batch

On Mon, May 27, 2013 at 10:20:12AM +0800, Xiao Guangrong wrote:
> On 05/25/2013 04:34 AM, Marcelo Tosatti wrote:
> > On Thu, May 23, 2013 at 03:55:53AM +0800, Xiao Guangrong wrote:
> >> Zap at lease 10 pages before releasing mmu-lock to reduce the overload
> >> caused by requiring lock
> >>
> >> After the patch, kvm_zap_obsolete_pages can forward progress anyway,
> >> so update the comments
> >>
> >> [ It improves kernel building 0.6% ~ 1% ]
> >
> > Can you please describe the overload in more detail? Under what scenario
> > is kernel building improved?
>
> Yes.
>
> The scenario is we do kernel building, meanwhile, repeatedly read PCI rom
> every one second.
>
> [
> echo 1 > /sys/bus/pci/devices/0000\:00\:03.0/rom
> cat /sys/bus/pci/devices/0000\:00\:03.0/rom > /dev/null
> ]

I can't see why this reflects a real world scenario (or a real world
scenario with the same characteristics regarding kvm_mmu_zap_all vs faults).

The point is, it would be good to understand why this change
is improving performance. What are the cases where we break out of
kvm_mmu_zap_all due to either (need_resched || spin_needbreak) with fewer
than 10 pages zapped?


2013-05-28 14:51:51

by Xiao Guangrong

[permalink] [raw]
Subject: Re: [PATCH v7 09/11] KVM: MMU: introduce kvm_mmu_prepare_zap_obsolete_page

On 05/28/2013 08:13 AM, Marcelo Tosatti wrote:
> On Thu, May 23, 2013 at 03:55:58AM +0800, Xiao Guangrong wrote:
>> It is only used to zap the obsolete page. Since the obsolete page
>> will not be used, we need not spend time to find its unsync children
>> out. Also, we delete the page from shadow page cache so that the page
>> is completely isolated after call this function.
>>
>> The later patch will use it to collapse tlb flushes
>>
>> Signed-off-by: Xiao Guangrong <[email protected]>
>> ---
>> arch/x86/kvm/mmu.c | 46 +++++++++++++++++++++++++++++++++++++++++-----
>> 1 files changed, 41 insertions(+), 5 deletions(-)
>>
>> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
>> index 9b57faa..e676356 100644
>> --- a/arch/x86/kvm/mmu.c
>> +++ b/arch/x86/kvm/mmu.c
>> @@ -1466,7 +1466,7 @@ static inline void kvm_mod_used_mmu_pages(struct kvm *kvm, int nr)
>> static void kvm_mmu_free_page(struct kvm_mmu_page *sp)
>> {
>> ASSERT(is_empty_shadow_page(sp->spt));
>> - hlist_del(&sp->hash_link);
>> + hlist_del_init(&sp->hash_link);
>> list_del(&sp->link);
>> free_page((unsigned long)sp->spt);
>> if (!sp->role.direct)
>> @@ -2069,14 +2069,19 @@ static int mmu_zap_unsync_children(struct kvm *kvm,
>> return zapped;
>> }
>>
>> -static int kvm_mmu_prepare_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp,
>> - struct list_head *invalid_list)
>> +static int
>> +__kvm_mmu_prepare_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp,
>> + bool zap_unsync_children,
>> + struct list_head *invalid_list)
>> {
>> - int ret;
>> + int ret = 0;
>>
>> trace_kvm_mmu_prepare_zap_page(sp);
>> ++kvm->stat.mmu_shadow_zapped;
>> - ret = mmu_zap_unsync_children(kvm, sp, invalid_list);
>> +
>> + if (likely(zap_unsync_children))
>> + ret = mmu_zap_unsync_children(kvm, sp, invalid_list);
>> +
>
> Why is this an important case to be optimized?
>
> 1) shadow is the uncommon, obsolete case.
> 2) mmu_zap_unsync_children has
>
> if (parent->role.level == PT_PAGE_TABLE_LEVEL)
> return 0;
>
> So the large majority of pages are already optimized.

Hmm, if we zap a high-level page (e.g. level = 4), it has to walk its
children and its children's children. That is a lot of overhead.
(IMHO, this trivial optimization is still worthwhile, especially since the
change is really small.)

And there is another point I mentioned in the changelog:
"Also, we delete the page from shadow page cache so that the page
is completely isolated after call this function."
Skipping the zapping of unsync children ensures that only one page is
zapped, so we can use "hlist_del_init(&sp->hash_link)" to completely
remove the page from the mmu cache.

Now Gleb and I have agreed that skipping obsolete pages when
walking the hash list is a better way.

BTW, zapping unsync children is unnecessary, isn't it?
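
Presumably the rest of patch 09 then wraps the helper roughly like this (a
sketch based only on the hunk quoted above; the exact wrappers are not shown
in this thread):

static int kvm_mmu_prepare_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp,
				    struct list_head *invalid_list)
{
	/* normal path: also zap any unsync children */
	return __kvm_mmu_prepare_zap_page(kvm, sp, true, invalid_list);
}

static int kvm_mmu_prepare_zap_obsolete_page(struct kvm *kvm,
					     struct kvm_mmu_page *sp,
					     struct list_head *invalid_list)
{
	/* obsolete pages will never be used again: skip unsync children */
	return __kvm_mmu_prepare_zap_page(kvm, sp, false, invalid_list);
}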

2013-05-28 15:02:21

by Xiao Guangrong

[permalink] [raw]
Subject: Re: [PATCH v7 04/11] KVM: MMU: zap pages in batch

On 05/28/2013 08:18 AM, Marcelo Tosatti wrote:
> On Mon, May 27, 2013 at 10:20:12AM +0800, Xiao Guangrong wrote:
>> On 05/25/2013 04:34 AM, Marcelo Tosatti wrote:
>>> On Thu, May 23, 2013 at 03:55:53AM +0800, Xiao Guangrong wrote:
>>>> Zap at lease 10 pages before releasing mmu-lock to reduce the overload
>>>> caused by requiring lock
>>>>
>>>> After the patch, kvm_zap_obsolete_pages can forward progress anyway,
>>>> so update the comments
>>>>
>>>> [ It improves kernel building 0.6% ~ 1% ]
>>>
>>> Can you please describe the overload in more detail? Under what scenario
>>> is kernel building improved?
>>
>> Yes.
>>
>> The scenario is we do kernel building, meanwhile, repeatedly read PCI rom
>> every one second.
>>
>> [
>> echo 1 > /sys/bus/pci/devices/0000\:00\:03.0/rom
>> cat /sys/bus/pci/devices/0000\:00\:03.0/rom > /dev/null
>> ]
>
> Can't see why it reflects real world scenario (or a real world
> scenario with same characteristics regarding kvm_mmu_zap_all vs faults)?
>
> Point is, it would be good to understand why this change
> is improving performance? What are these cases where breaking out of
> kvm_mmu_zap_all due to either (need_resched || spin_needbreak) on zapped
> < 10 ?

When the guest reads the ROM, QEMU updates the memory slot to map the
device's firmware; that is why kvm_mmu_zap_all can be called in this scenario.

The reasons why it hurts performance are:
1): QEMU uses a global io-lock to synchronize all vcpus, so the io-lock is
held while we do kvm_mmu_zap_all(). If kvm_mmu_zap_all() is not efficient,
all other vcpus have to wait a long time to do I/O.

2): kvm_mmu_zap_all() is triggered in vcpu context, so it can block IPI
requests from other vcpus.

Is that enough?


2013-05-28 15:19:20

by Xiao Guangrong

[permalink] [raw]
Subject: Re: [PATCH v7 10/11] KVM: MMU: collapse TLB flushes when zap all pages

On 05/28/2013 08:36 AM, Marcelo Tosatti wrote:
> On Thu, May 23, 2013 at 03:55:59AM +0800, Xiao Guangrong wrote:
>> kvm_zap_obsolete_pages uses lock-break technique to zap pages,
>> it will flush tlb every time when it does lock-break
>>
>> We can reload mmu on all vcpus after updating the generation
>> number so that the obsolete pages are not used on any vcpus,
>> after that we do not need to flush tlb when obsolete pages
>> are zapped
>
> After that point batching is also not relevant anymore?

No... without batching, we do not know how much time we will
spend zapping pages. That is not good for the case where
zap_all_pages is called in vcpu context.
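
For reference, the batching being discussed boils down to something like the
following inside kvm_zap_obsolete_pages() (a condensed sketch of the hunks
quoted above, after the patch 10 changes):

#define BATCH_ZAP_PAGES	10

		/* only consider a lock break after a full batch was zapped */
		if (batch >= BATCH_ZAP_PAGES &&
		      cond_resched_lock(&kvm->mmu_lock)) {
			batch = 0;
			goto restart;
		}

		batch += kvm_mmu_prepare_zap_obsolete_page(kvm, sp,
							   &invalid_list);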

>
>
> Still concerned about a similar case mentioned earlier:
>
> "
> Note the account for pages freed step after pages are actually
> freed: as discussed with Takuya, having pages freed and freed page
> accounting out of sync across mmu_lock is potentially problematic:
> kvm->arch.n_used_mmu_pages and friends do not reflect reality which can
> cause problems for SLAB freeing and page allocation throttling.
> "
>
> This is a real problem, if you decrease n_used_mmu_pages at
> kvm_mmu_prepare_zap_page, but only actually free pages later
> at kvm_mmu_commit_zap_page, there is the possibility of allowing
> a huge number to be retained. There should be a maximum number of pages
> at invalid_list.
>
> (even higher possibility if you schedule without freeing pages reported
> as released!).
>
>> Note: kvm_mmu_commit_zap_page is still needed before free
>> the pages since other vcpus may be doing locklessly shadow
>> page walking

Ah, yes, I agree with you.

We can introduce a list, say kvm->arch.obsolete_pages, to link all of the
zapped pages; the page shrinker will free the pages on that list first.

Marcelo, if you have no objection to patches 1 ~ 8 and 11, could you please
let them be merged first, and I will add the comments and the tlb optimization later?
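
A very rough sketch of that idea (purely hypothetical; the field and function
names are invented for illustration and this is not part of the posted series):

	/* in struct kvm_arch: pages prepared for zap but not yet freed */
	struct list_head obsolete_pages;

/* shrinker path: free already-prepared pages before zapping live ones */
static void kvm_mmu_free_obsolete_pages(struct kvm *kvm)
{
	LIST_HEAD(invalid_list);

	spin_lock(&kvm->mmu_lock);
	list_splice_init(&kvm->arch.obsolete_pages, &invalid_list);
	kvm_mmu_commit_zap_page(kvm, &invalid_list);
	spin_unlock(&kvm->mmu_lock);
}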

2013-05-29 03:03:30

by Xiao Guangrong

[permalink] [raw]
Subject: Re: [PATCH v7 10/11] KVM: MMU: collapse TLB flushes when zap all pages

On 05/28/2013 11:19 PM, Xiao Guangrong wrote:
> On 05/28/2013 08:36 AM, Marcelo Tosatti wrote:
>> On Thu, May 23, 2013 at 03:55:59AM +0800, Xiao Guangrong wrote:
>>> kvm_zap_obsolete_pages uses lock-break technique to zap pages,
>>> it will flush tlb every time when it does lock-break
>>>
>>> We can reload mmu on all vcpus after updating the generation
>>> number so that the obsolete pages are not used on any vcpus,
>>> after that we do not need to flush tlb when obsolete pages
>>> are zapped
>>
>> After that point batching is also not relevant anymore?
>
> no... without batching, we do not know how much time we will
> spend to zap pages. It is not good for the case that
> zap_all_pages is called in the vcpu context.
>
>>
>>
>> Still concerned about a similar case mentioned earlier:
>>
>> "
>> Note the account for pages freed step after pages are actually
>> freed: as discussed with Takuya, having pages freed and freed page
>> accounting out of sync across mmu_lock is potentially problematic:
>> kvm->arch.n_used_mmu_pages and friends do not reflect reality which can
>> cause problems for SLAB freeing and page allocation throttling.
>> "
>>
>> This is a real problem, if you decrease n_used_mmu_pages at
>> kvm_mmu_prepare_zap_page, but only actually free pages later
>> at kvm_mmu_commit_zap_page, there is the possibility of allowing
>> a huge number to be retained. There should be a maximum number of pages
>> at invalid_list.
>>
>> (even higher possibility if you schedule without freeing pages reported
>> as released!).
>>
>>> Note: kvm_mmu_commit_zap_page is still needed before free
>>> the pages since other vcpus may be doing locklessly shadow
>>> page walking
>
> Ah, yes, i agree with you.
>
> We can introduce a list, say kvm->arch.obsolte_pages, to link all of the
> zapped-page, the page-shrink will free the page on that list first.
>
> Marcelo, if you do not have objection on patch 1 ~ 8 and 11, could you please
> let them merged first, and do add some comments and tlb optimization later?

Exclude patch 11 please, since it depends on the "collapse" optimization.

2013-05-29 12:40:29

by Marcelo Tosatti

[permalink] [raw]
Subject: Re: [PATCH v7 10/11] KVM: MMU: collapse TLB flushes when zap all pages

On Wed, May 29, 2013 at 11:03:19AM +0800, Xiao Guangrong wrote:
> >>> the pages since other vcpus may be doing locklessly shadow
> >>> page walking
> >
> > Ah, yes, i agree with you.
> >
> > We can introduce a list, say kvm->arch.obsolte_pages, to link all of the
> > zapped-page, the page-shrink will free the page on that list first.
> >
> > Marcelo, if you do not have objection on patch 1 ~ 8 and 11, could you please
> > let them merged first, and do add some comments and tlb optimization later?
>
> Exclude patch 11 please, since it depends on the "collapse" optimization.

I'm fine with patch 1 being merged. I think the remaining patches need better
understanding or explanation. The problems i see are:

1) The magic number "10" to zap before considering reschedule is
annoying. It would be good to understand why it is needed at all.

But then again, the testcase is measuring kvm_mmu_zap_all performance
alone which we know is not a common operation, so perhaps there is
no need for that minimum-pages-to-zap-before-reschedule.

2) The problem above (retention of large number of pages while zapping)
can be fatal, it can lead to OOM and host crash.

3) Make sure that the introduction of obsolete pages cannot lead to a
huge number of shadow pages hanging around (the correct reason you gave for
not merging https://patchwork.kernel.org/patch/2309641/ is not true anymore
with obsolete pages).

Other than these points, i'm fine with obsolete pages optimization
to speed up kvm_mmu_zap_all and the rest of the patchset.

2013-05-29 12:40:28

by Marcelo Tosatti

[permalink] [raw]
Subject: Re: [PATCH v7 04/11] KVM: MMU: zap pages in batch

On Tue, May 28, 2013 at 11:02:09PM +0800, Xiao Guangrong wrote:
> On 05/28/2013 08:18 AM, Marcelo Tosatti wrote:
> > On Mon, May 27, 2013 at 10:20:12AM +0800, Xiao Guangrong wrote:
> >> On 05/25/2013 04:34 AM, Marcelo Tosatti wrote:
> >>> On Thu, May 23, 2013 at 03:55:53AM +0800, Xiao Guangrong wrote:
> >>>> Zap at lease 10 pages before releasing mmu-lock to reduce the overload
> >>>> caused by requiring lock
> >>>>
> >>>> After the patch, kvm_zap_obsolete_pages can forward progress anyway,
> >>>> so update the comments
> >>>>
> >>>> [ It improves kernel building 0.6% ~ 1% ]
> >>>
> >>> Can you please describe the overload in more detail? Under what scenario
> >>> is kernel building improved?
> >>
> >> Yes.
> >>
> >> The scenario is we do kernel building, meanwhile, repeatedly read PCI rom
> >> every one second.
> >>
> >> [
> >> echo 1 > /sys/bus/pci/devices/0000\:00\:03.0/rom
> >> cat /sys/bus/pci/devices/0000\:00\:03.0/rom > /dev/null
> >> ]
> >
> > Can't see why it reflects real world scenario (or a real world
> > scenario with same characteristics regarding kvm_mmu_zap_all vs faults)?
> >
> > Point is, it would be good to understand why this change
> > is improving performance? What are these cases where breaking out of
> > kvm_mmu_zap_all due to either (need_resched || spin_needbreak) on zapped
> > < 10 ?
>
> When guest read ROM, qemu will set the memory to map the device's firmware,
> that is why kvm_mmu_zap_all can be called in the scenario.
>
> The reasons why it heart the performance are:
> 1): Qemu use a global io-lock to sync all vcpu, so that the io-lock is held
> when we do kvm_mmu_zap_all(). If kvm_mmu_zap_all() is not efficient, all
> other vcpus need wait a long time to do I/O.
>
> 2): kvm_mmu_zap_all() is triggered in vcpu context. so it can block the IPI
> request from other vcpus.
>
> Is it enough?

That is no problem. The problem is why you chose "10" as the minimum number of
pages to zap before considering a reschedule. I would expect the need to
reschedule to be rare enough that one kvm_mmu_zap_all instance (between
schedule in and schedule out) would be able to release no less than a
thousand pages.

So I'd like to understand better what the driver for this change is (this
was the original question).

2013-05-29 12:40:26

by Marcelo Tosatti

[permalink] [raw]
Subject: Re: [PATCH v7 09/11] KVM: MMU: introduce kvm_mmu_prepare_zap_obsolete_page

On Tue, May 28, 2013 at 10:51:38PM +0800, Xiao Guangrong wrote:
> On 05/28/2013 08:13 AM, Marcelo Tosatti wrote:
> > On Thu, May 23, 2013 at 03:55:58AM +0800, Xiao Guangrong wrote:
> >> It is only used to zap the obsolete page. Since the obsolete page
> >> will not be used, we need not spend time to find its unsync children
> >> out. Also, we delete the page from shadow page cache so that the page
> >> is completely isolated after call this function.
> >>
> >> The later patch will use it to collapse tlb flushes
> >>
> >> Signed-off-by: Xiao Guangrong <[email protected]>
> >> ---
> >> arch/x86/kvm/mmu.c | 46 +++++++++++++++++++++++++++++++++++++++++-----
> >> 1 files changed, 41 insertions(+), 5 deletions(-)
> >>
> >> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> >> index 9b57faa..e676356 100644
> >> --- a/arch/x86/kvm/mmu.c
> >> +++ b/arch/x86/kvm/mmu.c
> >> @@ -1466,7 +1466,7 @@ static inline void kvm_mod_used_mmu_pages(struct kvm *kvm, int nr)
> >> static void kvm_mmu_free_page(struct kvm_mmu_page *sp)
> >> {
> >> ASSERT(is_empty_shadow_page(sp->spt));
> >> - hlist_del(&sp->hash_link);
> >> + hlist_del_init(&sp->hash_link);
> >> list_del(&sp->link);
> >> free_page((unsigned long)sp->spt);
> >> if (!sp->role.direct)
> >> @@ -2069,14 +2069,19 @@ static int mmu_zap_unsync_children(struct kvm *kvm,
> >> return zapped;
> >> }
> >>
> >> -static int kvm_mmu_prepare_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp,
> >> - struct list_head *invalid_list)
> >> +static int
> >> +__kvm_mmu_prepare_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp,
> >> + bool zap_unsync_children,
> >> + struct list_head *invalid_list)
> >> {
> >> - int ret;
> >> + int ret = 0;
> >>
> >> trace_kvm_mmu_prepare_zap_page(sp);
> >> ++kvm->stat.mmu_shadow_zapped;
> >> - ret = mmu_zap_unsync_children(kvm, sp, invalid_list);
> >> +
> >> + if (likely(zap_unsync_children))
> >> + ret = mmu_zap_unsync_children(kvm, sp, invalid_list);
> >> +
> >
> > Why is this an important case to be optimized?
> >
> > 1) shadow is the uncommon, obsolete case.
> > 2) mmu_zap_unsync_children has
> >
> > if (parent->role.level == PT_PAGE_TABLE_LEVEL)
> > return 0;
> >
> > So the large majority of pages are already optimized.
>
> Hmm, if we zap the high level page (e.g level = 4), it should walk its
> children and its children's children. It is high overload.
> (IMHO, trivial optimization is still necessary, especially, the change
> is really slight.)
>
> And, there is another point me mentioned in the changelog:
> "Also, we delete the page from shadow page cache so that the page
> is completely isolated after call this function."
> Skipping zapping unsync-children can ensure that only one page is
> zapped so that we can use "hlist_del_init(&sp->hash_link)" to completely
> remove the page from mmu-cache.
>
> Now, Gleb and i got a agreement that skipping obsolete page when
> walking hash list is a better way.
>
> BTW, zapping unsync-children is unnecessary, is it?

It is necessary that, if an unsync page exists, invlpg emulation
is able to reach it, or that it is synchronized at kvm_mmu_get_page
time.

You transfer the synchronization work to pagefault time, which directly
affects guest performance, while it could have been done by the host
(this was the reason for zapping unsync children).

2013-05-29 13:09:21

by Xiao Guangrong

[permalink] [raw]
Subject: Re: [PATCH v7 04/11] KVM: MMU: zap pages in batch

On 05/29/2013 07:11 PM, Marcelo Tosatti wrote:
> On Tue, May 28, 2013 at 11:02:09PM +0800, Xiao Guangrong wrote:
>> On 05/28/2013 08:18 AM, Marcelo Tosatti wrote:
>>> On Mon, May 27, 2013 at 10:20:12AM +0800, Xiao Guangrong wrote:
>>>> On 05/25/2013 04:34 AM, Marcelo Tosatti wrote:
>>>>> On Thu, May 23, 2013 at 03:55:53AM +0800, Xiao Guangrong wrote:
>>>>>> Zap at lease 10 pages before releasing mmu-lock to reduce the overload
>>>>>> caused by requiring lock
>>>>>>
>>>>>> After the patch, kvm_zap_obsolete_pages can forward progress anyway,
>>>>>> so update the comments
>>>>>>
>>>>>> [ It improves kernel building 0.6% ~ 1% ]
>>>>>
>>>>> Can you please describe the overload in more detail? Under what scenario
>>>>> is kernel building improved?
>>>>
>>>> Yes.
>>>>
>>>> The scenario is we do kernel building, meanwhile, repeatedly read PCI rom
>>>> every one second.
>>>>
>>>> [
>>>> echo 1 > /sys/bus/pci/devices/0000\:00\:03.0/rom
>>>> cat /sys/bus/pci/devices/0000\:00\:03.0/rom > /dev/null
>>>> ]
>>>
>>> Can't see why it reflects real world scenario (or a real world
>>> scenario with same characteristics regarding kvm_mmu_zap_all vs faults)?
>>>
>>> Point is, it would be good to understand why this change
>>> is improving performance? What are these cases where breaking out of
>>> kvm_mmu_zap_all due to either (need_resched || spin_needbreak) on zapped
>>> < 10 ?
>>
>> When guest read ROM, qemu will set the memory to map the device's firmware,
>> that is why kvm_mmu_zap_all can be called in the scenario.
>>
>> The reasons why it heart the performance are:
>> 1): Qemu use a global io-lock to sync all vcpu, so that the io-lock is held
>> when we do kvm_mmu_zap_all(). If kvm_mmu_zap_all() is not efficient, all
>> other vcpus need wait a long time to do I/O.
>>
>> 2): kvm_mmu_zap_all() is triggered in vcpu context. so it can block the IPI
>> request from other vcpus.
>>
>> Is it enough?
>
> That is no problem. The problem is why you chose "10" as the minimum number of
> pages to zap before considering reschedule. I would expect the need to

Well, my description above explained why batch zapping is needed - we do
not want the vcpu to spend lots of time zapping all pages, because it hurts
the other running vcpus.

But why the batch size is "10"... I cannot really answer this; I just guessed
that '10' keeps the vcpu from spending too long in zap_all_pages while not
starving the mmu-lock. "10" is a speculative value and I am not sure it is
the best value, but at least I think it works.

> reschedule to be rare enough that one kvm_mmu_zap_all instance (between
> schedule in and schedule out) to be able to release no less than a
> thousand pages.

Unfortunately, no.

This is the information from my reply to Gleb, in the mail where he raised
the question of why the "collapse tlb flush" is needed:

======
It seems no.
Since we have reloaded mmu before zapping the obsolete pages, the mmu-lock
is easily contended. I did the simple track:

+ int num = 0;
restart:
list_for_each_entry_safe_reverse(sp, node,
&kvm->arch.active_mmu_pages, link) {
@@ -4265,6 +4265,7 @@ restart:
if (batch >= BATCH_ZAP_PAGES &&
cond_resched_lock(&kvm->mmu_lock)) {
batch = 0;
+ num++;
goto restart;
}

@@ -4277,6 +4278,7 @@ restart:
* may use the pages.
*/
kvm_mmu_commit_zap_page(kvm, &invalid_list);
+ printk("lock-break: %d.\n", num);
}

I do read pci rom when doing kernel building in the guest which
has 1G memory and 4vcpus with ept enabled, this is the normal
workload and normal configuration.

# dmesg
[ 2338.759099] lock-break: 8.
[ 2339.732442] lock-break: 5.
[ 2340.904446] lock-break: 3.
[ 2342.513514] lock-break: 3.
[ 2343.452229] lock-break: 3.
[ 2344.981599] lock-break: 4.

Basically, we need to break many times.
======

You can see we had to break at least 3 times to zap all pages even though we
zapped 10 pages per batch. Obviously it would need to break even more times
without batch zapping.



2013-05-29 13:19:50

by Xiao Guangrong

[permalink] [raw]
Subject: Re: [PATCH v7 10/11] KVM: MMU: collapse TLB flushes when zap all pages

On 05/29/2013 08:39 PM, Marcelo Tosatti wrote:
> On Wed, May 29, 2013 at 11:03:19AM +0800, Xiao Guangrong wrote:
>>>>> the pages since other vcpus may be doing locklessly shadow
>>>>> page walking
>>>
>>> Ah, yes, i agree with you.
>>>
>>> We can introduce a list, say kvm->arch.obsolte_pages, to link all of the
>>> zapped-page, the page-shrink will free the page on that list first.
>>>
>>> Marcelo, if you do not have objection on patch 1 ~ 8 and 11, could you please
>>> let them merged first, and do add some comments and tlb optimization later?
>>
>> Exclude patch 11 please, since it depends on the "collapse" optimization.
>
> I'm fine with patch 1 being merged. I think the remaining patches need better
> understanding or explanation. The problems i see are:
>
> 1) The magic number "10" to zap before considering reschedule is
> annoying. It would be good to understand why it is needed at all.

......

>
> But then again, the testcase is measuring kvm_mmu_zap_all performance
> alone which we know is not a common operation, so perhaps there is
> no need for that minimum-pages-to-zap-before-reschedule.

Well, although this is not a common operation, it can be triggered by a
VCPU - if one VCPU takes a long time in zap-all-pages, the other vcpus are
left missing IPI syncs or missing I/O. This easily causes soft lockups if
the vcpu is doing memslot-related things.

>
> 2) The problem above (retention of large number of pages while zapping)
> can be fatal, it can lead to OOM and host crash.

This problem is introduced by this patch (patch 10); without this patch,
the pages are always zapped before the mmu-lock is released.

>
> 3) Make sure that introduction of obsolete pages can not lead to a
> huge number of shadow pages around (the correct reason you gave for not merging
> https://patchwork.kernel.org/patch/2309641/ is not true anymore
> obsolete pages).

Actually, this question is the same as 2); it is also the page-reclaim
problem. Without patch #10, there is no problem.

>
> Other than these points, i'm fine with obsolete pages optimization
> to speed up kvm_mmu_zap_all and the rest of the patchset.

Thank you!

2013-05-29 13:34:09

by Marcelo Tosatti

[permalink] [raw]
Subject: Re: [PATCH v7 04/11] KVM: MMU: zap pages in batch

On Wed, May 29, 2013 at 09:09:09PM +0800, Xiao Guangrong wrote:
> On 05/29/2013 07:11 PM, Marcelo Tosatti wrote:
> > On Tue, May 28, 2013 at 11:02:09PM +0800, Xiao Guangrong wrote:
> >> On 05/28/2013 08:18 AM, Marcelo Tosatti wrote:
> >>> On Mon, May 27, 2013 at 10:20:12AM +0800, Xiao Guangrong wrote:
> >>>> On 05/25/2013 04:34 AM, Marcelo Tosatti wrote:
> >>>>> On Thu, May 23, 2013 at 03:55:53AM +0800, Xiao Guangrong wrote:
> >>>>>> Zap at lease 10 pages before releasing mmu-lock to reduce the overload
> >>>>>> caused by requiring lock
> >>>>>>
> >>>>>> After the patch, kvm_zap_obsolete_pages can forward progress anyway,
> >>>>>> so update the comments
> >>>>>>
> >>>>>> [ It improves kernel building 0.6% ~ 1% ]
> >>>>>
> >>>>> Can you please describe the overload in more detail? Under what scenario
> >>>>> is kernel building improved?
> >>>>
> >>>> Yes.
> >>>>
> >>>> The scenario is we do kernel building, meanwhile, repeatedly read PCI rom
> >>>> every one second.
> >>>>
> >>>> [
> >>>> echo 1 > /sys/bus/pci/devices/0000\:00\:03.0/rom
> >>>> cat /sys/bus/pci/devices/0000\:00\:03.0/rom > /dev/null
> >>>> ]
> >>>
> >>> Can't see why it reflects real world scenario (or a real world
> >>> scenario with same characteristics regarding kvm_mmu_zap_all vs faults)?
> >>>
> >>> Point is, it would be good to understand why this change
> >>> is improving performance? What are these cases where breaking out of
> >>> kvm_mmu_zap_all due to either (need_resched || spin_needbreak) on zapped
> >>> < 10 ?
> >>
> >> When guest read ROM, qemu will set the memory to map the device's firmware,
> >> that is why kvm_mmu_zap_all can be called in the scenario.
> >>
> >> The reasons why it heart the performance are:
> >> 1): Qemu use a global io-lock to sync all vcpu, so that the io-lock is held
> >> when we do kvm_mmu_zap_all(). If kvm_mmu_zap_all() is not efficient, all
> >> other vcpus need wait a long time to do I/O.
> >>
> >> 2): kvm_mmu_zap_all() is triggered in vcpu context. so it can block the IPI
> >> request from other vcpus.
> >>
> >> Is it enough?
> >
> > That is no problem. The problem is why you chose "10" as the minimum number of
> > pages to zap before considering reschedule. I would expect the need to
>
> Well, my description above explained why batch-zapping is needed - we do
> not want the vcpu spend lots of time to zap all pages because it hurts other
> vcpus running.
>
> But, why the batch page number is "10"... I can not answer this, i just guessed
> that '10' can make vcpu do not spend long time on zap_all_pages and do
> not cause mmu-lock too hungry. "10" is the speculative value and i am not sure
> it is the best value but at lease, i think it can work.
>
> > reschedule to be rare enough that one kvm_mmu_zap_all instance (between
> > schedule in and schedule out) to be able to release no less than a
> > thousand pages.
>
> Unfortunately, no.
>
> This information is I replied Gleb in his mail where he raced a question that
> why "collapse tlb flush is needed":
>
> ======
> It seems no.
> Since we have reloaded mmu before zapping the obsolete pages, the mmu-lock
> is easily contended. I did the simple track:
>
> + int num = 0;
> restart:
> list_for_each_entry_safe_reverse(sp, node,
> &kvm->arch.active_mmu_pages, link) {
> @@ -4265,6 +4265,7 @@ restart:
> if (batch >= BATCH_ZAP_PAGES &&
> cond_resched_lock(&kvm->mmu_lock)) {
> batch = 0;
> + num++;
> goto restart;
> }
>
> @@ -4277,6 +4278,7 @@ restart:
> * may use the pages.
> */
> kvm_mmu_commit_zap_page(kvm, &invalid_list);
> + printk("lock-break: %d.\n", num);
> }
>
> I do read pci rom when doing kernel building in the guest which
> has 1G memory and 4vcpus with ept enabled, this is the normal
> workload and normal configuration.
>
> # dmesg
> [ 2338.759099] lock-break: 8.
> [ 2339.732442] lock-break: 5.
> [ 2340.904446] lock-break: 3.
> [ 2342.513514] lock-break: 3.
> [ 2343.452229] lock-break: 3.
> [ 2344.981599] lock-break: 4.
>
> Basically, we need to break many times.
> ======
>
> You can see we should break 3 times to zap all pages even if we have zapoed
> 10 pages in batch. It is obviously that it need break more times without
> batch-zapping.

Yes, but this is not a real scenario, nor does it even describe a real
scenario, as far as i know.

Are you sure this minimum-batching-before-considering-reschedule is needed
even after the obsolete pages optimization?

I fail to see why.

2013-05-29 13:39:21

by Marcelo Tosatti

[permalink] [raw]
Subject: Re: [PATCH v7 04/11] KVM: MMU: zap pages in batch

On Wed, May 29, 2013 at 09:09:09PM +0800, Xiao Guangrong wrote:
> This information is I replied Gleb in his mail where he raced a question that
> why "collapse tlb flush is needed":
>
> ======
> It seems no.
> Since we have reloaded mmu before zapping the obsolete pages, the mmu-lock
> is easily contended. I did the simple track:
>
> + int num = 0;
> restart:
> list_for_each_entry_safe_reverse(sp, node,
> &kvm->arch.active_mmu_pages, link) {
> @@ -4265,6 +4265,7 @@ restart:
> if (batch >= BATCH_ZAP_PAGES &&
> cond_resched_lock(&kvm->mmu_lock)) {
> batch = 0;
> + num++;
> goto restart;
> }
>
> @@ -4277,6 +4278,7 @@ restart:
> * may use the pages.
> */
> kvm_mmu_commit_zap_page(kvm, &invalid_list);
> + printk("lock-break: %d.\n", num);
> }
>
> I do read pci rom when doing kernel building in the guest which
> has 1G memory and 4vcpus with ept enabled, this is the normal
> workload and normal configuration.
>
> # dmesg
> [ 2338.759099] lock-break: 8.
> [ 2339.732442] lock-break: 5.
> [ 2340.904446] lock-break: 3.
> [ 2342.513514] lock-break: 3.
> [ 2343.452229] lock-break: 3.
> [ 2344.981599] lock-break: 4.
>
> Basically, we need to break many times.

Should measure kvm_mmu_zap_all latency.

> ======
>
> You can see we should break 3 times to zap all pages even if we have zapoed
> 10 pages in batch. It is obviously that it need break more times without
> batch-zapping.

Again, breaking should be no problem, what matters is latency. Please
measure kvm_mmu_zap_all latency after all optimizations to justify
this minimum batching.

2013-05-29 13:43:39

by Xiao Guangrong

[permalink] [raw]
Subject: Re: [PATCH v7 09/11] KVM: MMU: introduce kvm_mmu_prepare_zap_obsolete_page

On 05/29/2013 08:25 PM, Marcelo Tosatti wrote:
> On Tue, May 28, 2013 at 10:51:38PM +0800, Xiao Guangrong wrote:
>> On 05/28/2013 08:13 AM, Marcelo Tosatti wrote:
>>> On Thu, May 23, 2013 at 03:55:58AM +0800, Xiao Guangrong wrote:
>>>> It is only used to zap the obsolete page. Since the obsolete page
>>>> will not be used, we need not spend time to find its unsync children
>>>> out. Also, we delete the page from shadow page cache so that the page
>>>> is completely isolated after call this function.
>>>>
>>>> The later patch will use it to collapse tlb flushes
>>>>
>>>> Signed-off-by: Xiao Guangrong <[email protected]>
>>>> ---
>>>> arch/x86/kvm/mmu.c | 46 +++++++++++++++++++++++++++++++++++++++++-----
>>>> 1 files changed, 41 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
>>>> index 9b57faa..e676356 100644
>>>> --- a/arch/x86/kvm/mmu.c
>>>> +++ b/arch/x86/kvm/mmu.c
>>>> @@ -1466,7 +1466,7 @@ static inline void kvm_mod_used_mmu_pages(struct kvm *kvm, int nr)
>>>> static void kvm_mmu_free_page(struct kvm_mmu_page *sp)
>>>> {
>>>> ASSERT(is_empty_shadow_page(sp->spt));
>>>> - hlist_del(&sp->hash_link);
>>>> + hlist_del_init(&sp->hash_link);
>>>> list_del(&sp->link);
>>>> free_page((unsigned long)sp->spt);
>>>> if (!sp->role.direct)
>>>> @@ -2069,14 +2069,19 @@ static int mmu_zap_unsync_children(struct kvm *kvm,
>>>> return zapped;
>>>> }
>>>>
>>>> -static int kvm_mmu_prepare_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp,
>>>> - struct list_head *invalid_list)
>>>> +static int
>>>> +__kvm_mmu_prepare_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp,
>>>> + bool zap_unsync_children,
>>>> + struct list_head *invalid_list)
>>>> {
>>>> - int ret;
>>>> + int ret = 0;
>>>>
>>>> trace_kvm_mmu_prepare_zap_page(sp);
>>>> ++kvm->stat.mmu_shadow_zapped;
>>>> - ret = mmu_zap_unsync_children(kvm, sp, invalid_list);
>>>> +
>>>> + if (likely(zap_unsync_children))
>>>> + ret = mmu_zap_unsync_children(kvm, sp, invalid_list);
>>>> +
>>>
>>> Why is this an important case to be optimized?
>>>
>>> 1) shadow is the uncommon, obsolete case.
>>> 2) mmu_zap_unsync_children has
>>>
>>> if (parent->role.level == PT_PAGE_TABLE_LEVEL)
>>> return 0;
>>>
>>> So the large majority of pages are already optimized.
>>
>> Hmm, if we zap the high level page (e.g level = 4), it should walk its
>> children and its children's children. It is high overload.
>> (IMHO, trivial optimization is still necessary, especially, the change
>> is really slight.)
>>
>> And, there is another point me mentioned in the changelog:
>> "Also, we delete the page from shadow page cache so that the page
>> is completely isolated after call this function."
>> Skipping zapping unsync-children can ensure that only one page is
>> zapped so that we can use "hlist_del_init(&sp->hash_link)" to completely
>> remove the page from mmu-cache.
>>
>> Now, Gleb and i got a agreement that skipping obsolete page when
>> walking hash list is a better way.
>>
>> BTW, zapping unsync-children is unnecessary, is it?
>
> It is necessary that if an unsync page exists, that
> invlpg emulation is able to reach it, or that at kvm_mmu_get_page
> time they are synchronized.

Hmmm? It is not always better.

If an unsync page is zapped, the mmu will map a newly allocated page whose
entries are all nonpresent. That can cause more #PF than syncing the page.
Especially in the invlpg case, you would zap a page that is still mapped in
another vcpu's page table and is currently being used.

Also, it does possibly-unneeded work up front - it spends time walking and
zapping all the unsync children even though they may never be used at all.
Delaying that work until the pages are actually used is better.

>
> You transfer the synchronization work to pagefault time, which directly
> affects guest performance, while it could have been done by the host
> (this was the reason for zapping unsync children).

It seems not - in most cases zap_page is done in vcpu context, not in the host.
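
For reference, the hash-list direction Gleb and i agreed on above is roughly
the sketch below. is_obsolete_sp() is the helper introduced earlier in this
series; the loop only illustrates how kvm_mmu_get_page() could skip obsolete
pages while walking the hash list - it is a sketch, not a tested patch:

static bool is_obsolete_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
{
        /* pages created before the last invalidation carry a stale generation */
        return unlikely(sp->mmu_valid_gen != kvm->arch.mmu_valid_gen);
}

        /* in kvm_mmu_get_page(), when searching the hash list: */
        for_each_gfn_sp(vcpu->kvm, sp, gfn) {
                if (is_obsolete_sp(vcpu->kvm, sp))
                        continue;

                /* reuse the existing shadow page as before */
        }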

2013-05-29 14:00:40

by Xiao Guangrong

[permalink] [raw]
Subject: Re: [PATCH v7 04/11] KVM: MMU: zap pages in batch

On 05/29/2013 09:21 PM, Marcelo Tosatti wrote:
> On Wed, May 29, 2013 at 09:09:09PM +0800, Xiao Guangrong wrote:
>> On 05/29/2013 07:11 PM, Marcelo Tosatti wrote:
>>> On Tue, May 28, 2013 at 11:02:09PM +0800, Xiao Guangrong wrote:
>>>> On 05/28/2013 08:18 AM, Marcelo Tosatti wrote:
>>>>> On Mon, May 27, 2013 at 10:20:12AM +0800, Xiao Guangrong wrote:
>>>>>> On 05/25/2013 04:34 AM, Marcelo Tosatti wrote:
>>>>>>> On Thu, May 23, 2013 at 03:55:53AM +0800, Xiao Guangrong wrote:
>>>>>>>> Zap at lease 10 pages before releasing mmu-lock to reduce the overload
>>>>>>>> caused by requiring lock
>>>>>>>>
>>>>>>>> After the patch, kvm_zap_obsolete_pages can forward progress anyway,
>>>>>>>> so update the comments
>>>>>>>>
>>>>>>>> [ It improves kernel building 0.6% ~ 1% ]
>>>>>>>
>>>>>>> Can you please describe the overload in more detail? Under what scenario
>>>>>>> is kernel building improved?
>>>>>>
>>>>>> Yes.
>>>>>>
>>>>>> The scenario is we do kernel building, meanwhile, repeatedly read PCI rom
>>>>>> every one second.
>>>>>>
>>>>>> [
>>>>>> echo 1 > /sys/bus/pci/devices/0000\:00\:03.0/rom
>>>>>> cat /sys/bus/pci/devices/0000\:00\:03.0/rom > /dev/null
>>>>>> ]
>>>>>
>>>>> Can't see why it reflects real world scenario (or a real world
>>>>> scenario with same characteristics regarding kvm_mmu_zap_all vs faults)?
>>>>>
>>>>> Point is, it would be good to understand why this change
>>>>> is improving performance? What are these cases where breaking out of
>>>>> kvm_mmu_zap_all due to either (need_resched || spin_needbreak) on zapped
>>>>> < 10 ?
>>>>
>>>> When guest read ROM, qemu will set the memory to map the device's firmware,
>>>> that is why kvm_mmu_zap_all can be called in the scenario.
>>>>
>>>> The reasons why it heart the performance are:
>>>> 1): Qemu use a global io-lock to sync all vcpu, so that the io-lock is held
>>>> when we do kvm_mmu_zap_all(). If kvm_mmu_zap_all() is not efficient, all
>>>> other vcpus need wait a long time to do I/O.
>>>>
>>>> 2): kvm_mmu_zap_all() is triggered in vcpu context. so it can block the IPI
>>>> request from other vcpus.
>>>>
>>>> Is it enough?
>>>
>>> That is no problem. The problem is why you chose "10" as the minimum number of
>>> pages to zap before considering reschedule. I would expect the need to
>>
>> Well, my description above explained why batch-zapping is needed - we do
>> not want the vcpu spend lots of time to zap all pages because it hurts other
>> vcpus running.
>>
>> But, why the batch page number is "10"... I can not answer this, i just guessed
>> that '10' can make vcpu do not spend long time on zap_all_pages and do
>> not cause mmu-lock too hungry. "10" is the speculative value and i am not sure
>> it is the best value but at lease, i think it can work.
>>
>>> reschedule to be rare enough that one kvm_mmu_zap_all instance (between
>>> schedule in and schedule out) to be able to release no less than a
>>> thousand pages.
>>
>> Unfortunately, no.
>>
>> This information is I replied Gleb in his mail where he raced a question that
>> why "collapse tlb flush is needed":
>>
>> ======
>> It seems no.
>> Since we have reloaded mmu before zapping the obsolete pages, the mmu-lock
>> is easily contended. I did the simple track:
>>
>> + int num = 0;
>> restart:
>> list_for_each_entry_safe_reverse(sp, node,
>> &kvm->arch.active_mmu_pages, link) {
>> @@ -4265,6 +4265,7 @@ restart:
>> if (batch >= BATCH_ZAP_PAGES &&
>> cond_resched_lock(&kvm->mmu_lock)) {
>> batch = 0;
>> + num++;
>> goto restart;
>> }
>>
>> @@ -4277,6 +4278,7 @@ restart:
>> * may use the pages.
>> */
>> kvm_mmu_commit_zap_page(kvm, &invalid_list);
>> + printk("lock-break: %d.\n", num);
>> }
>>
>> I do read pci rom when doing kernel building in the guest which
>> has 1G memory and 4vcpus with ept enabled, this is the normal
>> workload and normal configuration.
>>
>> # dmesg
>> [ 2338.759099] lock-break: 8.
>> [ 2339.732442] lock-break: 5.
>> [ 2340.904446] lock-break: 3.
>> [ 2342.513514] lock-break: 3.
>> [ 2343.452229] lock-break: 3.
>> [ 2344.981599] lock-break: 4.
>>
>> Basically, we need to break many times.
>> ======
>>
>> You can see we should break 3 times to zap all pages even if we have zapoed
>> 10 pages in batch. It is obviously that it need break more times without
>> batch-zapping.
>
> Yes, but this is not a real scenario, nor does it even describe a real
> scenario, as far as i know.

Aha.

Okay, maybe "read rom" is not the common case, but a vcpu can trigger it, or a
guest driver may do so in the future. What happens if a vcpu triggers it? The
worst case is that one vcpu keeps breaking out of mmu-lock because another vcpu
is doing intense memory access, while the rest of the vcpus are waiting for IO
or an IPI. That easily leads to a soft lockup.

Even worse, if host memory is really low and the host keeps trying to reclaim
qemu's memory, the lock stays hot and we cannot zap even one page before being
rescheduled.

>
> Are you sure this minimum-batching-before-considering-reschedule is needed
> even after the obsolete pages optimization?

Yes, this trace was taken after all the patches in this series were applied.

2013-05-29 14:02:24

by Xiao Guangrong

[permalink] [raw]
Subject: Re: [PATCH v7 04/11] KVM: MMU: zap pages in batch

On 05/29/2013 09:32 PM, Marcelo Tosatti wrote:
> On Wed, May 29, 2013 at 09:09:09PM +0800, Xiao Guangrong wrote:
>> This information is I replied Gleb in his mail where he raced a question that
>> why "collapse tlb flush is needed":
>>
>> ======
>> It seems no.
>> Since we have reloaded mmu before zapping the obsolete pages, the mmu-lock
>> is easily contended. I did the simple track:
>>
>> + int num = 0;
>> restart:
>> list_for_each_entry_safe_reverse(sp, node,
>> &kvm->arch.active_mmu_pages, link) {
>> @@ -4265,6 +4265,7 @@ restart:
>> if (batch >= BATCH_ZAP_PAGES &&
>> cond_resched_lock(&kvm->mmu_lock)) {
>> batch = 0;
>> + num++;
>> goto restart;
>> }
>>
>> @@ -4277,6 +4278,7 @@ restart:
>> * may use the pages.
>> */
>> kvm_mmu_commit_zap_page(kvm, &invalid_list);
>> + printk("lock-break: %d.\n", num);
>> }
>>
>> I do read pci rom when doing kernel building in the guest which
>> has 1G memory and 4vcpus with ept enabled, this is the normal
>> workload and normal configuration.
>>
>> # dmesg
>> [ 2338.759099] lock-break: 8.
>> [ 2339.732442] lock-break: 5.
>> [ 2340.904446] lock-break: 3.
>> [ 2342.513514] lock-break: 3.
>> [ 2343.452229] lock-break: 3.
>> [ 2344.981599] lock-break: 4.
>>
>> Basically, we need to break many times.
>
> Should measure kvm_mmu_zap_all latency.
>
>> ======
>>
>> You can see we should break 3 times to zap all pages even if we have zapoed
>> 10 pages in batch. It is obviously that it need break more times without
>> batch-zapping.
>
> Again, breaking should be no problem, what matters is latency. Please
> measure kvm_mmu_zap_all latency after all optimizations to justify
> this minimum batching.

Okay, okay. I will benchmark the latency.

2013-05-29 16:04:05

by Xiao Guangrong

[permalink] [raw]
Subject: Re: [PATCH v7 04/11] KVM: MMU: zap pages in batch

On 05/29/2013 10:02 PM, Xiao Guangrong wrote:
> On 05/29/2013 09:32 PM, Marcelo Tosatti wrote:
>> On Wed, May 29, 2013 at 09:09:09PM +0800, Xiao Guangrong wrote:
>>> This information is I replied Gleb in his mail where he raced a question that
>>> why "collapse tlb flush is needed":
>>>
>>> ======
>>> It seems no.
>>> Since we have reloaded mmu before zapping the obsolete pages, the mmu-lock
>>> is easily contended. I did the simple track:
>>>
>>> + int num = 0;
>>> restart:
>>> list_for_each_entry_safe_reverse(sp, node,
>>> &kvm->arch.active_mmu_pages, link) {
>>> @@ -4265,6 +4265,7 @@ restart:
>>> if (batch >= BATCH_ZAP_PAGES &&
>>> cond_resched_lock(&kvm->mmu_lock)) {
>>> batch = 0;
>>> + num++;
>>> goto restart;
>>> }
>>>
>>> @@ -4277,6 +4278,7 @@ restart:
>>> * may use the pages.
>>> */
>>> kvm_mmu_commit_zap_page(kvm, &invalid_list);
>>> + printk("lock-break: %d.\n", num);
>>> }
>>>
>>> I do read pci rom when doing kernel building in the guest which
>>> has 1G memory and 4vcpus with ept enabled, this is the normal
>>> workload and normal configuration.
>>>
>>> # dmesg
>>> [ 2338.759099] lock-break: 8.
>>> [ 2339.732442] lock-break: 5.
>>> [ 2340.904446] lock-break: 3.
>>> [ 2342.513514] lock-break: 3.
>>> [ 2343.452229] lock-break: 3.
>>> [ 2344.981599] lock-break: 4.
>>>
>>> Basically, we need to break many times.
>>
>> Should measure kvm_mmu_zap_all latency.
>>
>>> ======
>>>
>>> You can see we should break 3 times to zap all pages even if we have zapoed
>>> 10 pages in batch. It is obviously that it need break more times without
>>> batch-zapping.
>>
>> Again, breaking should be no problem, what matters is latency. Please
>> measure kvm_mmu_zap_all latency after all optimizations to justify
>> this minimum batching.
>
> Okay, okay. I will benchmark the latency.

Okay, I have done the test. The test environment is the same as before:
reading the pci rom while doing a kernel build in a guest which has 1G memory
and 4 vcpus with ept enabled - a normal workload and normal configuration.

Batch-zapped:
Guest:
# cat /sys/bus/pci/devices/0000\:00\:03.0/rom
# free -m
total used free shared buffers cached
Mem: 975 793 181 0 6 438
-/+ buffers/cache: 347 627
Swap: 2015 43 1972

Host shows:
[ 2229.918558] lock-break: 5.
[ 2229.918564] kvm_mmu_invalidate_zap_all_pages: 174706e.


No-batch:
Guest:
# cat /sys/bus/pci/devices/0000\:00\:03.0/rom
# free -m
total used free shared buffers cached
Mem: 975 843 131 0 17 476
-/+ buffers/cache: 348 626
Swap: 2015 2

Host shows:
[ 2931.675285] lock-break: 13.
[ 2931.675291] kvm_mmu_invalidate_zap_all_pages: 69c1676.

That means, with nearly the same amount of memory accessed in the guest:
- batch-zapped needs to break 5 times; the latency is 174706e.
- no-batch needs to break 13 times; the latency is 69c1676.
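
(Assuming local_clock() reports nanoseconds here - the printk below prints the
delta with %llx, i.e. in hex - the two figures decode to roughly:

    0x174706e ns =  24,408,174 ns ~=  24.4 ms   (batch-zapped)
    0x69c1676 ns = 110,892,662 ns ~= 110.9 ms   (no-batch)

so batching cuts the zap-all latency by about 4.5x in this run.)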

The code change to track the latency:

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 055d675..a66f21b 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -4233,13 +4233,13 @@ void kvm_mmu_slot_remove_write_access(struct kvm *kvm, int slot)
         spin_unlock(&kvm->mmu_lock);
 }

-#define BATCH_ZAP_PAGES 10
+#define BATCH_ZAP_PAGES 0
 static void kvm_zap_obsolete_pages(struct kvm *kvm)
 {
         struct kvm_mmu_page *sp, *node;
         LIST_HEAD(invalid_list);
         int batch = 0;
-
+        int num = 0;
 restart:
         list_for_each_entry_safe_reverse(sp, node,
               &kvm->arch.active_mmu_pages, link) {
@@ -4265,6 +4265,7 @@ restart:
                 if (batch >= BATCH_ZAP_PAGES &&
                       cond_resched_lock(&kvm->mmu_lock)) {
                         batch = 0;
+                        num++;
                         goto restart;
                 }

@@ -4277,6 +4278,7 @@ restart:
          * may use the pages.
          */
         kvm_mmu_commit_zap_page(kvm, &invalid_list);
+        printk("lock-break: %d.\n", num);
 }

 /*
@@ -4290,7 +4292,12 @@ restart:
  */
 void kvm_mmu_invalidate_zap_all_pages(struct kvm *kvm)
 {
+        u64 start;
+
         spin_lock(&kvm->mmu_lock);
+
+        start = local_clock();
+
         trace_kvm_mmu_invalidate_zap_all_pages(kvm);
         kvm->arch.mmu_valid_gen++;

@@ -4306,6 +4313,9 @@ void kvm_mmu_invalidate_zap_all_pages(struct kvm *kvm)
         kvm_reload_remote_mmus(kvm);

         kvm_zap_obsolete_pages(kvm);
+
+        printk("%s: %llx.\n", __FUNCTION__, local_clock() - start);
+
         spin_unlock(&kvm->mmu_lock);
 }


2013-05-30 00:54:14

by Gleb Natapov

[permalink] [raw]
Subject: Re: [PATCH v7 10/11] KVM: MMU: collapse TLB flushes when zap all pages

On Wed, May 29, 2013 at 09:19:41PM +0800, Xiao Guangrong wrote:
> On 05/29/2013 08:39 PM, Marcelo Tosatti wrote:
> > On Wed, May 29, 2013 at 11:03:19AM +0800, Xiao Guangrong wrote:
> >>>>> the pages since other vcpus may be doing locklessly shadow
> >>>>> page walking
> >>>
> >>> Ah, yes, i agree with you.
> >>>
> >>> We can introduce a list, say kvm->arch.obsolte_pages, to link all of the
> >>> zapped-page, the page-shrink will free the page on that list first.
> >>>
> >>> Marcelo, if you do not have objection on patch 1 ~ 8 and 11, could you please
> >>> let them merged first, and do add some comments and tlb optimization later?
> >>
> >> Exclude patch 11 please, since it depends on the "collapse" optimization.
> >
> > I'm fine with patch 1 being merged. I think the remaining patches need better
> > understanding or explanation. The problems i see are:
> >
> > 1) The magic number "10" to zap before considering reschedule is
> > annoying. It would be good to understand why it is needed at all.
>
> ......
>
> >
> > But then again, the testcase is measuring kvm_mmu_zap_all performance
> > alone which we know is not a common operation, so perhaps there is
> > no need for that minimum-pages-to-zap-before-reschedule.
>
> Well. Although this is not a common operation, it can be triggered by a
> VCPU - if one VCPU takes a long time on zap-all-pages, the other vcpus miss
> IPI-sync or miss IO. This can easily cause soft lockups if the vcpu is doing
> memslot-related things.
>
+1. If it is triggerable by a guest it may slow down the guest, but we
should not allow it to slow down the host.

--
Gleb.

2013-05-30 16:24:59

by Takuya Yoshikawa

[permalink] [raw]
Subject: Re: [PATCH v7 10/11] KVM: MMU: collapse TLB flushes when zap all pages

On Thu, 30 May 2013 03:53:38 +0300
Gleb Natapov <[email protected]> wrote:

> On Wed, May 29, 2013 at 09:19:41PM +0800, Xiao Guangrong wrote:
> > On 05/29/2013 08:39 PM, Marcelo Tosatti wrote:
> > > On Wed, May 29, 2013 at 11:03:19AM +0800, Xiao Guangrong wrote:
> > >>>>> the pages since other vcpus may be doing locklessly shadow
> > >>>>> page walking
> > >>>
> > >>> Ah, yes, i agree with you.
> > >>>
> > >>> We can introduce a list, say kvm->arch.obsolte_pages, to link all of the
> > >>> zapped-page, the page-shrink will free the page on that list first.
> > >>>
> > >>> Marcelo, if you do not have objection on patch 1 ~ 8 and 11, could you please
> > >>> let them merged first, and do add some comments and tlb optimization later?
> > >>
> > >> Exclude patch 11 please, since it depends on the "collapse" optimization.
> > >
> > > I'm fine with patch 1 being merged. I think the remaining patches need better
> > > understanding or explanation. The problems i see are:
> > >
> > > 1) The magic number "10" to zap before considering reschedule is
> > > annoying. It would be good to understand why it is needed at all.
> >
> > ......
> >
> > >
> > > But then again, the testcase is measuring kvm_mmu_zap_all performance
> > > alone which we know is not a common operation, so perhaps there is
> > > no need for that minimum-pages-to-zap-before-reschedule.
> >
> > Well. Although this is not a common operation, it can be triggered by a
> > VCPU - if one VCPU takes a long time on zap-all-pages, the other vcpus miss
> > IPI-sync or miss IO. This can easily cause soft lockups if the vcpu is doing
> > memslot-related things.
> >
> +1. If it is triggerable by a guest it may slow down the guest, but we
> should not allow it to slow down the host.
>

Well, I don't object to the minimum-pages-to-zap-before-reschedule idea
itself, but if you're going to take patch 4, please at least add a warning
in the changelog that the magic number "10" was selected without good enough
reasoning.

"[ It improves kernel building 0.6% ~ 1% ]" alone will make it hard for
others to change the number later.

I actually once tried to do a similar thing for other code. So I have a
possible reasoning for this, and 10 should probably be changed later.

Takuya

2013-05-30 17:11:11

by Takuya Yoshikawa

[permalink] [raw]
Subject: Re: [PATCH v7 10/11] KVM: MMU: collapse TLB flushes when zap all pages

On Fri, 31 May 2013 01:24:43 +0900
Takuya Yoshikawa <[email protected]> wrote:

> On Thu, 30 May 2013 03:53:38 +0300
> Gleb Natapov <[email protected]> wrote:
>
> > On Wed, May 29, 2013 at 09:19:41PM +0800, Xiao Guangrong wrote:
> > > On 05/29/2013 08:39 PM, Marcelo Tosatti wrote:
> > > > On Wed, May 29, 2013 at 11:03:19AM +0800, Xiao Guangrong wrote:
> > > >>>>> the pages since other vcpus may be doing locklessly shadow
> > > >>>>> page walking
> > > >>>
> > > >>> Ah, yes, i agree with you.
> > > >>>
> > > >>> We can introduce a list, say kvm->arch.obsolte_pages, to link all of the
> > > >>> zapped-page, the page-shrink will free the page on that list first.
> > > >>>
> > > >>> Marcelo, if you do not have objection on patch 1 ~ 8 and 11, could you please
> > > >>> let them merged first, and do add some comments and tlb optimization later?
> > > >>
> > > >> Exclude patch 11 please, since it depends on the "collapse" optimization.
> > > >
> > > > I'm fine with patch 1 being merged. I think the remaining patches need better
> > > > understanding or explanation. The problems i see are:
> > > >
> > > > 1) The magic number "10" to zap before considering reschedule is
> > > > annoying. It would be good to understand why it is needed at all.
> > >
> > > ......
> > >
> > > >
> > > > But then again, the testcase is measuring kvm_mmu_zap_all performance
> > > > alone which we know is not a common operation, so perhaps there is
> > > > no need for that minimum-pages-to-zap-before-reschedule.
> > >
> > > Well. Although this is not a common operation, it can be triggered by a
> > > VCPU - if one VCPU takes a long time on zap-all-pages, the other vcpus miss
> > > IPI-sync or miss IO. This can easily cause soft lockups if the vcpu is doing
> > > memslot-related things.
> > >
> > +1. If it is triggerable by a guest it may slow down the guest, but we
> > should not allow it to slow down the host.
> >
>
> Well, I don't object to the minimum-pages-to-zap-before-reschedule idea
> itself, but if you're going to take patch 4, please at least add a warning
> in the changelog that the magic number "10" was selected without good enough
> reasoning.
>
> "[ It improves kernel building 0.6% ~ 1% ]" alone will make it hard for
> others to change the number later.
>
> I actually once tried to do a similar thing for other code. So I have a
> possible reasoning for this, and 10 should probably be changed later.
>

In this case, the solution seems to be very simple: just drop spin_needbreak()
and leave need_resched() alone.

This way we can guarantee that zap-all will get a fair amount of CPU time for
each scheduling from the host scheduler's point of view. Of course this can
block other VCPU threads waiting for mmu_lock during that time slice, but
should be much better than blocking them for some magical number of zappings.
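
Roughly, in kvm_zap_obsolete_pages() that would look something like the sketch
below (just an illustration, not a tested patch; I commit the pending
invalid_list before dropping mmu_lock only to keep the sketch obviously safe):

        if (need_resched()) {
                /*
                 * Break only when the host scheduler wants the CPU back;
                 * mmu_lock contention alone no longer causes a break, so
                 * no magic batch number is needed.
                 */
                kvm_mmu_commit_zap_page(kvm, &invalid_list);
                spin_unlock(&kvm->mmu_lock);
                cond_resched();
                spin_lock(&kvm->mmu_lock);
                goto restart;
        }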

We also need to remember that spin_needbreak() does not do anything for some
preempt config settings.

Takuya