Date: Wed, 20 Oct 2021 10:23:54 +0900
From: Sergey Senozhatsky
To: David Matlack
Cc: Sergey Senozhatsky, Paolo Bonzini, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Suleiman Souhlal, kvm list, LKML
Subject: Re: [PATCHV2 1/3] KVM: x86: introduce kvm_mmu_pte_prefetch structure
References: <20211019153214.109519-1-senozhatsky@chromium.org>
	<20211019153214.109519-2-senozhatsky@chromium.org>

On (21/10/19 15:44), David Matlack wrote:
> On Tue, Oct 19, 2021 at 8:32 AM Sergey Senozhatsky
> wrote:
> >
> > kvm_mmu_pte_prefetch is a per-VCPU structure that holds a PTE
> > prefetch pages array, lock and the number of PTE to prefetch.
> >
> > This is needed to turn PTE_PREFETCH_NUM into a tunable VM
> > parameter.
> >
> > Signed-off-by: Sergey Senozhatsky
> > ---
> >  arch/x86/include/asm/kvm_host.h | 12 +++++++
> >  arch/x86/kvm/mmu.h              |  4 +++
> >  arch/x86/kvm/mmu/mmu.c          | 57 ++++++++++++++++++++++++++++++---
> >  arch/x86/kvm/x86.c              |  9 +++++-
> >  4 files changed, 77 insertions(+), 5 deletions(-)
> >
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index 5271fce6cd65..11400bc3c70d 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -607,6 +607,16 @@ struct kvm_vcpu_xen {
> >  	u64 runstate_times[4];
> >  };
> >
> > +struct kvm_mmu_pte_prefetch {
> > +	/*
> > +	 * This will be cast either to array of pointers to struct page,
> > +	 * or array of u64, or array of u32
> > +	 */
> > +	void *ents;
> > +	unsigned int num_ents;
> > +	spinlock_t lock;
>
> The spinlock is overkill. I'd suggest something like this:
> - When VM-ioctl is invoked to update prefetch count, store it in
>   kvm_arch. No synchronization with vCPUs needed.
> - When a vCPU takes a fault: Read the prefetch count from kvm_arch. If
>   different than count at last fault, re-allocate vCPU prefetch array.
>   (So you'll need to add prefetch array and count to kvm_vcpu_arch as
>   well.)
>
> No extra locks are needed. vCPUs that fault after the VM-ioctl will
> get the new prefetch count. We don't really care if a prefetch count
> update races with a vCPU fault as long as vCPUs are careful to only
> read the count once (i.e. use READ_ONCE(vcpu->kvm.prefetch_count)) and
> use that. Assuming prefetch count ioctls are rare, the re-allocation
> on the fault path will be rare as well.

So reallocation from the fault path should happen before the vCPU takes
the mmu_lock? And READ_ONCE(prefetch_count) should also happen before
the vCPU takes mmu_lock, I assume, so we need to pass the value as a
parameter to all the functions that will access the prefetch array.

> Note: You could apply this same approach to a module param, except
> vCPUs would be reading the module param rather than vcpu->kvm during
> each fault.
>
> And the other alternative, like you suggested in the other patch, is
> to use a vCPU ioctl. That would side-step the synchronization issue
> because vCPU ioctls require the vCPU mutex. So the reallocation could
> be done in the ioctl and not at fault time.

One more idea, I wonder what you think of it:

There is an upper limit on the number of PTEs we prefault, which is 128
as of now, but I think 64 will be good enough, or maybe even 32. So we
can always allocate MAX_PTE_PREFETCH_NUM arrays in vcpu->arch and the
ioctl() will change ->num_ents only, which is always in the
(0, MAX_PTE_PREFETCH_NUM - 1] range. This way we never have to
reallocate anything, we just adjust the "maximum index" value.

> Taking a step back, can you say a bit more about your usecase?

We are looking at various ways of reducing the number of vm-exits.
There is only one VM running on the device (a pretty low-end laptop).
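
For the record, this is roughly how I read the READ_ONCE()-based scheme
above: the VM ioctl only writes the per-VM count, and the vCPU re-checks
it on the fault path, before mmu_lock, and reallocates its private array
if the count changed. This is an untested sketch; all of the field and
helper names (pte_prefetch_num, kvm_vcpu_refresh_pte_prefetch, ...) are
made up for illustration and are not existing KVM symbols, and the
element type is simplified to u64:

	static int kvm_vcpu_refresh_pte_prefetch(struct kvm_vcpu *vcpu)
	{
		/* Read the per-VM value once, before mmu_lock is taken. */
		unsigned int num = READ_ONCE(vcpu->kvm->arch.pte_prefetch_num);
		u64 *ents;

		if (num == vcpu->arch.pte_prefetch.num_ents)
			return 0;

		/* Sleeping allocation is fine here: mmu_lock is not held yet. */
		ents = kvmalloc_array(num, sizeof(*ents), GFP_KERNEL_ACCOUNT);
		if (!ents)
			return -ENOMEM;

		kvfree(vcpu->arch.pte_prefetch.ents);
		vcpu->arch.pte_prefetch.ents = ents;
		vcpu->arch.pte_prefetch.num_ents = num;

		return 0;
	}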
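
And a similarly hypothetical sketch of the fixed-size alternative, where
the per-vCPU array is always sized for the upper limit and the ioctl only
validates and stores the count, so nothing is ever reallocated (again,
KVM_MAX_PTE_PREFETCH_NUM, kvm_set_pte_prefetch and pte_prefetch_num are
illustrative names only):

	#define KVM_MAX_PTE_PREFETCH_NUM	64

	struct kvm_mmu_pte_prefetch {
		u64 ents[KVM_MAX_PTE_PREFETCH_NUM];
		unsigned int num_ents;	/* how many entries of @ents to use */
	};

	/* VM ioctl: only the count changes, the array is never resized. */
	static int kvm_set_pte_prefetch(struct kvm *kvm, u64 num)
	{
		if (!num || num > KVM_MAX_PTE_PREFETCH_NUM)
			return -EINVAL;

		WRITE_ONCE(kvm->arch.pte_prefetch_num, (unsigned int)num);
		return 0;
	}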