From: David Matlack
Date: Wed, 20 Oct 2021 08:56:03 -0700
Subject: Re: [PATCHV2 1/3] KVM: x86: introduce kvm_mmu_pte_prefetch structure
To: Sergey Senozhatsky
Cc: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
    Jim Mattson, Joerg Roedel, Suleiman Souhlal, kvm list, LKML

On Tue, Oct 19, 2021 at 6:24 PM Sergey Senozhatsky wrote:
>
> On (21/10/19 15:44), David Matlack wrote:
> > On Tue, Oct 19, 2021 at 8:32 AM Sergey Senozhatsky wrote:
> > >
> > > kvm_mmu_pte_prefetch is a per-VCPU structure that holds a PTE
> > > prefetch pages array, lock and the number of PTEs to prefetch.
> > >
> > > This is needed to turn PTE_PREFETCH_NUM into a tunable VM
> > > parameter.
> > >
> > > Signed-off-by: Sergey Senozhatsky
> > > ---
> > >  arch/x86/include/asm/kvm_host.h | 12 +++++++
> > >  arch/x86/kvm/mmu.h              |  4 +++
> > >  arch/x86/kvm/mmu/mmu.c          | 57 ++++++++++++++++++++++++++++++---
> > >  arch/x86/kvm/x86.c              |  9 +++++-
> > >  4 files changed, 77 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > > index 5271fce6cd65..11400bc3c70d 100644
> > > --- a/arch/x86/include/asm/kvm_host.h
> > > +++ b/arch/x86/include/asm/kvm_host.h
> > > @@ -607,6 +607,16 @@ struct kvm_vcpu_xen {
> > >         u64 runstate_times[4];
> > >  };
> > >
> > > +struct kvm_mmu_pte_prefetch {
> > > +       /*
> > > +        * This will be cast either to array of pointers to struct page,
> > > +        * or array of u64, or array of u32
> > > +        */
> > > +       void *ents;
> > > +       unsigned int num_ents;
> > > +       spinlock_t lock;
> >
> > The spinlock is overkill. I'd suggest something like this:
> > - When VM-ioctl is invoked to update prefetch count, store it in
> > kvm_arch. No synchronization with vCPUs needed.
> > - When a vCPU takes a fault: Read the prefetch count from kvm_arch. If
> > different than count at last fault, re-allocate vCPU prefetch array.
> > (So you'll need to add prefetch array and count to kvm_vcpu_arch as
> > well.)
> >
> > No extra locks are needed. vCPUs that fault after the VM-ioctl will
> > get the new prefetch count. We don't really care if a prefetch count
> > update races with a vCPU fault as long as vCPUs are careful to only
> > read the count once (i.e. use READ_ONCE(vcpu->kvm.prefetch_count)) and
> > use that. Assuming prefetch count ioctls are rare, the re-allocation
> > on the fault path will be rare as well.
>
> So reallocation from the fault path should happen before vCPU takes the
> mmu_lock?

Yes. Take a look at mmu_topup_memory_caches for an example of allocating
in the fault path prior to taking the mmu lock.

> And READ_ONCE(prefetch_count) should also happen before vCPU
> takes mmu_lock, I assume, so we need to pass it as a parameter to all
> the functions that will access prefetch array.

Store the value of READ_ONCE(prefetch_count) in struct kvm_vcpu_arch
because you also need to know if it changes on the next fault. Then you
also don't have to add a parameter to a bunch of functions in the fault
path.
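To sketch what I mean (completely untested, and the field names
pte_prefetch_count / pte_prefetch_ents are just placeholders, not
anything that exists in the tree today):

/*
 * Illustrative only -- not from the patch. Called from the fault path
 * before mmu_lock is taken, in the same spirit as
 * mmu_topup_memory_caches().
 */
static int kvm_mmu_topup_pte_prefetch_array(struct kvm_vcpu *vcpu)
{
        /* Snapshot the per-VM tunable exactly once per fault. */
        unsigned int count = READ_ONCE(vcpu->kvm->arch.pte_prefetch_count);
        u64 *ents;

        /* Fast path: the tunable has not changed since the last fault. */
        if (count == vcpu->arch.pte_prefetch_count)
                return 0;

        ents = kvcalloc(count, sizeof(*ents), GFP_KERNEL_ACCOUNT);
        if (!ents)
                return -ENOMEM;

        kvfree(vcpu->arch.pte_prefetch_ents);
        vcpu->arch.pte_prefetch_ents = ents;
        vcpu->arch.pte_prefetch_count = count;
        return 0;
}

The ioctl side would then just WRITE_ONCE() the new value into kvm->arch
and return.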
>
> > Note: You could apply this same approach to a module param, except
> > vCPUs would be reading the module param rather than vcpu->kvm during
> > each fault.
> >
> > And the other alternative, like you suggested in the other patch, is
> > to use a vCPU ioctl. That would side-step the synchronization issue
> > because vCPU ioctls require the vCPU mutex. So the reallocation could
> > be done in the ioctl and not at fault time.
>
> One more idea, I wonder what you think:
>
> There is an upper limit on the number of PTEs we prefault, which is 128
> as of now, but I think 64 will be good enough, or maybe even 32. So we
> can always allocate MAX_PTE_PREFETCH_NUM arrays in vcpu->arch and
> ioctl() will change ->num_ents only, which is always in
> (0, MAX_PTE_PREFETCH_NUM - 1] range. This way we never have to
> reallocate anything, we just adjust the "maximum index" value.

128 * 8 would be 1KB per vCPU. That is probably reasonable, but I don't
think the re-allocation would be that complex.

> > Taking a step back, can you say a bit more about your usecase?
>
> We are looking at various ways of reducing the number of vm-exits. There
> is only one VM running on the device (a pretty low-end laptop).

When you say reduce the number of vm-exits, can you be more specific? Are
you trying to reduce the time it takes for vCPUs to fault in memory
during VM startup? I just mention this because there are likely other
techniques you can apply that would not require modifying KVM code (e.g.
prefaulting the host memory before running the VM, using the TDP MMU
instead of the legacy MMU to allow parallel faults, using hugepages to
map in more memory per fault, etc.).
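For the prefaulting idea, I'm thinking of something along these lines in
the VMM -- plain userspace code, with the size picked arbitrarily for
illustration:

#define _DEFAULT_SOURCE
#include <sys/mman.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
        size_t guest_mem_size = 1UL << 30;      /* 1 GiB, arbitrary */
        void *mem;

        /* MAP_POPULATE pre-faults the whole range at mmap() time. */
        mem = mmap(NULL, guest_mem_size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE, -1, 0);
        if (mem == MAP_FAILED) {
                perror("mmap");
                return EXIT_FAILURE;
        }

        /*
         * A real VMM would register this range with
         * KVM_SET_USER_MEMORY_REGION; here we just unmap it again.
         */
        munmap(mem, guest_mem_size);
        return EXIT_SUCCESS;
}

Simply touching every page before KVM_RUN accomplishes the same thing.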