From: David Matlack
Date: Tue, 12 Oct 2021 09:50:26 -0700
Subject: Re: [PATCH] KVM: MMU: make PTE_PREFETCH_NUM tunable
To: Sergey Senozhatsky
Cc: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
    Jim Mattson, Joerg Roedel, Suleiman Souhlal, kvm list, LKML
In-Reply-To: <20211012091430.1754492-1-senozhatsky@chromium.org>

On Tue, Oct 12, 2021 at 2:16 AM Sergey Senozhatsky wrote:
>
> Turn PTE_PREFETCH_NUM into a module parameter, so that it
> can be tuned per-VM.

Module parameters do not allow tuning per VM; they affect every VM on
the machine. If you want per-VM tuning you could introduce a VM ioctl.

>
> - /sys/module/kvm/parameters/pte_prefetch_num 8
>
> VM-EXIT              Samples  Samples%  Time%  Min Time    Max Time        Avg time
>
> EPT_VIOLATION         760998    54.85%  7.23%    0.92us  31765.89us   7.78us ( +-  1.46% )
> MSR_WRITE             170599    12.30%  0.53%    0.60us   3334.13us   2.52us ( +-  0.86% )
> EXTERNAL_INTERRUPT    159510    11.50%  1.65%    0.49us  43705.81us   8.45us ( +-  7.54% )
> [..]
>
> Total Samples:1387305, Total events handled time:81900258.99us.
>
> - /sys/module/kvm/parameters/pte_prefetch_num 16
>
> VM-EXIT              Samples  Samples%  Time%  Min Time    Max Time        Avg time
>
> EPT_VIOLATION         658064    52.58%  7.04%    0.91us  17022.84us   8.34us ( +-  1.52% )
> MSR_WRITE             163776    13.09%  0.54%    0.56us   5192.10us   2.57us ( +-  1.25% )
> EXTERNAL_INTERRUPT    144588    11.55%  1.62%    0.48us  97410.16us   8.75us ( +- 11.44% )
> [..]
>
> Total Samples:1251546, Total events handled time:77956187.56us.
>
> Signed-off-by: Sergey Senozhatsky
> ---
>  arch/x86/kvm/mmu/mmu.c | 31 ++++++++++++++++++++++---------

Please also update the shadow paging prefetching code in
arch/x86/kvm/mmu/paging_tmpl.h, unless there is a good reason to
diverge.

>  1 file changed, 22 insertions(+), 9 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 24a9f4c3f5e7..0ab4490674ec 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -115,6 +115,8 @@ module_param(dbg, bool, 0644);
>  #endif
>
>  #define PTE_PREFETCH_NUM 8
> +static uint __read_mostly pte_prefetch_num = PTE_PREFETCH_NUM;
> +module_param(pte_prefetch_num, uint, 0644);
>
>  #define PT32_LEVEL_BITS 10
>
> @@ -732,7 +734,7 @@ static int mmu_topup_memory_caches(struct kvm_vcpu *vcpu, bool maybe_indirect)
>
>         /* 1 rmap, 1 parent PTE per level, and the prefetched rmaps. */
>         r = kvm_mmu_topup_memory_cache(&vcpu->arch.mmu_pte_list_desc_cache,
> -                                      1 + PT64_ROOT_MAX_LEVEL + PTE_PREFETCH_NUM);
> +                                      1 + PT64_ROOT_MAX_LEVEL + pte_prefetch_num);

There is a sampling problem. What happens if the user changes
pte_prefetch_num while a fault is being handled?

>         if (r)
>                 return r;
>         r = kvm_mmu_topup_memory_cache(&vcpu->arch.mmu_shadow_page_cache,
> @@ -2753,20 +2755,29 @@ static int direct_pte_prefetch_many(struct kvm_vcpu *vcpu,
>                                     struct kvm_mmu_page *sp,
>                                     u64 *start, u64 *end)
>  {
> -       struct page *pages[PTE_PREFETCH_NUM];
> +       struct page **pages;
>         struct kvm_memory_slot *slot;
>         unsigned int access = sp->role.access;
>         int i, ret;
>         gfn_t gfn;
>
> +       pages = kmalloc_array(pte_prefetch_num, sizeof(struct page *),
> +                             GFP_KERNEL);

This code runs with the MMU lock held. From
https://www.kernel.org/doc/html/latest/core-api/memory-allocation.html:

  Note, that using GFP_KERNEL implies GFP_RECLAIM, which means that
  direct reclaim may be triggered under memory pressure; the calling
  context must be allowed to sleep.

In general we avoid doing any dynamic memory allocation while the MMU
lock is held. That's why the memory caches exist. You can avoid
allocating under a lock by allocating the prefetch array when the vCPU
is first initialized. This would also solve the module parameter
sampling problem, because you can read the parameter once and store it
in struct kvm_vcpu. (A rough sketch of that approach is at the end of
this mail.)

> +       if (!pages)
> +               return -1;
> +
>         gfn = kvm_mmu_page_get_gfn(sp, start - sp->spt);
>         slot = gfn_to_memslot_dirty_bitmap(vcpu, gfn, access & ACC_WRITE_MASK);
> -       if (!slot)
> -               return -1;
> +       if (!slot) {
> +               ret = -1;
> +               goto out;
> +       }
>
>         ret = gfn_to_page_many_atomic(slot, gfn, pages, end - start);
> -       if (ret <= 0)
> -               return -1;
> +       if (ret <= 0) {
> +               ret = -1;
> +               goto out;
> +       }
>
>         for (i = 0; i < ret; i++, gfn++, start++) {
>                 mmu_set_spte(vcpu, slot, start, access, gfn,
> @@ -2774,7 +2785,9 @@ static int direct_pte_prefetch_many(struct kvm_vcpu *vcpu,
>                 put_page(pages[i]);
>         }
>
> -       return 0;
> +out:
> +       kfree(pages);
> +       return ret;
>  }
>
>  static void __direct_pte_prefetch(struct kvm_vcpu *vcpu,
> @@ -2785,10 +2798,10 @@ static void __direct_pte_prefetch(struct kvm_vcpu *vcpu,
>
>         WARN_ON(!sp->role.direct);
>
> -       i = (sptep - sp->spt) & ~(PTE_PREFETCH_NUM - 1);
> +       i = (sptep - sp->spt) & ~(pte_prefetch_num - 1);

This code assumes pte_prefetch_num is a power of 2, which is no longer
guaranteed to be true.

>         spte = sp->spt + i;
>
> -       for (i = 0; i < PTE_PREFETCH_NUM; i++, spte++) {
> +       for (i = 0; i < pte_prefetch_num; i++, spte++) {
>                 if (is_shadow_present_pte(*spte) || spte == sptep) {
>                         if (!start)
>                                 continue;
> --
> 2.33.0.882.g93a45727a2-goog
>
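
For concreteness, here is a rough, completely untested sketch of the
allocate-at-vCPU-init idea. The mmu_prefetch field and the helper names
below are made up for illustration; the point is only the shape: read
pte_prefetch_num once per vCPU, clamp it to a power of 2, and allocate
the page array outside the MMU lock.

/* Hypothetical fields added to struct kvm_vcpu_arch: */
struct kvm_mmu_prefetch {
	struct page **pages;	/* one entry per prefetched PTE */
	unsigned int num;	/* per-vCPU snapshot of pte_prefetch_num */
};

/* Called once during vCPU creation, outside the MMU lock. */
static int mmu_alloc_pte_prefetch_array(struct kvm_vcpu *vcpu)
{
	unsigned int num = READ_ONCE(pte_prefetch_num);

	/*
	 * Keep the index math in __direct_pte_prefetch() valid.
	 * rounddown_pow_of_two() is from <linux/log2.h>.
	 */
	num = num ? rounddown_pow_of_two(num) : 1;

	vcpu->arch.mmu_prefetch.pages = kcalloc(num, sizeof(struct page *),
						GFP_KERNEL_ACCOUNT);
	if (!vcpu->arch.mmu_prefetch.pages)
		return -ENOMEM;

	vcpu->arch.mmu_prefetch.num = num;
	return 0;
}

/* Called from vCPU teardown. */
static void mmu_free_pte_prefetch_array(struct kvm_vcpu *vcpu)
{
	kfree(vcpu->arch.mmu_prefetch.pages);
	vcpu->arch.mmu_prefetch.pages = NULL;
}

direct_pte_prefetch_many() and __direct_pte_prefetch() would then use
vcpu->arch.mmu_prefetch.pages and vcpu->arch.mmu_prefetch.num instead of
re-reading the module parameter, so the value cannot change in the middle
of a fault and nothing is allocated under the MMU lock. The trade-off is
that changing the parameter would only affect vCPUs created afterwards,
which may or may not be acceptable.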