From: Ben Gardon
Date: Tue, 27 Dec 2022 11:02:35 -0800
Subject: Re: [Patch v3 5/9] KVM: x86/mmu: Allocate TDP page table's page on correct NUMA node on split
To: Vipin Sharma
Cc: seanjc@google.com, pbonzini@redhat.com, dmatlack@google.com, kvm@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <20221222023457.1764-6-vipinsh@google.com>
References: <20221222023457.1764-1-vipinsh@google.com> <20221222023457.1764-6-vipinsh@google.com>

On Wed, Dec 21, 2022 at 6:35 PM Vipin Sharma wrote:
>
> When dirty log is enabled, huge pages are split. Page table's pages

Nit: Suggest "When huge pages are split for dirty log" since this can
happen at various points during dirty logging. Same below.

> during the split are allocated based on the current thread NUMA node or
> mempolicy. This causes inefficient page table accesses if underlying
> page is on a different NUMA node
>
> Allocate page table's pages on the same NUMA node as the underlying huge
> page when dirty log is enabled and huge pages are split.
>
> The performance gain during the pre-copy phase of live migrations of a
> 416 vCPUs and 11 TiB memory VM on a 8 node host was seen in the range
> of 130% to 150%.
>
> Suggested-by: David Matlack
> Signed-off-by: Vipin Sharma
> ---
>  arch/x86/kvm/mmu/tdp_mmu.c | 12 ++++++++----
>  include/linux/kvm_host.h   | 18 ++++++++++++++++++
>  2 files changed, 26 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> index 4974fa96deff..376b8dceb3f9 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.c
> +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> @@ -1403,7 +1403,7 @@ bool kvm_tdp_mmu_wrprot_slot(struct kvm *kvm,
>         return spte_set;
>  }
>
> -static struct kvm_mmu_page *__tdp_mmu_alloc_sp_for_split(gfp_t gfp)
> +static struct kvm_mmu_page *__tdp_mmu_alloc_sp_for_split(int nid, gfp_t gfp)
>  {
>         struct kvm_mmu_page *sp;
>
> @@ -1413,7 +1413,8 @@ static struct kvm_mmu_page *__tdp_mmu_alloc_sp_for_split(gfp_t gfp)
>         if (!sp)
>                 return NULL;
>
> -       sp->spt = (void *)__get_free_page(gfp);
> +       sp->spt = kvm_mmu_get_free_page(nid, gfp);
> +

Just so that kvm_mmu_get_free_page isn't dead code in the previous
commit, I'd do this refactor there and just pass NUMA_NO_NODE here.
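Something like this, maybe -- completely untested sketch, and I'm
assuming kvm_mmu_get_free_page() from the earlier patch in this series
treats NUMA_NO_NODE as "no preference" (alloc_pages_node() already
handles that for you):

/* In the previous commit: */
static struct kvm_mmu_page *__tdp_mmu_alloc_sp_for_split(int nid, gfp_t gfp)
{
        struct kvm_mmu_page *sp;

        gfp |= __GFP_ZERO;

        sp = kmem_cache_alloc(mmu_page_header_cache, gfp);
        if (!sp)
                return NULL;

        /* NUMA_NO_NODE from callers preserves the old "any node" behavior. */
        sp->spt = kvm_mmu_get_free_page(nid, gfp);
        if (!sp->spt) {
                kmem_cache_free(mmu_page_header_cache, sp);
                return NULL;
        }

        return sp;
}

with both call sites in that commit passing NUMA_NO_NODE, so this patch
shrinks down to just plumbing the real nid through.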
>         if (!sp->spt) {
>                 kmem_cache_free(mmu_page_header_cache, sp);
>                 return NULL;
> @@ -1427,6 +1428,9 @@ static struct kvm_mmu_page *tdp_mmu_alloc_sp_for_split(struct kvm *kvm,
>                                                        bool shared)
>  {
>         struct kvm_mmu_page *sp;
> +       int nid;
> +
> +       nid = kvm_pfn_to_page_table_nid(spte_to_pfn(iter->old_spte));
>
>         /*
>          * Since we are allocating while under the MMU lock we have to be
> @@ -1437,7 +1441,7 @@ static struct kvm_mmu_page *tdp_mmu_alloc_sp_for_split(struct kvm *kvm,
>          * If this allocation fails we drop the lock and retry with reclaim
>          * allowed.
>          */
> -       sp = __tdp_mmu_alloc_sp_for_split(GFP_NOWAIT | __GFP_ACCOUNT);
> +       sp = __tdp_mmu_alloc_sp_for_split(nid, GFP_NOWAIT | __GFP_ACCOUNT);
>         if (sp)
>                 return sp;
>
> @@ -1449,7 +1453,7 @@ static struct kvm_mmu_page *tdp_mmu_alloc_sp_for_split(struct kvm *kvm,
>         write_unlock(&kvm->mmu_lock);
>
>         iter->yielded = true;
> -       sp = __tdp_mmu_alloc_sp_for_split(GFP_KERNEL_ACCOUNT);
> +       sp = __tdp_mmu_alloc_sp_for_split(nid, GFP_KERNEL_ACCOUNT);
>
>         if (shared)
>                 read_lock(&kvm->mmu_lock);
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index d48064503b88..a262e15ebd19 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -1583,6 +1583,24 @@ void kvm_arch_sync_events(struct kvm *kvm);
>  int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu);
>
>  struct page *kvm_pfn_to_refcounted_page(kvm_pfn_t pfn);
> +
> +/*
> + * Tells the appropriate NUMA node location of the page table's page based on
> + * pfn it will point to.
> + *
> + * Return the nid of the page if pfn is valid and backed by a refcounted page,
> + * otherwise, return the nearest memory node for the current CPU.

Nit: Should this be "current thread"?

> + */
> +static inline int kvm_pfn_to_page_table_nid(kvm_pfn_t pfn)

This could just be kvm_pfn_nid (or even better kvm_pfn_node_id) since
this really has nothing to do with page tables. We just want to know
which NUMA node backs the given PFN. (Rough sketch at the bottom of
this mail.)

> +{
> +       struct page *page = kvm_pfn_to_refcounted_page(pfn);
> +
> +       if (page)
> +               return page_to_nid(page);
> +       else
> +               return numa_mem_id();
> +}
> +
>  bool kvm_is_zone_device_page(struct page *page);
>
>  struct kvm_irq_ack_notifier {
> --
> 2.39.0.314.g84b9a713c41-goog
>
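And the kvm_pfn_node_id() sketch mentioned above -- purely illustrative,
the body is exactly your helper with only the name changed (the ternary
instead of if/else is just taste):

static inline int kvm_pfn_node_id(kvm_pfn_t pfn)
{
        struct page *page = kvm_pfn_to_refcounted_page(pfn);

        return page ? page_to_nid(page) : numa_mem_id();
}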