From: Vipin Sharma
Date: Wed, 28 Dec 2022 14:07:54 -0800
Subject: Re: [Patch v3 5/9] KVM: x86/mmu: Allocate TDP page table's page on correct NUMA node on split
To: Ben Gardon
Cc: seanjc@google.com, pbonzini@redhat.com, dmatlack@google.com, kvm@vger.kernel.org, linux-kernel@vger.kernel.org
References: <20221222023457.1764-1-vipinsh@google.com> <20221222023457.1764-6-vipinsh@google.com>

On Tue, Dec 27, 2022 at 11:02 AM Ben Gardon wrote:
>
> On Wed, Dec 21, 2022 at 6:35 PM Vipin Sharma wrote:
> >
> > When dirty log is enabled, huge pages are split. Page table's pages
>
> Nit: Suggest "When huge pages are split for dirty log" since this can
> happen at various points during dirty logging.
> Same below.
>

Yeah, this should be updated.

> > during the split are allocated based on the current thread NUMA node or
> > mempolicy. This causes inefficient page table accesses if underlying
> > page is on a different NUMA node.
> >
> > Allocate page table's pages on the same NUMA node as the underlying huge
> > page when dirty log is enabled and huge pages are split.
> >
> > The performance gain during the pre-copy phase of live migrations of a
> > 416 vCPUs and 11 TiB memory VM on an 8 node host was seen in the range
> > of 130% to 150%.
> >
> > Suggested-by: David Matlack
> > Signed-off-by: Vipin Sharma
> > ---
> >  arch/x86/kvm/mmu/tdp_mmu.c | 12 ++++++++----
> >  include/linux/kvm_host.h   | 18 ++++++++++++++++++
> >  2 files changed, 26 insertions(+), 4 deletions(-)
> >
> > diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> > index 4974fa96deff..376b8dceb3f9 100644
> > --- a/arch/x86/kvm/mmu/tdp_mmu.c
> > +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> > @@ -1403,7 +1403,7 @@ bool kvm_tdp_mmu_wrprot_slot(struct kvm *kvm,
> >         return spte_set;
> >  }
> >
> > -static struct kvm_mmu_page *__tdp_mmu_alloc_sp_for_split(gfp_t gfp)
> > +static struct kvm_mmu_page *__tdp_mmu_alloc_sp_for_split(int nid, gfp_t gfp)
> >  {
> >         struct kvm_mmu_page *sp;
> >
> > @@ -1413,7 +1413,8 @@ static struct kvm_mmu_page *__tdp_mmu_alloc_sp_for_split(gfp_t gfp)
> >         if (!sp)
> >                 return NULL;
> >
> > -       sp->spt = (void *)__get_free_page(gfp);
> > +       sp->spt = kvm_mmu_get_free_page(nid, gfp);
> > +
>
> Just so that kvm_mmu_get_free_page isn't dead code in the previous
> commit, I'd do this refactor there and just pass NUMA_NO_NODE here.
>

Agreed.
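
For anyone reading this thread without the earlier patch in this series
handy: kvm_mmu_get_free_page() is introduced there, and is roughly the
following (a sketch, not the exact code from that patch):

/*
 * Sketch of the helper from the earlier patch: fall back to the normal
 * page allocator when no node is requested, otherwise allocate the page
 * on the given NUMA node. Zeroing etc. are governed by the gfp flags
 * the caller passes in.
 */
void *kvm_mmu_get_free_page(int nid, gfp_t gfp)
{
	struct page *page;

	if (nid == NUMA_NO_NODE)
		return (void *)__get_free_page(gfp);

	page = alloc_pages_node(nid, gfp, 0);
	if (!page)
		return NULL;

	return page_address(page);
}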
> >         if (!sp->spt) {
> >                 kmem_cache_free(mmu_page_header_cache, sp);
> >                 return NULL;
> > @@ -1427,6 +1428,9 @@ static struct kvm_mmu_page *tdp_mmu_alloc_sp_for_split(struct kvm *kvm,
> >                                                        bool shared)
> >  {
> >         struct kvm_mmu_page *sp;
> > +       int nid;
> > +
> > +       nid = kvm_pfn_to_page_table_nid(spte_to_pfn(iter->old_spte));
> >
> >         /*
> >          * Since we are allocating while under the MMU lock we have to be
> > @@ -1437,7 +1441,7 @@ static struct kvm_mmu_page *tdp_mmu_alloc_sp_for_split(struct kvm *kvm,
> >          * If this allocation fails we drop the lock and retry with reclaim
> >          * allowed.
> >          */
> > -       sp = __tdp_mmu_alloc_sp_for_split(GFP_NOWAIT | __GFP_ACCOUNT);
> > +       sp = __tdp_mmu_alloc_sp_for_split(nid, GFP_NOWAIT | __GFP_ACCOUNT);
> >         if (sp)
> >                 return sp;
> >
> > @@ -1449,7 +1453,7 @@ static struct kvm_mmu_page *tdp_mmu_alloc_sp_for_split(struct kvm *kvm,
> >                 write_unlock(&kvm->mmu_lock);
> >
> >         iter->yielded = true;
> > -       sp = __tdp_mmu_alloc_sp_for_split(GFP_KERNEL_ACCOUNT);
> > +       sp = __tdp_mmu_alloc_sp_for_split(nid, GFP_KERNEL_ACCOUNT);
> >
> >         if (shared)
> >                 read_lock(&kvm->mmu_lock);
> > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > index d48064503b88..a262e15ebd19 100644
> > --- a/include/linux/kvm_host.h
> > +++ b/include/linux/kvm_host.h
> > @@ -1583,6 +1583,24 @@ void kvm_arch_sync_events(struct kvm *kvm);
> >  int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu);
> >
> >  struct page *kvm_pfn_to_refcounted_page(kvm_pfn_t pfn);
> > +
> > +/*
> > + * Tells the appropriate NUMA node location of the page table's page based on
> > + * pfn it will point to.
> > + *
> > + * Return the nid of the page if pfn is valid and backed by a refcounted page,
> > + * otherwise, return the nearest memory node for the current CPU.
>
> Nit: Should this be "current thread"?

I will say "current thread CPU", since memory nodes are near CPUs,
whereas a thread can execute on multiple CPUs over its lifetime.

> > + */
> > +static inline int kvm_pfn_to_page_table_nid(kvm_pfn_t pfn)
>
> This could just be kvm_pfn_nid (or even better kvm_pfn_node_id) since
> this really has nothing to do with page tables. We just want to know
> which NUMA node backs the given PFN.

Apart from the NUMA node backing the given PFN, it can also return the
nearest NUMA node via numa_mem_id(). So it is actually telling which
NUMA node is the best one for the page table's page, given a PFN.

> > +{
> > +       struct page *page = kvm_pfn_to_refcounted_page(pfn);
> > +
> > +       if (page)
> > +               return page_to_nid(page);
> > +       else
> > +               return numa_mem_id();
> > +}
> > +
> >  bool kvm_is_zone_device_page(struct page *page);
> >
> >  struct kvm_irq_ack_notifier {
> > --
> > 2.39.0.314.g84b9a713c41-goog
> >
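
Stepping back, the end-to-end flow after this patch is roughly the
following. This is a simplified sketch of the exclusive-lock path only;
the actual code takes the read or write lock depending on "shared", and
the caller restarts the iteration via iter->yielded:

/*
 * Simplified sketch of the allocation flow discussed above. Not the
 * literal patch: the shared (read lock) case is omitted here.
 */
static struct kvm_mmu_page *sp_for_split_sketch(struct kvm *kvm,
						struct tdp_iter *iter)
{
	/* Prefer the NUMA node backing the huge page being split. */
	int nid = kvm_pfn_to_page_table_nid(spte_to_pfn(iter->old_spte));
	struct kvm_mmu_page *sp;

	/* Opportunistic, non-sleeping attempt while holding mmu_lock. */
	sp = __tdp_mmu_alloc_sp_for_split(nid, GFP_NOWAIT | __GFP_ACCOUNT);
	if (sp)
		return sp;

	/* Drop the lock and retry with direct reclaim allowed. */
	write_unlock(&kvm->mmu_lock);
	iter->yielded = true;
	sp = __tdp_mmu_alloc_sp_for_split(nid, GFP_KERNEL_ACCOUNT);
	write_lock(&kvm->mmu_lock);

	return sp;
}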