Date: Mon, 1 Aug 2022 08:19:28 -0700
Message-Id: <20220801151928.270380-1-vipinsh@google.com>
Subject: [PATCH] KVM: x86/mmu: Make page tables for eager page splitting NUMA aware
From: Vipin Sharma <vipinsh@google.com>
To: seanjc@google.com, dmatlack@google.com, pbonzini@redhat.com
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Vipin Sharma <vipinsh@google.com>

tdp_mmu_alloc_sp_for_split() allocates page tables for Eager Page
Splitting. Currently it does not specify a NUMA node preference, so it
will try to allocate from the local node. The thread doing eager page
splitting is supplied by userspace and may not be running on the node
where it would be best for the page tables to be allocated.

We can improve TDP MMU eager page splitting by making
tdp_mmu_alloc_sp_for_split() NUMA-aware: when splitting a huge page,
allocate the new lower-level page tables on the same node as the huge
page. To do so, __get_free_page() is replaced by alloc_pages_node().
This introduces two functional changes:

1. __get_free_page() strips the __GFP_HIGHMEM gfp flag via its call to
   __get_free_pages(). This should not be an issue, as __GFP_HIGHMEM is
   never passed to tdp_mmu_alloc_sp_for_split() anyway.

2. __get_free_page() calls alloc_pages(), which uses the thread's
   mempolicy to pick the NUMA node. With this commit, the thread's
   mempolicy is no longer used; the first preference is to allocate on
   the node where the huge page resides.

dirty_log_perf_test with a 416 vCPU, 1 GiB/vCPU configuration on an
8-node NUMA machine showed dirty memory time improvements of 2% - 35%
across multiple runs.

Suggested-by: David Matlack <dmatlack@google.com>
Signed-off-by: Vipin Sharma <vipinsh@google.com>
---
 arch/x86/kvm/mmu/tdp_mmu.c | 24 +++++++++++++++++++-----
 1 file changed, 19 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index bf2ccf9debcaa..1e30e18fc6a03 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -1402,9 +1402,19 @@ bool kvm_tdp_mmu_wrprot_slot(struct kvm *kvm,
 	return spte_set;
 }
 
-static struct kvm_mmu_page *__tdp_mmu_alloc_sp_for_split(gfp_t gfp)
+/*
+ * Caller's responsibility to pass a valid spte which has the shadow page
+ * present.
+ */
+static int tdp_mmu_spte_to_nid(u64 spte)
+{
+	return page_to_nid(pfn_to_page(spte_to_pfn(spte)));
+}
+
+static struct kvm_mmu_page *__tdp_mmu_alloc_sp_for_split(int nid, gfp_t gfp)
 {
 	struct kvm_mmu_page *sp;
+	struct page *spt_page;
 
 	gfp |= __GFP_ZERO;
 
@@ -1412,11 +1422,12 @@ static struct kvm_mmu_page *__tdp_mmu_alloc_sp_for_split(gfp_t gfp)
 	if (!sp)
 		return NULL;
 
-	sp->spt = (void *)__get_free_page(gfp);
-	if (!sp->spt) {
+	spt_page = alloc_pages_node(nid, gfp, 0);
+	if (!spt_page) {
 		kmem_cache_free(mmu_page_header_cache, sp);
 		return NULL;
 	}
+	sp->spt = page_address(spt_page);
 
 	return sp;
 }
@@ -1426,6 +1437,9 @@ static struct kvm_mmu_page *tdp_mmu_alloc_sp_for_split(struct kvm *kvm,
 					 bool shared)
 {
 	struct kvm_mmu_page *sp;
+	int nid;
+
+	nid = tdp_mmu_spte_to_nid(iter->old_spte);
 
 	/*
 	 * Since we are allocating while under the MMU lock we have to be
@@ -1436,7 +1450,7 @@ static struct kvm_mmu_page *tdp_mmu_alloc_sp_for_split(struct kvm *kvm,
 	 * If this allocation fails we drop the lock and retry with reclaim
 	 * allowed.
 	 */
-	sp = __tdp_mmu_alloc_sp_for_split(GFP_NOWAIT | __GFP_ACCOUNT);
+	sp = __tdp_mmu_alloc_sp_for_split(nid, GFP_NOWAIT | __GFP_ACCOUNT);
 	if (sp)
 		return sp;
 
@@ -1448,7 +1462,7 @@ static struct kvm_mmu_page *tdp_mmu_alloc_sp_for_split(struct kvm *kvm,
 	write_unlock(&kvm->mmu_lock);
 	iter->yielded = true;
 
-	sp = __tdp_mmu_alloc_sp_for_split(GFP_KERNEL_ACCOUNT);
+	sp = __tdp_mmu_alloc_sp_for_split(nid, GFP_KERNEL_ACCOUNT);
 
 	if (shared)
 		read_lock(&kvm->mmu_lock);
-- 
2.37.1.455.g008518b4e5-goog
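
P.S. For anyone skimming the diff, the core pattern here (look up the
node that backs an existing page, then allocate new memory on that same
node) has a close userspace analogue in libnuma. The sketch below is
only an illustration of that pattern, not kernel code; the buffer name
and the 2 MiB / 4 KiB sizes are invented for the example. Build with
"gcc demo.c -lnuma" and run on a NUMA-capable machine.

#include <numa.h>	/* numa_available(), numa_alloc_onnode(), numa_free() */
#include <numaif.h>	/* get_mempolicy(), MPOL_F_NODE, MPOL_F_ADDR */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
	if (numa_available() < 0) {
		fprintf(stderr, "NUMA is not supported on this system\n");
		return 1;
	}

	/* Stand-in for the huge page: any existing, faulted-in buffer. */
	size_t huge_len = 2 * 1024 * 1024;
	char *huge_buf = malloc(huge_len);
	memset(huge_buf, 0, huge_len);	/* touch it so its pages get placed */

	/*
	 * Ask which node backs the first page, analogous to what
	 * tdp_mmu_spte_to_nid() does with page_to_nid(pfn_to_page(...)).
	 */
	int nid = -1;
	if (get_mempolicy(&nid, NULL, 0, huge_buf,
			  MPOL_F_NODE | MPOL_F_ADDR)) {
		perror("get_mempolicy");
		return 1;
	}

	/*
	 * Allocate the new page on that same node, analogous to
	 * alloc_pages_node(nid, gfp, 0) in the patch.
	 */
	void *new_page = numa_alloc_onnode(4096, nid);
	if (!new_page) {
		fprintf(stderr, "allocation on node %d failed\n", nid);
		return 1;
	}
	memset(new_page, 0, 4096);	/* first touch actually places the page */
	printf("allocated 4 KiB on node %d\n", nid);

	numa_free(new_page, 4096);
	free(huge_buf);
	return 0;
}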