From: Vipin Sharma <vipinsh@google.com>
To: seanjc@google.com, pbonzini@redhat.com, bgardon@google.com, dmatlack@google.com
Cc: jmattson@google.com, mizhang@google.com, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Vipin Sharma <vipinsh@google.com>
Subject: [Patch v4 00/18] NUMA aware page table allocation
Date: Mon, 6 Mar 2023 14:41:09 -0800
Message-ID: <20230306224127.1689967-1-vipinsh@google.com>

Hi,

This series builds on the feedback on v3. The biggest feature change is that NUMA aware page tables are now enabled on a per-VM basis instead of through a module parameter that applies to all VMs on a host. This was decided after an internal discussion, to avoid forcing every VM on a host to be NUMA aware.
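
For anyone wanting to try the feature, enabling it from userspace would look roughly like the sketch below. This is a hypothetical usage example, not code from the series: it assumes KVM_CAP_NUMA_AWARE_PAGE_TABLE is a VM-scoped capability enabled through the standard KVM_ENABLE_CAP ioctl with no flags or arguments.

#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/*
 * Hypothetical sketch: enable NUMA aware page tables for one VM.
 * Assumes KVM_CAP_NUMA_AWARE_PAGE_TABLE takes no flags or args and,
 * like other VM-scoped capabilities, is enabled on the VM fd.
 */
static int enable_numa_aware_page_tables(int vm_fd)
{
	struct kvm_enable_cap cap;

	memset(&cap, 0, sizeof(cap));
	cap.cap = KVM_CAP_NUMA_AWARE_PAGE_TABLE;

	return ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
}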
We need to collect more data to see how much performance degradation a VM can suffer in negative testing, where the VM's vCPUs always access memory on remote NUMA nodes instead of staying local, compared to a VM which is not NUMA aware.

Other changes for v4 are mentioned in the changelog below.

Thanks,
Vipin

v4:
- Removed the module parameter for enabling NUMA aware page tables.
- Added a new capability, KVM_CAP_NUMA_AWARE_PAGE_TABLE, to enable this feature per VM.
- Added documentation for the new capability.
- Hold the mutex just before the topup and release it after the fault/split is addressed. Previous versions took a spinlock twice, first for the topup and again when fetching the page from the cache.
- Use the existing slots_lock for split_shadow_page_cache operations.
- The KVM MMU shrinker now also shrinks mmu_shadowed_info_cache besides split_shadow_page_cache and mmu_shadow_page_cache.
- Reduced the default cache size to 4.
- Split patches into smaller ones.

v3: https://lore.kernel.org/lkml/20221222023457.1764-1-vipinsh@google.com/
- Split patches into smaller ones.
- Repurposed the KVM MMU shrinker to free cache pages instead of the oldest page table pages.
- Reduced the cache size from 40 to 5.
- Removed the __weak function; the node value is now initialized in all architectures.
- Some name changes.

v2: https://lore.kernel.org/lkml/20221201195718.1409782-1-vipinsh@google.com/
- All page table pages are allocated on the underlying physical page's NUMA node (the idea is sketched after the patch list below).
- Introduced a module parameter, numa_aware_pagetable, to disable this feature.
- Use kvm_pfn_to_refcounted_page() to get the page from a pfn.

v1: https://lore.kernel.org/all/20220801151928.270380-1-vipinsh@google.com/

Vipin Sharma (18):
  KVM: x86/mmu: Change KVM mmu shrinker to no-op
  KVM: x86/mmu: Remove zapped_obsolete_pages from struct kvm_arch{}
  KVM: x86/mmu: Track count of pages in KVM MMU page caches globally
  KVM: x86/mmu: Shrink shadow page caches via MMU shrinker
  KVM: x86/mmu: Add split_shadow_page_cache pages to global count of MMU cache pages
  KVM: x86/mmu: Shrink split_shadow_page_cache via MMU shrinker
  KVM: x86/mmu: Unconditionally count allocations from MMU page caches
  KVM: x86/mmu: Track unused mmu_shadowed_info_cache pages count via global counter
  KVM: x86/mmu: Shrink mmu_shadowed_info_cache via MMU shrinker
  KVM: x86/mmu: Add per VM NUMA aware page table capability
  KVM: x86/mmu: Add documentation of NUMA aware page table capability
  KVM: x86/mmu: Allocate NUMA aware page tables on TDP huge page splits
  KVM: mmu: Add common initialization logic for struct kvm_mmu_memory_cache{}
  KVM: mmu: Initialize kvm_mmu_memory_cache.gfp_zero to __GFP_ZERO by default
  KVM: mmu: Add NUMA node support in struct kvm_mmu_memory_cache{}
  KVM: x86/mmu: Allocate numa aware page tables during page fault
  KVM: x86/mmu: Allocate shadow mmu page table on huge page split on the same NUMA node
  KVM: x86/mmu: Reduce default mmu memory cache size
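
The core allocation policy of the series is that a page table page is allocated on the NUMA node backing the guest physical page it maps. A minimal kernel-style sketch of that idea follows; it is illustrative only, not the actual patch code, and the helper name alloc_page_table_page_on_node() is invented for this example (the series itself resolves the pfn via kvm_pfn_to_refcounted_page()).

#include <linux/gfp.h>
#include <linux/mm.h>
#include <linux/kvm_types.h>

/*
 * Illustrative sketch, not the series' code: allocate a page table
 * page on the NUMA node that backs the given guest pfn. The helper
 * name is hypothetical.
 */
static void *alloc_page_table_page_on_node(kvm_pfn_t pfn)
{
	struct page *page;
	int nid;

	/* Which NUMA node holds the guest's physical page? */
	nid = page_to_nid(pfn_to_page(pfn));

	/* Allocate the page table page from that same node. */
	page = alloc_pages_node(nid, GFP_KERNEL_ACCOUNT | __GFP_ZERO, 0);

	return page ? page_address(page) : NULL;
}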
 Documentation/virt/kvm/api.rst   |  29 +++
 arch/arm64/kvm/arm.c             |   2 +-
 arch/arm64/kvm/mmu.c             |   2 +-
 arch/mips/kvm/mips.c             |   3 +
 arch/riscv/kvm/mmu.c             |   8 +-
 arch/riscv/kvm/vcpu.c            |   2 +-
 arch/x86/include/asm/kvm_host.h  |  17 +-
 arch/x86/include/asm/kvm_types.h |   6 +-
 arch/x86/kvm/mmu/mmu.c           | 319 +++++++++++++++++++------------
 arch/x86/kvm/mmu/mmu_internal.h  |  38 ++++
 arch/x86/kvm/mmu/paging_tmpl.h   |  29 +--
 arch/x86/kvm/mmu/tdp_mmu.c       |  23 ++-
 arch/x86/kvm/x86.c               |  18 +-
 include/linux/kvm_host.h         |   2 +
 include/linux/kvm_types.h        |  21 ++
 include/uapi/linux/kvm.h         |   1 +
 virt/kvm/kvm_main.c              |  24 ++-
 17 files changed, 386 insertions(+), 158 deletions(-)

-- 
2.40.0.rc0.216.gc4246ad0f0-goog