Received: by 2002:a05:6358:d09b:b0:dc:cd0c:909e with SMTP id jc27csp6345300rwb; Mon, 5 Dec 2022 10:58:49 -0800 (PST) X-Google-Smtp-Source: AA0mqf6wdKrKq1c/3Xk4Ozio1x380gygOE9XUf/oMvi4HnNZ5SURBXsd2i6WeKRmTTnkP/aQdgAx X-Received: by 2002:a17:906:1593:b0:7c0:8fde:dffb with SMTP id k19-20020a170906159300b007c08fdedffbmr22424834ejd.493.1670266728947; Mon, 05 Dec 2022 10:58:48 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1670266728; cv=none; d=google.com; s=arc-20160816; b=SFaoouwPEX02xmyHnf5BUd8qtgqtfb/mISf4CBPO1r6y5YCE5K+94+4ClzViTGes8+ NwNND5kWED2g9+KJBduva0QwbGyatJClA7xaqsQRvzZSopE6AwTtdMIPQhEMg7BquX9I yZgkpIo8RWCNwx9M1s55/n3lqzxKLGDirjn/9QGULcgMWTbb2A6P/DLZ+nu/22WmFFUG 9gxRJx7bzDKNZJ0fL50gxH9lFunprCn+0nbRXQeXMYh9bURudMkVf+7/kA5q/RdDnkeU nfMRlMObxSr1xkGE5Pzr8GR2p49E1sWxLrXruJHPqaLjFYS6Q4GQ0a/Qf+E78T4ql1Ko zETQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:subject:message-id:date:from:in-reply-to :references:mime-version:dkim-signature; bh=XB10lz0QsRjxLo2pbbY9hjkEwtxi6QdcCS8KNqwdFjM=; b=oyPntEhADA6/V7pBEjH/JeRFEMhzLLUPowjvZjLKJNnfZW+fKno3Rb90/iC5kbwmrz FSVLq1hsPkNAo3Iu8wEgQ+ArNmaQgA/ooWErtEQfF+Ohv9J7aXGv+kvCbgN/ohyPEcrh K6180grripiVGH2hMkZWma4HkNG9koWJcRVaTQSFXSj2N87OBrT0S41+FsBuGm24FE4a lNCYTH6HW7VwNZbMGq5sQJ4cN9Dl3/onDsE1HqYKiD9s+CZ4T54DRpoBiY9YvT6VPsOQ SMEqt8C9xNjyrv+jV5TJPGsKNgxL28esDAywXX6Y8Z14q/DExAXoRoCZRGsrJ0Y16dZ9 XKEQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=srFFNGm0; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id c8-20020a05640227c800b00462848f0cdbsi232929ede.299.2022.12.05.10.58.29; Mon, 05 Dec 2022 10:58:48 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=srFFNGm0; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231921AbiLESvh (ORCPT + 80 others); Mon, 5 Dec 2022 13:51:37 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:40974 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230035AbiLESvg (ORCPT ); Mon, 5 Dec 2022 13:51:36 -0500 Received: from mail-qv1-xf29.google.com (mail-qv1-xf29.google.com [IPv6:2607:f8b0:4864:20::f29]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id EDCEF1C426 for ; Mon, 5 Dec 2022 10:51:34 -0800 (PST) Received: by mail-qv1-xf29.google.com with SMTP id q10so8815070qvt.10 for ; Mon, 05 Dec 2022 10:51:34 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=XB10lz0QsRjxLo2pbbY9hjkEwtxi6QdcCS8KNqwdFjM=; b=srFFNGm0Vqfsjysj03v3upOR/SxH0q3SyRMSCSv5leOyvmjpHcGdYZpBucA2Yg8NSU /M74vYI6eIubRN3KThGKGsxjamCtXZgFrs8sMNC5BEsS0dHC97uIhJpQLafrt4zixZEz FT1YT+pFHm2q7Q0ddFjI6gp32DLIUtbckL8+qF1ST2xUJfmXOpBkQFjj2aK+r2W5NjjB 78E8vIFyjmvB1d970w3rtv0wGhbbB7S83YEx9NpzF+gDFn5C2hOn9+Plm0HIaWQAjenQ Y+UV6XkEt8xjzMciiN4a+g7D7amdD2+lLG9+WWiJyhebbroVc6vHlThTRZE/O9WCuS5U rfBA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=XB10lz0QsRjxLo2pbbY9hjkEwtxi6QdcCS8KNqwdFjM=; b=OmnaKwMTrs7SxT6WjSjbqWzGrJRqy4WvK+G+PT8lFE6qJzrm2cJ8uZERPGZHkOYcBv ldz6NCgVfcjamClT0Y/dgfcNW7ssZE8yL6QyH8PqxyMzVDdA6isw44aMDXojXqKPoePY rt9PoBX/jR+pKAtHK50Z1AFcmdLPjThwtoBZP0jSLLRmzLQjZUiNqkzSyxGq7UsDeM89 oKm3knxLdyalifpq8DSv13WFM1qXyjzXGmPg3tiYCd7Jo/SOxYkdFOMV4M7b6V6xM/cm DAw0MY5h72aR5fuFTsoido3XkITM4x5ysztXQ4vhx3XYUGI4gjtNHr0ISoAXbNOBFG/r FlRw== X-Gm-Message-State: ANoB5pnXuhgbxA8bUBROtgqDF3Ir51utQDegJS+TACBB6bYMrqTCQJM+ uHh12/Md/RT7VeGJ+2Sg+oN7+4905pWLsVrl+l9+E31f7bpGH+fY X-Received: by 2002:a0c:e6a9:0:b0:4bb:892a:fc11 with SMTP id j9-20020a0ce6a9000000b004bb892afc11mr58693006qvn.28.1670266293960; Mon, 05 Dec 2022 10:51:33 -0800 (PST) MIME-Version: 1.0 References: <20221201195718.1409782-1-vipinsh@google.com> <20221201195718.1409782-2-vipinsh@google.com> In-Reply-To: From: Ben Gardon Date: Mon, 5 Dec 2022 10:51:23 -0800 Message-ID: Subject: Re: [Patch v2 1/2] KVM: x86/mmu: Allocate page table pages on TDP splits during dirty log enable on the underlying page's numa node To: Sean Christopherson Cc: Vipin Sharma , dmatlack@google.com, pbonzini@redhat.com, kvm@vger.kernel.org, linux-kernel@vger.kernel.org Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-17.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF, ENV_AND_HDR_SPF_MATCH,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS, USER_IN_DEF_DKIM_WL,USER_IN_DEF_SPF_WL autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Dec 5, 2022 at 10:47 AM Sean Christopherson wrote: > > Side topic, the shortlog is way, way too long. The purpose of the shortlog is > to provide a synopsis of the change, not to describe the change in detail. > > I also think this patch should be 2/2, with the more generic support added along > with the module param (or capability) in 1/2. E.g. to yield something like > > KVM: x86/mmu: Add a module param to make per-vCPU SPTs NUMA aware > KVM: x86/mmu: Honor NUMA awareness for per-VM page table allocations > > On Mon, Dec 05, 2022, Ben Gardon wrote: > > > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c > > > index 4736d7849c60..0554dfc55553 100644 > > > --- a/arch/x86/kvm/mmu/mmu.c > > > +++ b/arch/x86/kvm/mmu/mmu.c > > > @@ -90,6 +90,9 @@ __MODULE_PARM_TYPE(nx_huge_pages_recovery_period_ms, "uint"); > > > static bool __read_mostly force_flush_and_sync_on_reuse; > > > module_param_named(flush_on_reuse, force_flush_and_sync_on_reuse, bool, 0644); > > > > > > +static bool __read_mostly numa_aware_pagetable = true; > > > +module_param_named(numa_aware_pagetable, numa_aware_pagetable, bool, 0644); > > > + > > > > I'm usually all for having module params to control things, but in > > this case I don't think it provides much value because whether this > > NUMA optimization is useful or not is going to depend more on VM size > > and workload than anything else. If we wanted to make this > > configurable, a VM capability would probably be a better mechanism so > > that userspace could leave it off when running small, > > non-performance-sensitive VMs > > Would we actually want to turn it off in this case? IIUC, @nid is just the > preferred node, i.e. failure to allocate for the preferred @nid will result in > falling back to other nodes, not outright failure. So the pathological worst > case scenario would be that for a system with VMs that don't care about performance, > all of a nodes memory is allocated due to all VMs starting on that node. > > On the flip side, if a system had a mix of VM shapes, I think we'd want even the > performance insensitive VMs to be NUMA aware so that they can be sequestered on > their own node(s), i.e. don't "steal" memory from the VMs that are performance > sensitive and have been affined to a single node. Yeah, the only reason to turn it off would be to save memory. As a strawman, if you had 100 1-vCPU VMs on a 2 node system, you'd have 4000 pages allocated in caches, doing nothing. > > > and turn it on when running large, multi-node VMs. A whole-host module > > parameter seems overly restrictive.