Date: Wed, 3 Aug 2022 17:08:25 +0200
From: Paolo Bonzini
To: Sean Christopherson
Cc: David Matlack, Vipin Sharma, kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] KVM: x86/mmu: Make page tables for eager page splitting NUMA aware
References: <20220801151928.270380-1-vipinsh@google.com>
 <4ccbafb5-9157-ec73-c751-ec71164f8688@redhat.com>

On 8/2/22 21:12, Sean Christopherson wrote:
> Ah crud, I misread the patch.  I was thinking tdp_mmu_spte_to_nid() was
> getting the node for the existing shadow page, but it's actually getting
> the node for the underlying physical huge page.
>
> That means this patch is broken now that KVM doesn't require a "struct
> page" to create huge SPTEs (commit a8ac499bb6ab ("KVM: x86/mmu: Don't
> require refcounted "struct page" to create huge SPTEs")).  I.e. this
> will explode as pfn_to_page() may return garbage.
>
>         return page_to_nid(pfn_to_page(spte_to_pfn(spte)));

I was about to say that yesterday.  However, my knowledge of struct page
internals has proven spotty enough that I wasn't 100% sure.  But yeah,
with a fresh mind it's quite obvious that anything that goes through
hva_to_pfn_remap() and similar paths will fail.

> That said, I still don't like this patch, at least not as a one-off
> thing.  I do like the idea of having page table allocations follow the
> node requested for the target pfn; what I don't like is having one-off
> behavior that silently overrides userspace policy.

Yes, I totally agree with making it all or nothing.

> I would much rather we solve the problem for all page table allocations,
> maybe with a KVM_CAP or module param?  Unfortunately, that's a very
> non-trivial change because it will probably require having a per-node
> cache in each of the per-vCPU caches.

A module parameter is fine.  If you care about performance to this
level, your userspace is probably homogeneous enough.
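
For reference, the knob itself would be tiny.  A minimal sketch of an
opt-in module parameter plus the pfn-to-node lookup it would gate (the
parameter and function names here are hypothetical and untested, not
part of the patch under discussion):

#include <linux/module.h>
#include <linux/numa.h>
#include <linux/kvm_host.h>

/*
 * Hypothetical sketch: make NUMA-aware page table allocation opt-in so
 * it never silently overrides userspace memory policy.
 */
static bool __read_mostly numa_aware_page_tables;
module_param(numa_aware_page_tables, bool, 0444);

static int kvm_mmu_spt_preferred_node(kvm_pfn_t pfn)
{
        struct page *page;

        if (!numa_aware_page_tables)
                return NUMA_NO_NODE;

        /*
         * Only refcounted pfns have a usable struct page; pfns coming
         * from hva_to_pfn_remap() and similar paths do not (see commit
         * a8ac499bb6ab), so those fall back to the normal allocation.
         */
        page = kvm_pfn_to_refcounted_page(pfn);
        return page ? page_to_nid(page) : NUMA_NO_NODE;
}

A NUMA_NO_NODE result would tell the caller to take the existing
allocation path (per-vCPU cache or __get_free_page()), which keeps
honoring the task's mempolicy.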
Paolo

> Hmm, actually, a not-awful alternative would be to have the fault path
> fallback to the current task's policy if allocation fails.  I.e. make
> it best effort.
>
> E.g. as a rough sketch...
>
> ---
>  arch/x86/kvm/mmu/tdp_mmu.c | 37 ++++++++++++++++++++++++++++++-------
>  1 file changed, 30 insertions(+), 7 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> index bf2ccf9debca..e475f5b55794 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.c
> +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> @@ -273,10 +273,11 @@ static struct kvm_mmu_page *tdp_mmu_next_root(struct kvm *kvm,
>
>  static struct kvm_mmu_page *tdp_mmu_alloc_sp(struct kvm_vcpu *vcpu)
>  {
> +	struct kvm_mmu_memory_cache *cache = &vcpu->arch.mmu_shadow_page_cache;
>  	struct kvm_mmu_page *sp;
>
>  	sp = kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_page_header_cache);
> -	sp->spt = kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_shadow_page_cache);
> +	sp->spt = kvm_mmu_alloc_shadow_page_table(cache, GFP_NOWAIT, pfn);
>
>  	return sp;
>  }
> @@ -1190,7 +1191,7 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
>  		if (is_removed_spte(iter.old_spte))
>  			break;
>
> -		sp = tdp_mmu_alloc_sp(vcpu);
> +		sp = tdp_mmu_alloc_sp(vcpu, fault->pfn);
>  		tdp_mmu_init_child_sp(sp, &iter);
>
>  		if (tdp_mmu_link_sp(vcpu->kvm, &iter, sp, account_nx, true)) {
> @@ -1402,17 +1403,39 @@ bool kvm_tdp_mmu_wrprot_slot(struct kvm *kvm,
>  	return spte_set;
>  }
>
> -static struct kvm_mmu_page *__tdp_mmu_alloc_sp_for_split(gfp_t gfp)
> +void *kvm_mmu_alloc_shadow_page_table(struct kvm_mmu_memory_cache *cache,
> +				      gfp_t gfp, kvm_pfn_t pfn)
> +{
> +	struct page *page = kvm_pfn_to_refcounted_page(pfn);
> +	struct page *spt_page;
> +	int node;
> +
> +	gfp |= __GFP_ZERO | __GFP_ACCOUNT;
> +
> +	if (page) {
> +		spt_page = alloc_pages_node(page_to_nid(page), gfp, 0);
> +		if (spt_page)
> +			return page_address(spt_page);
> +	} else if (!cache) {
> +		return (void *)__get_free_page(gfp);
> +	}
> +
> +	if (cache)
> +		return kvm_mmu_memory_cache_alloc(cache);
> +
> +	return NULL;
> +}
> +
> +static struct kvm_mmu_page *__tdp_mmu_alloc_sp_for_split(gfp_t gfp,
> +							 kvm_pfn_t pfn)
>  {
>  	struct kvm_mmu_page *sp;
>
> -	gfp |= __GFP_ZERO;
> -
>  	sp = kmem_cache_alloc(mmu_page_header_cache, gfp);
>  	if (!sp)
>  		return NULL;
>
> -	sp->spt = (void *)__get_free_page(gfp);
> +	sp->spt = kvm_mmu_alloc_shadow_page_table(NULL, gfp, pfn);
>  	if (!sp->spt) {
>  		kmem_cache_free(mmu_page_header_cache, sp);
>  		return NULL;
> @@ -1436,7 +1459,7 @@ static struct kvm_mmu_page *tdp_mmu_alloc_sp_for_split(struct kvm *kvm,
>  	 * If this allocation fails we drop the lock and retry with reclaim
>  	 * allowed.
>  	 */
> -	sp = __tdp_mmu_alloc_sp_for_split(GFP_NOWAIT | __GFP_ACCOUNT);
> +	sp = __tdp_mmu_alloc_sp_for_split(GFP_NOWAIT);
>  	if (sp)
>  		return sp;
>
>
> base-commit: f8990bfe1eab91c807ca8fc0d48705e8f986b951
> --
>
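
One nit on the sketch, in case someone turns it into a real patch: as
quoted it won't build, because tdp_mmu_alloc_sp() uses a pfn that its
prototype doesn't take, and the split path calls
__tdp_mmu_alloc_sp_for_split() without the new pfn argument.  The fixup
would be along these lines (untested; the split-path pfn source is my
assumption, not something the sketch spells out):

/* Untested fixup on top of the sketch: plumb the pfn into the fault path. */
static struct kvm_mmu_page *tdp_mmu_alloc_sp(struct kvm_vcpu *vcpu,
					     kvm_pfn_t pfn)
{
        struct kvm_mmu_memory_cache *cache = &vcpu->arch.mmu_shadow_page_cache;
        struct kvm_mmu_page *sp;

        sp = kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_page_header_cache);
        sp->spt = kvm_mmu_alloc_shadow_page_table(cache, GFP_NOWAIT, pfn);

        return sp;
}

For eager splitting there is no fault, but the huge SPTE being split
already pins down the node, so the caller could derive the pfn from it:

        sp = __tdp_mmu_alloc_sp_for_split(GFP_NOWAIT, spte_to_pfn(iter->old_spte));

That keeps the best-effort semantics: if the node-affine GFP_NOWAIT
allocation fails, the fault path falls back to the per-vCPU cache, and
the split path returns NULL, which its caller already handles by
retrying with reclaim allowed.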