Received: by 2002:a6b:fb09:0:0:0:0:0 with SMTP id h9csp5145278iog; Wed, 22 Jun 2022 13:01:30 -0700 (PDT) X-Google-Smtp-Source: AGRyM1stAZQPzPTT1wtfLRN8t4JBjAzuFK4yTwo8Y0xf+XTgr4DVdKwXhQdMYnjY5wXmAMZQsmKa X-Received: by 2002:a63:6a49:0:b0:3fd:df6d:5ba3 with SMTP id f70-20020a636a49000000b003fddf6d5ba3mr4318841pgc.385.1655928089919; Wed, 22 Jun 2022 13:01:29 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1655928089; cv=none; d=google.com; s=arc-20160816; b=MBBlyyTMAQDSSR7ayD5Z7G/r2coG83npDyBxEvhq6ikez8At78qeKbJNPaMgr1rflA RFiE0WhdjGeQtFyluZ/v0hQGnIxVsNeWbifPKJdTMOX2VMSZFv3pftIiv2t9Ly8tLmy2 dTrCHluLLyfnqJ/eJzH6+IL3k3NURTkk62b/xGkYFic2w8wgMRqoJZITQHNAMc97O2Yu /4Pq+w14yk0q9C16d5JRMWeGqzTltSZrR7IwoMUeucKJgyO54BKgdB+22ahoL6SM8D1D FW6KVY4GXt7BE2o9hAb7CX76ynSTozB1t3NtKPiMui/RIbDNWIgtk2nZxXi78gImzQh5 6rHg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=HLI5RyJrlXr63VfLpHMPS2LJh+90nXFqmq+5GSxiDY8=; b=T9o0XHvKptyWsxDaGTz9NQdSFZ4svon1KelBRSh9FTdvm6cUIVRpSMcazoXUHa1eAo pFZKogSg0NLA4KsiORq3RqNuAeL5Unts9XPN45AN6AenzAV7EHECud1MyR343WJcxfhZ M1JT8DDpaoR8iansF4CQjSQk8GsiRSBpdFDbQ/oUteJzMvroNTu/VZ+WzHCgioPWmJi0 I9w/xPgfUono5Zzdjvropzl4PtoPNZT9H0Dt8uJKADLs/5oQstnbmapqDGUD7oQ5dQML Nd7iztaq+lihp46UUEttYv1JqrmVNb8Rf+q7W/NK6BF9TEDp0QNApkkcuKlgjDNJO93d NFyw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=R8Y5kNQX; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id s35-20020a634523000000b003fcc2ae9ce4si24615699pga.390.2022.06.22.13.01.15; Wed, 22 Jun 2022 13:01:29 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=R8Y5kNQX; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1358433AbiFVT1j (ORCPT + 99 others); Wed, 22 Jun 2022 15:27:39 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33430 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S242604AbiFVT1X (ORCPT ); Wed, 22 Jun 2022 15:27:23 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 962E73A5E0 for ; Wed, 22 Jun 2022 12:27:16 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1655926035; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=HLI5RyJrlXr63VfLpHMPS2LJh+90nXFqmq+5GSxiDY8=; b=R8Y5kNQXb5C/S3JYYvKMv84gpC/jMTkGcfmAT7gpH6cxYkrffp25zagNCw+MYnN7lWp9va KwVbmhgr4oyKSKvG34y6dLn4CGSfgi8N1oOoOUeTQg+hAYUr1FfFDWU+hIy7OIccJKh3D9 CA/VplTtMgR2Xt+8ysZfI8Zxz/Llz00= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-91-_kmsAk5qNnqXV5OGlfZs9Q-1; Wed, 22 Jun 2022 15:27:13 -0400 X-MC-Unique: _kmsAk5qNnqXV5OGlfZs9Q-1 Received: from smtp.corp.redhat.com (int-mx03.intmail.prod.int.rdu2.redhat.com [10.11.54.3]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id B8F031C006AA; Wed, 22 Jun 2022 19:27:12 +0000 (UTC) Received: from virtlab701.virt.lab.eng.bos.redhat.com (virtlab701.virt.lab.eng.bos.redhat.com [10.19.152.228]) by smtp.corp.redhat.com (Postfix) with ESMTP id 628C31121315; Wed, 22 Jun 2022 19:27:12 +0000 (UTC) From: Paolo Bonzini To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: maz@kernel.org, anup@brainfault.org, seanjc@google.com, bgardon@google.com, peterx@redhat.com, maciej.szmigiero@oracle.com, kvmarm@lists.cs.columbia.edu, linux-mips@vger.kernel.org, kvm-riscv@lists.infradead.org, pfeiner@google.com, jiangshanlai@gmail.com, dmatlack@google.com Subject: [PATCH v7 04/23] KVM: x86/mmu: Derive shadow MMU page role from parent Date: Wed, 22 Jun 2022 15:26:51 -0400 Message-Id: <20220622192710.2547152-5-pbonzini@redhat.com> In-Reply-To: <20220622192710.2547152-1-pbonzini@redhat.com> References: <20220622192710.2547152-1-pbonzini@redhat.com> MIME-Version: 1.0 Content-Type: text/plain Content-Transfer-Encoding: 8bit X-Scanned-By: MIMEDefang 2.78 on 10.11.54.3 X-Spam-Status: No, score=-3.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_LOW, SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: David Matlack Instead of computing the shadow page role from scratch for every new page, derive most of the information from the parent shadow page. This eliminates the dependency on the vCPU root role to allocate shadow page tables, and reduces the number of parameters to kvm_mmu_get_page(). Preemptively split out the role calculation to a separate function for use in a following commit. Note that when calculating the MMU root role, we can take @role.passthrough, @role.direct, and @role.access directly from @vcpu->arch.mmu->root_role. Only @role.level and @role.quadrant still must be overridden for PAE page directories, when shadowing 32-bit guest page tables with PAE page tables. No functional change intended. Reviewed-by: Peter Xu Signed-off-by: David Matlack Message-Id: <20220516232138.1783324-5-dmatlack@google.com> Signed-off-by: Paolo Bonzini --- arch/x86/kvm/mmu/mmu.c | 114 +++++++++++++++++++-------------- arch/x86/kvm/mmu/paging_tmpl.h | 9 +-- 2 files changed, 71 insertions(+), 52 deletions(-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 2e30398fe59f..fd1b479bf7fc 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -1993,49 +1993,15 @@ static void clear_sp_write_flooding_count(u64 *spte) __clear_sp_write_flooding_count(sptep_to_sp(spte)); } -static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu, - gfn_t gfn, - gva_t gaddr, - unsigned level, - bool direct, - unsigned int access) +static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu, gfn_t gfn, + union kvm_mmu_page_role role) { - union kvm_mmu_page_role role; struct hlist_head *sp_list; - unsigned quadrant; struct kvm_mmu_page *sp; int ret; int collisions = 0; LIST_HEAD(invalid_list); - role = vcpu->arch.mmu->root_role; - role.level = level; - role.direct = direct; - role.access = access; - if (role.has_4_byte_gpte) { - /* - * If the guest has 4-byte PTEs then that means it's using 32-bit, - * 2-level, non-PAE paging. KVM shadows such guests with PAE paging - * (i.e. 8-byte PTEs). The difference in PTE size means that KVM must - * shadow each guest page table with multiple shadow page tables, which - * requires extra bookkeeping in the role. - * - * Specifically, to shadow the guest's page directory (which covers a - * 4GiB address space), KVM uses 4 PAE page directories, each mapping - * 1GiB of the address space. @role.quadrant encodes which quarter of - * the address space each maps. - * - * To shadow the guest's page tables (which each map a 4MiB region), KVM - * uses 2 PAE page tables, each mapping a 2MiB region. For these, - * @role.quadrant encodes which half of the region they map. - */ - quadrant = gaddr >> (PAGE_SHIFT + (SPTE_LEVEL_BITS * level)); - quadrant &= (1 << level) - 1; - role.quadrant = quadrant; - } - if (level <= vcpu->arch.mmu->cpu_role.base.level) - role.passthrough = 0; - sp_list = &vcpu->kvm->arch.mmu_page_hash[kvm_page_table_hashfn(gfn)]; for_each_valid_sp(vcpu->kvm, sp, sp_list) { if (sp->gfn != gfn) { @@ -2053,7 +2019,7 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu, * Unsync pages must not be left as is, because the new * upper-level page will be write-protected. */ - if (level > PG_LEVEL_4K && sp->unsync) + if (role.level > PG_LEVEL_4K && sp->unsync) kvm_mmu_prepare_zap_page(vcpu->kvm, sp, &invalid_list); continue; @@ -2094,14 +2060,14 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu, ++vcpu->kvm->stat.mmu_cache_miss; - sp = kvm_mmu_alloc_page(vcpu, direct); + sp = kvm_mmu_alloc_page(vcpu, role.direct); sp->gfn = gfn; sp->role = role; hlist_add_head(&sp->hash_link, sp_list); if (sp_has_gptes(sp)) { account_shadowed(vcpu->kvm, sp); - if (level == PG_LEVEL_4K && kvm_vcpu_write_protect_gfn(vcpu, gfn)) + if (role.level == PG_LEVEL_4K && kvm_vcpu_write_protect_gfn(vcpu, gfn)) kvm_flush_remote_tlbs_with_address(vcpu->kvm, gfn, 1); } trace_kvm_mmu_get_page(sp, true); @@ -2113,6 +2079,55 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu, return sp; } +static union kvm_mmu_page_role kvm_mmu_child_role(u64 *sptep, bool direct, unsigned int access) +{ + struct kvm_mmu_page *parent_sp = sptep_to_sp(sptep); + union kvm_mmu_page_role role; + + role = parent_sp->role; + role.level--; + role.access = access; + role.direct = direct; + role.passthrough = 0; + + /* + * If the guest has 4-byte PTEs then that means it's using 32-bit, + * 2-level, non-PAE paging. KVM shadows such guests with PAE paging + * (i.e. 8-byte PTEs). The difference in PTE size means that KVM must + * shadow each guest page table with multiple shadow page tables, which + * requires extra bookkeeping in the role. + * + * Specifically, to shadow the guest's page directory (which covers a + * 4GiB address space), KVM uses 4 PAE page directories, each mapping + * 1GiB of the address space. @role.quadrant encodes which quarter of + * the address space each maps. + * + * To shadow the guest's page tables (which each map a 4MiB region), KVM + * uses 2 PAE page tables, each mapping a 2MiB region. For these, + * @role.quadrant encodes which half of the region they map. + * + * Note, the 4 PAE page directories are pre-allocated and the quadrant + * assigned in mmu_alloc_root(). So only page tables need to be handled + * here. + */ + if (role.has_4_byte_gpte) { + WARN_ON_ONCE(role.level != PG_LEVEL_4K); + role.quadrant = (sptep - parent_sp->spt) % 2; + } + + return role; +} + +static struct kvm_mmu_page *kvm_mmu_get_child_sp(struct kvm_vcpu *vcpu, + u64 *sptep, gfn_t gfn, + bool direct, unsigned int access) +{ + union kvm_mmu_page_role role; + + role = kvm_mmu_child_role(sptep, direct, access); + return kvm_mmu_get_page(vcpu, gfn, role); +} + static void shadow_walk_init_using_root(struct kvm_shadow_walk_iterator *iterator, struct kvm_vcpu *vcpu, hpa_t root, u64 addr) @@ -2964,8 +2979,7 @@ static int __direct_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault) if (is_shadow_present_pte(*it.sptep)) continue; - sp = kvm_mmu_get_page(vcpu, base_gfn, it.addr, - it.level - 1, true, ACC_ALL); + sp = kvm_mmu_get_child_sp(vcpu, it.sptep, base_gfn, true, ACC_ALL); link_shadow_page(vcpu, it.sptep, sp); if (fault->is_tdp && fault->huge_page_disallowed && @@ -3368,13 +3382,18 @@ static int mmu_check_root(struct kvm_vcpu *vcpu, gfn_t root_gfn) return ret; } -static hpa_t mmu_alloc_root(struct kvm_vcpu *vcpu, gfn_t gfn, gva_t gva, +static hpa_t mmu_alloc_root(struct kvm_vcpu *vcpu, gfn_t gfn, int quadrant, u8 level) { - bool direct = vcpu->arch.mmu->root_role.direct; + union kvm_mmu_page_role role = vcpu->arch.mmu->root_role; struct kvm_mmu_page *sp; - sp = kvm_mmu_get_page(vcpu, gfn, gva, level, direct, ACC_ALL); + role.level = level; + + if (role.has_4_byte_gpte) + role.quadrant = quadrant; + + sp = kvm_mmu_get_page(vcpu, gfn, role); ++sp->root_count; return __pa(sp->spt); @@ -3408,8 +3427,8 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu) for (i = 0; i < 4; ++i) { WARN_ON_ONCE(IS_VALID_PAE_ROOT(mmu->pae_root[i])); - root = mmu_alloc_root(vcpu, i << (30 - PAGE_SHIFT), - i << 30, PT32_ROOT_LEVEL); + root = mmu_alloc_root(vcpu, i << (30 - PAGE_SHIFT), i, + PT32_ROOT_LEVEL); mmu->pae_root[i] = root | PT_PRESENT_MASK | shadow_me_value; } @@ -3578,8 +3597,7 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu) root_gfn = pdptrs[i] >> PAGE_SHIFT; } - root = mmu_alloc_root(vcpu, root_gfn, i << 30, - PT32_ROOT_LEVEL); + root = mmu_alloc_root(vcpu, root_gfn, i, PT32_ROOT_LEVEL); mmu->pae_root[i] = root | pm_mask; } diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h index e4655056e651..6ecdd7a41a82 100644 --- a/arch/x86/kvm/mmu/paging_tmpl.h +++ b/arch/x86/kvm/mmu/paging_tmpl.h @@ -654,8 +654,9 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault, if (!is_shadow_present_pte(*it.sptep)) { table_gfn = gw->table_gfn[it.level - 2]; access = gw->pt_access[it.level - 2]; - sp = kvm_mmu_get_page(vcpu, table_gfn, fault->addr, - it.level-1, false, access); + sp = kvm_mmu_get_child_sp(vcpu, it.sptep, table_gfn, + false, access); + /* * We must synchronize the pagetable before linking it * because the guest doesn't need to flush tlb when @@ -711,8 +712,8 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault, drop_large_spte(vcpu, it.sptep); if (!is_shadow_present_pte(*it.sptep)) { - sp = kvm_mmu_get_page(vcpu, base_gfn, fault->addr, - it.level - 1, true, direct_access); + sp = kvm_mmu_get_child_sp(vcpu, it.sptep, base_gfn, + true, direct_access); link_shadow_page(vcpu, it.sptep, sp); if (fault->huge_page_disallowed && fault->req_level >= it.level) -- 2.31.1