Received: by 2002:a6b:fb09:0:0:0:0:0 with SMTP id h9csp5137409iog; Wed, 22 Jun 2022 12:48:42 -0700 (PDT) X-Google-Smtp-Source: AGRyM1v1vz+WOQFdBzBXyA+6lNEiKgSZ4xmRNGFVMw3RuV/URgUkIkj5f9DHMDGqt8BeVRMq/RnL X-Received: by 2002:a17:907:6f07:b0:715:7e23:bbb7 with SMTP id sy7-20020a1709076f0700b007157e23bbb7mr4570204ejc.462.1655927321848; Wed, 22 Jun 2022 12:48:41 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1655927321; cv=none; d=google.com; s=arc-20160816; b=vLUQUoPsfwhGh9lwlhrYV7L03Oat6ggcyvCD+ou+WTxODpLzN/hgfiAzokBENh8i+1 DcGRhRaaKrIeUm7pPM0lpa0J+oEXHwoN13uDTwFMcaLCsW2hA7SkKRI2xQxgtxnqsEwA iT6ESFVxJRgFZaRr6ZJ1mjhEaFcxyrQnEtXizSRcbvPErIEDqANLc+eIf5pJvw2DtXrl diZjROXHctDxGWR+ipqbAFd7DrqsnuGNtp1zko1MDYfBFpPn5KFfPLpIq9nCBWl3zmpE IZnHrALBVat3JtDoAhS2T4LlycvnSVey77oe62IdZ2b4CCYMoDy1w+iqIDGG7vjMeNd/ Jcvg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=RKFUL486E2CfSNwUsRAg1P2Cyur/2axyukd6UEqggTk=; b=0oF1lL3fY2S3BP8uxdhyOMTTZOPONGsYGOcIG413axBy5vUpddvXxPnhAGYnpmPEzE fNpjzUmep2Ill1F0vuENU0SjSScFsG5KojSrp1r6MQJY6dQ+7I7wdsXQg9E4JmHYFCv8 RQR5NyEj/n4sr1nTMADxm4lrr53TGKGzTMh6h0qM2pNDVzvL8KzoyiSUMvuvhU89vKkO TPxzJvxQEnTKxY23RzSU3Q1f8bavoarZFYT8jqEggqTyBy+PsHqWRmnbLs+R2ykKnS9w GJDHzPxDk5yToS0cZd53UJwEDegRXVcWORKHmVmM5s4Bmm+98JPwB7iJb7vw2daF904C IViQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=S2CN1RYB; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id h7-20020a170906530700b00718c1dc9107si118504ejo.159.2022.06.22.12.48.16; Wed, 22 Jun 2022 12:48:41 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=S2CN1RYB; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1358275AbiFVT30 (ORCPT + 99 others); Wed, 22 Jun 2022 15:29:26 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33556 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1357783AbiFVT12 (ORCPT ); Wed, 22 Jun 2022 15:27:28 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id F0AD83EA9D for ; Wed, 22 Jun 2022 12:27:22 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1655926041; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=RKFUL486E2CfSNwUsRAg1P2Cyur/2axyukd6UEqggTk=; b=S2CN1RYB+UFxaOlZHmqOZ8GY68EFwvsPmUJa/EZICKPt45rBO5YpQoSgfAy2tMEmqVkiZg PDDOYDrBPHv3E/DCxY7kye2h9PrxdeD4S/i4TwgsnG1glYGPtwrpD9TEnrGs/UsnXA/2pe 3dYdkFVrss1duGqhiP6sHWR4a5Z316g= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-663-RZaNGeAbNNO7JBKVpYhBIQ-1; Wed, 22 Jun 2022 15:27:18 -0400 X-MC-Unique: RZaNGeAbNNO7JBKVpYhBIQ-1 Received: from smtp.corp.redhat.com (int-mx07.intmail.prod.int.rdu2.redhat.com [10.11.54.7]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id E95E3294EDC5; Wed, 22 Jun 2022 19:27:17 +0000 (UTC) Received: from virtlab701.virt.lab.eng.bos.redhat.com (virtlab701.virt.lab.eng.bos.redhat.com [10.19.152.228]) by smtp.corp.redhat.com (Postfix) with ESMTP id 94D63141510C; Wed, 22 Jun 2022 19:27:17 +0000 (UTC) From: Paolo Bonzini To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: maz@kernel.org, anup@brainfault.org, seanjc@google.com, bgardon@google.com, peterx@redhat.com, maciej.szmigiero@oracle.com, kvmarm@lists.cs.columbia.edu, linux-mips@vger.kernel.org, kvm-riscv@lists.infradead.org, pfeiner@google.com, jiangshanlai@gmail.com, dmatlack@google.com Subject: [PATCH v7 17/23] KVM: x86/mmu: Cache the access bits of shadowed translations Date: Wed, 22 Jun 2022 15:27:04 -0400 Message-Id: <20220622192710.2547152-18-pbonzini@redhat.com> In-Reply-To: <20220622192710.2547152-1-pbonzini@redhat.com> References: <20220622192710.2547152-1-pbonzini@redhat.com> MIME-Version: 1.0 Content-Type: text/plain Content-Transfer-Encoding: 8bit X-Scanned-By: MIMEDefang 2.85 on 10.11.54.7 X-Spam-Status: No, score=-3.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_LOW, SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: David Matlack Splitting huge pages requires allocating/finding shadow pages to replace the huge page. Shadow pages are keyed, in part, off the guest access permissions they are shadowing. For fully direct MMUs, there is no shadowing so the access bits in the shadow page role are always ACC_ALL. But during shadow paging, the guest can enforce whatever access permissions it wants. In particular, eager page splitting needs to know the permissions to use for the subpages, but KVM cannot retrieve them from the guest page tables because eager page splitting does not have a vCPU. Fortunately, the guest access permissions are easy to cache whenever page faults or FNAME(sync_page) update the shadow page tables; this is an extension of the existing cache of the shadowed GFNs in the gfns array of the shadow page. The access bits only take up 3 bits, which leaves 61 bits left over for gfns, which is more than enough. Now that the gfns array caches more information than just GFNs, rename it to shadowed_translation. While here, preemptively fix up the WARN_ON() that detects gfn mismatches in direct SPs. The WARN_ON() was paired with a pr_err_ratelimited(), which means that users could sometimes see the WARN without the accompanying error message. Fix this by outputting the error message as part of the WARN splat, and opportunistically make them WARN_ONCE() because if these ever fire, they are all but guaranteed to fire a lot and will bring down the kernel. Signed-off-by: David Matlack Message-Id: <20220516232138.1783324-18-dmatlack@google.com> Signed-off-by: Paolo Bonzini --- arch/x86/include/asm/kvm_host.h | 2 +- arch/x86/kvm/mmu/mmu.c | 85 +++++++++++++++++++++++---------- arch/x86/kvm/mmu/mmu_internal.h | 17 ++++++- arch/x86/kvm/mmu/paging_tmpl.h | 9 +++- 4 files changed, 84 insertions(+), 29 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 7e4c31b57a75..64efe8c90c31 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -713,7 +713,7 @@ struct kvm_vcpu_arch { struct kvm_mmu_memory_cache mmu_pte_list_desc_cache; struct kvm_mmu_memory_cache mmu_shadow_page_cache; - struct kvm_mmu_memory_cache mmu_gfn_array_cache; + struct kvm_mmu_memory_cache mmu_shadowed_info_cache; struct kvm_mmu_memory_cache mmu_page_header_cache; /* diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 7cca28d89a85..13a059ad5dc7 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -656,7 +656,7 @@ static int mmu_topup_memory_caches(struct kvm_vcpu *vcpu, bool maybe_indirect) if (r) return r; if (maybe_indirect) { - r = kvm_mmu_topup_memory_cache(&vcpu->arch.mmu_gfn_array_cache, + r = kvm_mmu_topup_memory_cache(&vcpu->arch.mmu_shadowed_info_cache, PT64_ROOT_MAX_LEVEL); if (r) return r; @@ -669,7 +669,7 @@ static void mmu_free_memory_caches(struct kvm_vcpu *vcpu) { kvm_mmu_free_memory_cache(&vcpu->arch.mmu_pte_list_desc_cache); kvm_mmu_free_memory_cache(&vcpu->arch.mmu_shadow_page_cache); - kvm_mmu_free_memory_cache(&vcpu->arch.mmu_gfn_array_cache); + kvm_mmu_free_memory_cache(&vcpu->arch.mmu_shadowed_info_cache); kvm_mmu_free_memory_cache(&vcpu->arch.mmu_page_header_cache); } @@ -678,34 +678,68 @@ static void mmu_free_pte_list_desc(struct pte_list_desc *pte_list_desc) kmem_cache_free(pte_list_desc_cache, pte_list_desc); } +static bool sp_has_gptes(struct kvm_mmu_page *sp); + static gfn_t kvm_mmu_page_get_gfn(struct kvm_mmu_page *sp, int index) { if (sp->role.passthrough) return sp->gfn; if (!sp->role.direct) - return sp->gfns[index]; + return sp->shadowed_translation[index] >> PAGE_SHIFT; return sp->gfn + (index << ((sp->role.level - 1) * SPTE_LEVEL_BITS)); } -static void kvm_mmu_page_set_gfn(struct kvm_mmu_page *sp, int index, gfn_t gfn) +/* + * For leaf SPTEs, fetch the *guest* access permissions being shadowed. Note + * that the SPTE itself may have a more constrained access permissions that + * what the guest enforces. For example, a guest may create an executable + * huge PTE but KVM may disallow execution to mitigate iTLB multihit. + */ +static u32 kvm_mmu_page_get_access(struct kvm_mmu_page *sp, int index) { - if (sp->role.passthrough) { - WARN_ON_ONCE(gfn != sp->gfn); - return; - } + if (sp_has_gptes(sp)) + return sp->shadowed_translation[index] & ACC_ALL; - if (!sp->role.direct) { - sp->gfns[index] = gfn; + /* + * For direct MMUs (e.g. TDP or non-paging guests) or passthrough SPs, + * KVM is not shadowing any guest page tables, so the "guest access + * permissions" are just ACC_ALL. + * + * For direct SPs in indirect MMUs (shadow paging), i.e. when KVM + * is shadowing a guest huge page with small pages, the guest access + * permissions being shadowed are the access permissions of the huge + * page. + * + * In both cases, sp->role.access contains the correct access bits. + */ + return sp->role.access; +} + +static void kvm_mmu_page_set_translation(struct kvm_mmu_page *sp, int index, gfn_t gfn, u32 access) +{ + if (sp_has_gptes(sp)) { + sp->shadowed_translation[index] = (gfn << PAGE_SHIFT) | access; return; } - if (WARN_ON(gfn != kvm_mmu_page_get_gfn(sp, index))) - pr_err_ratelimited("gfn mismatch under direct page %llx " - "(expected %llx, got %llx)\n", - sp->gfn, - kvm_mmu_page_get_gfn(sp, index), gfn); + WARN_ONCE(access != kvm_mmu_page_get_access(sp, index), + "access mismatch under %s page %llx (expected %u, got %u)\n", + sp->role.passthrough ? "passthrough" : "direct", + sp->gfn, kvm_mmu_page_get_access(sp, index), access); + + WARN_ONCE(gfn != kvm_mmu_page_get_gfn(sp, index), + "gfn mismatch under %s page %llx (expected %llx, got %llx)\n", + sp->role.passthrough ? "passthrough" : "direct", + sp->gfn, kvm_mmu_page_get_gfn(sp, index), gfn); +} + +static void kvm_mmu_page_set_access(struct kvm_mmu_page *sp, int index, u32 access) +{ + gfn_t gfn = kvm_mmu_page_get_gfn(sp, index); + + kvm_mmu_page_set_translation(sp, index, gfn, access); } /* @@ -1554,14 +1588,14 @@ static bool kvm_test_age_rmapp(struct kvm *kvm, struct kvm_rmap_head *rmap_head, static void __rmap_add(struct kvm *kvm, struct kvm_mmu_memory_cache *cache, const struct kvm_memory_slot *slot, - u64 *spte, gfn_t gfn) + u64 *spte, gfn_t gfn, u32 access) { struct kvm_mmu_page *sp; struct kvm_rmap_head *rmap_head; int rmap_count; sp = sptep_to_sp(spte); - kvm_mmu_page_set_gfn(sp, spte - sp->spt, gfn); + kvm_mmu_page_set_translation(sp, spte - sp->spt, gfn, access); kvm_update_page_stats(kvm, sp->role.level, 1); rmap_head = gfn_to_rmap(gfn, sp->role.level, slot); @@ -1575,11 +1609,11 @@ static void __rmap_add(struct kvm *kvm, } static void rmap_add(struct kvm_vcpu *vcpu, const struct kvm_memory_slot *slot, - u64 *spte, gfn_t gfn) + u64 *spte, gfn_t gfn, u32 access) { struct kvm_mmu_memory_cache *cache = &vcpu->arch.mmu_pte_list_desc_cache; - __rmap_add(vcpu->kvm, cache, slot, spte, gfn); + __rmap_add(vcpu->kvm, cache, slot, spte, gfn, access); } bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range) @@ -1643,7 +1677,7 @@ static void kvm_mmu_free_shadow_page(struct kvm_mmu_page *sp) list_del(&sp->link); free_page((unsigned long)sp->spt); if (!sp->role.direct) - free_page((unsigned long)sp->gfns); + free_page((unsigned long)sp->shadowed_translation); kmem_cache_free(mmu_page_header_cache, sp); } @@ -2070,7 +2104,7 @@ static struct kvm_mmu_page *kvm_mmu_find_shadow_page(struct kvm *kvm, struct shadow_page_caches { struct kvm_mmu_memory_cache *page_header_cache; struct kvm_mmu_memory_cache *shadow_page_cache; - struct kvm_mmu_memory_cache *gfn_array_cache; + struct kvm_mmu_memory_cache *shadowed_info_cache; }; static struct kvm_mmu_page *kvm_mmu_alloc_shadow_page(struct kvm *kvm, @@ -2084,7 +2118,7 @@ static struct kvm_mmu_page *kvm_mmu_alloc_shadow_page(struct kvm *kvm, sp = kvm_mmu_memory_cache_alloc(caches->page_header_cache); sp->spt = kvm_mmu_memory_cache_alloc(caches->shadow_page_cache); if (!role.direct) - sp->gfns = kvm_mmu_memory_cache_alloc(caches->gfn_array_cache); + sp->shadowed_translation = kvm_mmu_memory_cache_alloc(caches->shadowed_info_cache); set_page_private(virt_to_page(sp->spt), (unsigned long)sp); @@ -2136,7 +2170,7 @@ static struct kvm_mmu_page *kvm_mmu_get_shadow_page(struct kvm_vcpu *vcpu, struct shadow_page_caches caches = { .page_header_cache = &vcpu->arch.mmu_page_header_cache, .shadow_page_cache = &vcpu->arch.mmu_shadow_page_cache, - .gfn_array_cache = &vcpu->arch.mmu_gfn_array_cache, + .shadowed_info_cache = &vcpu->arch.mmu_shadowed_info_cache, }; return __kvm_mmu_get_shadow_page(vcpu->kvm, vcpu, &caches, gfn, role); @@ -2785,7 +2819,10 @@ static int mmu_set_spte(struct kvm_vcpu *vcpu, struct kvm_memory_slot *slot, if (!was_rmapped) { WARN_ON_ONCE(ret == RET_PF_SPURIOUS); - rmap_add(vcpu, slot, sptep, gfn); + rmap_add(vcpu, slot, sptep, gfn, pte_access); + } else { + /* Already rmapped but the pte_access bits may have changed. */ + kvm_mmu_page_set_access(sp, sptep - sp->spt, pte_access); } return ret; diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h index bb9d12ac0db3..ae2d660e2dab 100644 --- a/arch/x86/kvm/mmu/mmu_internal.h +++ b/arch/x86/kvm/mmu/mmu_internal.h @@ -67,8 +67,21 @@ struct kvm_mmu_page { gfn_t gfn; u64 *spt; - /* hold the gfn of each spte inside spt */ - gfn_t *gfns; + + /* + * Stores the result of the guest translation being shadowed by each + * SPTE. KVM shadows two types of guest translations: nGPA -> GPA + * (shadow EPT/NPT) and GVA -> GPA (traditional shadow paging). In both + * cases the result of the translation is a GPA and a set of access + * constraints. + * + * The GFN is stored in the upper bits (PAGE_SHIFT) and the shadowed + * access permissions are stored in the lower bits. Note, for + * convenience and uniformity across guests, the access permissions are + * stored in KVM format (e.g. ACC_EXEC_MASK) not the raw guest format. + */ + u64 *shadowed_translation; + /* Currently serving as active root */ union { int root_count; diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h index 6ecdd7a41a82..24f292f3f93f 100644 --- a/arch/x86/kvm/mmu/paging_tmpl.h +++ b/arch/x86/kvm/mmu/paging_tmpl.h @@ -985,7 +985,8 @@ static gpa_t FNAME(gva_to_gpa)(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu, } /* - * Using the cached information from sp->gfns is safe because: + * Using the information in sp->shadowed_translation (kvm_mmu_page_get_gfn()) is + * safe because: * - The spte has a reference to the struct page, so the pfn for a given gfn * can't change unless all sptes pointing to it are nuked first. * @@ -1067,12 +1068,16 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp) * "present" bit, as all other paging modes will create a * read-only SPTE if pte_access is zero. */ - if ((!pte_access && !shadow_present_mask) || gfn != sp->gfns[i]) { + if ((!pte_access && !shadow_present_mask) || + gfn != kvm_mmu_page_get_gfn(sp, i)) { drop_spte(vcpu->kvm, &sp->spt[i]); flush = true; continue; } + /* Update the shadowed access bits in case they changed. */ + kvm_mmu_page_set_access(sp, i, pte_access); + sptep = &sp->spt[i]; spte = *sptep; host_writable = spte & shadow_host_writable_mask; -- 2.31.1