From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini,
 erdemaktas@google.com, Sean Christopherson, Sagi Shahar, David Matlack,
 Kai Huang, Zhi Wang, chen.bo@intel.com, hang.yuan@intel.com,
 tina.zhang@intel.com
Subject: [PATCH v16 043/116] KVM: x86/mmu: Add a private pointer to struct kvm_mmu_page
Date: Mon, 16 Oct 2023 09:13:55 -0700
Message-Id: <4c0eb66ff52d29f2d2811af1d519e99aa507a689.1697471314.git.isaku.yamahata@intel.com>
X-Mailer: git-send-email 2.25.1
X-Mailing-List: linux-kernel@vger.kernel.org

From: Isaku Yamahata

For a private GPA, the CPU refers to a
private page table whose contents are encrypted.  Dedicated APIs must be
used to operate on it (e.g. updating or reading a PTE), and they are
expensive.

When KVM resolves a KVM page fault, it walks the page tables.  To reuse the
existing KVM MMU code and to avoid the heavy cost of walking the private
page table directly, allocate one more page that holds a dummy copy of the
page table for the KVM MMU code to walk directly.  Resolve the KVM page
fault with the existing code, and then do the additional operations
necessary for the private page table.

To distinguish such cases, the existing KVM page table is called a shared
page table (i.e. not associated with a private page table), and a page
table with an associated private page table is called a private page table.
The relationship is depicted below.

Add a private pointer to struct kvm_mmu_page for the private page table and
add helper functions to allocate/initialize/free a private page table page.

              KVM page fault                     |
                     |                           |
                     V                           |
        -------------+----------                 |
        |                      |                 |
        V                      V                 |
     shared GPA           private GPA            |
        |                      |                 |
        V                      V                 |
    shared PT root      dummy PT root            |    private PT root
        |                      |                 |           |
        V                      V                 |           V
     shared PT            dummy PT ----propagate---->   private PT
        |                      |                 |           |
        |                      \-----------------+------\    |
        |                                        |      |    |
        V                                        |      V    V
shared guest page                                |    private guest page
                                                 |
                           non-encrypted memory  |    encrypted memory
                                                 |
PT: page table
- Shared PT is visible to KVM and is used by the CPU.
- Private PT is used by the CPU but is invisible to KVM.
- Dummy PT is visible to KVM but is not used by the CPU.  It is used to
  propagate PT changes to the actual private PT, which is used by the CPU.

Signed-off-by: Isaku Yamahata
---
 arch/x86/include/asm/kvm_host.h |  5 ++
 arch/x86/kvm/mmu/mmu.c          |  7 +++
 arch/x86/kvm/mmu/mmu_internal.h | 83 +++++++++++++++++++++++++++++++--
 arch/x86/kvm/mmu/tdp_mmu.c      |  1 +
 4 files changed, 92 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index f8664becb1e4..432b94a61ab1 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -817,6 +817,11 @@ struct kvm_vcpu_arch {
 	struct kvm_mmu_memory_cache mmu_shadow_page_cache;
 	struct kvm_mmu_memory_cache mmu_shadowed_info_cache;
 	struct kvm_mmu_memory_cache mmu_page_header_cache;
+	/*
+	 * This cache is to allocate private page table. E.g. Secure-EPT used
+	 * by the TDX module.
+	 */
+	struct kvm_mmu_memory_cache mmu_private_spt_cache;
 
 	/*
 	 * QEMU userspace and the guest each have their own FPU state.
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 4ac34a0cc33e..6636be590583 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -689,6 +689,12 @@ static int mmu_topup_memory_caches(struct kvm_vcpu *vcpu, bool maybe_indirect)
 				       1 + PT64_ROOT_MAX_LEVEL + PTE_PREFETCH_NUM);
 	if (r)
 		return r;
+	if (kvm_gfn_shared_mask(vcpu->kvm)) {
+		r = kvm_mmu_topup_memory_cache(&vcpu->arch.mmu_private_spt_cache,
+					       PT64_ROOT_MAX_LEVEL);
+		if (r)
+			return r;
+	}
 	r = kvm_mmu_topup_memory_cache(&vcpu->arch.mmu_shadow_page_cache,
 				       PT64_ROOT_MAX_LEVEL);
 	if (r)
@@ -708,6 +714,7 @@ static void mmu_free_memory_caches(struct kvm_vcpu *vcpu)
 	kvm_mmu_free_memory_cache(&vcpu->arch.mmu_pte_list_desc_cache);
 	kvm_mmu_free_memory_cache(&vcpu->arch.mmu_shadow_page_cache);
 	kvm_mmu_free_memory_cache(&vcpu->arch.mmu_shadowed_info_cache);
+	kvm_mmu_free_memory_cache(&vcpu->arch.mmu_private_spt_cache);
 	kvm_mmu_free_memory_cache(&vcpu->arch.mmu_page_header_cache);
 }
 
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index a510f0a16853..398f30fc89a9 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -95,7 +95,23 @@ struct kvm_mmu_page {
 		int root_count;
 		refcount_t tdp_mmu_root_count;
 	};
-	unsigned int unsync_children;
+	union {
+		struct {
+			unsigned int unsync_children;
+			/*
+			 * Number of writes since the last time traversal
+			 * visited this page.
+			 */
+			atomic_t write_flooding_count;
+		};
+#ifdef CONFIG_KVM_MMU_PRIVATE
+		/*
+		 * Associated private shadow page table, e.g. Secure-EPT page
+		 * passed to the TDX module.
+		 */
+		void *private_spt;
+#endif
+	};
 	union {
 		struct kvm_rmap_head parent_ptes; /* rmap pointers to parent sptes */
 		tdp_ptep_t ptep;
@@ -124,9 +140,6 @@ struct kvm_mmu_page {
 	int clear_spte_count;
 #endif
 
-	/* Number of writes since the last time traversal visited this page. */
-	atomic_t write_flooding_count;
-
 #ifdef CONFIG_X86_64
 	/* Used for freeing the page asynchronously if it is a TDP MMU page. */
 	struct rcu_head rcu_head;
@@ -150,6 +163,68 @@ static inline bool is_private_sp(const struct kvm_mmu_page *sp)
 	return kvm_mmu_page_role_is_private(sp->role);
 }
 
+#ifdef CONFIG_KVM_MMU_PRIVATE
+static inline void *kvm_mmu_private_spt(struct kvm_mmu_page *sp)
+{
+	return sp->private_spt;
+}
+
+static inline void kvm_mmu_init_private_spt(struct kvm_mmu_page *sp, void *private_spt)
+{
+	sp->private_spt = private_spt;
+}
+
+static inline void kvm_mmu_alloc_private_spt(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
+{
+	bool is_root = vcpu->arch.root_mmu.root_role.level == sp->role.level;
+
+	KVM_BUG_ON(!kvm_mmu_page_role_is_private(sp->role), vcpu->kvm);
+	if (is_root)
+		/*
+		 * Because TDX module assigns root Secure-EPT page and set it to
+		 * Secure-EPTP when TD vcpu is created, secure page table for
+		 * root isn't needed.
+		 */
+		sp->private_spt = NULL;
+	else {
+		/*
+		 * Because the TDX module doesn't trust VMM and initializes
+		 * the pages itself, KVM doesn't initialize them.  Allocate
+		 * pages with garbage and give them to the TDX module.
+		 */
+		sp->private_spt = kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_private_spt_cache);
+		/*
+		 * Because mmu_private_spt_cache is topped up before starting kvm
+		 * page fault resolving, the allocation above shouldn't fail.
+		 */
+		WARN_ON_ONCE(!sp->private_spt);
+	}
+}
+
+static inline void kvm_mmu_free_private_spt(struct kvm_mmu_page *sp)
+{
+	if (sp->private_spt)
+		free_page((unsigned long)sp->private_spt);
+}
+#else
+static inline void *kvm_mmu_private_spt(struct kvm_mmu_page *sp)
+{
+	return NULL;
+}
+
+static inline void kvm_mmu_init_private_spt(struct kvm_mmu_page *sp, void *private_spt)
+{
+}
+
+static inline void kvm_mmu_alloc_private_spt(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
+{
+}
+
+static inline void kvm_mmu_free_private_spt(struct kvm_mmu_page *sp)
+{
+}
+#endif
+
 static inline bool kvm_mmu_page_ad_need_write_protect(struct kvm_mmu_page *sp)
 {
 	/*
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 0c5c7eadb1ba..583888bbb87d 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -66,6 +66,7 @@ void kvm_mmu_uninit_tdp_mmu(struct kvm *kvm)
 
 static void tdp_mmu_free_sp(struct kvm_mmu_page *sp)
 {
+	kvm_mmu_free_private_spt(sp);
 	free_page((unsigned long)sp->spt);
 	kmem_cache_free(mmu_page_header_cache, sp);
 }
-- 
2.25.1
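
For readers who want to poke at the allocation scheme outside the kernel,
here is a minimal userspace sketch of the idea in this patch.  It is not the
kernel code: every mock_* name, the fixed cache capacity, and the use of
malloc()/free() in place of the kvm_mmu_memory_cache and free_page() are
assumptions made for illustration only.  It models the three points the
patch relies on: private_spt shares storage with the shadow-paging-only
fields, the cache is topped up before the fault is handled so the later
allocation is not expected to fail, and the root level gets no private page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define MOCK_PAGE_SIZE 4096
#define MOCK_CACHE_CAP 8

/* Hypothetical stand-in for struct kvm_mmu_page (relevant fields only). */
struct mock_mmu_page {
	int level;
	bool is_private;
	union {
		struct {
			unsigned int unsync_children;
			int write_flooding_count;
		};
		void *private_spt;	/* overlays the two fields above */
	};
};

/* Hypothetical stand-in for a per-vCPU kvm_mmu_memory_cache. */
struct mock_cache {
	void *objs[MOCK_CACHE_CAP];
	int nobjs;
};

/* Fill the cache up to 'min' pages; called before handling a fault. */
static int mock_cache_topup(struct mock_cache *c, int min)
{
	if (min > MOCK_CACHE_CAP)
		return -1;
	while (c->nobjs < min) {
		/* Deliberately not zeroed: the consumer initializes it. */
		void *p = malloc(MOCK_PAGE_SIZE);

		if (!p)
			return -1;
		c->objs[c->nobjs++] = p;
	}
	return 0;
}

/* Pop one pre-allocated page, or NULL if the cache is empty. */
static void *mock_cache_alloc(struct mock_cache *c)
{
	return c->nobjs ? c->objs[--c->nobjs] : NULL;
}

/* Release whatever is still sitting in the cache. */
static void mock_cache_free(struct mock_cache *c)
{
	while (c->nobjs)
		free(c->objs[--c->nobjs]);
}

/* The root level gets no private page; other levels take one from the cache. */
static void mock_alloc_private_spt(struct mock_cache *c,
				   struct mock_mmu_page *sp, int root_level)
{
	assert(sp->is_private);
	if (sp->level == root_level)
		sp->private_spt = NULL;
	else
		sp->private_spt = mock_cache_alloc(c);
}

/* Free the private page if one was attached. */
static void mock_free_private_spt(struct mock_mmu_page *sp)
{
	free(sp->private_spt);	/* free(NULL) is a no-op */
	sp->private_spt = NULL;
}
```

A caller would top up the cache first, then call mock_alloc_private_spt()
for each new private page table page, and mock_free_private_spt() when the
page is torn down.  Because private_spt overlays unsync_children and
write_flooding_count, the sketch only makes sense for pages that never use
those two fields.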