From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini,
	erdemaktas@google.com, Sean Christopherson, Sagi Shahar
Subject: [RFC PATCH v6 057/104] KVM: x86/mmu: steal software usable bit to record if GFN is for shared or not
Date: Thu, 5 May 2022 11:14:51 -0700

From: Isaku Yamahata <isaku.yamahata@intel.com>

With TDX, all GFNs are private at guest boot time.
At run time, the guest TD can explicitly convert a GFN from private to
shared, or vice versa, via the MapGPA hypercall. Once converted, the
given GFN can't be used the other way; that is, if the guest tells KVM
that a GFN is shared, it can't be used as private, and vice versa.

Steal a software-usable bit, SPTE_SHARED_MASK, from the MMIO generation
counter to record this. Use the SPTE_SHARED_MASK bit in the shared or
private EPT to determine which mapping, shared or private, is allowed.
If the requested mapping isn't allowed, return RET_PF_RETRY to wait for
another vcpu to change it. The bit is recorded in both the shared and
the private shadow page to avoid traversing one more shadow page when
resolving a KVM page fault.

The bit needs to be preserved when the EPT entry is zapped. Currently
the EPT entry is unconditionally initialized to SHADOW_NONPRESENT_VALUE,
which clears the SPTE_SHARED_MASK bit. To carry the SPTE_SHARED_MASK
bit over, introduce a helper function that returns the initial value
for a zapped entry with the SPTE_SHARED_MASK bit preserved, and replace
SHADOW_NONPRESENT_VALUE with it.

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
---
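A note for reviewers: below is a minimal standalone sketch of the bit
arithmetic this change relies on. SHADOW_NONPRESENT_VALUE and
REMOVED_SPTE here are illustrative stand-in values for this sketch
only, not the real definitions from spte.h.

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define BIT_ULL(n)			(1ULL << (n))
#define SPTE_SHARED_MASK		BIT_ULL(62)
/* Stand-in values, for illustration only. */
#define SHADOW_NONPRESENT_VALUE		BIT_ULL(63)
#define REMOVED_SPTE			0x5a0ULL

/* Zapped-entry value that carries the stolen shared bit over. */
static uint64_t shadow_nonpresent_spte(uint64_t old_spte)
{
	return SHADOW_NONPRESENT_VALUE | (old_spte & SPTE_SHARED_MASK);
}

/* A removed SPTE must be recognized regardless of the shared bit. */
static int is_removed_spte(uint64_t spte)
{
	return (spte & ~SPTE_SHARED_MASK) == REMOVED_SPTE;
}

int main(void)
{
	/* The removed-SPTE value must not use the stolen bit. */
	assert(!(REMOVED_SPTE & SPTE_SHARED_MASK));

	/* Zapping a shared-marked entry keeps the bit... */
	assert(shadow_nonpresent_spte(SPTE_SHARED_MASK) & SPTE_SHARED_MASK);
	/* ...and a never-shared entry stays clear. */
	assert(!(shadow_nonpresent_spte(0) & SPTE_SHARED_MASK));

	/* is_removed_spte() ignores the stolen bit. */
	assert(is_removed_spte(REMOVED_SPTE | SPTE_SHARED_MASK));
	assert(is_removed_spte(REMOVED_SPTE));

	printf("invariants hold\n");
	return 0;
}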
 arch/x86/kvm/mmu/spte.h    | 17 +++++++---
 arch/x86/kvm/mmu/tdp_mmu.c | 65 ++++++++++++++++++++++++++++++++------
 2 files changed, 68 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h
index 1ac2a7a91166..d97ffe440536 100644
--- a/arch/x86/kvm/mmu/spte.h
+++ b/arch/x86/kvm/mmu/spte.h
@@ -14,6 +14,9 @@
  */
 #define SPTE_MMU_PRESENT_MASK		BIT_ULL(11)

+/* Mask used to track whether a GPA is shared */
+#define SPTE_SHARED_MASK		BIT_ULL(62)
+
 /*
  * TDP SPTES (more specifically, EPT SPTEs) may not have A/D bits, and may also
  * be restricted to using write-protection (for L2 when CPU dirty logging, i.e.
@@ -104,7 +107,7 @@ static_assert(!(EPT_SPTE_MMU_WRITABLE & SHADOW_ACC_TRACK_SAVED_MASK));
  * the memslots generation and is derived as follows:
  *
  * Bits 0-7 of the MMIO generation are propagated to spte bits 3-10
- * Bits 8-18 of the MMIO generation are propagated to spte bits 52-62
+ * Bits 8-18 of the MMIO generation are propagated to spte bits 52-61
  *
  * The KVM_MEMSLOT_GEN_UPDATE_IN_PROGRESS flag is intentionally not included in
  * the MMIO generation number, as doing so would require stealing a bit from
@@ -118,7 +121,7 @@ static_assert(!(EPT_SPTE_MMU_WRITABLE & SHADOW_ACC_TRACK_SAVED_MASK));
 #define MMIO_SPTE_GEN_LOW_END		10

 #define MMIO_SPTE_GEN_HIGH_START	52
-#define MMIO_SPTE_GEN_HIGH_END		62
+#define MMIO_SPTE_GEN_HIGH_END		61

 #define MMIO_SPTE_GEN_LOW_MASK		GENMASK_ULL(MMIO_SPTE_GEN_LOW_END, \
 						    MMIO_SPTE_GEN_LOW_START)
@@ -131,7 +134,7 @@ static_assert(!(SPTE_MMU_PRESENT_MASK &
 #define MMIO_SPTE_GEN_HIGH_BITS		(MMIO_SPTE_GEN_HIGH_END - MMIO_SPTE_GEN_HIGH_START + 1)

 /* remember to adjust the comment above as well if you change these */
-static_assert(MMIO_SPTE_GEN_LOW_BITS == 8 && MMIO_SPTE_GEN_HIGH_BITS == 11);
+static_assert(MMIO_SPTE_GEN_LOW_BITS == 8 && MMIO_SPTE_GEN_HIGH_BITS == 10);

 #define MMIO_SPTE_GEN_LOW_SHIFT		(MMIO_SPTE_GEN_LOW_START - 0)
 #define MMIO_SPTE_GEN_HIGH_SHIFT	(MMIO_SPTE_GEN_HIGH_START - MMIO_SPTE_GEN_LOW_BITS)
@@ -208,6 +211,7 @@ extern u64 __read_mostly shadow_nonpresent_or_rsvd_mask;
 /* Removed SPTEs must not be misconstrued as shadow present PTEs. */
 static_assert(!(__REMOVED_SPTE & SPTE_MMU_PRESENT_MASK));
 static_assert(!(__REMOVED_SPTE & SHADOW_NONPRESENT_VALUE));
+static_assert(!(__REMOVED_SPTE & SPTE_SHARED_MASK));

 /*
  * See above comment around __REMOVED_SPTE. REMOVED_SPTE is the actual
@@ -217,7 +221,12 @@ static_assert(!(__REMOVED_SPTE & SHADOW_NONPRESENT_VALUE));
 static inline bool is_removed_spte(u64 spte)
 {
-	return spte == REMOVED_SPTE;
+	return (spte & ~SPTE_SHARED_MASK) == REMOVED_SPTE;
+}
+
+static inline u64 spte_shared_mask(u64 spte)
+{
+	return spte & SPTE_SHARED_MASK;
 }

 /*
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 2aa2cb8a9b05..1d7642a0acc9 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -736,6 +736,11 @@ static inline int tdp_mmu_set_spte_atomic(struct kvm *kvm,
 	return 0;
 }

+static u64 shadow_nonpresent_spte(u64 old_spte)
+{
+	return SHADOW_NONPRESENT_VALUE | spte_shared_mask(old_spte);
+}
+
 static inline int tdp_mmu_zap_spte_atomic(struct kvm *kvm,
 					  struct tdp_iter *iter)
 {
@@ -770,7 +775,8 @@ static inline int tdp_mmu_zap_spte_atomic(struct kvm *kvm,
 	 * SHADOW_NONPRESENT_VALUE (which sets "suppress #VE" bit) so it
 	 * can be set when EPT table entries are zapped.
 	 */
-	kvm_tdp_mmu_write_spte(iter->sptep, SHADOW_NONPRESENT_VALUE);
+	kvm_tdp_mmu_write_spte(iter->sptep,
+			       shadow_nonpresent_spte(iter->old_spte));

 	return 0;
 }
@@ -948,8 +954,11 @@ static void __tdp_mmu_zap_root(struct kvm *kvm, struct kvm_mmu_page *root,
 			continue;

 		if (!shared)
-			tdp_mmu_set_spte(kvm, &iter, SHADOW_NONPRESENT_VALUE);
-		else if (tdp_mmu_set_spte_atomic(kvm, &iter, SHADOW_NONPRESENT_VALUE))
+			tdp_mmu_set_spte(kvm, &iter,
+					 shadow_nonpresent_spte(iter.old_spte));
+		else if (tdp_mmu_set_spte_atomic(
+				 kvm, &iter,
+				 shadow_nonpresent_spte(iter.old_spte)))
 			goto retry;
 	}
 }
@@ -1006,7 +1015,8 @@ bool kvm_tdp_mmu_zap_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
 		return false;

 	__tdp_mmu_set_spte(kvm, kvm_mmu_page_as_id(sp), sp->ptep, old_spte,
-			   SHADOW_NONPRESENT_VALUE, sp->gfn, sp->role.level + 1,
+			   shadow_nonpresent_spte(old_spte),
+			   sp->gfn, sp->role.level + 1,
 			   true, true, is_private_sp(sp));

 	return true;
@@ -1048,11 +1058,20 @@ static bool tdp_mmu_zap_leafs(struct kvm *kvm, struct kvm_mmu_page *root,
 			continue;
 		}

+		/*
+		 * SPTE_SHARED_MASK is stored at 4K granularity. The
+		 * information is lost if we delete an upper-level SPTE page.
+		 * TODO: support large pages.
+		 */
+		if (kvm_gfn_shared_mask(kvm) && iter.level > PG_LEVEL_4K)
+			continue;
+
 		if (!is_shadow_present_pte(iter.old_spte) ||
 		    !is_last_spte(iter.old_spte, iter.level))
 			continue;

-		tdp_mmu_set_spte(kvm, &iter, SHADOW_NONPRESENT_VALUE);
+		tdp_mmu_set_spte(kvm, &iter,
+				 shadow_nonpresent_spte(iter.old_spte));
 		flush = true;
 	}

@@ -1168,18 +1187,44 @@ static int tdp_mmu_map_handle_target_level(struct kvm_vcpu *vcpu,
 	gfn_t gfn_unalias = iter->gfn & ~kvm_gfn_shared_mask(vcpu->kvm);

 	WARN_ON(sp->role.level != fault->goal_level);
+	WARN_ON(is_private_sptep(iter->sptep) != fault->is_private);

-	/* TDX shared GPAs are no executable, enforce this for the SDV. */
-	if (kvm_gfn_shared_mask(vcpu->kvm) && !fault->is_private)
-		pte_access &= ~ACC_EXEC_MASK;
+	if (kvm_gfn_shared_mask(vcpu->kvm)) {
+		if (fault->is_private) {
+			/*
+			 * SPTE allows only RWX mapping. The PFN can't be
+			 * mapped as READONLY in GPA.
+			 */
+			if (fault->slot && !fault->map_writable)
+				return RET_PF_RETRY;
+			/*
+			 * This GPA is not allowed to map as private. Let
+			 * the vcpu loop in page fault until another vcpu
+			 * changes it via the MapGPA hypercall.
+			 */
+			if (fault->slot &&
+			    spte_shared_mask(iter->old_spte))
+				return RET_PF_RETRY;
+		} else {
+			/* This GPA is not allowed to map as shared. */
+			if (fault->slot &&
+			    !spte_shared_mask(iter->old_spte))
+				return RET_PF_RETRY;
+			/* TDX shared GPAs are not executable, enforce this. */
+			pte_access &= ~ACC_EXEC_MASK;
+		}
+	}

 	if (unlikely(!fault->slot))
 		new_spte = make_mmio_spte(vcpu, gfn_unalias, pte_access);
-	else
+	else {
 		wrprot = make_spte(vcpu, sp, fault->slot, pte_access,
 				   gfn_unalias, fault->pfn, iter->old_spte,
 				   fault->prefetch, true, fault->map_writable,
 				   &new_spte);
+		if (spte_shared_mask(iter->old_spte))
+			new_spte |= SPTE_SHARED_MASK;
+	}

 	if (new_spte == iter->old_spte)
 		ret = RET_PF_SPURIOUS;
@@ -1488,7 +1533,7 @@ static bool set_spte_gfn(struct kvm *kvm, struct tdp_iter *iter,
 	 * invariant that the PFN of a present
 	 * leaf SPTE can never change.
 	 * See __handle_changed_spte().
 	 */
-	tdp_mmu_set_spte(kvm, iter, SHADOW_NONPRESENT_VALUE);
+	tdp_mmu_set_spte(kvm, iter, shadow_nonpresent_spte(iter->old_spte));

 	if (!pte_write(range->pte)) {
 		new_spte = kvm_mmu_changed_pte_notifier_make_spte(iter->old_spte,
--
2.25.1