Received: by 2002:a05:6358:e9c4:b0:b2:91dc:71ab with SMTP id hc4csp4118642rwb; Sun, 7 Aug 2022 15:34:57 -0700 (PDT) X-Google-Smtp-Source: AA6agR5ma98x2dUVzL8QHL2KFSmbr773C4e+tsNWRz+1ne3SPBrt/GhKVP7ImH+9Jb+bjJ1FJ5YD X-Received: by 2002:a17:90a:684e:b0:1f5:474b:28e5 with SMTP id e14-20020a17090a684e00b001f5474b28e5mr24779629pjm.159.1659911697446; Sun, 07 Aug 2022 15:34:57 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1659911697; cv=none; d=google.com; s=arc-20160816; b=yGTcVe+g1s5WFMVbNpvxFYVGmwvkbOr89ChuTwE0mgfHfjU2sQ3bSFSRBXnFNceiPs Yt6Sst1U62Z2M70v015JV01owE4v0IW6gd8D6qkYpAwI0swc1AFZn3rKY6rj3m7Bi4h/ LspvO6HYPn7J+0fQzx1L/k9LceJ4ICHlsEqmfm/6Ey1mLB37GPK0jYPIxAy2oY8To82e c1kfFtIIJ4zvAzCTfc5YStMnMjVNOBXRMK0PDLwdJbcOU+ooEWXmhIEE8Xd6m/lAjuTt 54BphH0A+10MkYSFGijeSAZp6yNYguf+4A1ABLeAf1bDMQjF8SfwHn6y5bOPiYodgCdC KQ+w== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=xiJMLUidc1s5lMzHDiqsvXdznznF6Z/2ZhmrgwPwbE0=; b=X/5WEn7sPLSni4kyonyc2GwXmV04+26EN3HYOk1xvUzWP7P1e4Y5g/FxTlOuBuGxMb qxtuLNyHEXgkly1a/1OeZjCrSazOoEM3MNcfhefwyINDAQZCmTmJRCryRK45oIM6Hu93 hyiUP5WlpQkgP+iNb59Zb5TPD5Zpu4w2HTjaGQ6T1u1H1BgKiCA9MxyXCwbcLPsognRq yT6Cdl8JSp6jZbxdLFoPyMEhTBqhnzoNJZlLRCUl4HAnC/qtgobAgfE+Xx24VQnq5hiV nRl9yIHRiAMQs5R+FtdzgAXb07tJm7j841HaedLDx8LMv0EjS6TpyyACm7zkyMxyQrK2 DtYA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=MaBvExBY; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id a27-20020a630b5b000000b00415c14120fasi9111584pgl.34.2022.08.07.15.34.42; Sun, 07 Aug 2022 15:34:57 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=MaBvExBY; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S242195AbiHGWKD (ORCPT + 99 others); Sun, 7 Aug 2022 18:10:03 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49550 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S237806AbiHGWGG (ORCPT ); Sun, 7 Aug 2022 18:06:06 -0400 Received: from mga04.intel.com (mga04.intel.com [192.55.52.120]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id DBA45655A; Sun, 7 Aug 2022 15:03:06 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1659909786; x=1691445786; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=lc7XJO42kPPBN4Yl9/1JjwIpgvNFDxbyUSh96eiLKic=; b=MaBvExBYEfcZd+SqwURt3ZU0XnRzvj0WI+iw8jESJ6Fa7acySLMU9oxQ EAkaiouTt3QYP2irR8GzzAV4aUFTR5tu5+LmlVCB9KQYSFk9OMR/MzUGc gw+jDh/+/xpNAyOyL1MLUQgDu6gDlP5mhNbTw5zx6jVFWrcKNRYENaOdT jLSx0J0Xdi+cklhLQLrePvt4BqsemHHM4/PMwTY5KPdojj4pMHxpFVkjQ y9Ypwr7UldjFFjKFmGJ4BWKBdFPgDdF0hIwhJ2Qz6Hj30mfgpZuyhYE6V RTVpYW2CHLACkW2+WxAjQbBtq5XehtkYu7cBQFBHAjNwLdty6Q6U0HVHR w==; X-IronPort-AV: E=McAfee;i="6400,9594,10432"; a="289224133" X-IronPort-AV: E=Sophos;i="5.93,220,1654585200"; d="scan'208";a="289224133" Received: from fmsmga008.fm.intel.com ([10.253.24.58]) by fmsmga104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 07 Aug 2022 15:02:35 -0700 X-IronPort-AV: E=Sophos;i="5.93,220,1654585200"; d="scan'208";a="663682579" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga008-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 07 Aug 2022 15:02:35 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar Subject: [PATCH v8 043/103] KVM: x86/tdp_mmu: Don't zap private pages for unsupported cases Date: Sun, 7 Aug 2022 15:01:28 -0700 Message-Id: X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-5.0 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: Sean Christopherson TDX supports only write-back(WB) memory type for private memory architecturally so that (virtualized) memory type change doesn't make sense for private memory. Also currently, page migration isn't supported for TDX yet. (TDX architecturally supports page migration. it's KVM and kernel implementation issue.) Regarding memory type change (mtrr virtualization and lapic page mapping change), pages are zapped by kvm_zap_gfn_range(). On the next KVM page fault, the SPTE entry with a new memory type for the page is populated. Regarding page migration, pages are zapped by the mmu notifier. On the next KVM page fault, the new migrated page is populated. Don't zap private pages on unmapping for those two cases. When deleting/moving a KVM memory slot, zap private pages. Typically tearing down VM. Don't invalidate private page tables. i.e. zap only leaf SPTEs for KVM mmu that has a shared bit mask. The existing kvm_tdp_mmu_invalidate_all_roots() depends on role.invalid with read-lock of mmu_lock so that other vcpu can operate on KVM mmu concurrently. It marks the root page table invalid and zaps SPTEs of the root page tables. The TDX module doesn't allow to unlink a protected root page table from the hardware and then allocate a new one for it. i.e. replacing a protected root page table. Instead, zap only leaf SPTEs for KVM mmu with a shared bit mask set. Signed-off-by: Sean Christopherson Signed-off-by: Isaku Yamahata --- arch/x86/kvm/mmu/mmu.c | 58 ++++++++++++++++++++++++++++++++++++-- arch/x86/kvm/mmu/tdp_mmu.c | 24 ++++++++++++---- arch/x86/kvm/mmu/tdp_mmu.h | 5 ++-- 3 files changed, 77 insertions(+), 10 deletions(-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index aa0819905874..1d8f1349e925 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -1589,7 +1589,12 @@ bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range) flush = kvm_handle_gfn_range(kvm, range, kvm_zap_rmap); if (is_tdp_mmu_enabled(kvm)) - flush = kvm_tdp_mmu_unmap_gfn_range(kvm, range, flush); + /* + * kvm_unmap_gfn_range() is called via mmu notifier. + * For now page migration for private page isn't supported yet, + * don't zap private pages. + */ + flush = kvm_tdp_mmu_unmap_gfn_range(kvm, range, flush, false); return flush; } @@ -6039,11 +6044,48 @@ static bool kvm_has_zapped_obsolete_pages(struct kvm *kvm) return unlikely(!list_empty_careful(&kvm->arch.zapped_obsolete_pages)); } +static void kvm_mmu_zap_memslot(struct kvm *kvm, struct kvm_memory_slot *slot) +{ + bool flush = false; + + write_lock(&kvm->mmu_lock); + + /* + * Zapping non-leaf SPTEs, a.k.a. not-last SPTEs, isn't required, worst + * case scenario we'll have unused shadow pages lying around until they + * are recycled due to age or when the VM is destroyed. + */ + if (is_tdp_mmu_enabled(kvm)) { + struct kvm_gfn_range range = { + .slot = slot, + .start = slot->base_gfn, + .end = slot->base_gfn + slot->npages, + .may_block = false, + }; + + /* + * this handles both private gfn and shared gfn. + * All private page should be zapped on memslot deletion. + */ + flush = kvm_tdp_mmu_unmap_gfn_range(kvm, &range, flush, true); + } else { + flush = slot_handle_level(kvm, slot, __kvm_zap_rmap, PG_LEVEL_4K, + KVM_MAX_HUGEPAGE_LEVEL, true); + } + if (flush) + kvm_flush_remote_tlbs(kvm); + + write_unlock(&kvm->mmu_lock); +} + static void kvm_mmu_invalidate_zap_pages_in_memslot(struct kvm *kvm, struct kvm_memory_slot *slot, struct kvm_page_track_notifier_node *node) { - kvm_mmu_zap_all_fast(kvm); + if (kvm_gfn_shared_mask(kvm)) + kvm_mmu_zap_memslot(kvm, slot); + else + kvm_mmu_zap_all_fast(kvm); } int kvm_mmu_init_vm(struct kvm *kvm) @@ -6145,8 +6187,18 @@ void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end) if (is_tdp_mmu_enabled(kvm)) { for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) + /* + * zap_private = false. Zap only shared pages. + * + * kvm_zap_gfn_range() is used when PAT memory type was + * changed. Later on the next kvm page fault, populate + * it with updated spte entry. + * Because only WB is supported for private pages, don't + * care of private pages. + */ flush = kvm_tdp_mmu_zap_leafs(kvm, i, gfn_start, - gfn_end, true, flush); + gfn_end, true, flush, + false); } if (flush) diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index 3f5b019f9774..6a680e0a9260 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -938,7 +938,8 @@ bool kvm_tdp_mmu_zap_sp(struct kvm *kvm, struct kvm_mmu_page *sp) * operation can cause a soft lockup. */ static bool tdp_mmu_zap_leafs(struct kvm *kvm, struct kvm_mmu_page *root, - gfn_t start, gfn_t end, bool can_yield, bool flush) + gfn_t start, gfn_t end, bool can_yield, bool flush, + bool zap_private) { struct tdp_iter iter; @@ -946,6 +947,10 @@ static bool tdp_mmu_zap_leafs(struct kvm *kvm, struct kvm_mmu_page *root, lockdep_assert_held_write(&kvm->mmu_lock); + WARN_ON_ONCE(zap_private && !is_private_sp(root)); + if (!zap_private && is_private_sp(root)) + return false; + rcu_read_lock(); for_each_tdp_pte_min_level(iter, root, PG_LEVEL_4K, start, end) { @@ -978,12 +983,13 @@ static bool tdp_mmu_zap_leafs(struct kvm *kvm, struct kvm_mmu_page *root, * more SPTEs were zapped since the MMU lock was last acquired. */ bool kvm_tdp_mmu_zap_leafs(struct kvm *kvm, int as_id, gfn_t start, gfn_t end, - bool can_yield, bool flush) + bool can_yield, bool flush, bool zap_private) { struct kvm_mmu_page *root; for_each_tdp_mmu_root_yield_safe(kvm, root, as_id) - flush = tdp_mmu_zap_leafs(kvm, root, start, end, can_yield, flush); + flush = tdp_mmu_zap_leafs(kvm, root, start, end, can_yield, flush, + zap_private && is_private_sp(root)); return flush; } @@ -1043,6 +1049,12 @@ void kvm_tdp_mmu_invalidate_all_roots(struct kvm *kvm) lockdep_assert_held_write(&kvm->mmu_lock); list_for_each_entry(root, &kvm->arch.tdp_mmu_roots, link) { + /* + * Skip private root since private page table + * is only torn down when VM is destroyed. + */ + if (is_private_sp(root)) + continue; if (!root->role.invalid && !WARN_ON_ONCE(!kvm_tdp_mmu_get_root(root))) { root->role.invalid = true; @@ -1234,11 +1246,13 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault) return ret; } +/* Used by mmu notifier via kvm_unmap_gfn_range() */ bool kvm_tdp_mmu_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range, - bool flush) + bool flush, bool zap_private) { return kvm_tdp_mmu_zap_leafs(kvm, range->slot->as_id, range->start, - range->end, range->may_block, flush); + range->end, range->may_block, flush, + zap_private); } typedef bool (*tdp_handler_t)(struct kvm *kvm, struct tdp_iter *iter, diff --git a/arch/x86/kvm/mmu/tdp_mmu.h b/arch/x86/kvm/mmu/tdp_mmu.h index c163f7cc23ca..c98c7df449a8 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.h +++ b/arch/x86/kvm/mmu/tdp_mmu.h @@ -16,7 +16,8 @@ void kvm_tdp_mmu_put_root(struct kvm *kvm, struct kvm_mmu_page *root, bool shared); bool kvm_tdp_mmu_zap_leafs(struct kvm *kvm, int as_id, gfn_t start, - gfn_t end, bool can_yield, bool flush); + gfn_t end, bool can_yield, bool flush, + bool zap_private); bool kvm_tdp_mmu_zap_sp(struct kvm *kvm, struct kvm_mmu_page *sp); void kvm_tdp_mmu_zap_all(struct kvm *kvm); void kvm_tdp_mmu_invalidate_all_roots(struct kvm *kvm); @@ -25,7 +26,7 @@ void kvm_tdp_mmu_zap_invalidated_roots(struct kvm *kvm); int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault); bool kvm_tdp_mmu_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range, - bool flush); + bool flush, bool zap_private); bool kvm_tdp_mmu_age_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range); bool kvm_tdp_mmu_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range); bool kvm_tdp_mmu_set_spte_gfn(struct kvm *kvm, struct kvm_gfn_range *range); -- 2.25.1