From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini,
	erdemaktas@google.com, Sean Christopherson, Sagi Shahar, Kai Huang,
	chen.bo@intel.com, hang.yuan@intel.com, tina.zhang@intel.com,
	Sean Christopherson
Subject: [PATCH v19 059/130] KVM: x86/tdp_mmu: Don't zap private pages for unsupported cases
Date: Mon, 26 Feb 2024 00:26:01 -0800
Message-Id: <1ed955a44cd81738b498fe52823766622d8ad57f.1708933498.git.isaku.yamahata@intel.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To:
References:
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

From: Sean Christopherson

TDX architecturally supports only the write-back (WB) memory type for
private memory, so a (virtualized) memory type change doesn't make sense
for private memory.  Page migration also isn't supported for TDX yet.
(TDX architecturally supports page migration; it is a KVM and kernel
implementation issue.)

For memory type changes (MTRR virtualization and LAPIC page mapping
changes), pages are zapped by kvm_zap_gfn_range(), and on the next KVM
page fault an SPTE with the new memory type is populated.  For page
migration, pages are zapped by the mmu notifier, and on the next KVM page
fault the migrated page is populated.  Don't zap private pages on
unmapping for those two cases.

When deleting or moving a KVM memory slot (typically when tearing down
the VM), do zap private pages, but don't invalidate the private page
tables, i.e. zap only leaf SPTEs for a KVM MMU that has a shared bit
mask.  The existing kvm_tdp_mmu_invalidate_all_roots() relies on
role.invalid under read-locked mmu_lock so that other vCPUs can operate
on the KVM MMU concurrently: it marks the root page tables invalid and
zaps their SPTEs.  The TDX module doesn't allow unlinking a protected
root page table from the hardware and then allocating a new one for it,
i.e. replacing a protected root page table.  Instead, zap only leaf SPTEs
for a KVM MMU with a shared bit mask set.

Signed-off-by: Sean Christopherson
Signed-off-by: Isaku Yamahata
---
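Note: as a rough illustration of the zap policy described in the changelog, the
following standalone C sketch models the zap_private decision that
kvm_tdp_mmu_unmap_gfn_range() makes below, plus the fixed choices at the other
call sites.  It is not kernel code; the struct, constant, and function names
are stand-ins, and the mmu-notifier configuration shown is an assumption.

/*
 * Standalone sketch, not kernel code: models the zap_private decision.
 * gfn_range_model, ATTR_PRIVATE, and zap_private() are illustrative
 * stand-ins, not KVM symbols.
 */
#include <stdbool.h>
#include <stdio.h>

#define ATTR_PRIVATE (1u << 3)	/* stand-in for KVM_MEMORY_ATTRIBUTE_PRIVATE */

struct gfn_range_model {
	bool only_private;	/* request covers only the private alias */
	bool only_shared;	/* request covers only the shared alias  */
	unsigned int attributes;/* new attributes on a conversion        */
};

/* Mirrors the logic added to kvm_tdp_mmu_unmap_gfn_range(). */
static bool zap_private(const struct gfn_range_model *r, bool has_shared_mask)
{
	if (!has_shared_mask)
		return false;	/* no private alias (non-TDX VM) */
	if (!r->only_private && !r->only_shared)
		/* attribute change: zap private only when converting to shared */
		return !(r->attributes & ATTR_PRIVATE);
	return r->only_private;
}

int main(void)
{
	/* memslot deletion (kvm_mmu_zap_memslot): both aliases requested */
	struct gfn_range_model slot_del = { .only_private = true, .only_shared = true };
	/* mmu-notifier unmap, e.g. page migration: shared alias only (assumed) */
	struct gfn_range_model notifier = { .only_shared = true };
	/* private->shared attribute conversion */
	struct gfn_range_model to_shared = { .attributes = 0 };

	printf("memslot delete zaps private:    %d\n", zap_private(&slot_del, true));  /* 1 */
	printf("mmu notifier zaps private:      %d\n", zap_private(&notifier, true));  /* 0 */
	printf("convert-to-shared zaps private: %d\n", zap_private(&to_shared, true)); /* 1 */
	/* kvm_zap_gfn_range() simply passes zap_private = false (memory type change) */
	return 0;
}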
 arch/x86/kvm/mmu/mmu.c     | 61 ++++++++++++++++++++++++++++++++++++--
 arch/x86/kvm/mmu/tdp_mmu.c | 37 +++++++++++++++++++----
 arch/x86/kvm/mmu/tdp_mmu.h |  5 ++--
 3 files changed, 92 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 0d6d4506ec97..30c86e858ae4 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -6339,7 +6339,7 @@ static void kvm_mmu_zap_all_fast(struct kvm *kvm)
	 * e.g. before kvm_zap_obsolete_pages() could drop mmu_lock and yield.
	 */
	if (tdp_mmu_enabled)
-		kvm_tdp_mmu_invalidate_all_roots(kvm);
+		kvm_tdp_mmu_invalidate_all_roots(kvm, true);
 
	/*
	 * Notify all vcpus to reload its shadow page table and flush TLB.
@@ -6459,7 +6459,16 @@ void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end)
		flush = kvm_rmap_zap_gfn_range(kvm, gfn_start, gfn_end);
 
	if (tdp_mmu_enabled)
-		flush = kvm_tdp_mmu_zap_leafs(kvm, gfn_start, gfn_end, flush);
+		/*
+		 * zap_private = false. Zap only shared pages.
+		 *
+		 * kvm_zap_gfn_range() is used when MTRR or PAT memory
+		 * type was changed.  Later on the next kvm page fault,
+		 * populate it with updated spte entry.
+		 * Because only WB is supported for private pages, don't
+		 * care of private pages.
+		 */
+		flush = kvm_tdp_mmu_zap_leafs(kvm, gfn_start, gfn_end, flush, false);
 
	if (flush)
		kvm_flush_remote_tlbs_range(kvm, gfn_start, gfn_end - gfn_start);
@@ -6905,10 +6914,56 @@ void kvm_arch_flush_shadow_all(struct kvm *kvm)
	kvm_mmu_zap_all(kvm);
 }
 
+static void kvm_mmu_zap_memslot(struct kvm *kvm, struct kvm_memory_slot *slot)
+{
+	bool flush = false;
+
+	write_lock(&kvm->mmu_lock);
+
+	/*
+	 * Zapping non-leaf SPTEs, a.k.a. not-last SPTEs, isn't required, worst
+	 * case scenario we'll have unused shadow pages lying around until they
+	 * are recycled due to age or when the VM is destroyed.
+	 */
+	if (tdp_mmu_enabled) {
+		struct kvm_gfn_range range = {
+			.slot = slot,
+			.start = slot->base_gfn,
+			.end = slot->base_gfn + slot->npages,
+			.may_block = true,
+
+			/*
+			 * This handles both private gfn and shared gfn.
+			 * All private page should be zapped on memslot deletion.
+			 */
+			.only_private = true,
+			.only_shared = true,
+		};
+
+		flush = kvm_tdp_mmu_unmap_gfn_range(kvm, &range, flush);
+	} else {
+		/* TDX supports only TDP-MMU case. */
+		WARN_ON_ONCE(1);
+		flush = true;
+	}
+	if (flush)
+		kvm_flush_remote_tlbs(kvm);
+
+	write_unlock(&kvm->mmu_lock);
+}
+
 void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
				   struct kvm_memory_slot *slot)
 {
-	kvm_mmu_zap_all_fast(kvm);
+	if (kvm_gfn_shared_mask(kvm))
+		/*
+		 * Secure-EPT requires to release PTs from the leaf.  The
+		 * optimization to zap root PT first with child PT doesn't
+		 * work.
+		 */
+		kvm_mmu_zap_memslot(kvm, slot);
+	else
+		kvm_mmu_zap_all_fast(kvm);
 }
 
 void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm, u64 gen)
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index d47f0daf1b03..e7514a807134 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -37,7 +37,7 @@ void kvm_mmu_uninit_tdp_mmu(struct kvm *kvm)
	 * for zapping and thus puts the TDP MMU's reference to each root, i.e.
	 * ultimately frees all roots.
	 */
-	kvm_tdp_mmu_invalidate_all_roots(kvm);
+	kvm_tdp_mmu_invalidate_all_roots(kvm, false);
	kvm_tdp_mmu_zap_invalidated_roots(kvm);
 
	WARN_ON(atomic64_read(&kvm->arch.tdp_mmu_pages));
@@ -771,7 +771,8 @@ bool kvm_tdp_mmu_zap_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
	 * operation can cause a soft lockup.
	 */
 static bool tdp_mmu_zap_leafs(struct kvm *kvm, struct kvm_mmu_page *root,
-			      gfn_t start, gfn_t end, bool can_yield, bool flush)
+			      gfn_t start, gfn_t end, bool can_yield, bool flush,
+			      bool zap_private)
 {
	struct tdp_iter iter;
 
@@ -779,6 +780,10 @@ static bool tdp_mmu_zap_leafs(struct kvm *kvm, struct kvm_mmu_page *root,
 
	lockdep_assert_held_write(&kvm->mmu_lock);
 
+	WARN_ON_ONCE(zap_private && !is_private_sp(root));
+	if (!zap_private && is_private_sp(root))
+		return false;
+
	rcu_read_lock();
 
	for_each_tdp_pte_min_level(iter, root, PG_LEVEL_4K, start, end) {
@@ -810,13 +815,15 @@ static bool tdp_mmu_zap_leafs(struct kvm *kvm, struct kvm_mmu_page *root,
 * true if a TLB flush is needed before releasing the MMU lock, i.e. if one or
 * more SPTEs were zapped since the MMU lock was last acquired.
 */
-bool kvm_tdp_mmu_zap_leafs(struct kvm *kvm, gfn_t start, gfn_t end, bool flush)
+bool kvm_tdp_mmu_zap_leafs(struct kvm *kvm, gfn_t start, gfn_t end, bool flush,
+			   bool zap_private)
 {
	struct kvm_mmu_page *root;
 
	lockdep_assert_held_write(&kvm->mmu_lock);
	for_each_tdp_mmu_root_yield_safe(kvm, root)
-		flush = tdp_mmu_zap_leafs(kvm, root, start, end, true, flush);
+		flush = tdp_mmu_zap_leafs(kvm, root, start, end, true, flush,
+					  zap_private && is_private_sp(root));
 
	return flush;
 }
@@ -891,7 +898,7 @@ void kvm_tdp_mmu_zap_invalidated_roots(struct kvm *kvm)
 * Note, kvm_tdp_mmu_zap_invalidated_roots() is gifted the TDP MMU's reference.
 * See kvm_tdp_mmu_get_vcpu_root_hpa().
 */
-void kvm_tdp_mmu_invalidate_all_roots(struct kvm *kvm)
+void kvm_tdp_mmu_invalidate_all_roots(struct kvm *kvm, bool skip_private)
 {
	struct kvm_mmu_page *root;
 
@@ -916,6 +923,12 @@ void kvm_tdp_mmu_invalidate_all_roots(struct kvm *kvm)
	 * or get/put references to roots.
	 */
	list_for_each_entry(root, &kvm->arch.tdp_mmu_roots, link) {
+		/*
+		 * Skip private root since private page table
+		 * is only torn down when VM is destroyed.
+		 */
+		if (skip_private && is_private_sp(root))
+			continue;
		/*
		 * Note, invalid roots can outlive a memslot update!  Invalid
		 * roots must be *zapped* before the memslot update completes,
@@ -1104,14 +1117,26 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
	return ret;
 }
 
+/* Used by mmu notifier via kvm_unmap_gfn_range() */
 bool kvm_tdp_mmu_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range,
				 bool flush)
 {
	struct kvm_mmu_page *root;
+	bool zap_private = false;
+
+	if (kvm_gfn_shared_mask(kvm)) {
+		if (!range->only_private && !range->only_shared)
+			/* attributes change */
+			zap_private = !(range->arg.attributes &
+					KVM_MEMORY_ATTRIBUTE_PRIVATE);
+		else
+			zap_private = range->only_private;
+	}
 
	__for_each_tdp_mmu_root_yield_safe(kvm, root, range->slot->as_id, false)
		flush = tdp_mmu_zap_leafs(kvm, root, range->start, range->end,
-					  range->may_block, flush);
+					  range->may_block, flush,
+					  zap_private && is_private_sp(root));
 
	return flush;
 }
diff --git a/arch/x86/kvm/mmu/tdp_mmu.h b/arch/x86/kvm/mmu/tdp_mmu.h
index 20d97aa46c49..b3cf58a50357 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.h
+++ b/arch/x86/kvm/mmu/tdp_mmu.h
@@ -19,10 +19,11 @@ __must_check static inline bool kvm_tdp_mmu_get_root(struct kvm_mmu_page *root)
 
 void kvm_tdp_mmu_put_root(struct kvm *kvm, struct kvm_mmu_page *root);
 
-bool kvm_tdp_mmu_zap_leafs(struct kvm *kvm, gfn_t start, gfn_t end, bool flush);
+bool kvm_tdp_mmu_zap_leafs(struct kvm *kvm, gfn_t start, gfn_t end, bool flush,
+			   bool zap_private);
 bool kvm_tdp_mmu_zap_sp(struct kvm *kvm, struct kvm_mmu_page *sp);
 void kvm_tdp_mmu_zap_all(struct kvm *kvm);
-void kvm_tdp_mmu_invalidate_all_roots(struct kvm *kvm);
+void kvm_tdp_mmu_invalidate_all_roots(struct kvm *kvm, bool skip_private);
 void kvm_tdp_mmu_zap_invalidated_roots(struct kvm *kvm);
 
 int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault);
-- 
2.25.1
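
Similarly, a standalone C sketch (not kernel code) of the root-filtering
rules added by the new zap_private/skip_private arguments; root_is_private
stands in for is_private_sp(root), and nothing here is a KVM symbol.

#include <stdbool.h>
#include <stdio.h>

/* Guard at the top of tdp_mmu_zap_leafs(): a private root is left alone
 * unless the caller explicitly asked to zap private mappings. */
static bool may_zap_leafs(bool root_is_private, bool zap_private)
{
	if (zap_private && !root_is_private)
		fprintf(stderr, "warn: zap_private on a shared root\n"); /* WARN_ON_ONCE analogue */
	if (!zap_private && root_is_private)
		return false;
	return true;
}

/* Skip in kvm_tdp_mmu_invalidate_all_roots(): private roots are only torn
 * down at VM destruction, so kvm_mmu_zap_all_fast() passes skip_private = true. */
static bool invalidates_root(bool root_is_private, bool skip_private)
{
	return !(skip_private && root_is_private);
}

int main(void)
{
	printf("fast zap invalidates private root:    %d\n", invalidates_root(true, true));  /* 0 */
	printf("VM teardown invalidates private root: %d\n", invalidates_root(true, false)); /* 1 */
	printf("kvm_zap_gfn_range zaps private leafs: %d\n", may_zap_leafs(true, false));    /* 0 */
	return 0;
}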