Date: Wed, 27 Mar 2024 17:25:02 -0700
From: Isaku Yamahata
To: Chao Gao
Cc: isaku.yamahata@intel.com, kvm@vger.kernel.org,
    linux-kernel@vger.kernel.org, isaku.yamahata@gmail.com,
    Paolo Bonzini, erdemaktas@google.com, Sean Christopherson,
    Sagi Shahar, Kai Huang, chen.bo@intel.com, hang.yuan@intel.com,
    tina.zhang@intel.com, isaku.yamahata@linux.intel.com
Subject: Re: [PATCH v19 062/130] KVM: x86/tdp_mmu: Support TDX private mapping for TDP MMU
Message-ID: <20240328002502.GL2444378@ls.amr.corp.intel.com>

On Wed, Mar 27, 2024 at 09:07:21PM +0800, Chao Gao wrote:
> >	if (kvm_mtrr_check_gfn_range_consistency(vcpu, base, page_num))
> >@@ -4662,6 +4667,7 @@ int kvm_mmu_map_tdp_page(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code,
> >	};
> >
> >	WARN_ON_ONCE(!vcpu->arch.mmu->root_role.direct);
> >+	fault.gfn = gpa_to_gfn(fault.addr) & ~kvm_gfn_shared_mask(vcpu->kvm);
>
> Could you clarify when shared bits need to be masked out or kept? shared bits
> are masked out here but kept in the hunk right above and ..

Sure, it deserves a comment.  I'll add one.

When we get the pfn with kvm_faultin_pfn() or loop over KVM memslots, drop
the shared bits because KVM memslots don't know about the shared bit.  When
we walk the EPT tables, keep the shared bit because we need to find the EPT
entry including the shared bit.
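Something like this, as a rough sketch of the convention (an illustrative
fragment only, not the actual patch, using helpers this series already
adds):

	/*
	 * Fragment for illustration: memslot/pfn lookups take the GFN
	 * with the shared bit masked out, while the SPTE walk takes the
	 * raw GFN so the iterator descends into the right
	 * (shared vs. private) half of the EPT.
	 */
	gfn_t raw_gfn = gpa_to_gfn(fault->addr);
	gfn_t gfn = raw_gfn & ~kvm_gfn_shared_mask(vcpu->kvm);

	/* Shared bit dropped: memslots are indexed without it. */
	fault->slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);

	/* Shared bit kept: walk to the EPT entry that includes it. */
	tdp_mmu_for_each_pte(iter, mmu, raw_gfn, raw_gfn + 1) {
		...
	}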
> >+++ b/arch/x86/kvm/mmu/tdp_iter.h
> >@@ -91,7 +91,7 @@ struct tdp_iter {
> >	tdp_ptep_t pt_path[PT64_ROOT_MAX_LEVEL];
> >	/* A pointer to the current SPTE */
> >	tdp_ptep_t sptep;
> >-	/* The lowest GFN mapped by the current SPTE */
> >+	/* The lowest GFN (shared bits included) mapped by the current SPTE */
> >	gfn_t gfn;
>
> .. in @gfn of tdp_iter.
>
> >	/* The level of the root page given to the iterator */
> >	int root_level;
> >
> >
> >-hpa_t kvm_tdp_mmu_get_vcpu_root_hpa(struct kvm_vcpu *vcpu)
> >+static struct kvm_mmu_page *kvm_tdp_mmu_get_vcpu_root(struct kvm_vcpu *vcpu,
>
> Maybe fold it into its sole caller.

Sure.

> >+						      bool private)
> > {
> >	union kvm_mmu_page_role role = vcpu->arch.mmu->root_role;
> >	struct kvm *kvm = vcpu->kvm;
> >@@ -221,6 +225,8 @@ hpa_t kvm_tdp_mmu_get_vcpu_root_hpa(struct kvm_vcpu *vcpu)
> >	 * Check for an existing root before allocating a new one.  Note, the
> >	 * role check prevents consuming an invalid root.
> >	 */
> >+	if (private)
> >+		kvm_mmu_page_role_set_private(&role);
> >	for_each_tdp_mmu_root(kvm, root, kvm_mmu_role_as_id(role)) {
> >		if (root->role.word == role.word &&
> >		    kvm_tdp_mmu_get_root(root))
> >@@ -244,12 +250,17 @@
> >	spin_unlock(&kvm->arch.tdp_mmu_pages_lock);
> >
> > out:
> >-	return __pa(root->spt);
> >+	return root;
> >+}
> >+
> >+hpa_t kvm_tdp_mmu_get_vcpu_root_hpa(struct kvm_vcpu *vcpu, bool private)
> >+{
> >+	return __pa(kvm_tdp_mmu_get_vcpu_root(vcpu, private)->spt);
> > }
> >
> > static void handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
> >-				u64 old_spte, u64 new_spte, int level,
> >-				bool shared);
> >+				u64 old_spte, u64 new_spte,
> >+				union kvm_mmu_page_role role, bool shared);
> >
> > static void tdp_account_mmu_page(struct kvm *kvm, struct kvm_mmu_page *sp)
> > {
> >@@ -376,12 +387,78 @@ static void handle_removed_pt(struct kvm *kvm, tdp_ptep_t pt, bool shared)
> >						    REMOVED_SPTE, level);
> >		}
> >		handle_changed_spte(kvm, kvm_mmu_page_as_id(sp), gfn,
> >-				    old_spte, REMOVED_SPTE, level, shared);
> >+				    old_spte, REMOVED_SPTE, sp->role,
> >+				    shared);
> >+	}
> >+
> >+	if (is_private_sp(sp) &&
> >+	    WARN_ON(static_call(kvm_x86_free_private_spt)(kvm, sp->gfn, sp->role.level,
>
> WARN_ON_ONCE()?
>
> >+							  kvm_mmu_private_spt(sp)))) {
> >+		/*
> >+		 * Failed to unlink Secure EPT page and there is nothing to do
> >+		 * further.  Intentionally leak the page to prevent the kernel
> >+		 * from accessing the encrypted page.
> >+		 */
> >+		kvm_mmu_init_private_spt(sp, NULL);
> >	}
> >
> >	call_rcu(&sp->rcu_head, tdp_mmu_free_sp_rcu_callback);
> > }
> >
> >
> >	rcu_read_lock();
> >
> >	for_each_tdp_pte_min_level(iter, root, PG_LEVEL_4K, start, end) {
> >@@ -960,10 +1158,26 @@ static int tdp_mmu_map_handle_target_level(struct kvm_vcpu *vcpu,
> >
> >	if (unlikely(!fault->slot))
> >		new_spte = make_mmio_spte(vcpu, iter->gfn, ACC_ALL);
> >-	else
> >-		wrprot = make_spte(vcpu, sp, fault->slot, ACC_ALL, iter->gfn,
> >-				   fault->pfn, iter->old_spte, fault->prefetch, true,
> >-				   fault->map_writable, &new_spte);
> >+	else {
> >+		unsigned long pte_access = ACC_ALL;
> >+		gfn_t gfn = iter->gfn;
> >+
> >+		if (kvm_gfn_shared_mask(vcpu->kvm)) {
> >+			if (fault->is_private)
> >+				gfn |= kvm_gfn_shared_mask(vcpu->kvm);
>
> this is an open-coded kvm_gfn_to_shared().
>
> I don't get why a spte is installed for a shared gfn when fault->is_private
> is true. could you elaborate?

This is stale code, and you're right.  I'll remove this part.

> >+			else
> >+				/*
> >+				 * TDX shared GPAs are no executable, enforce
> >+				 * this for the SDV.
> >+				 */
>
> what do you mean by the SDV?

That's development nonsense.  I'll remove the second sentence.
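With that stale branch dropped, the hunk would collapse to something like
the following (an untested sketch of the intent, not the committed code):

	else {
		unsigned long pte_access = ACC_ALL;

		/*
		 * TDX shared GPAs are never executable.  Mask EXEC off
		 * for shared mappings when the VM has a shared bit.
		 */
		if (kvm_gfn_shared_mask(vcpu->kvm) && !fault->is_private)
			pte_access &= ~ACC_EXEC_MASK;

		wrprot = make_spte(vcpu, sp, fault->slot, pte_access,
				   iter->gfn, fault->pfn, iter->old_spte,
				   fault->prefetch, true,
				   fault->map_writable, &new_spte);
	}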
> >+				pte_access &= ~ACC_EXEC_MASK;
> >+		}
> >+
> >+		wrprot = make_spte(vcpu, sp, fault->slot, pte_access, gfn,
> >+				   fault->pfn, iter->old_spte,
> >+				   fault->prefetch, true, fault->map_writable,
> >+				   &new_spte);
> >+	}
> >
> >	if (new_spte == iter->old_spte)
> >		ret = RET_PF_SPURIOUS;
> >@@ -1041,6 +1255,8 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
> >	struct kvm *kvm = vcpu->kvm;
> >	struct tdp_iter iter;
> >	struct kvm_mmu_page *sp;
> >+	gfn_t raw_gfn;
> >+	bool is_private = fault->is_private && kvm_gfn_shared_mask(kvm);
> >	int ret = RET_PF_RETRY;
> >
> >	kvm_mmu_hugepage_adjust(vcpu, fault);
> >@@ -1049,7 +1265,17 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
> >
> >	rcu_read_lock();
> >
> >-	tdp_mmu_for_each_pte(iter, mmu, fault->gfn, fault->gfn + 1) {
> >+	raw_gfn = gpa_to_gfn(fault->addr);
> >+
> >+	if (is_error_noslot_pfn(fault->pfn) ||
> >+	    !kvm_pfn_to_refcounted_page(fault->pfn)) {
> >+		if (is_private) {
> >+			rcu_read_unlock();
> >+			return -EFAULT;
>
> This needs a comment. why this check is necessary? does this imply some
> kernel bugs?

Will add a comment.  It's because the current TDX KVM implementation
increments the page refcount, which only works for refcounted pages.

-- 
Isaku Yamahata