Date: Tue, 2 Apr 2024 14:21:41 +0800
Subject: Re: [PATCH v19 070/130] KVM: TDX: TDP MMU TDX support
To: isaku.yamahata@intel.com, kvm@vger.kernel.org,
 linux-kernel@vger.kernel.org
Cc: isaku.yamahata@gmail.com, Paolo Bonzini, erdemaktas@google.com,
 Sean Christopherson, Sagi Shahar, Kai Huang, chen.bo@intel.com,
 hang.yuan@intel.com, tina.zhang@intel.com
References: <56cdb0da8bbf17dc293a2a6b4ff74f6e3e034bbd.1708933498.git.isaku.yamahata@intel.com>
From: Binbin Wu
In-Reply-To: <56cdb0da8bbf17dc293a2a6b4ff74f6e3e034bbd.1708933498.git.isaku.yamahata@intel.com>

On 2/26/2024 4:26 PM, isaku.yamahata@intel.com wrote:
> From: Isaku Yamahata
>
> Implement hooks of TDP MMU for TDX backend. TLB flush, TLB shootdown,
> propagating the change private EPT entry to Secure EPT and freeing Secure
> EPT page. TLB flush handles both shared EPT and private EPT. It flushes
> shared EPT same as VMX. It also waits for the TDX TLB shootdown. For the
> hook to free Secure EPT page, unlinks the Secure EPT page from the Secure
> EPT so that the page can be freed to OS.
>
> Propagate the entry change to Secure EPT. The possible entry changes are
> present -> non-present(zapping) and non-present -> present(population). On
> population just link the Secure EPT page or the private guest page to the
> Secure EPT by TDX SEAMCALL. Because TDP MMU allows concurrent
> zapping/population, zapping requires synchronous TLB shoot down with the
> frozen EPT entry.

But for private memory, zapping holds the write lock, right?

> It zaps the secure entry, increments TLB counter, sends
> IPI to remote vcpus to trigger TLB flush, and then unlinks the private
> guest page from the Secure EPT. For simplicity, batched zapping with
> exclude lock is handled as concurrent zapping.

exclude lock -> exclusive lock

How should this sentence be understood? Since batched zapping holds the
exclusive lock, how can it be handled as concurrent zapping?
Or do you mean that the current implementation prevents concurrent zapping?

> Although it's inefficient,
> it can be optimized in the future.
>
> For MMIO SPTE, the spte value changes as follows.
> initial value (suppress VE bit is set)
> -> Guest issues MMIO and triggers EPT violation
> -> KVM updates SPTE value to MMIO value (suppress VE bit is cleared)
> -> Guest MMIO resumes. It triggers VE exception in guest TD
> -> Guest VE handler issues TDG.VP.VMCALL
> -> KVM handles MMIO
> -> Guest VE handler resumes its execution after MMIO instruction
>
> Signed-off-by: Isaku Yamahata
>
> ---
> v19:
> - Compile fix when CONFIG_HYPERV != y.
>   It's due to the following patch. Catch it up.
>   https://lore.kernel.org/all/20231018192325.1893896-1-seanjc@google.com/
> - Add comments on tlb shootdown to explan the sequence.
> - Use gmem_max_level callback, delete tdp_max_page_level.
>
> v18:
> - rename tdx_sept_page_aug() -> tdx_mem_page_aug()
> - checkpatch: space => tab
>
> v15 -> v16:
> - Add the handling of TD_ATTR_SEPT_VE_DISABLE case.
>
> v14 -> v15:
> - Implemented tdx_flush_tlb_current()
> - Removed unnecessary invept in tdx_flush_tlb(). It was carry over
>   from the very old code base.
>
> Signed-off-by: Isaku Yamahata
> ---
>   arch/x86/kvm/mmu/spte.c    |   3 +-
>   arch/x86/kvm/vmx/main.c    |  91 ++++++++-
>   arch/x86/kvm/vmx/tdx.c     | 372 +++++++++++++++++++++++++++++++++++++
>   arch/x86/kvm/vmx/tdx.h     |   2 +-
>   arch/x86/kvm/vmx/tdx_ops.h |   6 +
>   arch/x86/kvm/vmx/x86_ops.h |  13 ++
>   6 files changed, 481 insertions(+), 6 deletions(-)
>
[...]
> +
> +static int tdx_mem_page_aug(struct kvm *kvm, gfn_t gfn,
> +			    enum pg_level level, kvm_pfn_t pfn)
> +{
> +	int tdx_level = pg_level_to_tdx_sept_level(level);
> +	struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
> +	union tdx_sept_level_state level_state;
> +	hpa_t hpa = pfn_to_hpa(pfn);
> +	gpa_t gpa = gfn_to_gpa(gfn);
> +	struct tdx_module_args out;
> +	union tdx_sept_entry entry;
> +	u64 err;
> +
> +	err = tdh_mem_page_aug(kvm_tdx->tdr_pa, gpa, hpa, &out);
> +	if (unlikely(err == TDX_ERROR_SEPT_BUSY)) {
> +		tdx_unpin(kvm, pfn);
> +		return -EAGAIN;
> +	}
> +	if (unlikely(err == (TDX_EPT_ENTRY_STATE_INCORRECT | TDX_OPERAND_ID_RCX))) {
> +		entry.raw = out.rcx;
> +		level_state.raw = out.rdx;
> +		if (level_state.level == tdx_level &&
> +		    level_state.state == TDX_SEPT_PENDING &&
> +		    entry.leaf && entry.pfn == pfn && entry.sve) {
> +			tdx_unpin(kvm, pfn);
> +			WARN_ON_ONCE(!(to_kvm_tdx(kvm)->attributes &
> +				       TDX_TD_ATTR_SEPT_VE_DISABLE));

to_kvm_tdx(kvm) -> kvm_tdx

Since the implementation requires attributes.TDX_TD_ATTR_SEPT_VE_DISABLE to
be set, should it check the value passed from userspace?
Also, the reason should be described somewhere in the changelog and/or a
comment.

> +			return -EAGAIN;
> +		}
> +	}
> +	if (KVM_BUG_ON(err, kvm)) {
> +		pr_tdx_error(TDH_MEM_PAGE_AUG, err, &out);
> +		tdx_unpin(kvm, pfn);
> +		return -EIO;
> +	}
> +
> +	return 0;
> +}
> +
> +static int tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn,
> +				     enum pg_level level, kvm_pfn_t pfn)
> +{
> +	struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
> +
> +	/* TODO: handle large pages. */
> +	if (KVM_BUG_ON(level != PG_LEVEL_4K, kvm))
> +		return -EINVAL;
> +
> +	/*
> +	 * Because restricted mem

The term "restricted mem" is not used anymore, right?
The comment should be updated.

> doesn't support page migration with
> +	 * a_ops->migrate_page (yet), no callback isn't triggered for KVM on

no callback isn't -> no callback is

> +	 * page migration. Until restricted mem supports page migration,

"restricted mem" -> guest_mem

> +	 * prevent page migration.
> +	 * TODO: Once restricted mem introduces callback on page migration,

ditto

> +	 * implement it and remove get_page/put_page().
> +	 */
> +	get_page(pfn_to_page(pfn));
> +
> +	if (likely(is_td_finalized(kvm_tdx)))
> +		return tdx_mem_page_aug(kvm, gfn, level, pfn);
> +
> +	/* TODO: tdh_mem_page_add() comes here for the initial memory. */
> +
> +	return 0;
> +}
> +
> +static int tdx_sept_drop_private_spte(struct kvm *kvm, gfn_t gfn,
> +				      enum pg_level level, kvm_pfn_t pfn)
> +{
> +	int tdx_level = pg_level_to_tdx_sept_level(level);
> +	struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
> +	struct tdx_module_args out;
> +	gpa_t gpa = gfn_to_gpa(gfn);
> +	hpa_t hpa = pfn_to_hpa(pfn);
> +	hpa_t hpa_with_hkid;
> +	u64 err;
> +
> +	/* TODO: handle large pages. */
> +	if (KVM_BUG_ON(level != PG_LEVEL_4K, kvm))
> +		return -EINVAL;
> +
> +	if (unlikely(!is_hkid_assigned(kvm_tdx))) {
> +		/*
> +		 * The HKID assigned to this TD was already freed and cache
> +		 * was already flushed. We don't have to flush again.
> +		 */
> +		err = tdx_reclaim_page(hpa);
> +		if (KVM_BUG_ON(err, kvm))
> +			return -EIO;
> +		tdx_unpin(kvm, pfn);
> +		return 0;
> +	}
> +
> +	do {
> +		/*
> +		 * When zapping private page, write lock is held. So no race
> +		 * condition with other vcpu sept operation. Race only with
> +		 * TDH.VP.ENTER.
> +		 */
> +		err = tdh_mem_page_remove(kvm_tdx->tdr_pa, gpa, tdx_level, &out);
> +	} while (unlikely(err == TDX_ERROR_SEPT_BUSY));
> +	if (KVM_BUG_ON(err, kvm)) {
> +		pr_tdx_error(TDH_MEM_PAGE_REMOVE, err, &out);
> +		return -EIO;
> +	}
> +
> +	hpa_with_hkid = set_hkid_to_hpa(hpa, (u16)kvm_tdx->hkid);
> +	do {
> +		/*
> +		 * TDX_OPERAND_BUSY can happen on locking PAMT entry. Because
> +		 * this page was removed above, other thread shouldn't be
> +		 * repeatedly operating on this page. Just retry loop.
> +		 */
> +		err = tdh_phymem_page_wbinvd(hpa_with_hkid);
> +	} while (unlikely(err == (TDX_OPERAND_BUSY | TDX_OPERAND_ID_RCX)));
> +	if (KVM_BUG_ON(err, kvm)) {
> +		pr_tdx_error(TDH_PHYMEM_PAGE_WBINVD, err, NULL);
> +		return -EIO;
> +	}
> +	tdx_clear_page(hpa);
> +	tdx_unpin(kvm, pfn);
> +	return 0;
> +}
> +
> +static int tdx_sept_link_private_spt(struct kvm *kvm, gfn_t gfn,
> +				     enum pg_level level, void *private_spt)
> +{
> +	int tdx_level = pg_level_to_tdx_sept_level(level);
> +	struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
> +	gpa_t gpa = gfn_to_gpa(gfn);
> +	hpa_t hpa = __pa(private_spt);
> +	struct tdx_module_args out;
> +	u64 err;
> +
> +	err = tdh_mem_sept_add(kvm_tdx->tdr_pa, gpa, tdx_level, hpa, &out);

kvm_tdx is only used here, so the local variable can be dropped.

> +	if (unlikely(err == TDX_ERROR_SEPT_BUSY))
> +		return -EAGAIN;
> +	if (KVM_BUG_ON(err, kvm)) {
> +		pr_tdx_error(TDH_MEM_SEPT_ADD, err, &out);
> +		return -EIO;
> +	}
> +
> +	return 0;
> +}
> +
>
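
Coming back to the TDX_TD_ATTR_SEPT_VE_DISABLE question in
tdx_mem_page_aug(): one possible shape for the userspace check is to reject
the attributes when the TD is set up, so the WARN_ON_ONCE() can never fire
on user input. This is only a rough userspace-compilable sketch of the idea.
The struct, the helper name tdx_sketch_set_attributes() and the bit position
are placeholders I made up for illustration; they are not code from this
series.

```c
#include <errno.h>
#include <stdint.h>

/* Stand-in for the kernel macro; bit 28 is an assumption for this sketch. */
#define TDX_TD_ATTR_SEPT_VE_DISABLE	(1ULL << 28)

/* Hypothetical, trimmed-down stand-in for struct kvm_tdx. */
struct kvm_tdx_sketch {
	uint64_t attributes;
};

/*
 * Hypothetical helper: refuse attributes from userspace that do not have
 * SEPT_VE_DISABLE set, so the page-aug path can rely on the bit later.
 * On rejection the stored attributes are left untouched.
 */
static int tdx_sketch_set_attributes(struct kvm_tdx_sketch *kvm_tdx,
				     uint64_t attributes)
{
	if (!(attributes & TDX_TD_ATTR_SEPT_VE_DISABLE))
		return -EINVAL;

	kvm_tdx->attributes = attributes;
	return 0;
}
```

With a check like this at init time, the WARN_ON_ONCE() in
tdx_mem_page_aug() would only document an invariant instead of being
reachable from userspace input.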