Date: Tue, 12 Jul 2022 10:35:48 -0700
From: Isaku Yamahata
To: Chao Peng
Cc: Isaku Yamahata, isaku.yamahata@intel.com, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org, Paolo Bonzini
Subject: Re: [PATCH v7 000/102] KVM TDX basic feature support
Message-ID: <20220712173548.GK1379820@ls.amr.corp.intel.com>
References: <20220711151701.GA1375147@ls.amr.corp.intel.com>
	<20220712104925.GA2805143@chaop.bj.intel.com>
In-Reply-To: <20220712104925.GA2805143@chaop.bj.intel.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Jul 12, 2022 at 06:49:25PM +0800, Chao Peng wrote:
> On Mon, Jul 11, 2022 at 08:17:01AM -0700, Isaku Yamahata wrote:
> > Hi. Because my description on large page support was terse, I wrote up a more
> > detailed one.  Any feedback/thoughts on large page support?
> >
> > TDP MMU large page support design
> >
> > Two main discussion points
> > * how to track page status. private vs shared, no-largepage vs can-be-largepage
> > * how to trigger merging mapping from 4KB/2MB to 2MB/1GB
> >
> > Expected private-vs-shared page usage
> > -------------------------------------
> > On TD boot all pages are private and TD converts pages into shared if necessary.
> > * Most of the guest pages remain private.
> > * Only limited pages are converted at kernel boot
> >   ** bounce buffer for IO (virtio).  It's allocated as swiotlb.  Its size is
> >      64MB or 6% of total guest memory.
> >   ** KVM PV shared page. (the current guest TD doesn't use KVM PV shared page.)
> > * Only a small number of pages are dynamically converted from private to shared
> >   and vice versa.  This usage is very limited. e.g. GetQuote, the lack of
> >   swiotlb buffer
> >
> >
> > Theory of Secure-EPT operations related to large page
> > -----------------------------------------------------
> > TDX Secure-EPT has differences from VMX EPT.
> > To add a page to Secure-EPT
> >
> > * Here is the operation to resolve the EPT violation.
> >   1. TD: Accepts GPA. TD needs to accept GPA before accessing GPA because TD
> >      needs to detect that VMM unmaps GPA and maps GPA again.
> >   2. EPT violation is triggered. TD exit to VMM.
> >   3. VMM: allocate a page for GPA and TDH.MEM.PAGE.AUG it to GPA. Resume TD vcpu.
> >   (3a. TD: #VE is injected. #VE handler accepts the page)
> >   4. TD: resume from #VE and continue TD vcpu execution
> >
> >   TD may choose to skip step 1.  In that case, after step 3, #VE is injected
> >   into the TD and the TD #VE handler needs to accept the page.
> >
> >   When adding a page to Secure-EPT again, the page contents are cleared and the
> >   page is encrypted.  If a page is disassociated from Secure-EPT and added again,
> >   the page content is lost.
> >
> > * TDG.VP.VMCALL<MapGPA> hypercall
> >   The page associated with GPA can be private or shared.  TD converts the GPA by
> >   TDG.VP.VMCALL<MapGPA> hypercall from private to shared or vice versa.  VMM
> >   tracks whether the given GPA is private or shared.
> >
> > * mapping merge(promote)/split(demote)
> >   The page can be mapped as large page (2MB or 1GB) in addition to 4KB.  The
> >   mapping can be merged(4KB/2MB -> 2MB/1GB) or split(2MB/1GB -> 4KB/2MB) by TDX
> >   SEAMCALL TDH.MEM.PAGE.PROMOTE and TDH.MEM.PAGE.DEMOTE.
> >   Unlike VMX EPT, merging a mapping requires that all the pages are mapped,
> >   because of encryption.  This implies the current KVM implementation doesn't
> >   work for TDX when merging a mapping as follows
> >
> >   - EPT violation and host page is 2MB mappable.
> >     some of the 4KB pages of the given 2MB page are already mapped, some not.
> >     i.e. 2MB EPT -> 4KB EPT -> 4K pages
> >   - KVM page fault handler zaps the 2MB EPT entry and populates the 2MB EPT entry
> >     zap: 2MB EPT: non present
> >     populate 2MB: -> 2MB page
> >
> >   If VMM zaps 2MB Secure-EPT entry, the page contents will be lost for TDX.
> >   Mapping merge requires that all pages are already mapped.
> >
> >   Instead, the following steps are needed.
> >   - EPT violation and host page is 2MB mappable.
> >     some of the 4KB pages of the given 2MB page are already mapped. Some not.
> >     i.e. 2MB EPT -> 4KB EPT -> 4K pages
> >   - VMM checks all 4KB GPAs are private. If not, it can't be mapped as a large page.
> >     (****)
> >   - VMM checks all 4KB GPAs are already mapped. If not, give up mapping merge.
> >     (or map missing 4KB pages.)
> >   - mapping merge by TDH.MEM.PAGE.PROMOTE
> >
> >   The mapping split for TDX Secure-EPT works similarly to the VMX EPT case.
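
For illustration, the merge checks listed above (all 4KB GPAs private, then all
4KB GPAs mapped, then TDH.MEM.PAGE.PROMOTE) can be sketched in plain C.  This is
only a model under stated assumptions: struct gpa_4k_state, enum merge_decision
and decide_2m_merge() are hypothetical names that exist neither in KVM nor in
the TDX module, and the SEAMCALL itself is represented only by the
MERGE_PROMOTE result.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PTES_PER_2M 512	/* number of 4KB pages covered by one 2MB mapping */

/* Hypothetical per-4KB tracking state; not a KVM structure. */
struct gpa_4k_state {
	bool is_private;	/* false means the GPA is shared */
	bool is_mapped;		/* already present in the Secure-EPT */
};

/* Hypothetical outcome of the merge checks for one 2MB range. */
enum merge_decision {
	MERGE_NOT_PRIVATE,	/* some 4KB GPA is shared: no large private page */
	MERGE_NOT_MAPPED,	/* some 4KB page is missing: give up or map it first */
	MERGE_PROMOTE,		/* all private and mapped: TDH.MEM.PAGE.PROMOTE is possible */
};

/* Apply the checks in the order given in the text: private first, then mapped. */
static enum merge_decision
decide_2m_merge(const struct gpa_4k_state pages[PTES_PER_2M])
{
	bool all_mapped = true;

	assert(pages != NULL);

	for (size_t i = 0; i < PTES_PER_2M; i++) {
		if (!pages[i].is_private)
			return MERGE_NOT_PRIVATE;
		if (!pages[i].is_mapped)
			all_mapped = false;
	}
	return all_mapped ? MERGE_PROMOTE : MERGE_NOT_MAPPED;
}
```

A caller would issue the promote only for MERGE_PROMOTE.  For MERGE_NOT_MAPPED
it can either keep the 4KB mappings or map the missing pages and retry.
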
> >
> >
> > EPT violation and MapGPA
> > ------------------------
> > - EPT violation is a fast path
> > - MapGPA is not a fast path.
> > => Keep the EPT violation path optimized and complicate the MapGPA path.  For
> > the (****) check, we don't want to scan the 4KB mapping on EPT violation.
> > Instead, the MapGPA path scans it and records whether the page can be mapped
> > as 2MB with regard to private/shared.
>
> This sounds reasonable. Instead of tracking that in MapGPA, maybe
> KVM_MEMORY_ENCRYPT_{UN,}REG_REGION introduced in UPM v7 is a better
> place to put the scan code in.
>
> https://lkml.org/lkml/2022/7/6/259
>
> Both the MapGPA (explicit conversion) and the EPT violation (implicit
> conversion) can cause invocation of these two ioctls and need to update
> this info.
>
> >
> >
> > Tracking private/shared and large page mappable
> > -----------------------------------------------
> > VMM needs to track whether a page is mapped as private or shared at 4KB
> > granularity.  For efficiency of the EPT violation path (****), at 2MB and 1GB
> > level, VMM should track whether the page can be mapped as a large page
> > (regarding private/shared).  VMM updates it on MapGPA and references it on
> > the EPT violation path. (****)
> >
> > For 4KB pages, 1 bit is needed. private or shared.  Let's call it shared-mask bit.
> > For 2MB/1GB pages, 2 bits are needed.  large page mappable or not.  private or
> > shared if mappable.  Let's call it no-largepage bit.
>
> I'm just thinking maybe we don't need to introduce new bits, instead we
> reuse lpage_info where we already use it to track whether a page can be
> mapped at specified page level in kvm_mmu_max_mapping_level(). Then in
> the above two ioctls we do a scan for each level and update lpage_info.
> For example, we should disallow_lpage if private/shared pages are mixed
> in that page level.
>
> It's however a bit tricky to manage lpage_info.disallow_lpage in these
> two ioctls with current code.
> We can't simply do disallow_lpage++ and disallow_lpage--. One possible
> solution can treat disallow_lpage as a mask instead of a count. Then we
> define bits like below for use:
>   - USER_GFN_UNALIGNED set when memslot user_address/private_offset/gfn
>     is not aligned on the page level
>   - PAGE_TRACKING set during page tracking
>   - PRIVATE_SHARED_MIXED set when private/shared pages are mixed
>
> In page fault handler the page can be mapped at that level only when all
> bits are zero and in above two ioctls we just switch on/off bit
> PRIVATE_SHARED_MIXED.

So steal 1 or 2 bits from kvm_lpage_info.disallow_lpage instead of adding one
more array in struct kvm_arch_memory_slot.  Nice idea. Let's call it option A.1).
We increment/decrement disallow_lpage with option A.).  With option A.1), it
is automatically handled.

pros:
+SPTE_SHARED_MASK is not needed
cons:
-one more look-up on EPT violation

> Currently UPM doesn't have this code yet, but can be added if feasible.

Anyway let me integrate UPM v7.

Thanks,

> Chao
> >
> > Option A.)
> > Allocate array for pages in struct kvm_arch_memory_slot on TD creation.
> > struct kvm_arch_memory_slot {
> > +struct kvm_page_attr *page_attr[KVM_NR_PAGE_SIZES];
> > }
> >
> > pros:
> > +straight forward implementation
> > +SPTE_SHARED_MASK is not needed
> > cons:
> > -memory overhead is high
> > -not optimized for expected usage
> > -one more look-up on EPT violation
> >
> > Option B.) Steal two software usable bits from SPTE and record them in SPTE.
> > SPTE_SHARED_MASK, SPTE_NOLARGE_PAGE_MASK
> > pros:
> > +optimized for EPT violation
> > cons:
> > -2 bits used in SPTE entry
> > -complicates the MapGPA path.
> >
> > Option C.) Steal one software usable bit from SPTE and record it in SPTE.
> > SPTE_SHARED_MASK
> > For 2MB/1GB, allocate bitmap in kvm_mmu_page.
> > struct kvm_mmu_page {
> >   bitmap nolarge
> > }
> > pros:
> > +optimized for EPT violation
> > cons:
> > -complicates the MapGPA path.
> > -information is scattered in SPTE and struct kvm_mmu_page
> >
> >
> > How to update those bits
> > ------------------------
> > - MapGPA
> >   - at 4KB level, set or clear shared-mask bit.
> >   - Scan 512 4KB bits, at 2MB level
> >     - set or clear shared-mask bit, clear no-largepage bit or
> >     - clear shared-mask bit, set no-largepage bit
> >     - increment/decrement lpage_info to prevent/allow large page
> >   - similar for 1GB level
> >   Note: This logic might be a bit tricky.
> >
> > - EPT violation
> >   - If 2MB large page is allowed, check the no-largepage bit
> >     - If no-largepage bit is set, => go down to 4KB page
> >     - If no-largepage bit is cleared => try to map 2MB page
> >       - If 4KB level is not mapped, map 2MB page
> >       - If some 4KB level is already mapped, go down to 4KB.
> >         Don't try to merge mapping.  Or it's possible to try to merge mapping.
> >   Note: 512 4KB entry scanning is not done at EPT violation because it's a fast
> >   path.
> >
> >
> > Map merging
> > -----------
> > Map merging is necessary for TD migration. (Map split is the easy part.)  The
> > current KVM implementation zaps the range (mmu notification or lpage recovery
> > worker) and expects large page mapping on the next EPT violation.
> >
> > Option A.) Keep the code similar to map merging logic.
> > Zap 2MB EPT entry in some sense and trigger map merging logic on the next EPT
> > violation.  To keep encrypted page contents, zapped EPT entries need to keep
> > the page.  Steal one more bit from SPTE.  SPTE_PRIVATE_BLOCKED_MASK.
> > It means that the page is zapped from SPTE, but it is still alive and
> > references the page.
> >
> > Option B.) In the callback, directly merge mapping somehow.  In this case, mmu
> > notifier usage doesn't make sense.
> >
> > NOTE:
> > - Implement map merging in MapGPA.  This doesn't work for dirty page logging.
> > - We can utilize kvm_nx_lpage_recovery_worker
> > - We can utilize THP. Probably doesn't work well for fd-based private memory.
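
A rough sketch of how the MapGPA scan and the EPT violation check described
under "How to update those bits" could fit together with the
disallow_lpage-as-mask idea (option A.1) discussed above.  It assumes a
simplified per-2MB tracking structure; struct region_2m, the LPAGE_* bits,
map_gpa_convert() and can_map_2m() are all hypothetical and are not existing
KVM code.  Only the 2MB level is modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGES_PER_2M 512	/* 4KB pages per 2MB region */

/* Hypothetical reasons why a 2MB mapping is disallowed (a mask, not a count). */
#define LPAGE_USER_GFN_UNALIGNED	(1u << 0)
#define LPAGE_PAGE_TRACKING		(1u << 1)
#define LPAGE_PRIVATE_SHARED_MIXED	(1u << 2)

/* Hypothetical tracking state for one 2MB-aligned region. */
struct region_2m {
	bool shared_4k[PAGES_PER_2M];	/* per-4KB shared-mask bit */
	uint32_t disallow_lpage;	/* bitmask of LPAGE_* reasons */
};

/*
 * MapGPA (slow) path: convert 4KB pages [first, first + count) and rescan
 * all 512 entries to decide whether private and shared pages are mixed.
 */
static void map_gpa_convert(struct region_2m *r, size_t first, size_t count,
			    bool to_shared)
{
	size_t nr_shared = 0;

	assert(first <= PAGES_PER_2M && count <= PAGES_PER_2M - first);

	for (size_t i = first; i < first + count; i++)
		r->shared_4k[i] = to_shared;

	for (size_t i = 0; i < PAGES_PER_2M; i++)
		nr_shared += r->shared_4k[i];

	/* Only the MIXED bit is switched here; other reasons are left alone. */
	if (nr_shared != 0 && nr_shared != PAGES_PER_2M)
		r->disallow_lpage |= LPAGE_PRIVATE_SHARED_MIXED;
	else
		r->disallow_lpage &= ~LPAGE_PRIVATE_SHARED_MIXED;
}

/* EPT violation (fast) path: no 512-entry scan, only a mask test. */
static bool can_map_2m(const struct region_2m *r)
{
	return r->disallow_lpage == 0;
}
```

The point of the mask is that the conversion path can set and clear its own
bit without disturbing unrelated reasons such as page tracking, which a plain
increment/decrement of a counter would not guarantee.
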
> >
> > Thanks,
> > Isaku Yamahata
> >
> > On Mon, Jun 27, 2022 at 02:52:52PM -0700,
> > isaku.yamahata@intel.com wrote:
> >
> > > From: Isaku Yamahata
> > >
> > > KVM TDX basic feature support
> > >
> > > Hello.  This is v7 of the patch series of KVM TDX support.
> > > This is based on v5.19-rc1 + kvm/queue branch + TDX HOST patch series.
> > > The tree can be found at https://github.com/intel/tdx/tree/kvm-upstream
> > > How to run/test: It's described at https://github.com/intel/tdx/wiki/TDX-KVM
> > >
> > > Major changes from v6:
> > > - rebased to v5.19 base
> > >
> > > TODO:
> > > - integrate fd-based guest memory. As the discussion is still on-going, I
> > >   intentionally dropped fd-based guest memory support for now.  The
> > >   integration can be found at
> > >   https://github.com/intel/tdx/tree/kvm-upstream-workaround.
> > > - 2M large page support. It's work-in-progress.
> > > For large page support, there are several design choices. Here are the
> > > design options.  Any thoughts/feedback?
> > >
> > > KVM MMU Large page support for TDX
> > >
> > > * What needs to be done
> > > - Track private or shared of each page size (4KB, 2MB, 1GB) based on
> > >   TDG.VP.VMCALL<MapGPA>.  For large pages (2MB, 1GB), it can be mixed (some
> > >   lower-size pages are private and some shared.)  In this case, the page can't
> > >   be large.
> > > - if necessary, split large page on TDG.VP.VMCALL<MapGPA>
> > >   (split on dirty page tracking is future work)
> > > - resolving KVM page fault
> > >   When resolving a private page and the page is large in the host, GPA can be
> > >   resolved as a large page in Secure-EPT.  Even if the page is large on the host
> > >   side, sometimes a 4KB page can be resolved because it's up to guest TD to
> > >   accept at 4KB, 2MB, or 1GB.
> > > - collapsing pages into a large page.
> > >   At this point, it's okay to not implement this.  When dirty page tracking is
> > >   supported, this needs to be supported.
> > >   - On MapGPA, the page can be collapsed into a large page
> > >   - handle zapping SPTE and try to collapse the pages on the next KVM page fault
> > >     Unlike the EPT case, some trick is needed.
> > > - For performance, optimize KVM page fault path at the cost of complicating
> > >   MapGPA path.
> > >
> > > * options to track private or shared
> > > At each page size (4KB, 2MB, and 1GB), track private, shared, or mixed (2MB and
> > > 1GB case).  For each 4KB page, 1 bit per page is needed. private or shared.  For
> > > large pages (2MB and 1GB), 2 bits per large page are needed. (private, shared,
> > > or mixed).  When resolving KVM page fault, we don't want to check the
> > > lower-size pages to see if the given GPA can be a large page, for
> > > performance.  On MapGPA check it instead.
> > >
> > > Option A). enhance kvm_arch_memory_slot
> > >   enum kvm_page_type {
> > >        KVM_PAGE_TYPE_INVALID,
> > >        KVM_PAGE_TYPE_SHARED,
> > >        KVM_PAGE_TYPE_PRIVATE,
> > >        KVM_PAGE_TYPE_MIXED,
> > >   };
> > >
> > >   struct kvm_page_attr {
> > >        enum kvm_page_type type;
> > >   };
> > >
> > >  struct kvm_arch_memory_slot {
> > >  +      struct kvm_page_attr *page_attr[KVM_NR_PAGE_SIZES];
> > >
> > > Option B). steal one more bit SPTE_MIXED_MASK in addition to SPTE_SHARED_MASK
> > > If !SPTE_MIXED_MASK, it can be large page.
> > >
> > > Option C). use SPTE_SHARED_MASK and kvm_mmu_page::mixed bitmap
> > > kvm_mmu_page::mixed bitmap of 1GB, root indicates mixed for 2MB, 1GB.
> > >
> > >
> > > * comparison
> > > A).
> > >   + straightforward to implement
> > >   + SPTE_SHARED_MASK isn't needed
> > >   - memory overhead compared to B). or C).
> > >   - more memory reference on KVM page fault
> > >
> > > B).
> > >   + simpler than C) (more complex than A)?)
> > >   + efficient on KVM page fault. (only SPTE reference)
> > >   + low memory overhead
> > >   - Waste precious SPTE bits.
> > >
> > > C).
> > >   + efficient on KVM page fault.
> > >     (only SPTE reference)
> > >   + low memory overhead
> > >   - complicates MapGPA
> > >   - scattered data structure
> > >
> > > Thanks,
> > > Isaku Yamahata
> > >
> > > Changes from v6:
> > > - rebased to v5.19
> > >
> > > Changes from v5:
> > > - export __seamcall and use it
> > > - move mutex lock from callee function of smp_call_on_cpu to the caller.
> > > - rename mmu_prezap => flush_shadow_all_private() and tdx_mmu_release_hkid
> > > - updated comment
> > > - drop the use of tdh_mng_key.reclaimid(): as the function is for backward
> > >   compatibility to only return success
> > > - struct kvm_tdx_cmd: metadata => flags, added __u64 error.
> > > - make this ioctl systemwide ioctl
> > > - ABI change to struct kvm_init_vm
> > > - guest_tsc_khz: use kvm->arch.default_tsc_khz
> > > - rename BUILD_BUG_ON_MEMCPY to MEMCPY_SAME_SIZE
> > > - drop exporting kvm_set_tsc_khz().
> > > - fix kvm_tdp_page_fault() for mtrr emulation
> > > - rename it to kvm_gfn_shared_mask(), dropped kvm_gpa_shared_mask()
> > > - drop kvm_is_private_gfn(), kept kvm_is_private_gpa()
> > >   keep kvm_{gfn, gpa}_private(), kvm_gpa_private()
> > > - update commit message
> > > - rename shadow_init_value => shadow_nonprsent_value
> > > - added ept_violation_ve_test mode
> > > - shadow_nonpresent_value => SHADOW_NONPRESENT_VALUE in tdp_mmu.c
> > > - legacy MMU case
> > >   => - mmu_topup_shadow_page_cache(), kvm_mmu_create()
> > >      - FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
> > > - #VE warning:
> > > - rename: REMOVED_SPTE => __REMOVED_SPTE, SHADOW_REMOVED_SPTE => REMOVED_SPTE
> > > - merge into Like we discussed, this patch should be merged with patch
> > >   "KVM: x86/mmu: Allow non-zero init value for shadow PTE".
> > > - fix pointed by Sagi.
> > >   check !is_private check => (kvm_gfn_shared_mask && !is_private)
> > > - introduce kvm_gfn_for_root(kvm, root, gfn)
> > > - add only_shared argument to kvm_tdp_mmu_handle_gfn()
> > > - use kvm_arch_dirty_log_supported()
> > > - rename SPTE_PRIVATE_PROHIBIT to SPTE_SHARED_MASK.
> > > - rename: is_private_prohibit_spte() => spte_shared_mask()
> > > - fix: shadow_nonpresent_value => SHADOW_NONPRESENT_VALUE in comment
> > > - dropped this patch as the change was merged into kvm/queue
> > > - update vt_apicv_post_state_restore()
> > > - use is_64_bit_hypercall()
> > > - comment: expand MSMI -> Machine Check System Management Interrupt
> > > - fixed TDX_SEPT_PFERR
> > > - tdvmcall_p[1234]_{write, read}() => tdvmcall_a[0123]_{read,write}()
> > > - rename tdmvcall_exit_readon() => tdvmcall_leaf()
> > > - remove optional zero check of argument.
> > > - do a check for static_call(kvm_x86_has_emulated_msr)(kvm, MSR_IA32_SMBASE)
> > >   in kvm_vcpu_ioctl_smi and __apic_accept_irq.
> > > - WARN_ON_ONCE in tdx_smi_allowed and tdx_enable_smi_window.
> > > - introduce vcpu_deliver_init to x86_ops
> > > - sprinkled KVM_BUG_ON()
> > >
> > > Changes from v4:
> > > - rebased to TDX host kernel patch series.
> > > - include all the patches to make this patch series working.
> > > - add [MARKER] patches to mark the patch layer clear.
> > >
> > > ---
> > > * What's TDX?
> > > TDX stands for Trust Domain Extensions, which extends Intel Virtual Machine
> > > Extensions (VMX) to introduce a kind of virtual machine guest called a Trust
> > > Domain (TD) for confidential computing.
> > >
> > > A TD runs in a CPU mode that is designed to protect the confidentiality of its
> > > memory contents and its CPU state from any other software, including the hosting
> > > Virtual Machine Monitor (VMM), unless explicitly shared by the TD itself.
> > >
> > > We have more detailed explanations below (***).
> > > We have the high-level design of TDX KVM below (****).
> > >
> > > In this patch series, we use "TD" or "guest TD" to differentiate it from the
> > > current "VM" (Virtual Machine), which is supported by KVM today.
> > >
> > >
> > > * The organization of this patch series
> > > This patch series is on top of the patch series "TDX host kernel support":
> > > https://lore.kernel.org/lkml/cover.1646007267.git.kai.huang@intel.com/
> > >
> > > This patch series is available at
> > > https://github.com/intel/tdx/releases/tag/kvm-upstream
> > > The corresponding patches to qemu are available at
> > > https://github.com/intel/qemu-tdx/commits/tdx-upstream
> > >
> > > The relations of the layers are depicted as follows.
> > > The arrows below show the order of patch reviews we would like to have.
> > >
> > > The below layers are chosen so that the device model, for example, qemu can
> > > exercise each layering step by step.  Check if TDX is supported, create TD VM,
> > > create TD vcpu, allow vcpu running, populate TD guest private memory, and handle
> > > vcpu exits/hypercalls/interrupts to run TD fully.
> > >
> > >                       TDX vcpu
> > >                       interrupt/exits/hypercall<------------\
> > >                               ^                             |
> > >                               |                             |
> > >                       TD finalization                       |
> > >                               ^                             |
> > >                               |                             |
> > >                       TDX EPT violation<------------\       |
> > >                               ^                     |       |
> > >                               |                     |       |
> > >                       TD vcpu enter/exit            |       |
> > >                               ^                     |       |
> > >                               |                     |       |
> > >                       TD vcpu creation/destruction  |       \-------KVM TDP MMU MapGPA
> > >                               ^                     |                       ^
> > >                               |                     |                       |
> > >                       TD VM creation/destruction    \---------------KVM TDP MMU hooks
> > >                               ^                                             ^
> > >                               |                                             |
> > >                       TDX architectural definitions                 KVM TDP refactoring for TDX
> > >                               ^                                             ^
> > >                               |                                             |
> > > TDX, VMX    <--------TDX host kernel                                KVM MMU GPA stolen bits
> > > coexistence          support
> > >
> > >
> > > The following are explanations of each layer.  Each layer has a dummy commit
> > > that starts with [MARKER] in subject.  It is intended to help to identify where
> > > each layer starts.
> > >
> > > TDX host kernel support:
> > >   https://lore.kernel.org/lkml/cover.1646007267.git.kai.huang@intel.com/
> > >   The guts of system-wide initialization of TDX module.  There is an
> > >   independent patch series for host x86.  TDX KVM patches call functions
> > >   this patch series provides to initialize the TDX module.
> > >
> > > TDX, VMX coexistence:
> > >   Infrastructure to allow TDX to coexist with VMX and trigger the
> > >   initialization of the TDX module.
> > >   This layer starts with
> > >   "KVM: VMX: Move out vmx_x86_ops to 'main.c' to wrap VMX and TDX"
> > > TDX architectural definitions:
> > >   Add TDX architectural definitions and helper functions
> > >   This layer starts with
> > >   "[MARKER] The start of TDX KVM patch series: TDX architectural definitions".
> > > TD VM creation/destruction:
> > >   Guest TD creation/destruction.  Allocation and release of TDX specific vm
> > >   and vcpu structures.  Create an initial guest memory image with TDX
> > >   measurement.
> > >   This layer starts with
> > >   "[MARKER] The start of TDX KVM patch series: TD VM creation/destruction".
> > > TD vcpu creation/destruction:
> > >   Guest TD creation/destruction.  Allocation and release of TDX specific vm
> > >   and vcpu structures.  Create an initial guest memory image with TDX
> > >   measurement.
> > >   This layer starts with
> > >   "[MARKER] The start of TDX KVM patch series: TD vcpu creation/destruction"
> > > TDX EPT violation:
> > >   Create an initial guest memory image with TDX measurement.  Handle
> > >   secure EPT violations to populate guest pages with TDX SEAMCALLs.
> > >   This layer starts with
> > >   "[MARKER] The start of TDX KVM patch series: TDX EPT violation"
> > > TD vcpu enter/exit:
> > >   Allow TDX vcpu to enter into TD and exit from TD.  Save CPU state before
> > >   entering into TD.  Restore CPU state after exiting from TD.
> > >   This layer starts with
> > >   "[MARKER] The start of TDX KVM patch series: TD vcpu enter/exit"
> > > TD vcpu interrupts/exit/hypercall:
> > >   Handle various exits/hypercalls and allow interrupts to be injected so
> > >   that TD vcpu can continue running.
> > >   This layer starts with
> > >   "[MARKER] The start of TDX KVM patch series: TD vcpu exits/interrupts/hypercalls"
> > >
> > > KVM MMU GPA shared bit:
> > >   Introduce a framework to handle the shared bit, a repurposed bit of the GPA.
> > >   TDX repurposed a bit of the GPA to indicate shared or private.  If it's
> > >   shared, it's the same as the conventional VMX EPT case.  VMM can access
> > >   shared guest pages.  If it's private, it's handled by Secure-EPT and the
> > >   guest page is encrypted.
> > >   This layer starts with
> > >   "[MARKER] The start of TDX KVM patch series: KVM MMU GPA stolen bits"
> > > KVM TDP refactoring for TDX:
> > >   TDX Secure EPT requires different constants, e.g. the initial EPT entry
> > >   value etc.  Various refactoring for those differences.
> > >   This layer starts with
> > >   "[MARKER] The start of TDX KVM patch series: KVM TDP refactoring for TDX"
> > > KVM TDP MMU hooks:
> > >   Introduce a framework to the TDP MMU to add hooks in addition to direct EPT
> > >   access.  TDX added Secure EPT, which is an enhancement to VMX EPT.  Unlike
> > >   conventional VMX EPT, CPU can't directly read/write Secure EPT.  Instead,
> > >   use TDX SEAMCALLs to operate on Secure EPT.
> > >   This layer starts with
> > >   "[MARKER] The start of TDX KVM patch series: KVM TDP MMU hooks"
> > > KVM TDP MMU MapGPA:
> > >   Introduce a framework to handle switching guest pages from private/shared
> > >   to shared/private.  For a given GPA, a guest page can be assigned to a
> > >   private GPA or a shared GPA exclusively.  With TDX MapGPA hypercall,
> > >   guest TD converts GPA assignments from private (or shared) to shared (or
> > >   private).
> > >   This layer starts with
> > >   "[MARKER] The start of TDX KVM patch series: KVM TDP MMU MapGPA"
> > >
> > > KVM guest private memory: (not shown in the above diagram)
> > >   [PATCH v4 00/12] KVM: mm: fd-based approach for supporting KVM guest private
> > >   memory: https://lkml.org/lkml/2022/1/18/395
> > >   Guest private memory requires different memory management in KVM.  The
> > >   patch proposes a way for it.  Integration with TDX KVM.
> > >
> > > (***)
> > > * TDX module
> > > A CPU-attested software module called the "TDX module" is designed to implement
> > > the TDX architecture, and it is loaded by the UEFI firmware today. It can be
> > > loaded by the kernel or driver at runtime, but in this patch series we assume
> > > that the TDX module is already loaded and initialized.
> > >
> > > The TDX module provides two main new logical modes of operation built upon the
> > > new SEAM (Secure Arbitration Mode) root and non-root CPU modes added to the VMX
> > > architecture. TDX root mode is mostly identical to the VMX root operation mode,
> > > and the TDX functions (described later) are triggered by the new SEAMCALL
> > > instruction with the desired interface function selected by an input operand
> > > (leaf number, in RAX). TDX non-root mode is used for TD guest operation.  TDX
> > > non-root operation (i.e. "guest TD" mode) is similar to the VMX non-root
> > > operation (i.e. guest VM), with changes and restrictions to better assure that
> > > no other software or hardware has direct visibility of the TD memory and state.
> > >
> > > TDX transitions between TDX root operation and TDX non-root operation include TD
> > > Entries, from TDX root to TDX non-root mode, and TD Exits from TDX non-root to
> > > TDX root mode.  A TD Exit might be asynchronous, triggered by some external
> > > event (e.g., external interrupt or SMI) or an exception, or it might be
> > > synchronous, triggered by a TDCALL (TDG.VP.VMCALL) function.
> > >
> > > TD VCPUs can be entered using SEAMCALL(TDH.VP.ENTER) by KVM. TDH.VP.ENTER is one
> > > of the TDX interface functions as mentioned above, and "TDH" stands for Trust
> > > Domain Host.  Those host-side TDX interface functions are categorized into
> > > various areas just for better organization, such as SYS (TDX module management),
> > > MNG (TD management), VP (VCPU), PHYSMEM (physical memory), MEM (private memory),
> > > etc.  For example, SEAMCALL(TDH.SYS.INFO) returns the TDX module information.
> > >
> > > TDCS (Trust Domain Control Structure) is the main control structure of a guest
> > > TD, and is encrypted (using the guest TD's ephemeral private key).  At a high
> > > level, TDCS holds information for controlling TD operation as a whole,
> > > execution, EPTP, MSR bitmaps, etc. that KVM needs to set up.  Note that MSR
> > > bitmaps are held as part of TDCS (unlike VMX) because they are meant to have the
> > > same value for all VCPUs of the same TD.
> > >
> > > Trust Domain Virtual Processor State (TDVPS) is the root control structure of a
> > > TD VCPU.  It helps the TDX module control the operation of the VCPU, and holds
> > > the VCPU state while the VCPU is not running. TDVPS is opaque to software and
> > > DMA access, accessible only by using the TDX module interface functions (such as
> > > TDH.VP.RD, TDH.VP.WR). TDVPS includes TD VMCS, and TD VMCS auxiliary structures,
> > > such as virtual APIC page, virtualization exception information, etc.
> > >
> > > Several VMX control structures (such as Shared EPT and Posted interrupt
> > > descriptor) are directly managed and accessed by the host VMM.  These control
> > > structures are pointed to by fields in the TD VMCS.
> > >
> > > The above means that 1) KVM needs to allocate different data structures for TDs,
> > > 2) KVM can reuse the existing code for TDs for some operations, 3) it needs to
> > > define TD-specific handling for others and redirect operations to the TDX
> > > specific callbacks, like "if (is_td_vcpu(vcpu)) tdx_callback() else
> > > vmx_callback();".
> > >
> > > * TD Private Memory
> > > TD private memory is designed to hold TD private content, encrypted by the CPU
> > > using the TD ephemeral key. An encryption engine holds a table of encryption
> > > keys, and an encryption key is selected for each memory transaction based on a
> > > Host Key Identifier (HKID). By design, the host VMM does not have access to the
> > > encryption keys.
> > >
> > > In the first generation of MKTME, HKID is "stolen" from the physical address by
> > > allocating a configurable number of bits from the top of the physical
> > > address. The HKID space is partitioned into shared HKIDs for legacy MKTME
> > > accesses and private HKIDs for SEAM-mode-only accesses. We use 0 for the shared
> > > HKID on the host so that MKTME can be opaque or bypassed on the host.
> > >
> > > During TDX non-root operation (i.e. guest TD), memory accesses can be qualified
> > > as either shared or private, based on the value of a new SHARED bit in the Guest
> > > Physical Address (GPA).  The CPU translates shared GPAs using the usual VMX EPT
> > > (Extended Page Table) or "Shared EPT" (in this document), which resides in host
> > > VMM memory. The Shared EPT is directly managed by the host VMM - the same as
> > > with the current VMX. Since guest TDs usually require I/O, and the data exchange
> > > needs to be done via shared memory, KVM needs to use the current EPT
> > > functionality even for TDs.
> > >
> > > * Secure EPT and Mirroring using the TDP code
> > > The CPU translates private GPAs using a separate Secure EPT.  The Secure EPT
> > > pages are encrypted and integrity-protected with the TD's ephemeral private
> > > key.  Secure EPT can be managed _indirectly_ by the host VMM, using the TDX
> > > interface functions, and thus conceptually Secure EPT is a subset of EPT (why
> > > "subset").
> > > Since execution of such interface functions takes much longer than
> > > accessing memory directly, in KVM we use the existing TDP code to mirror the
> > > Secure EPT for the TD.
> > >
> > > This way, we can effectively walk Secure EPT without using the TDX interface
> > > functions.
> > >
> > > * VM life cycle and TDX specific operations
> > > The userspace VMM, such as QEMU, needs to build and treat TDs differently. For
> > > example, a TD needs to boot in private memory, and the host software cannot copy
> > > the initial image to private memory.
> > >
> > > * TSC Virtualization
> > > The TDX module helps TDs maintain reliable TSC (Time Stamp Counter) values
> > > (e.g. consistent among the TD VCPUs) and the virtual TSC frequency is determined
> > > by TD configuration, i.e. when the TD is created, not per VCPU. KVM currently
> > > owns TSC virtualization for VMs, but the TDX module owns it for TDs.
> > >
> > > * MCE support for TDs
> > > The TDX module doesn't allow the VMM to inject MCE. Instead, a PV
> > > (paravirtualized) mechanism is needed for the TD to communicate with the VMM.
> > > For now, KVM silently ignores MCE requests from the VMM. MSRs related to MCE
> > > (e.g., MCE bank registers) can be naturally emulated by paravirtualizing MSR
> > > access.
> > >
> > > For details, the specifications [1], [2], [3], [4], [5], [6], [7] are
> > > available.
> > >
> > > * Restrictions or future work
> > > Some features are not included to reduce patch size. Those features will be
> > > addressed in future independent patch series.
> > > - large page (2M, 1G)
> > > - qemu gdb stub
> > > - guest PMU
> > > - and more
> > >
> > > * Prerequisites
> > > It's required to load the TDX module and initialize it. It's out of the scope
> > > of this patch series. Another independent patch for the common x86 code is
> > > planned. It defines CONFIG_INTEL_TDX_HOST and this patch series uses
> > > CONFIG_INTEL_TDX_HOST.
> > > It's assumed that with CONFIG_INTEL_TDX_HOST=y, the TDX module is initialized
> > > and the TDX module APIs for the TDX guest life cycle, like tdh.mng.init, are
> > > ready for KVM to use.
> > >
> > > Concretely, global initialization, LP (Logical Processor) initialization, global
> > > configuration, the key configuration, and TDMR and PAMT initialization are done.
> > > The state of the TDX module is SYS_READY. Please refer to the TDX module
> > > specification, the chapter "Intel TDX Module Lifecycle State Machine".
> > >
> > > ** Detecting the TDX module readiness.
> > > The TDX host patch series implements the detection of the TDX module
> > > availability and its initialization so that KVM can use it. It also manages the
> > > Host KeyID (HKID) assigned to each guest TD.
> > > The assumed APIs the TDX host patch series provides are
> > > - int seamrr_enabled()
> > >   Check if the required CPU feature (SEAM mode) is available. This only checks
> > >   CPU feature availability. At this point, the TDX module may not be ready for
> > >   KVM to use.
> > > - int init_tdx(void);
> > >   Initialize the TDX module so that it is ready for KVM to use.
> > > - const struct tdsysinfo_struct *tdx_get_sysinfo(void);
> > >   Return the system wide information about the TDX module. NULL if the TDX
> > >   module isn't initialized.
> > > - u32 tdx_get_global_keyid(void);
> > >   Return the global key ID that is used for the TDX module itself.
> > > - int tdx_keyid_alloc(void);
> > >   Allocate an HKID for a guest TD.
> > > - void tdx_keyid_free(int keyid);
> > >   Free an HKID for a guest TD.
> > >
> > > (****)
> > > * TDX KVM high-level design
> > > - Host key ID management
> > > Host Key ID (HKID) needs to be assigned to each TDX guest for memory encryption.
> > > It is assumed that the TDX host patch series implements the necessary functions:
> > > u32 tdx_get_global_keyid(void), int tdx_keyid_alloc(void), and
> > > void tdx_keyid_free(int keyid).
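The key ID helpers listed above could behave roughly like the following user-space model. The key ID range, the array bookkeeping, the choice of -ENOSPC, and the demo_* names are all assumptions for illustration; the real functions come from the TDX host patch series and may work differently.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical private HKID range; real bounds come from the platform. */
#define DEMO_KEYID_START 32
#define DEMO_NR_KEYIDS   4

/* Slot 0 is reserved here for the TDX module's own (global) key ID. */
static bool demo_keyid_used[DEMO_NR_KEYIDS] = { [0] = true };

/* Models u32 tdx_get_global_keyid(void). */
static inline unsigned int demo_get_global_keyid(void)
{
	return DEMO_KEYID_START;
}

/* Models int tdx_keyid_alloc(void): lowest free HKID, or -ENOSPC. */
static inline int demo_keyid_alloc(void)
{
	for (int i = 1; i < DEMO_NR_KEYIDS; i++) {
		if (!demo_keyid_used[i]) {
			demo_keyid_used[i] = true;
			return DEMO_KEYID_START + i;
		}
	}
	return -ENOSPC;
}

/* Models void tdx_keyid_free(int keyid). */
static inline void demo_keyid_free(int keyid)
{
	int i = keyid - DEMO_KEYID_START;

	/* Freeing the global key ID or an unallocated one is a caller bug. */
	assert(i >= 1 && i < DEMO_NR_KEYIDS);
	assert(demo_keyid_used[i]);
	demo_keyid_used[i] = false;
}
```

In this model, one HKID is taken when a TD is created and returned when the TD is destroyed, so the number of concurrently running TDs is bounded by the number of private key IDs.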
> > >
> > > - Data structures and VM type
> > > Because TDX is different from VMX, define its own VM/VCPU structures, struct
> > > kvm_tdx and struct vcpu_tdx instead of struct kvm_vmx and struct vcpu_vmx. To
> > > identify the VM, introduce VM-type to specify which VM type, VMX (default) or
> > > TDX, is used.
> > >
> > > - VM life cycle and TDX specific operations
> > > Re-purpose the existing KVM_MEMORY_ENCRYPT_OP to add TDX specific operations.
> > > New commands are used to get the TDX system parameters, set TDX specific VM/VCPU
> > > parameters, set initial guest memory and measurement.
> > >
> > > The creation of a TDX VM requires five additional operations in addition to the
> > > conventional VM creation.
> > > - Get KVM system capability to check if TDX VM type is supported
> > > - VM creation (KVM_CREATE_VM)
> > > - New: Get the TDX specific system parameters. KVM_TDX_GET_CAPABILITY.
> > > - New: Set TDX specific VM parameters. KVM_TDX_INIT_VM.
> > > - VCPU creation (KVM_CREATE_VCPU)
> > > - New: Set TDX specific VCPU parameters. KVM_TDX_INIT_VCPU.
> > > - New: Initialize guest memory as boot state and extend the measurement with
> > >   the memory. KVM_TDX_INIT_MEM_REGION.
> > > - New: Finalize VM. KVM_TDX_FINALIZE. Complete measurement of the initial
> > >   TDX VM contents.
> > > - VCPU RUN (KVM_RUN)
> > >
> > > - Protected guest state
> > > Because the guest state (CPU state and guest memory) is protected, the KVM VMM
> > > can't operate on it, for example by accessing CPU registers, injecting
> > > exceptions, or accessing guest memory. Such operations are silently ignored,
> > > or return zero or the initial reset value, when requested via KVM API ioctls.
> > >
> > > VM/VCPU state and callbacks for TDX specific operations.
> > > Define TDX specific VM state and VCPU state instead of VMX ones. Redirect
> > > operations to TDX specific callbacks. "if (tdx) tdx_op() else vmx_op()".
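The redirection, combined with the "silently ignore, read as zero" rule for protected guest state, can be sketched in plain C as follows. All demo_* types and functions are hypothetical stand-ins, not the structures or callbacks used in the series; the point is only the shape of the dispatch on the VM type.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical VM types: the conventional VMX VM and a TDX guest. */
enum demo_vm_type {
	DEMO_VM_TYPE_DEFAULT,
	DEMO_VM_TYPE_TDX,
};

struct demo_kvm {
	enum demo_vm_type vm_type;
};

struct demo_vcpu {
	struct demo_kvm *kvm;
	unsigned long reg;	/* one guest register, as seen by the VMM */
};

static inline bool demo_is_td_vcpu(const struct demo_vcpu *vcpu)
{
	return vcpu->kvm->vm_type == DEMO_VM_TYPE_TDX;
}

/* VMX backend: the VMM can read and write guest registers. */
static inline unsigned long demo_vmx_get_reg(const struct demo_vcpu *vcpu)
{
	return vcpu->reg;
}

static inline void demo_vmx_set_reg(struct demo_vcpu *vcpu, unsigned long val)
{
	vcpu->reg = val;
}

/* TDX backend: guest state is protected, so reads give 0, writes are dropped. */
static inline unsigned long demo_tdx_get_reg(const struct demo_vcpu *vcpu)
{
	(void)vcpu;
	return 0;
}

static inline void demo_tdx_set_reg(struct demo_vcpu *vcpu, unsigned long val)
{
	(void)vcpu;
	(void)val;
}

/* Common entry points: "if (tdx) tdx_op() else vmx_op()". */
static inline unsigned long demo_vt_get_reg(const struct demo_vcpu *vcpu)
{
	if (demo_is_td_vcpu(vcpu))
		return demo_tdx_get_reg(vcpu);
	return demo_vmx_get_reg(vcpu);
}

static inline void demo_vt_set_reg(struct demo_vcpu *vcpu, unsigned long val)
{
	if (demo_is_td_vcpu(vcpu))
		demo_tdx_set_reg(vcpu, val);
	else
		demo_vmx_set_reg(vcpu, val);
}
```

Keeping the branch in one thin wrapper layer means the generic code calls a single entry point and never needs to know which backend a given VCPU belongs to.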
> > >
> > > Operations on the CPU state
> > > Silently ignore operations on the guest state. For example, a write to
> > > CPU registers is ignored and a read from CPU registers returns 0.
> > >
> > > . ignore access to CPU registers except for allowed ones.
> > > . TSC: add a check if the TSC is immutable and return an error, because the
> > >   KVM implementation updates the internal TSC state and it's difficult to back
> > >   out those changes. Instead, skip the logic.
> > > . dirty logging: add a check if dirty logging is supported.
> > > . exceptions/SMI/MCE/SIPI/INIT: silently ignore
> > >
> > > Note: virtual external interrupt and NMI can be injected into TDX guests.
> > >
> > > - KVM MMU integration
> > > One bit of the guest physical address (bit 51 or 47) is repurposed to indicate if
> > > the guest physical address is private (the bit is cleared) or shared (the bit is
> > > set). The bits are called stolen bits.
> > >
> > > - Stolen bits framework
> > > Systematically tracks which guest physical address, shared or private, is
> > > used.
> > >
> > > - Shared EPT and secure EPT
> > > There are two EPTs: Shared EPT (the conventional one) and Secure
> > > EPT (the new one). Shared EPT is handled the same as before and maps
> > > GPAs with the stolen bit set. Secure EPT points to private guest pages.
> > > To resolve an EPT violation, KVM walks one of the two EPTs based on the
> > > faulted GPA. Because it's costly to access Secure EPT with SEAMCALLs
> > > while walking EPTs for a private guest physical address, another
> > > private EPT is used as a shadow of Secure EPT with the existing logic,
> > > at the cost of extra memory.
> > >
> > > The following depicts the relationship.
> > >
> > >                         KVM                             |      TDX module
> > >                          |                              |           |
> > >             -------------+----------                    |           |
> > >             |                      |                    |           |
> > >             V                      V                    |           |
> > >        shared GPA             private GPA               |           |
> > >  CPU shared EPT pointer KVM private EPT pointer         | CPU secure EPT pointer
> > >             |                      |                    |           |
> > >             |                      |                    |           |
> > >             V                      V                    |           V
> > >        shared EPT             private EPT--------mirror----->Secure EPT
> > >             |                      |                    |           |
> > >             |                      \--------------------+------\    |
> > >             |                                           |      |    |
> > >             V                                           |      V    V
> > >     shared guest page                                   |  private guest page
> > >                                                         |
> > >                                                         |
> > >   non-encrypted memory                                  |   encrypted memory
> > >                                                         |
> > >
> > > - Operating on Secure EPT
> > > Use the TDX module APIs to operate on Secure EPT. To call the TDX API
> > > while resolving an EPT violation, add hooks for the additional operations and
> > > wire them to the TDX backend.
> > >
> > > * References
> > >
> > > [1] TDX specification
> > >     https://www.intel.com/content/www/us/en/developer/articles/technical/intel-trust-domain-extensions.html
> > > [2] Intel Trust Domain Extensions (Intel TDX)
> > >     https://cdrdv2.intel.com/v1/dl/getContent/726790
> > > [3] Intel CPU Architectural Extensions Specification
> > >     https://www.intel.com/content/dam/develop/external/us/en/documents-tps/intel-tdx-cpu-architectural-specification.pdf
> > > [4] Intel TDX Module 1.0 Specification
> > >     https://www.intel.com/content/dam/develop/external/us/en/documents/tdx-module-1.0-public-spec-v0.931.pdf
> > > [5] Intel TDX Loader Interface Specification
> > >     https://www.intel.com/content/dam/develop/external/us/en/documents-tps/intel-tdx-seamldr-interface-specification.pdf
> > > [6] Intel TDX Guest-Hypervisor Communication Interface
> > >     https://cdrdv2.intel.com/v1/dl/getContent/726790
> > > [7] Intel TDX Virtual Firmware Design Guide
> > >     https://www.intel.com/content/dam/develop/external/us/en/documents/tdx-virtual-firmware-design-guide-rev-1.01.pdf
> > > [8] intel public github
> > >     kvm TDX branch: https://github.com/intel/tdx/tree/kvm
> > >     TDX guest branch: https://github.com/intel/tdx/tree/guest
> > >     qemu TDX
> > >       https://github.com/intel/qemu-tdx
> > > [9] TDVF
> > >     https://github.com/tianocore/edk2-staging/tree/TDVF
> > >     This was merged into EDK2 main branch. https://github.com/tianocore/edk2
> > >
> > > Chao Gao (3):
> > >   KVM: x86: Move check_processor_compatibility from init ops to runtime
> > >     ops
> > >   Partially revert "KVM: Pass kvm_init()'s opaque param to additional
> > >     arch funcs"
> > >   KVM: x86: Allow to update cached values in kvm_user_return_msrs w/o
> > >     wrmsr
> > >
> > > Isaku Yamahata (72):
> > >   KVM: Refactor CPU compatibility check on module initialiization
> > >   x86/virt/vmx/tdx: export platform_tdx_enabled()
> > >   KVM: TDX: Detect CPU feature on kernel module initialization
> > >   KVM: x86: Refactor KVM VMX module init/exit functions
> > >   KVM: TDX: Add placeholders for TDX VM/vcpu structure
> > >   x86/virt/tdx: Add a helper function to return system wide info about
> > >     TDX module
> > >   KVM: TDX: Initialize TDX module when loading kvm_intel.ko
> > >   KVM: TDX: Make TDX VM type supported
> > >   [MARKER] The start of TDX KVM patch series: TDX architectural
> > >     definitions
> > >   KVM: TDX: Define TDX architectural definitions
> > >   KVM: TDX: Add C wrapper functions for SEAMCALLs to the TDX module
> > >   KVM: TDX: Add helper functions to print TDX SEAMCALL error
> > >   [MARKER] The start of TDX KVM patch series: TD VM creation/destruction
> > >   x86/cpu: Add helper functions to allocate/free TDX private host key id
> > >   KVM: TDX: Add place holder for TDX VM specific mem_enc_op ioctl
> > >   KVM: TDX: Make pmu_intel.c ignore guest TD case
> > >   [MARKER] The start of TDX KVM patch series: TD vcpu
> > >     creation/destruction
> > >   KVM: TDX: allocate/free TDX vcpu structure
> > >   KVM: TDX: allocate/free TDX vcpu structure
> > >   [MARKER] The start of TDX KVM patch series: KVM MMU GPA shared bits
> > >   KVM: x86/mmu: introduce config for PRIVATE KVM MMU
> > >   [MARKER] The start of TDX KVM patch series: KVM TDP refactoring for
> > >     TDX
> > >   KVM: x86/mmu: Disallow
> > >     fast page fault on private GPA
> > >   KVM: VMX: Introduce test mode related to EPT violation VE
> > >   [MARKER] The start of TDX KVM patch series: KVM TDP MMU hooks
> > >   KVM: x86/mmu: Focibly use TDP MMU for TDX
> > >   KVM: x86/mmu: Add a private pointer to struct kvm_mmu_page
> > >   KVM: x86/tdp_mmu: refactor kvm_tdp_mmu_map()
> > >   KVM: x86/tdp_mmu: Support TDX private mapping for TDP MMU
> > >   [MARKER] The start of TDX KVM patch series: TDX EPT violation
> > >   KVM: x86/tdp_mmu: Ignore unsupported mmu operation on private GFNs
> > >   KVM: TDX: don't request KVM_REQ_APIC_PAGE_RELOAD
> > >   KVM: TDX: TDP MMU TDX support
> > >   [MARKER] The start of TDX KVM patch series: KVM TDP MMU MapGPA
> > >   KVM: x86/mmu: steal software usable git to record if GFN is for shared
> > >     or not
> > >   KVM: x86/tdp_mmu: implement MapGPA hypercall for TDX
> > >   [MARKER] The start of TDX KVM patch series: TD finalization
> > >   KVM: TDX: Create initial guest memory
> > >   KVM: TDX: Finalize VM initialization
> > >   [MARKER] The start of TDX KVM patch series: TD vcpu enter/exit
> > >   KVM: TDX: Add helper assembly function to TDX vcpu
> > >   KVM: TDX: Implement TDX vcpu enter/exit path
> > >   KVM: TDX: vcpu_run: save/restore host state(host kernel gs)
> > >   KVM: TDX: restore host xsave state when exit from the guest TD
> > >   KVM: TDX: restore user ret MSRs
> > >   [MARKER] The start of TDX KVM patch series: TD vcpu
> > >     exits/interrupts/hypercalls
> > >   KVM: TDX: complete interrupts after tdexit
> > >   KVM: TDX: restore debug store when TD exit
> > >   KVM: TDX: handle vcpu migration over logical processor
> > >   KVM: x86: Add a switch_db_regs flag to handle TDX's auto-switched
> > >     behavior
> > >   KVM: TDX: remove use of struct vcpu_vmx from posted_interrupt.c
> > >   KVM: TDX: Implement interrupt injection
> > >   KVM: TDX: Implements vcpu request_immediate_exit
> > >   KVM: TDX: Implement methods to inject NMI
> > >   KVM: TDX: Add a place holder to handle TDX VM exit
> > >   KVM: TDX: handle
> > >     EXIT_REASON_OTHER_SMI
> > >   KVM: TDX: handle ept violation/misconfig exit
> > >   KVM: TDX: handle EXCEPTION_NMI and EXTERNAL_INTERRUPT
> > >   KVM: TDX: Add a place holder for handler of TDX hypercalls
> > >     (TDG.VP.VMCALL)
> > >   KVM: TDX: handle KVM hypercall with TDG.VP.VMCALL
> > >   KVM: TDX: Handle TDX PV CPUID hypercall
> > >   KVM: TDX: Handle TDX PV HLT hypercall
> > >   KVM: TDX: Handle TDX PV port io hypercall
> > >   KVM: TDX: Implement callbacks for MSR operations for TDX
> > >   KVM: TDX: Handle TDX PV rdmsr/wrmsr hypercall
> > >   KVM: TDX: Handle TDX PV report fatal error hypercall
> > >   KVM: TDX: Handle TDX PV map_gpa hypercall
> > >   KVM: TDX: Handle TDG.VP.VMCALL hypercall
> > >   KVM: TDX: Silently discard SMI request
> > >   KVM: TDX: Silently ignore INIT/SIPI
> > >   Documentation/virtual/kvm: Document on Trust Domain Extensions(TDX)
> > >   KVM: x86: design documentation on TDX support of x86 KVM TDP MMU
> > >
> > > Rick Edgecombe (1):
> > >   KVM: x86/mmu: Add address conversion functions for TDX shared bits
> > >
> > > Sean Christopherson (25):
> > >   KVM: VMX: Move out vmx_x86_ops to 'main.c' to wrap VMX and TDX
> > >   KVM: Enable hardware before doing arch VM initialization
> > >   KVM: x86: Introduce vm_type to differentiate default VMs from
> > >     confidential VMs
> > >   KVM: TDX: Add TDX "architectural" error codes
> > >   KVM: TDX: Stub in tdx.h with structs, accessors, and VMCS helpers
> > >   KVM: TDX: create/destroy VM structure
> > >   KVM: TDX: x86: Add ioctl to get TDX systemwide parameters
> > >   KVM: TDX: Do TDX specific vcpu initialization
> > >   KVM: x86/mmu: Explicitly check for MMIO spte in fast page fault
> > >   KVM: x86/mmu: Allow non-zero value for non-present SPTE
> > >   KVM: x86/mmu: Track shadow MMIO value/mask on a per-VM basis
> > >   KVM: x86/mmu: Allow per-VM override of the TDP max page level
> > >   KVM: x86/mmu: Zap only leaf SPTEs for deleted/moved memslot for
> > >     private mmu
> > >   KVM: x86/mmu: Disallow dirty logging for x86 TDX
> > >   KVM: VMX: Split
> > >     out guts of EPT violation to common/exposed function
> > >   KVM: VMX: Move setting of EPT MMU masks to common VT-x code
> > >   KVM: TDX: Add load_mmu_pgd method for TDX
> > >   KVM: x86/mmu: Introduce kvm_mmu_map_tdp_page() for use by TDX
> > >   KVM: TDX: Add support for find pending IRQ in a protected local APIC
> > >   KVM: x86: Assume timer IRQ was injected if APIC state is proteced
> > >   KVM: VMX: Modify NMI and INTR handlers to take intr_info as function
> > >     argument
> > >   KVM: VMX: Move NMI/exception handler to common helper
> > >   KVM: x86: Split core of hypercall emulation to helper function
> > >   KVM: TDX: Handle TDX PV MMIO hypercall
> > >   KVM: TDX: Add methods to ignore accesses to CPU state
> > >
> > > Xiaoyao Li (1):
> > >   KVM: TDX: initialize VM with TDX specific parameters
> > >
> > >  Documentation/virt/kvm/api.rst          |   30 +-
> > >  .../virt/kvm/intel-tdx-layer-status.rst |   33 +
> > >  Documentation/virt/kvm/intel-tdx.rst    |  381 +++
> > >  Documentation/virt/kvm/tdx-tdp-mmu.rst  |  466 ++++
> > >  arch/arm64/kvm/arm.c                    |    2 +-
> > >  arch/mips/kvm/mips.c                    |   14 +-
> > >  arch/powerpc/kvm/powerpc.c              |    2 +-
> > >  arch/riscv/kvm/main.c                   |    2 +-
> > >  arch/s390/kvm/kvm-s390.c                |    2 +-
> > >  arch/x86/events/intel/ds.c              |    1 +
> > >  arch/x86/include/asm/kvm-x86-ops.h      |   10 +
> > >  arch/x86/include/asm/kvm_host.h         |   56 +-
> > >  arch/x86/include/asm/tdx.h              |   67 +
> > >  arch/x86/include/asm/vmx.h              |   14 +
> > >  arch/x86/include/uapi/asm/kvm.h         |   95 +
> > >  arch/x86/include/uapi/asm/vmx.h         |    5 +-
> > >  arch/x86/kvm/Kconfig                    |    4 +
> > >  arch/x86/kvm/Makefile                   |    3 +-
> > >  arch/x86/kvm/irq.c                      |    3 +
> > >  arch/x86/kvm/lapic.c                    |   37 +-
> > >  arch/x86/kvm/lapic.h                    |    2 +
> > >  arch/x86/kvm/mmu.h                      |   42 +-
> > >  arch/x86/kvm/mmu/mmu.c                  |  360 ++-
> > >  arch/x86/kvm/mmu/mmu_internal.h         |  123 +-
> > >  arch/x86/kvm/mmu/paging_tmpl.h          |    5 +-
> > >  arch/x86/kvm/mmu/spte.c                 |   46 +-
> > >  arch/x86/kvm/mmu/spte.h                 |   65 +-
> > >  arch/x86/kvm/mmu/tdp_iter.c             |    1 +
> > >  arch/x86/kvm/mmu/tdp_iter.h             |    5 +-
> > >  arch/x86/kvm/mmu/tdp_mmu.c              |  690 ++++-
> > >  arch/x86/kvm/mmu/tdp_mmu.h              |   12 +-
> > >  arch/x86/kvm/svm/svm.c                  |   13 +-
> > >  arch/x86/kvm/vmx/common.h               |  174 ++
> > >  arch/x86/kvm/vmx/evmcs.c                |    2 +-
> > >  arch/x86/kvm/vmx/evmcs.h                |    2 +-
> > >  arch/x86/kvm/vmx/main.c                 | 1071 +++++++
> > >  arch/x86/kvm/vmx/pmu_intel.c            |   39 +-
> > >  arch/x86/kvm/vmx/pmu_intel.h            |   28 +
> > >  arch/x86/kvm/vmx/posted_intr.c          |   43 +-
> > >  arch/x86/kvm/vmx/posted_intr.h          |   13 +
> > >  arch/x86/kvm/vmx/tdx.c                  | 2465 +++++++++++++++++
> > >  arch/x86/kvm/vmx/tdx.h                  |  275 ++
> > >  arch/x86/kvm/vmx/tdx_arch.h             |  157 ++
> > >  arch/x86/kvm/vmx/tdx_errno.h            |   29 +
> > >  arch/x86/kvm/vmx/tdx_error.c            |   22 +
> > >  arch/x86/kvm/vmx/tdx_ops.h              |  188 ++
> > >  arch/x86/kvm/vmx/vmenter.S              |  146 +
> > >  arch/x86/kvm/vmx/vmx.c                  |  737 ++---
> > >  arch/x86/kvm/vmx/vmx.h                  |   39 +-
> > >  arch/x86/kvm/vmx/x86_ops.h              |  235 ++
> > >  arch/x86/kvm/x86.c                      |  148 +-
> > >  arch/x86/virt/vmx/tdx/seamcall.S        |    2 +
> > >  arch/x86/virt/vmx/tdx/tdx.c             |   54 +-
> > >  arch/x86/virt/vmx/tdx/tdx.h             |   52 -
> > >  include/linux/kvm_host.h                |    4 +-
> > >  include/uapi/linux/kvm.h                |    2 +
> > >  tools/arch/x86/include/uapi/asm/kvm.h   |   95 +
> > >  tools/include/uapi/linux/kvm.h          |    1 +
> > >  virt/kvm/kvm_main.c                     |   67 +-
> > >  59 files changed, 7877 insertions(+), 804 deletions(-)
> > >  create mode 100644 Documentation/virt/kvm/intel-tdx-layer-status.rst
> > >  create mode 100644 Documentation/virt/kvm/intel-tdx.rst
> > >  create mode 100644 Documentation/virt/kvm/tdx-tdp-mmu.rst
> > >  create mode 100644 arch/x86/kvm/vmx/common.h
> > >  create mode 100644 arch/x86/kvm/vmx/main.c
> > >  create mode 100644 arch/x86/kvm/vmx/pmu_intel.h
> > >  create mode 100644 arch/x86/kvm/vmx/tdx.c
> > >  create mode 100644 arch/x86/kvm/vmx/tdx.h
> > >  create mode 100644 arch/x86/kvm/vmx/tdx_arch.h
> > >  create mode 100644 arch/x86/kvm/vmx/tdx_errno.h
> > >  create mode 100644 arch/x86/kvm/vmx/tdx_error.c
> > >  create mode 100644 arch/x86/kvm/vmx/tdx_ops.h
> > >  create mode 100644 arch/x86/kvm/vmx/x86_ops.h
> > >
> > > --
> > > 2.25.1
> > >
> >
> > --
> > Isaku Yamahata

--
Isaku Yamahata