Message-ID: <7366c21e-6f70-91cb-ddda-d9a277622c31@redhat.com>
Date: Tue, 31 May 2022 16:46:02 +0200
Subject: Re: [RFC PATCH v6 000/104] KVM TDX basic feature support
From: Paolo Bonzini
To: isaku.yamahata@intel.com, kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: isaku.yamahata@gmail.com, erdemaktas@google.com, Sean Christopherson, Sagi Shahar

On 5/5/22 20:13, isaku.yamahata@intel.com wrote:
> From: Isaku Yamahata
>
> Hello. This is v6 of the patch series for KVM TDX support.
> This is based on v5.18-rc3 + kvm/queue branch + TDX HOST patch series.
> The tree can be found at https://github.com/intel/tdx/tree/kvm-upstream
>
> Major changes from v5:
> - initialize TDX module on loading kvm_intel.ko
>   This requires changes to other architectures, which I have only
>   compile-tested. Needs review by each KVM arch maintainer.
> - introduced protected apic suggested by Sean Christopherson
> - use constants for non-present SPTE value
>   I tested on VMX, but compile-tested only for SVM.
> - introduced debug mode to enable #VE suppress bit for VMX and warn on #VE exit
>
> TODO:
> - 2M large page support. It's work-in-progress.

So the only important conflicts are with the PRIVATE mapping series (see
reply to patch 47) and with commit ba3a6120a4e7:

    Author: Sean Christopherson
    Date:   Sat Apr 23 03:47:43 2022 +0000

    KVM: x86/mmu: Use atomic XCHG to write TDP MMU SPTEs with volatile bits

which are a bit boring but not hard. If you can post a v7 relatively
soon I'd be grateful.

Paolo

> How to run/test:
> It's described at
> https://github.com/intel/tdx/blob/kvm-upstream-workaround/KVM-TDX.README.md
>
> Trello:
> I've created a Trello board to track details. If you want to update items,
> please let me know.
> https://trello.com/b/B1cLGCcA/kvm-tdx
>
> Thanks,
> Isaku Yamahata
>
> Changes from v5:
> - export __seamcall and use it
> - move mutex lock from callee function of smp_call_on_cpu to the caller.
> - rename mmu_prezap => flush_shadow_all_private() and tdx_mmu_release_hkid
> - updated comment
> - drop the use of tdh_mng_key.reclaimid(), as the function exists only for
>   backward compatibility and always returns success
> - struct kvm_tdx_cmd: metadata => flags, added __u64 error.
> - make this ioctl systemwide ioctl
> - ABI change to struct kvm_init_vm
> - guest_tsc_khz: use kvm->arch.default_tsc_khz
> - rename BUILD_BUG_ON_MEMCPY to MEMCPY_SAME_SIZE
> - drop exporting kvm_set_tsc_khz().
> - fix kvm_tdp_page_fault() for mtrr emulation
> - rename it to kvm_gfn_shared_mask(), dropped kvm_gpa_shared_mask()
> - drop kvm_is_private_gfn(), kept kvm_is_private_gpa()
>   keep kvm_{gfn, gpa}_private(), kvm_gpa_private()
> - update commit message
> - rename shadow_init_value => shadow_nonpresent_value
> - added ept_violation_ve_test mode
> - shadow_nonpresent_value => SHADOW_NONPRESENT_VALUE in tdp_mmu.c
> - legacy MMU case
>   => - mmu_topup_shadow_page_cache(), kvm_mmu_create()
>      - FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
> - #VE warning:
> - rename: REMOVED_SPTE => __REMOVED_SPTE, SHADOW_REMOVED_SPTE => REMOVED_SPTE
> - merged, as discussed, with patch
>   "KVM: x86/mmu: Allow non-zero init value for shadow PTE".
> - fix pointed out by Sagi: !is_private check => (kvm_gfn_shared_mask && !is_private)
> - introduce kvm_gfn_for_root(kvm, root, gfn)
> - add only_shared argument to kvm_tdp_mmu_handle_gfn()
> - use kvm_arch_dirty_log_supported()
> - rename SPTE_PRIVATE_PROHIBIT to SPTE_SHARED_MASK.
> - rename: is_private_prohibit_spte() => spte_shared_mask()
> - fix: shadow_nonpresent_value => SHADOW_NONPRESENT_VALUE in comment
> - dropped this patch as the change was merged into kvm/queue
> - update vt_apicv_post_state_restore()
> - use is_64_bit_hypercall()
> - comment: expand MSMI -> Machine Check System Management Interrupt
> - fixed TDX_SEPT_PFERR
> - tdvmcall_p[1234]_{write, read}() => tdvmcall_a[0123]_{read,write}()
> - rename tdmvcall_exit_readon() => tdvmcall_leaf()
> - remove optional zero check of argument.
> - do a check for static_call(kvm_x86_has_emulated_msr)(kvm, MSR_IA32_SMBASE)
>   in kvm_vcpu_ioctl_smi and __apic_accept_irq.
> - WARN_ON_ONCE in tdx_smi_allowed and tdx_enable_smi_window.
> - introduce vcpu_deliver_init to x86_ops
> - sprinkled KVM_BUG_ON()
>
> Changes from v4:
> - rebased to TDX host kernel patch series.
> - include all the patches needed to make this patch series work.
> - add [MARKER] patches to make the patch layers clear.
>
> ---
> * What's TDX?
> TDX stands for Trust Domain Extensions, which extends Intel Virtual Machine
> Extensions (VMX) to introduce a kind of virtual machine guest called a Trust
> Domain (TD) for confidential computing.
>
> A TD runs in a CPU mode that is designed to protect the confidentiality of its
> memory contents and its CPU state from any other software, including the hosting
> Virtual Machine Monitor (VMM), unless explicitly shared by the TD itself.
>
> We have more detailed explanations below (***).
> We have the high-level design of TDX KVM below (****).
>
> In this patch series, we use "TD" or "guest TD" to differentiate it from the
> current "VM" (Virtual Machine), which is supported by KVM today.
>
>
> * The organization of this patch series
> This patch series is on top of the patch series "TDX host kernel support":
> https://lore.kernel.org/lkml/cover.1646007267.git.kai.huang@intel.com/
>
> This patch series is available at
> https://github.com/intel/tdx/releases/tag/kvm-upstream
> The corresponding patches to qemu are available at
> https://github.com/intel/qemu-tdx/commits/tdx-upstream
>
> The relations of the layers are depicted as follows.
> The arrows below show the order of patch reviews we would like to have.
>
> The layers below are chosen so that the device model, for example qemu, can
> exercise each layer step by step: check if TDX is supported, create TD VM,
> create TD vcpu, allow vcpu running, populate TD guest private memory, and handle
> vcpu exits/hypercalls/interrupts to run TD fully.
>
>                 TDX vcpu
>         interrupt/exits/hypercall<------------\
>                     ^                         |
>                     |                         |
>              TD finalization                  |
>                     ^                         |
>                     |                         |
>             TDX EPT violation<------------\   |
>                     ^                     |   |
>                     |                     |   |
>            TD vcpu enter/exit             |   |
>                     ^                     |   |
>                     |                     |   |
>       TD vcpu creation/destruction        |   \-------KVM TDP MMU MapGPA
>                     ^                     |                   ^
>                     |                     |                   |
>        TD VM creation/destruction         \---------------KVM TDP MMU hooks
>                     ^                                         ^
>                     |                                         |
>       TDX architectural definitions              KVM TDP refactoring for TDX
>                     ^                                         ^
>                     |                                         |
>                 TDX, VMX   <--------TDX host kernel  KVM MMU GPA share mask
>                coexistence          support
>
>
> The following are explanations of each layer. Each layer has a dummy commit
> whose subject starts with [MARKER]. It is intended to help identify where
> each layer starts.
>
> TDX host kernel support:
>   https://lore.kernel.org/lkml/cover.1646007267.git.kai.huang@intel.com/
>   The guts of system-wide initialization of the TDX module. There is an
>   independent patch series for host x86. TDX KVM patches call functions
>   this patch series provides to initialize the TDX module.
>
> TDX, VMX coexistence:
>   Infrastructure to allow TDX to coexist with VMX and trigger the
>   initialization of the TDX module.
>   This layer starts with
>   "KVM: VMX: Move out vmx_x86_ops to 'main.c' to wrap VMX and TDX"
> TDX architectural definitions:
>   Add TDX architectural definitions and helper functions.
>   This layer starts with
>   "[MARKER] The start of TDX KVM patch series: TDX architectural definitions".
> TD VM creation/destruction:
>   Guest TD creation/destruction: allocation and release of the TDX specific
>   vm and vcpu structures. Create an initial guest memory image with TDX
>   measurement.
>   This layer starts with
>   "[MARKER] The start of TDX KVM patch series: TD VM creation/destruction".
> TD vcpu creation/destruction:
>   Guest TD creation/destruction: allocation and release of the TDX specific
>   vm and vcpu structures. Create an initial guest memory image with TDX
>   measurement.
>   This layer starts with
>   "[MARKER] The start of TDX KVM patch series: TD vcpu creation/destruction"
> TDX EPT violation:
>   Create an initial guest memory image with TDX measurement. Handle
>   secure EPT violations to populate guest pages with TDX SEAMCALLs.
>   This layer starts with
>   "[MARKER] The start of TDX KVM patch series: TDX EPT violation"
> TD vcpu enter/exit:
>   Allow TDX vcpu to enter into TD and exit from TD. Save CPU state before
>   entering into TD. Restore CPU state after exiting from TD.
>   This layer starts with
>   "[MARKER] The start of TDX KVM patch series: TD vcpu enter/exit"
> TD vcpu interrupts/exit/hypercall:
>   Handle various exits/hypercalls and allow interrupts to be injected so
>   that TD vcpu can continue running.
>   This layer starts with
>   "[MARKER] The start of TDX KVM patch series: TD vcpu exits/interrupts/hypercalls"
>
> KVM MMU GPA shared bit:
>   Introduce a framework to handle the shared bit of the GPA. TDX
>   repurposes one bit of the GPA to indicate shared or private. If it's shared,
>   it's the same as the conventional VMX EPT case. VMM can access shared
>   guest pages. If it's private, it's handled by Secure-EPT and the guest
>   page is encrypted.
>   This layer starts with
>   "[MARKER] The start of TDX KVM patch series: KVM MMU GPA stolen bits"
> KVM TDP refactoring for TDX:
>   TDX Secure EPT requires different constants, e.g. the initial value of an
>   EPT entry. Various refactoring for those differences.
>   This layer starts with
>   "[MARKER] The start of TDX KVM patch series: KVM TDP refactoring for TDX"
> KVM TDP MMU hooks:
>   Introduce a framework in the TDP MMU to add hooks in addition to direct EPT
>   access. TDX added Secure EPT, which is an enhancement to VMX EPT. Unlike
>   conventional VMX EPT, CPU can't directly read/write Secure EPT. Instead,
>   use TDX SEAMCALLs to operate on Secure EPT.
>   This layer starts with
>   "[MARKER] The start of TDX KVM patch series: KVM TDP MMU hooks"
> KVM TDP MMU MapGPA:
>   Introduce a framework to handle switching guest pages from private/shared
>   to shared/private. For a given GPA, a guest page can be assigned to a
>   private GPA or a shared GPA exclusively. With the TDX MapGPA hypercall,
>   the guest TD converts GPA assignments from private (or shared) to shared (or
>   private).
>   This layer starts with
>   "[MARKER] The start of TDX KVM patch series: KVM TDP MMU MapGPA "
>
> KVM guest private memory: (not shown in the above diagram)
>   [PATCH v4 00/12] KVM: mm: fd-based approach for supporting KVM guest private
>   memory: https://lkml.org/lkml/2022/1/18/395
>   Guest private memory requires different memory management in KVM. The
>   patch proposes a way for it. Integration with TDX KVM.
>
> (***)
> * TDX module
> A CPU-attested software module called the "TDX module" is designed to implement
> the TDX architecture, and it is loaded by the UEFI firmware today. It can be
> loaded by the kernel or driver at runtime, but in this patch series we assume
> that the TDX module is already loaded and initialized.
>
> The TDX module provides two main new logical modes of operation built upon the
> new SEAM (Secure Arbitration Mode) root and non-root CPU modes added to the VMX
> architecture.
> TDX root mode is mostly identical to the VMX root operation mode,
> and the TDX functions (described later) are triggered by the new SEAMCALL
> instruction with the desired interface function selected by an input operand
> (leaf number, in RAX). TDX non-root mode is used for TD guest operation. TDX
> non-root operation (i.e. "guest TD" mode) is similar to the VMX non-root
> operation (i.e. guest VM), with changes and restrictions to better assure that
> no other software or hardware has direct visibility of the TD memory and state.
>
> TDX transitions between TDX root operation and TDX non-root operation include TD
> Entries, from TDX root to TDX non-root mode, and TD Exits from TDX non-root to
> TDX root mode. A TD Exit might be asynchronous, triggered by some external
> event (e.g., external interrupt or SMI) or an exception, or it might be
> synchronous, triggered by a TDCALL (TDG.VP.VMCALL) function.
>
> TD VCPUs can be entered using SEAMCALL(TDH.VP.ENTER) by KVM. TDH.VP.ENTER is one
> of the TDX interface functions as mentioned above, and "TDH" stands for Trust
> Domain Host. Those host-side TDX interface functions are categorized into
> various areas just for better organization, such as SYS (TDX module management),
> MNG (TD management), VP (VCPU), PHYSMEM (physical memory), MEM (private memory),
> etc. For example, SEAMCALL(TDH.SYS.INFO) returns the TDX module information.
>
> TDCS (Trust Domain Control Structure) is the main control structure of a guest
> TD, and is encrypted (using the guest TD's ephemeral private key). At a high
> level, TDCS holds information for controlling TD operation as a whole:
> execution, EPTP, MSR bitmaps, etc. KVM needs to set it up. Note that MSR
> bitmaps are held as part of TDCS (unlike VMX) because they are meant to have the
> same value for all VCPUs of the same TD.
>
> Trust Domain Virtual Processor State (TDVPS) is the root control structure of a
> TD VCPU.
> It helps the TDX module control the operation of the VCPU, and holds
> the VCPU state while the VCPU is not running. TDVPS is opaque to software and
> DMA access, accessible only by using the TDX module interface functions (such as
> TDH.VP.RD, TDH.VP.WR). TDVPS includes TD VMCS, and TD VMCS auxiliary structures,
> such as virtual APIC page, virtualization exception information, etc.
>
> Several VMX control structures (such as Shared EPT and Posted interrupt
> descriptor) are directly managed and accessed by the host VMM. These control
> structures are pointed to by fields in the TD VMCS.
>
> The above means that 1) KVM needs to allocate different data structures for TDs,
> 2) KVM can reuse the existing code for TDs for some operations, 3) it needs to
> define TD-specific handling for others, redirecting operations to the TDX
> specific callbacks, like "if (is_td_vcpu(vcpu)) tdx_callback() else
> vmx_callback();".
>
> * TD Private Memory
> TD private memory is designed to hold TD private content, encrypted by the CPU
> using the TD ephemeral key. An encryption engine holds a table of encryption
> keys, and an encryption key is selected for each memory transaction based on a
> Host Key Identifier (HKID). By design, the host VMM does not have access to the
> encryption keys.
>
> In the first generation of MKTME, HKID is "stolen" from the physical address by
> allocating a configurable number of bits from the top of the physical
> address. The HKID space is partitioned into shared HKIDs for legacy MKTME
> accesses and private HKIDs for SEAM-mode-only accesses. We use 0 for the shared
> HKID on the host so that MKTME can be opaque or bypassed on the host.
>
> During TDX non-root operation (i.e. guest TD), memory accesses can be qualified
> as either shared or private, based on the value of a new SHARED bit in the Guest
> Physical Address (GPA).
> The CPU translates shared GPAs using the usual VMX EPT
> (Extended Page Table) or "Shared EPT" (in this document), which resides in host
> VMM memory. The Shared EPT is directly managed by the host VMM - the same as
> with the current VMX. Since guest TDs usually require I/O, and the data exchange
> needs to be done via shared memory, KVM needs to use the current EPT
> functionality even for TDs.
>
> * Secure EPT and Mirroring using the TDP code
> The CPU translates private GPAs using a separate Secure EPT. The Secure EPT
> pages are encrypted and integrity-protected with the TD's ephemeral private
> key. Secure EPT can be managed _indirectly_ by the host VMM, using the TDX
> interface functions, and thus conceptually Secure EPT is a subset of EPT.
> Since execution of such interface functions takes much longer
> than accessing memory directly, in KVM we use the existing TDP code to mirror the
> Secure EPT for the TD.
>
> This way, we can effectively walk Secure EPT without using the TDX interface
> functions.
>
> * VM life cycle and TDX specific operations
> The userspace VMM, such as QEMU, needs to build and treat TDs differently. For
> example, a TD needs to boot in private memory, and the host software cannot copy
> the initial image to private memory.
>
> * TSC Virtualization
> The TDX module helps TDs maintain reliable TSC (Time Stamp Counter) values
> (e.g. consistent among the TD VCPUs) and the virtual TSC frequency is determined
> by TD configuration, i.e. when the TD is created, not per VCPU. The current KVM
> owns TSC virtualization for VMs, but the TDX module owns it for TDs.
>
> * MCE support for TDs
> The TDX module doesn't allow the VMM to inject MCE. Instead, a PV
> (paravirtualized) way is needed for the TD to communicate with the VMM. For
> now, KVM silently ignores MCE requests by the VMM. MSRs
> related to MCE (e.g., MCE bank registers) can be naturally emulated by
> paravirtualizing MSR access.
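
Returning to the shared/private split described above: the choice between
the Shared EPT and the Secure EPT depends only on the SHARED bit of the
GPA. The following is a minimal sketch of that selection, not the code
from the series. The names gpa_t (a local typedef here), enum ept_kind,
gpa_shared_mask(), gpa_is_shared() and ept_for_gpa() are hypothetical,
and the bit position is passed in as a parameter rather than taken from
any real configuration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local typedef for this sketch; it only stands in for a guest physical address. */
typedef uint64_t gpa_t;

/* Hypothetical selector for which page table translates a GPA. */
enum ept_kind { EPT_SHARED, EPT_SECURE };

/* Mask with only the SHARED bit set; shared_bit is an assumed parameter. */
static inline gpa_t gpa_shared_mask(unsigned int shared_bit)
{
	assert(shared_bit < 64);
	return (gpa_t)1 << shared_bit;
}

/* A GPA is shared when the SHARED bit is set, private when it is clear. */
static inline bool gpa_is_shared(gpa_t gpa, unsigned int shared_bit)
{
	return (gpa & gpa_shared_mask(shared_bit)) != 0;
}

/* Shared GPAs go through the Shared EPT, private GPAs through the Secure EPT. */
static inline enum ept_kind ept_for_gpa(gpa_t gpa, unsigned int shared_bit)
{
	return gpa_is_shared(gpa, shared_bit) ? EPT_SHARED : EPT_SECURE;
}
```

With a shared bit of 51, for example, ept_for_gpa(0x1000, 51) selects the
Secure EPT, and the same address with bit 51 set selects the Shared EPT.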
>
> [1] For details, the specifications, [2], [3], [4], [5], [6], [7], are
> available.
>
> * Restrictions or future work
> Some features are not included to reduce patch size. Those features will be
> addressed in future independent patch series.
> - large page (2M, 1G)
> - qemu gdb stub
> - guest PMU
> - and more
>
> * Prerequisites
> It's required to load the TDX module and initialize it. It's out of the scope
> of this patch series. Another independent patch for the common x86 code is
> planned. It defines CONFIG_INTEL_TDX_HOST and this patch series uses
> CONFIG_INTEL_TDX_HOST. It's assumed that with CONFIG_INTEL_TDX_HOST=y, the TDX
> module is initialized and the TDX module APIs for the TDX guest life cycle,
> like tdh.mng.init, are ready for KVM to use.
>
> Concretely, global initialization, LP (Logical Processor) initialization, global
> configuration, the key configuration, and TDMR and PAMT initialization are done.
> The state of the TDX module is SYS_READY. Please refer to the TDX module
> specification, the chapter Intel TDX Module Lifecycle State Machine.
>
> ** Detecting the TDX module readiness.
> The TDX host patch series implements the detection of the TDX module availability
> and its initialization so that KVM can use it. It also manages the Host KeyID
> (HKID) assigned to a guest TD.
> The assumed APIs the TDX host patch series provides are
> - int seamrr_enabled()
>   Check if the required cpu feature (SEAM mode) is available. This only checks
>   CPU feature availability. At this point, the TDX module may not be ready for
>   KVM to use.
> - int init_tdx(void);
>   Initialization of the TDX module so that the TDX module is ready for KVM to use.
> - const struct tdsysinfo_struct *tdx_get_sysinfo(void);
>   Return the system wide information about the TDX module. NULL if the TDX
>   module isn't initialized.
> - u32 tdx_get_global_keyid(void);
>   Return global key id that is used for the TDX module itself.
> - int tdx_keyid_alloc(void);
>   Allocate HKID for guest TD.
> - void tdx_keyid_free(int keyid);
>   Free HKID for guest TD.
>
> (****)
> * TDX KVM high-level design
> - Host key ID management
> Host Key ID (HKID) needs to be assigned to each TDX guest for memory encryption.
> It is assumed that the TDX host patch series implements the necessary functions:
> u32 tdx_get_global_keyid(void), int tdx_keyid_alloc(void), and
> void tdx_keyid_free(int keyid).
>
> - Data structures and VM type
> Because TDX is different from VMX, define its own VM/VCPU structures, struct
> kvm_tdx and struct vcpu_tdx instead of struct kvm_vmx and struct vcpu_vmx. To
> identify the VM, introduce VM-type to specify which VM type, VMX (default) or
> TDX, is used.
>
> - VM life cycle and TDX specific operations
> Re-purpose the existing KVM_MEMORY_ENCRYPT_OP to add TDX specific operations.
> New commands are used to get the TDX system parameters, set TDX specific VM/VCPU
> parameters, set initial guest memory and measurement.
>
> The creation of a TDX VM requires five additional operations in addition to the
> conventional VM creation.
> - Get KVM system capability to check if TDX VM type is supported
> - VM creation (KVM_CREATE_VM)
> - New: Get the TDX specific system parameters. KVM_TDX_GET_CAPABILITY.
> - New: Set TDX specific VM parameters. KVM_TDX_INIT_VM.
> - VCPU creation (KVM_CREATE_VCPU)
> - New: Set TDX specific VCPU parameters. KVM_TDX_INIT_VCPU.
> - New: Initialize guest memory as boot state and extend the measurement with
>   the memory. KVM_TDX_INIT_MEM_REGION.
> - New: Finalize VM. KVM_TDX_FINALIZE. Complete measurement of the initial
>   TDX VM contents.
> - VCPU run (KVM_RUN)
>
> - Protected guest state
> Because the guest state (CPU state and guest memory) is protected, the KVM VMM
> can't operate on it: for example, accessing CPU registers, injecting
> exceptions, and accessing guest memory. Those operations are silently
> ignored, or return zero or the initial reset value, when requested via
> KVM API ioctls.
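
The creation sequence listed above can be read as an ordered checklist for
the userspace VMM. The sketch below models only that ordering and makes no
claim about what KVM itself enforces. The enum values are illustrative
labels, not the real ioctl or command numbers, and struct td_setup,
td_setup_init() and td_setup_advance() are hypothetical names. Treating the
sequence as strictly linear is a simplifying assumption: a real VMM would
repeat the vcpu and memory-region steps once per vcpu and per region.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative step labels in the order listed above; NOT real ioctl numbers. */
enum td_step {
	TD_STEP_CHECK_VM_TYPE,   /* check that the TDX VM type is supported */
	TD_STEP_CREATE_VM,       /* KVM_CREATE_VM */
	TD_STEP_GET_CAPABILITY,  /* KVM_TDX_GET_CAPABILITY */
	TD_STEP_INIT_VM,         /* KVM_TDX_INIT_VM */
	TD_STEP_CREATE_VCPU,     /* KVM_CREATE_VCPU */
	TD_STEP_INIT_VCPU,       /* KVM_TDX_INIT_VCPU */
	TD_STEP_INIT_MEM_REGION, /* KVM_TDX_INIT_MEM_REGION */
	TD_STEP_FINALIZE,        /* KVM_TDX_FINALIZE */
	TD_STEP_RUN              /* vcpu run; may be repeated */
};

/* Hypothetical tracker remembering which step is expected next. */
struct td_setup {
	enum td_step next;
};

static inline void td_setup_init(struct td_setup *s)
{
	assert(s != NULL);
	s->next = TD_STEP_CHECK_VM_TYPE;
}

/*
 * Accept a step only if it is the one expected next. Every step except
 * the final run step moves the tracker forward by one; the run step can
 * be issued any number of times once the TD has been finalized.
 */
static inline bool td_setup_advance(struct td_setup *s, enum td_step step)
{
	assert(s != NULL);
	if (step != s->next)
		return false;
	if (step != TD_STEP_RUN)
		s->next = (enum td_step)(step + 1);
	return true;
}
```

For example, calling td_setup_advance() with TD_STEP_INIT_VM on a freshly
initialized tracker is rejected, because the VM type check and VM creation
have not happened yet.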
>
> VM/VCPU state and callbacks for TDX specific operations.
> Define TDX specific VM state and VCPU state instead of VMX ones. Redirect
> operations to TDX specific callbacks. "if (tdx) tdx_op() else vmx_op()".
>
> Operations on the CPU state
> Silently ignore operations on the guest state. For example, the write to
> CPU registers is ignored and the read from CPU registers returns 0.
>
> . ignore access to CPU registers except for allowed ones.
> . TSC: add a check whether the TSC is immutable and return an error if so,
>   because the KVM implementation updates the internal TSC state and it's
>   difficult to back out those changes. Instead, skip the logic.
> . dirty logging: add a check whether dirty logging is supported.
> . exceptions/SMI/MCE/SIPI/INIT: silently ignore
>
> Note: virtual external interrupt and NMI can be injected into TDX guests.
>
> - KVM MMU integration
> One bit of the guest physical address (bit 51 or 47) is repurposed to indicate if
> the guest physical address is private (the bit is cleared) or shared (the bit is
> set). The bits are called stolen bits.
>
> - Stolen bits framework
>   Systematically tracks which guest physical address, shared or private, is
>   used.
>
> - Shared EPT and secure EPT
>   There are two EPTs: Shared EPT (the conventional one) and Secure
>   EPT (the new one). Shared EPT is handled the same as before, for GPAs
>   with the stolen bit set. Secure EPT points to private guest pages. To
>   resolve an EPT violation, KVM walks one of the two EPTs based on the
>   faulted GPA. Because it's costly to access Secure EPT with SEAMCALLs
>   while walking EPTs for a private guest physical address, another private
>   EPT is kept as a shadow of Secure EPT using the existing logic, at
>   the cost of extra memory.
>
> The following depicts the relationship.
>
>                      KVM                             |    TDX module
>                       |                              |         |
>          -------------+----------                    |         |
>          |                      |                    |         |
>          V                      V                    |         |
>     shared GPA             private GPA               |         |
> CPU shared EPT pointer  KVM private EPT pointer     | CPU secure EPT pointer
>          |                      |                    |         |
>          |                      |                    |         |
>          V                      V                    |         V
>     shared EPT             private EPT--------mirror----->Secure EPT
>          |                      |                    |         |
>          |                      \--------------------+------\  |
>          |                                           |      |  |
>          V                                           |      V  V
> shared guest page                                   | private guest page
>                                                      |
>                                                      |
> non-encrypted memory                                | encrypted memory
>                                                      |
>
> - Operating on Secure EPT
>   Use the TDX module APIs to operate on Secure EPT. To call the TDX API
>   while resolving an EPT violation, add hooks for the additional operations
>   and wire them to the TDX backend.
>
> * References
>
> [1] TDX specification
>    https://www.intel.com/content/www/us/en/developer/articles/technical/intel-trust-domain-extensions.html
> [2] Intel Trust Domain Extensions (Intel TDX)
>    https://cdrdv2.intel.com/v1/dl/getContent/726790
> [3] Intel CPU Architectural Extensions Specification
>    https://www.intel.com/content/dam/develop/external/us/en/documents-tps/intel-tdx-cpu-architectural-specification.pdf
> [4] Intel TDX Module 1.0 Specification
>    https://www.intel.com/content/dam/develop/external/us/en/documents/tdx-module-1.0-public-spec-v0.931.pdf
> [5] Intel TDX Loader Interface Specification
>    https://www.intel.com/content/dam/develop/external/us/en/documents-tps/intel-tdx-seamldr-interface-specification.pdf
> [6] Intel TDX Guest-Hypervisor Communication Interface
>    https://cdrdv2.intel.com/v1/dl/getContent/726790
> [7] Intel TDX Virtual Firmware Design Guide
>    https://www.intel.com/content/dam/develop/external/us/en/documents/tdx-virtual-firmware-design-guide-rev-1.01.pdf
> [8] intel public github
>    kvm TDX branch: https://github.com/intel/tdx/tree/kvm
>    TDX guest branch: https://github.com/intel/tdx/tree/guest
>    qemu TDX https://github.com/intel/qemu-tdx
> [9] TDVF
>    https://github.com/tianocore/edk2-staging/tree/TDVF
>    This was merged into EDK2 main branch.
https://github.com/tianocore/edk2
>
> Chao Gao (3):
>   KVM: x86: Move check_processor_compatibility from init ops to runtime
>     ops
>   Partially revert "KVM: Pass kvm_init()'s opaque param to additional
>     arch funcs"
>   KVM: x86: Allow to update cached values in kvm_user_return_msrs w/o
>     wrmsr
>
> Isaku Yamahata (74):
>   KVM: Refactor CPU compatibility check on module initialiization
>   x86/virt/vmx/tdx: export platform_has_tdx
>   KVM: TDX: Detect CPU feature on kernel module initialization
>   KVM: x86: Refactor KVM VMX module init/exit functions
>   KVM: TDX: Add placeholders for TDX VM/vcpu structure
>   x86/virt/tdx: Add a helper function to return system wide info about
>     TDX module
>   KVM: TDX: Initialize TDX module when loading kvm_intel.ko
>   KVM: TDX: Make TDX VM type supported
>   [MARKER] The start of TDX KVM patch series: TDX architectural
>     definitions
>   KVM: TDX: Define TDX architectural definitions
>   KVM: TDX: Add C wrapper functions for SEAMCALLs to the TDX module
>   KVM: TDX: Add helper functions to print TDX SEAMCALL error
>   [MARKER] The start of TDX KVM patch series: TD VM creation/destruction
>   x86/cpu: Add helper functions to allocate/free TDX private host key id
>   KVM: TDX: Add place holder for TDX VM specific mem_enc_op ioctl
>   KVM: TDX: Make KVM_CAP_SET_IDENTITY_MAP_ADDR unsupported for TDX
>   KVM: TDX: Make pmu_intel.c ignore guest TD case
>   [MARKER] The start of TDX KVM patch series: TD vcpu
>     creation/destruction
>   KVM: TDX: allocate/free TDX vcpu structure
>   KVM: TDX: allocate/free TDX vcpu structure
>   [MARKER] The start of TDX KVM patch series: KVM MMU GPA shared bits
>   KVM: x86/mmu: introduce config for PRIVATE KVM MMU
>   [MARKER] The start of TDX KVM patch series: KVM TDP refactoring for
>     TDX
>   KVM: x86/mmu: Disallow fast page fault on private GPA
>   KVM: VMX: Introduce test mode related to EPT violation VE
>   [MARKER] The start of TDX KVM patch series: KVM TDP MMU hooks
>   KVM: x86/mmu: Focibly use TDP MMU for TDX
>   KVM: x86/mmu: Add a private pointer to struct kvm_mmu_page
>   KVM: x86/tdp_mmu: refactor kvm_tdp_mmu_map()
>   KVM: x86/tdp_mmu: Support TDX private mapping for TDP MMU
>   [MARKER] The start of TDX KVM patch series: TDX EPT violation
>   KVM: x86/tdp_mmu: Ignore unsupported mmu operation on private GFNs
>   KVM: TDX: don't request KVM_REQ_APIC_PAGE_RELOAD
>   KVM: TDX: TDP MMU TDX support
>   [MARKER] The start of TDX KVM patch series: KVM TDP MMU MapGPA
>   KVM: x86/mmu: steal software usable git to record if GFN is for shared
>     or not
>   KVM: x86/tdp_mmu: implement MapGPA hypercall for TDX
>   [MARKER] The start of TDX KVM patch series: TD finalization
>   KVM: TDX: Create initial guest memory
>   KVM: TDX: Finalize VM initialization
>   [MARKER] The start of TDX KVM patch series: TD vcpu enter/exit
>   KVM: TDX: Add helper assembly function to TDX vcpu
>   KVM: TDX: Implement TDX vcpu enter/exit path
>   KVM: TDX: vcpu_run: save/restore host state(host kernel gs)
>   KVM: TDX: restore host xsave state when exit from the guest TD
>   KVM: TDX: restore user ret MSRs
>   [MARKER] The start of TDX KVM patch series: TD vcpu
>     exits/interrupts/hypercalls
>   KVM: TDX: complete interrupts after tdexit
>   KVM: TDX: restore debug store when TD exit
>   KVM: TDX: handle vcpu migration over logical processor
>   KVM: x86: Add a switch_db_regs flag to handle TDX's auto-switched
>     behavior
>   KVM: TDX: remove use of struct vcpu_vmx from posted_interrupt.c
>   KVM: TDX: Implement interrupt injection
>   KVM: TDX: Implements vcpu request_immediate_exit
>   KVM: TDX: Implement methods to inject NMI
>   KVM: TDX: Add a place holder to handle TDX VM exit
>   KVM: TDX: handle EXIT_REASON_OTHER_SMI
>   KVM: TDX: handle ept violation/misconfig exit
>   KVM: TDX: handle EXCEPTION_NMI and EXTERNAL_INTERRUPT
>   KVM: TDX: Add a place holder for handler of TDX hypercalls
>     (TDG.VP.VMCALL)
>   KVM: TDX: handle KVM hypercall with TDG.VP.VMCALL
>   KVM: TDX: Handle TDX PV CPUID hypercall
>   KVM: TDX: Handle TDX PV HLT hypercall
>   KVM: TDX: Handle TDX PV port io hypercall
>   KVM: TDX: Implement callbacks for MSR operations for TDX
>   KVM: TDX: Handle TDX PV rdmsr/wrmsr hypercall
>   KVM: TDX: Handle TDX PV report fatal error hypercall
>   KVM: TDX: Handle TDX PV map_gpa hypercall
>   KVM: TDX: Handle TDG.VP.VMCALL hypercall
>   KVM: TDX: Silently discard SMI request
>   KVM: TDX: Silently ignore INIT/SIPI
>   Documentation/virtual/kvm: Document on Trust Domain Extensions(TDX)
>   KVM: x86: design documentation on TDX support of x86 KVM TDP MMU
>   [MARKER] the end of (the first phase of) TDX KVM patch series
>
> Rick Edgecombe (1):
>   KVM: x86/mmu: Add address conversion functions for TDX shared bits
>
> Sean Christopherson (25):
>   KVM: VMX: Move out vmx_x86_ops to 'main.c' to wrap VMX and TDX
>   KVM: Enable hardware before doing arch VM initialization
>   KVM: x86: Introduce vm_type to differentiate default VMs from
>     confidential VMs
>   KVM: TDX: Add TDX "architectural" error codes
>   KVM: TDX: Stub in tdx.h with structs, accessors, and VMCS helpers
>   KVM: TDX: create/destroy VM structure
>   KVM: TDX: x86: Add ioctl to get TDX systemwide parameters
>   KVM: TDX: Do TDX specific vcpu initialization
>   KVM: x86/mmu: Explicitly check for MMIO spte in fast page fault
>   KVM: x86/mmu: Allow non-zero value for non-present SPTE
>   KVM: x86/mmu: Track shadow MMIO value/mask on a per-VM basis
>   KVM: x86/mmu: Allow per-VM override of the TDP max page level
>   KVM: x86/mmu: Zap only leaf SPTEs for deleted/moved memslot for
>     private mmu
>   KVM: x86/mmu: Disallow dirty logging for x86 TDX
>   KVM: VMX: Split out guts of EPT violation to common/exposed function
>   KVM: VMX: Move setting of EPT MMU masks to common VT-x code
>   KVM: TDX: Add load_mmu_pgd method for TDX
>   KVM: x86/mmu: Introduce kvm_mmu_map_tdp_page() for use by TDX
>   KVM: TDX: Add support for find pending IRQ in a protected local APIC
>   KVM: x86: Assume timer IRQ was injected if APIC state is proteced
>   KVM: VMX: Modify NMI and INTR handlers to take intr_info as function
>     argument
>   KVM: VMX: Move NMI/exception handler to common helper
>   KVM: x86: Split core of hypercall emulation to helper function
>   KVM: TDX: Handle TDX PV MMIO hypercall
>   KVM: TDX: Add methods to ignore accesses to CPU state
>
> Xiaoyao Li (1):
>   KVM: TDX: initialize VM with TDX specific parameters
>
>  Documentation/virt/kvm/api.rst         |   30 +-
>  Documentation/virt/kvm/intel-tdx.rst   |  381 ++++
>  Documentation/virt/kvm/tdx-tdp-mmu.rst |  466 +++++
>  arch/arm64/kvm/arm.c                   |    2 +-
>  arch/mips/kvm/mips.c                   |   14 +-
>  arch/powerpc/kvm/powerpc.c             |    2 +-
>  arch/riscv/kvm/main.c                  |    2 +-
>  arch/s390/kvm/kvm-s390.c               |    2 +-
>  arch/x86/events/intel/ds.c             |    1 +
>  arch/x86/include/asm/kvm-x86-ops.h     |   10 +
>  arch/x86/include/asm/kvm_host.h        |   56 +-
>  arch/x86/include/asm/tdx.h             |   66 +
>  arch/x86/include/asm/vmx.h             |   14 +
>  arch/x86/include/uapi/asm/kvm.h        |   95 +
>  arch/x86/include/uapi/asm/vmx.h        |    5 +-
>  arch/x86/kvm/Kconfig                   |    4 +
>  arch/x86/kvm/Makefile                  |    3 +-
>  arch/x86/kvm/irq.c                     |    3 +
>  arch/x86/kvm/lapic.c                   |   37 +-
>  arch/x86/kvm/lapic.h                   |    2 +
>  arch/x86/kvm/mmu.h                     |   48 +-
>  arch/x86/kvm/mmu/mmu.c                 |  371 +++-
>  arch/x86/kvm/mmu/mmu_internal.h        |  120 ++
>  arch/x86/kvm/mmu/paging_tmpl.h         |    5 +-
>  arch/x86/kvm/mmu/spte.c                |   46 +-
>  arch/x86/kvm/mmu/spte.h                |   65 +-
>  arch/x86/kvm/mmu/tdp_iter.c            |    1 +
>  arch/x86/kvm/mmu/tdp_iter.h            |    5 +-
>  arch/x86/kvm/mmu/tdp_mmu.c             |  683 ++++++-
>  arch/x86/kvm/mmu/tdp_mmu.h             |   12 +-
>  arch/x86/kvm/svm/svm.c                 |   13 +-
>  arch/x86/kvm/vmx/common.h              |  154 ++
>  arch/x86/kvm/vmx/evmcs.c               |    2 +-
>  arch/x86/kvm/vmx/evmcs.h               |    2 +-
>  arch/x86/kvm/vmx/main.c                | 1073 ++++++++++
>  arch/x86/kvm/vmx/pmu_intel.c           |   33 +
>  arch/x86/kvm/vmx/pmu_intel.h           |   29 +
>  arch/x86/kvm/vmx/posted_intr.c         |   43 +-
>  arch/x86/kvm/vmx/posted_intr.h         |   13 +
>  arch/x86/kvm/vmx/tdx.c                 | 2470 ++++++++++++++++++++++++
>  arch/x86/kvm/vmx/tdx.h                 |  275 +++
>  arch/x86/kvm/vmx/tdx_arch.h            |  157 ++
>  arch/x86/kvm/vmx/tdx_errno.h           |   29 +
>  arch/x86/kvm/vmx/tdx_error.c           |   22 +
>  arch/x86/kvm/vmx/tdx_ops.h             |  188 ++
>  arch/x86/kvm/vmx/vmenter.S             |  146 ++
>  arch/x86/kvm/vmx/vmx.c                 |  716 +++----
>  arch/x86/kvm/vmx/vmx.h                 |   41 +-
>  arch/x86/kvm/vmx/x86_ops.h             |  235 +++
>  arch/x86/kvm/x86.c                     |  155 +-
>  arch/x86/virt/vmx/tdx/seamcall.S       |    1 +
>  arch/x86/virt/vmx/tdx/tdx.c            |   53 +-
>  arch/x86/virt/vmx/tdx/tdx.h            |   52 -
>  include/linux/kvm_host.h               |    4 +-
>  include/uapi/linux/kvm.h               |    2 +
>  tools/arch/x86/include/uapi/asm/kvm.h  |   95 +
>  tools/include/uapi/linux/kvm.h         |    1 +
>  virt/kvm/kvm_main.c                    |   67 +-
>  58 files changed, 7839 insertions(+), 783 deletions(-)
>  create mode 100644 Documentation/virt/kvm/intel-tdx.rst
>  create mode 100644 Documentation/virt/kvm/tdx-tdp-mmu.rst
>  create mode 100644 arch/x86/kvm/vmx/common.h
>  create mode 100644 arch/x86/kvm/vmx/main.c
>  create mode 100644 arch/x86/kvm/vmx/pmu_intel.h
>  create mode 100644 arch/x86/kvm/vmx/tdx.c
>  create mode 100644 arch/x86/kvm/vmx/tdx.h
>  create mode 100644 arch/x86/kvm/vmx/tdx_arch.h
>  create mode 100644 arch/x86/kvm/vmx/tdx_errno.h
>  create mode 100644 arch/x86/kvm/vmx/tdx_error.c
>  create mode 100644 arch/x86/kvm/vmx/tdx_ops.h
>  create mode 100644 arch/x86/kvm/vmx/x86_ops.h
>