From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini, Jim Mattson, erdemaktas@google.com, Connor Kuehl, Sean Christopherson
Subject: [RFC PATCH v5 073/104] KVM: TDX: track LP tdx vcpu run and tear down vcpus on destroying the guest TD
Date: Fri, 4 Mar 2022 11:49:29 -0800
Message-Id: <6e096d8509ef40ce3e25c1e132643e772641241b.1646422845.git.isaku.yamahata@intel.com>

From: Isaku Yamahata <isaku.yamahata@intel.com>

When shutting down the machine, vCPUs (VMX or TDX) need to be shut down
on each pCPU. Do the same for TDX using the TDX SEAMCALL APIs.
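To illustrate the per-CPU tracking this patch introduces, here is a minimal, self-contained userspace sketch (not kernel code; names such as demo_vcpu and demo_hardware_disable are made up for illustration). Each pCPU keeps a list of the TD vCPUs last associated with it; loading a vCPU onto a pCPU adds it to that pCPU's list, and disabling hardware on a pCPU walks the list and disassociates every vCPU, loosely mirroring tdx_vcpu_load() and tdx_hardware_disable() in the diff below.

#include <stdio.h>

/* Illustrative stand-in for a TD vCPU; the kernel uses struct vcpu_tdx. */
struct demo_vcpu {
	int id;
	int cpu;                  /* pCPU this vCPU is associated with, -1 if none */
	struct demo_vcpu *next;   /* link in the per-CPU list */
};

#define DEMO_NR_CPUS 2

/* Stand-in for the per-CPU associated_tdvcpus lists. */
static struct demo_vcpu *associated[DEMO_NR_CPUS];

/* Like tdx_vcpu_load(): associate the vCPU with a pCPU.
 * For simplicity this sketch never migrates a vCPU between pCPUs,
 * and it ignores the IRQ-off and memory-barrier rules the patch needs.
 */
static void demo_vcpu_load(struct demo_vcpu *v, int cpu)
{
	if (v->cpu == cpu)
		return;
	v->next = associated[cpu];
	associated[cpu] = v;
	v->cpu = cpu;
}

/* Like tdx_hardware_disable(): walk the pCPU's list and disassociate
 * every vCPU (the kernel does this with list_for_each_entry_safe()
 * and tdx_disassociate_vp()).
 */
static void demo_hardware_disable(int cpu)
{
	struct demo_vcpu *v = associated[cpu];

	while (v) {
		struct demo_vcpu *next = v->next;

		v->cpu = -1;
		v->next = NULL;
		v = next;
	}
	associated[cpu] = NULL;
}

int main(void)
{
	struct demo_vcpu a = { .id = 0, .cpu = -1 };
	struct demo_vcpu b = { .id = 1, .cpu = -1 };

	demo_vcpu_load(&a, 0);
	demo_vcpu_load(&b, 0);
	demo_hardware_disable(0);
	/* Both vCPUs end up disassociated (cpu == -1). */
	printf("vcpu %d: cpu=%d, vcpu %d: cpu=%d\n", a.id, a.cpu, b.id, b.cpu);
	return 0;
}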
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
---
 arch/x86/kvm/vmx/main.c    | 23 +++++++++++--
 arch/x86/kvm/vmx/tdx.c     | 70 ++++++++++++++++++++++++++++++++++++--
 arch/x86/kvm/vmx/tdx.h     |  2 ++
 arch/x86/kvm/vmx/x86_ops.h |  4 +++
 4 files changed, 95 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
index 2cd5ba0e8788..882358ac270b 100644
--- a/arch/x86/kvm/vmx/main.c
+++ b/arch/x86/kvm/vmx/main.c
@@ -13,6 +13,25 @@ static bool vt_is_vm_type_supported(unsigned long type)
 	return type == KVM_X86_DEFAULT_VM || tdx_is_vm_type_supported(type);
 }
 
+static int vt_hardware_enable(void)
+{
+	int ret;
+
+	ret = vmx_hardware_enable();
+	if (ret)
+		return ret;
+
+	tdx_hardware_enable();
+	return 0;
+}
+
+static void vt_hardware_disable(void)
+{
+	/* Note, TDX *and* VMX need to be disabled if TDX is enabled. */
+	tdx_hardware_disable();
+	vmx_hardware_disable();
+}
+
 static __init int vt_hardware_setup(void)
 {
 	int ret;
@@ -199,8 +218,8 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
 
 	.hardware_unsetup = vt_hardware_unsetup,
 
-	.hardware_enable = vmx_hardware_enable,
-	.hardware_disable = vmx_hardware_disable,
+	.hardware_enable = vt_hardware_enable,
+	.hardware_disable = vt_hardware_disable,
 	.cpu_has_accelerated_tpr = report_flexpriority,
 	.has_emulated_msr = vmx_has_emulated_msr,
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index a6b1a8ce888d..690298fb99c7 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -48,6 +48,14 @@ struct tdx_capabilities tdx_caps;
 static DEFINE_MUTEX(tdx_lock);
 static struct mutex *tdx_mng_key_config_lock;
 
+/*
+ * A per-CPU list of TD vCPUs associated with a given CPU.  Used when a CPU
+ * is brought down to invoke TDH_VP_FLUSH on the appropriate TD vCPUs.
+ * Protected by interrupt mask.  This list is manipulated in process context
+ * of vcpu and IPI callback.  See tdx_flush_vp_on_cpu().
+ */
+static DEFINE_PER_CPU(struct list_head, associated_tdvcpus);
+
 static u64 hkid_mask __ro_after_init;
 static u8 hkid_start_pos __ro_after_init;
@@ -87,6 +95,8 @@ static inline bool is_td_finalized(struct kvm_tdx *kvm_tdx)
 
 static inline void tdx_disassociate_vp(struct kvm_vcpu *vcpu)
 {
+	list_del(&to_tdx(vcpu)->cpu_list);
+
 	/*
 	 * Ensure tdx->cpu_list is updated is before setting vcpu->cpu to -1,
 	 * otherwise, a different CPU can see vcpu->cpu = -1 and add the vCPU
@@ -97,6 +107,22 @@ static inline void tdx_disassociate_vp(struct kvm_vcpu *vcpu)
 	vcpu->cpu = -1;
 }
 
+void tdx_hardware_enable(void)
+{
+	INIT_LIST_HEAD(&per_cpu(associated_tdvcpus, raw_smp_processor_id()));
+}
+
+void tdx_hardware_disable(void)
+{
+	int cpu = raw_smp_processor_id();
+	struct list_head *tdvcpus = &per_cpu(associated_tdvcpus, cpu);
+	struct vcpu_tdx *tdx, *tmp;
+
+	/* Safe variant needed as tdx_disassociate_vp() deletes the entry. */
+	list_for_each_entry_safe(tdx, tmp, tdvcpus, cpu_list)
+		tdx_disassociate_vp(&tdx->vcpu);
+}
+
 static void tdx_clear_page(unsigned long page)
 {
 	const void *zero_page = (const void *) __va(page_to_phys(ZERO_PAGE(0)));
@@ -230,9 +256,11 @@ void tdx_mmu_prezap(struct kvm *kvm)
 	struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
 	cpumask_var_t packages;
 	bool cpumask_allocated;
+	struct kvm_vcpu *vcpu;
 	u64 err;
 	int ret;
 	int i;
+	unsigned long j;
 
 	if (!is_hkid_assigned(kvm_tdx))
 		return;
@@ -248,6 +276,17 @@ void tdx_mmu_prezap(struct kvm *kvm)
 		return;
 	}
 
+	kvm_for_each_vcpu(j, vcpu, kvm)
+		tdx_flush_vp_on_cpu(vcpu);
+
+	mutex_lock(&tdx_lock);
+	err = tdh_mng_vpflushdone(kvm_tdx->tdr.pa);
+	mutex_unlock(&tdx_lock);
+	if (WARN_ON_ONCE(err)) {
+		pr_tdx_error(TDH_MNG_VPFLUSHDONE, err, NULL);
+		return;
+	}
+
 	cpumask_allocated = zalloc_cpumask_var(&packages, GFP_KERNEL);
 	for_each_online_cpu(i) {
 		if (cpumask_allocated &&
@@ -472,8 +511,22 @@ int tdx_vcpu_create(struct kvm_vcpu *vcpu)
 
 void tdx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 {
-	if (vcpu->cpu != cpu)
-		tdx_flush_vp_on_cpu(vcpu);
+	struct vcpu_tdx *tdx = to_tdx(vcpu);
+
+	if (vcpu->cpu == cpu)
+		return;
+
+	tdx_flush_vp_on_cpu(vcpu);
+
+	local_irq_disable();
+	/*
+	 * Pairs with the smp_wmb() in tdx_disassociate_vp() to ensure
+	 * vcpu->cpu is read before tdx->cpu_list.
+	 */
+	smp_rmb();
+
+	list_add(&tdx->cpu_list, &per_cpu(associated_tdvcpus, cpu));
+	local_irq_enable();
 }
 
 void tdx_prepare_switch_to_guest(struct kvm_vcpu *vcpu)
@@ -522,6 +575,19 @@ void tdx_vcpu_free(struct kvm_vcpu *vcpu)
 		tdx_reclaim_td_page(&tdx->tdvpx[i]);
 	kfree(tdx->tdvpx);
 	tdx_reclaim_td_page(&tdx->tdvpr);
+
+	/*
+	 * kvm_free_vcpus()
+	 *   -> kvm_unload_vcpu_mmu()
+	 *
+	 * does vcpu_load() for every vcpu after they have already been
+	 * disassociated from the per-cpu list by tdx_vm_teardown().  So they
+	 * need to be disassociated again; otherwise the freed vcpu data would
+	 * be accessed when list_{del,add}() is done on the associated_tdvcpus
+	 * list later.
+	 */
+	tdx_flush_vp_on_cpu(vcpu);
+	WARN_ON(vcpu->cpu != -1);
 }
 
 void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h
index 8b1cf9c158e3..180360a65545 100644
--- a/arch/x86/kvm/vmx/tdx.h
+++ b/arch/x86/kvm/vmx/tdx.h
@@ -81,6 +81,8 @@ struct vcpu_tdx {
 	struct tdx_td_page tdvpr;
 	struct tdx_td_page *tdvpx;
 
+	struct list_head cpu_list;
+
 	union tdx_exit_reason exit_reason;
 
 	bool initialized;
diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h
index ceafd6e18f4e..aae0f4449ec5 100644
--- a/arch/x86/kvm/vmx/x86_ops.h
+++ b/arch/x86/kvm/vmx/x86_ops.h
@@ -132,6 +132,8 @@ void __init tdx_pre_kvm_init(unsigned int *vcpu_size,
 bool tdx_is_vm_type_supported(unsigned long type);
 void __init tdx_hardware_setup(struct kvm_x86_ops *x86_ops);
 void tdx_hardware_unsetup(void);
+void tdx_hardware_enable(void);
+void tdx_hardware_disable(void);
 
 int tdx_vm_init(struct kvm *kvm);
 void tdx_mmu_prezap(struct kvm *kvm);
@@ -156,6 +158,8 @@ static inline void tdx_pre_kvm_init(
 static inline bool tdx_is_vm_type_supported(unsigned long type) { return false; }
 static inline void tdx_hardware_setup(struct kvm_x86_ops *x86_ops) {}
 static inline void tdx_hardware_unsetup(void) {}
+static inline void tdx_hardware_enable(void) {}
+static inline void tdx_hardware_disable(void) {}
 static inline int tdx_vm_init(struct kvm *kvm) { return -EOPNOTSUPP; }
 static inline void tdx_mmu_prezap(struct kvm *kvm) {}
-- 
2.25.1