Received: by 2002:a05:6359:c8b:b0:c7:702f:21d4 with SMTP id go11csp3475907rwb; Fri, 30 Sep 2022 04:16:58 -0700 (PDT) X-Google-Smtp-Source: AMsMyM4YFVxGPD02idv2HbrU4ceR2Q036Q0AN8ZW9bqWKoYDJT/yRTyTKC4E8g3UPhUS1iiBVZhM X-Received: by 2002:a05:6402:3489:b0:451:a859:8a4f with SMTP id v9-20020a056402348900b00451a8598a4fmr7271004edc.279.1664536618234; Fri, 30 Sep 2022 04:16:58 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1664536618; cv=none; d=google.com; s=arc-20160816; b=eAc7qB/bCMDFj0AdiqyxcupzbZUBR/1lWvHubR92xlayKxwC0wUeUAfUIyFO1uja8K baTeNGI4qN0s0ykAceX8G2jRjYtq7rYhBiBniiKR2j/Kha0Ux8tkfDPWGwLbZ2VBgCtc UN7M9D2GC/l9Ilfom4HzuMGycUGDIDtpSp4uoED/DtZC/abseN6tMwgh7VCS16U2Y/5+ LcqXkRP+RwdfGEsN6U4ICp0rhXMjcRVgKppkNsLFt/o8cJfDENR2fGpGeCXMM5/zAgsL GGITgHIK9F4oV/v6TTLHa7QH8oXJzdyFMXofDtXgMn/aIscSAC7nHO7DFU4ry7nzOzhQ JZ6Q== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=iuW8UkP0u66ESKUx8o6vfPXqj9eizhbKHOBFufHwJ1w=; b=oqVrCCVHG9Y3cCPupVMTLMj8JehRpjeQ57GbObbLb28bKL0vJFOl1zdZeqFJE0FdFn osrHJqg6kKUGtFwdP74dv7GfsxwV3zamZcO7Z2AGCpJT5ikpQz1JfkNWXveNFs5kfRXN v0crVlvhfuT/Aanp8c6t+iG9fDLduWsI/A9G0xlOlB+mLFsFiF5nGk0DE1t6fcQDguF4 UmruxuDBvbpTX1V9xy12So0t2hh8EE0EhuiGzY0xlAhNZm6zYXxiLz6pY9yiFWtH0Qj0 x3iWge5nkQL1bQMhcPKUb0GVxcCvAz+7v8lI6eI2i82QtOsCdhNMGWCHESqaITZgwjpE /SuA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=JopKo0wM; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id g20-20020a056402321400b004587e08bbcbsi1263558eda.520.2022.09.30.04.16.32; Fri, 30 Sep 2022 04:16:58 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=JopKo0wM; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232385AbiI3KYD (ORCPT + 99 others); Fri, 30 Sep 2022 06:24:03 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33560 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231649AbiI3KTS (ORCPT ); Fri, 30 Sep 2022 06:19:18 -0400 Received: from mga02.intel.com (mga02.intel.com [134.134.136.20]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 26AF31C6A7D; Fri, 30 Sep 2022 03:19:06 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1664533147; x=1696069147; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=GqvFey6xb19YQgsYnj21x/0rfMZsOIrkovFD+ChU914=; b=JopKo0wM4gy0RfmhQ0G99oWjrHjh8kj4ZX7MNqdGAorybt6J7Il7OAD0 i+9xIF+yVw1Ix1dai4UUFfPRsGeNpjhP+CQbyRcOlfF3n4OSVTIyPi/jk NqwLqAqD6YIy3h9cc3RzgNQAHpX/cmrSVqGbGCADvov0I/JdP/pr3FfB2 QSR2TRnKQGuaA9TnrRah2K3glxgotTLsZtGoXFtqMAeNlnctvrEpYPkdd Pqy/hhxtWVmGj8yUCUiUiiJkpjqO1iDbs8YTLiAPZeZ3rC47Ud9UnQW4A hEdBQTdaLlkCIKfkW6Co7mOFXrc7CLOwzEzqjG4jM4IFCdCQF6j3n8h3z A==; X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="289320477" X-IronPort-AV: E=Sophos;i="5.93,358,1654585200"; d="scan'208";a="289320477" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Sep 2022 03:19:03 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="726807740" X-IronPort-AV: E=Sophos;i="5.93,358,1654585200"; d="scan'208";a="726807740" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Sep 2022 03:19:03 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar Subject: [PATCH v9 073/105] KVM: TDX: handle vcpu migration over logical processor Date: Fri, 30 Sep 2022 03:18:07 -0700 Message-Id: <924d20b1a894c9a94502fea037954194dc229ecf.1664530908.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-4.5 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, RCVD_IN_MSPIKE_H3,RCVD_IN_MSPIKE_WL,SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: Isaku Yamahata For vcpu migration, in the case of VMX, VCMS is flushed on the source pcpu, and load it on the target pcpu. There are corresponding TDX SEAMCALL APIs, call them on vcpu migration. The logic is mostly same as VMX except the TDX SEAMCALLs are used. When shutting down the machine, (VMX or TDX) vcpus needs to be shutdown on each pcpu. Do the similar for TDX with TDX SEAMCALL APIs. Signed-off-by: Isaku Yamahata --- arch/x86/kvm/vmx/main.c | 43 +++++++++++-- arch/x86/kvm/vmx/tdx.c | 121 +++++++++++++++++++++++++++++++++++++ arch/x86/kvm/vmx/tdx.h | 2 + arch/x86/kvm/vmx/x86_ops.h | 6 ++ 4 files changed, 168 insertions(+), 4 deletions(-) diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c index 44f9fc9e987b..b60e113696c0 100644 --- a/arch/x86/kvm/vmx/main.c +++ b/arch/x86/kvm/vmx/main.c @@ -17,6 +17,25 @@ static bool vt_is_vm_type_supported(unsigned long type) (enable_tdx && tdx_is_vm_type_supported(type)); } +static int vt_hardware_enable(void) +{ + int ret; + + ret = vmx_hardware_enable(); + if (ret) + return ret; + + tdx_hardware_enable(); + return 0; +} + +static void vt_hardware_disable(void) +{ + /* Note, TDX *and* VMX need to be disabled if TDX is enabled. */ + tdx_hardware_disable(); + vmx_hardware_disable(); +} + static __init int vt_hardware_setup(void) { int ret; @@ -141,6 +160,14 @@ static fastpath_t vt_vcpu_run(struct kvm_vcpu *vcpu) return vmx_vcpu_run(vcpu); } +static void vt_vcpu_load(struct kvm_vcpu *vcpu, int cpu) +{ + if (is_td_vcpu(vcpu)) + return tdx_vcpu_load(vcpu, cpu); + + return vmx_vcpu_load(vcpu, cpu); +} + static void vt_flush_tlb_all(struct kvm_vcpu *vcpu) { if (is_td_vcpu(vcpu)) @@ -199,6 +226,14 @@ static void vt_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, vmx_load_mmu_pgd(vcpu, root_hpa, pgd_level); } +static void vt_sched_in(struct kvm_vcpu *vcpu, int cpu) +{ + if (is_td_vcpu(vcpu)) + return; + + vmx_sched_in(vcpu, cpu); +} + static int vt_mem_enc_ioctl(struct kvm *kvm, void __user *argp) { if (!is_td(kvm)) @@ -222,8 +257,8 @@ struct kvm_x86_ops vt_x86_ops __initdata = { .offline_cpu = tdx_offline_cpu, .check_processor_compatibility = vmx_check_processor_compatibility, - .hardware_enable = vmx_hardware_enable, - .hardware_disable = vmx_hardware_disable, + .hardware_enable = vt_hardware_enable, + .hardware_disable = vt_hardware_disable, .has_emulated_msr = vmx_has_emulated_msr, .is_vm_type_supported = vt_is_vm_type_supported, @@ -239,7 +274,7 @@ struct kvm_x86_ops vt_x86_ops __initdata = { .vcpu_reset = vt_vcpu_reset, .prepare_switch_to_guest = vt_prepare_switch_to_guest, - .vcpu_load = vmx_vcpu_load, + .vcpu_load = vt_vcpu_load, .vcpu_put = vt_vcpu_put, .update_exception_bitmap = vmx_update_exception_bitmap, @@ -327,7 +362,7 @@ struct kvm_x86_ops vt_x86_ops __initdata = { .request_immediate_exit = vmx_request_immediate_exit, - .sched_in = vmx_sched_in, + .sched_in = vt_sched_in, .cpu_dirty_log_size = PML_ENTITY_NUM, .update_cpu_dirty_logging = vmx_update_cpu_dirty_logging, diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 407216512729..51de41fbe098 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -51,6 +51,14 @@ static DEFINE_MUTEX(tdx_lock); static struct mutex *tdx_mng_key_config_lock; static atomic_t nr_configured_hkid; +/* + * A per-CPU list of TD vCPUs associated with a given CPU. Used when a CPU + * is brought down to invoke TDH_VP_FLUSH on the approapriate TD vCPUS. + * Protected by interrupt mask. This list is manipulated in process context + * of vcpu and IPI callback. See tdx_flush_vp_on_cpu(). + */ +static DEFINE_PER_CPU(struct list_head, associated_tdvcpus); + static __always_inline hpa_t set_hkid_to_hpa(hpa_t pa, u16 hkid) { return pa | ((hpa_t)hkid << boot_cpu_data.x86_phys_bits); @@ -82,6 +90,36 @@ static inline bool is_td_finalized(struct kvm_tdx *kvm_tdx) return kvm_tdx->finalized; } +static inline void tdx_disassociate_vp(struct kvm_vcpu *vcpu) +{ + list_del(&to_tdx(vcpu)->cpu_list); + + /* + * Ensure tdx->cpu_list is updated is before setting vcpu->cpu to -1, + * otherwise, a different CPU can see vcpu->cpu = -1 and add the vCPU + * to its list before its deleted from this CPUs list. + */ + smp_wmb(); + + vcpu->cpu = -1; +} + +void tdx_hardware_enable(void) +{ + INIT_LIST_HEAD(&per_cpu(associated_tdvcpus, raw_smp_processor_id())); +} + +void tdx_hardware_disable(void) +{ + int cpu = raw_smp_processor_id(); + struct list_head *tdvcpus = &per_cpu(associated_tdvcpus, cpu); + struct vcpu_tdx *tdx, *tmp; + + /* Safe variant needed as tdx_disassociate_vp() deletes the entry. */ + list_for_each_entry_safe(tdx, tmp, tdvcpus, cpu_list) + tdx_disassociate_vp(&tdx->vcpu); +} + static void tdx_clear_page(unsigned long page) { const void *zero_page = (const void *) __va(page_to_phys(ZERO_PAGE(0))); @@ -164,6 +202,41 @@ static void tdx_reclaim_td_page(struct tdx_td_page *page) } } +static void tdx_flush_vp(void *arg) +{ + struct kvm_vcpu *vcpu = arg; + u64 err; + + lockdep_assert_irqs_disabled(); + + /* Task migration can race with CPU offlining. */ + if (vcpu->cpu != raw_smp_processor_id()) + return; + + /* + * No need to do TDH_VP_FLUSH if the vCPU hasn't been initialized. The + * list tracking still needs to be updated so that it's correct if/when + * the vCPU does get initialized. + */ + if (is_td_vcpu_created(to_tdx(vcpu))) { + err = tdh_vp_flush(to_tdx(vcpu)->tdvpr.pa); + if (unlikely(err && err != TDX_VCPU_NOT_ASSOCIATED)) { + if (WARN_ON_ONCE(err)) + pr_tdx_error(TDH_VP_FLUSH, err, NULL); + } + } + + tdx_disassociate_vp(vcpu); +} + +static void tdx_flush_vp_on_cpu(struct kvm_vcpu *vcpu) +{ + if (unlikely(vcpu->cpu == -1)) + return; + + smp_call_function_single(vcpu->cpu, tdx_flush_vp, vcpu, 1); +} + static int tdx_do_tdh_phymem_cache_wb(void *param) { u64 err = 0; @@ -188,6 +261,8 @@ void tdx_mmu_release_hkid(struct kvm *kvm) struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm); cpumask_var_t packages; bool cpumask_allocated; + struct kvm_vcpu *vcpu; + unsigned long j; u64 err; int ret; int i; @@ -198,6 +273,19 @@ void tdx_mmu_release_hkid(struct kvm *kvm) if (!is_td_created(kvm_tdx)) goto free_hkid; + kvm_for_each_vcpu(j, vcpu, kvm) + tdx_flush_vp_on_cpu(vcpu); + + mutex_lock(&tdx_lock); + err = tdh_mng_vpflushdone(kvm_tdx->tdr.pa); + mutex_unlock(&tdx_lock); + if (WARN_ON_ONCE(err)) { + pr_tdx_error(TDH_MNG_VPFLUSHDONE, err, NULL); + pr_err("tdh_mng_vpflushdone failed. HKID %d is leaked.\n", + kvm_tdx->hkid); + return; + } + cpumask_allocated = zalloc_cpumask_var(&packages, GFP_KERNEL); cpus_read_lock(); for_each_online_cpu(i) { @@ -347,6 +435,26 @@ int tdx_vcpu_create(struct kvm_vcpu *vcpu) return 0; } +void tdx_vcpu_load(struct kvm_vcpu *vcpu, int cpu) +{ + struct vcpu_tdx *tdx = to_tdx(vcpu); + + if (vcpu->cpu == cpu) + return; + + tdx_flush_vp_on_cpu(vcpu); + + local_irq_disable(); + /* + * Pairs with the smp_wmb() in tdx_disassociate_vp() to ensure + * vcpu->cpu is read before tdx->cpu_list. + */ + smp_rmb(); + + list_add(&tdx->cpu_list, &per_cpu(associated_tdvcpus, cpu)); + local_irq_enable(); +} + void tdx_prepare_switch_to_guest(struct kvm_vcpu *vcpu) { struct vcpu_tdx *tdx = to_tdx(vcpu); @@ -397,6 +505,19 @@ void tdx_vcpu_free(struct kvm_vcpu *vcpu) tdx->tdvpx = NULL; } tdx_reclaim_td_page(&tdx->tdvpr); + + /* + * kvm_free_vcpus() + * -> kvm_unload_vcpu_mmu() + * + * does vcpu_load() for every vcpu after they already disassociated + * from the per cpu list when tdx_vm_teardown(). So we need to + * disassociate them again, otherwise the freed vcpu data will be + * accessed when do list_{del,add}() on associated_tdvcpus list + * later. + */ + tdx_flush_vp_on_cpu(vcpu); + WARN_ON_ONCE(vcpu->cpu != -1); } void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event) diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h index caf837d7f64d..8b425e7415c2 100644 --- a/arch/x86/kvm/vmx/tdx.h +++ b/arch/x86/kvm/vmx/tdx.h @@ -82,6 +82,8 @@ struct vcpu_tdx { struct tdx_td_page tdvpr; struct tdx_td_page *tdvpx; + struct list_head cpu_list; + union tdx_exit_reason exit_reason; bool vcpu_initialized; diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h index de94aa189268..36a2956ba677 100644 --- a/arch/x86/kvm/vmx/x86_ops.h +++ b/arch/x86/kvm/vmx/x86_ops.h @@ -138,6 +138,8 @@ int __init tdx_hardware_setup(struct kvm_x86_ops *x86_ops); bool tdx_is_vm_type_supported(unsigned long type); void tdx_hardware_unsetup(void); int tdx_offline_cpu(void); +void tdx_hardware_enable(void); +void tdx_hardware_disable(void); int tdx_dev_ioctl(void __user *argp); int tdx_vm_init(struct kvm *kvm); @@ -150,6 +152,7 @@ void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event); fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu); void tdx_prepare_switch_to_guest(struct kvm_vcpu *vcpu); void tdx_vcpu_put(struct kvm_vcpu *vcpu); +void tdx_vcpu_load(struct kvm_vcpu *vcpu, int cpu); int tdx_vm_ioctl(struct kvm *kvm, void __user *argp); int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp); @@ -162,6 +165,8 @@ static inline int tdx_hardware_setup(struct kvm_x86_ops *x86_ops) { return 0; } static inline bool tdx_is_vm_type_supported(unsigned long type) { return false; } static inline void tdx_hardware_unsetup(void) {} static inline int tdx_offline_cpu(void) { return 0; } +static inline void tdx_hardware_enable(void) {} +static inline void tdx_hardware_disable(void) {} static inline int tdx_dev_ioctl(void __user *argp) { return -EOPNOTSUPP; }; static inline int tdx_vm_init(struct kvm *kvm) { return -EOPNOTSUPP; } @@ -175,6 +180,7 @@ static inline void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event) {} static inline fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu) { return EXIT_FASTPATH_NONE; } static inline void tdx_prepare_switch_to_guest(struct kvm_vcpu *vcpu) {} static inline void tdx_vcpu_put(struct kvm_vcpu *vcpu) {} +static inline void tdx_vcpu_load(struct kvm_vcpu *vcpu, int cpu) {} static inline int tdx_vm_ioctl(struct kvm *kvm, void __user *argp) { return -EOPNOTSUPP; } static inline int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp) { return -EOPNOTSUPP; } -- 2.25.1