Received: by 2002:a05:6358:c692:b0:131:369:b2a3 with SMTP id fe18csp333265rwb; Tue, 25 Jul 2023 16:55:48 -0700 (PDT) X-Google-Smtp-Source: APBJJlFtIOEi2CKIJ7nj3uEAkxI3ESw0DyXADaLauXzJ+IAv4AfPniSgzonYqxkbonjKCv6Lrbg+ X-Received: by 2002:a17:902:bcc9:b0:1bb:8461:159e with SMTP id o9-20020a170902bcc900b001bb8461159emr495274pls.55.1690329348164; Tue, 25 Jul 2023 16:55:48 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1690329348; cv=none; d=google.com; s=arc-20160816; b=dIIwAOtM0cTmX8x0xPxvS0JUWdQAgcNXCX+0LJ3nA6nVInXHGPth49WWiy6nqQ52bn 4jZhuQK8J5TzcByHAlSJL1CV763CdBzQk1RHm9yxlvMBw9lpEtN+/UPcN/4sz2//hP2h CWSydxNyo3Wmggfh+YbvkDWg9iVCGja1LfGPsYAWVw/JQYu20YVJKVfYUny7AY0qp1pj HDdgFEFNU/+zGEBGhlWnpENyqsdmwbIYpI6GP9y3KAzPtDUiFf8am7C2j2oGrPo718OB LuWM0Y637nbIoH5PDAojNKHJy+etZjDYLNSC7m/ic24Ae9vRPbTpHLBw5dun1SK3Q2Gm 769Q== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=oHUKUQwx3dFbsffEKmnkdKVxtEykQyRDivWkAH8oHMM=; fh=juxwNcA6iKwLISFgUCNwdoIYC0NKLUseq3xZdq25RR4=; b=Twed2+eXkYi0hh7c5uJfwBJw0TdNh4EiS0TDJ0jfJLibI/1xA/O7kUQ+DKI622oj9j EVJnWewig9B1TBiHjcaVReO2yM5MuaZRcLubJuB/HsQNeBIg3qfULeQLaeTxlt+BhkvN 4QslhMNiMn9O9nz4erpr0Y8FoqyNPWJRpxlpIbne368C853+gWlv7iOOrI2aOena2qwX 9BlIcNICTdA6MHTGG1ITN2YjTm07xBlC9wAq8DYa5KhvbwfDPjFjKdgj1GQE9tk0+ZLr ba5YHZ2wTFMvDsFNcDFB2J4kN/LFinbD6AJXPPGk9zAQavVip52DIBboscEABk8Vztvm z1Gg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=ITGoh1aS; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id x14-20020a170902ec8e00b001b7e19195b7si12862584plg.29.2023.07.25.16.55.35; Tue, 25 Jul 2023 16:55:48 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=ITGoh1aS; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232564AbjGYWVb (ORCPT + 99 others); Tue, 25 Jul 2023 18:21:31 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33882 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232480AbjGYWT7 (ORCPT ); Tue, 25 Jul 2023 18:19:59 -0400 Received: from mga14.intel.com (mga14.intel.com [192.55.52.115]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id EC1821738; Tue, 25 Jul 2023 15:17:01 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1690323421; x=1721859421; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=oAoRI/MaGc2MZJ2E0YaA8ssWcjEPMxc5IyNgwHBfkA8=; b=ITGoh1aSp1/aWJx5aYhwuUlhkRrmMCuRUJL9527qwImKPq/pW+BA9l4i 2lFOitUYzmz4/eWBwVuGxVHieALRH98hwGDwf/7a9/91wr6lY96Nt3TVL REc6nt1ejwI9mY6goEooYzTDpM4eP63ePRlhQ3Y+fz5pKs6RXJ4693MxG biCrGqlZpHb5Cq8luC4J8pK8XdDBoQ3cl4b0p6zKD3tfjGFtbLluJqnRy cS+9+RxxogQWqZbfIT/g8eyxJh9LMoOLJFDfUjIVIZA13x4YKqISb0Y/I Q1GwavhVyzpY2+7aKmDt4BYaBa0/m58Mnxa/gnvWCEezv6khAqqBag1DG Q==; X-IronPort-AV: E=McAfee;i="6600,9927,10782"; a="367882585" X-IronPort-AV: E=Sophos;i="6.01,231,1684825200"; d="scan'208";a="367882585" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 Jul 2023 15:15:54 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10782"; a="840001817" X-IronPort-AV: E=Sophos;i="6.01,231,1684825200"; d="scan'208";a="840001817" Received: from ls.sc.intel.com (HELO localhost) ([172.25.112.31]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 Jul 2023 15:15:54 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar , David Matlack , Kai Huang , Zhi Wang , chen.bo@intel.com, hang.yuan@intel.com, tina.zhang@intel.com Subject: [PATCH v15 072/115] KVM: TDX: handle vcpu migration over logical processor Date: Tue, 25 Jul 2023 15:14:23 -0700 Message-Id: <03580494a8c32eff5cd5accef49fa0910ae2174a.1690322424.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: Isaku Yamahata For vcpu migration, in the case of VMX, VMCS is flushed on the source pcpu, and load it on the target pcpu. There are corresponding TDX SEAMCALL APIs, call them on vcpu migration. The logic is mostly same as VMX except the TDX SEAMCALLs are used. When shutting down the machine, (VMX or TDX) vcpus needs to be shutdown on each pcpu. Do the similar for TDX with TDX SEAMCALL APIs. Signed-off-by: Isaku Yamahata --- arch/x86/kvm/vmx/main.c | 32 ++++++- arch/x86/kvm/vmx/tdx.c | 165 +++++++++++++++++++++++++++++++++++++ arch/x86/kvm/vmx/tdx.h | 2 + arch/x86/kvm/vmx/x86_ops.h | 4 + 4 files changed, 200 insertions(+), 3 deletions(-) diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c index d4edb479648e..a0570ff1eae1 100644 --- a/arch/x86/kvm/vmx/main.c +++ b/arch/x86/kvm/vmx/main.c @@ -44,6 +44,14 @@ static int vt_hardware_enable(void) return ret; } +static void vt_hardware_disable(void) +{ + /* Note, TDX *and* VMX need to be disabled if TDX is enabled. */ + if (enable_tdx) + tdx_hardware_disable(); + vmx_hardware_disable(); +} + static __init int vt_hardware_setup(void) { int ret; @@ -212,6 +220,16 @@ static fastpath_t vt_vcpu_run(struct kvm_vcpu *vcpu) return vmx_vcpu_run(vcpu); } +static void vt_vcpu_load(struct kvm_vcpu *vcpu, int cpu) +{ + if (is_td_vcpu(vcpu)) { + tdx_vcpu_load(vcpu, cpu); + return; + } + + vmx_vcpu_load(vcpu, cpu); +} + static void vt_flush_tlb_all(struct kvm_vcpu *vcpu) { if (is_td_vcpu(vcpu)) { @@ -271,6 +289,14 @@ static void vt_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, vmx_load_mmu_pgd(vcpu, root_hpa, pgd_level); } +static void vt_sched_in(struct kvm_vcpu *vcpu, int cpu) +{ + if (is_td_vcpu(vcpu)) + return; + + vmx_sched_in(vcpu, cpu); +} + static u8 vt_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio) { if (is_td_vcpu(vcpu)) @@ -313,7 +339,7 @@ struct kvm_x86_ops vt_x86_ops __initdata = { .offline_cpu = tdx_offline_cpu, .hardware_enable = vt_hardware_enable, - .hardware_disable = vmx_hardware_disable, + .hardware_disable = vt_hardware_disable, .has_emulated_msr = vmx_has_emulated_msr, .is_vm_type_supported = vt_is_vm_type_supported, @@ -331,7 +357,7 @@ struct kvm_x86_ops vt_x86_ops __initdata = { .vcpu_reset = vt_vcpu_reset, .prepare_switch_to_guest = vt_prepare_switch_to_guest, - .vcpu_load = vmx_vcpu_load, + .vcpu_load = vt_vcpu_load, .vcpu_put = vt_vcpu_put, .update_exception_bitmap = vmx_update_exception_bitmap, @@ -417,7 +443,7 @@ struct kvm_x86_ops vt_x86_ops __initdata = { .request_immediate_exit = vmx_request_immediate_exit, - .sched_in = vmx_sched_in, + .sched_in = vt_sched_in, .cpu_dirty_log_size = PML_ENTITY_NUM, .update_cpu_dirty_logging = vmx_update_cpu_dirty_logging, diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index b46bd963349c..259139abb8ba 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -73,6 +73,14 @@ static DEFINE_MUTEX(tdx_lock); static struct mutex *tdx_mng_key_config_lock; static atomic_t nr_configured_hkid; +/* + * A per-CPU list of TD vCPUs associated with a given CPU. Used when a CPU + * is brought down to invoke TDH_VP_FLUSH on the approapriate TD vCPUS. + * Protected by interrupt mask. This list is manipulated in process context + * of vcpu and IPI callback. See tdx_flush_vp_on_cpu(). + */ +static DEFINE_PER_CPU(struct list_head, associated_tdvcpus); + static __always_inline hpa_t set_hkid_to_hpa(hpa_t pa, u16 hkid) { return pa | ((hpa_t)hkid << boot_cpu_data.x86_phys_bits); @@ -104,6 +112,35 @@ static inline bool is_td_finalized(struct kvm_tdx *kvm_tdx) return kvm_tdx->finalized; } +static inline void tdx_disassociate_vp(struct kvm_vcpu *vcpu) +{ + list_del(&to_tdx(vcpu)->cpu_list); + + /* + * Ensure tdx->cpu_list is updated is before setting vcpu->cpu to -1, + * otherwise, a different CPU can see vcpu->cpu = -1 and add the vCPU + * to its list before its deleted from this CPUs list. + */ + smp_wmb(); + + vcpu->cpu = -1; +} + +static void tdx_disassociate_vp_arg(void *vcpu) +{ + tdx_disassociate_vp(vcpu); +} + +static void tdx_disassociate_vp_on_cpu(struct kvm_vcpu *vcpu) +{ + int cpu = vcpu->cpu; + + if (unlikely(cpu == -1)) + return; + + smp_call_function_single(cpu, tdx_disassociate_vp_arg, vcpu, 1); +} + static void tdx_clear_page(unsigned long page_pa) { const void *zero_page = (const void *) __va(page_to_phys(ZERO_PAGE(0))); @@ -186,6 +223,85 @@ static void tdx_reclaim_td_page(unsigned long td_page_pa) free_page((unsigned long)__va(td_page_pa)); } +struct tdx_flush_vp_arg { + struct kvm_vcpu *vcpu; + u64 err; +}; + +static void tdx_flush_vp(void *arg_) +{ + struct tdx_flush_vp_arg *arg = arg_; + struct kvm_vcpu *vcpu = arg->vcpu; + u64 err; + + arg->err = 0; + lockdep_assert_irqs_disabled(); + + /* Task migration can race with CPU offlining. */ + if (unlikely(vcpu->cpu != raw_smp_processor_id())) + return; + + /* + * No need to do TDH_VP_FLUSH if the vCPU hasn't been initialized. The + * list tracking still needs to be updated so that it's correct if/when + * the vCPU does get initialized. + */ + if (is_td_vcpu_created(to_tdx(vcpu))) { + /* + * No need to retry. TDX Resources needed for TDH.VP.FLUSH are, + * TDVPR as exclusive, TDR as shared, and TDCS as shared. This + * vp flush function is called when destructing vcpu/TD or vcpu + * migration. No other thread uses TDVPR in those cases. + */ + err = tdh_vp_flush(to_tdx(vcpu)->tdvpr_pa); + if (unlikely(err && err != TDX_VCPU_NOT_ASSOCIATED)) { + /* + * This function is called in IPI context. Do not use + * printk to avoid console semaphore. + * The caller prints out the error message, instead. + */ + if (err) + arg->err = err; + } + } + + tdx_disassociate_vp(vcpu); +} + +static void tdx_flush_vp_on_cpu(struct kvm_vcpu *vcpu) +{ + struct tdx_flush_vp_arg arg = { + .vcpu = vcpu, + }; + int cpu = vcpu->cpu; + + if (unlikely(cpu == -1)) + return; + + smp_call_function_single(cpu, tdx_flush_vp, &arg, 1); + if (WARN_ON_ONCE(arg.err)) { + pr_err("cpu: %d ", cpu); + pr_tdx_error(TDH_VP_FLUSH, arg.err, NULL); + } +} + +void tdx_hardware_disable(void) +{ + int cpu = raw_smp_processor_id(); + struct list_head *tdvcpus = &per_cpu(associated_tdvcpus, cpu); + struct tdx_flush_vp_arg arg; + struct vcpu_tdx *tdx, *tmp; + unsigned long flags; + + local_irq_save(flags); + /* Safe variant needed as tdx_disassociate_vp() deletes the entry. */ + list_for_each_entry_safe(tdx, tmp, tdvcpus, cpu_list) { + arg.vcpu = &tdx->vcpu; + tdx_flush_vp(&arg); + } + local_irq_restore(flags); +} + static int tdx_do_tdh_phymem_cache_wb(void *param) { u64 err = 0; @@ -210,6 +326,8 @@ void tdx_mmu_release_hkid(struct kvm *kvm) struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm); cpumask_var_t packages; bool cpumask_allocated; + struct kvm_vcpu *vcpu; + unsigned long j; u64 err; int ret; int i; @@ -220,6 +338,19 @@ void tdx_mmu_release_hkid(struct kvm *kvm) if (!is_td_created(kvm_tdx)) goto free_hkid; + kvm_for_each_vcpu(j, vcpu, kvm) + tdx_flush_vp_on_cpu(vcpu); + + mutex_lock(&tdx_lock); + err = tdh_mng_vpflushdone(kvm_tdx->tdr_pa); + mutex_unlock(&tdx_lock); + if (WARN_ON_ONCE(err)) { + pr_tdx_error(TDH_MNG_VPFLUSHDONE, err, NULL); + pr_err("tdh_mng_vpflushdone failed. HKID %d is leaked.\n", + kvm_tdx->hkid); + return; + } + cpumask_allocated = zalloc_cpumask_var(&packages, GFP_KERNEL); cpus_read_lock(); for_each_online_cpu(i) { @@ -406,6 +537,26 @@ int tdx_vcpu_create(struct kvm_vcpu *vcpu) return 0; } +void tdx_vcpu_load(struct kvm_vcpu *vcpu, int cpu) +{ + struct vcpu_tdx *tdx = to_tdx(vcpu); + + if (vcpu->cpu == cpu) + return; + + tdx_flush_vp_on_cpu(vcpu); + + local_irq_disable(); + /* + * Pairs with the smp_wmb() in tdx_disassociate_vp() to ensure + * vcpu->cpu is read before tdx->cpu_list. + */ + smp_rmb(); + + list_add(&tdx->cpu_list, &per_cpu(associated_tdvcpus, cpu)); + local_irq_enable(); +} + void tdx_prepare_switch_to_guest(struct kvm_vcpu *vcpu) { struct vcpu_tdx *tdx = to_tdx(vcpu); @@ -455,6 +606,16 @@ void tdx_vcpu_free(struct kvm_vcpu *vcpu) return; } + /* + * When destroying VM, kvm_unload_vcpu_mmu() calls vcpu_load() for every + * vcpu after they already disassociated from the per cpu list by + * tdx_mmu_release_hkid(). So we need to disassociate them again, + * otherwise the freed vcpu data will be accessed when do + * list_{del,add}() on associated_tdvcpus list later. + */ + tdx_disassociate_vp_on_cpu(vcpu); + WARN_ON_ONCE(vcpu->cpu != -1); + if (tdx->tdvpx_pa) { for (i = 0; i < tdx_info.nr_tdvpx_pages; i++) { if (tdx->tdvpx_pa[i]) @@ -1776,6 +1937,10 @@ int __init tdx_hardware_setup(struct kvm_x86_ops *x86_ops) return -EINVAL; } + /* tdx_hardware_disable() uses associated_tdvcpus. */ + for_each_possible_cpu(i) + INIT_LIST_HEAD(&per_cpu(associated_tdvcpus, i)); + for (i = 0; i < ARRAY_SIZE(tdx_uret_msrs); i++) { /* * Here it checks if MSRs (tdx_uret_msrs) can be saved/restored diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h index 2970536e014a..da7e83dc34b8 100644 --- a/arch/x86/kvm/vmx/tdx.h +++ b/arch/x86/kvm/vmx/tdx.h @@ -70,6 +70,8 @@ struct vcpu_tdx { unsigned long tdvpr_pa; unsigned long *tdvpx_pa; + struct list_head cpu_list; + union tdx_exit_reason exit_reason; bool initialized; diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h index 8fcc5807e594..231da434a08b 100644 --- a/arch/x86/kvm/vmx/x86_ops.h +++ b/arch/x86/kvm/vmx/x86_ops.h @@ -138,6 +138,7 @@ void vmx_setup_mce(struct kvm_vcpu *vcpu); #ifdef CONFIG_INTEL_TDX_HOST int __init tdx_hardware_setup(struct kvm_x86_ops *x86_ops); void tdx_hardware_unsetup(void); +void tdx_hardware_disable(void); bool tdx_is_vm_type_supported(unsigned long type); int tdx_offline_cpu(void); @@ -154,6 +155,7 @@ void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event); fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu); void tdx_prepare_switch_to_guest(struct kvm_vcpu *vcpu); void tdx_vcpu_put(struct kvm_vcpu *vcpu); +void tdx_vcpu_load(struct kvm_vcpu *vcpu, int cpu); u8 tdx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio); int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp); @@ -165,6 +167,7 @@ void tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, int root_level); #else static inline int tdx_hardware_setup(struct kvm_x86_ops *x86_ops) { return -EOPNOTSUPP; } static inline void tdx_hardware_unsetup(void) {} +static inline void tdx_hardware_disable(void) {} static inline bool tdx_is_vm_type_supported(unsigned long type) { return false; } static inline int tdx_offline_cpu(void) { return 0; } @@ -184,6 +187,7 @@ static inline void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event) {} static inline fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu) { return EXIT_FASTPATH_NONE; } static inline void tdx_prepare_switch_to_guest(struct kvm_vcpu *vcpu) {} static inline void tdx_vcpu_put(struct kvm_vcpu *vcpu) {} +static inline void tdx_vcpu_load(struct kvm_vcpu *vcpu, int cpu) {} static inline u8 tdx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio) { return 0; } static inline int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp) { return -EOPNOTSUPP; } -- 2.25.1