Message-ID:
<2e280f545e8b15500fc4a2a77f6000a51f6f8bbd.camel@redhat.com>
Subject: Re: [PATCH v7 26/26] KVM: nVMX: Enable CET support for nested guest
From: Maxim Levitsky
To: Yang Weijiang, seanjc@google.com, pbonzini@redhat.com, dave.hansen@intel.com, kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: peterz@infradead.org, chao.gao@intel.com, rick.p.edgecombe@intel.com, john.allen@amd.com
Date: Thu, 30 Nov 2023 19:53:16 +0200
In-Reply-To: <20231124055330.138870-27-weijiang.yang@intel.com>
References: <20231124055330.138870-1-weijiang.yang@intel.com> <20231124055330.138870-27-weijiang.yang@intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, 2023-11-24 at 00:53 -0500, Yang Weijiang wrote:
> Set up CET MSRs, related VM_ENTRY/EXIT control bits and fixed CR4 setting
> to enable CET for nested VM.
>
> Note, generally L1 VMM only touches CET VMCS fields when live migration or
> vmcs_{read,write}() to the fields happens, so the fields only need to be
> synced in these "rare" cases.

To be honest, we can't assume anything about L1. What we can assume is that
if a vmcs12 field is not shadowed, then L1's vmwrite/vmread will always be
intercepted, and the fields can be synced during the interception. However,
I studied this area long ago and I might be mistaken.

> And here only considers the case that L1 VMM
> has set VM_ENTRY_LOAD_CET_STATE in its VMCS vm_entry_controls as it's the
> common usage.
>
> Suggested-by: Chao Gao
> Signed-off-by: Yang Weijiang
> ---
>  arch/x86/kvm/vmx/nested.c | 48 +++++++++++++++++++++++++++++++++++++--
>  arch/x86/kvm/vmx/vmcs12.c |  6 +++++
>  arch/x86/kvm/vmx/vmcs12.h | 14 +++++++++++-
>  arch/x86/kvm/vmx/vmx.c    |  2 ++
>  4 files changed, 67 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index d8c32682ca76..965173650542 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -660,6 +660,28 @@ static inline bool nested_vmx_prepare_msr_bitmap(struct kvm_vcpu *vcpu,
>  	nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
>  					 MSR_IA32_FLUSH_CMD, MSR_TYPE_W);
>
> +	/* Pass CET MSRs to nested VM if L0 and L1 are set to pass-through. */
> +	nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
> +					 MSR_IA32_U_CET, MSR_TYPE_RW);
> +
> +	nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
> +					 MSR_IA32_S_CET, MSR_TYPE_RW);
> +
> +	nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
> +					 MSR_IA32_PL0_SSP, MSR_TYPE_RW);
> +
> +	nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
> +					 MSR_IA32_PL1_SSP, MSR_TYPE_RW);
> +
> +	nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
> +					 MSR_IA32_PL2_SSP, MSR_TYPE_RW);
> +
> +	nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
> +					 MSR_IA32_PL3_SSP, MSR_TYPE_RW);
> +
> +	nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
> +					 MSR_IA32_INT_SSP_TAB, MSR_TYPE_RW);
> +
>  	kvm_vcpu_unmap(vcpu, &vmx->nested.msr_bitmap_map, false);
>
>  	vmx->nested.force_msr_bitmap_recalc = false;
> @@ -2469,6 +2491,18 @@ static void prepare_vmcs02_rare(struct vcpu_vmx *vmx, struct vmcs12 *vmcs12)
>  		if (kvm_mpx_supported() && vmx->nested.nested_run_pending &&
>  		    (vmcs12->vm_entry_controls & VM_ENTRY_LOAD_BNDCFGS))
>  			vmcs_write64(GUEST_BNDCFGS, vmcs12->guest_bndcfgs);
> +
> +		if (vmx->nested.nested_run_pending &&

I don't think the nested.nested_run_pending check is needed:
prepare_vmcs02_rare() is not going to be called unless the nested run is
pending.

> +		    (vmcs12->vm_entry_controls & VM_ENTRY_LOAD_CET_STATE)) {
> +			if (guest_can_use(&vmx->vcpu, X86_FEATURE_SHSTK)) {
> +				vmcs_writel(GUEST_SSP, vmcs12->guest_ssp);
> +				vmcs_writel(GUEST_INTR_SSP_TABLE,
> +					    vmcs12->guest_ssp_tbl);
> +			}
> +			if (guest_can_use(&vmx->vcpu, X86_FEATURE_SHSTK) ||
> +			    guest_can_use(&vmx->vcpu, X86_FEATURE_IBT))
> +				vmcs_writel(GUEST_S_CET, vmcs12->guest_s_cet);
> +		}
>  	}
>
>  	if (nested_cpu_has_xsaves(vmcs12))
> @@ -4300,6 +4334,15 @@ static void sync_vmcs02_to_vmcs12_rare(struct kvm_vcpu *vcpu,
>  	vmcs12->guest_pending_dbg_exceptions =
>  		vmcs_readl(GUEST_PENDING_DBG_EXCEPTIONS);
>
> +	if (guest_can_use(&vmx->vcpu, X86_FEATURE_SHSTK)) {
> +		vmcs12->guest_ssp = vmcs_readl(GUEST_SSP);
> +		vmcs12->guest_ssp_tbl = vmcs_readl(GUEST_INTR_SSP_TABLE);
> +	}
> +	if (guest_can_use(&vmx->vcpu, X86_FEATURE_SHSTK) ||
> +	    guest_can_use(&vmx->vcpu, X86_FEATURE_IBT)) {
> +		vmcs12->guest_s_cet = vmcs_readl(GUEST_S_CET);
> +	}

The above code should be conditional on VM_ENTRY_LOAD_CET_STATE - if the
guest (L2) state was loaded, then it must be updated on exit - this is
usually how VMX works.

Also, I don't see any use of VM_EXIT_LOAD_CET_STATE, which, if set, should
reset the L1 CET state to the values in 'host_s_cet/host_ssp/host_ssp_tbl'
(this is also a common theme in VMX: host state is reset to the values that
the hypervisor sets in the VMCS, and the hypervisor must take care to update
these fields itself).

As a rule of thumb, if you add a field to vmcs12, you should use it
somewhere, and you should never use it unconditionally, as its use almost
always depends on entry or exit controls. The same is true for
entry/exit/execution controls - if you add one, you almost always have to
use it somewhere.

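For illustration only, here is a rough standalone sketch of the two points
above: save the L2 CET state back into vmcs12 only when the entry control
asked for it to be loaded, and reset the L1 CET state from the host_* fields
only when the exit control is set. This is a userspace mock, not KVM code.
struct sketch_vmcs12, struct sketch_cet_state, the SKETCH_* bit values and
both helper functions are made up for this example, and the
has_shstk/has_ibt flags stand in for the guest_can_use() checks in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit positions only; real values come from the VMX headers. */
#define SKETCH_VM_ENTRY_LOAD_CET_STATE (1u << 20)
#define SKETCH_VM_EXIT_LOAD_CET_STATE  (1u << 28)

/* Hypothetical cut-down stand-in for struct vmcs12 (CET fields only). */
struct sketch_vmcs12 {
	uint32_t vm_entry_controls;
	uint32_t vm_exit_controls;
	uint64_t guest_s_cet, guest_ssp, guest_ssp_tbl;
	uint64_t host_s_cet, host_ssp, host_ssp_tbl;
};

/* Hypothetical stand-in for the CET state currently live in vmcs02. */
struct sketch_cet_state {
	uint64_t s_cet, ssp, ssp_tbl;
};

/* Nested VM exit: write L2 state back only if it was loaded on entry. */
static void sketch_sync_guest_cet(struct sketch_vmcs12 *vmcs12,
				  const struct sketch_cet_state *hw,
				  bool has_shstk, bool has_ibt)
{
	assert(vmcs12 && hw);

	if (!(vmcs12->vm_entry_controls & SKETCH_VM_ENTRY_LOAD_CET_STATE))
		return;

	if (has_shstk) {
		vmcs12->guest_ssp = hw->ssp;
		vmcs12->guest_ssp_tbl = hw->ssp_tbl;
	}
	if (has_shstk || has_ibt)
		vmcs12->guest_s_cet = hw->s_cet;
}

/* Nested VM exit: reset L1 state from host_* only if L1 requested it. */
static void sketch_load_host_cet(const struct sketch_vmcs12 *vmcs12,
				 struct sketch_cet_state *hw,
				 bool has_shstk, bool has_ibt)
{
	assert(vmcs12 && hw);

	if (!(vmcs12->vm_exit_controls & SKETCH_VM_EXIT_LOAD_CET_STATE))
		return;

	if (has_shstk) {
		hw->ssp = vmcs12->host_ssp;
		hw->ssp_tbl = vmcs12->host_ssp_tbl;
	}
	if (has_shstk || has_ibt)
		hw->s_cet = vmcs12->host_s_cet;
}
```

With both controls clear, neither helper touches anything, which is the
"never use a vmcs12 field unconditionally" rule in its smallest form.
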
Best regards,
	Maxim Levitsky

> +
>  	vmx->nested.need_sync_vmcs02_to_vmcs12_rare = false;
>  }
>
> @@ -6798,7 +6841,7 @@ static void nested_vmx_setup_exit_ctls(struct vmcs_config *vmcs_conf,
>  		VM_EXIT_HOST_ADDR_SPACE_SIZE |
>  #endif
>  		VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT |
> -		VM_EXIT_CLEAR_BNDCFGS;
> +		VM_EXIT_CLEAR_BNDCFGS | VM_EXIT_LOAD_CET_STATE;
>  	msrs->exit_ctls_high |=
>  		VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR |
>  		VM_EXIT_LOAD_IA32_EFER | VM_EXIT_SAVE_IA32_EFER |
> @@ -6820,7 +6863,8 @@ static void nested_vmx_setup_entry_ctls(struct vmcs_config *vmcs_conf,
>  #ifdef CONFIG_X86_64
>  		VM_ENTRY_IA32E_MODE |
>  #endif
> -		VM_ENTRY_LOAD_IA32_PAT | VM_ENTRY_LOAD_BNDCFGS;
> +		VM_ENTRY_LOAD_IA32_PAT | VM_ENTRY_LOAD_BNDCFGS |
> +		VM_ENTRY_LOAD_CET_STATE;
>  	msrs->entry_ctls_high |=
>  		(VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR | VM_ENTRY_LOAD_IA32_EFER |
>  		 VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL);
> diff --git a/arch/x86/kvm/vmx/vmcs12.c b/arch/x86/kvm/vmx/vmcs12.c
> index 106a72c923ca..4233b5ca9461 100644
> --- a/arch/x86/kvm/vmx/vmcs12.c
> +++ b/arch/x86/kvm/vmx/vmcs12.c
> @@ -139,6 +139,9 @@ const unsigned short vmcs12_field_offsets[] = {
>  	FIELD(GUEST_PENDING_DBG_EXCEPTIONS, guest_pending_dbg_exceptions),
>  	FIELD(GUEST_SYSENTER_ESP, guest_sysenter_esp),
>  	FIELD(GUEST_SYSENTER_EIP, guest_sysenter_eip),
> +	FIELD(GUEST_S_CET, guest_s_cet),
> +	FIELD(GUEST_SSP, guest_ssp),
> +	FIELD(GUEST_INTR_SSP_TABLE, guest_ssp_tbl),
>  	FIELD(HOST_CR0, host_cr0),
>  	FIELD(HOST_CR3, host_cr3),
>  	FIELD(HOST_CR4, host_cr4),
> @@ -151,5 +154,8 @@ const unsigned short vmcs12_field_offsets[] = {
>  	FIELD(HOST_IA32_SYSENTER_EIP, host_ia32_sysenter_eip),
>  	FIELD(HOST_RSP, host_rsp),
>  	FIELD(HOST_RIP, host_rip),
> +	FIELD(HOST_S_CET, host_s_cet),
> +	FIELD(HOST_SSP, host_ssp),
> +	FIELD(HOST_INTR_SSP_TABLE, host_ssp_tbl),
>  };
>  const unsigned int nr_vmcs12_fields = ARRAY_SIZE(vmcs12_field_offsets);
> diff --git a/arch/x86/kvm/vmx/vmcs12.h b/arch/x86/kvm/vmx/vmcs12.h
> index 01936013428b..3884489e7f7e 100644
> --- a/arch/x86/kvm/vmx/vmcs12.h
> +++ b/arch/x86/kvm/vmx/vmcs12.h
> @@ -117,7 +117,13 @@ struct __packed vmcs12 {
>  	natural_width host_ia32_sysenter_eip;
>  	natural_width host_rsp;
>  	natural_width host_rip;
> -	natural_width paddingl[8]; /* room for future expansion */
> +	natural_width host_s_cet;
> +	natural_width host_ssp;
> +	natural_width host_ssp_tbl;
> +	natural_width guest_s_cet;
> +	natural_width guest_ssp;
> +	natural_width guest_ssp_tbl;
> +	natural_width paddingl[2]; /* room for future expansion */
>  	u32 pin_based_vm_exec_control;
>  	u32 cpu_based_vm_exec_control;
>  	u32 exception_bitmap;
> @@ -292,6 +298,12 @@ static inline void vmx_check_vmcs12_offsets(void)
>  	CHECK_OFFSET(host_ia32_sysenter_eip, 656);
>  	CHECK_OFFSET(host_rsp, 664);
>  	CHECK_OFFSET(host_rip, 672);
> +	CHECK_OFFSET(host_s_cet, 680);
> +	CHECK_OFFSET(host_ssp, 688);
> +	CHECK_OFFSET(host_ssp_tbl, 696);
> +	CHECK_OFFSET(guest_s_cet, 704);
> +	CHECK_OFFSET(guest_ssp, 712);
> +	CHECK_OFFSET(guest_ssp_tbl, 720);
>  	CHECK_OFFSET(pin_based_vm_exec_control, 744);
>  	CHECK_OFFSET(cpu_based_vm_exec_control, 748);
>  	CHECK_OFFSET(exception_bitmap, 752);
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index a1aae8709939..947028ff2e25 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -7734,6 +7734,8 @@ static void nested_vmx_cr_fixed1_bits_update(struct kvm_vcpu *vcpu)
>  	cr4_fixed1_update(X86_CR4_PKE, ecx, feature_bit(PKU));
>  	cr4_fixed1_update(X86_CR4_UMIP, ecx, feature_bit(UMIP));
>  	cr4_fixed1_update(X86_CR4_LA57, ecx, feature_bit(LA57));
> +	cr4_fixed1_update(X86_CR4_CET, ecx, feature_bit(SHSTK));
> +	cr4_fixed1_update(X86_CR4_CET, edx, feature_bit(IBT));
>
>  #undef cr4_fixed1_update
>  }
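
The offset arithmetic in the vmcs12.h hunk can be checked in isolation: six
new 8-byte fields start right after host_rip at 672, and shrinking paddingl
from 8 to 2 entries keeps pin_based_vm_exec_control at 744. The sketch below
is a hypothetical userspace mock, assuming natural_width is 64 bits wide;
struct sketch_vmcs12_tail and its before_host_rip filler are stand-ins for
the real layout, and the packed attribute is the GCC/Clang spelling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption for this sketch: natural_width is a 64-bit type. */
typedef uint64_t natural_width;

/*
 * Hypothetical cut-down tail of struct vmcs12. The byte array stands in
 * for every field that precedes host_rip in the real structure.
 */
struct __attribute__((packed)) sketch_vmcs12_tail {
	uint8_t       before_host_rip[672];
	natural_width host_rip;                  /* 672 */
	natural_width host_s_cet;                /* 680 */
	natural_width host_ssp;                  /* 688 */
	natural_width host_ssp_tbl;              /* 696 */
	natural_width guest_s_cet;               /* 704 */
	natural_width guest_ssp;                 /* 712 */
	natural_width guest_ssp_tbl;             /* 720 */
	natural_width paddingl[2];               /* 728..743 */
	uint32_t      pin_based_vm_exec_control; /* 744 */
};

/* Compile-time checks in the spirit of CHECK_OFFSET() from the patch. */
static_assert(offsetof(struct sketch_vmcs12_tail, host_s_cet) == 680,
	      "first new field must follow host_rip");
static_assert(offsetof(struct sketch_vmcs12_tail, pin_based_vm_exec_control) == 744,
	      "fields after the padding must not move");

/* Offset of the n-th natural_width field counted from a starting offset. */
static size_t sketch_nth_field_offset(size_t first, size_t n)
{
	return first + n * sizeof(natural_width);
}
```

Six fields consume 48 of the original 64 padding bytes, leaving 16, which is
why paddingl[2] is what remains.
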