Received: by 2002:a05:6a10:2726:0:0:0:0 with SMTP id ib38csp2214493pxb; Wed, 30 Mar 2022 19:34:07 -0700 (PDT) X-Google-Smtp-Source: ABdhPJwp4gHLAdWq50dzqXfUzMWQdrc2POJ+ru8vhdBimB0CEMV63z9jXE6/MAyieFJ+jD2Hs/Sn X-Received: by 2002:a17:902:7598:b0:155:f634:5f45 with SMTP id j24-20020a170902759800b00155f6345f45mr2841143pll.28.1648694047241; Wed, 30 Mar 2022 19:34:07 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1648694047; cv=none; d=google.com; s=arc-20160816; b=mXhoiza5iXq3+nFhokYX1SOGavkkAbhOnWI7lba1fRC6xtqGxA+GLoSjNBZkGVxU81 /dtuOwZmx77kZDy5ymTVEHQCwCF/NvSFmaATnDNzZyl+c2/jDUQbsGIbiDAgCVrtmcXw ZHmmDS6PYr6CzJ/eW813li3wQICUzge0a4t6/YL9+1JmFFNL5PN1h2RKGI8flO86zdoY Pecag9J6PRrhBB49Dp1VE6Jj4CsMZAD+dnXQ9SxTqoSPgGrvf3gaTlOs7wa+rEvs6ct/ I2Uneznl0vE+ATULUU2RXNVUKimdcj5KmgN/62eeMS5LEM+6d8OeQCSoRWUQv3aBkwLb 8cqA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-disposition:mime-version :references:message-id:subject:cc:to:from:date:dkim-signature; bh=Lu/OaZ0ppwhDx18TN60M/FhAWhGydJGFXEfzdV2g8HY=; b=iIInl9NTF5Qu9eGddR9hwj0fCgGcvqwdLRXgTvpjjNsGMlOqR3TvajhezrYxAJMg39 yxmYforB8b1O4vxu5vmiIId6tFDk47d8p3hztCoosnJ9A8oCFetXqBZJoUL6cQV2lEm6 WXUHY7L++2n7Q9RJ+TsbM8LSe5yeKlr1iwUud81v6g5YZPBgDfcnzCK5M7QKne0rVHIF fjIwYSRvOek4UDVecgU7Apvk0QGnFF7dsrK9kFhdSwhq0AhW61/cxUIO0FM0RiqnaRZM VwvyY4ttCAVBXWQYaA0GGbacvVcBkMux8Nb43OcyuVefRG7Wz76iVdkFykP18hjfdXc3 MeNg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=RAqIT9MK; spf=softfail (google.com: domain of transitioning linux-kernel-owner@vger.kernel.org does not designate 23.128.96.19 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net. [23.128.96.19]) by mx.google.com with ESMTPS id pf13-20020a17090b1d8d00b001c361252184si1953361pjb.141.2022.03.30.19.34.06 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 30 Mar 2022 19:34:07 -0700 (PDT) Received-SPF: softfail (google.com: domain of transitioning linux-kernel-owner@vger.kernel.org does not designate 23.128.96.19 as permitted sender) client-ip=23.128.96.19; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=RAqIT9MK; spf=softfail (google.com: domain of transitioning linux-kernel-owner@vger.kernel.org does not designate 23.128.96.19 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 2A2662D1F6; Wed, 30 Mar 2022 19:29:27 -0700 (PDT) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1351340AbiC3VtT (ORCPT + 99 others); Wed, 30 Mar 2022 17:49:19 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50474 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1348397AbiC3VtS (ORCPT ); Wed, 30 Mar 2022 17:49:18 -0400 Received: from mail-pg1-x536.google.com (mail-pg1-x536.google.com [IPv6:2607:f8b0:4864:20::536]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3431314032 for ; Wed, 30 Mar 2022 14:47:32 -0700 (PDT) Received: by mail-pg1-x536.google.com with SMTP id z128so18501762pgz.2 for ; Wed, 30 Mar 2022 14:47:32 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=Lu/OaZ0ppwhDx18TN60M/FhAWhGydJGFXEfzdV2g8HY=; b=RAqIT9MKFGfwgpem6ZfAgI3Lqn7WOSMQZjhERWSbGyspdBvdKxB0h7tAUgjzYtNi0H P375JcY6Y7E54a2Z0VtgXklIgfkGShzR76HPNW4ZjCFYtrxWSorkwlJo7zgY58qRyK2Q tdWV6jw+9ukpMq9pPmDTUvw0Z9ZWJxZW96erI92X7HCu+2kMeLr6yw1DUL0LwW1fV4t2 Zs+dXPMiBFRlaySqkZFNd9YiNAHBm+3yRrj/WS0nDh1+tothXFjuhhON469Xk7XKrp62 Srthzd0HpF4w4hZzoMbqoZHeUWHw86CvwfKluDBoFo08shza3gKMEuI/uk7j4NMWBZM2 Y/JQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=Lu/OaZ0ppwhDx18TN60M/FhAWhGydJGFXEfzdV2g8HY=; b=sLr90yzbjgL44K7pieEUAvqT5Z290VLL3sgj4UgWWBqQvXaAZIFjsKu696sOybcwPO C9Hg5mZKx2SdfJbJ7MiFy1XqB730lNRcMwnXASrhdOoYn6ltSdbcqY15I9nh8vSkma2j U6GthWWdzQDjX/cx0PAvxT6Zs1i56ksMeC7JiE3z1vv5nlyAMvbRZ17+Vd8men4HfBD/ 7DI1Fy0iq7f16oFhbe6Mtgsu1LlC6OnUVpfwoHLm34pzw5JVB89d9zqgbM7cY0reXwNo fJQ9VKWAeZs/Cul0HgDhB5diK44YEtzvABkYTpwZxw4o7eWUuyI2GcpH/ZabEp40t8h8 mtaQ== X-Gm-Message-State: AOAM533Vf/qQqDpr5SPu3zJCJzpeVkUkB7/JICOzFrYfK9TGB/E7EcHm 0VXD/R5THcCBnpELyCxmDU3I7UCiRNxIRA== X-Received: by 2002:a05:6a00:2402:b0:4e1:3df2:5373 with SMTP id z2-20020a056a00240200b004e13df25373mr1735242pfh.40.1648676850190; Wed, 30 Mar 2022 14:47:30 -0700 (PDT) Received: from google.com (157.214.185.35.bc.googleusercontent.com. [35.185.214.157]) by smtp.gmail.com with ESMTPSA id 123-20020a620681000000b004fa7c20d732sm23730995pfg.133.2022.03.30.14.47.29 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 30 Mar 2022 14:47:29 -0700 (PDT) Date: Wed, 30 Mar 2022 21:47:26 +0000 From: Sean Christopherson To: Chenyi Qiang Cc: Paolo Bonzini , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , Xiaoyao Li , kvm@vger.kernel.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH v6 7/7] KVM: VMX: Enable PKS for nested VM Message-ID: References: <20220221080840.7369-1-chenyi.qiang@intel.com> <20220221080840.7369-8-chenyi.qiang@intel.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20220221080840.7369-8-chenyi.qiang@intel.com> X-Spam-Status: No, score=-9.5 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,RDNS_NONE,SPF_HELO_NONE,T_SCC_BODY_TEXT_LINE, USER_IN_DEF_DKIM_WL autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Feb 21, 2022, Chenyi Qiang wrote: > PKS MSR passes through guest directly. Configure the MSR to match the > L0/L1 settings so that nested VM runs PKS properly. > > Signed-off-by: Chenyi Qiang > --- > arch/x86/kvm/vmx/nested.c | 38 ++++++++++++++++++++++++++++++++++++-- > arch/x86/kvm/vmx/vmcs12.c | 2 ++ > arch/x86/kvm/vmx/vmcs12.h | 4 ++++ > arch/x86/kvm/vmx/vmx.c | 1 + > arch/x86/kvm/vmx/vmx.h | 2 ++ > 5 files changed, 45 insertions(+), 2 deletions(-) > > diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c > index f235f77cbc03..c42a1df385ef 100644 > --- a/arch/x86/kvm/vmx/nested.c > +++ b/arch/x86/kvm/vmx/nested.c > @@ -252,6 +252,10 @@ static void vmx_sync_vmcs_host_state(struct vcpu_vmx *vmx, > dest->ds_sel = src->ds_sel; > dest->es_sel = src->es_sel; > #endif > + if (unlikely(src->pkrs != dest->pkrs)) { > + vmcs_write64(HOST_IA32_PKRS, src->pkrs); > + dest->pkrs = src->pkrs; > + } It's worth adding a helper for this, a la vmx_set_host_fs_gs(), though this one can probably be an inline in vmx.h. E.g. to yield diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c index bfa37c7665a5..906a2913a886 100644 --- a/arch/x86/kvm/vmx/nested.c +++ b/arch/x86/kvm/vmx/nested.c @@ -252,10 +252,7 @@ static void vmx_sync_vmcs_host_state(struct vcpu_vmx *vmx, dest->ds_sel = src->ds_sel; dest->es_sel = src->es_sel; #endif - if (unlikely(src->pkrs != dest->pkrs)) { - vmcs_write64(HOST_IA32_PKRS, src->pkrs); - dest->pkrs = src->pkrs; - } + vmx_set_host_pkrs(dest, src->pkrs); } static void vmx_switch_vmcs(struct kvm_vcpu *vcpu, struct loaded_vmcs *vmcs) diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 35fee600fae7..b6b5f1a46544 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -1157,10 +1157,7 @@ void vmx_prepare_switch_to_guest(struct kvm_vcpu *vcpu) */ if (vm_exit_controls_get(vmx) & VM_EXIT_LOAD_IA32_PKRS) { host_pkrs = get_current_pkrs(); - if (unlikely(host_pkrs != host_state->pkrs)) { - vmcs_write64(HOST_IA32_PKRS, host_pkrs); - host_state->pkrs = host_pkrs; - } + vmx_set_host_pkrs(host_state, host_pkrs); } #ifdef CONFIG_X86_64 > } > > static void vmx_switch_vmcs(struct kvm_vcpu *vcpu, struct loaded_vmcs *vmcs) > @@ -685,6 +689,9 @@ static inline bool nested_vmx_prepare_msr_bitmap(struct kvm_vcpu *vcpu, > nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0, > MSR_IA32_PRED_CMD, MSR_TYPE_W); > > + nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0, > + MSR_IA32_PKRS, MSR_TYPE_RW); > + > kvm_vcpu_unmap(vcpu, &vmx->nested.msr_bitmap_map, false); > > vmx->nested.force_msr_bitmap_recalc = false; > @@ -2433,6 +2440,10 @@ static void prepare_vmcs02_rare(struct vcpu_vmx *vmx, struct vmcs12 *vmcs12) > if (kvm_mpx_supported() && vmx->nested.nested_run_pending && > (vmcs12->vm_entry_controls & VM_ENTRY_LOAD_BNDCFGS)) > vmcs_write64(GUEST_BNDCFGS, vmcs12->guest_bndcfgs); > + > + if (vmx->nested.nested_run_pending && > + (vmcs12->vm_entry_controls & VM_ENTRY_LOAD_IA32_PKRS)) > + vmcs_write64(GUEST_IA32_PKRS, vmcs12->guest_ia32_pkrs); > } > > if (nested_cpu_has_xsaves(vmcs12)) > @@ -2521,6 +2532,11 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12, > if (kvm_mpx_supported() && (!vmx->nested.nested_run_pending || > !(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_BNDCFGS))) > vmcs_write64(GUEST_BNDCFGS, vmx->nested.vmcs01_guest_bndcfgs); > + if (kvm_cpu_cap_has(X86_FEATURE_PKS) && > + (!vmx->nested.nested_run_pending || > + !(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_IA32_PKRS))) > + vmcs_write64(GUEST_IA32_PKRS, vmx->nested.vmcs01_guest_pkrs); > + > vmx_set_rflags(vcpu, vmcs12->guest_rflags); > > /* EXCEPTION_BITMAP and CR0_GUEST_HOST_MASK should basically be the > @@ -2897,6 +2913,10 @@ static int nested_vmx_check_host_state(struct kvm_vcpu *vcpu, > vmcs12->host_ia32_perf_global_ctrl))) > return -EINVAL; > > + if ((vmcs12->vm_exit_controls & VM_EXIT_LOAD_IA32_PKRS) && > + CC(!kvm_pkrs_valid(vmcs12->host_ia32_pkrs))) Please align the indentation: if ((vmcs12->vm_exit_controls & VM_EXIT_LOAD_IA32_PKRS) && CC(!kvm_pkrs_valid(vmcs12->host_ia32_pkrs))) return -EINVAL; > + return -EINVAL; > + > #ifdef CONFIG_X86_64 > ia32e = !!(vmcs12->vm_exit_controls & VM_EXIT_HOST_ADDR_SPACE_SIZE); > #else > @@ -3049,6 +3069,10 @@ static int nested_vmx_check_guest_state(struct kvm_vcpu *vcpu, > if (nested_check_guest_non_reg_state(vmcs12)) > return -EINVAL; > > + if ((vmcs12->vm_entry_controls & VM_ENTRY_LOAD_IA32_PKRS) && > + CC(!kvm_pkrs_valid(vmcs12->guest_ia32_pkrs))) > + return -EINVAL; > + > return 0; > } > > @@ -3377,6 +3401,9 @@ enum nvmx_vmentry_status nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu, > if (kvm_mpx_supported() && > !(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_BNDCFGS)) > vmx->nested.vmcs01_guest_bndcfgs = vmcs_read64(GUEST_BNDCFGS); > + if (kvm_cpu_cap_has(X86_FEATURE_PKS) && > + !(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_IA32_PKRS)) This needs read the current PKRS if from_vmentry == false, e.g. if (kvm_cpu_cap_has(X86_FEATURE_PKS) && (!from_vmentry || !(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_IA32_PKRS))) because in the migration case, if nested state is set after MSR state, the value needs to come from the current MSR value, which was propagated to vmc02 (which this calls vmcs01, but whatever). Note, I'm pretty sure the GUEST_BNDCFGS code is broken, surprise surprise. > + vmx->nested.vmcs01_guest_pkrs = vmcs_read64(GUEST_IA32_PKRS); > > /* > * Overwrite vmcs01.GUEST_CR3 with L1's CR3 if EPT is disabled *and* > @@ -4022,6 +4049,7 @@ static bool is_vmcs12_ext_field(unsigned long field) > case GUEST_IDTR_BASE: > case GUEST_PENDING_DBG_EXCEPTIONS: > case GUEST_BNDCFGS: > + case GUEST_IA32_PKRS: > return true; > default: > break; > @@ -4073,6 +4101,8 @@ static void sync_vmcs02_to_vmcs12_rare(struct kvm_vcpu *vcpu, > vmcs_readl(GUEST_PENDING_DBG_EXCEPTIONS); > if (kvm_mpx_supported()) > vmcs12->guest_bndcfgs = vmcs_read64(GUEST_BNDCFGS); > + if (guest_cpuid_has(vcpu, X86_FEATURE_PKS)) This needs to check vmx->nested.msrs.entry_ctls_* (I can never remember if it's the high or low part...). The SDM states PKRS is saved "if the processor supports the 1-setting of the 'load PKRS' VM-entry control", which is different than PKRS being supported in CPUID. Also, guest CPUID is userspace controlled, e.g. userspace could induce a failed VMREAD by giving a garbage CPUID model, where vmx->nested.msrs can only be restricted by userspace, i.e. is trusted. Happyily, checking vmx->nested.msrs is also a performance win, as guest_cpuid_has() can require walking a large array.