Subject: Re: [PATCH 6/7] x86/svm: Set IBPB when running a different VCPU
From: Paolo Bonzini <pbonzini@redhat.com>
To: Liran Alon, linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: jmattson@google.com, aliguori@amazon.com, thomas.lendacky@amd.com,
 dwmw@amazon.co.uk, bp@alien8.de
Date: Tue, 9 Jan 2018 12:07:48 +0100
Message-ID: <2ea3137a-c63c-cf1e-7d90-ed6416ba443d@redhat.com>
In-Reply-To: <5A53CDE0.7030607@ORACLE.COM>
References: <1515434925-10250-1-git-send-email-pbonzini@redhat.com>
 <1515434925-10250-7-git-send-email-pbonzini@redhat.com>
 <5A53CDE0.7030607@ORACLE.COM>

On 08/01/2018 21:00, Liran Alon wrote:
>
>
> On 08/01/18 20:08, Paolo Bonzini wrote:
>> From: Tom Lendacky
>>
>> Set IBPB (Indirect Branch Prediction Barrier) when the current CPU is
>> going to run a VCPU different from what was previously run.  Nested
>> virtualization uses the same VMCB for the second level guest, but the
>> L1 hypervisor should be using IBRS to protect itself.
>>
>> Signed-off-by: Tom Lendacky
>> Signed-off-by: Paolo Bonzini
>> ---
>>   arch/x86/kvm/svm.c | 31 +++++++++++++++++++++++++++++++
>>   1 file changed, 31 insertions(+)
>>
>> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
>> index 779981a00d01..dd744d896cec 100644
>> --- a/arch/x86/kvm/svm.c
>> +++ b/arch/x86/kvm/svm.c
>> @@ -289,6 +289,7 @@ struct amd_svm_iommu_ir {
>>   module_param(vgif, int, 0444);
>>
>>   static bool __read_mostly have_spec_ctrl;
>> +static bool __read_mostly have_ibpb_support;
>>
>>   static void svm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0);
>>   static void svm_flush_tlb(struct kvm_vcpu *vcpu, bool invalidate_gpa);
>> @@ -540,6 +541,7 @@ struct svm_cpu_data {
>>       struct kvm_ldttss_desc *tss_desc;
>>
>>       struct page *save_area;
>> +    struct vmcb *current_vmcb;
>>   };
>>
>>   static DEFINE_PER_CPU(struct svm_cpu_data *, svm_data);
>> @@ -1151,6 +1153,11 @@ static __init int svm_hardware_setup(void)
>>           pr_info("kvm: SPEC_CTRL available\n");
>>       else
>>           pr_info("kvm: SPEC_CTRL not available\n");
>> +    have_ibpb_support = have_spec_ctrl || cpu_has_ibpb_support();
>> +    if (have_ibpb_support)
>> +        pr_info("kvm: IBPB_SUPPORT available\n");
>> +    else
>> +        pr_info("kvm: IBPB_SUPPORT not available\n");
>>
>>       return 0;
>>
>> @@ -1725,11 +1732,19 @@ static void svm_free_vcpu(struct kvm_vcpu *vcpu)
>>       __free_pages(virt_to_page(svm->nested.msrpm), MSRPM_ALLOC_ORDER);
>>       kvm_vcpu_uninit(vcpu);
>>       kmem_cache_free(kvm_vcpu_cache, svm);
>> +
>> +    /*
>> +     * The VMCB could be recycled, causing a false negative in
>> +     * svm_vcpu_load; block speculative execution.
>> +     */
>> +    if (have_ibpb_support)
>> +        wrmsrl(MSR_IA32_PRED_CMD, FEATURE_SET_IBPB);
>>   }
>>
>>   static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>>   {
>>       struct vcpu_svm *svm = to_svm(vcpu);
>> +    struct svm_cpu_data *sd = per_cpu(svm_data, cpu);
>>       int i;
>>
>>       if (unlikely(cpu != vcpu->cpu)) {
>> @@ -1758,6 +1773,12 @@ static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>>       if (static_cpu_has(X86_FEATURE_RDTSCP))
>>           wrmsrl(MSR_TSC_AUX, svm->tsc_aux);
>>
>> +    if (sd->current_vmcb != svm->vmcb) {
>> +        sd->current_vmcb = svm->vmcb;
>> +        if (have_ibpb_support)
>> +            wrmsrl(MSR_IA32_PRED_CMD, FEATURE_SET_IBPB);
>> +    }
>> +
>>       avic_vcpu_load(vcpu, cpu);
>>   }
>>
>> @@ -2798,6 +2819,11 @@ static int nested_svm_vmexit(struct vcpu_svm *svm)
>>       if (!nested_vmcb)
>>           return 1;
>>
>> +    /*
>> +     * No need for IBPB here, the L1 hypervisor should be running with
>> +     * IBRS=1 and inserts one already when switching L2 VMs.
>> +     */
>> +
>
> I am not fully convinced yet that this is OK from a security
> perspective.  From the CPU's point of view, both L1 and L2 run in the
> same prediction mode (as they are both running in SVM guest mode).
> Therefore, IBRS will not hide the BHB & BTB of L1 from L2.

Indeed, IBRS will not hide the indirect branch predictor state of L2
user mode from L1 user mode.  On current generation processors, as I
understand it, IBRS simply disables the indirect branch predictor when
set to 1.  Therefore, as long as the L1 hypervisor sets IBRS=1, the
indirect branch predictor state left by L2 does not affect execution
of the L1 hypervisor.
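To make that assumption concrete, the L1 discipline I have in mind is
roughly the following (only a sketch; FEATURE_ENABLE_IBRS is a stand-in
name for bit 0 of MSR_IA32_SPEC_CTRL, and the function names are
illustrative, not code from this series):

    /*
     * What a well-behaved L1 hypervisor does under the "IBRS=1
     * disables indirect branch prediction" semantics of current
     * processors.
     */
    static void l1_vmexit_path(void)
    {
        /* First thing after #VMEXIT: make the predictor inert. */
        wrmsrl(MSR_IA32_SPEC_CTRL, FEATURE_ENABLE_IBRS);

        /* ... all of L1's exit handling runs with IBRS=1 ... */
    }

    static void l1_vmentry_path(void)
    {
        /* Hand the predictor back to the guest and resume it. */
        wrmsrl(MSR_IA32_SPEC_CTRL, 0);

        /* ... VMRUN ... */
    }

With that discipline, whatever predictor state L2 leaves behind is dead
weight while L1 runs, which is why nested_svm_vmexit above can skip the
IBPB.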
Future processors might have a mode that says "just set IBRS=1 and
I'll DTRT".  If that's done by indexing the BTB using the full virtual
address, the CPL _and_ the ASID/PCID/VPID, then IBPB is not needed
here because the L2 VM uses a different ASID.  If that's done by only
augmenting the BTB index with the CPL, we'd need an IBPB.  But in this
new world we've been thrown into, that would be a processor bug...

Note that AMD currently doesn't implement IBRS at all.  In order to
mitigate against CVE-2017-5715, the hypervisor should issue an IBPB on
every vmexit (in both L0 and L1).  However, the cost of that is very
high.  While we may want a paranoia mode that does it, it is outside
the scope of this series (which is more of a "let's unblock QEMU
patches" thing than a complete fix).
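If somebody wants it anyway, the paranoia mode would be on the order
of the following (untested sketch on top of this patch; the module
parameter is hypothetical, while have_ibpb_support and FEATURE_SET_IBPB
are the ones this patch introduces):

    /* Hypothetical knob, not part of this series. */
    static bool __read_mostly ibpb_on_every_exit;
    module_param(ibpb_on_every_exit, bool, 0444);

    /* Would run on every #VMEXIT, before any host indirect branch. */
    static void svm_maybe_ibpb_on_exit(void)
    {
        /*
         * Flush the indirect branch predictor so that nothing the
         * guest trained can steer the host's indirect branches.
         * Prohibitively expensive on exit-heavy workloads.
         */
        if (ibpb_on_every_exit && have_ibpb_support)
            wrmsrl(MSR_IA32_PRED_CMD, FEATURE_SET_IBPB);
    }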
> 6. One can implicitly assume that the L1 hypervisor did issue an IBPB
> when loading a VMCB and therefore it doesn't have to do it again in
> L0.  However, I see 2 problems with this approach:
> (a) We don't protect old non-patched L1 hypervisors from Spectre even
> though we could easily do that from L0 with a small performance hit.

Yeah, that's a nice thing to have.  However, I disagree on the "small"
performance hit... on a patched hypervisor, the cost of a double fix
can be very noticeable.

> (b) When the L1 hypervisor runs only a single L2 VM, it doesn't have
> to issue an IBPB when loading the VMCB (as it didn't run any other
> VMCB before) and it should be sufficient from L1's perspective to
> just write 1 to IBRS.  However, in our nested case, this is a
> security hole.

This is the main difference in our reasoning.  I think IBRS=1 (or IBPB
on vmexit) should be enough.

Paolo

> Am I misunderstanding something?
>
> Regards,
> -Liran
>
>>       /* Exit Guest-Mode */
>>       leave_guest_mode(&svm->vcpu);
>>       svm->nested.vmcb = 0;
>> @@ -3061,6 +3087,11 @@ static bool nested_svm_vmrun(struct vcpu_svm *svm)
>>       if (!nested_vmcb)
>>           return false;
>>
>> +    /*
>> +     * No need for IBPB here, since the nested VM is less privileged.  The
>> +     * L1 hypervisor inserts one already when switching L2 VMs.
>> +     */
>> +
>>       if (!nested_vmcb_checks(nested_vmcb)) {
>>           nested_vmcb->control.exit_code    = SVM_EXIT_ERR;
>>           nested_vmcb->control.exit_code_hi = 0;
>>
>