From: KarimAllah Ahmed
To: Jim Mattson, David Woodhouse
Cc: KarimAllah Ahmed, kvm list, LKML, the arch/x86 maintainers, Ashok Raj,
    Asit Mallick, Dave Hansen, Arjan Van De Ven, Tim Chen, Linus Torvalds,
    Andrea Arcangeli, Andi Kleen, Thomas Gleixner, Dan Williams,
    Jun Nakajima, Andy Lutomirski, Greg KH, Paolo Bonzini, Peter Zijlstra
Subject: Re: [PATCH v5 2/5] KVM: x86: Add IBPB support
Date: Thu, 1 Feb 2018 01:27:02 +0100
References: <1517427467-28567-1-git-send-email-karahmed@amazon.de>
            <1517427467-28567-3-git-send-email-karahmed@amazon.de>
            <1517428398.18619.197.camel@infradead.org>
X-Mailing-List: linux-kernel@vger.kernel.org
On 01/31/2018 08:55 PM, Jim Mattson wrote:
> On Wed, Jan 31, 2018 at 11:53 AM, David Woodhouse <dwmw2@infradead.org> wrote:
>> Rather than doing the expensive guest_cpu_has() every time (which is
>> worse now as we realised we need two of them) perhaps we should
>> introduce a local flag for that too?
>
> That sounds good to me.

Done.
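For reference, the "local flag" is what v6 of the attached patch adds as the
per-vCPU pred_cmd_used bool. A simplified, self-contained sketch of the
pattern (the struct and helper names below are stand-ins for illustration,
not the actual KVM code):

    #include <stdbool.h>

    /* Stand-in for struct vcpu_vmx; in the patch the flag lives there. */
    struct vcpu_flags {
            /*
             * Latched on the guest's first write to MSR_IA32_PRED_CMD, so
             * hot paths test one bool instead of re-scanning the guest's
             * CPUID entries via the relatively expensive guest_cpuid_has().
             */
            bool pred_cmd_used;
    };

    /* Slow path: runs once per guest write to the MSR. */
    static void note_pred_cmd_write(struct vcpu_flags *v)
    {
            v->pred_cmd_used = true;
    }

    /* Hot path (e.g. the nested MSR-bitmap merge): a single load. */
    static bool pred_cmd_needs_passthrough(const struct vcpu_flags *v)
    {
            return v->pred_cmd_used;
    }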
Amazon Development Center Germany GmbH
Berlin - Dresden - Aachen
main office: Krausenstr. 38, 10117 Berlin
Geschaeftsfuehrer: Dr. Ralf Herbrich, Christian Schlaeger
Ust-ID: DE289237879
Eingetragen am Amtsgericht Charlottenburg HRB 149173 B

[Attachment: 0002-KVM-x86-Add-IBPB-support.patch]

From d51391ae3667f85cd1d6160e83c1d6c28b47b7d8 Mon Sep 17 00:00:00 2001
From: Ashok Raj <ashok.raj@intel.com>
Date: Thu, 11 Jan 2018 17:32:19 -0800
Subject: [PATCH] KVM: x86: Add IBPB support

The Indirect Branch Predictor Barrier (IBPB) is an indirect branch
control mechanism. It keeps earlier branches from influencing later
ones.

Unlike IBRS and STIBP, IBPB does not define a new mode of operation.
It's a command that ensures predicted branch targets aren't used after
the barrier. Although IBRS and IBPB are enumerated by the same CPUID
bit, IBPB is very different.

IBPB helps mitigate against three potential attacks:

* Mitigate guests from being attacked by other guests.
  - This is addressed by issuing an IBPB when we do a guest switch.

* Mitigate attacks from guest/ring3->host/ring3. These would require
  an IBPB during context switch in the host, or after VMEXIT. The host
  process has two ways to mitigate:
  - Either it can be compiled with retpoline.
  - If it is going through a context switch and has set !dumpable, then
    there is an IBPB in that path
    (Tim's patch: https://patchwork.kernel.org/patch/10192871).
  The case where you return to Qemu after a VMEXIT can still leave Qemu
  attackable from the guest when Qemu isn't compiled with retpoline;
  doing an IBPB on every VMEXIT has been reported to cause tsc
  calibration woes in the guest.

* Mitigate guest/ring0->host/ring0 attacks. When the host kernel is
  using retpoline it is safe against these attacks. If the host kernel
  isn't using retpoline we might need to do an IBPB flush on every
  VMEXIT. Even when using retpoline for indirect calls, in certain
  conditions 'ret' can use the BTB on Skylake-era CPUs; other
  mitigations such as RSB stuffing/clearing are available for that.

* IBPB is issued only for SVM during svm_free_vcpu(). VMX has vmclear
  and SVM doesn't. Follow the discussion here:
  https://lkml.org/lkml/2018/1/15/146

Please refer to the following spec for details on the enumeration and
control, and for documentation about the mitigations:
https://software.intel.com/en-us/side-channel-security-support

[peterz: rebase and changelog rewrite]
[karahmed: - rebase
           - vmx: expose PRED_CMD if guest has it in CPUID
           - svm: only pass through IBPB if guest has it in CPUID
           - vmx: support !cpu_has_vmx_msr_bitmap()
           - vmx: support nested]
[dwmw2: Expose CPUID bit too (AMD IBPB only for now as we lack IBRS);
        PRED_CMD is a write-only MSR]

Cc: Asit Mallick
Cc: Dave Hansen
Cc: Arjan Van De Ven
Cc: Tim Chen
Cc: Linus Torvalds
Cc: Andrea Arcangeli
Cc: Andi Kleen
Cc: Thomas Gleixner
Cc: Dan Williams
Cc: Jun Nakajima
Cc: Andy Lutomirski
Cc: Greg KH
Cc: Paolo Bonzini
Signed-off-by: Ashok Raj
Signed-off-by: Peter Zijlstra (Intel)
Link: http://lkml.kernel.org/r/1515720739-43819-6-git-send-email-ashok.raj@intel.com
Signed-off-by: David Woodhouse
Signed-off-by: KarimAllah Ahmed
---
v6:
- Introduce pred_cmd_used.

v5:
- Use MSR_TYPE_W instead of MSR_TYPE_R for the MSR.
- Always merge the bitmaps unconditionally.
- Add PRED_CMD to direct_access_msrs.
- Also check for X86_FEATURE_SPEC_CTRL for the msr reads/writes.
- Rewrite the commit message (from ashok.raj@).
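Aside for reviewers (this note sits below the "---" and is not part of the
commit): PRED_CMD is a write-only command MSR, so "issuing an IBPB" is a
single MSR write; the kernel's indirect_branch_prediction_barrier() helper
wraps this same write in an alternatives-based feature check. A minimal
sketch of the underlying operation:

    #include <asm/msr-index.h>   /* MSR_IA32_PRED_CMD, PRED_CMD_IBPB */
    #include <asm/msr.h>         /* wrmsrl() */

    /*
     * Writing PRED_CMD_IBPB (bit 0) acts as a one-shot barrier: branch
     * targets predicted before the write cannot steer indirect branches
     * executed after it. The MSR holds no state and cannot be read back.
     */
    static inline void ibpb_sketch(void)
    {
            wrmsrl(MSR_IA32_PRED_CMD, PRED_CMD_IBPB);
    }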
---
 arch/x86/kvm/cpuid.c | 11 ++++++++++-
 arch/x86/kvm/svm.c   | 28 ++++++++++++++++++++++++++++
 arch/x86/kvm/vmx.c   | 42 ++++++++++++++++++++++++++++++++++++++++--
 3 files changed, 78 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index c0eb337..033004d 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -365,6 +365,10 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 		F(3DNOWPREFETCH) | F(OSVW) | 0 /* IBS */ | F(XOP) |
 		0 /* SKINIT, WDT, LWP */ | F(FMA4) | F(TBM);
 
+	/* cpuid 0x80000008.ebx */
+	const u32 kvm_cpuid_8000_0008_ebx_x86_features =
+		F(IBPB);
+
 	/* cpuid 0xC0000001.edx */
 	const u32 kvm_cpuid_C000_0001_edx_x86_features =
 		F(XSTORE) | F(XSTORE_EN) | F(XCRYPT) | F(XCRYPT_EN) |
@@ -625,7 +629,12 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 		if (!g_phys_as)
 			g_phys_as = phys_as;
 		entry->eax = g_phys_as | (virt_as << 8);
-		entry->ebx = entry->edx = 0;
+		entry->edx = 0;
+		/* IBPB isn't necessarily present in hardware cpuid */
+		if (boot_cpu_has(X86_FEATURE_IBPB))
+			entry->ebx |= F(IBPB);
+		entry->ebx &= kvm_cpuid_8000_0008_ebx_x86_features;
+		cpuid_mask(&entry->ebx, CPUID_8000_0008_EBX);
 		break;
 	}
 	case 0x80000019:
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index f40d0da..bfbb7b9 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -250,6 +250,7 @@ static const struct svm_direct_access_msrs {
 	{ .index = MSR_SYSCALL_MASK,		.always = true },
 #endif
 	{ .index = MSR_IA32_LASTBRANCHFROMIP,	.always = false },
+	{ .index = MSR_IA32_PRED_CMD,		.always = false },
 	{ .index = MSR_IA32_LASTBRANCHTOIP,	.always = false },
 	{ .index = MSR_IA32_LASTINTFROMIP,	.always = false },
 	{ .index = MSR_IA32_LASTINTTOIP,	.always = false },
@@ -529,6 +530,7 @@ struct svm_cpu_data {
 	struct kvm_ldttss_desc *tss_desc;
 
 	struct page *save_area;
+	struct vmcb *current_vmcb;
 };
 
 static DEFINE_PER_CPU(struct svm_cpu_data *, svm_data);
@@ -1703,11 +1705,17 @@ static void svm_free_vcpu(struct kvm_vcpu *vcpu)
 	__free_pages(virt_to_page(svm->nested.msrpm), MSRPM_ALLOC_ORDER);
 	kvm_vcpu_uninit(vcpu);
 	kmem_cache_free(kvm_vcpu_cache, svm);
+	/*
+	 * The vmcb page can be recycled, causing a false negative in
+	 * svm_vcpu_load(). So do a full IBPB now.
+	 */
+	indirect_branch_prediction_barrier();
 }
 
 static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
+	struct svm_cpu_data *sd = per_cpu(svm_data, cpu);
 	int i;
 
 	if (unlikely(cpu != vcpu->cpu)) {
@@ -1736,6 +1744,10 @@ static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 	if (static_cpu_has(X86_FEATURE_RDTSCP))
 		wrmsrl(MSR_TSC_AUX, svm->tsc_aux);
 
+	if (sd->current_vmcb != svm->vmcb) {
+		sd->current_vmcb = svm->vmcb;
+		indirect_branch_prediction_barrier();
+	}
 	avic_vcpu_load(vcpu, cpu);
 }
 
@@ -3684,6 +3696,22 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
 	case MSR_IA32_TSC:
 		kvm_write_tsc(vcpu, msr);
 		break;
+	case MSR_IA32_PRED_CMD:
+		if (!msr->host_initiated &&
+		    !guest_cpuid_has(vcpu, X86_FEATURE_IBPB))
+			return 1;
+
+		if (data & ~PRED_CMD_IBPB)
+			return 1;
+
+		if (!data)
+			break;
+
+		wrmsrl(MSR_IA32_PRED_CMD, PRED_CMD_IBPB);
+		if (is_guest_mode(vcpu))
+			break;
+		set_msr_interception(svm->msrpm, MSR_IA32_PRED_CMD, 0, 1);
+		break;
 	case MSR_STAR:
 		svm->vmcb->save.star = data;
 		break;
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index d46a61b..c057a0a 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -592,6 +592,14 @@ struct vcpu_vmx {
 	u64		      msr_host_kernel_gs_base;
 	u64		      msr_guest_kernel_gs_base;
 #endif
+
+	/*
+	 * This indicates that:
+	 * 1) guest_cpuid_has(X86_FEATURE_IBPB) = true &&
+	 * 2) The guest has initiated a write against the MSR.
+	 */
+	bool pred_cmd_used;
+
 	u32 vm_entry_controls_shadow;
 	u32 vm_exit_controls_shadow;
 	u32 secondary_exec_control;
@@ -2285,6 +2293,7 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 	if (per_cpu(current_vmcs, cpu) != vmx->loaded_vmcs->vmcs) {
 		per_cpu(current_vmcs, cpu) = vmx->loaded_vmcs->vmcs;
 		vmcs_load(vmx->loaded_vmcs->vmcs);
+		indirect_branch_prediction_barrier();
 	}
 
 	if (!already_loaded) {
@@ -3342,6 +3351,28 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	case MSR_IA32_TSC:
 		kvm_write_tsc(vcpu, msr_info);
 		break;
+	case MSR_IA32_PRED_CMD:
+		if (!msr_info->host_initiated &&
+		    !guest_cpuid_has(vcpu, X86_FEATURE_IBPB) &&
+		    !guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL))
+			return 1;
+
+		vmx->pred_cmd_used = true;
+
+		if (data & ~PRED_CMD_IBPB)
+			return 1;
+
+		if (!data)
+			break;
+
+		wrmsrl(MSR_IA32_PRED_CMD, PRED_CMD_IBPB);
+
+		if (is_guest_mode(vcpu))
+			break;
+
+		vmx_disable_intercept_for_msr(vmx->vmcs01.msr_bitmap, MSR_IA32_PRED_CMD,
+					      MSR_TYPE_W);
+		break;
 	case MSR_IA32_CR_PAT:
 		if (vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_PAT) {
 			if (!kvm_mtrr_valid(vcpu, MSR_IA32_CR_PAT, data))
@@ -10045,8 +10076,8 @@ static inline bool nested_vmx_merge_msr_bitmap(struct kvm_vcpu *vcpu,
 	unsigned long *msr_bitmap_l1;
 	unsigned long *msr_bitmap_l0 = to_vmx(vcpu)->nested.vmcs02.msr_bitmap;
 
-	/* This shortcut is ok because we support only x2APIC MSRs so far. */
-	if (!nested_cpu_has_virt_x2apic_mode(vmcs12))
+	if (!nested_cpu_has_virt_x2apic_mode(vmcs12) &&
+	    !to_vmx(vcpu)->pred_cmd_used)
 		return false;
 
 	page = kvm_vcpu_gpa_to_page(vcpu, vmcs12->msr_bitmap);
@@ -10079,6 +10110,13 @@ static inline bool nested_vmx_merge_msr_bitmap(struct kvm_vcpu *vcpu,
 					MSR_TYPE_W);
 		}
 	}
+
+	if (to_vmx(vcpu)->pred_cmd_used)
+		nested_vmx_disable_intercept_for_msr(
+				msr_bitmap_l1, msr_bitmap_l0,
+				MSR_IA32_PRED_CMD,
+				MSR_TYPE_W);
+
 	kunmap(page);
 	kvm_release_page_clean(page);
 
-- 
2.7.4