From: KarimAllah Ahmed
To: linux-kernel@vger.kernel.org
Cc: KarimAllah Ahmed, Andi Kleen, Andrea Arcangeli, Andy Lutomirski,
    Arjan van de Ven, Ashok Raj, Asit Mallick, Borislav Petkov,
    Dan Williams, Dave Hansen, David Woodhouse, Greg Kroah-Hartman,
    "H. Peter Anvin", Ingo Molnar, Janakarajan Natarajan, Joerg Roedel,
    Jun Nakajima, Laura Abbott, Linus Torvalds, Masami Hiramatsu,
    Paolo Bonzini, Peter Zijlstra, Radim Krčmář, Thomas Gleixner,
    Tim Chen, Tom Lendacky, kvm@vger.kernel.org, x86@kernel.org,
    Arjan Van De Ven
Subject: [RFC 10/10] x86/enter: Use IBRS on syscall and interrupts
Date: Sat, 20 Jan 2018 20:23:01 +0100
Message-Id: <1516476182-5153-11-git-send-email-karahmed@amazon.de>
In-Reply-To: <1516476182-5153-1-git-send-email-karahmed@amazon.de>
References: <1516476182-5153-1-git-send-email-karahmed@amazon.de>

From: Tim Chen

Stop Indirect Branch Speculation on every user space to kernel space
transition and reenable it when returning to user space.

The NMI interrupt save/restore of IBRS state was based on Andrea
Arcangeli's implementation. Here's an explanation by Dave Hansen on why
we save IBRS state for NMI.

The normal interrupt code uses the 'error_entry' path which uses the
Code Segment (CS) of the instruction that was interrupted to tell
whether it interrupted the kernel or userspace and thus has to switch
IBRS, or leave it alone.
The NMI code is different. It uses 'paranoid_entry' because it can
interrupt the kernel while it is running with a userspace IBRS (and %GS
and CR3) value, but has a kernel CS. If we used the same approach as the
normal interrupt code, we might do the following:

	SYSENTER_entry
<-------------- NMI HERE
	IBRS=1
		do_something()
	IBRS=0
	SYSRET

The NMI code might notice that we are running in the kernel and decide
that it is OK to skip the IBRS=1. This would leave it running
unprotected with IBRS=0, which is bad. However, if we unconditionally
set IBRS=1 in the NMI, we might get the following case:

	SYSENTER_entry
	IBRS=1
		do_something()
	IBRS=0
<-------------- NMI HERE (set IBRS=1)
	SYSRET

and we would return to userspace with IBRS=1. Userspace would run
slowly until we entered and exited the kernel again.

Instead of those two approaches, we chose a third one where we simply
save the IBRS value in a scratch register (%r13) and then restore that
value, verbatim.

[karahmed: use the new SPEC_CTRL_IBRS defines]

Co-developed-by: Andrea Arcangeli
Signed-off-by: Andrea Arcangeli
Signed-off-by: Tim Chen
Signed-off-by: Thomas Gleixner
Signed-off-by: KarimAllah Ahmed
Cc: Andi Kleen
Cc: Peter Zijlstra
Cc: Greg KH
Cc: Dave Hansen
Cc: Andy Lutomirski
Cc: Paolo Bonzini
Cc: Dan Williams
Cc: Arjan Van De Ven
Cc: Linus Torvalds
Cc: David Woodhouse
Cc: Ashok Raj
Link: https://lkml.kernel.org/r/d5e4c03ec290c61dfbe5a769f7287817283fa6b7.1515542293.git.tim.c.chen@linux.intel.com
---
 arch/x86/entry/entry_64.S        | 35 ++++++++++++++++++++++++++++++++++-
 arch/x86/entry/entry_64_compat.S | 21 +++++++++++++++++++--
 2 files changed, 53 insertions(+), 3 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 63f4320..b3d90cf 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -171,6 +171,8 @@ ENTRY(entry_SYSCALL_64_trampoline)
 	/* Load the top of the task stack into RSP */
 	movq	CPU_ENTRY_AREA_tss + TSS_sp1 + CPU_ENTRY_AREA, %rsp
 
+	/* Restrict indirect branch speculation */
+	RESTRICT_IB_SPEC
 	/* Start building the simulated IRET frame. */
 	pushq	$__USER_DS			/* pt_regs->ss */
@@ -214,6 +216,8 @@ ENTRY(entry_SYSCALL_64)
 	 */
 	movq	%rsp, PER_CPU_VAR(rsp_scratch)
 	movq	PER_CPU_VAR(cpu_current_top_of_stack), %rsp
+	/* Restrict Indirect Branch Speculation */
+	RESTRICT_IB_SPEC
 
 	TRACE_IRQS_OFF
@@ -409,6 +413,8 @@ syscall_return_via_sysret:
 	pushq	RSP-RDI(%rdi)	/* RSP */
 	pushq	(%rdi)		/* RDI */
 
+	/* Unrestrict Indirect Branch Speculation */
+	UNRESTRICT_IB_SPEC
 	/*
 	 * We are on the trampoline stack. All regs except RDI are live.
 	 * We can do future final exit work right here.
@@ -757,11 +763,12 @@ GLOBAL(swapgs_restore_regs_and_return_to_usermode)
 	/* Push user RDI on the trampoline stack. */
 	pushq	(%rdi)
 
+	/* Unrestrict Indirect Branch Speculation */
+	UNRESTRICT_IB_SPEC
 	/*
 	 * We are on the trampoline stack. All regs except RDI are live.
 	 * We can do future final exit work right here.
 	 */
-
 	SWITCH_TO_USER_CR3_STACK scratch_reg=%rdi
 
 	/* Restore RDI. */
@@ -849,6 +856,13 @@ native_irq_return_ldt:
 	SWAPGS					/* to kernel GS */
 	SWITCH_TO_KERNEL_CR3 scratch_reg=%rdi	/* to kernel CR3 */
 
+	/*
+	 * There is no point in disabling Indirect Branch Speculation
+	 * here as this is going to return to user space immediately
+	 * after fixing ESPFIX stack. There is no vulnerable code
+	 * to protect so spare two MSR writes.
+	 */
+
 	movq	PER_CPU_VAR(espfix_waddr), %rdi
 	movq	%rax, (0*8)(%rdi)		/* user RAX */
 	movq	(1*8)(%rsp), %rax		/* user RIP */
@@ -982,6 +996,8 @@ ENTRY(switch_to_thread_stack)
 	SWITCH_TO_KERNEL_CR3 scratch_reg=%rdi
 	movq	%rsp, %rdi
 	movq	PER_CPU_VAR(cpu_current_top_of_stack), %rsp
+	/* Restrict Indirect Branch Speculation */
+	RESTRICT_IB_SPEC
 	UNWIND_HINT sp_offset=16 sp_reg=ORC_REG_DI
 
 	pushq	7*8(%rdi)		/* regs->ss */
@@ -1282,6 +1298,8 @@ ENTRY(paranoid_entry)
 
 1:
 	SAVE_AND_SWITCH_TO_KERNEL_CR3 scratch_reg=%rax save_reg=%r14
 
+	/* Restrict Indirect Branch speculation */
+	RESTRICT_IB_SPEC_SAVE_AND_CLOBBER save_reg=%r13d
 	ret
 END(paranoid_entry)
@@ -1305,6 +1323,8 @@ ENTRY(paranoid_exit)
 	testl	%ebx, %ebx			/* swapgs needed? */
 	jnz	.Lparanoid_exit_no_swapgs
 	TRACE_IRQS_IRETQ
+	/* Restore Indirect Branch Speculation to the previous state */
+	RESTORE_IB_SPEC_CLOBBER save_reg=%r13d
 	RESTORE_CR3	scratch_reg=%rbx save_reg=%r14
 	SWAPGS_UNSAFE_STACK
 	jmp	.Lparanoid_exit_restore
@@ -1335,6 +1355,8 @@ ENTRY(error_entry)
 	SWAPGS
 	/* We have user CR3.  Change to kernel CR3. */
 	SWITCH_TO_KERNEL_CR3 scratch_reg=%rax
+	/* Restrict Indirect Branch Speculation */
+	RESTRICT_IB_SPEC_CLOBBER
 
 .Lerror_entry_from_usermode_after_swapgs:
 	/* Put us onto the real thread stack. */
@@ -1382,6 +1404,8 @@ ENTRY(error_entry)
 	 */
 	SWAPGS
 	SWITCH_TO_KERNEL_CR3 scratch_reg=%rax
+	/* Restrict Indirect Branch Speculation */
+	RESTRICT_IB_SPEC_CLOBBER
 	jmp .Lerror_entry_done
 
 .Lbstep_iret:
@@ -1396,6 +1420,8 @@ ENTRY(error_entry)
 	 */
 	SWAPGS
 	SWITCH_TO_KERNEL_CR3 scratch_reg=%rax
+	/* Restrict Indirect Branch Speculation */
+	RESTRICT_IB_SPEC
 
 	/*
 	 * Pretend that the exception came from user mode: set up pt_regs
@@ -1497,6 +1523,10 @@ ENTRY(nmi)
 	SWITCH_TO_KERNEL_CR3 scratch_reg=%rdx
 	movq	%rsp, %rdx
 	movq	PER_CPU_VAR(cpu_current_top_of_stack), %rsp
+
+	/* Restrict Indirect Branch Speculation */
+	RESTRICT_IB_SPEC
+
 	UNWIND_HINT_IRET_REGS base=%rdx offset=8
 	pushq	5*8(%rdx)	/* pt_regs->ss */
 	pushq	4*8(%rdx)	/* pt_regs->rsp */
@@ -1747,6 +1777,9 @@ end_repeat_nmi:
 	movq	$-1, %rsi
 	call	do_nmi
 
+	/* Restore Indirect Branch speculation to the previous state */
+	RESTORE_IB_SPEC_CLOBBER save_reg=%r13d
+
 	RESTORE_CR3 scratch_reg=%r15 save_reg=%r14
 
 	testl	%ebx, %ebx			/* swapgs needed? */
diff --git a/arch/x86/entry/entry_64_compat.S b/arch/x86/entry/entry_64_compat.S
index 98d5358..5b45d93 100644
--- a/arch/x86/entry/entry_64_compat.S
+++ b/arch/x86/entry/entry_64_compat.S
@@ -54,6 +54,8 @@ ENTRY(entry_SYSENTER_compat)
 	SWITCH_TO_KERNEL_CR3 scratch_reg=%rsp
 	movq	PER_CPU_VAR(cpu_current_top_of_stack), %rsp
 
+	/* Restrict Indirect Branch Speculation */
+	RESTRICT_IB_SPEC
 	/*
 	 * User tracing code (ptrace or signal handlers) might assume that
@@ -224,12 +226,18 @@ GLOBAL(entry_SYSCALL_compat_after_hwframe)
 	pushq	$0			/* pt_regs->r14 = 0 */
 	pushq	$0			/* pt_regs->r15 = 0 */
 
-	/*
-	 * User mode is traced as though IRQs are on, and SYSENTER
+	/* Restrict Indirect Branch Speculation. All registers are saved already */
+	RESTRICT_IB_SPEC_CLOBBER
+
+	/* User mode is traced as though IRQs are on, and SYSENTER
 	 * turned them off.
 	 */
 	TRACE_IRQS_OFF
 
+	/*
+	 * We just saved %rdi so it is safe to clobber. It is not
+	 * preserved during the C calls inside TRACE_IRQS_OFF anyway.
+	 */
 	movq	%rsp, %rdi
 	call	do_fast_syscall_32	/* XEN PV guests always use IRET path */
@@ -239,6 +247,15 @@ GLOBAL(entry_SYSCALL_compat_after_hwframe)
 	/* Opportunistic SYSRET */
 sysret32_from_system_call:
 	TRACE_IRQS_ON			/* User mode traces as IRQs on. */
+
+	/*
+	 * Unrestrict Indirect Branch Speculation. This is safe to do here
+	 * because there are no indirect branches between here and the
+	 * return to userspace (sysretl).
+	 * Clobber of %rax, %rcx, %rdx is OK before register restoring.
+	 */
+	UNRESTRICT_IB_SPEC_CLOBBER
+
 	movq	RBX(%rsp), %rbx		/* pt_regs->rbx */
 	movq	RBP(%rsp), %rbp		/* pt_regs->rbp */
 	movq	EFLAGS(%rsp), %r11	/* pt_regs->flags (in r11) */
-- 
2.7.4