From: Denys Vlasenko
To: Ingo Molnar
Cc: Denys Vlasenko, Brian Gerst, Linus Torvalds, Steven Rostedt,
    Borislav Petkov, "H. Peter Anvin", Andy Lutomirski, Oleg Nesterov,
    Frederic Weisbecker, Alexei Starovoitov, Will Drewry, Kees Cook,
    x86@kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH] x86/asm/entry/32: Restore %ss before SYSRETL if necessary
Date: Thu, 23 Apr 2015 14:34:51 +0200
Message-Id: <1429792491-5978-1-git-send-email-dvlasenk@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

AMD docs say that SYSRET32 loads the %ss selector with a value from an
MSR, but the *cached descriptor* of %ss is not modified. (Intel CPUs
reset the descriptor to a fixed, valid state.)

This was observed to cause Wine crashes. The conjectured sequence of
events is as follows:

1. Wine process enters the kernel via the syscall insn.
2. Context switch to any other task.
3. An interrupt or exception happens, and the CPU loads %ss with 0.
   (This happens according to both Intel and AMD docs.)
   The %ss cached descriptor is set to the "invalid" state.
4. Context switch back to Wine.
5. sysret to 32-bit userspace. The %ss selector has the correct value,
   but its cached descriptor is still invalid.
6. The very first userspace POP insn after this causes exception 12
   (#SS, stack fault).

Fix this by checking the %ss selector value. If it is not __KERNEL_DS
(and it really can only be __KERNEL_DS or zero), load it with
__KERNEL_DS.

We also use SYSRET32 for SYSENTER-based syscalls, but that codepath is
only used by Intel CPUs, which don't have this quirk.

Signed-off-by: Denys Vlasenko
Reported-by: Brian Gerst
CC: Brian Gerst
CC: Linus Torvalds
CC: Steven Rostedt
CC: Ingo Molnar
CC: Borislav Petkov
CC: "H. Peter Anvin"
CC: Andy Lutomirski
CC: Oleg Nesterov
CC: Frederic Weisbecker
CC: Alexei Starovoitov
CC: Will Drewry
CC: Kees Cook
CC: x86@kernel.org
CC: linux-kernel@vger.kernel.org
---
 arch/x86/ia32/ia32entry.S | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/arch/x86/ia32/ia32entry.S b/arch/x86/ia32/ia32entry.S
index 0c302d0..9537dcb 100644
--- a/arch/x86/ia32/ia32entry.S
+++ b/arch/x86/ia32/ia32entry.S
@@ -408,6 +408,18 @@ cstar_dispatch:
 sysretl_from_sys_call:
 	andl    $~TS_COMPAT, ASM_THREAD_INFO(TI_status, %rsp, SIZEOF_PTREGS)
 	RESTORE_RSI_RDI_RDX
+	/*
+	 * On AMD, SYSRET32 loads %ss selector, but does not modify its
+	 * cached descriptor; and in kernel, %ss can be loaded with 0,
+	 * setting cached descriptor to "invalid". This has no effect on
+	 * 64-bit mode, but on return to 32-bit mode, it makes stack ops fail.
+	 * Fix %ss only if it's wrong: read from %ss takes ~2 cycles,
+	 * write to %ss is ~40 cycles.
+	 */
+	movl	%ss, %ecx
+	cmpl	$__KERNEL_DS, %ecx
+	jne	reload_ss
+ss_is_good:
 	movl RIP(%rsp),%ecx
 	CFI_REGISTER rip,rcx
 	movl EFLAGS(%rsp),%r11d
@@ -426,6 +438,10 @@ sysretl_from_sys_call:
 	 * does not exist, it merely sets eflags.IF=1).
 	 */
 	USERGS_SYSRET32
+reload_ss:
+	movl	$__KERNEL_DS, %ecx
+	movl	%ecx, %ss
+	jmp	ss_is_good
 
 #ifdef CONFIG_AUDITSYSCALL
 cstar_auditsys:
-- 
1.8.1.4
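
For readers who want to trace the conjectured sequence, here is a toy
userspace model in C. It is only a sketch and not kernel code: the
struct, the function names and the selector values are all made up for
illustration, and the CPU behaviour is reduced to what the changelog
describes (an explicit write to %ss reloads the cached descriptor, AMD
SYSRET32 changes only the selector, Intel also resets the descriptor).

```c
#include <stdbool.h>

/* Illustrative selector values; the kernel defines the real ones. */
enum { NULL_SEL = 0x00, KERNEL_DS_SEL = 0x18, USER_DS_SEL = 0x2b };

/* Toy model of %ss: the visible selector plus the hidden cached descriptor. */
struct ss_reg {
	unsigned selector;
	bool desc_valid;
};

/* An explicit load of %ss refreshes the cached descriptor as well. */
static void write_ss(struct ss_reg *ss, unsigned sel)
{
	ss->selector = sel;
	ss->desc_valid = (sel != NULL_SEL);
}

/* Step 3: interrupt or exception entry loads %ss with 0. */
static void interrupt_entry(struct ss_reg *ss)
{
	write_ss(ss, NULL_SEL);
}

/*
 * Step 5: SYSRET32. On AMD only the selector changes and the cached
 * descriptor keeps whatever state it had; Intel resets it to a valid state.
 */
static void sysret32(struct ss_reg *ss, bool amd)
{
	ss->selector = USER_DS_SEL;
	if (!amd)
		ss->desc_valid = true;
}

/* The check this patch adds: reload %ss only if it is not the kernel DS. */
static void fix_ss_before_sysret(struct ss_reg *ss)
{
	if (ss->selector != KERNEL_DS_SEL)
		write_ss(ss, KERNEL_DS_SEL);
}

/* Runs steps 1-5 and reports whether the first user stack op would succeed. */
static bool first_user_pop_ok(bool amd, bool apply_fix)
{
	struct ss_reg ss;

	write_ss(&ss, KERNEL_DS_SEL);	/* step 1: in kernel after syscall */
	interrupt_entry(&ss);		/* steps 2-4: %ss is now 0 and invalid */
	if (apply_fix)
		fix_ss_before_sysret(&ss);
	sysret32(&ss, amd);		/* step 5 */
	return ss.desc_valid;		/* step 6: false models exception 12 */
}
```

In this model first_user_pop_ok(true, false) is false (the AMD case
without the patch), while both first_user_pop_ok(true, true) and
first_user_pop_ok(false, false) are true.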