Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S933753AbbDWLMt (ORCPT ); Thu, 23 Apr 2015 07:12:49 -0400 Received: from mx1.redhat.com ([209.132.183.28]:53668 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S933631AbbDWLMr (ORCPT ); Thu, 23 Apr 2015 07:12:47 -0400 Message-ID: <5538D389.4090007@redhat.com> Date: Thu, 23 Apr 2015 13:12:09 +0200 From: Denys Vlasenko User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.2.0 MIME-Version: 1.0 To: Brian Gerst , Steven Rostedt , Oleg Nesterov , Ingo Molnar , "H. Peter Anvin" , Borislav Petkov , Andy Lutomirski , Linus Torvalds , Andy Lutomirski , Will Drewry , =?UTF-8?B?RnLDqWTDqXJpYyBXZWlzYmVja2Vy?= , Alexei Starovoitov , Linux Kernel Mailing List , Kees Cook , Thomas Gleixner CC: linux-tip-commits@vger.kernel.org Subject: Re: [tip:x86/vdso] x86/vdso32/syscall.S: Do not load __USER32_DS to %ss References: <63da6d778f69fd0f1345d9287f6764d58be519fa.1427482099.git.luto@kernel.org> In-Reply-To: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 5588 Lines: 150 On 04/23/2015 09:37 AM, Brian Gerst wrote: > This patch unfortunately is causing Wine to break on some applications: > > Unhandled exception: stack overflow in 32-bit code (0xf779bc07). > Register dump: > CS:0023 SS:002b DS:002b ES:002b FS:0063 GS:006b > EIP:f779bc07 ESP:00aed60c EBP:00aed750 EFLAGS:00010216( R- -- I -A-P- ) > EAX:00000040 EBX:00000010 ECX:00aed750 EDX:00000040 > ESI:00000040 EDI:7ffd4000 > Stack dump: > 0x00aed60c: 00aed648 f7575e5b 7bcc8000 00000000 > 0x00aed61c: 7bc7bc09 00000010 00aed750 00000040 > 0x00aed62c: 00aed750 00aed650 7bcc8000 7bc7bbdd > 0x00aed63c: 7bcc8000 00aed6a0 00aed750 00aed738 > 0x00aed64c: 7bc7cfa9 00000011 00aed750 00000040 > 0x00aed65c: 00000020 00000000 00000000 7bc4f141 > Backtrace: > =>0 0xf779bc07 __kernel_vsyscall+0x7() in [vdso].so (0x00aed750) > 1 0xf7575e5b __libc_read+0x4a() in libpthread.so.0 (0x00aed648) > 2 0x7bc7bc09 read_reply_data+0x38(buffer=0xaed750, size=0x40) > [/home/bgerst/src/wine/wine32/dlls/ntdll/../../../dlls/ntdll/server.c:239] > in ntdll (0x00aed648) > 3 0x7bc7cfa9 wine_server_call+0x178() in ntdll (0x00aed738) > 4 0x7bc840ec NtSetEvent+0x4b(handle=0x80, > NumberOfThreadsReleased=0x0(nil)) > [/home/bgerst/src/wine/wine32/dlls/ntdll/../../../dlls/ntdll/sync.c:361] > in ntdll (0x00aed7c8) > 5 0x7b874afa SetEvent+0x24(handle=) > [/home/bgerst/src/wine/wine32/dlls/kernel32/../../../dlls/kernel32/sync.c:572] > in kernel32 (0x00aed7e8) > 6 0x0044e31a in battle.net launcher (+0x4e319) (0x00aed818) > ... > > __kernel_vsyscall+0x7 points to "pop %ebp". > > This is on an AMD Phenom(tm) II X6 1055T Processor. > > It appears that there are some subtle differences in how sysretl works > on AMD vs. Intel. According to the Intel docs, the SS selector and > descriptor cache is completely reset by sysret to fixed values. The > AMD docs however are concerning: > > AMD's syscall: > SS.sel = MSR_STAR.SYSCALL_CS + 8 > SS.attr = 64-bit stack,dpl0 > SS.base = 0x00000000 > SS.limit = 0xFFFFFFFF > > AMD's sysret: > SS.sel = MSR_STAR.SYSRET_CS + 8 // SS selector is changed, > // SS base, limit, attributes unchanged. > > Not changing base or limit is no big deal, but not changing attributes > could be the problem. It might be leaving the "64-bit stack" > attribute set, for whatever that means. > > Reloading SS from the GDT would obviously reset any bad state left by > sysretl. Unfortunately we may have to put it back in, and then NOP it > out on Intel. Please try attached tentative fix: >From 4b9e9bdf08b092724934e62892116af6dd712128 Mon Sep 17 00:00:00 2001 From: Denys Vlasenko Date: Thu, 23 Apr 2015 12:38:47 +0200 Subject: [PATCH] x86/asm/entry/32: Restore %ss before SYSRETL if necessary AMD docs say that SYSRET32 loads %ss selector with a value from a MSR, but *cached descriptor* of %ss is not modified. It was observed to cause Wine crashes. Conjectured sequence of events causing it is: 1. Wine process does syscall. 2. Context switch to any other task. 3. Interrupt (software or hardware), which loads %ss with 0. (This happens according to both Intel and AMD docs.) %ss cached descriptor is set to "invalid" state. 4. Context switch back to Wine. 5. sysret to 32-bit. %ss has correct value but its cached descriptor is still invalid. 6. The very first userspace POP insn after this causes exception 12. Fix this by checking %ss value. If it is not __KERNEL_DS, (and it really can only be __KERNEL_DS or zero), then load it with __KERNEL_DS. Signed-off-by: Denys Vlasenko --- arch/x86/ia32/ia32entry.S | 26 ++++++++++++++++++++++++++ 1 file changed, 26 insertions(+) diff --git a/arch/x86/ia32/ia32entry.S b/arch/x86/ia32/ia32entry.S index 0c302d0..94c0b39 100644 --- a/arch/x86/ia32/ia32entry.S +++ b/arch/x86/ia32/ia32entry.S @@ -198,6 +198,20 @@ sysexit_from_sys_call: * with 'sysenter' and it uses the SYSENTER calling convention. */ andl $~TS_COMPAT,ASM_THREAD_INFO(TI_status, %rsp, SIZEOF_PTREGS) + /* + * On AMD, SYSRET32 loads %ss selector, but does not modify its + * cached descriptor; and in kernel, %ss can be loaded with 0, + * setting cached descriptor to "invalid". This has no effect on + * 64-bit mode, but on return to 32-bit mode, it makes stack ops fail. + * Fix %ss only if it's wrong: read from %ss takes ~2 cycles, + * write to %ss is ~40 cycles. + */ + movl %ss, %ecx + cmpl $__KERNEL_DS, %ecx + je 1f + movl $__KERNEL_DS, %ecx + movl %ecx, %ss +1: movl RIP(%rsp),%ecx /* User %eip */ CFI_REGISTER rip,rcx RESTORE_RSI_RDI @@ -408,6 +421,20 @@ cstar_dispatch: sysretl_from_sys_call: andl $~TS_COMPAT, ASM_THREAD_INFO(TI_status, %rsp, SIZEOF_PTREGS) RESTORE_RSI_RDI_RDX + /* + * On AMD, SYSRET32 loads %ss selector, but does not modify its + * cached descriptor; and in kernel, %ss can be loaded with 0, + * setting cached descriptor to "invalid". This has no effect on + * 64-bit mode, but on return to 32-bit mode, it makes stack ops fail. + * Fix %ss only if it's wrong: read from %ss takes ~2 cycles, + * write to %ss is ~40 cycles. + */ + movl %ss, %ecx + cmpl $__KERNEL_DS, %ecx + je 1f + movl $__KERNEL_DS, %ecx + movl %ecx, %ss +1: movl RIP(%rsp),%ecx CFI_REGISTER rip,rcx movl EFLAGS(%rsp),%r11d -- 1.8.1.4 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/