Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S933216AbbDWJVK (ORCPT ); Thu, 23 Apr 2015 05:21:10 -0400 Received: from mx1.redhat.com ([209.132.183.28]:59891 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932889AbbDWJVC (ORCPT ); Thu, 23 Apr 2015 05:21:02 -0400 Message-ID: <5538B978.3060307@redhat.com> Date: Thu, 23 Apr 2015 11:20:56 +0200 From: Denys Vlasenko User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.2.0 MIME-Version: 1.0 To: Brian Gerst , Steven Rostedt , Oleg Nesterov , Ingo Molnar , "H. Peter Anvin" , Borislav Petkov , Andy Lutomirski , Linus Torvalds , Andy Lutomirski , Will Drewry , =?UTF-8?B?RnLDqWTDqXJpYyBXZWlzYmVja2Vy?= , Alexei Starovoitov , Linux Kernel Mailing List , Kees Cook , Thomas Gleixner CC: linux-tip-commits@vger.kernel.org Subject: Re: [tip:x86/vdso] x86/vdso32/syscall.S: Do not load __USER32_DS to %ss References: <63da6d778f69fd0f1345d9287f6764d58be519fa.1427482099.git.luto@kernel.org> In-Reply-To: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 7598 Lines: 181 On 04/23/2015 09:37 AM, Brian Gerst wrote: > On Tue, Mar 31, 2015 at 8:38 AM, tip-bot for Denys Vlasenko > wrote: >> Commit-ID: e7d6eefaaa443130079d73cd05039d90b3db7a4a >> Gitweb: http://git.kernel.org/tip/e7d6eefaaa443130079d73cd05039d90b3db7a4a >> Author: Denys Vlasenko >> AuthorDate: Fri, 27 Mar 2015 11:48:17 -0700 >> Committer: Ingo Molnar >> CommitDate: Tue, 31 Mar 2015 10:45:15 +0200 >> >> x86/vdso32/syscall.S: Do not load __USER32_DS to %ss >> >> This vDSO code only gets used by 64-bit kernels, not 32-bit ones. >> >> On 64-bit kernels, the data segment is the same for 32-bit and >> 64-bit userspace, and the SYSRET instruction loads %ss with its >> selector. >> >> So there's no need to repeat it by hand. Segment loads are somewhat >> expensive: tens of cycles. >> >> Signed-off-by: Denys Vlasenko >> [ Removed unnecessary comment. ] >> Signed-off-by: Andy Lutomirski >> Cc: Alexei Starovoitov >> Cc: Andy Lutomirski >> Cc: Borislav Petkov >> Cc: Frederic Weisbecker >> Cc: H. Peter Anvin >> Cc: Kees Cook >> Cc: Linus Torvalds >> Cc: Oleg Nesterov >> Cc: Steven Rostedt >> Cc: Will Drewry >> Link: http://lkml.kernel.org/r/63da6d778f69fd0f1345d9287f6764d58be519fa.1427482099.git.luto@kernel.org >> Signed-off-by: Ingo Molnar >> --- >> arch/x86/vdso/vdso32/syscall.S | 2 -- >> 1 file changed, 2 deletions(-) >> >> diff --git a/arch/x86/vdso/vdso32/syscall.S b/arch/x86/vdso/vdso32/syscall.S >> index 5415b56..6b286bb 100644 >> --- a/arch/x86/vdso/vdso32/syscall.S >> +++ b/arch/x86/vdso/vdso32/syscall.S >> @@ -19,8 +19,6 @@ __kernel_vsyscall: >> .Lpush_ebp: >> movl %ecx, %ebp >> syscall >> - movl $__USER32_DS, %ecx >> - movl %ecx, %ss >> movl %ebp, %ecx >> popl %ebp >> .Lpop_ebp: > > This patch unfortunately is causing Wine to break on some applications: > > Unhandled exception: stack overflow in 32-bit code (0xf779bc07). > Register dump: > CS:0023 SS:002b DS:002b ES:002b FS:0063 GS:006b > EIP:f779bc07 ESP:00aed60c EBP:00aed750 EFLAGS:00010216( R- -- I -A-P- ) > EAX:00000040 EBX:00000010 ECX:00aed750 EDX:00000040 > ESI:00000040 EDI:7ffd4000 > Stack dump: > 0x00aed60c: 00aed648 f7575e5b 7bcc8000 00000000 > 0x00aed61c: 7bc7bc09 00000010 00aed750 00000040 > 0x00aed62c: 00aed750 00aed650 7bcc8000 7bc7bbdd > 0x00aed63c: 7bcc8000 00aed6a0 00aed750 00aed738 > 0x00aed64c: 7bc7cfa9 00000011 00aed750 00000040 > 0x00aed65c: 00000020 00000000 00000000 7bc4f141 > Backtrace: > =>0 0xf779bc07 __kernel_vsyscall+0x7() in [vdso].so (0x00aed750) > 1 0xf7575e5b __libc_read+0x4a() in libpthread.so.0 (0x00aed648) > 2 0x7bc7bc09 read_reply_data+0x38(buffer=0xaed750, size=0x40) > [/home/bgerst/src/wine/wine32/dlls/ntdll/../../../dlls/ntdll/server.c:239] > in ntdll (0x00aed648) > 3 0x7bc7cfa9 wine_server_call+0x178() in ntdll (0x00aed738) > 4 0x7bc840ec NtSetEvent+0x4b(handle=0x80, > NumberOfThreadsReleased=0x0(nil)) > [/home/bgerst/src/wine/wine32/dlls/ntdll/../../../dlls/ntdll/sync.c:361] > in ntdll (0x00aed7c8) > 5 0x7b874afa SetEvent+0x24(handle=) > [/home/bgerst/src/wine/wine32/dlls/kernel32/../../../dlls/kernel32/sync.c:572] > in kernel32 (0x00aed7e8) > 6 0x0044e31a in battle.net launcher (+0x4e319) (0x00aed818) > ... > > __kernel_vsyscall+0x7 points to "pop %ebp". > > This is on an AMD Phenom(tm) II X6 1055T Processor. > > It appears that there are some subtle differences in how sysretl works > on AMD vs. Intel. According to the Intel docs, the SS selector and > descriptor cache is completely reset by sysret to fixed values. The > AMD docs however are concerning: > > AMD's syscall: > SS.sel = MSR_STAR.SYSCALL_CS + 8 > SS.attr = 64-bit stack,dpl0 > SS.base = 0x00000000 > SS.limit = 0xFFFFFFFF > > AMD's sysret: > SS.sel = MSR_STAR.SYSRET_CS + 8 // SS selector is changed, > // SS base, limit, attributes unchanged. > Not changing base or limit is no big deal, but not changing attributes > could be the problem. It might be leaving the "64-bit stack" > attribute set, for whatever that means. I am not aware of any officially existing "64-bit" stack or data segment attribute. x86 data segment descriptors don't have any such bits, the 64-bitness of stack operations in long mode is hardwired. (Unlike code segment descriptors, which _do_ have a bit which controls 64-bitness). This is not to say that CPU internally is prohibited from having something along those lines. However, if AMD CPUs would have a bug where after sysretl %ss descriptor cache is left in a bad state causing stack ops to be done in 64-bit fashion, *any* 32-bit userspace would immediately explode. This is not the case. What Wine could do differently from a typical Linux executable? It may use nonzero %ss base, it may use a non-4Gb limit, it may use 16-bit stack segment, it may use an expand-down stack segment. (I know very little about Windows/Wine internals, so I just listed all possibilities which came to mind). Looking at the error message: > Unhandled exception: stack overflow in 32-bit code (0xf779bc07). > Register dump: > CS:0023 SS:002b DS:002b ES:002b FS:0063 GS:006b > EIP:f779bc07 ESP:00aed60c EBP:00aed750 EFLAGS:00010216( R- -- I -A-P- ) > EAX:00000040 EBX:00000010 ECX:00aed750 EDX:00000040 > ESI:00000040 EDI:7ffd4000 it is not coming from Wine itself, looks like it's from Windows code, and I'd guess it just tells us that they got exception 12, without further information on the cause. I don't have any solid theories here, so I'd just brainstorm. * what if %ss before syscall was NOT the usual value of 0x2b, but some other segment, not the typical 0-base, 0xffffffff limit 32-bit expand-up one? Not restoring proper %ss would not go well. [but then, Intel CPUs work, and old code worked....] * what if there is indeed a hidden "64-bit mode" bit in %ss descriptor cache which makes stack ops use %rsp, not %esp; and %rsp somehow has nonzero high half? Then, in compat mode, CPU checks segment limit on every memory reference including POP (which it doesn't do in 64-bit mode) and we got exception 12. [But we do: movl RSP(%rsp),%esp USERGS_SYSRET32 which ensures that %rsp never has upper bits set...] * what if in kernel (64-bmode) we load %ss descriptor cache with a segment whose limit is too small or with outright invalid segment? In 64-bit mode, %ss is not actually checked by CPU. For example, nested exceptions/interrupts even auto-load %ss with zero selector - and stack ops continue to work fine in 64-bit mode. But as soon as we return to non-64-bit world, CPU starts checking data sergemnt limits and validity - and if cached %ss descriptor was made invalid while in 64-bit mode and was not changed by SYSRETL, POP will fail. > Reloading SS from the GDT would obviously reset any bad state left by > sysretl. Unfortunately we may have to put it back in, and then NOP it > out on Intel. Sure we can, but let's try to get a better understanding what's going on here. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/