Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S966873AbbDXUV3 (ORCPT ); Fri, 24 Apr 2015 16:21:29 -0400
Received: from mail-la0-f44.google.com ([209.85.215.44]:33851 "EHLO mail-la0-f44.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1757868AbbDXUV0 (ORCPT ); Fri, 24 Apr 2015 16:21:26 -0400
MIME-Version: 1.0
In-Reply-To: <5d120f358612d73fc909f5bfa47e7bd082db0af0.1429841474.git.luto@kernel.org>
References: <5d120f358612d73fc909f5bfa47e7bd082db0af0.1429841474.git.luto@kernel.org>
From: Andy Lutomirski
Date: Fri, 24 Apr 2015 13:21:03 -0700
Message-ID:
Subject: Re: [PATCH] x86_64, asm: Work around AMD SYSRET SS descriptor attribute issue
To: Andy Lutomirski
Cc: X86 ML, "H. Peter Anvin", Borislav Petkov, Denys Vlasenko, Linus Torvalds, Brian Gerst, Denys Vlasenko, Ingo Molnar, Steven Rostedt, Oleg Nesterov, Frederic Weisbecker, Alexei Starovoitov, Will Drewry, Kees Cook, Linux Kernel Mailing List
Content-Type: text/plain; charset=UTF-8
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 2466
Lines: 58

On Thu, Apr 23, 2015 at 7:15 PM, Andy Lutomirski wrote:
> AMD CPUs don't reinitialize the SS descriptor on SYSRET, so SYSRET
> with SS == 0 results in an invalid usermode state in which SS is
> apparently equal to __USER_DS but causes #SS if used.
>
> Work around the issue by replacing NULL SS values with __KERNEL_DS
> in __switch_to, thus ensuring that SYSRET never happens with SS set
> to NULL.
>
> This was exposed by a recent vDSO cleanup.
>
> Fixes: e7d6eefaaa44 x86/vdso32/syscall.S: Do not load __USER32_DS to %ss
> Signed-off-by: Andy Lutomirski
> ---
>
> Tested only on Intel, which isn't very interesting.  I'll tidy up
> and send a test case, too, once Borislav confirms that it works.
>
> Please don't actually apply this until we're sure we understand the
> scope of the issue.
> If this doesn't affect SYSRETQ, then we might
> to fix it on before SYSRETL to avoid impacting 64-bit processes
> at all.
>

After sleeping on it, I think I want to offer a different, more
complicated approach.  AFAIK there are really only two ways that this
issue can be visible:

1. SYSRETL.  We can fix that up in the AMD SYSRETL path.  I think
there's a decent argument that that path is less performance-critical
than context switches.

2. SYSRETQ.  The only way that I know of to see the problem is SYSRETQ
followed by a far jump or return.  This is presumably *extremely*
rare.

What if we fixed #2 up in do_stack_segment?  We should double-check
the docs, but I think that this will only ever manifest as #SS(0) with
regs->ss == __USER_DS and !user_mode_64bit(regs).  We need to avoid
infinite retry loops, but this might be okay.  I think that #SS(0)
from userspace under those conditions can *only* happen as a result of
this issue.  Even if not, we could come up with a way to only retry
once per syscall (e.g. set some ti->status flag in the 64-bit syscall
path on AMD and clear it in do_stack_segment).

This might be way more trouble than it's worth.  For one thing, we
need to be careful with the IRET fixup.  Ick.  So maybe this should be
written off as my useless ramblings.

NB: I suspect that all of this is irrelevant on Xen.  Xen does its own
thing wrt sysret.

--Andy
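
For illustration only, the retry-once idea for the SYSRETQ case could be
modeled in user space roughly as below.  This is an untested sketch, not
kernel code: every model_* name, the MODEL_TS_SS_FIXUP_ARMED bit, and the
idea of arming the flag on the 64-bit syscall path are assumptions made up
for the example.  The selector constants mirror the usual x86_64 layout but
should be treated as model values.

```c
#include <stdbool.h>
#include <stdint.h>

/* Model selector values (assumed to match the usual x86_64 Linux layout). */
#define MODEL_USER_DS   0x2bu
#define MODEL_USER_CS   0x33u   /* 64-bit user code segment */
#define MODEL_USER32_CS 0x23u   /* compat (32-bit) user code segment */

/* Hypothetical per-thread status bit: a 64-bit SYSRET may have left SS stale. */
#define MODEL_TS_SS_FIXUP_ARMED 0x1u

struct model_regs   { uint16_t cs; uint16_t ss; };
struct model_thread { uint32_t status; };

/* Stand-in for user_mode_64bit(regs): true when running 64-bit user code. */
static bool model_user_mode_64bit(const struct model_regs *regs)
{
    return regs->cs == MODEL_USER_CS;
}

/* Hypothetical hook: would run on the 64-bit syscall path on affected CPUs. */
static void model_arm_ss_fixup(struct model_thread *ti)
{
    ti->status |= MODEL_TS_SS_FIXUP_ARMED;
}

/*
 * Decide whether a #SS handler should reload SS and retry the faulting
 * instruction.  Retries only for #SS(0) with ss == USER_DS outside 64-bit
 * user mode, and at most once per armed syscall, so it cannot loop forever.
 */
static bool model_should_retry_ss_fault(struct model_thread *ti,
                                        const struct model_regs *regs,
                                        uint32_t error_code)
{
    if (error_code != 0)
        return false;
    if (regs->ss != MODEL_USER_DS)
        return false;
    if (model_user_mode_64bit(regs))
        return false;
    if (!(ti->status & MODEL_TS_SS_FIXUP_ARMED))
        return false;

    /* Consume the flag: a second fault before the next syscall is not retried. */
    ti->status &= ~MODEL_TS_SS_FIXUP_ARMED;
    return true;
}
```

The flag is checked last and cleared on the first match, so a genuine #SS
under the same conditions is retried once at most and then delivered
normally.  The model says nothing about the IRET fixup, which is the part
that would actually need care.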