From: Andy Lutomirski
Date: Thu, 23 Apr 2015 02:07:19 -0700
Subject: Re: [tip:x86/vdso] x86/vdso32/syscall.S: Do not load __USER32_DS to %ss
To: Brian Gerst
Cc: Steven Rostedt, Oleg Nesterov, Ingo Molnar, "H. Peter Anvin",
    Borislav Petkov, Linus Torvalds, Andy Lutomirski, Will Drewry,
    Frédéric Weisbecker, Alexei Starovoitov, Linux Kernel Mailing List,
    Denys Vlasenko, Kees Cook, Thomas Gleixner,
    "linux-tip-commits@vger.kernel.org"
References: <63da6d778f69fd0f1345d9287f6764d58be519fa.1427482099.git.luto@kernel.org>

On Thu, Apr 23, 2015 at 1:49 AM, Andy Lutomirski wrote:
>
> I'm curious whether we can somehow end up in the kernel without a
> sensible SS.  What happens if we have SS = 0?
>
> Try this on for size:
>
> 1. Wine process does syscall
> 2. Context switch to any other task
> 3. Interrupt (software or hardware), which loads SS with ss0, which is
> 0 on x86_64.
> 4. Context switch back to Wine.
> 5. sysretl
>
> Would fixing this be as simple as changing this code in
> arch/x86/kernel/process.c:
>
> __visible DEFINE_PER_CPU_SHARED_ALIGNED(struct tss_struct, cpu_tss) = {
>         .x86_tss = {
>                 .sp0 = TOP_OF_INIT_STACK,
> #ifdef CONFIG_X86_32
>                 .ss0 = __KERNEL_DS,
>
> by moving the ifdef down a line?  Even if that fixed it, it would be
> extremely fragile, but IMO it would be a good change to make
> regardless (i.e. the kernel's SS would be less unpredictable).
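
For illustration only, the quoted "move the ifdef down a line" idea can
be mimicked in a self-contained toy.  Everything prefixed demo_/DEMO_ is
hypothetical: the struct is not the kernel's tss_struct, the constant
values are placeholders, and only_32bit stands in for whatever the
elided part of the snippet keeps under CONFIG_X86_32.  It only shows how
the position of the #ifdef decides whether ss0 is initialized when
CONFIG_X86_32 is not defined.

```c
#include <stdint.h>

/* Hypothetical stand-ins; not the kernel's real definitions. */
#define DEMO_KERNEL_DS         0x18u
#define DEMO_TOP_OF_INIT_STACK 0x1000ul

/* Toy TSS carrying only the two fields named in the quoted snippet. */
struct demo_tss {
	unsigned long sp0;
	uint16_t ss0;
	uint16_t only_32bit;	/* placeholder for fields that stay 32-bit only */
};

/* Layout as quoted: ss0 is initialized only under CONFIG_X86_32. */
static const struct demo_tss demo_tss_current = {
	.sp0 = DEMO_TOP_OF_INIT_STACK,
#ifdef CONFIG_X86_32
	.ss0 = DEMO_KERNEL_DS,
	.only_32bit = 1,
#endif
};

/* Proposed layout: the #ifdef moves down one line, so ss0 is always set. */
static const struct demo_tss demo_tss_proposed = {
	.sp0 = DEMO_TOP_OF_INIT_STACK,
	.ss0 = DEMO_KERNEL_DS,
#ifdef CONFIG_X86_32
	.only_32bit = 1,
#endif
};

/* Small accessors so the two variants are easy to compare. */
static uint16_t demo_current_ss0(void)
{
	return demo_tss_current.ss0;
}

static uint16_t demo_proposed_ss0(void)
{
	return demo_tss_proposed.ss0;
}
```

Built without CONFIG_X86_32, demo_current_ss0() yields the implicit
zero from the initializer while demo_proposed_ss0() yields
DEMO_KERNEL_DS.  Whether the real structure can take the same change is
not something this toy settles.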
Confirmed with KVM on VMX: we can definitely end up in the kernel with
SS == 0.

I don't know whether changing ss0 would be a good idea, though.  It
would be cleaner, but it could slow down interrupt processing:
interrupt delivery would have to do an extra GDT load.

Food for thought: wouldn't this mean that we have a bug on sysretq
too?  If we're in the kernel with SS == 0, we do sysretq, and then
user code does a far jump to 32-bit code, then we end up with a bogus
SS.  Maybe we don't care, and reloading SS on every sysretq would
suck.

We could fix it up in a kind of evil way: in do_stack_segment, we
could detect that we had SS == __USER_DS, in which case #SS should be
impossible, and just return without signalling the process.  IRET
would fix up the attributes.

We just might need a stable fix, though -- I wonder if there's any bad
interaction with opportunistic sysret in 4.0.  Maybe we should
benchmark ss0 = __KERNEL_DS and try it after all.

--Andy
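
A rough, untested sketch of the do_stack_segment idea, written as a
standalone toy rather than kernel code.  The demo_ names, the register
struct and the DEMO_USER_DS value are all hypothetical stand-ins, and
the extra "fault came from user mode" check is an assumption of mine,
not something stated in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for __USER_DS; the value is illustrative. */
#define DEMO_USER_DS 0x2bu

/* Toy subset of the saved register frame seen by a fault handler. */
struct demo_regs {
	uint16_t cs;
	uint16_t ss;
};

/* Assumption: the low two bits of the saved CS give the faulting CPL. */
static bool demo_from_user(const struct demo_regs *regs)
{
	return (regs->cs & 3) == 3;
}

/*
 * Returns true when a #SS looks like the "impossible" case: user mode
 * faulted while the saved SS selector already equals DEMO_USER_DS.  A
 * handler built on this would return without signalling the process
 * and rely on the return path to reload SS.
 */
static bool demo_ss_fault_is_spurious(const struct demo_regs *regs)
{
	return demo_from_user(regs) && regs->ss == DEMO_USER_DS;
}
```

Any other saved SS value, or a fault taken in kernel mode, would still
go down the normal signalling path in this sketch.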