From: Andrew Lutomirski
Date: Wed, 23 Apr 2014 10:08:20 -0700
Subject: Re: [PATCH] x86-64: espfix for 64-bit mode *PROTOTYPE*
To: "H. Peter Anvin"
Cc: One Thousand Gnomes, Linus Torvalds, Borislav Petkov, "H. Peter Anvin", Linux Kernel Mailing List, Ingo Molnar, Alexander van Heukelum, Konrad Rzeszutek Wilk, Boris Ostrovsky, Arjan van de Ven, Brian Gerst, Alexandre Julliard, Andi Kleen, Thomas Gleixner
In-Reply-To: <5357E214.6050501@zytor.com>

On Wed, Apr 23, 2014 at 8:53 AM, H. Peter Anvin wrote:
> On 04/23/2014 02:54 AM, One Thousand Gnomes wrote:
>>> Ideally the tests should be doable such that on a normal machine the
>>> tests can be overlapped with the other things we have to do on that
>>> path.  The exit branch will be strongly predicted in the negative
>>> direction, so it shouldn't be a significant problem.
>>>
>>> Again, this is not the case in the current prototype.
>>
>> Or you make sure that you switch to those code paths only after software
>> has executed syscalls that make it possible it will use a 16bit ss.
>>
>
> Which, again, would introduce a race, I believe, at least if we have an
> LDT at all (and since we only enter these code paths for LDT descriptors
> in the first place, it is equivalent to the current code minus the filters.)

The only way I can see to trigger the race is with sigreturn, but it's
still there.  Sigh.

Here are two semi-related things:

1. The Intel manual's description of iretq seems to have forgotten to
   mention that iret restores the stack pointer in any mode other than
   vm86 mode.  Fortunately, the AMD manual seems to think that, when
   returning *from* 64-bit mode, RSP is always restored, which I think is
   necessary for this patch to work correctly.

2. I've often pondered changing the way we return *to* CPL 0 to bypass
   iret entirely.  With the frame on the stack laid out like this
   (highest address first):

       SS
       RSP
       EFLAGS
       CS
       RIP             <- %rsp

   it could be something like:

       pushq 16(%rsp)  # copy saved EFLAGS; the address uses %rsp before the push
       popfq           # [does this need to force rex.w somehow?]
       ret $32         # pop RIP, then discard CS, EFLAGS, RSP, SS (4 x 8 bytes)

This may break backtraces if CFI isn't being used and we get an NMI just
before the popfq.  I'm not quite sure how that works.

I haven't benchmarked this at all, but the only slow part should be the
popfq, and I doubt it's anywhere near as slow as iret.

>
>> The other question I have is - is there any reason we can't fix up the
>> IRET to do a 32bit return into a vsyscall type userspace page which then
>> does a long jump or retf to the right place ?
>
> I did a writeup on this a while ago.  It does have the problem that you
> need additional memory in userspace, which is per-thread and in the
> right region of userspace; this pretty much means you have to muck about
> with the user space stack when user space is running in weird modes.
> This gets complex very quickly and does have some "footprint".
> Furthermore, on some CPUs (not including any recent Intel CPUs) there is
> still a way to leak bits [63:32].  I believe the in-kernel solution is
> actually simpler.
>

There's also no real guarantee that user code won't unmap the vdso.
--Andy