From: Andy Lutomirski
Date: Mon, 31 Jul 2017 07:14:58 -0700
Subject: Re: FSGSBASE ABI considerations
To: Linus Torvalds
Cc: Andy Lutomirski, "Bae, Chang Seok", X86 ML, "linux-kernel@vger.kernel.org", Borislav Petkov, Brian Gerst, Stas Sergeev
X-Mailing-List: linux-kernel@vger.kernel.org

On Sun, Jul 30, 2017 at 9:38 PM, Linus Torvalds wrote:
> On Sun, Jul 30, 2017 at 8:05 PM, Andy Lutomirski wrote:
>>
>> This means that, when gdb saves away a regset and reloads it using
>> PTRACE_SETREGS or similar, the effect is to load gs_base and then load
>> gs.  If gs != 0, this will blow away gs_base.  Without FSGSBASE, this
>> doesn't matter so much.  With FSGSBASE, it means that using gdb to do,
>> say, 'print func()' may corrupt gsbase.
>>
>> What, if anything, should we do about this?  One option would be to
>> make gs_base be accurate all the time (it currently isn't) and teach
>> PTRACE_SETREGS to restore in the opposite order despite the struct
>> layout.
>
> I do not think that ordering should ever matter. If it does, it means
> that you've designed something. We already screwed that up with the
> msr interface, can we try to not do it again?
>
> Could we perhaps do something like:
>
>  - every process starts out with CR4.FSGSBASE cleared
>
>  - if we get an #UD due to the process using the {rd|wr}{gs|fs}base
> instructions, we enable FSGSBASE and mark the process as using those
> instructions.
>
>  - once a process is marked as FSGSBASE, the kernel prioritizes
> FSGSBASE. We'll still save/restore the selector too, but every time we
> restore the selector, we will first do a rd*base, and then do a
> wr*base afterwards
>
> IOW, the "selector" ends up being meaningless after people have used
> fsgsbase. It is saved and restored as a _value_, but it has no effect
> what-so-ever on the actual base pointer.
>
> Yes, it's modal, but at least you don't end up in some situation where
> it matters whether you write the selector first or not.
>
> Hmm?

I hadn't thought of that approach.  I have three very different objections.

 - The only reason I think that FSGSBASE is worth supporting at all is
that it provides a fairly dramatic speedup to context switches by
getting rid of the awful serializing WRMSR.  I tend to consider the
actual exposure of the instructions to userspace to be more trouble
than it's worth.  But, with your approach, we may only get the speedup
when running SPECJava Environmentally Friendly Threads, and we'll lose
it again due to all the CR4 writes, and that would make me want to
just drop the whole thing.

 - The modal approach makes the modify_ldt() consistency issue go
away, but it doesn't help with ptrace, I think, because, with ptrace,
we care about the debugger, not the debuggee.

 - glibc will probably be daft and start using WRGSBASE instead of
arch_prctl and this whole idea may become irrelevant.

All that being said, we might be able to get away with treating the
selector and the base totally separately no matter what.
I've searched a bit, and I haven't come up with anything that needs
modify_ldt() to behave synchronously, presumably because its behavior
used to be so utterly erratic that user code always had to follow
modify_ldt() by an explicit segment write.  The only things I've
spotted that care about ptrace and that do anything more complicated
than reading the state and writing it back out the same way they found
it are things like gdb's 'print $gs = 43', and I find it hard to
believe that there are gdb scripts that do that and need to be
supported for compatibility.

--Andy
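
For illustration only, here is a toy Python model of the ordering problem
discussed above.  It is not kernel code.  It assumes, as a simplification,
that writing a nonzero gs selector reloads gs_base from the descriptor and
that writing gs == 0 leaves gs_base alone.  ThreadState, DESCRIPTOR_BASES
and every function name are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical descriptor table: selector -> base that descriptor describes.
# Selector 43 is the value used in the gdb example in the thread.
DESCRIPTOR_BASES = {43: 0x0}


@dataclass
class ThreadState:
    gs: int = 0
    gs_base: int = 0


def write_gs(state, selector):
    """Selector write that also reloads the base (simplified model)."""
    state.gs = selector
    if selector != 0:
        # Assumption: a nonzero selector overwrites gs_base with the
        # descriptor's base; a zero selector leaves gs_base untouched.
        state.gs_base = DESCRIPTOR_BASES.get(selector, 0)


def write_gs_base(state, base):
    state.gs_base = base


def setregs_struct_order(state, saved):
    """Restore in struct-layout order: gs_base first, then gs."""
    write_gs_base(state, saved.gs_base)
    write_gs(state, saved.gs)  # clobbers the base if saved.gs != 0


def setregs_selector_first(state, saved):
    """Restore in the opposite order: gs first, then gs_base."""
    write_gs(state, saved.gs)
    write_gs_base(state, saved.gs_base)


def write_gs_preserving_base(state, selector):
    """Selector write that treats selector and base as separate values:
    read the base, write the selector, write the base back."""
    base = state.gs_base
    write_gs(state, selector)
    write_gs_base(state, base)


def setregs_separate(state, saved):
    """Struct-layout order again, but the selector write keeps the base."""
    write_gs_base(state, saved.gs_base)
    write_gs_preserving_base(state, saved.gs)
```

With a saved state of gs == 43 and a gs_base that was set independently of
the descriptor, setregs_struct_order() loses the base, while
setregs_selector_first() and setregs_separate() both keep it.  In the last
variant the order of the two writes no longer matters in this model.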