From: Andy Lutomirski
Date: Sun, 30 Jul 2017 20:05:43 -0700
Subject: FSGSBASE ABI considerations
To: "Bae, Chang Seok", X86 ML, "linux-kernel@vger.kernel.org",
    Linus Torvalds, Borislav Petkov, Brian Gerst, Stas Sergeev

Hi all-

Chang wants to get the FSGSBASE patches in.  Here's a bit of a brain
dump on what I think the relevant considerations are and why I haven't
sent out my patches.

----- Background -----

Setting CR4.FSGSBASE has two major advantages and one major
disadvantage.  The major advantages are:

 - We can avoid some WRMSR instructions in the context switch path,
   which makes a rather large difference.

 - User code can use the new RD/WR FS/GS BASE instructions.
   Apparently some users really want this for, umm, userspace
   threading.  Think Java.

The major disadvantage is that user code can use the new instructions.
Now userspace is going to do totally stupid shite like writing some
nonzero value to GS and then doing WRGSBASE, or like linking some
idiotic library that uses WRGSBASE into a perfectly innocent program
like dosemu2, resulting in utterly nonsensical descriptor state.

In Windows, supposedly the scheduler reserves the right to do
arbitrarily awful things to you if you use WRFSBASE or WRGSBASE
inappropriately.  Andi took a similar approach in his original
FSGSBASE patches.  I think this is wrong and we need to have sensible,
documented, and tested behavior for what happens when you use the new
instructions.

For simplicity, the text below talks about WRGSBASE and ignores
WRFSBASE.  The ABI considerations are identical, even if the kernel
implementation details are different.

----- Requirements -----

In my book, there's only one sensible choice for what happens when you
are scheduled out and back in on a Linux system with FSGSBASE enabled:
all of your descriptors end up *exactly* the way they were when you
scheduled out.

ptrace users need to keep working.  It would be nice if existing gdb
versions continue to work right when user code uses WRGSBASE, but it
might be okay if a new ptrace interface is needed.  The existing
regset ABI is exactly backwards from what it needs to be to make this
easy.

----- Interaction with modify_ldt() -----

The first sticking point we'll hit is modify_ldt() and, in particular,
what happens if you call modify_ldt() to change the base of a segment
that is loaded into gs by another thread in the same mm.

Our current behavior here is nonsensical: on 32-bit kernels, FS would
be fully refreshed on other threads and GS might be, depending on
compiler options.  On 64-bit kernels, neither FS nor GS is immediately
refreshed.  Historically, we didn't refresh anything reliably.  On the
bright side, this means that existing modify_ldt() users are (AFAIK)
tolerant of somewhat crazy behavior.

On an FSGSBASE-enabled system, I think we need to provide
deterministic, documented, tested behavior.  I can think of three
plausible choices:

1a. modify_ldt() immediately updates FSBASE and GSBASE on all threads
    that reference the modified selector.
1b. modify_ldt() immediately updates FSBASE and GSBASE on all threads
    that reference the LDT.
2. modify_ldt() leaves FSBASE and GSBASE alone on all threads.

(2) is trivial to implement, whereas (1a) and (1b) are a bit nasty to
implement when FSGSBASE is on.

The tricky bit is that 32-bit kernels can't do (2), so, if we want
modify_ldt() to behave the same on 32-bit and 64-bit kernels, we're
stuck with (1).

(I think we can implement (2) with acceptable performance on 64-bit
non-FSGSBASE kernels if we wanted to.)

Thoughts?

----- Interaction with ptrace -----

struct user_regs_struct looks like this:

    ...
    unsigned long fs_base;
    unsigned long gs_base;
    unsigned long ds;
    unsigned long es;
    unsigned long fs;
    unsigned long gs;
    ...

This means that, when gdb saves away a regset and reloads it using
PTRACE_SETREGS or similar, the effect is to load gs_base and then load
gs.  If gs != 0, this will blow away gs_base.  Without FSGSBASE, this
doesn't matter so much.  With FSGSBASE, it means that using gdb to do,
say, 'print func()' may corrupt gsbase.

What, if anything, should we do about this?  One option would be to
make gs_base be accurate all the time (it currently isn't) and teach
PTRACE_SETREGS to restore in the opposite order despite the struct
layout.

Thoughts?
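
To make the restore-order problem concrete, here is a toy userspace
model in C.  It is only a sketch under stated assumptions, not kernel
code and not a real ptrace interface: every model_* name, the
four-entry descriptor table, and its base values are invented for
illustration.  The one behavior it encodes is the one described above:
loading a nonzero selector into GS replaces the base with the
descriptor's base, whereas WRGSBASE changes only the base.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy model; none of these names exist in the kernel. */
#define MODEL_NSEL 4

/* Made-up descriptor bases, indexed directly by "selector". */
static const uint64_t model_desc_base[MODEL_NSEL] = {
    0, 0x1000, 0x2000, 0x3000
};

struct model_cpu {
    uint16_t gs;        /* selector value */
    uint64_t gs_base;   /* base actually in effect */
};

/* Same relative order as gs_base ... gs in user_regs_struct. */
struct model_regs {
    uint64_t gs_base;
    uint16_t gs;
};

/*
 * Model of loading a selector into GS.  A nonzero selector reloads the
 * base from the descriptor table.  The null-selector case is left
 * alone, following the "if gs != 0" condition in the text; this model
 * does not try to capture what real CPUs do there.
 */
static void model_load_gs(struct model_cpu *cpu, uint16_t sel)
{
    assert(sel < MODEL_NSEL);
    cpu->gs = sel;
    if (sel != 0)
        cpu->gs_base = model_desc_base[sel];
}

/* Model of WRGSBASE: changes the base, leaves the selector alone. */
static void model_wrgsbase(struct model_cpu *cpu, uint64_t base)
{
    cpu->gs_base = base;
}

/* Snapshot the register state, as a debugger would before a call. */
static struct model_regs model_getregs(const struct model_cpu *cpu)
{
    struct model_regs r = { cpu->gs_base, cpu->gs };
    return r;
}

/* Restore in struct order: base first, then selector. */
static void model_setregs_struct_order(struct model_cpu *cpu,
                                       const struct model_regs *r)
{
    model_wrgsbase(cpu, r->gs_base);
    model_load_gs(cpu, r->gs);      /* clobbers the base if gs != 0 */
}

/* Restore in the opposite order: selector first, then base. */
static void model_setregs_reverse_order(struct model_cpu *cpu,
                                        const struct model_regs *r)
{
    model_load_gs(cpu, r->gs);
    model_wrgsbase(cpu, r->gs_base);
}
```

In this model, a thread that loads selector 1 and then uses
model_wrgsbase() to set a base of 0xdeadbeef comes back from a
struct-order restore with base 0x1000 (the descriptor's value), while
the reverse-order restore gives back 0xdeadbeef with the selector
still 1.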