From: Andy Lutomirski
Date: Mon, 21 Mar 2016 11:39:07 -0700
Subject: Re: Updated version of RD/WR FS/GS BASE patchkit
To: Andi Kleen
Cc: X86 ML, "linux-kernel@vger.kernel.org"
In-Reply-To: <1458576969-13309-1-git-send-email-andi@firstfloor.org>

On Mon, Mar 21, 2016 at 9:16 AM, Andi Kleen wrote:
> This is a reworked version of my older fsgsbase patchkit.
> Main changes:
> - Ported to new entry/* code, which simplified it somewhat
> - Now has a test program
> - Fixed ptrace/core dump support
> - Better documentation
> - Some minor fixes improvement

I think that the biggest remaining issue is to define the semantics.

As an architectural matter, the relevant user state is (fs selector,
fs base, gs selector, gs base). With FSGSBASE enabled, user code can
more or less independently control all four of those values. (It's
slightly more complicated than that because set_thread_area and
modify_ldt both forget to reload segment registers IIRC, but we can
fix that independently.)

Keeping in mind that we'll probably want to add percpu segment bases
at some point (to allow very fast atomic percpu data access for user
code), the questions I have are:

1a. What happens when a task switches out and back in on the same CPU?

1b. What happens when a task switches out and back in on a different
CPU?

2a. What happens when a tracer reads the state out and writes exactly
the same thing back in and the task resumes on the CPU it started on?

2b. What happens when a tracer reads the state out and writes exactly
the same thing back in and the task resumes on a different CPU?

3. What happens if fs or gs points to a real descriptor and that
descriptor changes?

4. Does the sigcontext format need to change?

For maximum safety, comprehensibility, and sanity, there's an argument
to be made that 1a and 2a should leave the state exactly as it started
and that 1b and 2b should leave it alone unless percpu bases are in
use.

For maximum simplicity of implementation, there's an argument that, if
the fs or gs selector is nonzero and the base doesn't match the
in-memory descriptor, then the kernel can do whatever it wants.

I propose the following semantics:

 - All "save state" or "report state" events unconditionally save the
   base and selector as they actually were in the CPU state. (Keep it
   simple. Also, with these patches applied, on an FSGSBASE-capable
   CPU, selector != 0 is a slow path.)

 - When restoring state, if selector == 0, then the base is restored
   as it was.

 - When restoring state, if selector != 0, then the base is restored
   to whatever the in-memory descriptor says. (Optionally, down the
   road, we could make it so that a save + restore without an
   intervening migration, set_thread_area, or modify_ldt would restore
   the base as it was. This would make things more predictable.)

 - If/when we add percpu bases, they are associated with a nonzero
   selector.

The big open question is: should signal delivery and restore do
anything to the selectors or bases? I think that, by default, it
can't, but maybe we'll want an option to do it some day.

Does all this make sense? Do people agree with me?
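
To make the proposed save/restore rules concrete, here is a rough
userspace model of them. This is only a sketch, not kernel code:
struct seg_state, seg_save(), seg_restore(), and the desc_base
parameter are all hypothetical names made up for illustration, and
desc_base simply stands in for "the base that the in-memory GDT/LDT
descriptor for this selector currently holds".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one segment register's user-visible state.
 * This is NOT an existing kernel structure. */
struct seg_state {
	uint16_t selector;	/* fs or gs selector */
	uint64_t base;		/* fs or gs base */
};

/* "Save state" / "report state": unconditionally record the selector
 * and base exactly as they are in the (modeled) CPU state. */
static struct seg_state seg_save(const struct seg_state *cpu)
{
	assert(cpu != NULL);
	return *cpu;
}

/* "Restore state" under the proposed rules:
 *   selector == 0: restore the saved base as it was.
 *   selector != 0: ignore the saved base and use the base from the
 *                  in-memory descriptor (assumed to be desc_base). */
static struct seg_state seg_restore(const struct seg_state *saved,
				    uint64_t desc_base)
{
	struct seg_state cpu;

	assert(saved != NULL);
	cpu.selector = saved->selector;
	cpu.base = saved->selector == 0 ? saved->base : desc_base;
	return cpu;
}
```

In this model, a save + restore with selector == 0 returns the base
unchanged, which covers a base written with WRFSBASE or WRGSBASE. With
a nonzero selector the descriptor's base wins, so a base that user code
wrote after loading the selector does not survive the round trip.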