Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752154AbcJDCKJ (ORCPT ); Mon, 3 Oct 2016 22:10:09 -0400 Received: from mail-vk0-f52.google.com ([209.85.213.52]:33930 "EHLO mail-vk0-f52.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751393AbcJDCKH (ORCPT ); Mon, 3 Oct 2016 22:10:07 -0400 MIME-Version: 1.0 In-Reply-To: <1475544579.21644.37.camel@redhat.com> References: <1475353895-22175-1-git-send-email-riel@redhat.com> <1475353895-22175-3-git-send-email-riel@redhat.com> <1475366900.21644.8.camel@redhat.com> <1475529679.4622.27.camel@redhat.com> <1475544579.21644.37.camel@redhat.com> From: Andy Lutomirski Date: Mon, 3 Oct 2016 19:09:46 -0700 Message-ID: Subject: Re: [PATCH RFC 2/5] x86,fpu: delay FPU register loading until switch to userspace To: Rik van Riel Cc: Dave Hansen , "linux-kernel@vger.kernel.org" , X86 ML , Thomas Gleixner , Paolo Bonzini , Ingo Molnar , Andrew Lutomirski , "H. Peter Anvin" , Borislav Petkov Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 8189 Lines: 229 On Mon, Oct 3, 2016 at 6:29 PM, Rik van Riel wrote: > On Mon, 2016-10-03 at 14:36 -0700, Andy Lutomirski wrote: >> On Mon, Oct 3, 2016 at 2:21 PM, Rik van Riel wrote: >> > >> > > >> > > One of my objections to the current code structure is that I find >> > > it >> > > quite hard to keep track of all of the possible states of the fpu >> > > that >> > > we have. Off the top of my head, we have: >> > > >> > > 1. In-cpu state is valid and in-memory state is invalid. (This >> > > is >> > > the >> > > case when we're running user code, for example.) >> > > 2. In-memory state is valid. >> > > 2a. In-cpu state is *also* valid for some particular task. In >> > > this >> > > state, no one may write to either FPU state copy for that task >> > > lest >> > > they get out of sync. >> > > 2b. In-cpu state is not valid. >> > >> > It gets more fun with ptrace. Look at xfpregs_get >> > and xfpregs_set, which call fpu__activate_fpstate_read >> > and fpu__activate_fpstate_write respectively. >> > >> > It looks like those functions already deal with the >> > cases you outline above. >> >> Hmm, maybe they could be used. The names are IMO atrocious. I had >> to >> read the docs and the code to figure out WTF "activating" the >> "fpstate" even meant. > > If you have good naming ideas, I'd be willing to submit > patches to rename them :) > >> > > >> > > Rik, your patches further complicate this by making it possible >> > > to >> > > have the in-cpu state be valid for a non-current task even when >> > > non-idle. >> > >> > However, we never need to examine the in-cpu state for >> > a task that is not currently running, or being scheduled >> > to in the context switch code. >> > >> > We always save the FPU register state when a task is >> > context switched out, so for a non-current task, we >> > still only need to consider whether the in-memory state >> > is valid or not. >> > >> > The in-cpu state does not matter. >> > >> > fpu__activate_fpstate_write will disable lazy restore, >> > and ensure a full restore from memory. >> >> Something has to fix up owner_ctx so that the previous thread doesn't >> think its CPU state is still valid. I don't think >> activate_fpstate_write does it, because nothing should (I think) be >> calling it as a mattor of course on context switch. > > These functions are called on tasks that are ptraced, > and currently stopped. fpu__activate_fpstate_write > sets fpu->last_cpu to -1, which will cause the next > context switch to the task to load the fpstate into > the registers. > > The status in the struct fpu keeps track of the status > of things, though maybe not as explicitly as you would > want. > > Having two separate status booleans for "registers valid" > and "memory valid" may make more sense. I have no problem with the concept of "owner_ctx", and I think it's a perfectly reasonable data structure. My problem with it is that it's subtle and knowledge of it is spread all over the place. Just going with "registers valid" in a variable won't work, I think, because there's nowhere to put it. We need to be able to delete a struct fpu while that struct fpu might have a valid copy in a different cpu's registers. Anyway, feel free to tell me that I'm making this too difficult :) > > Running the task in user space, would have to ensure > "registers valid" is true, and make "memory valid" > false, because userspace could write to the registers. > > Doing a ptrace fpstate_read would make "memory valid" > true, but does not need to invalidate "registers valid". > > Doing a ptrace fpstate_write would make "memory valid" > true, but would invalidate "registers valid". > > Context switching away from a task would make "memory > valid" true, by virtue of copying the fpstate to > memory. > > Ingo, would you have any objection to patches tracking > the validity of both register and memory states > independently? > > We can get rid of fpu.counter, since nobody uses it > any more. We should definitely do this. Maybe getting in some cleanups first (my lazy fpu deletion, fpu.counter removal, etc) first is the way to go. > >> > > This is why I'm thinking that we should maybe revisit the >> > > API a bit to make this clearer and to avoid scattering around all >> > > the >> > > state manipulation quite so much. I propose, as a straw-man: >> > > >> > > user_fpregs_copy_to_cpu() -- Makes the CPU state be valid for >> > > current. >> > > After calling this, we're in state 1 or 2a. >> > > >> > > user_fpregs_move_to_cpu() -- Makes the CPU state be valid and >> > > writable >> > > for current. Invalidates any in-memory copy. After calling >> > > this, >> > > we're in state 1. >> > >> > How can we distinguish between these two? >> > >> > If the task is running, it can change its FPU >> > registers at any time, without any obvious thing >> > to mark the transition from state 2a to state 1. >> >> That's true now, but I don't think it's really true any more with >> your >> patches. If a task is running in *kernel* mode, then I think that >> pretty much all of the combinations are possible. When the task >> actually enters user mode, we have to make sure we're in state 1 so >> the CPU regs belong to the task and are writable. > > That is fair. > >> > It sounds like the first function you name would >> > only be useful if we prevented a switch to user >> > space, which could immediately do whatever it wants >> > with the registers. >> > >> > Is there a use case for user_fpregs_copy_to_cpu()? >> >> With PKRU, any user memory access would need >> user_fpregs_copy_to_cpu(). I think we should get rid of this and >> just >> switch to using RDPKRU and WRPKRU to eagerly update PKRU in every >> context switch to avoid this. Other than that, I can't think of any >> use cases off the top of my head unless the signal code wanted to >> play >> games with this stuff. > > You are right, read_pkru() and write_pkru() can only deal with > the pkru state being present in registers. Is this because of an > assumption in the code, or because of a hardware requirement? > >> > > user_fpregs_copy_to_memory() -- Makes the in-memory state be >> > > valid >> > > for >> > > current. After calling this, we're in state 2a or 2b. >> > > >> > > user_fpregs_move_to_memory() -- Makes the in-memory state be >> > > valid >> > > and >> > > writable for current. After calling this, we're in state 2b. >> > >> > We can only ensure that the in memory copy stays >> > valid, by stopping the task from writing to the >> > in-CPU copy. >> > >> > This sounds a lot like fpu__activate_fpstate_read >> > >> > What we need after copying the contents to memory is >> > a way to invalidate the in-CPU copy, if changes were >> > made to the in-memory copy. >> >> My proposal is that we invalidate the CPU state *before* writing the >> in-memory state, not after. > > Modifying the in-memory FPU state can only be done on a task > that is not currently running in user space. As long as the > modification is done before task runs again, that is safe. > > I am probably missing some special case you are concerned about. > >> > This is already done by fpu__activate_fpstate_write >> > >> > > >> > > Scheduling out calls user_fpregs_copy_to_memory(), as do all the >> > > MPX >> > > and PKRU memory state readers. MPX and PKRU will do >> > > user_fpregs_move_to_memory() before writing to memory. >> > > kernel_fpu_begin() does user_fpregs_move_to_memory(). >> > >> > This is done by switch_fpu_prepare. >> > >> > Can you think of any other place in the kernel where we >> > need a 1 -> 2a transition, besides context switching and >> > ptrace/xfpregs_get? >> >> Anything else that tries to read task xstate from memory, i.e. MPX >> and >> PKRU. (Although if we switch to eager-switched PKRU, then PKRU stops >> mattering for this purpose.) >> >> Actually, I don't see any way your patches can be compatible with >> PKRU >> without switching to eager-switched PKRU. > > My patches preserve eager saving of the FPU state, they only > make restore lazy. That is 1 -> 2a transitions continue to > be eager. > > -- > All Rights Reversed. -- Andy Lutomirski AMA Capital Management, LLC