Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752195AbcJDB3o (ORCPT ); Mon, 3 Oct 2016 21:29:44 -0400 Received: from mx1.redhat.com ([209.132.183.28]:40618 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751521AbcJDB3n (ORCPT ); Mon, 3 Oct 2016 21:29:43 -0400 Message-ID: <1475544579.21644.37.camel@redhat.com> Subject: Re: [PATCH RFC 2/5] x86,fpu: delay FPU register loading until switch to userspace From: Rik van Riel To: Andy Lutomirski Cc: Dave Hansen , "linux-kernel@vger.kernel.org" , X86 ML , Thomas Gleixner , Paolo Bonzini , Ingo Molnar , Andrew Lutomirski , "H. Peter Anvin" , Borislav Petkov Date: Mon, 03 Oct 2016 21:29:39 -0400 In-Reply-To: References: <1475353895-22175-1-git-send-email-riel@redhat.com> <1475353895-22175-3-git-send-email-riel@redhat.com> <1475366900.21644.8.camel@redhat.com> <1475529679.4622.27.camel@redhat.com> Content-Type: multipart/signed; micalg="pgp-sha256"; protocol="application/pgp-signature"; boundary="=-mowIYilVUhiwGh7UKTBb" Mime-Version: 1.0 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.32]); Tue, 04 Oct 2016 01:29:42 +0000 (UTC) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 8274 Lines: 239 --=-mowIYilVUhiwGh7UKTBb Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Mon, 2016-10-03 at 14:36 -0700, Andy Lutomirski wrote: > On Mon, Oct 3, 2016 at 2:21 PM, Rik van Riel wrote: > >=C2=A0 > > >=20 > > > One of my objections to the current code structure is that I find > > > it > > > quite hard to keep track of all of the possible states of the fpu > > > that > > > we have.=C2=A0=C2=A0Off the top of my head, we have: > > >=20 > > > 1. In-cpu state is valid and in-memory state is invalid.=C2=A0=C2=A0(= This > > > is > > > the > > > case when we're running user code, for example.) > > > 2. In-memory state is valid. > > > 2a. In-cpu state is *also* valid for some particular task.=C2=A0=C2= =A0In > > > this > > > state, no one may write to either FPU state copy for that task > > > lest > > > they get out of sync. > > > 2b. In-cpu state is not valid. > >=20 > > It gets more fun with ptrace. Look at xfpregs_get > > and xfpregs_set, which call fpu__activate_fpstate_read > > and fpu__activate_fpstate_write respectively. > >=20 > > It looks like those functions already deal with the > > cases you outline above. >=20 > Hmm, maybe they could be used.=C2=A0=C2=A0The names are IMO atrocious.=C2= =A0=C2=A0I had > to > read the docs and the code to figure out WTF "activating" the > "fpstate" even meant. If you have good naming ideas, I'd be willing to submit patches to rename them :) > > >=20 > > > Rik, your patches further complicate this by making it possible > > > to > > > have the in-cpu state be valid for a non-current task even when > > > non-idle. > >=20 > > However, we never need to examine the in-cpu state for > > a task that is not currently running, or being scheduled > > to in the context switch code. > >=20 > > We always save the FPU register state when a task is > > context switched out, so for a non-current task, we > > still only need to consider whether the in-memory state > > is valid or not. > >=20 > > The in-cpu state does not matter. > >=20 > > fpu__activate_fpstate_write will disable lazy restore, > > and ensure a full restore from memory. >=20 > Something has to fix up owner_ctx so that the previous thread doesn't > think its CPU state is still valid.=C2=A0=C2=A0I don't think > activate_fpstate_write does it, because nothing should (I think) be > calling it as a mattor of course on context switch. These functions are called on tasks that are ptraced, and currently stopped. fpu__activate_fpstate_write sets fpu->last_cpu to -1, which will cause the next context switch to the task to load the fpstate into the registers. The status in the struct fpu keeps track of the status of things, though maybe not as explicitly as you would want. Having two separate status booleans for "registers valid" and "memory valid" may make more sense. Running the task in user space, would have to ensure "registers valid" is true, and make "memory valid" false, because userspace could write to the registers. Doing a ptrace fpstate_read would make "memory valid" true, but does not need to invalidate "registers valid". Doing a ptrace fpstate_write would make "memory valid" true, but would invalidate "registers valid". Context switching away from a task would make "memory valid" true, by virtue of copying the fpstate to memory. Ingo, would you have any objection to patches tracking the validity of both register and memory states independently? We can get rid of fpu.counter, since nobody uses it any more. > > > =C2=A0This is why I'm thinking that we should maybe revisit the > > > API a bit to make this clearer and to avoid scattering around all > > > the > > > state manipulation quite so much.=C2=A0=C2=A0I propose, as a straw-ma= n: > > >=20 > > > user_fpregs_copy_to_cpu() -- Makes the CPU state be valid for > > > current. > > > After calling this, we're in state 1 or 2a. > > >=20 > > > user_fpregs_move_to_cpu() -- Makes the CPU state be valid and > > > writable > > > for current.=C2=A0=C2=A0Invalidates any in-memory copy.=C2=A0=C2=A0Af= ter calling > > > this, > > > we're in state 1. > >=20 > > How can we distinguish between these two? > >=20 > > If the task is running, it can change its FPU > > registers at any time, without any obvious thing > > to mark the transition from state 2a to state 1. >=20 > That's true now, but I don't think it's really true any more with > your > patches.=C2=A0=C2=A0If a task is running in *kernel* mode, then I think t= hat > pretty much all of the combinations are possible.=C2=A0=C2=A0When the tas= k > actually enters user mode, we have to make sure we're in state 1 so > the CPU regs belong to the task and are writable. That is fair. > > It sounds like the first function you name would > > only be useful if we prevented a switch to user > > space, which could immediately do whatever it wants > > with the registers. > >=20 > > Is there a use case for user_fpregs_copy_to_cpu()? >=20 > With PKRU, any user memory access would need > user_fpregs_copy_to_cpu().=C2=A0=C2=A0I think we should get rid of this a= nd > just > switch to using RDPKRU and WRPKRU to eagerly update PKRU in every > context switch to avoid this.=C2=A0=C2=A0Other than that, I can't think o= f any > use cases off the top of my head unless the signal code wanted to > play > games with this stuff. You are right, read_pkru() and write_pkru() can only deal with the pkru state being present in registers. Is this because of an assumption in the code, or because of a hardware requirement? > > > user_fpregs_copy_to_memory() -- Makes the in-memory state be > > > valid > > > for > > > current.=C2=A0=C2=A0After calling this, we're in state 2a or 2b. > > >=20 > > > user_fpregs_move_to_memory() -- Makes the in-memory state be > > > valid > > > and > > > writable for current.=C2=A0=C2=A0After calling this, we're in state 2= b. > >=20 > > We can only ensure that the in memory copy stays > > valid, by stopping the task from writing to the > > in-CPU copy. > >=20 > > This sounds a lot like fpu__activate_fpstate_read > >=20 > > What we need after copying the contents to memory is > > a way to invalidate the in-CPU copy, if changes were > > made to the in-memory copy. >=20 > My proposal is that we invalidate the CPU state *before* writing the > in-memory state, not after. Modifying the in-memory FPU state can only be done on a task that is not currently running in user space. As long as the modification is done before task runs again, that is safe. I am probably missing some special case you are concerned about. > > This is already done by fpu__activate_fpstate_write > >=20 > > >=20 > > > Scheduling out calls user_fpregs_copy_to_memory(), as do all the > > > MPX > > > and PKRU memory state readers.=C2=A0=C2=A0MPX and PKRU will do > > > user_fpregs_move_to_memory() before writing to memory. > > > kernel_fpu_begin() does user_fpregs_move_to_memory(). > >=20 > > This is done by switch_fpu_prepare. > >=20 > > Can you think of any other place in the kernel where we > > need a 1 -> 2a transition, besides context switching and > > ptrace/xfpregs_get? >=20 > Anything else that tries to read task xstate from memory, i.e. MPX > and > PKRU.=C2=A0=C2=A0(Although if we switch to eager-switched PKRU, then PKRU= stops > mattering for this purpose.) >=20 > Actually, I don't see any way your patches can be compatible with > PKRU > without switching to eager-switched PKRU. My patches preserve eager saving of the FPU state, they only make restore lazy. That is 1 -> 2a transitions continue to be eager. --=20 All Rights Reversed. --=-mowIYilVUhiwGh7UKTBb Content-Type: application/pgp-signature; name="signature.asc" Content-Description: This is a digitally signed message part Content-Transfer-Encoding: 7bit -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQEcBAABCAAGBQJX8wYDAAoJEM553pKExN6DVE0IAIAj0wm9tlJjRPgyCpNuFkyX eriu9b4lm14QgPCzm8k0ND6RpzFOHXgZcxSDEh8rWqRVYt5wYBerzpXYz6mcbQA9 gpWvlBdt28pPUbyX2F9eqphjrT4I8MAShV5ji7NpfRCs/V7Znnx9xHmwDj0hb3h5 o1Itm8BRRDohAsoBLu3dQTuFNzMuzngMVwJNX2BBjnDk24kyU3VVgaT16qJ4Bzaj 4DHuuRLGe7lUTbGh039QbR8uNkPE7hLrDfAt3ojCS2KSe47S6ykOXYGC/JFBr2TR iwV2ZdnrLvDuSMZwd2LA49vg4LUKRPQ8aQ2XA+FnLaNkoqn8v0XPcrxKzAOBEuY= =UPyH -----END PGP SIGNATURE----- --=-mowIYilVUhiwGh7UKTBb--