Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752138AbcJBAIa (ORCPT ); Sat, 1 Oct 2016 20:08:30 -0400 Received: from mx1.redhat.com ([209.132.183.28]:58834 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751263AbcJBAIY (ORCPT ); Sat, 1 Oct 2016 20:08:24 -0400 Message-ID: <1475366900.21644.8.camel@redhat.com> Subject: Re: [PATCH RFC 2/5] x86,fpu: delay FPU register loading until switch to userspace From: Rik van Riel To: Andy Lutomirski Cc: "linux-kernel@vger.kernel.org" , X86 ML , Thomas Gleixner , Paolo Bonzini , Ingo Molnar , Andrew Lutomirski , "H. Peter Anvin" , Dave Hansen , Borislav Petkov Date: Sat, 01 Oct 2016 20:08:20 -0400 In-Reply-To: References: <1475353895-22175-1-git-send-email-riel@redhat.com> <1475353895-22175-3-git-send-email-riel@redhat.com> Content-Type: multipart/signed; micalg="pgp-sha256"; protocol="application/pgp-signature"; boundary="=-PO385DLOKFsB+HEfKaAW" Mime-Version: 1.0 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.29]); Sun, 02 Oct 2016 00:08:24 +0000 (UTC) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 6129 Lines: 161 --=-PO385DLOKFsB+HEfKaAW Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Sat, 2016-10-01 at 16:44 -0700, Andy Lutomirski wrote: > On Sat, Oct 1, 2016 at 1:31 PM,=C2=A0=C2=A0 wrote: > >=C2=A0 > > =C2=A0#define CREATE_TRACE_POINTS > > =C2=A0#include > > @@ -197,6 +198,9 @@ __visible inline void > > prepare_exit_to_usermode(struct pt_regs *regs) > > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0if (unlikely(cached_fla= gs & EXIT_TO_USERMODE_LOOP_FLAGS)) > > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0exit_to_usermode_loop(regs, cached_flags); > >=20 > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0if (unlikely(test_and_clear_= thread_flag(TIF_LOAD_FPU))) > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0switch_fpu_return(); > > + >=20 > How about: >=20 > if (unlikely(...)) { > =C2=A0 exit_to_usermode_loop(regs, cached_flags); > =C2=A0 cached_flags =3D READ_ONCE(ti->flags); > } >=20 > if (ti->flags & _TIF_LOAD_FPU) { > =C2=A0 clear_thread_flag(TIF_LOAD_FPU); > =C2=A0 switch_fpu_return(); > } It looks like test_thread_flag should be fine for avoiding atomics, too? =C2=A0 if (unlikely(test_thread_flag(TIF_LOAD_FPU))) { =C2=A0 =C2=A0 =C2=A0 clear_thread_flag(TIF_LOAD_FPU); =C2=A0 =C2=A0 =C2=A0 switch_fpu_return(); =C2=A0 } > > +static inline void switch_fpu_finish(void) > > =C2=A0{ > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0set_thread_flag(TIF_LOAD_FPU= ); > > =C2=A0} >=20 > I can imagine this causing problems with kernel code that accesses > current's FPU state, e.g. get_xsave_field_ptr().=C2=A0=C2=A0I wonder if i= t > would > make sense to make your changes deeper into the FPU core innards so > that, for example, we'd have explicit functions that cause the Whereabouts do you have in mind? I like your general idea, but am not sure quite where to put it. > in-memory state for current to be up-to-date and readable, to cause > the in-memory state to be up-to-date and writable (which is the same > thing + TIF_FPU_LOAD + whatever other bookkeeping), and causing the > in-CPU state to be up-to-date (possibly readable and writable). > TIF_LOAD_FPU would trigger the latter. >=20 > I've often found it confusing that fpu__save by itself has somewhat > ill-defined effects. Fully agreed on both. I guess that means we want a few different functions: 1) initialize FPU state, both in memory and in registers 2) cause FPU state in registers to be updated from memory, =C2=A0 =C2=A0if necessary 3) cause FPU state in memory to be updated from registers, =C2=A0 =C2=A0if necessary The latter functions could use fpu->fpregs_active to see whether any action is required, and be called from various places in the code. The signal code in particular is doing some strange things that should probably be hidden away in FPU specific functions. >=20 > >=20 > > +/* > > + * Set up the userspace FPU context before returning to userspace. > > + */ > > +void switch_fpu_return(void) > > +{ > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0struct fpu *fpu =3D ¤t= ->thread.fpu; > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0bool preload; > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0/* > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0* If the task has used= the math, pre-load the FPU on xsave > > processors > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0* or if the past 5 con= secutive context-switches used math. > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0*/ > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0preload =3D static_cpu_has(X= 86_FEATURE_FPU) && > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0fpu->fpstate_active && > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0(use_eager_fpu() || fpu->counter > 5); > > + > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0if (preload) { > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0prefetch(&fpu->state); > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0fpu->counter++; > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0__fpregs_activate(fpu); > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0trace_x86_fpu_regs_activated(fpu); > > + > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0/* Don't change CR0.TS if we just switch! */ > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0if (!__this_cpu_read(fpu_active)) { > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0__fpre= gs_activate_hw(); > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0__this= _cpu_write(fpu_active, true); > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0} >=20 > We should just finish getting rid of all TS uses. Agreed. I will rebase on top of your FPU changes, that will make things easier for everybody. --=20 All Rights Reversed. --=-PO385DLOKFsB+HEfKaAW Content-Type: application/pgp-signature; name="signature.asc" Content-Description: This is a digitally signed message part Content-Transfer-Encoding: 7bit -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQEcBAABCAAGBQJX8E/1AAoJEM553pKExN6DWSMH/jTywkE2TQJtsiLEvPcNuVei S2fH1wOepDu5hhho7RkK/uK3ajRbHsWTdiibz7i0i7aFS7kfkYK+XSSN4NKlJcYR ENBsww4AszGMw5k8ncmFtLsuRl3nbAUhEraAIjBo3qjIKWnvDA1yYRxR+KQN6UaA ho5beFcg/4ViepIpuUnf5lE/Uu7QWgAgwMHiLNfnNOE5JNF4y6p64fVxpgFAImOf //A+A8HnD/IHZJFbLW70WydjW1TYJKqzgRHTMEOsy6c2AmjQdeaO8kZ9eJ5fpgSW fvAzanBFME4Pov1m8q0g67RshgK1fzun57OBcEusQof2KzOJcLdEsd0sk5UrabY= =tmsk -----END PGP SIGNATURE----- --=-PO385DLOKFsB+HEfKaAW--