Received: by 2002:ac0:a5b6:0:0:0:0:0 with SMTP id m51-v6csp1128459imm; Fri, 15 Jun 2018 11:40:48 -0700 (PDT) X-Google-Smtp-Source: ADUXVKJ+PQ85lsbgvPmuFsSo9J5GwHwFClBHba3udByLfnRxOfPP4Iattvzii1Y8NseD4WWTC0Tx X-Received: by 2002:a63:2ac4:: with SMTP id q187-v6mr2663651pgq.333.1529088048503; Fri, 15 Jun 2018 11:40:48 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1529088048; cv=none; d=google.com; s=arc-20160816; b=fJva+rI7QhVccxYcKnkRJVZmJCwVn2fF0ABJItqvkn1DYYjAvkmUwTUKbFrZzq5QxT f5kY285i+uWOD/Adg7XFWnBSeFse0ARCKt3YFxqphRYHdbaIzwAN4Gzeh5bW9s20eYAB iICpFuu3TWeZBj/7kHYIWCs57l1QazKmnvWtxSa4Io0jLxAMVm6seFzOzeM/T0V09PtZ xy1pIgV8pfgi6nucqiOmxT1AdWrzUG3RTchRSW2tCsJwOPjIs0X9y2EX5JE6jUglYthB pRK1yO0jqML82mVo7ziQBRSFGVlHogCdvHeRb/8Gc5D4YSpEh6CK8shsCBfHVNQzYUuF BJYw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:mime-version:references:in-reply-to:date :cc:to:from:subject:message-id:arc-authentication-results; bh=AA9Jnm74wOmMuVA+CjLzliYF8xGvSaZXFu+CAZqXoNY=; b=LExnTE2+tKcIdvj6aABCFyfR4fiWpthozw+tuktxruzrVsK1P8XLvzuVzvp2/oSi6v MYrHpLUKLKcosoIdKjLYanETW2Azi/di/PKRSGhRU1h2d813QtMbobd4+2baeVbG2PG4 a4ah9g1a+DYJ0HhsxvvakgPbz3j5DpHy9ReepUw9fyyq291XhstxZGFHJbrRPQowcJC1 GKxV1TzP+ONPs1FHke7H569VxThmzb4zVzBp+pFnsqb0AnDBVjc0cN0UAUM179NgADyq alR121RdXCIn1LpTNXvNf7WlXhTZ0DOzyQuv2nb7p2UBotm3+TquhGnOQGuUpHjE2LS3 IElg== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id m5-v6si6678819pgp.269.2018.06.15.11.40.33; Fri, 15 Jun 2018 11:40:48 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756484AbeFOSkH (ORCPT + 99 others); Fri, 15 Jun 2018 14:40:07 -0400 Received: from shelob.surriel.com ([96.67.55.147]:34162 "EHLO shelob.surriel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756418AbeFOSkF (ORCPT ); Fri, 15 Jun 2018 14:40:05 -0400 Received: from imladris.surriel.com ([96.67.55.152]) by shelob.surriel.com with esmtpsa (TLSv1.2:ECDHE-RSA-AES256-GCM-SHA384:256) (Exim 4.90_1) (envelope-from ) id 1fTtdj-0003xb-07; Fri, 15 Jun 2018 14:40:03 -0400 Message-ID: <1529088002.7898.156.camel@surriel.com> Subject: Re: Lazy FPU restoration / moving kernel_fpu_end() to context switch From: Rik van Riel To: Andy Lutomirski , "Jason A. Donenfeld" , Dave Hansen Cc: LKML , X86 ML Date: Fri, 15 Jun 2018 14:40:02 -0400 In-Reply-To: References: Content-Type: multipart/signed; micalg="pgp-sha256"; protocol="application/pgp-signature"; boundary="=-HGSaphGtUIGmRKIBrJtn" X-Mailer: Evolution 3.26.6 (3.26.6-1.fc27) Mime-Version: 1.0 Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org --=-HGSaphGtUIGmRKIBrJtn Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Fri, 2018-06-15 at 11:31 -0700, Andy Lutomirski wrote: > On Fri, Jun 15, 2018 at 6:11 AM Jason A. Donenfeld > wrote: > >=20 > > Hi Andy & folks, > >=20 > > Lots of crypto routines look like this: > >=20 > > kernel_fpu_begin(); > > encrypt(); > > kernel_fpu_end(); > >=20 > > If you call such a routine twice, you get: > >=20 > > kernel_fpu_begin(); > > encrypt(); > > kernel_fpu_end(); > > kernel_fpu_begin(); > > encrypt(); > > kernel_fpu_end(); > >=20 > > In a loop this looks like: > >=20 > > for (thing) { > > kernel_fpu_begin(); > > encrypt(thing); > > kernel_fpu_end(); > > } > >=20 > > This is obviously very bad, because begin() and end() are slow, so > > WireGuard does the obvious: > >=20 > > kernel_fpu_begin(); > > for (thing) > > encrypt(thing); > > kernel_fpu_end(); > >=20 > > This is fine and well, and the crypto API I'm working on will > > enable > > this to be done in a clear way, but I do wonder if maybe this is > > not > > something that should be happening at the level of the caller, but > > rather in the fpu functions themselves. Namely, what are your > > thoughts > > on modifying kernel_fpu_end() so that it doesn't actually restore > > the > > state, but just reenables preemption and marks that on the next > > context switch, the state should be restored? Then, essentially, > > kernel_fpu_begin() and end() become free after the first usage of > > kernel_fpu_begin(). > >=20 > > Is this something feasible? I know that performance-wise, I'm > > really > > gaining a lot from hoisting those functions out of the loops, and > > API > > wise, it'd be slightly simpler to implement if I didn't have to all > > for that hoisting. >=20 > Hi Jason- >=20 > Funny you asked. This has been discussed a few times, although not > quite in the form you imagined. The idea that we've tossed around is > to restore FPU state on return to user mode. Roughly, we'd introduce > a new thread flag TIF_FPU_UNLOADED (name TBD). > prepare_exit_to_usermode() would notice this flag, copy the fpstate > to > fpregs, and clear the flag. (Or maybe exit_to_usermode_loop() -- No > one has quite thought it through, but I think it should be outside > the > loop.) We'd update all the FPU accessors to understand the flag. > We'd probably have a helper like: >=20 > void unload_user_fpstate(void) >=20 > that would do nothing if TIF_FPU_UNLOADED is set. If > TIF_FPU_UNLOADED > is clear, it would move the state to memory and set TIF_FPU_UNLOADED. > We'd call unload_user_fpstate() during context switch in the prev > task > context and in kernel_fpu_begin(). >=20 > The one major complication I know of is that PKRU is different from > all other FPU state in that it needs to be loaded when *kernel* code > does any user memory access. So we'd need to make sure that context > switches load the new task's PKRU eagerly. Using WRPKRU is easy, > but, > unless we do something very clever, actually finding PKRU in the > in-memory fpstate image may be slightly nontrivial. IIRC the in-kernel FPU state always has every FPU field at a constant location. > I suppose we could eagerly load the new FPU state on context > switches, > but I'm not sure I see any point. We'd still have to load it when we > switch back to user mode, so it would be slower but not necessarily > any simpler. >=20 > I'd love to review patches to make this change, but I don't have the > bandwidth to write them right now. I can take a look at this, I am already looking at some context switch code right now, anyway. I also should still have the TIF_FPU_UNLOADED patches (or whatever the flag was called) code around somewhere. --=20 All Rights Reversed. --=-HGSaphGtUIGmRKIBrJtn Content-Type: application/pgp-signature; name="signature.asc" Content-Description: This is a digitally signed message part Content-Transfer-Encoding: 7bit -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEKR73pCCtJ5Xj3yADznnekoTE3oMFAlskCAIACgkQznnekoTE 3oMjuwf/SbbTYUatQ9ktCLm9yg78ti/RzOTxPZ25MNwDw9k8n5eUM5LBlxGYrG7o IyNccVyzBaihjGtFNiorjyV7/p5TuAdKZEk6GZYTO7xaGSsH5XVH+tc2TpZ/I4IS HqDtBKU6vWOcweCHtiZW9xunMLm+UBcWky1zyfAwfq14FY77OM4jpNSxKEv7bFgp wDoHm3GCXKKv7QqmIEBm6X93Wa5jpNkxmHzKZx4VET34YMSvPVVNu6YQGMerJ5Id 0D0F2Cj3RD478xbjDC/E3lNZD7a5AEq8AA8iQvAmVDD4dKTyNDl8SHIAHP0XNbmB OplSYr0nVHf5PtYa4oGxAfUkzQttGQ== =g/fG -----END PGP SIGNATURE----- --=-HGSaphGtUIGmRKIBrJtn--