From: "Jason A. Donenfeld"
Date: Mon, 18 Jun 2018 17:25:53 +0200
Subject: Re: Lazy FPU restoration / moving kernel_fpu_end() to context switch
To: Peter Zijlstra
Cc: Thomas Gleixner, LKML, X86 ML, Andy Lutomirski
References: <20180615193438.GE2458@hirez.programming.kicks-ass.net> <20180618094447.GG2458@hirez.programming.kicks-ass.net>
In-Reply-To: <20180618094447.GG2458@hirez.programming.kicks-ass.net>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Jun 18, 2018 at 11:44 AM Peter Zijlstra wrote:
>
> On Fri, Jun 15, 2018 at 10:30:46PM +0200, Jason A. Donenfeld wrote:
> > On Fri, Jun 15, 2018 at 9:34 PM Peter Zijlstra wrote:
> > > Didn't we recently do a bunch of crypto patches to help with this?
> > >
> > > I think they had the pattern:
> > >
> > >     kernel_fpu_begin();
> > >     for (units-of-work) {
> > >             do_unit_of_work();
> > >             if (need_resched()) {
> > >                     kernel_fpu_end();
> > >                     cond_resched();
> > >                     kernel_fpu_begin();
> > >             }
> > >     }
> > >     kernel_fpu_end();
> >
> > Right, so that's the thing -- this is an optimization easily available
> > to individual crypto primitives. But I'm interested in applying this
> > kind of optimization to an entire queue of, say, tiny packets, where
> > each packet is processed individually. Or, to a cryptographic
> > construction, where several different primitives are used, such that
> > it'd be meaningful not to have to get the performance hit of
> > end()begin() in between each and everyone of them.
>
> I'm confused.. how does the above not apply to your situation?

In the example you've given, the optimization is applied at the level
of, say, the encryption function. Suppose you send a scatter-gather
list off to an encryption function, which then walks the sglist and
encrypts each of the parts using some particular key. For each of the
parts, it benefits from the above optimization.

But what I'm referring to is encrypting multiple different things,
with different keys. In the case I'd like to optimize, I have a worker
thread that's processing a large queue of separate sglists and
encrypting them separately under different keys. In this case, having
kernel_fpu_begin/end inside the encryption function itself is a
problem, since that means toggling the FPU for every queue item. The
solution, for now, is to just hoist the kernel_fpu_begin/end out of
the encryption function, and put them instead at the beginning and end
of my worker thread that handles all the items of the queue.

This is fine and dandy, but far from ideal, as putting that kind of
logic inside the encryption function itself makes more sense. For
example, the encryption function can decide whether or not it even
wants to use the FPU before calling kernel_fpu_begin. Ostensibly this
logic too could be hoisted outside, but at what point do you draw the
line and decide these optimizations are leading the API in the wrong
direction?

Hence, the idea here in this thread is to make it cost-free to place
kernel_fpu_begin/end as close as possible to the actual use of the
FPU. The solution, it seems, is to have the actual kernel_fpu_end work
occur on context switch.
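
To make the cost difference concrete, here is a rough userspace sketch
of the accounting I have in mind. It is not kernel code and not a
proposed implementation: every mock_* name and struct fpu_mock are
made up for illustration, and the "save" and "restore" counters simply
stand in for the expensive FPU state operations. The encryption
function keeps its own begin/end pair, and the only thing that changes
between the two modes is whether end() restores immediately or leaves
the restore for the next context switch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace mock; none of these names exist in the kernel. */
struct fpu_mock {
	bool lazy;             /* defer the restore to the next context switch */
	bool in_section;       /* between mock_fpu_begin() and mock_fpu_end() */
	bool user_state_saved; /* user registers are saved, restore still owed */
	unsigned int saves;    /* number of simulated state saves */
	unsigned int restores; /* number of simulated state restores */
};

static void mock_fpu_begin(struct fpu_mock *f)
{
	assert(!f->in_section);
	f->in_section = true;
	if (!f->user_state_saved) {	/* pay for the save only when needed */
		f->saves++;
		f->user_state_saved = true;
	}
}

static void mock_fpu_end(struct fpu_mock *f)
{
	assert(f->in_section);
	f->in_section = false;
	if (!f->lazy) {			/* eager mode: restore right away */
		f->restores++;
		f->user_state_saved = false;
	}
}

static void mock_context_switch(struct fpu_mock *f)
{
	assert(!f->in_section);
	if (f->user_state_saved) {	/* lazy mode: deferred restore lands here */
		f->restores++;
		f->user_state_saved = false;
	}
}

/* Stand-in for an encryption function that brackets its own FPU use. */
static void mock_encrypt_item(struct fpu_mock *f)
{
	mock_fpu_begin(f);
	/* ... SIMD work for one queue item would go here ... */
	mock_fpu_end(f);
}

/*
 * Worker: process nitems queue items, then get scheduled out once.
 * Returns the total number of simulated save and restore operations.
 */
unsigned int mock_worker(bool lazy, unsigned int nitems)
{
	struct fpu_mock f = { .lazy = lazy };
	unsigned int i;

	for (i = 0; i < nitems; i++)
		mock_encrypt_item(&f);
	mock_context_switch(&f);
	return f.saves + f.restores;
}
```

In this model, mock_worker(false, n) pays one save and one restore per
queue item, while mock_worker(true, n) pays one of each for the whole
queue, which is the same cost as hoisting begin/end into the worker by
hand, but without moving the FPU logic out of the encryption function.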