Received: by 2002:a6b:500f:0:0:0:0:0 with SMTP id e15csp901624iob; Wed, 4 May 2022 10:15:07 -0700 (PDT) X-Google-Smtp-Source: ABdhPJzUIbUfnyrENGNli8xnKRSM7Hsowhi0pdhUuXQlR7/yxcxHcq0o66HX3Y2QNfKfMBhSoc+L X-Received: by 2002:a17:907:7282:b0:6ef:f412:f7c4 with SMTP id dt2-20020a170907728200b006eff412f7c4mr20975317ejc.258.1651684507586; Wed, 04 May 2022 10:15:07 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1651684507; cv=none; d=google.com; s=arc-20160816; b=YTxO44wJn+T78kwZ8m9dG35iwD3YWoE9+pv5cmWmvnXNlP8gyKwzvYz+k2xEow4xEo +wK+Pr8yJEWsNDcCI9zN5F8D8H+LjAxChOU7YB7rUG2k6zkjSkeR06Hop1EDHaoZub3o LcKFqY/HjHpeUISkQC6tAjze1p8foiGliz6SbzcS6xnfiHZSs1evl3EqkFU/XmXRcBEI YoBkww7c5c8+TLD5gOi3QsXIQdxNrKqGPpDzCqenVKv0vyJMG0XwCkpUBuW0jffs/p19 q03cj/HzWsXXblCCs1AFQ1+mAwPbGFDWGPhL8tpkR/rmpFgUQUxiW96vt54cP8Wf+Cm1 j4dA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:mime-version:message-id:date:references :in-reply-to:subject:cc:to:dkim-signature:dkim-signature:from; bh=jXLtyiQ0yzGv0MwBOzbiZq4Wuv0+MbxgkH7y3R8DVS4=; b=x/+Cj1ANdR9RiuLUNt8NDbmaY++9yo/E7Ts/IxwT46NESofsjXjf1x7Cpia+b508eu 2SUmRbZiCO232fKXqVdLd4dChbzKQNyqNgx2LELmJ1TAneFyybu2jAH+CXJaq9h/JTdi Wb/3oL0JfjmuYZFkglNNG5w5wai81hyg1NdLwmo4TytzNLtNJc15H/24YIdqWGmTlXwt aCbiIcQ1Wt5LIVxpL5ms+CIWyjpEtcdIP0fkuAZelA8vbt4ZQyEyNHPR4TemcYcK6Q2W fjX9dAzkRMTFxT1PwiFF4e4Q0Wev48sK6TFYBcJb/63Q3njCHi8zCmRvqLjGUN8dGlzg 3Rwg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linutronix.de header.s=2020 header.b=TavzHSeF; dkim=neutral (no key) header.i=@linutronix.de; spf=pass (google.com: domain of linux-crypto-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-crypto-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=linutronix.de Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id zk10-20020a17090733ca00b006e89480fb4dsi16654708ejb.344.2022.05.04.10.14.42; Wed, 04 May 2022 10:15:07 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-crypto-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@linutronix.de header.s=2020 header.b=TavzHSeF; dkim=neutral (no key) header.i=@linutronix.de; spf=pass (google.com: domain of linux-crypto-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-crypto-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=linutronix.de Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1352542AbiEDPkR (ORCPT + 99 others); Wed, 4 May 2022 11:40:17 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35338 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1352537AbiEDPkQ (ORCPT ); Wed, 4 May 2022 11:40:16 -0400 Received: from galois.linutronix.de (Galois.linutronix.de [IPv6:2a0a:51c0:0:12e:550::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7D10F1659C; Wed, 4 May 2022 08:36:40 -0700 (PDT) From: Thomas Gleixner DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020; t=1651678599; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=jXLtyiQ0yzGv0MwBOzbiZq4Wuv0+MbxgkH7y3R8DVS4=; b=TavzHSeFulEXeEchW8J2tstpjC4Ump7rBph2KtJdn/m2KNb3XlTkcX5shbBVGnotH6Zjr1 zyzNNaGMCU+18h/Ymjz13Eu4cvzgLKkzfi048oOk2v5QznFrANKYAvgW/HdAXt3l0li20J +MrGXVfrevPJVZ/x+9QOiZQ/b7x6Ks1ot+nrxA9hcRwoXMgyLDS1s9MfE9nphOOEV8FV/V acKLJIVEHwzhWi7MbXSXgDqQUhf2PRCINMdESR/opEJJuz+cuycb7H4dYDDPZxbQt1rxY0 hQ5roQ7rQtqsayhb2KlseGSaYnOBR4g2w5yuMOkyoz2f7Ex0YZmBwITOVxXWZw== DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020e; t=1651678599; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=jXLtyiQ0yzGv0MwBOzbiZq4Wuv0+MbxgkH7y3R8DVS4=; b=97RwyIZmaBA3/a0DfH0+3MS8W+jR6bZDalTSkrWixSHJK24ep0NRbiRQ9awpdskhiAfGum /5sy7rNqEAkYIEAg== To: Peter Zijlstra Cc: Borislav Petkov , LKML , x86@kernel.org, Filipe Manana , "Jason A. Donenfeld" , linux-crypto@vger.kernel.org Subject: Re: [patch 3/3] x86/fpu: Make FPU protection more robust In-Reply-To: References: <20220501192740.203963477@linutronix.de> <20220501193102.704267030@linutronix.de> <87k0b4lydr.ffs@tglx> Date: Wed, 04 May 2022 17:36:38 +0200 Message-ID: <87fslpjomx.ffs@tglx> MIME-Version: 1.0 Content-Type: text/plain X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED,SPF_HELO_NONE, SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-crypto@vger.kernel.org On Tue, May 03 2022 at 11:06, Peter Zijlstra wrote: > On Mon, May 02, 2022 at 05:58:40PM +0200, Thomas Gleixner wrote: >> Right, though currently it's guaranteed that softirq processing context >> can use the FPU. Quite some of the network crypto work runs in softirq >> context, so this might cause a regression. If so, then this needs to be >> an explicit commit on top which is easy to revert. Let me stare at it >> some more. > > Right, so with the: > > preempt_disable(); > this_cpu_write(fpu_in_use, true); > barrier(); > > sequence it is safe against both softirq and hardirq fpu usage. The only > concern is performance not correctness when dropping that > local_bh_disable() thing. > > So what Thomas proposes makes sense to me. Now I was looking at it the other way round too; i.e. to use local_bh_disable() for both fpregs_lock() and kernel_fpu_begin(). Using local_bh_disable() for both fpregs_lock() and kernel_fpu_begin() is not possible with the current constraints, because kernel_fpu_begin() can be called from hard interrupt context. But the only use case which utilizes FPU from hard interrupt context is the random generator via add_randomness_...(). I did a benchmark of these functions, which invoke blake2s_update() three times in a row, on a SKL-X and a ZEN3. The generic code and the FPU accelerated code are pretty much on par vs. execution time of the algorithm itself plus/minus noise. But in case that the interrupt hits a userspace task the FPU needs to be saved and if the interrupt does not result in rescheduling then the return to user space has to restore it. That's _expensive_ and the actual cost depends on the FPU state, but 200-300 cycles for save and 200-700 cycles for restore are due. Even if we ignore the save/restore part and assume that it averages out vs. schedule() having to save FPU state anyway, then there is another aspect to this: power consumption which affects also thermal budget and capacity. Though that made me more curious and I did the same comparison for crc32 which is heavily used by ext4. crc32c_pcl_intel_update() already contains a switch to software when the buffer length is less than 512 bytes. But even on larger buffers, typically ~4k, FPU is not necessarily a win. It's consistently slower by a factor of ~1.4x. And that's not due to xsave/rstor overhead because these computations run on a worker thread which does not do that dance at all. IOW, using the FPU blindly for this kind of computations is not necessarily a good plan. I have no idea how these things are analyzed and evaluated if at all. Maybe the crypto people can shed some light on this. Thanks, tglx