Received: by 2002:a05:6a10:22f:0:0:0:0 with SMTP id 15csp325065pxk; Wed, 2 Sep 2020 02:18:33 -0700 (PDT) X-Google-Smtp-Source: ABdhPJwLgu/a27GfxIOmO8TYVbTmSulHgIqocwmHYP/eJqLAyEHKU476FOt0X17RKVvmrccJGzX6 X-Received: by 2002:a17:906:2998:: with SMTP id x24mr5051817eje.131.1599038313592; Wed, 02 Sep 2020 02:18:33 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1599038313; cv=none; d=google.com; s=arc-20160816; b=unYztXDRGbMNJFkLUB3g29NTk8/s1nRwmOxxvWJb0vnnc4sYAXtSWyvMcajV+Ib6t/ sw6aIbPLW5oB/Nu9hKdHWGXFVoo7vw+uElrwosiiIgZjiTYO/KoLIfFLqK2naIvF2m3a Xla/HEXg5I8QbCHkTsKrS++eRtSyh0PpSGGWy5AmlVZw9gcPLgCFiX40hZS0TnAgWxfU 0eTBAr5VPf7J618FfMM0ZvEt5WYP4Vq03ln4WmIp1HAGE10ohDyvFq4kVYW4ZJEezmoW yeEEWNvRWNY6h5YNsVp0fu8EdkkvClsAo+iw9idfUa+H2KUamDmLe3/8Hdv/G7HFGpxh FltQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:in-reply-to:content-disposition :mime-version:references:message-id:subject:cc:to:from:date :dkim-signature; bh=ZnearYLs9EAIjRLhenewv8wpyf6ZjBlTieaIgpiCvRI=; b=qgk7Hl2miPnShtvolq0a4qGbS5tqWZR8VxdPJMff2snsb+TPXye+90tYVC9FvfKgK0 vrrbsWwPkAXhoglP5iaKx+YjUQKrFsc6/ye23sXxBqv/U6G9l0gjL1Wsi+mjxtVzqtGv EyXSplKuLFYQbdzv6YxHxF2a4Ro6tMBk6tOloyg2VwgMp0uYO/CuWeVWYvGhGIdQk8ZE JxCC7c9zSz1tP5LobXLBboA45n1RSSnoR9bC8ddHhHESXW2KnKVJJd37FjCsk9s0gS8+ 0aLBcYl401xpsUkiwvTHWGSlbY/hE3X82bZcNQFFwjqstH6B6LGXETq6KR72qqahNllG mCJg== ARC-Authentication-Results: i=1; mx.google.com; dkim=fail header.i=@infradead.org header.s=merlin.20170209 header.b=FgAXutFv; spf=pass (google.com: domain of linux-crypto-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-crypto-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id g19si2342531ejf.115.2020.09.02.02.18.09; Wed, 02 Sep 2020 02:18:33 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-crypto-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=fail header.i=@infradead.org header.s=merlin.20170209 header.b=FgAXutFv; spf=pass (google.com: domain of linux-crypto-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-crypto-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726400AbgIBJR5 (ORCPT + 99 others); Wed, 2 Sep 2020 05:17:57 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49242 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726307AbgIBJR4 (ORCPT ); Wed, 2 Sep 2020 05:17:56 -0400 Received: from merlin.infradead.org (merlin.infradead.org [IPv6:2001:8b0:10b:1231::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E315FC061244 for ; Wed, 2 Sep 2020 02:17:55 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=merlin.20170209; h=In-Reply-To:Content-Type:MIME-Version: References:Message-ID:Subject:Cc:To:From:Date:Sender:Reply-To: Content-Transfer-Encoding:Content-ID:Content-Description; bh=ZnearYLs9EAIjRLhenewv8wpyf6ZjBlTieaIgpiCvRI=; b=FgAXutFvWZotVv/+SUruPU58fq OTwMCUWfym9BW16aBAVC2LSpvjb/3hC0MQAohOj3CwN3heE/WtlRPaZFqxt6MDsk0MQmQ6E7C5Ikr d1zYqudfWSxdWw2z0jO+8j9SrNsIcJDP3jNVy3gx7PpXwCLgjZpGYl+M2r6I5jZYepFHjiXHdjcyI IkQDrP13wLppi64ywsJhaHn3Tb+T+ccPia2GCHUUATQS+MgPFbMPNZFx6VBw9OxXP1iHiBBdJh5Ic lS6aupD2dDG7ZhBW1XCctk+X81aeELgQazwQZAiocAOL1b5fESqWxzulkxTVKhTuys90D2nvSo64L H991ObbQ==; Received: from j217100.upc-j.chello.nl ([24.132.217.100] helo=noisy.programming.kicks-ass.net) by merlin.infradead.org with esmtpsa (Exim 4.92.3 #3 (Red Hat Linux)) id 1kDOtj-0007u6-Mr; Wed, 02 Sep 2020 09:17:44 +0000 Received: from hirez.programming.kicks-ass.net (hirez.programming.kicks-ass.net [192.168.1.225]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by noisy.programming.kicks-ass.net (Postfix) with ESMTPS id BE49A304B92; Wed, 2 Sep 2020 11:17:41 +0200 (CEST) Received: by hirez.programming.kicks-ass.net (Postfix, from userid 1000) id 850AC20598A34; Wed, 2 Sep 2020 11:17:41 +0200 (CEST) Date: Wed, 2 Sep 2020 11:17:41 +0200 From: peterz@infradead.org To: Uros Bizjak Cc: "Jason A. Donenfeld" , Karthik Bhargavan , Chris.Hawblitzel@microsoft.com, Jonathan Protzenko , Aymeric Fromherz , Linux Crypto Mailing List , X86 ML , Herbert Xu , "David S. Miller" , Ard Biesheuvel Subject: Re: [PATCH] crypto/x86: Use XORL r32,32 in curve25519-x86_64.c Message-ID: <20200902091741.GX1362448@hirez.programming.kicks-ass.net> References: <20200827173058.94519-1-ubizjak@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Sender: linux-crypto-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-crypto@vger.kernel.org On Wed, Sep 02, 2020 at 07:50:36AM +0200, Uros Bizjak wrote: > On Tue, Sep 1, 2020 at 9:12 PM Jason A. Donenfeld wrote: > > > > On Tue, Sep 1, 2020 at 8:13 PM Jason A. Donenfeld wrote: > > > operands are the same. Also, have you seen any measurable differences > > > when benching this? I can stick it into kbench9000 to see if you > > > haven't looked yet. > > > > On a Skylake server (Xeon Gold 5120), I'm unable to see any measurable > > difference with this, at all, no matter how much I median or mean or > > reduce noise by disabling interrupts. > > > > One thing that sticks out is that all the replacements of r8-r15 by > > their %r8d-r15d counterparts still have the REX prefix, as is > > necessary to access those registers. The only ones worth changing, > > then, are the legacy registers, and on a whole, this amounts to only > > 48 bytes of difference. > > The patch implements one of x86 target specific optimizations, > performed by gcc. The optimization results in code size savings of one > byte, where REX prefix is omitted with legacy registers, but otherwise > should have no measurable runtime effect. Since gcc applies this > optimization universally to all integer registers, I took the same > approach and implemented the same change to legacy and REX registers. > As measured above, 48 bytes saved is a good result for such a trivial > optimization. Could we instead implement this optimization in GAS ? Then we can leave the code as-is.