References: <20200827173058.94519-1-ubizjak@gmail.com> <20200902091741.GX1362448@hirez.programming.kicks-ass.net>
From: "Jason A. Donenfeld"
Date: Wed, 2 Sep 2020 17:00:03 +0200
Subject: Re: [PATCH] crypto/x86: Use XORL r32,32 in curve25519-x86_64.c
To: Uros Bizjak
Cc: Peter Zijlstra, Karthik Bhargavan, Chris.Hawblitzel@microsoft.com, Jonathan Protzenko, Aymeric Fromherz, Linux Crypto Mailing List, X86 ML, Herbert Xu, "David S. Miller", Ard Biesheuvel
X-Mailing-List: linux-crypto@vger.kernel.org

On Wed, Sep 2, 2020 at 1:42 PM Uros Bizjak wrote:
>
> On Wed, Sep 2, 2020 at 11:17 AM wrote:
> >
> > On Wed, Sep 02, 2020 at 07:50:36AM +0200, Uros Bizjak wrote:
> > > On Tue, Sep 1, 2020 at 9:12 PM Jason A. Donenfeld wrote:
> > > >
> > > > On Tue, Sep 1, 2020 at 8:13 PM Jason A. Donenfeld wrote:
> > > > > operands are the same. Also, have you seen any measurable differences
> > > > > when benching this? I can stick it into kbench9000 to see if you
> > > > > haven't looked yet.
> > > >
> > > > On a Skylake server (Xeon Gold 5120), I'm unable to see any measurable
> > > > difference with this, at all, no matter how much I median or mean or
> > > > reduce noise by disabling interrupts.
> > > >
> > > > One thing that sticks out is that all the replacements of r8-r15 by
> > > > their %r8d-r15d counterparts still have the REX prefix, as is
> > > > necessary to access those registers. The only ones worth changing,
> > > > then, are the legacy registers, and on a whole, this amounts to only
> > > > 48 bytes of difference.
> > >
> > > The patch implements one of x86 target specific optimizations,
> > > performed by gcc.
> > > The optimization results in code size savings of one
> > > byte, where REX prefix is omitted with legacy registers, but otherwise
> > > should have no measurable runtime effect. Since gcc applies this
> > > optimization universally to all integer registers, I took the same
> > > approach and implemented the same change to legacy and REX registers.
> > > As measured above, 48 bytes saved is a good result for such a trivial
> > > optimization.
> >
> > Could we instead implement this optimization in GAS ? Then we can leave
> > the code as-is.
>
> I don't think that e.g. "xorq %rax,%rax" should universally be
> translated to "xorl %eax,%eax" in the assembler. Perhaps the author
> expected exactly 3 bytes (to align the code or similar), and the
> assembler would change the length to 2 bytes behind his back, breaking
> the expectations.

Are you sure that's something that GAS actually provides now? Seems
like a lot of mnemonics have ambiguous/injective encodings, and this
wouldn't make things any different. Most authors use .byte or .align
when they care about the actual bytes, no?
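
For anyone following along, here is a rough sketch of the byte counts being discussed. It is not taken from the patch or from GAS: encode_xor is a hypothetical helper, and it assumes the register-direct "31 /r" form of XOR, which is only one of the valid encodings. It shows why the one-byte saving exists only for the legacy registers: a 32-bit XOR on eax needs no REX prefix, while r8d-r15d still need one to reach the register at all. Since a 32-bit write zero-extends into the full 64-bit register, both forms clear it.

```python
# Hypothetical helper (an illustration, not code from the patch or from GAS):
# encode "xor reg, reg" in the register-direct form, opcode 0x31 /r.

def encode_xor(reg: int, wide: bool) -> bytes:
    """Encode XOR of register number `reg` (0-15) with itself.

    wide=True  -> 64-bit operand size (xorq), which requires REX.W
    wide=False -> 32-bit operand size (xorl)
    """
    if not 0 <= reg <= 15:
        raise ValueError("register number must be 0..15")
    high = reg >> 3  # 1 for r8-r15, which need REX.R and REX.B set
    rex = 0x40 | (int(wide) << 3) | (high << 2) | high
    # mod=11 selects register-direct; the same register fills reg and r/m
    modrm = 0xC0 | ((reg & 7) << 3) | (reg & 7)
    # The REX prefix is emitted only when at least one of its bits is set
    prefix = bytes([rex]) if rex != 0x40 else b""
    return prefix + bytes([0x31, modrm])


RAX, R8 = 0, 8

for name, reg in (("rax/eax", RAX), ("r8/r8d", R8)):
    q = encode_xor(reg, wide=True)
    l = encode_xor(reg, wide=False)
    print(f"{name}: xorq = {q.hex(' ')} ({len(q)} bytes), "
          f"xorl = {l.hex(' ')} ({len(l)} bytes)")
```

Under those assumptions the rax case shrinks from 3 bytes to 2, and the r8 case stays at 3 bytes either way, which matches the observation quoted above that only the legacy registers are worth changing.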