From: Ard Biesheuvel
Date: Fri, 5 Oct 2018 17:16:08 +0200
Subject: Re: [PATCH net-next v6 19/23] zinc: Curve25519 ARM implementation
To: "Jason A. Donenfeld", Ard Biesheuvel, Linux Kernel Mailing List, "open list:HARDWARE RANDOM NUMBER GENERATOR CORE", "David S. Miller", Greg Kroah-Hartman, Samuel Neves, Andy Lutomirski, Jean-Philippe Aumasson, Russell King, linux-arm-kernel, peter@cryptojedi.org
In-Reply-To: <20181005150538.17006.qmail@cr.yp.to>
References: <20180925145622.29959-1-Jason@zx2c4.com> <20180925145622.29959-20-Jason@zx2c4.com> <20181005150538.17006.qmail@cr.yp.to>
X-Mailing-List: linux-kernel@vger.kernel.org

On 5 October 2018 at 17:05, D. J. Bernstein wrote:
> For the in-order ARM Cortex-A8 (the target for this code), adjacent
> multiply-add instructions forward summands quickly. A simple in-order
> dot-product computation has no latency problems, while interleaving
> computations, as suggested in this thread, creates problems. Also, on
> this microarchitecture, occasional ARM instructions run in parallel with
> NEON, so trying to manually eliminate ARM instructions through global
> pointer tracking wouldn't gain speed; it would simply create unnecessary
> code-maintenance problems.
>
> See https://cr.yp.to/papers.html#neoncrypto for analysis of the
> performance of---and remaining bottlenecks in---this code.
> Further speedups should be possible on this microarchitecture, but, for
> anyone interested in this, I recommend focusing on building a
> cycle-accurate simulator (e.g., fixing inaccuracies in the Sobole
> simulator) first.
>
> Of course, there are other ARM microarchitectures, and there are many
> cases where different microarchitectures prefer different optimizations.
> The kernel already has boot-time benchmarks for different optimizations
> for raid6, and should do the same for crypto code, so that implementors
> can focus on each microarchitecture separately rather than living in the
> barbaric world of having to choose which CPUs to favor.
>

Thanks, Dan, for the insight.

We have already established in a separate discussion that Cortex-A7,
which is the main optimization target for future development, does not
have the microarchitectural peculiarity that you are referring to,
namely that ARM instructions are essentially free when interleaved with
NEON code.

But I take your point re benchmarking (as I already indicated in my
reply to Jason): if we optimize for speed, we should ideally reuse the
existing benchmarking infrastructure we have to select the fastest
implementation at runtime. For instance, it turns out that scalar
ChaCha20 is almost as fast as NEON (or even faster?) on A7, and using
NEON in the kernel has some issues of its own.
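
To make the "benchmark and pick the fastest at runtime" idea concrete, here is a minimal user-space sketch in C. It is not the raid6 code and not a proposal for the crypto API. The names (checksum_fn, struct impl, select_fastest) and the two toy byte-sum routines are hypothetical stand-ins for, say, a scalar and a NEON implementation, and clock() stands in for whatever timer the kernel would actually use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical signature shared by every candidate implementation. */
typedef uint32_t (*checksum_fn)(const uint8_t *buf, size_t len);

struct impl {
	const char *name;
	checksum_fn fn;
};

/* Stand-in for a plain scalar implementation. */
static uint32_t sum_bytewise(const uint8_t *buf, size_t len)
{
	uint32_t s = 0;

	for (size_t i = 0; i < len; i++)
		s += buf[i];
	return s;
}

/* Stand-in for an "optimized" variant: same result, 4 bytes per step. */
static uint32_t sum_unrolled(const uint8_t *buf, size_t len)
{
	uint32_t s = 0;
	size_t i = 0;

	for (; i + 4 <= len; i += 4)
		s += (uint32_t)buf[i] + buf[i + 1] + buf[i + 2] + buf[i + 3];
	for (; i < len; i++)	/* leftover tail bytes */
		s += buf[i];
	return s;
}

static const struct impl candidates[] = {
	{ "bytewise", sum_bytewise },
	{ "unrolled", sum_unrolled },
};

/* CPU time consumed by `rounds` calls of one candidate. */
static clock_t time_impl(checksum_fn fn, const uint8_t *buf, size_t len,
			 int rounds)
{
	volatile uint32_t sink = 0;	/* keeps the calls from being optimized out */
	clock_t start = clock();

	for (int i = 0; i < rounds; i++)
		sink += fn(buf, len);
	(void)sink;
	return clock() - start;
}

/* Run every candidate once and return the one that took the least time. */
static const struct impl *select_fastest(void)
{
	static uint8_t buf[4096];
	const struct impl *best = NULL;
	clock_t best_time = 0;

	for (size_t i = 0; i < sizeof(buf); i++)
		buf[i] = (uint8_t)i;

	for (size_t i = 0; i < sizeof(candidates) / sizeof(candidates[0]); i++) {
		/* All candidates must agree before speed is even considered. */
		assert(candidates[i].fn(buf, sizeof(buf)) ==
		       sum_bytewise(buf, sizeof(buf)));

		clock_t t = time_impl(candidates[i].fn, buf, sizeof(buf), 2000);

		if (!best || t < best_time) {
			best = &candidates[i];
			best_time = t;
		}
	}
	return best;
}
```

A caller would run select_fastest() once at init time, cache the returned pointer, and dispatch through best->fn afterwards. Which candidate wins depends on the machine it runs on, which is exactly the point: the choice between scalar and NEON is made per CPU rather than fixed at build time.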