Date: Tue, 14 Aug 2018 14:12:31 -0700
From: Eric Biggers
To: "Jason A. Donenfeld"
Cc: Eric Biggers, Linux Crypto Mailing List, LKML, Netdev, David Miller,
    Andrew Lutomirski, Greg Kroah-Hartman, Samuel Neves,
    "Daniel J. Bernstein", Tanja Lange, Jean-Philippe Aumasson,
    Karthikeyan Bhargavan
Subject: Re: [PATCH v1 2/3] zinc: Introduce minimal cryptography library
Message-ID: <20180814211229.GB24575@gmail.com>
References: <20180801072246.GA15677@sol.localdomain>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Aug 03, 2018 at 04:33:50AM +0200, Jason A. Donenfeld wrote:
>
> > Also, earlier when I tested OpenSSL's ChaCha NEON implementation on ARM
> > Cortex-A7 it was actually quite a bit slower than the one in the Linux
> > kernel written by Ard Biesheuvel... I trust that when claiming the
> > performance of all implementations you're adding is "state-of-the-art
> > and unrivaled", you actually compared them to the ones already in the
> > Linux kernel which you're advocating replacing, right? :-)
>
> Yes, I have, and my results don't corroborate your findings. It will
> be interesting to get out a wider variety of hardware for comparisons.
> I suspect, also, that if the snarky emoticons subside, AndyP would be
> very interested in whatever we find and could have interest in
> improving implementations, should we actually find performance
> differences.
>

On ARM Cortex-A7, OpenSSL's ChaCha20 implementation is 13.9 cpb (cycles per
byte), whereas Linux's is faster: 11.9 cpb. I've also recently improved the
Linux implementation to 11.3 cpb and would like to send out a patch soon...
I've also written a scalar ChaCha20 implementation (no NEON instructions!)
that is 12.2 cpb on one block at a time on Cortex-A7, taking advantage of the
free rotates; that would be useful for the single permutation used to compute
XChaCha's subkey, and also for the ends of messages.

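For context on the "single permutation used to compute XChaCha's subkey": the
step in question is usually called HChaCha20. It runs the 20-round ChaCha
permutation once over the constants, the key, and the first 16 nonce bytes,
then outputs words 0..3 and 12..15 without the usual feed-forward addition.
Below is a minimal Python sketch of that step. It is not the kernel's, Zinc's,
or OpenSSL's code; the function names (`quarter_round`, `chacha20_permute`,
`hchacha20`) are made up for illustration, and it says nothing about
performance.

```python
import struct

MASK32 = 0xFFFFFFFF
# "expand 32-byte k" as four little-endian 32-bit words
SIGMA = (0x61707865, 0x3320646E, 0x79622D32, 0x6B206574)


def rotl32(x, n):
    """Rotate a 32-bit word left by n bits."""
    return ((x << n) | (x >> (32 - n))) & MASK32


def quarter_round(s, a, b, c, d):
    """ChaCha quarter round on words a, b, c, d of the list s (in place)."""
    s[a] = (s[a] + s[b]) & MASK32
    s[d] = rotl32(s[d] ^ s[a], 16)
    s[c] = (s[c] + s[d]) & MASK32
    s[b] = rotl32(s[b] ^ s[c], 12)
    s[a] = (s[a] + s[b]) & MASK32
    s[d] = rotl32(s[d] ^ s[a], 8)
    s[c] = (s[c] + s[d]) & MASK32
    s[b] = rotl32(s[b] ^ s[c], 7)


def chacha20_permute(state):
    """Run the 20-round ChaCha permutation; returns a new 16-word list."""
    s = list(state)
    for _ in range(10):  # 10 double rounds = 20 rounds
        # column round
        quarter_round(s, 0, 4, 8, 12)
        quarter_round(s, 1, 5, 9, 13)
        quarter_round(s, 2, 6, 10, 14)
        quarter_round(s, 3, 7, 11, 15)
        # diagonal round
        quarter_round(s, 0, 5, 10, 15)
        quarter_round(s, 1, 6, 11, 12)
        quarter_round(s, 2, 7, 8, 13)
        quarter_round(s, 3, 4, 9, 14)
    return s


def hchacha20(key, nonce16):
    """Derive a 32-byte subkey from a 32-byte key and 16 nonce bytes."""
    if len(key) != 32 or len(nonce16) != 16:
        raise ValueError("need a 32-byte key and 16 nonce bytes")
    state = (list(SIGMA)
             + list(struct.unpack("<8I", key))
             + list(struct.unpack("<4I", nonce16)))
    s = chacha20_permute(state)
    # No feed-forward addition here: words 0..3 and 12..15 are the subkey.
    return struct.pack("<8I", *(s[0:4] + s[12:16]))
```

Since only one 64-byte state is permuted per call, there is nothing to spread
across SIMD lanes, which is why a fast one-block-at-a-time scalar routine is
attractive for this step.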
The reason Linux's ChaCha20 NEON implementation is faster than OpenSSL's is
that Linux's does 4 blocks at once using NEON instructions, and the words are
de-interleaved so the rows don't need to be shifted between each round.
OpenSSL's implementation, on the other hand, only does 3 blocks at once with
NEON instructions and has to shift the rows between each round. OpenSSL's
implementation also does a 4th block at the same time using regular ARM
instructions, but that doesn't help on Cortex-A7; it just makes it slower.

I understand there are tradeoffs, and different implementations can be faster
on different CPUs. Just know that from my point of view, switching to the
OpenSSL implementation actually introduces a performance regression, and we
care a *lot* about this since we need ChaCha to be absolutely as fast as
possible for HPolyC disk encryption. So if your proposal goes in, I'd likely
need to write a patch to get the old performance back, at least on
Cortex-A7...

Also, I don't know whether Andy P. considered the 4xNEON implementation
technique. It could even be fastest on other ARM CPUs too, I don't know.

- Eric
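
Illustration of the de-interleaved 4-block layout described above. This is a
rough Python model, not the kernel's NEON assembly: each 4-element list stands
in for one 4 x 32-bit NEON register holding the same state word from four
different blocks. With that layout, the diagonal round is the same code as the
column round with different register indices, so no row shifting is needed
between rounds. All names here are hypothetical, and the sketch assumes the
RFC 7539 state layout with a 32-bit block counter in word 12 (no carry into
word 13).

```python
MASK32 = 0xFFFFFFFF
LANES = 4  # one lane per block, modelling a 4 x 32-bit vector register


def rotl32(x, n):
    """Rotate a 32-bit word left by n bits."""
    return ((x << n) | (x >> (32 - n))) & MASK32


def vadd(x, y):
    """Lane-wise 32-bit addition of two 'vectors'."""
    return [(a + b) & MASK32 for a, b in zip(x, y)]


def vxor_rotl(x, y, n):
    """Lane-wise (x ^ y) rotated left by n bits."""
    return [rotl32(a ^ b, n) for a, b in zip(x, y)]


def quarter_round_4way(v, a, b, c, d):
    """Quarter round on vectors v[a], v[b], v[c], v[d]; lanes are independent."""
    v[a] = vadd(v[a], v[b])
    v[d] = vxor_rotl(v[d], v[a], 16)
    v[c] = vadd(v[c], v[d])
    v[b] = vxor_rotl(v[b], v[c], 12)
    v[a] = vadd(v[a], v[b])
    v[d] = vxor_rotl(v[d], v[a], 8)
    v[c] = vadd(v[c], v[d])
    v[b] = vxor_rotl(v[b], v[c], 7)


def chacha20_blocks_4way(state):
    """Return 4 consecutive output blocks (lists of 16 words) for a 16-word state."""
    # De-interleave: v[i] holds word i of all four blocks.
    v = [[w] * LANES for w in state]
    v[12] = [(state[12] + lane) & MASK32 for lane in range(LANES)]
    init = [list(x) for x in v]
    for _ in range(10):  # 10 double rounds = 20 rounds
        # column round
        quarter_round_4way(v, 0, 4, 8, 12)
        quarter_round_4way(v, 1, 5, 9, 13)
        quarter_round_4way(v, 2, 6, 10, 14)
        quarter_round_4way(v, 3, 7, 11, 15)
        # diagonal round: only the indices change, no data is moved
        quarter_round_4way(v, 0, 5, 10, 15)
        quarter_round_4way(v, 1, 6, 11, 12)
        quarter_round_4way(v, 2, 7, 8, 13)
        quarter_round_4way(v, 3, 4, 9, 14)
    v = [vadd(x, y) for x, y in zip(v, init)]  # feed-forward addition
    # Re-interleave into four ordinary 16-word blocks.
    return [[v[i][lane] for i in range(16)] for lane in range(LANES)]


def chacha20_block_ref(state):
    """Plain one-block reference, used only to cross-check the 4-way version."""
    s = list(state)

    def qr(a, b, c, d):
        s[a] = (s[a] + s[b]) & MASK32
        s[d] = rotl32(s[d] ^ s[a], 16)
        s[c] = (s[c] + s[d]) & MASK32
        s[b] = rotl32(s[b] ^ s[c], 12)
        s[a] = (s[a] + s[b]) & MASK32
        s[d] = rotl32(s[d] ^ s[a], 8)
        s[c] = (s[c] + s[d]) & MASK32
        s[b] = rotl32(s[b] ^ s[c], 7)

    for _ in range(10):
        qr(0, 4, 8, 12); qr(1, 5, 9, 13); qr(2, 6, 10, 14); qr(3, 7, 11, 15)
        qr(0, 5, 10, 15); qr(1, 6, 11, 12); qr(2, 7, 8, 13); qr(3, 4, 9, 14)
    return [(x + y) & MASK32 for x, y in zip(s, state)]
```

In a layout where one register holds a whole row of a single block, the
diagonal round instead requires rotating rows 1..3 by one, two, and three
words before the quarter rounds and rotating them back afterwards; that is the
per-round shifting the de-interleaved layout avoids. The price is the
de-interleave and re-interleave steps at the start and end, which only pay off
when at least four blocks are processed together.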