Date: Fri, 11 Oct 2019 17:21:33 +0000
Message-ID: <20191011172133.Horde.sxiyClHzSJAUvHtYJdMQEbN@www.vdorst.com>
From: René van Dorst
To: Andy Polyakov
Cc: Ard Biesheuvel, linux-crypto@vger.kernel.org, Herbert Xu, David Miller, "Jason A . Donenfeld", Samuel Neves, Arnd Bergmann, Eric Biggers, Andy Lutomirski, Martin Willi
Subject: Re: [PATCH v3 19/29] crypto: mips/poly1305 - incorporate OpenSSL/CRYPTOGAMS optimized implementation

Hi Andy,

Quoting Andy Polyakov:

> Hi,
>
> On 10/8/19 1:38 PM, Andy Polyakov wrote:
>>>>
>>>
>>> Hi Ard,
>>>
>>> Is it also an option to include my mip32r2 optimized poly1305 version?
>>>
>>> Below the results which shows a good improvement over the Andy Polyakov
>>> version.
>>> I swapped the poly1305 assembly file and rename the function to
>>> _mips
>>> Full WireGuard source with the changes [0]
>>>
>>> bytes | RvD | openssl | delta | delta / openssl
>>> ...
>>> 4096 | 9160 | 11755 | -2595 | -22,08%
>
> Update is pushed to cryptogams. Thanks to René for ideas, feedback and
> testing!
> There is even a question about supporting DSP ASE, let's
> discuss details off-list first.
>

Thanks!
I see that you have found another spot to save 1 cycle.

Last results: poly1305: 4096 bytes, 188.671 MB/sec, 9066 cycles

I also wonder whether we can replace the "li $x, -4" and "and $x"
combination with "sll $x" in other places like [0], for example on
line 1169?

Replacing it on line 1169 as follows works on my device:

-       li      $in0,-4
        srl     $ctx,$tmp4,2
-       and     $in0,$in0,$tmp4
        andi    $tmp4,$tmp4,3
+       sll     $in0,$ctx,2
        addu    $ctx,$ctx,$in0

> As for multiply-by-1-n-add.
>
>> I assume that the presented results depict regression after switch to
>> cryptogams module. Right? RvD implementation distinguishes itself in two
>> ways:
>>
>> 1. some of additions in inner loop are replaced with multiply-by-1-n-add;
>> ...
>>
>> I recall attempting 1. and chosen not to do it with following rationale.
>> On processor I have access to, Octeon II, it made no significant
>> difference. It was better, but only marginally. And it's understandable,
>> because Octeon II should have lesser difficulty pairing those additions
>> with multiply-n-add instructions. But since multiplication is an
>> expensive operation, it can be pretty slow, I reckoned that on processor
>> less potent than Octeon II it might be more appropriate to minimize
>> amount of multiplication-n-add instructions.
>
> As an example, MIPS 1004K manual discusses that that there are two
> options for multiplier for this core, proper and poor-man's. Proper
> multiplier unit can issue multiplication or multiplication-n-add each
> cycle, with multiplication latency apparently being 4. Poor-man's unit
> on the other hand can issue multiplication each 32nd[!] cycle with
> corresponding latency. This means that core with poor-man's unit would
> perform ~13% worse than it could have been. Updated module does use
> multiply-by-1-n-add, so this note is effectively for reference in case
> "poor man" wonders.
>
> Cheers.

Thanks for this information.
I wonder how many devices exist with the "poor man" version.

Greetings,

René

[0]: https://github.com/dot-asm/cryptogams/blob/d22ade312a7af958ec955620b0d241cf42c37feb/mips/poly1305-mips.pl#L461
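
For reference, a minimal sketch (not part of the patch) of why the
replacement suggested for line 1169 should give the same result: masking
with -4 clears the two low bits, and so does shifting right by 2 and then
left by 2 again. The Python below models 32-bit registers with a mask.
The function names and the sample values are hypothetical and chosen only
for illustration. The "andi $tmp4,$tmp4,3" instruction is left out because
it does not feed into $in0 or $ctx.

```python
MASK32 = 0xFFFFFFFF  # model 32-bit MIPS registers


def fold_with_mask(tmp4: int) -> int:
    """Original sequence: li/and build the masked value, srl/addu fold it."""
    in0 = tmp4 & (-4 & MASK32)    # li $in0,-4 ; and $in0,$in0,$tmp4
    ctx = tmp4 >> 2               # srl $ctx,$tmp4,2
    return (ctx + in0) & MASK32   # addu $ctx,$ctx,$in0


def fold_with_shift(tmp4: int) -> int:
    """Proposed sequence: reuse the srl result, shift it back with sll."""
    ctx = tmp4 >> 2               # srl $ctx,$tmp4,2
    in0 = (ctx << 2) & MASK32     # sll $in0,$ctx,2
    return (ctx + in0) & MASK32   # addu $ctx,$ctx,$in0


def folds_agree(samples) -> bool:
    """True if both sequences give the same $ctx for every sample."""
    return all(fold_with_mask(x) == fold_with_shift(x) for x in samples)


# Edge cases plus a spread of pseudo-random 32-bit values (illustrative only).
SAMPLES = [0, 1, 2, 3, 4, 5, 7, 8, 0x7FFFFFFF, 0x80000000, 0xFFFFFFFC, 0xFFFFFFFF]
SAMPLES += [(i * 2654435761) & MASK32 for i in range(100000)]

if __name__ == "__main__":
    print(folds_agree(SAMPLES))
```

This only checks the arithmetic identity. It says nothing about scheduling
or cycle counts on a real core, and the sll has to stay after the srl
because it consumes $ctx.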