Date: Wed, 4 Nov 2020 09:57:42 -0800
From: Eric Biggers
To: l00374334
Cc: herbert@gondor.apana.org.au, davem@davemloft.net, catalin.marinas@arm.com, will@kernel.org, mcoquelin.stm32@gmail.com, alexandre.torgue@st.com, linux-arm-kernel@lists.infradead.org, linux-crypto@vger.kernel.org
Subject: Re: [PATCH 1/1] arm64: Accelerate Adler32 using arm64 SVE instructions.
Message-ID: <20201104175742.GA846@sol.localdomain>
References: <20201103121506.1533-1-liqiang64@huawei.com> <20201103121506.1533-2-liqiang64@huawei.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20201103121506.1533-2-liqiang64@huawei.com>
Precedence: bulk
List-ID:
X-Mailing-List: linux-crypto@vger.kernel.org

On Tue, Nov 03, 2020 at 08:15:06PM +0800, l00374334 wrote:
> From: liqiang
>
> In the libz library, the checksum algorithm adler32 usually occupies
> a relatively high hot spot, and the SVE instruction set can easily
> accelerate it, so that the performance of libz library will be
> significantly improved.
>
> We can divides buf into blocks according to the bit width of SVE,
> and then uses vector registers to perform operations in units of blocks
> to achieve the purpose of acceleration.
>
> On machines that support ARM64 sve instructions, this algorithm is
> about 3~4 times faster than the algorithm implemented in C language
> in libz. The wider the SVE instruction, the better the acceleration effect.
>
> Measured on a Taishan 1951 machine that supports 256bit width SVE,
> below are the results of my measured random data of 1M and 10M:
>
> [root@xxx adler32]# ./benchmark 1000000
> Libz alg: Time used: 608 us, 1644.7 Mb/s.
> SVE alg: Time used: 166 us, 6024.1 Mb/s.
>
> [root@xxx adler32]# ./benchmark 10000000
> Libz alg: Time used: 6484 us, 1542.3 Mb/s.
> SVE alg: Time used: 2034 us, 4916.4 Mb/s.
>
> The blocks can be of any size, so the algorithm can automatically adapt
> to SVE hardware with different bit widths without modifying the code.
>
>
> Signed-off-by: liqiang

Note that this patch does nothing to actually wire up the kernel's copy of libz
(lib/zlib_{deflate,inflate}/) to use this implementation of Adler32.
To do so, libz would either need to be changed to use the shash API, or you'd
need to implement an adler32() function in lib/crypto/ that automatically uses
an accelerated implementation if available, and make libz call it.

Also, in either case a C implementation would be required too.  There can't be
just an architecture-specific implementation.

Also as others have pointed out, there's probably not much point in having an
SVE implementation of Adler32 when there isn't even a NEON implementation yet.
It's not too hard to implement Adler32 using NEON, and there are already several
permissively-licensed NEON implementations out there that could be used as a
reference, e.g. my implementation using NEON intrinsics here:

    https://github.com/ebiggers/libdeflate/blob/v1.6/lib/arm/adler32_impl.h

- Eric
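
For illustration only (this is not taken from the patch, from the kernel, or
from libdeflate): a minimal userspace sketch of the second option described
above, where a generic C Adler32 is always present and an accelerated routine
is used only if one is available.  The names adler32_generic and
adler32_arch_impl, and the function-pointer dispatch itself, are assumptions
made for this example; real kernel code would likely select the implementation
differently, e.g. from CPU feature checks.

```c
#include <stddef.h>
#include <stdint.h>

#define ADLER32_MOD  65521u	/* largest prime below 2^16 */
#define ADLER32_NMAX 5552	/* bytes that can be summed before s2 could overflow 32 bits */

/*
 * Portable Adler32 update (hypothetical name).  Pass adler = 1 to start a
 * new checksum, or a previous return value to continue one.
 */
static uint32_t adler32_generic(uint32_t adler, const uint8_t *buf, size_t len)
{
	uint32_t s1 = adler & 0xffff;	/* running sum of bytes */
	uint32_t s2 = adler >> 16;	/* running sum of the s1 values */

	while (len > 0) {
		/* Defer the expensive modulo to once per chunk. */
		size_t n = len < ADLER32_NMAX ? len : ADLER32_NMAX;

		len -= n;
		while (n--) {
			s1 += *buf++;
			s2 += s1;
		}
		s1 %= ADLER32_MOD;
		s2 %= ADLER32_MOD;
	}
	return (s2 << 16) | s1;
}

typedef uint32_t (*adler32_fn)(uint32_t adler, const uint8_t *buf, size_t len);

/*
 * Hypothetical hook: architecture code would set this when an accelerated
 * (e.g. NEON or SVE) routine is usable.  NULL means "use the generic code".
 */
static adler32_fn adler32_arch_impl;

/* Single entry point for callers such as libz. */
uint32_t adler32(uint32_t adler, const uint8_t *buf, size_t len)
{
	if (adler32_arch_impl)
		return adler32_arch_impl(adler, buf, len);
	return adler32_generic(adler, buf, len);
}
```

With this shape, callers only ever see adler32(), and the generic C code
remains the fallback on every architecture.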