Subject: Re: [PATCH 1/1] arm64: Accelerate Adler32 using arm64 SVE instructions.
From: Li Qiang
To: Eric Biggers
Date: Thu, 5 Nov 2020 10:49:50 +0800
Message-ID: <2dad168c-f6cb-103c-04ce-cc3c2561e01b@huawei.com>
In-Reply-To: <20201104175742.GA846@sol.localdomain>
References: <20201103121506.1533-1-liqiang64@huawei.com> <20201103121506.1533-2-liqiang64@huawei.com> <20201104175742.GA846@sol.localdomain>
X-Mailing-List: linux-crypto@vger.kernel.org

Hi Eric,

On 2020/11/5 1:57, Eric Biggers wrote:
> On Tue, Nov 03, 2020 at 08:15:06PM +0800, l00374334 wrote:
>> From: liqiang
>>
>> In the libz library, the Adler-32 checksum is usually a significant
>> hot spot, and the SVE instruction set can easily accelerate it,
>> which significantly improves the performance of libz.
>>
>> We can divide buf into blocks according to the bit width of SVE,
>> and then use vector registers to process the data one block at a
>> time.
>>
>> On machines that support the ARM64 SVE instructions, this algorithm
>> is about 3 to 4 times faster than the C implementation in libz. The
>> wider the SVE vectors, the greater the speedup.
>>
>> Measured on a Taishan 1951 machine that supports 256-bit SVE, below
>> are my results for 1M and 10M of random data:
>>
>> [root@xxx adler32]# ./benchmark 1000000
>> Libz alg: Time used: 608 us, 1644.7 Mb/s.
>> SVE alg: Time used: 166 us, 6024.1 Mb/s.
>>
>> [root@xxx adler32]# ./benchmark 10000000
>> Libz alg: Time used: 6484 us, 1542.3 Mb/s.
>> SVE alg: Time used: 2034 us, 4916.4 Mb/s.
>>
>> The blocks can be of any size, so the algorithm automatically adapts
>> to SVE hardware of different bit widths without code changes.
>>
>>
>> Signed-off-by: liqiang
>
> Note that this patch does nothing to actually wire up the kernel's copy of libz
> (lib/zlib_{deflate,inflate}/) to use this implementation of Adler32. To do so,
> libz would either need to be changed to use the shash API, or you'd need to
> implement an adler32() function in lib/crypto/ that automatically uses an
> accelerated implementation if available, and make libz call it.
>
> Also, in either case a C implementation would be required too. There can't be
> just an architecture-specific implementation.

Okay, thank you for pointing out these problems and for your suggestions.
I will continue to improve my code.
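Eric's point that a C implementation must exist alongside any architecture-specific one, and the patch's idea of splitting the buffer into vector-width blocks, can both be illustrated with a small portable sketch. This is an illustration under stated assumptions, not code from the patch, zlib, or lib/: `adler32_ref()`, `adler32_blocked()`, and the 32-byte `BLOCK` (standing in for one 256-bit SVE vector) are hypothetical names. The blocked variant relies on the identity that, for a block of N bytes b_0..b_{N-1}, s1 grows by sum(b_i) while s2 grows by N*s1 + sum((N - i) * b_i), using s1 from before the block. A plain sum and a weighted sum per block are exactly what vector lanes can compute in parallel.

```c
#include <stddef.h>
#include <stdint.h>

#define ADLER_MOD  65521u /* largest prime below 2^16 */
#define ADLER_NMAX 5552u  /* max bytes between reductions without 32-bit overflow (zlib's NMAX) */
#define BLOCK      32u    /* hypothetical vector width in bytes (one 256-bit SVE vector) */

/* Portable reference: byte-at-a-time update, reducing modulo ADLER_MOD
 * only once every ADLER_NMAX bytes. */
uint32_t adler32_ref(uint32_t adler, const uint8_t *buf, size_t len)
{
    uint32_t s1 = adler & 0xffff;
    uint32_t s2 = adler >> 16;

    while (len > 0) {
        size_t n = len < ADLER_NMAX ? len : ADLER_NMAX;

        len -= n;
        while (n--) {
            s1 += *buf++;
            s2 += s1;
        }
        s1 %= ADLER_MOD;
        s2 %= ADLER_MOD;
    }
    return (s2 << 16) | s1;
}

/* Block-wise variant: each BLOCK-byte block contributes
 *   s1 += sum(b[i])
 *   s2 += BLOCK * s1 + sum((BLOCK - i) * b[i])
 * The inner loop stands in for what SIMD lanes would compute in parallel. */
uint32_t adler32_blocked(uint32_t adler, const uint8_t *buf, size_t len)
{
    uint32_t s1 = adler & 0xffff;
    uint32_t s2 = adler >> 16;

    while (len >= BLOCK) {
        /* Largest multiple of BLOCK not exceeding min(len, ADLER_NMAX). */
        size_t chunk = len < ADLER_NMAX ? len : ADLER_NMAX;

        chunk -= chunk % BLOCK;
        len -= chunk;
        for (; chunk > 0; chunk -= BLOCK, buf += BLOCK) {
            uint32_t sum = 0, wsum = 0;

            for (uint32_t i = 0; i < BLOCK; i++) {
                sum += buf[i];                /* plain sum of the block */
                wsum += (BLOCK - i) * buf[i]; /* position-weighted sum */
            }
            s2 += BLOCK * s1 + wsum; /* uses s1 from before this block */
            s1 += sum;
        }
        s1 %= ADLER_MOD;
        s2 %= ADLER_MOD;
    }
    /* Tail of fewer than BLOCK bytes: plain scalar loop. */
    while (len--) {
        s1 += *buf++;
        s2 += s1;
    }
    return ((s2 % ADLER_MOD) << 16) | (s1 % ADLER_MOD);
}
```

Both functions take the running checksum as their first argument (1 for a fresh checksum), following the convention of zlib's adler32(). In a real patch, the SVE or NEON routine would replace the inner per-block loop, while a C version like this would remain the required fallback.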

>
> Also as others have pointed out, there's probably not much point in having an SVE
> implementation of Adler32 when there isn't even a NEON implementation yet. It's
> not too hard to implement Adler32 using NEON, and there are already several
> permissively-licensed NEON implementations out there that could be used as a
> reference, e.g. my implementation using NEON intrinsics here:
> https://github.com/ebiggers/libdeflate/blob/v1.6/lib/arm/adler32_impl.h
>
> - Eric

I am very glad to have this NEON implementation as a reference. :)

-- 
Best regards,
Li Qiang