Subject: Re: [PATCH 1/1] arm64: Accelerate Adler32 using arm64 SVE instructions.
From: Li Qiang
To: Dave Martin
CC: Ard Biesheuvel, Alexandre Torgue, Catalin Marinas, "Linux Crypto Mailing List", Maxime Coquelin, Will Deacon, "David S. Miller", Linux ARM, Herbert Xu
Date: Thu, 5 Nov 2020 10:32:34 +0800
Message-ID: <99e9fc5a-986a-98bf-ca5f-44b896e0759d@huawei.com>
In-Reply-To: <20201104144914.GZ6882@arm.com>
References: <20201103121506.1533-1-liqiang64@huawei.com> <20201103121506.1533-2-liqiang64@huawei.com> <20201103180031.GO6882@arm.com> <8c62099c-46b5-924f-d044-e442af4aab08@huawei.com> <20201104144914.GZ6882@arm.com>
X-Mailing-List: linux-crypto@vger.kernel.org

On 2020/11/4 22:49, Dave Martin wrote:
> On Wed, Nov 04, 2020 at 05:19:18PM +0800, Li Qiang wrote:
...
>>>
>>> I haven't tried to understand this algorithm in detail, but there should
>>> probably be no need for this special case to handle the trailing bytes.
>>>
>>> You should search for examples of speculative vectorization using
>>> WHILELO etc., to get a better feel for how to do this.
>>
>> Yes, I have considered this problem, but I have not found a good way to achieve it,
>> because before the end of the loop is reached, the decreasing sequence used for
>> calculation is determined.
>>
>> For example, buf is divided into 32-byte blocks. This sequence should be 32,31,...,2,1,
>> if there are only 10 bytes left at the end of the loop, then this sequence
>> should be 10,9,8,...,2,1.
>>
>> If I judge whether the end of the loop has been reached in the body of the loop,
>> and reset the starting point of the sequence according to the length of the tail,
>> it does not seem very good.
>
> That would indeed be inefficient, since the adjustment is only needed on
> the last iteration.
>
> Can you instead do the adjustment after the loop ends?
>
> For example, if
>
>     y = x[n] * 32 + x[n+1] * 31 + x[n+2] * 30 ...
>
> then
>
>     y - (x[n] * 22 + x[n+1] * 22 + x[n+2] * 22 ...)
>
> equals
>
>     x[n] * 10 + x[n+1] * 9 + x[n+2] * 8 + ...
>
> (This isn't exactly what the algorithm demands, but hopefully you see the
> general idea.)
>
> [...]
>
> Cheers
> ---Dave
> .
>

This idea seems feasible: the check is made only once, after the loop ends,
and the excess is subtracted, so there is no need to enter a second loop to
process the trailing bytes.

I will try this solution later. Thank you! :)

-- 
Best regards,
Li Qiang
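
For illustration only, the post-loop adjustment discussed above can be sketched in plain C rather than SVE assembly. This is a minimal sketch under stated assumptions, not the patch's code: the names BLOCK, tail_weighted_ref and tail_weighted_fixed are hypothetical, BLOCK is set to 32 to match the example in the thread, and the scalar loop stands in for a predicated vector operation in which inactive lanes contribute nothing. It shows that using the fixed weights 32,31,... on a short tail and then subtracting (32 - len) times the plain byte sum gives the same result as the weights len,len-1,...,1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK 32u /* assumed block size in bytes, as in the thread's example */

/* Reference: weights len, len-1, ..., 1 applied to the len tail bytes. */
uint32_t tail_weighted_ref(const uint8_t *x, size_t len)
{
    uint32_t y = 0;

    for (size_t i = 0; i < len; i++)
        y += (uint32_t)x[i] * (uint32_t)(len - i);
    return y;
}

/*
 * Same result using the fixed weights BLOCK, BLOCK-1, ... that the main
 * loop already uses, followed by a single correction after the loop.
 */
uint32_t tail_weighted_fixed(const uint8_t *x, size_t len)
{
    uint32_t y = 0, sum = 0;

    assert(len <= BLOCK);

    for (size_t i = 0; i < len; i++) {
        y += (uint32_t)x[i] * (BLOCK - (uint32_t)i); /* fixed weights */
        sum += x[i];                                 /* plain byte sum */
    }

    /* Every weight was too large by (BLOCK - len); remove the excess once. */
    return y - (BLOCK - (uint32_t)len) * sum;
}
```

With len = 10 the correction factor is 32 - 10 = 22, which is the 22 in Dave's example. In an Adler-32 implementation the plain byte sum is already being accumulated, so the correction would cost roughly one multiply and one subtract after the loop.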