Received: by 2002:a05:6358:16cc:b0:ea:6187:17c9 with SMTP id r12csp12890638rwl; Tue, 3 Jan 2023 23:47:49 -0800 (PST) X-Google-Smtp-Source: AMrXdXtNb6QEmO/QMpybFUu5sMGu9zS7tvHFydnArTKfpvKUwfj4jttYJbSBfGJvIONMwGYtgOKX X-Received: by 2002:a17:906:2859:b0:7c1:32:3574 with SMTP id s25-20020a170906285900b007c100323574mr32452938ejc.12.1672818469428; Tue, 03 Jan 2023 23:47:49 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1672818469; cv=none; d=google.com; s=arc-20160816; b=XTZeEOlP0RVXmdD+pvGNQ6jdgpW30jWSsxCORLXfaMwcqmO6yJYARTwuMPvBGAu+Zr kD6dcuCjmtNIwsqsQzVjBWwb0v9rTK0Jhd+plijGwt8QlbEnw+sr9l2/gt7l0/Zivvp9 PsCy/ubKo8vUpCEsfZxeXX0jQl035njR9jGZ6ScuC0E5D/qmwmv7sbJPcVevG1HhLUrB fLUkcx1mdQSDW+oW2UbbUBB8dJykUuSZ74Gl/45rYxgGQ7l9Y5rW4w+5C0lQbPNTK+nz stKnA9wyDfzxvpe6EGiFeiX5EJi79UiMGnt3SqxnSC7ofFDqhit8xjeR8ocmoM+oA144 3dYQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-disposition:mime-version :references:message-id:subject:cc:to:from:date:dkim-signature; bh=wpko4Zt92/q6WQ22SC07fGVwIgEM13ApzmO0qf8hFjM=; b=IN+y1hy++hpBbEToBLix41UuaZhc6pWUCEVCRRPEie/yHDYrJij4g8PSFohYVWB3MR S25YlwirZKoiDYQQ/BFPQRk2UBzt+Da1WjkMuzfRzI+inbfz4Vpu4Oqp5jl9gdhiL2H9 b9z9oJPDFYBebX8MuV3n3ObVG/s7k4SMnS8xrtficlcvOXBatgP2w3SYe8MEr+Lmi6GN EiGk8TCbzvuoRRkuLadXkyR0YJ1HxDyk954npY8cIDkYyp0U97KCBOU6Upzj9n2KvkSi XPkvzQPTjj/RgUVzECIrd2G6OBaEGHq50IVB+KRMlaS0hKRjrPpo9oLjF4MF9kOIkYZU HGlg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=BUXr8OEb; spf=pass (google.com: domain of linux-crypto-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-crypto-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id hr35-20020a1709073fa300b007c1633cea0csi31610333ejc.513.2023.01.03.23.47.08; Tue, 03 Jan 2023 23:47:49 -0800 (PST) Received-SPF: pass (google.com: domain of linux-crypto-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=BUXr8OEb; spf=pass (google.com: domain of linux-crypto-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-crypto-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233767AbjADHrF (ORCPT + 99 others); Wed, 4 Jan 2023 02:47:05 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:43816 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233721AbjADHqt (ORCPT ); Wed, 4 Jan 2023 02:46:49 -0500 Received: from ams.source.kernel.org (ams.source.kernel.org [IPv6:2604:1380:4601:e00::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D3F5CE36 for ; Tue, 3 Jan 2023 23:46:48 -0800 (PST) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id 6CEEAB81203 for ; Wed, 4 Jan 2023 07:46:47 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 169F2C433D2; Wed, 4 Jan 2023 07:46:46 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1672818406; bh=eZ4F0Vs4hiX60OmPCpMb00jhDPIHpJ7/GIt2QNFF9nQ=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=BUXr8OEbEKq7c3348o7O/SgXTSn5m+j9xtbWZa059jYhiq/xl3AiGATLVCi77VRuO 028FVZwEbHRimvUih8PW/BS+optm0pOUIp+Yipskim9+ssENT42AoLEiVY/5rQbl3t AuJ8L4z0s2AmeCnulYve/umAE37gbLd29NuwQGT3uxkrsMru6eRonAlLNVSVszdoia 162a54dgdrYbAejmgoJLhycGUK5wFTOKDN9IWKfvn96vIhpdf3AWRLTzxQjjEkuILV AxhriNVud/4RAWCNdLZpNNR3qAGFG+9R3caQC33x4wt+ub0yRdJAAZ4W4ekd3F4pab GTZGjZ+wfxGng== Date: Tue, 3 Jan 2023 23:46:44 -0800 From: Eric Biggers To: Lukasz Stelmach Cc: Herbert Xu , "David S. Miller" , linux-crypto@vger.kernel.org Subject: Re: xor_blocks() assumptions Message-ID: References: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-Spam-Status: No, score=-7.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_HI, SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-crypto@vger.kernel.org On Tue, Jan 03, 2023 at 12:13:30PM +0100, Lukasz Stelmach wrote: > > It also would be worth considering just optimizing crypto_xor() by > > unrolling the word-at-a-time loop to 4x or so. > > If I understand correctly the generic 8regs and 32regs implementations > in include/asm-generic/xor.h are what you mean. Using xor_blocks() in > crypto_xor() could enable them for free on architectures lacking SIMD or > vector instructions. I actually meant exactly what I said -- unrolling the word-at-a-time loop in crypto_xor(). Not using xor_blocks(). Something like this: diff --git a/include/crypto/algapi.h b/include/crypto/algapi.h index 61b327206b557..c0b90f14cae18 100644 --- a/include/crypto/algapi.h +++ b/include/crypto/algapi.h @@ -167,7 +167,18 @@ static inline void crypto_xor(u8 *dst, const u8 *src, unsigned int size) unsigned long *s = (unsigned long *)src; unsigned long l; - while (size > 0) { + while (size >= 4 * sizeof(unsigned long)) { + l = get_unaligned(d) ^ get_unaligned(s++); + put_unaligned(l, d++); + l = get_unaligned(d) ^ get_unaligned(s++); + put_unaligned(l, d++); + l = get_unaligned(d) ^ get_unaligned(s++); + put_unaligned(l, d++); + l = get_unaligned(d) ^ get_unaligned(s++); + put_unaligned(l, d++); + size -= 4 * sizeof(unsigned long); + } + if (size > 0) { l = get_unaligned(d) ^ get_unaligned(s++); put_unaligned(l, d++); size -= sizeof(unsigned long); Actually, the compiler might unroll the loop automatically anyway, so even the above change might not even be necessary. The point is, I expect that a proper scalar implementation will perform well for pretty much anything other than large input sizes. It's only large input sizes where xor_blocks() might be worth it, considering the significant overhead of the indirect call in xor_blocks() as well as entering an SIMD code section. (Note that indirect calls are very expensive these days, due to the speculative execution mitigations.) Of course, the real question is what real-world scenario are you actually trying to optimize for... - Eric