Date: Mon, 15 Apr 2024 15:46:29 -0700
From: Eric Biggers
To: Stefan Kanthak
Cc: linux-crypto@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2 4/4] crypto: x86/sha256-ni - simplify do_4rounds
Message-ID: <20240415224629.GB5206@sol.localdomain>
References:
 <20240411162359.39073-1-ebiggers@kernel.org>
 <20240411162359.39073-5-ebiggers@kernel.org>
 <2ECD48ACEA9540C083E6B797CFD18027@H270>
 <20240415212121.GA5206@sol.localdomain>
 <65E53E4DD09F41CDA7EBCBD970E23C23@H270>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <65E53E4DD09F41CDA7EBCBD970E23C23@H270>

On Tue, Apr 16, 2024 at 12:04:56AM +0200, Stefan Kanthak wrote:
> "Eric Biggers" wrote:
>
> > On Mon, Apr 15, 2024 at 10:41:07PM +0200, Stefan Kanthak wrote:
> [...]
> >> At last the final change: write the macro straightforward and SIMPLE,
> >> closely matching NIST.FIPS.180-4.pdf and their order of operations.
> >>
> >> @@ ...
> >> +.macro sha256 m0 :req, m1 :req, m2 :req, m3 :req
> >> +.if \@ < 4
> >> +       movdqu  \@*16(DATA_PTR), \m0
> >> +       pshufb  SHUF_MASK, \m0          # \m0 = {w(\@*16), w(\@*16+1), w(\@*16+2), w(\@*16+3)}
> >> +.else
> >> +                                       # \m0 = {w(\@*16-16), w(\@*16-15), w(\@*16-14), w(\@*16-13)}
> >> +                                       # \m1 = {w(\@*16-12), w(\@*16-11), w(\@*16-10), w(\@*16-9)}
> >> +                                       # \m2 = {w(\@*16-8), w(\@*16-7), w(\@*16-6), w(\@*16-5)}
> >> +                                       # \m3 = {w(\@*16-4), w(\@*16-3), w(\@*16-2), w(\@*16-1)}
> >> +       sha256msg1 \m1, \m0
> >> +       movdqa  \m3, TMP
> >> +       palignr $4, \m2, TMP
> >> +       paddd   TMP, \m0
> >> +       sha256msg2 \m3, \m0             # \m0 = {w(\@*16), w(\@*16+1), w(\@*16+2), w(\@*16+3)}
> >> +.endif
> >> +       movdqa  (\@-8)*16(SHA256CONSTANTS), MSG
> >> +       paddd   \m0, MSG
> >> +       sha256rnds2 STATE0, STATE1      # STATE1 = {f', e', b', a'}
> >> +       punpckhqdq MSG, MSG
> >> +       sha256rnds2 STATE1, STATE0      # STATE0 = {f", e", b", a"},
> >> +                                       # STATE1 = {h", g", d", c"}
> >> +.endm
> >>
> >> JFTR: you may simplify this further using .altmacro and generate \m0 to \m3
> >>       as MSG%(4-\@&3), MSG%(5-\@&3), MSG%(6-\@&3) and MSG%(7-\@&3) within
> >>       the macro, thus getting rid of its 4 arguments.
> >>
> >> @@ ...
> >> +.rept 4                                # 4*4*4 rounds
> >> +       sha256  MSG0, MSG1, MSG2, MSG3
> >> +       sha256  MSG1, MSG2, MSG3, MSG0
> >> +       sha256  MSG2, MSG3, MSG0, MSG1
> >> +       sha256  MSG3, MSG0, MSG1, MSG2
> >> +.endr
> >
> > Could you please send a real patch, following
> > Documentation/process/submitting-patches.rst?  It's hard to understand what
> > you're proposing here.
>
> 1) I replace your macro (which unfortunately follows Tim Chens twisted code)
>    COMPLETELY with a clean and simple implementation: message schedule first,
>    update of state variables last.
>    You don't need ".if \i >= 12 && \i < 60"/".if \i >= 4 && \i < 52" at all!

It's probably intentional that the code does the message schedule computations a
bit ahead of time.  This might improve performance by reducing the time spent
waiting for the message schedule.

It would be worth trying a few different variants on different CPUs and seeing
how they actually perform in practice, though.

>
> 2) I replace the .irp which invokes your macro with a .rept: my macro uses \@
>    instead of an argument for the round number.
>

The \@ feature is a bit obscure and maybe is best avoided.

- Eric
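
The ordering question discussed above can be checked in a scalar model. The Python sketch below is an illustration only, not code from the patch or the kernel: the helper names (next_four, four_rounds, compress, sha256_one_block) and the lookahead flag are hypothetical. It computes each group of four message-schedule words either right before the four rounds that consume it ("message schedule first, update of state variables last") or one group ahead of time, and both orders should yield the same digest. It says nothing about speed, which still has to be measured on real CPUs.

```python
import hashlib
import math

MASK = 0xFFFFFFFF


def _primes(n):
    """Return the first n primes (trial division is plenty for n = 64)."""
    found, candidate = [], 2
    while len(found) < n:
        if all(candidate % p for p in found):
            found.append(candidate)
        candidate += 1
    return found


def _icbrt(x):
    """Integer cube root by bisection: the largest r with r**3 <= x."""
    lo, hi = 0, 1 << (x.bit_length() // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= x:
            lo = mid
        else:
            hi = mid - 1
    return lo


# FIPS 180-4 constants, derived exactly with integer arithmetic:
# fractional bits of the cube roots (K) and square roots (H0) of primes.
K = [_icbrt(p << 96) & MASK for p in _primes(64)]
H0 = [math.isqrt(p << 64) & MASK for p in _primes(8)]


def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK


def next_four(w, t):
    """Append w[t..t+3]; each word depends only on earlier schedule words."""
    for i in range(t, t + 4):
        s0 = rotr(w[i - 15], 7) ^ rotr(w[i - 15], 18) ^ (w[i - 15] >> 3)
        s1 = rotr(w[i - 2], 17) ^ rotr(w[i - 2], 19) ^ (w[i - 2] >> 10)
        w.append((w[i - 16] + s0 + w[i - 7] + s1) & MASK)


def four_rounds(state, w, t):
    """Run rounds t..t+3 on the working variables a..h."""
    a, b, c, d, e, f, g, h = state
    for i in range(t, t + 4):
        big_s1 = rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)
        ch = (e & f) ^ (~e & g)
        t1 = (h + big_s1 + ch + K[i] + w[i]) & MASK
        big_s0 = rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)
        maj = (a & b) ^ (a & c) ^ (b & c)
        t2 = (big_s0 + maj) & MASK
        h, g, f, e, d, c, b, a = g, f, e, (d + t1) & MASK, c, b, a, (t1 + t2) & MASK
    return [a, b, c, d, e, f, g, h]


def compress(block, lookahead):
    """Compress one 64-byte block, optionally scheduling one group early."""
    w = [int.from_bytes(block[i:i + 4], "big") for i in range(0, 64, 4)]
    state = list(H0)
    for t in range(0, 64, 4):
        # Without lookahead, words t..t+3 are produced just in time;
        # with lookahead, words t+4..t+7 are produced before rounds t..t+3.
        target = min(t + 4 + (4 if lookahead else 0), 64)
        while len(w) < target:
            next_four(w, len(w))
        state = four_rounds(state, w, t)
    return [(x + y) & MASK for x, y in zip(H0, state)]


def sha256_one_block(msg, lookahead):
    """Hash a message short enough (<= 55 bytes) to fit one padded block."""
    assert len(msg) <= 55
    block = msg + b"\x80" + bytes(55 - len(msg)) + (8 * len(msg)).to_bytes(8, "big")
    return b"".join(x.to_bytes(4, "big") for x in compress(block, lookahead)).hex()
```

Calling sha256_one_block(b"abc", False) and sha256_one_block(b"abc", True) and comparing both against hashlib.sha256(b"abc").hexdigest() confirms that moving the schedule computation earlier is a pure reordering. Whether it hides latency on a given microarchitecture is a separate, empirical question.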