From: Eric Biggers
To: linux-crypto@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, Stefan Kanthak
Subject: [PATCH v2 3/4] crypto: x86/sha256-ni - optimize code size
Date: Thu, 11 Apr 2024 09:23:58 -0700
Message-ID: <20240411162359.39073-4-ebiggers@kernel.org>
X-Mailer: git-send-email 2.44.0
In-Reply-To:
 <20240411162359.39073-1-ebiggers@kernel.org>
References: <20240411162359.39073-1-ebiggers@kernel.org>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

From: Eric Biggers

- Load the SHA-256 round constants relative to a pointer that points
  into the middle of the constants rather than to the beginning.  Since
  x86 instructions use signed offsets, this decreases the instruction
  length required to access some of the later round constants.

- Use punpcklqdq or punpckhqdq instead of longer instructions such as
  pshufd, pblendw, and palignr.  This doesn't harm performance.

The end result is that sha256_ni_transform shrinks from 839 bytes to 791
bytes, with no loss in performance.

Suggested-by: Stefan Kanthak
Signed-off-by: Eric Biggers
---
 arch/x86/crypto/sha256_ni_asm.S | 30 +++++++++++++++---------------
 1 file changed, 15 insertions(+), 15 deletions(-)

diff --git a/arch/x86/crypto/sha256_ni_asm.S b/arch/x86/crypto/sha256_ni_asm.S
index b7e7001dafdf..ffc9f1c75c15 100644
--- a/arch/x86/crypto/sha256_ni_asm.S
+++ b/arch/x86/crypto/sha256_ni_asm.S
@@ -82,19 +82,19 @@
 		pshufb		SHUF_MASK, MSG
 		movdqa		MSG, \m0
 	.else
 		movdqa		\m0, MSG
 	.endif
-		paddd		\i*4(SHA256CONSTANTS), MSG
+		paddd		(\i-32)*4(SHA256CONSTANTS), MSG
 		sha256rnds2	STATE0, STATE1
 	.if \i >= 12 && \i < 60
 		movdqa		\m0, TMP
 		palignr		$4, \m3, TMP
 		paddd		TMP, \m1
 		sha256msg2	\m0, \m1
 	.endif
-		pshufd		$0x0E, MSG, MSG
+		punpckhqdq	MSG, MSG
 		sha256rnds2	STATE1, STATE0
 	.if \i >= 4 && \i < 52
 		sha256msg1	\m0, \m3
 	.endif
 .endm
@@ -126,21 +126,21 @@ SYM_TYPED_FUNC_START(sha256_ni_transform)
 	/*
 	 * load initial hash values
 	 * Need to reorder these appropriately
 	 * DCBA, HGFE -> ABEF, CDGH
 	 */
-	movdqu		0*16(DIGEST_PTR), STATE0
-	movdqu		1*16(DIGEST_PTR), STATE1
+	movdqu		0*16(DIGEST_PTR), STATE0	/* DCBA */
+	movdqu		1*16(DIGEST_PTR), STATE1	/* HGFE */
 
-	pshufd		$0xB1, STATE0, STATE0		/* CDAB */
-	pshufd		$0x1B, STATE1, STATE1		/* EFGH */
 	movdqa		STATE0, TMP
-	palignr		$8, STATE1, STATE0		/* ABEF */
-	pblendw		$0xF0, TMP, STATE1		/* CDGH */
+	punpcklqdq	STATE1, STATE0			/* FEBA */
+	punpckhqdq	TMP, STATE1			/* DCHG */
+	pshufd		$0x1B, STATE0, STATE0		/* ABEF */
+	pshufd		$0xB1, STATE1, STATE1		/* CDGH */
 
 	movdqa		PSHUFFLE_BYTE_FLIP_MASK(%rip), SHUF_MASK
-	lea		K256(%rip), SHA256CONSTANTS
+	lea		K256+32*4(%rip), SHA256CONSTANTS
 
 .Lloop0:
 	/* Save hash values for addition after rounds */
 	movdqa		STATE0, ABEF_SAVE
 	movdqa		STATE1, CDGH_SAVE
@@ -160,18 +160,18 @@ SYM_TYPED_FUNC_START(sha256_ni_transform)
 	add		$64, DATA_PTR
 	cmp		NUM_BLKS, DATA_PTR
 	jne		.Lloop0
 
 	/* Write hash values back in the correct order */
-	pshufd		$0x1B, STATE0, STATE0		/* FEBA */
-	pshufd		$0xB1, STATE1, STATE1		/* DCHG */
 	movdqa		STATE0, TMP
-	pblendw		$0xF0, STATE1, STATE0		/* DCBA */
-	palignr		$8, TMP, STATE1			/* HGFE */
+	punpcklqdq	STATE1, STATE0			/* GHEF */
+	punpckhqdq	TMP, STATE1			/* ABCD */
+	pshufd		$0xB1, STATE0, STATE0		/* HGFE */
+	pshufd		$0x1B, STATE1, STATE1		/* DCBA */
 
-	movdqu		STATE0, 0*16(DIGEST_PTR)
-	movdqu		STATE1, 1*16(DIGEST_PTR)
+	movdqu		STATE1, 0*16(DIGEST_PTR)
+	movdqu		STATE0, 1*16(DIGEST_PTR)
 
 .Ldone_hash:
 
 	RET
 SYM_FUNC_END(sha256_ni_transform)
-- 
2.44.0
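
Note on the shuffle rewrite: the lane comments in the diff (DCBA, HGFE,
ABEF, CDGH) can be checked with a small model. What follows is a minimal
Python sketch, not kernel code. It assumes each XMM register is a list of
four 32-bit lanes with index 0 as the lowest lane, so the comment "DCBA"
corresponds to the list [A, B, C, D]. All helper names (load_old,
load_new, store_old, store_new and the per-instruction functions) are
hypothetical and exist only for this illustration; operands follow the
AT&T "src, dst" order used in the patch.

```python
def pshufd(imm, src):
    # Lane i of the result is the source lane selected by bits 2i+1:2i of imm.
    return [src[(imm >> (2 * i)) & 3] for i in range(4)]

def punpcklqdq(src, dst):
    # Low qword of dst stays low, low qword of src becomes the high qword.
    return dst[:2] + src[:2]

def punpckhqdq(src, dst):
    # High qword of dst becomes low, high qword of src becomes the high qword.
    return dst[2:] + src[2:]

def palignr8(src, dst):
    # palignr $8: take the middle 16 bytes of the 32-byte value dst:src.
    return src[2:] + dst[:2]

def pblendw_f0(src, dst):
    # pblendw $0xF0: low qword from dst, high qword from src.
    return dst[:2] + src[2:]

def load_old(digest):
    s0, s1 = digest[:4], digest[4:]          # DCBA, HGFE
    s0 = pshufd(0xB1, s0)                    # CDAB
    s1 = pshufd(0x1B, s1)                    # EFGH
    tmp = s0
    s0 = palignr8(s1, s0)                    # ABEF
    s1 = pblendw_f0(tmp, s1)                 # CDGH
    return s0, s1

def load_new(digest):
    s0, s1 = digest[:4], digest[4:]          # DCBA, HGFE
    tmp = s0
    s0 = punpcklqdq(s1, s0)                  # FEBA
    s1 = punpckhqdq(tmp, s1)                 # DCHG
    return pshufd(0x1B, s0), pshufd(0xB1, s1)  # ABEF, CDGH

def store_old(s0, s1):
    s0 = pshufd(0x1B, s0)                    # FEBA
    s1 = pshufd(0xB1, s1)                    # DCHG
    tmp = s0
    s0 = pblendw_f0(s1, s0)                  # DCBA
    s1 = palignr8(tmp, s1)                   # HGFE
    return s0 + s1                           # STATE0 at offset 0, STATE1 at 16

def store_new(s0, s1):
    tmp = s0
    s0 = punpcklqdq(s1, s0)                  # GHEF
    s1 = punpckhqdq(tmp, s1)                 # ABCD
    s0 = pshufd(0xB1, s0)                    # HGFE
    s1 = pshufd(0x1B, s1)                    # DCBA
    return s1 + s0                           # STATE1 at offset 0, STATE0 at 16

digest = list("ABCDEFGH")
same_load = load_old(digest) == load_new(digest)
same_store = store_old(*load_old(digest)) == store_new(*load_new(digest)) == digest
print(same_load, same_store)
```

In this model the old and new sequences produce the same ABEF/CDGH
registers and the same digest layout in memory. The per-round
replacement differs only in the upper half of MSG: pshufd $0x0E gives
[m2, m3, m0, m0] while punpckhqdq gives [m2, m3, m2, m3]. Since
sha256rnds2 reads only the low 64 bits of its implicit message operand,
that difference should not be observable.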
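
Note on the mid-table pointer: the effect of the signed-offset change
can be estimated with a few lines of arithmetic. This is a rough Python
sketch under stated assumptions, not a measurement of the assembled
code. It assumes that the round macro is expanded once per four rounds
(i = 0, 4, ..., 60), that a zero offset needs no displacement byte
(true for most base registers), that offsets in [-128, 127] use a
one-byte displacement, and that anything else uses four bytes. The
function name disp_bytes is hypothetical.

```python
def disp_bytes(offset):
    """Bytes used by an x86 memory displacement relative to a base register."""
    if offset == 0:
        return 0      # assumption: base register allows a no-displacement form
    if -128 <= offset <= 127:
        return 1      # disp8, sign-extended by the CPU
    return 4          # disp32

ROUNDS = range(0, 64, 4)  # assumption: one constant load per four rounds

# Before: offsets 0..240 from the start of the constants table.
old = sum(disp_bytes(i * 4) for i in ROUNDS)
# After: offsets -128..112 from a pointer 32 words into the table.
new = sum(disp_bytes((i - 32) * 4) for i in ROUNDS)

print(old, new, old - new)  # prints: 39 15 24
```

Under these assumptions every constant load fits a one-byte
displacement once the pointer sits at K256+32*4, whereas half of the
loads needed a four-byte displacement before.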