From: "Stefan Kanthak"
Cc: "Eric Biggers"
Subject: [PATCH v2 1/2] crypto: s(h)aving 40+ bytes off arch/x86/crypto/sha256_ni_asm.S
Date: Mon, 8 Apr 2024 16:08:52 +0200
Organization: Me, myself & IT
X-Mailing-List: linux-crypto@vger.kernel.org

Use shorter SSE2 instructions instead of some SSSE3/SSE4.1 ones: reorder the
hash state with punpcklqdq/punpckhqdq plus pshufd instead of palignr/pblendw,
and replace the pshufd $0x0E between each pair of sha256rnds2 with punpckhqdq.
Also point SHA256CONSTANTS at K256+8*16, so that every access into K256 uses
a short (8-bit) displacement.

---
--- -/arch/x86/crypto/sha256_ni_asm.S
+++ +/arch/x86/crypto/sha256_ni_asm.S
@@ -108,17 +108,17 @@
 	 * Need to reorder these appropriately
 	 * DCBA, HGFE -> ABEF, CDGH
 	 */
-	movdqu		0*16(DIGEST_PTR), STATE0
-	movdqu		1*16(DIGEST_PTR), STATE1
+	movdqu		0*16(DIGEST_PTR), STATE0	/* DCBA */
+	movdqu		1*16(DIGEST_PTR), STATE1	/* HGFE */

-	pshufd		$0xB1, STATE0, STATE0		/* CDAB */
-	pshufd		$0x1B, STATE1, STATE1		/* EFGH */
 	movdqa		STATE0, MSGTMP4
-	palignr		$8, STATE1, STATE0		/* ABEF */
-	pblendw		$0xF0, MSGTMP4, STATE1		/* CDGH */
+	punpcklqdq	STATE1, STATE0			/* FEBA */
+	punpckhqdq	MSGTMP4, STATE1			/* DCHG */
+	pshufd		$0x1B, STATE0, STATE0		/* ABEF */
+	pshufd		$0xB1, STATE1, STATE1		/* CDGH */

 	movdqa		PSHUFFLE_BYTE_FLIP_MASK(%rip), SHUF_MASK
-	lea		K256(%rip), SHA256CONSTANTS
+	lea		K256+8*16(%rip), SHA256CONSTANTS

 .Lloop0:
 	/* Save hash values for addition after rounds */
@@ -129,18 +129,18 @@
 	movdqu		0*16(DATA_PTR), MSG
 	pshufb		SHUF_MASK, MSG
 	movdqa		MSG, MSGTMP0
-	paddd		0*16(SHA256CONSTANTS), MSG
+	paddd		-8*16(SHA256CONSTANTS), MSG
 	sha256rnds2	STATE0, STATE1
-	pshufd		$0x0E, MSG, MSG
+	punpckhqdq	MSG, MSG
 	sha256rnds2	STATE1, STATE0

 	/* Rounds 4-7 */
 	movdqu		1*16(DATA_PTR), MSG
 	pshufb		SHUF_MASK, MSG
 	movdqa		MSG, MSGTMP1
-	paddd		1*16(SHA256CONSTANTS), MSG
+	paddd		-7*16(SHA256CONSTANTS), MSG
 	sha256rnds2	STATE0, STATE1
-	pshufd		$0x0E, MSG, MSG
+	punpckhqdq	MSG, MSG
 	sha256rnds2	STATE1, STATE0
 	sha256msg1	MSGTMP1, MSGTMP0

@@ -148,9 +148,9 @@
 	movdqu		2*16(DATA_PTR), MSG
 	pshufb		SHUF_MASK, MSG
 	movdqa		MSG, MSGTMP2
-	paddd		2*16(SHA256CONSTANTS), MSG
+	paddd		-6*16(SHA256CONSTANTS), MSG
 	sha256rnds2	STATE0, STATE1
-	pshufd		$0x0E, MSG, MSG
+	punpckhqdq	MSG, MSG
 	sha256rnds2	STATE1, STATE0
 	sha256msg1	MSGTMP2, MSGTMP1

@@ -158,151 +158,151 @@
 	movdqu		3*16(DATA_PTR), MSG
 	pshufb		SHUF_MASK, MSG
 	movdqa		MSG, MSGTMP3
-	paddd		3*16(SHA256CONSTANTS), MSG
+	paddd		-5*16(SHA256CONSTANTS), MSG
 	sha256rnds2	STATE0, STATE1
 	movdqa		MSGTMP3, MSGTMP4
 	palignr		$4, MSGTMP2, MSGTMP4
 	paddd		MSGTMP4, MSGTMP0
 	sha256msg2	MSGTMP3, MSGTMP0
-	pshufd		$0x0E, MSG, MSG
+	punpckhqdq	MSG, MSG
 	sha256rnds2	STATE1, STATE0
 	sha256msg1	MSGTMP3, MSGTMP2

 	/* Rounds 16-19 */
 	movdqa		MSGTMP0, MSG
-	paddd		4*16(SHA256CONSTANTS), MSG
+	paddd		-4*16(SHA256CONSTANTS), MSG
 	sha256rnds2	STATE0, STATE1
 	movdqa		MSGTMP0, MSGTMP4
 	palignr		$4, MSGTMP3, MSGTMP4
 	paddd		MSGTMP4, MSGTMP1
 	sha256msg2	MSGTMP0, MSGTMP1
-	pshufd		$0x0E, MSG, MSG
+	punpckhqdq	MSG, MSG
 	sha256rnds2	STATE1, STATE0
 	sha256msg1	MSGTMP0, MSGTMP3

 	/* Rounds 20-23 */
 	movdqa		MSGTMP1, MSG
-	paddd		5*16(SHA256CONSTANTS), MSG
+	paddd		-3*16(SHA256CONSTANTS), MSG
 	sha256rnds2	STATE0, STATE1
 	movdqa		MSGTMP1, MSGTMP4
 	palignr		$4, MSGTMP0, MSGTMP4
 	paddd		MSGTMP4, MSGTMP2
 	sha256msg2	MSGTMP1, MSGTMP2
-	pshufd		$0x0E, MSG, MSG
+	punpckhqdq	MSG, MSG
 	sha256rnds2	STATE1, STATE0
 	sha256msg1	MSGTMP1, MSGTMP0

 	/* Rounds 24-27 */
 	movdqa		MSGTMP2, MSG
-	paddd		6*16(SHA256CONSTANTS), MSG
+	paddd		-2*16(SHA256CONSTANTS), MSG
 	sha256rnds2	STATE0, STATE1
 	movdqa		MSGTMP2, MSGTMP4
 	palignr		$4, MSGTMP1, MSGTMP4
 	paddd		MSGTMP4, MSGTMP3
 	sha256msg2	MSGTMP2, MSGTMP3
-	pshufd		$0x0E, MSG, MSG
+	punpckhqdq	MSG, MSG
 	sha256rnds2	STATE1, STATE0
 	sha256msg1	MSGTMP2, MSGTMP1

 	/* Rounds 28-31 */
 	movdqa		MSGTMP3, MSG
-	paddd		7*16(SHA256CONSTANTS), MSG
+	paddd		-1*16(SHA256CONSTANTS), MSG
 	sha256rnds2	STATE0, STATE1
 	movdqa		MSGTMP3, MSGTMP4
 	palignr		$4, MSGTMP2, MSGTMP4
 	paddd		MSGTMP4, MSGTMP0
 	sha256msg2	MSGTMP3, MSGTMP0
-	pshufd		$0x0E, MSG, MSG
+	punpckhqdq	MSG, MSG
 	sha256rnds2	STATE1, STATE0
 	sha256msg1	MSGTMP3, MSGTMP2

 	/* Rounds 32-35 */
 	movdqa		MSGTMP0, MSG
-	paddd		8*16(SHA256CONSTANTS), MSG
+	paddd		0*16(SHA256CONSTANTS), MSG
 	sha256rnds2	STATE0, STATE1
 	movdqa		MSGTMP0, MSGTMP4
 	palignr		$4, MSGTMP3, MSGTMP4
 	paddd		MSGTMP4, MSGTMP1
 	sha256msg2	MSGTMP0, MSGTMP1
-	pshufd		$0x0E, MSG, MSG
+	punpckhqdq	MSG, MSG
 	sha256rnds2	STATE1, STATE0
 	sha256msg1	MSGTMP0, MSGTMP3

 	/* Rounds 36-39 */
 	movdqa		MSGTMP1, MSG
-	paddd		9*16(SHA256CONSTANTS), MSG
+	paddd		1*16(SHA256CONSTANTS), MSG
 	sha256rnds2	STATE0, STATE1
 	movdqa		MSGTMP1, MSGTMP4
 	palignr		$4, MSGTMP0, MSGTMP4
 	paddd		MSGTMP4, MSGTMP2
 	sha256msg2	MSGTMP1, MSGTMP2
-	pshufd		$0x0E, MSG, MSG
+	punpckhqdq	MSG, MSG
 	sha256rnds2	STATE1, STATE0
 	sha256msg1	MSGTMP1, MSGTMP0

 	/* Rounds 40-43 */
 	movdqa		MSGTMP2, MSG
-	paddd		10*16(SHA256CONSTANTS), MSG
+	paddd		2*16(SHA256CONSTANTS), MSG
 	sha256rnds2	STATE0, STATE1
 	movdqa		MSGTMP2, MSGTMP4
 	palignr		$4, MSGTMP1, MSGTMP4
 	paddd		MSGTMP4, MSGTMP3
 	sha256msg2	MSGTMP2, MSGTMP3
-	pshufd		$0x0E, MSG, MSG
+	punpckhqdq	MSG, MSG
 	sha256rnds2	STATE1, STATE0
 	sha256msg1	MSGTMP2, MSGTMP1

 	/* Rounds 44-47 */
 	movdqa		MSGTMP3, MSG
-	paddd		11*16(SHA256CONSTANTS), MSG
+	paddd		3*16(SHA256CONSTANTS), MSG
 	sha256rnds2	STATE0, STATE1
 	movdqa		MSGTMP3, MSGTMP4
 	palignr		$4, MSGTMP2, MSGTMP4
 	paddd		MSGTMP4, MSGTMP0
 	sha256msg2	MSGTMP3, MSGTMP0
-	pshufd		$0x0E, MSG, MSG
+	punpckhqdq	MSG, MSG
 	sha256rnds2	STATE1, STATE0
 	sha256msg1	MSGTMP3, MSGTMP2

 	/* Rounds 48-51 */
 	movdqa		MSGTMP0, MSG
-	paddd		12*16(SHA256CONSTANTS), MSG
+	paddd		4*16(SHA256CONSTANTS), MSG
 	sha256rnds2	STATE0, STATE1
 	movdqa		MSGTMP0, MSGTMP4
 	palignr		$4, MSGTMP3, MSGTMP4
 	paddd		MSGTMP4, MSGTMP1
 	sha256msg2	MSGTMP0, MSGTMP1
-	pshufd		$0x0E, MSG, MSG
+	punpckhqdq	MSG, MSG
 	sha256rnds2	STATE1, STATE0
 	sha256msg1	MSGTMP0, MSGTMP3

 	/* Rounds 52-55 */
 	movdqa		MSGTMP1, MSG
-	paddd		13*16(SHA256CONSTANTS), MSG
+	paddd		5*16(SHA256CONSTANTS), MSG
 	sha256rnds2	STATE0, STATE1
 	movdqa		MSGTMP1, MSGTMP4
 	palignr		$4, MSGTMP0, MSGTMP4
 	paddd		MSGTMP4, MSGTMP2
 	sha256msg2	MSGTMP1, MSGTMP2
-	pshufd		$0x0E, MSG, MSG
+	punpckhqdq	MSG, MSG
 	sha256rnds2	STATE1, STATE0

 	/* Rounds 56-59 */
 	movdqa		MSGTMP2, MSG
-	paddd		14*16(SHA256CONSTANTS), MSG
+	paddd		6*16(SHA256CONSTANTS), MSG
 	sha256rnds2	STATE0, STATE1
 	movdqa		MSGTMP2, MSGTMP4
 	palignr		$4, MSGTMP1, MSGTMP4
 	paddd		MSGTMP4, MSGTMP3
 	sha256msg2	MSGTMP2, MSGTMP3
-	pshufd		$0x0E, MSG, MSG
+	punpckhqdq	MSG, MSG
 	sha256rnds2	STATE1, STATE0

 	/* Rounds 60-63 */
 	movdqa		MSGTMP3, MSG
-	paddd		15*16(SHA256CONSTANTS), MSG
+	paddd		7*16(SHA256CONSTANTS), MSG
 	sha256rnds2	STATE0, STATE1
-	pshufd		$0x0E, MSG, MSG
+	punpckhqdq	MSG, MSG
 	sha256rnds2	STATE1, STATE0

 	/* Add current hash values with previously saved */
@@ -315,11 +315,11 @@
 	jne		.Lloop0

 	/* Write hash values back in the correct order */
-	pshufd		$0x1B, STATE0, STATE0		/* FEBA */
-	pshufd		$0xB1, STATE1, STATE1		/* DCHG */
-	movdqa		STATE0, MSGTMP4
-	pblendw		$0xF0, STATE1, STATE0		/* DCBA */
-	palignr		$8, MSGTMP4, STATE1		/* HGFE */
+	movdqa		STATE1, MSGTMP4
+	punpcklqdq	STATE0, STATE1			/* EFGH */
+	punpckhqdq	MSGTMP4, STATE0			/* CDAB */
+	pshufd		$0xB1, STATE0, STATE0		/* DCBA */
+	pshufd		$0x1B, STATE1, STATE1		/* HGFE */

 	movdqu		STATE0, 0*16(DIGEST_PTR)
 	movdqu		STATE1, 1*16(DIGEST_PTR)
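To check that the new load sequence is a drop-in replacement for the old palignr/pblendw one, the register shuffles can be modeled lane by lane. The Python sketch below is an illustration only, not kernel code: it assumes each XMM register is a tuple of four 32-bit lanes (lane 0 least significant), uses letters for the state words a..h, and its helper names (punpcklqdq(), palignr8(), pblendw_f0(), etc.) are hypothetical stand-ins for the documented instruction semantics in AT&T operand order.

```python
# Lane-level model: an XMM register is a tuple of four 32-bit lanes, lane 0 least
# significant. Letters stand in for the SHA-256 state words a..h.
# Helper names are hypothetical; operand order follows AT&T syntax ("op src, dst").

def punpcklqdq(dst, src):
    # punpcklqdq src, dst: low qword of dst, then low qword of src
    return (dst[0], dst[1], src[0], src[1])

def punpckhqdq(dst, src):
    # punpckhqdq src, dst: high qword of dst, then high qword of src
    return (dst[2], dst[3], src[2], src[3])

def pshufd(src, imm):
    # pshufd $imm, src, dst: lane i takes src lane (imm >> 2*i) & 3
    return tuple(src[(imm >> (2 * i)) & 3] for i in range(4))

def palignr8(dst, src):
    # palignr $8, src, dst: bytes 8..23 of dst:src = high qword of src, low qword of dst
    return (src[2], src[3], dst[0], dst[1])

def pblendw_f0(dst, src):
    # pblendw $0xF0, src, dst: words 4..7 (lanes 2, 3) from src, words 0..3 from dst
    return (dst[0], dst[1], src[2], src[3])

def show(reg):
    # Render lanes high-to-low, like the /* DCBA */ comments in the patch
    return "".join(reversed(reg))

def load_old(state0, state1):
    state0 = pshufd(state0, 0xB1)        # CDAB
    state1 = pshufd(state1, 0x1B)        # EFGH
    tmp = state0                         # movdqa STATE0, MSGTMP4
    state0 = palignr8(state0, state1)    # ABEF
    state1 = pblendw_f0(state1, tmp)     # CDGH
    return state0, state1

def load_new(state0, state1):
    tmp = state0                         # movdqa STATE0, MSGTMP4
    state0 = punpcklqdq(state0, state1)  # FEBA
    state1 = punpckhqdq(state1, tmp)     # DCHG
    state0 = pshufd(state0, 0x1B)        # ABEF
    state1 = pshufd(state1, 0xB1)        # CDGH
    return state0, state1

# movdqu puts digest word a in lane 0, so the loaded registers read DCBA and HGFE
dcba, hgfe = tuple("ABCD"), tuple("EFGH")
for name, fn in (("old", load_old), ("new", load_new)):
    s0, s1 = fn(dcba, hgfe)
    print(name, show(s0), show(s1))
```

Both variants print ABEF and CDGH, the layout sha256rnds2 expects in STATE0 and STATE1.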
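The write-back has to invert the load mapping, leaving DCBA in STATE0 and HGFE in STATE1 for the two movdqu stores. The Python sketch below uses the same hypothetical lane model (tuples of four 32-bit lanes, lane 0 least significant, AT&T operand order) and is meant only to show which operands the punpck/pshufd form needs: MSGTMP4 must hold a copy of STATE1, and after the punpck pair the register holding CDAB needs $0xB1 while the one holding EFGH needs $0x1B.

```python
# Hypothetical lane model (AT&T operand order); lane 0 is least significant.

def punpcklqdq(dst, src):
    return (dst[0], dst[1], src[0], src[1])   # low qword of dst, low qword of src

def punpckhqdq(dst, src):
    return (dst[2], dst[3], src[2], src[3])   # high qword of dst, high qword of src

def pshufd(src, imm):
    return tuple(src[(imm >> (2 * i)) & 3] for i in range(4))

def palignr8(dst, src):
    return (src[2], src[3], dst[0], dst[1])   # high qword of src, low qword of dst

def pblendw_f0(dst, src):
    return (dst[0], dst[1], src[2], src[3])   # lanes 2, 3 taken from src

def show(reg):
    return "".join(reversed(reg))             # high-to-low, as in the patch comments

def store_old(state0, state1):
    state0 = pshufd(state0, 0x1B)        # FEBA
    state1 = pshufd(state1, 0xB1)        # DCHG
    tmp = state0                         # copy of the shuffled STATE0
    state0 = pblendw_f0(state0, state1)  # DCBA
    state1 = palignr8(state1, tmp)       # HGFE
    return state0, state1

def store_new(state0, state1, imm0=0xB1, imm1=0x1B):
    tmp = state1                         # movdqa STATE1, MSGTMP4
    state1 = punpcklqdq(state1, state0)  # EFGH
    state0 = punpckhqdq(state0, tmp)     # CDAB
    state0 = pshufd(state0, imm0)        # DCBA when imm0 == 0xB1
    state1 = pshufd(state1, imm1)        # HGFE when imm1 == 0x1B
    return state0, state1

# After the rounds, STATE0 reads ABEF and STATE1 reads CDGH (high-to-low)
abef, cdgh = tuple("FEBA"), tuple("HGDC")
print("old    ", *map(show, store_old(abef, cdgh)))
print("new    ", *map(show, store_new(abef, cdgh)))
print("swapped", *map(show, store_new(abef, cdgh, imm0=0x1B, imm1=0xB1)))
```

The old and new sequences both yield DCBA and HGFE; swapping the two immediates yields BADC and FEHG instead. A copy of STATE0 would not work for the new form either, because punpckhqdq needs the original high qword of STATE1 (DC) after punpcklqdq has already overwritten STATE1.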
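Replacing pshufd $0x0E, MSG, MSG with punpckhqdq MSG, MSG works because sha256rnds2 reads only the low 64 bits (two W+K words) of its implicit XMM0 operand, so only lanes 0 and 1 of MSG have to agree. A minimal Python check, assuming the same hypothetical lane model as a stand-in for the instructions and arbitrary example lane values:

```python
# Hypothetical lane model; lane 0 is least significant.

def pshufd(src, imm):
    # pshufd $imm, src, dst: lane i takes src lane (imm >> 2*i) & 3
    return tuple(src[(imm >> (2 * i)) & 3] for i in range(4))

def punpckhqdq(dst, src):
    # punpckhqdq src, dst: high qword of dst, then high qword of src
    return (dst[2], dst[3], src[2], src[3])

# MSG after "paddd K, MSG": four W+K words (arbitrary example values)
msg = (0x11111111, 0x22222222, 0x33333333, 0x44444444)

old = pshufd(msg, 0x0E)      # pshufd $0x0E, MSG, MSG
new = punpckhqdq(msg, msg)   # punpckhqdq MSG, MSG

# sha256rnds2 consumes only lanes 0 and 1 of XMM0
print(old[:2] == new[:2])
print([hex(v) for v in old])
print([hex(v) for v in new])
```

Only the upper lanes differ, and sha256rnds2 ignores them. The saving is in the encoding: pshufd xmm, xmm, imm8 is 66 0F 70 /r ib (5 bytes) while punpckhqdq xmm, xmm is 66 0F 6D /r (4 bytes), one byte per replacement since MSG, the implicit sha256rnds2 operand, is XMM0 and needs no REX prefix.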
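The K256 change relies on how x86-64 encodes a base+displacement memory operand: no displacement byte for an offset of 0, a sign-extended disp8 for offsets in -128..127, and a disp32 otherwise. The Python sketch below tallies only the displacement bytes of the sixteen paddd loads, assuming SHA256CONSTANTS is not %rbp or %r13 (those always need at least a disp8); the lea itself stays a 4-byte RIP-relative displacement either way.

```python
def disp_bytes(disp):
    """Displacement bytes for a (base + disp) operand in x86-64.

    Assumes the base register is not %rbp or %r13, which need a disp8 even for 0.
    """
    if disp == 0:
        return 0
    if -128 <= disp <= 127:
        return 1   # disp8, sign-extended
    return 4       # disp32

old_offsets = [i * 16 for i in range(16)]        # 0*16 .. 15*16 from K256
new_offsets = [(i - 8) * 16 for i in range(16)]  # -8*16 .. 7*16 from K256+8*16

old_total = sum(disp_bytes(d) for d in old_offsets)
new_total = sum(disp_bytes(d) for d in new_offsets)
print(old_total, new_total, old_total - new_total)
```

Under that assumption, the displacements shrink from 39 bytes to 15: the eight loads at 8*16..15*16 each drop from a disp32 to a disp8, and the first load pays one byte for its new -8*16 offset.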