2021-12-22 04:50:29

by Tianjia Zhang

[permalink] [raw]
Subject: [PATCH v2 0/6] Introduce x86 assembly accelerated implementation for SM3 algorithm

This series of patches creates an stand-alone library for SM3 hash
algorithm in the lib/crypto directory, and makes the implementations
that originally depended on sm3-generic depend on the stand-alone SM3
library, which also includes sm3-generic itself.

On this basis, the AVX assembly acceleration implementation of SM3
algorithm is introduced, the main algorithm implementation based on
SM3 AES/BMI2 accelerated work by libgcrypt at:
https://gnupg.org/software/libgcrypt/index.html

From the performance benchmark data, the performance improvement of
SM3 algorithm after AVX optimization can reach up to 38%.

---
v2 changes:
- x86/sm3: Change K macros to signed decimal and use LEA and 32-bit offset

Tianjia Zhang (6):
crypto: sm3 - create SM3 stand-alone library
crypto: arm64/sm3-ce - make dependent on sm3 library
crypto: sm2 - make dependent on sm3 library
crypto: sm3 - make dependent on sm3 library
crypto: x86/sm3 - add AVX assembly implementation
crypto: tcrypt - add asynchronous speed test for SM3

arch/arm64/crypto/Kconfig | 2 +-
arch/arm64/crypto/sm3-ce-glue.c | 20 +-
arch/x86/crypto/Makefile | 3 +
arch/x86/crypto/sm3-avx-asm_64.S | 517 +++++++++++++++++++++++++++++++
arch/x86/crypto/sm3_avx_glue.c | 134 ++++++++
crypto/Kconfig | 16 +-
crypto/sm2.c | 38 +--
crypto/sm3_generic.c | 142 +--------
crypto/tcrypt.c | 14 +-
include/crypto/sm3.h | 35 ++-
lib/crypto/Kconfig | 3 +
lib/crypto/Makefile | 3 +
lib/crypto/sm3.c | 246 +++++++++++++++
13 files changed, 1007 insertions(+), 166 deletions(-)
create mode 100644 arch/x86/crypto/sm3-avx-asm_64.S
create mode 100644 arch/x86/crypto/sm3_avx_glue.c
create mode 100644 lib/crypto/sm3.c

--
2.32.0



2021-12-22 04:50:33

by Tianjia Zhang

[permalink] [raw]
Subject: [PATCH v2 1/6] crypto: sm3 - create SM3 stand-alone library

Stand-alone implementation of the SM3 algorithm. It is designed
to have as little dependencies as possible. In other cases you
should generally use the hash APIs from include/crypto/hash.h.
Especially when hashing large amounts of data as those APIs may
be hw-accelerated. In the new SM3 stand-alone library,
sm3_compress() has also been optimized, instead of simply using
the code in sm3_generic.

Signed-off-by: Tianjia Zhang <[email protected]>
---
include/crypto/sm3.h | 31 ++++++
lib/crypto/Kconfig | 3 +
lib/crypto/Makefile | 3 +
lib/crypto/sm3.c | 246 +++++++++++++++++++++++++++++++++++++++++++
4 files changed, 283 insertions(+)
create mode 100644 lib/crypto/sm3.c

diff --git a/include/crypto/sm3.h b/include/crypto/sm3.h
index 42ea21289ba9..315adaf38007 100644
--- a/include/crypto/sm3.h
+++ b/include/crypto/sm3.h
@@ -1,5 +1,9 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
/*
* Common values for SM3 algorithm
+ *
+ * Copyright (C) 2017 ARM Limited or its affiliates.
+ * Copyright (C) 2021 Tianjia Zhang <[email protected]>
*/

#ifndef _CRYPTO_SM3_H
@@ -39,4 +43,31 @@ extern int crypto_sm3_final(struct shash_desc *desc, u8 *out);

extern int crypto_sm3_finup(struct shash_desc *desc, const u8 *data,
unsigned int len, u8 *hash);
+
+/*
+ * Stand-alone implementation of the SM3 algorithm. It is designed to
+ * have as little dependencies as possible so it can be used in the
+ * kexec_file purgatory. In other cases you should generally use the
+ * hash APIs from include/crypto/hash.h. Especially when hashing large
+ * amounts of data as those APIs may be hw-accelerated.
+ *
+ * For details see lib/crypto/sm3.c
+ */
+
+static inline void sm3_init(struct sm3_state *sctx)
+{
+ sctx->state[0] = SM3_IVA;
+ sctx->state[1] = SM3_IVB;
+ sctx->state[2] = SM3_IVC;
+ sctx->state[3] = SM3_IVD;
+ sctx->state[4] = SM3_IVE;
+ sctx->state[5] = SM3_IVF;
+ sctx->state[6] = SM3_IVG;
+ sctx->state[7] = SM3_IVH;
+ sctx->count = 0;
+}
+
+void sm3_update(struct sm3_state *sctx, const u8 *data, unsigned int len);
+void sm3_final(struct sm3_state *sctx, u8 *out);
+
#endif
diff --git a/lib/crypto/Kconfig b/lib/crypto/Kconfig
index 545ccbddf6a1..36963e8f4eaa 100644
--- a/lib/crypto/Kconfig
+++ b/lib/crypto/Kconfig
@@ -129,5 +129,8 @@ config CRYPTO_LIB_CHACHA20POLY1305
config CRYPTO_LIB_SHA256
tristate

+config CRYPTO_LIB_SM3
+ tristate
+
config CRYPTO_LIB_SM4
tristate
diff --git a/lib/crypto/Makefile b/lib/crypto/Makefile
index 73205ed269ba..8149bc00b627 100644
--- a/lib/crypto/Makefile
+++ b/lib/crypto/Makefile
@@ -38,6 +38,9 @@ libpoly1305-y += poly1305.o
obj-$(CONFIG_CRYPTO_LIB_SHA256) += libsha256.o
libsha256-y := sha256.o

+obj-$(CONFIG_CRYPTO_LIB_SM3) += libsm3.o
+libsm3-y := sm3.o
+
obj-$(CONFIG_CRYPTO_LIB_SM4) += libsm4.o
libsm4-y := sm4.o

diff --git a/lib/crypto/sm3.c b/lib/crypto/sm3.c
new file mode 100644
index 000000000000..d473e358a873
--- /dev/null
+++ b/lib/crypto/sm3.c
@@ -0,0 +1,246 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * SM3 secure hash, as specified by OSCCA GM/T 0004-2012 SM3 and described
+ * at https://datatracker.ietf.org/doc/html/draft-sca-cfrg-sm3-02
+ *
+ * Copyright (C) 2017 ARM Limited or its affiliates.
+ * Copyright (C) 2017 Gilad Ben-Yossef <[email protected]>
+ * Copyright (C) 2021 Tianjia Zhang <[email protected]>
+ */
+
+#include <linux/module.h>
+#include <asm/unaligned.h>
+#include <crypto/sm3.h>
+
+static const u32 ____cacheline_aligned K[64] = {
+ 0x79cc4519, 0xf3988a32, 0xe7311465, 0xce6228cb,
+ 0x9cc45197, 0x3988a32f, 0x7311465e, 0xe6228cbc,
+ 0xcc451979, 0x988a32f3, 0x311465e7, 0x6228cbce,
+ 0xc451979c, 0x88a32f39, 0x11465e73, 0x228cbce6,
+ 0x9d8a7a87, 0x3b14f50f, 0x7629ea1e, 0xec53d43c,
+ 0xd8a7a879, 0xb14f50f3, 0x629ea1e7, 0xc53d43ce,
+ 0x8a7a879d, 0x14f50f3b, 0x29ea1e76, 0x53d43cec,
+ 0xa7a879d8, 0x4f50f3b1, 0x9ea1e762, 0x3d43cec5,
+ 0x7a879d8a, 0xf50f3b14, 0xea1e7629, 0xd43cec53,
+ 0xa879d8a7, 0x50f3b14f, 0xa1e7629e, 0x43cec53d,
+ 0x879d8a7a, 0x0f3b14f5, 0x1e7629ea, 0x3cec53d4,
+ 0x79d8a7a8, 0xf3b14f50, 0xe7629ea1, 0xcec53d43,
+ 0x9d8a7a87, 0x3b14f50f, 0x7629ea1e, 0xec53d43c,
+ 0xd8a7a879, 0xb14f50f3, 0x629ea1e7, 0xc53d43ce,
+ 0x8a7a879d, 0x14f50f3b, 0x29ea1e76, 0x53d43cec,
+ 0xa7a879d8, 0x4f50f3b1, 0x9ea1e762, 0x3d43cec5
+};
+
+/*
+ * Transform the message X which consists of 16 32-bit-words. See
+ * GM/T 004-2012 for details.
+ */
+#define R(i, a, b, c, d, e, f, g, h, t, w1, w2) \
+ do { \
+ ss1 = rol32((rol32((a), 12) + (e) + (t)), 7); \
+ ss2 = ss1 ^ rol32((a), 12); \
+ d += FF ## i(a, b, c) + ss2 + ((w1) ^ (w2)); \
+ h += GG ## i(e, f, g) + ss1 + (w1); \
+ b = rol32((b), 9); \
+ f = rol32((f), 19); \
+ h = P0((h)); \
+ } while (0)
+
+#define R1(a, b, c, d, e, f, g, h, t, w1, w2) \
+ R(1, a, b, c, d, e, f, g, h, t, w1, w2)
+#define R2(a, b, c, d, e, f, g, h, t, w1, w2) \
+ R(2, a, b, c, d, e, f, g, h, t, w1, w2)
+
+#define FF1(x, y, z) (x ^ y ^ z)
+#define FF2(x, y, z) ((x & y) | (x & z) | (y & z))
+
+#define GG1(x, y, z) FF1(x, y, z)
+#define GG2(x, y, z) ((x & y) | (~x & z))
+
+/* Message expansion */
+#define P0(x) ((x) ^ rol32((x), 9) ^ rol32((x), 17))
+#define P1(x) ((x) ^ rol32((x), 15) ^ rol32((x), 23))
+#define I(i) (W[i] = get_unaligned_be32(data + i * 4))
+#define W1(i) (W[i & 0x0f])
+#define W2(i) (W[i & 0x0f] = \
+ P1(W[i & 0x0f] \
+ ^ W[(i-9) & 0x0f] \
+ ^ rol32(W[(i-3) & 0x0f], 15)) \
+ ^ rol32(W[(i-13) & 0x0f], 7) \
+ ^ W[(i-6) & 0x0f])
+
+static void sm3_transform(struct sm3_state *sctx, u8 const *data, u32 W[16])
+{
+ u32 a, b, c, d, e, f, g, h, ss1, ss2;
+
+ a = sctx->state[0];
+ b = sctx->state[1];
+ c = sctx->state[2];
+ d = sctx->state[3];
+ e = sctx->state[4];
+ f = sctx->state[5];
+ g = sctx->state[6];
+ h = sctx->state[7];
+
+ R1(a, b, c, d, e, f, g, h, K[0], I(0), I(4));
+ R1(d, a, b, c, h, e, f, g, K[1], I(1), I(5));
+ R1(c, d, a, b, g, h, e, f, K[2], I(2), I(6));
+ R1(b, c, d, a, f, g, h, e, K[3], I(3), I(7));
+ R1(a, b, c, d, e, f, g, h, K[4], W1(4), I(8));
+ R1(d, a, b, c, h, e, f, g, K[5], W1(5), I(9));
+ R1(c, d, a, b, g, h, e, f, K[6], W1(6), I(10));
+ R1(b, c, d, a, f, g, h, e, K[7], W1(7), I(11));
+ R1(a, b, c, d, e, f, g, h, K[8], W1(8), I(12));
+ R1(d, a, b, c, h, e, f, g, K[9], W1(9), I(13));
+ R1(c, d, a, b, g, h, e, f, K[10], W1(10), I(14));
+ R1(b, c, d, a, f, g, h, e, K[11], W1(11), I(15));
+ R1(a, b, c, d, e, f, g, h, K[12], W1(12), W2(16));
+ R1(d, a, b, c, h, e, f, g, K[13], W1(13), W2(17));
+ R1(c, d, a, b, g, h, e, f, K[14], W1(14), W2(18));
+ R1(b, c, d, a, f, g, h, e, K[15], W1(15), W2(19));
+
+ R2(a, b, c, d, e, f, g, h, K[16], W1(16), W2(20));
+ R2(d, a, b, c, h, e, f, g, K[17], W1(17), W2(21));
+ R2(c, d, a, b, g, h, e, f, K[18], W1(18), W2(22));
+ R2(b, c, d, a, f, g, h, e, K[19], W1(19), W2(23));
+ R2(a, b, c, d, e, f, g, h, K[20], W1(20), W2(24));
+ R2(d, a, b, c, h, e, f, g, K[21], W1(21), W2(25));
+ R2(c, d, a, b, g, h, e, f, K[22], W1(22), W2(26));
+ R2(b, c, d, a, f, g, h, e, K[23], W1(23), W2(27));
+ R2(a, b, c, d, e, f, g, h, K[24], W1(24), W2(28));
+ R2(d, a, b, c, h, e, f, g, K[25], W1(25), W2(29));
+ R2(c, d, a, b, g, h, e, f, K[26], W1(26), W2(30));
+ R2(b, c, d, a, f, g, h, e, K[27], W1(27), W2(31));
+ R2(a, b, c, d, e, f, g, h, K[28], W1(28), W2(32));
+ R2(d, a, b, c, h, e, f, g, K[29], W1(29), W2(33));
+ R2(c, d, a, b, g, h, e, f, K[30], W1(30), W2(34));
+ R2(b, c, d, a, f, g, h, e, K[31], W1(31), W2(35));
+
+ R2(a, b, c, d, e, f, g, h, K[32], W1(32), W2(36));
+ R2(d, a, b, c, h, e, f, g, K[33], W1(33), W2(37));
+ R2(c, d, a, b, g, h, e, f, K[34], W1(34), W2(38));
+ R2(b, c, d, a, f, g, h, e, K[35], W1(35), W2(39));
+ R2(a, b, c, d, e, f, g, h, K[36], W1(36), W2(40));
+ R2(d, a, b, c, h, e, f, g, K[37], W1(37), W2(41));
+ R2(c, d, a, b, g, h, e, f, K[38], W1(38), W2(42));
+ R2(b, c, d, a, f, g, h, e, K[39], W1(39), W2(43));
+ R2(a, b, c, d, e, f, g, h, K[40], W1(40), W2(44));
+ R2(d, a, b, c, h, e, f, g, K[41], W1(41), W2(45));
+ R2(c, d, a, b, g, h, e, f, K[42], W1(42), W2(46));
+ R2(b, c, d, a, f, g, h, e, K[43], W1(43), W2(47));
+ R2(a, b, c, d, e, f, g, h, K[44], W1(44), W2(48));
+ R2(d, a, b, c, h, e, f, g, K[45], W1(45), W2(49));
+ R2(c, d, a, b, g, h, e, f, K[46], W1(46), W2(50));
+ R2(b, c, d, a, f, g, h, e, K[47], W1(47), W2(51));
+
+ R2(a, b, c, d, e, f, g, h, K[48], W1(48), W2(52));
+ R2(d, a, b, c, h, e, f, g, K[49], W1(49), W2(53));
+ R2(c, d, a, b, g, h, e, f, K[50], W1(50), W2(54));
+ R2(b, c, d, a, f, g, h, e, K[51], W1(51), W2(55));
+ R2(a, b, c, d, e, f, g, h, K[52], W1(52), W2(56));
+ R2(d, a, b, c, h, e, f, g, K[53], W1(53), W2(57));
+ R2(c, d, a, b, g, h, e, f, K[54], W1(54), W2(58));
+ R2(b, c, d, a, f, g, h, e, K[55], W1(55), W2(59));
+ R2(a, b, c, d, e, f, g, h, K[56], W1(56), W2(60));
+ R2(d, a, b, c, h, e, f, g, K[57], W1(57), W2(61));
+ R2(c, d, a, b, g, h, e, f, K[58], W1(58), W2(62));
+ R2(b, c, d, a, f, g, h, e, K[59], W1(59), W2(63));
+ R2(a, b, c, d, e, f, g, h, K[60], W1(60), W2(64));
+ R2(d, a, b, c, h, e, f, g, K[61], W1(61), W2(65));
+ R2(c, d, a, b, g, h, e, f, K[62], W1(62), W2(66));
+ R2(b, c, d, a, f, g, h, e, K[63], W1(63), W2(67));
+
+ sctx->state[0] ^= a;
+ sctx->state[1] ^= b;
+ sctx->state[2] ^= c;
+ sctx->state[3] ^= d;
+ sctx->state[4] ^= e;
+ sctx->state[5] ^= f;
+ sctx->state[6] ^= g;
+ sctx->state[7] ^= h;
+}
+#undef R
+#undef R1
+#undef R2
+#undef I
+#undef W1
+#undef W2
+
+static inline void sm3_block(struct sm3_state *sctx,
+ u8 const *data, int blocks, u32 W[16])
+{
+ while (blocks--) {
+ sm3_transform(sctx, data, W);
+ data += SM3_BLOCK_SIZE;
+ }
+}
+
+void sm3_update(struct sm3_state *sctx, const u8 *data, unsigned int len)
+{
+ unsigned int partial = sctx->count % SM3_BLOCK_SIZE;
+ u32 W[16];
+
+ sctx->count += len;
+
+ if ((partial + len) >= SM3_BLOCK_SIZE) {
+ int blocks;
+
+ if (partial) {
+ int p = SM3_BLOCK_SIZE - partial;
+
+ memcpy(sctx->buffer + partial, data, p);
+ data += p;
+ len -= p;
+
+ sm3_block(sctx, sctx->buffer, 1, W);
+ }
+
+ blocks = len / SM3_BLOCK_SIZE;
+ len %= SM3_BLOCK_SIZE;
+
+ if (blocks) {
+ sm3_block(sctx, data, blocks, W);
+ data += blocks * SM3_BLOCK_SIZE;
+ }
+
+ memzero_explicit(W, sizeof(W));
+
+ partial = 0;
+ }
+ if (len)
+ memcpy(sctx->buffer + partial, data, len);
+}
+EXPORT_SYMBOL_GPL(sm3_update);
+
+void sm3_final(struct sm3_state *sctx, u8 *out)
+{
+ const int bit_offset = SM3_BLOCK_SIZE - sizeof(u64);
+ __be64 *bits = (__be64 *)(sctx->buffer + bit_offset);
+ __be32 *digest = (__be32 *)out;
+ unsigned int partial = sctx->count % SM3_BLOCK_SIZE;
+ u32 W[16];
+ int i;
+
+ sctx->buffer[partial++] = 0x80;
+ if (partial > bit_offset) {
+ memset(sctx->buffer + partial, 0, SM3_BLOCK_SIZE - partial);
+ partial = 0;
+
+ sm3_block(sctx, sctx->buffer, 1, W);
+ }
+
+ memset(sctx->buffer + partial, 0, bit_offset - partial);
+ *bits = cpu_to_be64(sctx->count << 3);
+ sm3_block(sctx, sctx->buffer, 1, W);
+
+ for (i = 0; i < 8; i++)
+ put_unaligned_be32(sctx->state[i], digest++);
+
+ /* Zeroize sensitive information. */
+ memzero_explicit(W, sizeof(W));
+ memzero_explicit(sctx, sizeof(*sctx));
+}
+EXPORT_SYMBOL_GPL(sm3_final);
+
+MODULE_DESCRIPTION("Generic SM3 library");
+MODULE_LICENSE("GPL v2");
--
2.32.0


2021-12-22 04:50:37

by Tianjia Zhang

[permalink] [raw]
Subject: [PATCH v2 3/6] crypto: sm2 - make dependent on sm3 library

SM3 generic library is stand-alone implementation, it is necessary
for the calculation of sm2 z digest to depends on SM3 library
instead of sm3-generic.

Signed-off-by: Tianjia Zhang <[email protected]>
---
crypto/Kconfig | 2 +-
crypto/sm2.c | 38 +++++++++++++++++++-------------------
2 files changed, 20 insertions(+), 20 deletions(-)

diff --git a/crypto/Kconfig b/crypto/Kconfig
index 01b9ca0836a5..60b252975dc4 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -267,7 +267,7 @@ config CRYPTO_ECRDSA

config CRYPTO_SM2
tristate "SM2 algorithm"
- select CRYPTO_SM3
+ select CRYPTO_LIB_SM3
select CRYPTO_AKCIPHER
select CRYPTO_MANAGER
select MPILIB
diff --git a/crypto/sm2.c b/crypto/sm2.c
index db8a4a265669..ae3f77a66070 100644
--- a/crypto/sm2.c
+++ b/crypto/sm2.c
@@ -13,7 +13,7 @@
#include <crypto/internal/akcipher.h>
#include <crypto/akcipher.h>
#include <crypto/hash.h>
-#include <crypto/sm3_base.h>
+#include <crypto/sm3.h>
#include <crypto/rng.h>
#include <crypto/sm2.h>
#include "sm2signature.asn1.h"
@@ -213,7 +213,7 @@ int sm2_get_signature_s(void *context, size_t hdrlen, unsigned char tag,
return 0;
}

-static int sm2_z_digest_update(struct shash_desc *desc,
+static int sm2_z_digest_update(struct sm3_state *sctx,
MPI m, unsigned int pbytes)
{
static const unsigned char zero[32];
@@ -226,20 +226,20 @@ static int sm2_z_digest_update(struct shash_desc *desc,

if (inlen < pbytes) {
/* padding with zero */
- crypto_sm3_update(desc, zero, pbytes - inlen);
- crypto_sm3_update(desc, in, inlen);
+ sm3_update(sctx, zero, pbytes - inlen);
+ sm3_update(sctx, in, inlen);
} else if (inlen > pbytes) {
/* skip the starting zero */
- crypto_sm3_update(desc, in + inlen - pbytes, pbytes);
+ sm3_update(sctx, in + inlen - pbytes, pbytes);
} else {
- crypto_sm3_update(desc, in, inlen);
+ sm3_update(sctx, in, inlen);
}

kfree(in);
return 0;
}

-static int sm2_z_digest_update_point(struct shash_desc *desc,
+static int sm2_z_digest_update_point(struct sm3_state *sctx,
MPI_POINT point, struct mpi_ec_ctx *ec, unsigned int pbytes)
{
MPI x, y;
@@ -249,8 +249,8 @@ static int sm2_z_digest_update_point(struct shash_desc *desc,
y = mpi_new(0);

if (!mpi_ec_get_affine(x, y, point, ec) &&
- !sm2_z_digest_update(desc, x, pbytes) &&
- !sm2_z_digest_update(desc, y, pbytes))
+ !sm2_z_digest_update(sctx, x, pbytes) &&
+ !sm2_z_digest_update(sctx, y, pbytes))
ret = 0;

mpi_free(x);
@@ -265,7 +265,7 @@ int sm2_compute_z_digest(struct crypto_akcipher *tfm,
struct mpi_ec_ctx *ec = akcipher_tfm_ctx(tfm);
uint16_t bits_len;
unsigned char entl[2];
- SHASH_DESC_ON_STACK(desc, NULL);
+ struct sm3_state sctx;
unsigned int pbytes;

if (id_len > (USHRT_MAX / 8) || !ec->Q)
@@ -278,17 +278,17 @@ int sm2_compute_z_digest(struct crypto_akcipher *tfm,
pbytes = MPI_NBYTES(ec->p);

/* ZA = H256(ENTLA | IDA | a | b | xG | yG | xA | yA) */
- sm3_base_init(desc);
- crypto_sm3_update(desc, entl, 2);
- crypto_sm3_update(desc, id, id_len);
-
- if (sm2_z_digest_update(desc, ec->a, pbytes) ||
- sm2_z_digest_update(desc, ec->b, pbytes) ||
- sm2_z_digest_update_point(desc, ec->G, ec, pbytes) ||
- sm2_z_digest_update_point(desc, ec->Q, ec, pbytes))
+ sm3_init(&sctx);
+ sm3_update(&sctx, entl, 2);
+ sm3_update(&sctx, id, id_len);
+
+ if (sm2_z_digest_update(&sctx, ec->a, pbytes) ||
+ sm2_z_digest_update(&sctx, ec->b, pbytes) ||
+ sm2_z_digest_update_point(&sctx, ec->G, ec, pbytes) ||
+ sm2_z_digest_update_point(&sctx, ec->Q, ec, pbytes))
return -EINVAL;

- crypto_sm3_final(desc, dgst);
+ sm3_final(&sctx, dgst);
return 0;
}
EXPORT_SYMBOL(sm2_compute_z_digest);
--
2.32.0


2021-12-22 04:50:37

by Tianjia Zhang

[permalink] [raw]
Subject: [PATCH v2 2/6] crypto: arm64/sm3-ce - make dependent on sm3 library

SM3 generic library is stand-alone implementation, sm3-ce can depend
on the SM3 library instead of sm3-generic.

Signed-off-by: Tianjia Zhang <[email protected]>
---
arch/arm64/crypto/Kconfig | 2 +-
arch/arm64/crypto/sm3-ce-glue.c | 20 ++++++++++++++------
2 files changed, 15 insertions(+), 7 deletions(-)

diff --git a/arch/arm64/crypto/Kconfig b/arch/arm64/crypto/Kconfig
index addfa413650b..2a965aa0188d 100644
--- a/arch/arm64/crypto/Kconfig
+++ b/arch/arm64/crypto/Kconfig
@@ -45,7 +45,7 @@ config CRYPTO_SM3_ARM64_CE
tristate "SM3 digest algorithm (ARMv8.2 Crypto Extensions)"
depends on KERNEL_MODE_NEON
select CRYPTO_HASH
- select CRYPTO_SM3
+ select CRYPTO_LIB_SM3

config CRYPTO_SM4_ARM64_CE
tristate "SM4 symmetric cipher (ARMv8.2 Crypto Extensions)"
diff --git a/arch/arm64/crypto/sm3-ce-glue.c b/arch/arm64/crypto/sm3-ce-glue.c
index d71faca322f2..3198f31c9446 100644
--- a/arch/arm64/crypto/sm3-ce-glue.c
+++ b/arch/arm64/crypto/sm3-ce-glue.c
@@ -27,7 +27,7 @@ static int sm3_ce_update(struct shash_desc *desc, const u8 *data,
unsigned int len)
{
if (!crypto_simd_usable())
- return crypto_sm3_update(desc, data, len);
+ return sm3_update(shash_desc_ctx(desc), data, len);

kernel_neon_begin();
sm3_base_do_update(desc, data, len, sm3_ce_transform);
@@ -39,7 +39,7 @@ static int sm3_ce_update(struct shash_desc *desc, const u8 *data,
static int sm3_ce_final(struct shash_desc *desc, u8 *out)
{
if (!crypto_simd_usable())
- return crypto_sm3_finup(desc, NULL, 0, out);
+ return sm3_final(shash_desc_ctx(desc), out);

kernel_neon_begin();
sm3_base_do_finalize(desc, sm3_ce_transform);
@@ -51,14 +51,22 @@ static int sm3_ce_final(struct shash_desc *desc, u8 *out)
static int sm3_ce_finup(struct shash_desc *desc, const u8 *data,
unsigned int len, u8 *out)
{
- if (!crypto_simd_usable())
- return crypto_sm3_finup(desc, data, len, out);
+ if (!crypto_simd_usable()) {
+ struct sm3_state *sctx = shash_desc_ctx(desc);
+
+ if (len)
+ sm3_update(sctx, data, len);
+ sm3_final(sctx, out);
+ return 0;
+ }

kernel_neon_begin();
- sm3_base_do_update(desc, data, len, sm3_ce_transform);
+ if (len)
+ sm3_base_do_update(desc, data, len, sm3_ce_transform);
+ sm3_base_do_finalize(desc, sm3_ce_transform);
kernel_neon_end();

- return sm3_ce_final(desc, out);
+ return sm3_base_finish(desc, out);
}

static struct shash_alg sm3_alg = {
--
2.32.0


2021-12-22 04:50:42

by Tianjia Zhang

[permalink] [raw]
Subject: [PATCH v2 5/6] crypto: x86/sm3 - add AVX assembly implementation

This patch adds AVX assembly accelerated implementation of SM3 secure
hash algorithm. From the benchmark data, compared to pure software
implementation sm3-generic, the performance increase is up to 38%.

The main algorithm implementation based on SM3 AES/BMI2 accelerated
work by libgcrypt at:
https://gnupg.org/software/libgcrypt/index.html

Benchmark on Intel i5-6200U 2.30GHz, performance data of two
implementations, pure software sm3-generic and sm3-avx acceleration.
The data comes from the 326 mode and 422 mode of tcrypt. The abscissas
are different lengths of per update. The data is tabulated and the
unit is Mb/s:

update-size | 16 64 256 1024 2048 4096 8192
--------------------------------------------------------------------
sm3-generic | 105.97 129.60 182.12 189.62 188.06 193.66 194.88
sm3-avx | 119.87 163.05 244.44 260.92 257.60 264.87 265.88

Signed-off-by: Tianjia Zhang <[email protected]>
---
arch/x86/crypto/Makefile | 3 +
arch/x86/crypto/sm3-avx-asm_64.S | 517 +++++++++++++++++++++++++++++++
arch/x86/crypto/sm3_avx_glue.c | 134 ++++++++
crypto/Kconfig | 13 +
4 files changed, 667 insertions(+)
create mode 100644 arch/x86/crypto/sm3-avx-asm_64.S
create mode 100644 arch/x86/crypto/sm3_avx_glue.c

diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile
index f307c93fc90a..7cbe860f6201 100644
--- a/arch/x86/crypto/Makefile
+++ b/arch/x86/crypto/Makefile
@@ -88,6 +88,9 @@ nhpoly1305-avx2-y := nh-avx2-x86_64.o nhpoly1305-avx2-glue.o

obj-$(CONFIG_CRYPTO_CURVE25519_X86) += curve25519-x86_64.o

+obj-$(CONFIG_CRYPTO_SM3_AVX_X86_64) += sm3-avx-x86_64.o
+sm3-avx-x86_64-y := sm3-avx-asm_64.o sm3_avx_glue.o
+
obj-$(CONFIG_CRYPTO_SM4_AESNI_AVX_X86_64) += sm4-aesni-avx-x86_64.o
sm4-aesni-avx-x86_64-y := sm4-aesni-avx-asm_64.o sm4_aesni_avx_glue.o

diff --git a/arch/x86/crypto/sm3-avx-asm_64.S b/arch/x86/crypto/sm3-avx-asm_64.S
new file mode 100644
index 000000000000..90169cab3b30
--- /dev/null
+++ b/arch/x86/crypto/sm3-avx-asm_64.S
@@ -0,0 +1,517 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * SM3 AVX accelerated transform.
+ * specified in: https://datatracker.ietf.org/doc/html/draft-sca-cfrg-sm3-02
+ *
+ * Copyright (C) 2021 Jussi Kivilinna <[email protected]>
+ * Copyright (C) 2021 Tianjia Zhang <[email protected]>
+ */
+
+/* Based on SM3 AES/BMI2 accelerated work by libgcrypt at:
+ * https://gnupg.org/software/libgcrypt/index.html
+ */
+
+#include <linux/linkage.h>
+#include <asm/frame.h>
+
+/* Context structure */
+
+#define state_h0 0
+#define state_h1 4
+#define state_h2 8
+#define state_h3 12
+#define state_h4 16
+#define state_h5 20
+#define state_h6 24
+#define state_h7 28
+
+/* Constants */
+
+/* Round constant macros */
+
+#define K0 2043430169 /* 0x79cc4519 */
+#define K1 -208106958 /* 0xf3988a32 */
+#define K2 -416213915 /* 0xe7311465 */
+#define K3 -832427829 /* 0xce6228cb */
+#define K4 -1664855657 /* 0x9cc45197 */
+#define K5 965255983 /* 0x3988a32f */
+#define K6 1930511966 /* 0x7311465e */
+#define K7 -433943364 /* 0xe6228cbc */
+#define K8 -867886727 /* 0xcc451979 */
+#define K9 -1735773453 /* 0x988a32f3 */
+#define K10 823420391 /* 0x311465e7 */
+#define K11 1646840782 /* 0x6228cbce */
+#define K12 -1001285732 /* 0xc451979c */
+#define K13 -2002571463 /* 0x88a32f39 */
+#define K14 289824371 /* 0x11465e73 */
+#define K15 579648742 /* 0x228cbce6 */
+#define K16 -1651869049 /* 0x9d8a7a87 */
+#define K17 991229199 /* 0x3b14f50f */
+#define K18 1982458398 /* 0x7629ea1e */
+#define K19 -330050500 /* 0xec53d43c */
+#define K20 -660100999 /* 0xd8a7a879 */
+#define K21 -1320201997 /* 0xb14f50f3 */
+#define K22 1654563303 /* 0x629ea1e7 */
+#define K23 -985840690 /* 0xc53d43ce */
+#define K24 -1971681379 /* 0x8a7a879d */
+#define K25 351604539 /* 0x14f50f3b */
+#define K26 703209078 /* 0x29ea1e76 */
+#define K27 1406418156 /* 0x53d43cec */
+#define K28 -1482130984 /* 0xa7a879d8 */
+#define K29 1330705329 /* 0x4f50f3b1 */
+#define K30 -1633556638 /* 0x9ea1e762 */
+#define K31 1027854021 /* 0x3d43cec5 */
+#define K32 2055708042 /* 0x7a879d8a */
+#define K33 -183551212 /* 0xf50f3b14 */
+#define K34 -367102423 /* 0xea1e7629 */
+#define K35 -734204845 /* 0xd43cec53 */
+#define K36 -1468409689 /* 0xa879d8a7 */
+#define K37 1358147919 /* 0x50f3b14f */
+#define K38 -1578671458 /* 0xa1e7629e */
+#define K39 1137624381 /* 0x43cec53d */
+#define K40 -2019718534 /* 0x879d8a7a */
+#define K41 255530229 /* 0x0f3b14f5 */
+#define K42 511060458 /* 0x1e7629ea */
+#define K43 1022120916 /* 0x3cec53d4 */
+#define K44 2044241832 /* 0x79d8a7a8 */
+#define K45 -206483632 /* 0xf3b14f50 */
+#define K46 -412967263 /* 0xe7629ea1 */
+#define K47 -825934525 /* 0xcec53d43 */
+#define K48 -1651869049 /* 0x9d8a7a87 */
+#define K49 991229199 /* 0x3b14f50f */
+#define K50 1982458398 /* 0x7629ea1e */
+#define K51 -330050500 /* 0xec53d43c */
+#define K52 -660100999 /* 0xd8a7a879 */
+#define K53 -1320201997 /* 0xb14f50f3 */
+#define K54 1654563303 /* 0x629ea1e7 */
+#define K55 -985840690 /* 0xc53d43ce */
+#define K56 -1971681379 /* 0x8a7a879d */
+#define K57 351604539 /* 0x14f50f3b */
+#define K58 703209078 /* 0x29ea1e76 */
+#define K59 1406418156 /* 0x53d43cec */
+#define K60 -1482130984 /* 0xa7a879d8 */
+#define K61 1330705329 /* 0x4f50f3b1 */
+#define K62 -1633556638 /* 0x9ea1e762 */
+#define K63 1027854021 /* 0x3d43cec5 */
+
+/* Register macros */
+
+#define RSTATE %rdi
+#define RDATA %rsi
+#define RNBLKS %rdx
+
+#define t0 %eax
+#define t1 %ebx
+#define t2 %ecx
+
+#define a %r8d
+#define b %r9d
+#define c %r10d
+#define d %r11d
+#define e %r12d
+#define f %r13d
+#define g %r14d
+#define h %r15d
+
+#define W0 %xmm0
+#define W1 %xmm1
+#define W2 %xmm2
+#define W3 %xmm3
+#define W4 %xmm4
+#define W5 %xmm5
+
+#define XTMP0 %xmm6
+#define XTMP1 %xmm7
+#define XTMP2 %xmm8
+#define XTMP3 %xmm9
+#define XTMP4 %xmm10
+#define XTMP5 %xmm11
+#define XTMP6 %xmm12
+
+#define BSWAP_REG %xmm15
+
+/* Stack structure */
+
+#define STACK_W_SIZE (32 * 2 * 3)
+#define STACK_REG_SAVE_SIZE (64)
+
+#define STACK_W (0)
+#define STACK_REG_SAVE (STACK_W + STACK_W_SIZE)
+#define STACK_SIZE (STACK_REG_SAVE + STACK_REG_SAVE_SIZE)
+
+/* Instruction helpers. */
+
+#define roll2(v, reg) \
+ roll $(v), reg;
+
+#define roll3mov(v, src, dst) \
+ movl src, dst; \
+ roll $(v), dst;
+
+#define roll3(v, src, dst) \
+ rorxl $(32-(v)), src, dst;
+
+#define addl2(a, out) \
+ leal (a, out), out;
+
+/* Round function macros. */
+
+#define GG1(x, y, z, o, t) \
+ movl x, o; \
+ xorl y, o; \
+ xorl z, o;
+
+#define FF1(x, y, z, o, t) GG1(x, y, z, o, t)
+
+#define GG2(x, y, z, o, t) \
+ andnl z, x, o; \
+ movl y, t; \
+ andl x, t; \
+ addl2(t, o);
+
+#define FF2(x, y, z, o, t) \
+ movl y, o; \
+ xorl x, o; \
+ movl y, t; \
+ andl x, t; \
+ andl z, o; \
+ xorl t, o;
+
+#define R(i, a, b, c, d, e, f, g, h, round, widx, wtype) \
+ /* rol(a, 12) => t0 */ \
+ roll3mov(12, a, t0); /* rorxl here would reduce perf by 6% on zen3 */ \
+ /* rol (t0 + e + t), 7) => t1 */ \
+ leal K##round(t0, e, 1), t1; \
+ roll2(7, t1); \
+ /* h + w1 => h */ \
+ addl wtype##_W1_ADDR(round, widx), h; \
+ /* h + t1 => h */ \
+ addl2(t1, h); \
+ /* t1 ^ t0 => t0 */ \
+ xorl t1, t0; \
+ /* w1w2 + d => d */ \
+ addl wtype##_W1W2_ADDR(round, widx), d; \
+ /* FF##i(a,b,c) => t1 */ \
+ FF##i(a, b, c, t1, t2); \
+ /* d + t1 => d */ \
+ addl2(t1, d); \
+ /* GG#i(e,f,g) => t2 */ \
+ GG##i(e, f, g, t2, t1); \
+ /* h + t2 => h */ \
+ addl2(t2, h); \
+ /* rol (f, 19) => f */ \
+ roll2(19, f); \
+ /* d + t0 => d */ \
+ addl2(t0, d); \
+ /* rol (b, 9) => b */ \
+ roll2(9, b); \
+ /* P0(h) => h */ \
+ roll3(9, h, t2); \
+ roll3(17, h, t1); \
+ xorl t2, h; \
+ xorl t1, h;
+
+#define R1(a, b, c, d, e, f, g, h, round, widx, wtype) \
+ R(1, a, b, c, d, e, f, g, h, round, widx, wtype)
+
+#define R2(a, b, c, d, e, f, g, h, round, widx, wtype) \
+ R(2, a, b, c, d, e, f, g, h, round, widx, wtype)
+
+/* Input expansion macros. */
+
+/* Byte-swapped input address. */
+#define IW_W_ADDR(round, widx, offs) \
+ (STACK_W + ((round) / 4) * 64 + (offs) + ((widx) * 4))(%rsp)
+
+/* Expanded input address. */
+#define XW_W_ADDR(round, widx, offs) \
+ (STACK_W + ((((round) / 3) - 4) % 2) * 64 + (offs) + ((widx) * 4))(%rsp)
+
+/* Rounds 1-12, byte-swapped input block addresses. */
+#define IW_W1_ADDR(round, widx) IW_W_ADDR(round, widx, 0)
+#define IW_W1W2_ADDR(round, widx) IW_W_ADDR(round, widx, 32)
+
+/* Rounds 1-12, expanded input block addresses. */
+#define XW_W1_ADDR(round, widx) XW_W_ADDR(round, widx, 0)
+#define XW_W1W2_ADDR(round, widx) XW_W_ADDR(round, widx, 32)
+
+/* Input block loading. */
+#define LOAD_W_XMM_1() \
+ vmovdqu 0*16(RDATA), XTMP0; /* XTMP0: w3, w2, w1, w0 */ \
+ vmovdqu 1*16(RDATA), XTMP1; /* XTMP1: w7, w6, w5, w4 */ \
+ vmovdqu 2*16(RDATA), XTMP2; /* XTMP2: w11, w10, w9, w8 */ \
+ vmovdqu 3*16(RDATA), XTMP3; /* XTMP3: w15, w14, w13, w12 */ \
+ vpshufb BSWAP_REG, XTMP0, XTMP0; \
+ vpshufb BSWAP_REG, XTMP1, XTMP1; \
+ vpshufb BSWAP_REG, XTMP2, XTMP2; \
+ vpshufb BSWAP_REG, XTMP3, XTMP3; \
+ vpxor XTMP0, XTMP1, XTMP4; \
+ vpxor XTMP1, XTMP2, XTMP5; \
+ vpxor XTMP2, XTMP3, XTMP6; \
+ leaq 64(RDATA), RDATA; \
+ vmovdqa XTMP0, IW_W1_ADDR(0, 0); \
+ vmovdqa XTMP4, IW_W1W2_ADDR(0, 0); \
+ vmovdqa XTMP1, IW_W1_ADDR(4, 0); \
+ vmovdqa XTMP5, IW_W1W2_ADDR(4, 0);
+
+#define LOAD_W_XMM_2() \
+ vmovdqa XTMP2, IW_W1_ADDR(8, 0); \
+ vmovdqa XTMP6, IW_W1W2_ADDR(8, 0);
+
+#define LOAD_W_XMM_3() \
+ vpshufd $0b00000000, XTMP0, W0; /* W0: xx, w0, xx, xx */ \
+ vpshufd $0b11111001, XTMP0, W1; /* W1: xx, w3, w2, w1 */ \
+ vmovdqa XTMP1, W2; /* W2: xx, w6, w5, w4 */ \
+ vpalignr $12, XTMP1, XTMP2, W3; /* W3: xx, w9, w8, w7 */ \
+ vpalignr $8, XTMP2, XTMP3, W4; /* W4: xx, w12, w11, w10 */ \
+ vpshufd $0b11111001, XTMP3, W5; /* W5: xx, w15, w14, w13 */
+
+/* Message scheduling. Note: 3 words per XMM register. */
+#define SCHED_W_0(round, w0, w1, w2, w3, w4, w5) \
+ /* Load (w[i - 16]) => XTMP0 */ \
+ vpshufd $0b10111111, w0, XTMP0; \
+ vpalignr $12, XTMP0, w1, XTMP0; /* XTMP0: xx, w2, w1, w0 */ \
+ /* Load (w[i - 13]) => XTMP1 */ \
+ vpshufd $0b10111111, w1, XTMP1; \
+ vpalignr $12, XTMP1, w2, XTMP1; \
+ /* w[i - 9] == w3 */ \
+ /* XMM3 ^ XTMP0 => XTMP0 */ \
+ vpxor w3, XTMP0, XTMP0;
+
+#define SCHED_W_1(round, w0, w1, w2, w3, w4, w5) \
+ /* w[i - 3] == w5 */ \
+ /* rol(XMM5, 15) ^ XTMP0 => XTMP0 */ \
+ vpslld $15, w5, XTMP2; \
+ vpsrld $(32-15), w5, XTMP3; \
+ vpxor XTMP2, XTMP3, XTMP3; \
+ vpxor XTMP3, XTMP0, XTMP0; \
+ /* rol(XTMP1, 7) => XTMP1 */ \
+ vpslld $7, XTMP1, XTMP5; \
+ vpsrld $(32-7), XTMP1, XTMP1; \
+ vpxor XTMP5, XTMP1, XTMP1; \
+ /* XMM4 ^ XTMP1 => XTMP1 */ \
+ vpxor w4, XTMP1, XTMP1; \
+ /* w[i - 6] == XMM4 */ \
+ /* P1(XTMP0) ^ XTMP1 => XMM0 */ \
+ vpslld $15, XTMP0, XTMP5; \
+ vpsrld $(32-15), XTMP0, XTMP6; \
+ vpslld $23, XTMP0, XTMP2; \
+ vpsrld $(32-23), XTMP0, XTMP3; \
+ vpxor XTMP0, XTMP1, XTMP1; \
+ vpxor XTMP6, XTMP5, XTMP5; \
+ vpxor XTMP3, XTMP2, XTMP2; \
+ vpxor XTMP2, XTMP5, XTMP5; \
+ vpxor XTMP5, XTMP1, w0;
+
+#define SCHED_W_2(round, w0, w1, w2, w3, w4, w5) \
+ /* W1 in XMM12 */ \
+ vpshufd $0b10111111, w4, XTMP4; \
+ vpalignr $12, XTMP4, w5, XTMP4; \
+ vmovdqa XTMP4, XW_W1_ADDR((round), 0); \
+ /* W1 ^ W2 => XTMP1 */ \
+ vpxor w0, XTMP4, XTMP1; \
+ vmovdqa XTMP1, XW_W1W2_ADDR((round), 0);
+
+
+.section .rodata.cst16, "aM", @progbits, 16
+.align 16
+
+.Lbe32mask:
+ .long 0x00010203, 0x04050607, 0x08090a0b, 0x0c0d0e0f
+
+.text
+
+/*
+ * Transform nblocks*64 bytes (nblocks*16 32-bit words) at DATA.
+ *
+ * void sm3_transform_avx(struct sm3_state *state,
+ * const u8 *data, int nblocks);
+ */
+.align 16
+SYM_FUNC_START(sm3_transform_avx)
+ /* input:
+ * %rdi: ctx, CTX
+ * %rsi: data (64*nblks bytes)
+ * %rdx: nblocks
+ */
+ vzeroupper;
+
+ pushq %rbp;
+ movq %rsp, %rbp;
+
+ movq %rdx, RNBLKS;
+
+ subq $STACK_SIZE, %rsp;
+ andq $(~63), %rsp;
+
+ movq %rbx, (STACK_REG_SAVE + 0 * 8)(%rsp);
+ movq %r15, (STACK_REG_SAVE + 1 * 8)(%rsp);
+ movq %r14, (STACK_REG_SAVE + 2 * 8)(%rsp);
+ movq %r13, (STACK_REG_SAVE + 3 * 8)(%rsp);
+ movq %r12, (STACK_REG_SAVE + 4 * 8)(%rsp);
+
+ vmovdqa .Lbe32mask (%rip), BSWAP_REG;
+
+ /* Get the values of the chaining variables. */
+ movl state_h0(RSTATE), a;
+ movl state_h1(RSTATE), b;
+ movl state_h2(RSTATE), c;
+ movl state_h3(RSTATE), d;
+ movl state_h4(RSTATE), e;
+ movl state_h5(RSTATE), f;
+ movl state_h6(RSTATE), g;
+ movl state_h7(RSTATE), h;
+
+.align 16
+.Loop:
+ /* Load data part1. */
+ LOAD_W_XMM_1();
+
+ leaq -1(RNBLKS), RNBLKS;
+
+ /* Transform 0-3 + Load data part2. */
+ R1(a, b, c, d, e, f, g, h, 0, 0, IW); LOAD_W_XMM_2();
+ R1(d, a, b, c, h, e, f, g, 1, 1, IW);
+ R1(c, d, a, b, g, h, e, f, 2, 2, IW);
+ R1(b, c, d, a, f, g, h, e, 3, 3, IW); LOAD_W_XMM_3();
+
+ /* Transform 4-7 + Precalc 12-14. */
+ R1(a, b, c, d, e, f, g, h, 4, 0, IW);
+ R1(d, a, b, c, h, e, f, g, 5, 1, IW);
+ R1(c, d, a, b, g, h, e, f, 6, 2, IW); SCHED_W_0(12, W0, W1, W2, W3, W4, W5);
+ R1(b, c, d, a, f, g, h, e, 7, 3, IW); SCHED_W_1(12, W0, W1, W2, W3, W4, W5);
+
+ /* Transform 8-11 + Precalc 12-17. */
+ R1(a, b, c, d, e, f, g, h, 8, 0, IW); SCHED_W_2(12, W0, W1, W2, W3, W4, W5);
+ R1(d, a, b, c, h, e, f, g, 9, 1, IW); SCHED_W_0(15, W1, W2, W3, W4, W5, W0);
+ R1(c, d, a, b, g, h, e, f, 10, 2, IW); SCHED_W_1(15, W1, W2, W3, W4, W5, W0);
+ R1(b, c, d, a, f, g, h, e, 11, 3, IW); SCHED_W_2(15, W1, W2, W3, W4, W5, W0);
+
+ /* Transform 12-14 + Precalc 18-20 */
+ R1(a, b, c, d, e, f, g, h, 12, 0, XW); SCHED_W_0(18, W2, W3, W4, W5, W0, W1);
+ R1(d, a, b, c, h, e, f, g, 13, 1, XW); SCHED_W_1(18, W2, W3, W4, W5, W0, W1);
+ R1(c, d, a, b, g, h, e, f, 14, 2, XW); SCHED_W_2(18, W2, W3, W4, W5, W0, W1);
+
+ /* Transform 15-17 + Precalc 21-23 */
+ R1(b, c, d, a, f, g, h, e, 15, 0, XW); SCHED_W_0(21, W3, W4, W5, W0, W1, W2);
+ R2(a, b, c, d, e, f, g, h, 16, 1, XW); SCHED_W_1(21, W3, W4, W5, W0, W1, W2);
+ R2(d, a, b, c, h, e, f, g, 17, 2, XW); SCHED_W_2(21, W3, W4, W5, W0, W1, W2);
+
+ /* Transform 18-20 + Precalc 24-26 */
+ R2(c, d, a, b, g, h, e, f, 18, 0, XW); SCHED_W_0(24, W4, W5, W0, W1, W2, W3);
+ R2(b, c, d, a, f, g, h, e, 19, 1, XW); SCHED_W_1(24, W4, W5, W0, W1, W2, W3);
+ R2(a, b, c, d, e, f, g, h, 20, 2, XW); SCHED_W_2(24, W4, W5, W0, W1, W2, W3);
+
+ /* Transform 21-23 + Precalc 27-29 */
+ R2(d, a, b, c, h, e, f, g, 21, 0, XW); SCHED_W_0(27, W5, W0, W1, W2, W3, W4);
+ R2(c, d, a, b, g, h, e, f, 22, 1, XW); SCHED_W_1(27, W5, W0, W1, W2, W3, W4);
+ R2(b, c, d, a, f, g, h, e, 23, 2, XW); SCHED_W_2(27, W5, W0, W1, W2, W3, W4);
+
+ /* Transform 24-26 + Precalc 30-32 */
+ R2(a, b, c, d, e, f, g, h, 24, 0, XW); SCHED_W_0(30, W0, W1, W2, W3, W4, W5);
+ R2(d, a, b, c, h, e, f, g, 25, 1, XW); SCHED_W_1(30, W0, W1, W2, W3, W4, W5);
+ R2(c, d, a, b, g, h, e, f, 26, 2, XW); SCHED_W_2(30, W0, W1, W2, W3, W4, W5);
+
+ /* Transform 27-29 + Precalc 33-35 */
+ R2(b, c, d, a, f, g, h, e, 27, 0, XW); SCHED_W_0(33, W1, W2, W3, W4, W5, W0);
+ R2(a, b, c, d, e, f, g, h, 28, 1, XW); SCHED_W_1(33, W1, W2, W3, W4, W5, W0);
+ R2(d, a, b, c, h, e, f, g, 29, 2, XW); SCHED_W_2(33, W1, W2, W3, W4, W5, W0);
+
+ /* Transform 30-32 + Precalc 36-38 */
+ R2(c, d, a, b, g, h, e, f, 30, 0, XW); SCHED_W_0(36, W2, W3, W4, W5, W0, W1);
+ R2(b, c, d, a, f, g, h, e, 31, 1, XW); SCHED_W_1(36, W2, W3, W4, W5, W0, W1);
+ R2(a, b, c, d, e, f, g, h, 32, 2, XW); SCHED_W_2(36, W2, W3, W4, W5, W0, W1);
+
+ /* Transform 33-35 + Precalc 39-41 */
+ R2(d, a, b, c, h, e, f, g, 33, 0, XW); SCHED_W_0(39, W3, W4, W5, W0, W1, W2);
+ R2(c, d, a, b, g, h, e, f, 34, 1, XW); SCHED_W_1(39, W3, W4, W5, W0, W1, W2);
+ R2(b, c, d, a, f, g, h, e, 35, 2, XW); SCHED_W_2(39, W3, W4, W5, W0, W1, W2);
+
+ /* Transform 36-38 + Precalc 42-44 */
+ R2(a, b, c, d, e, f, g, h, 36, 0, XW); SCHED_W_0(42, W4, W5, W0, W1, W2, W3);
+ R2(d, a, b, c, h, e, f, g, 37, 1, XW); SCHED_W_1(42, W4, W5, W0, W1, W2, W3);
+ R2(c, d, a, b, g, h, e, f, 38, 2, XW); SCHED_W_2(42, W4, W5, W0, W1, W2, W3);
+
+ /* Transform 39-41 + Precalc 45-47 */
+ R2(b, c, d, a, f, g, h, e, 39, 0, XW); SCHED_W_0(45, W5, W0, W1, W2, W3, W4);
+ R2(a, b, c, d, e, f, g, h, 40, 1, XW); SCHED_W_1(45, W5, W0, W1, W2, W3, W4);
+ R2(d, a, b, c, h, e, f, g, 41, 2, XW); SCHED_W_2(45, W5, W0, W1, W2, W3, W4);
+
+ /* Transform 42-44 + Precalc 48-50 */
+ R2(c, d, a, b, g, h, e, f, 42, 0, XW); SCHED_W_0(48, W0, W1, W2, W3, W4, W5);
+ R2(b, c, d, a, f, g, h, e, 43, 1, XW); SCHED_W_1(48, W0, W1, W2, W3, W4, W5);
+ R2(a, b, c, d, e, f, g, h, 44, 2, XW); SCHED_W_2(48, W0, W1, W2, W3, W4, W5);
+
+ /* Transform 45-47 + Precalc 51-53 */
+ R2(d, a, b, c, h, e, f, g, 45, 0, XW); SCHED_W_0(51, W1, W2, W3, W4, W5, W0);
+ R2(c, d, a, b, g, h, e, f, 46, 1, XW); SCHED_W_1(51, W1, W2, W3, W4, W5, W0);
+ R2(b, c, d, a, f, g, h, e, 47, 2, XW); SCHED_W_2(51, W1, W2, W3, W4, W5, W0);
+
+ /* Transform 48-50 + Precalc 54-56 */
+ R2(a, b, c, d, e, f, g, h, 48, 0, XW); SCHED_W_0(54, W2, W3, W4, W5, W0, W1);
+ R2(d, a, b, c, h, e, f, g, 49, 1, XW); SCHED_W_1(54, W2, W3, W4, W5, W0, W1);
+ R2(c, d, a, b, g, h, e, f, 50, 2, XW); SCHED_W_2(54, W2, W3, W4, W5, W0, W1);
+
+ /* Transform 51-53 + Precalc 57-59 */
+ R2(b, c, d, a, f, g, h, e, 51, 0, XW); SCHED_W_0(57, W3, W4, W5, W0, W1, W2);
+ R2(a, b, c, d, e, f, g, h, 52, 1, XW); SCHED_W_1(57, W3, W4, W5, W0, W1, W2);
+ R2(d, a, b, c, h, e, f, g, 53, 2, XW); SCHED_W_2(57, W3, W4, W5, W0, W1, W2);
+
+ /* Transform 54-56 + Precalc 60-62 */
+ R2(c, d, a, b, g, h, e, f, 54, 0, XW); SCHED_W_0(60, W4, W5, W0, W1, W2, W3);
+ R2(b, c, d, a, f, g, h, e, 55, 1, XW); SCHED_W_1(60, W4, W5, W0, W1, W2, W3);
+ R2(a, b, c, d, e, f, g, h, 56, 2, XW); SCHED_W_2(60, W4, W5, W0, W1, W2, W3);
+
+ /* Transform 57-59 + Precalc 63 */
+ R2(d, a, b, c, h, e, f, g, 57, 0, XW); SCHED_W_0(63, W5, W0, W1, W2, W3, W4);
+ R2(c, d, a, b, g, h, e, f, 58, 1, XW);
+ R2(b, c, d, a, f, g, h, e, 59, 2, XW); SCHED_W_1(63, W5, W0, W1, W2, W3, W4);
+
+ /* Transform 60-62 + Precalc 63 */
+ R2(a, b, c, d, e, f, g, h, 60, 0, XW);
+ R2(d, a, b, c, h, e, f, g, 61, 1, XW); SCHED_W_2(63, W5, W0, W1, W2, W3, W4);
+ R2(c, d, a, b, g, h, e, f, 62, 2, XW);
+
+ /* Transform 63 */
+ R2(b, c, d, a, f, g, h, e, 63, 0, XW);
+
+ /* Update the chaining variables. */
+ xorl state_h0(RSTATE), a;
+ xorl state_h1(RSTATE), b;
+ xorl state_h2(RSTATE), c;
+ xorl state_h3(RSTATE), d;
+ movl a, state_h0(RSTATE);
+ movl b, state_h1(RSTATE);
+ movl c, state_h2(RSTATE);
+ movl d, state_h3(RSTATE);
+ xorl state_h4(RSTATE), e;
+ xorl state_h5(RSTATE), f;
+ xorl state_h6(RSTATE), g;
+ xorl state_h7(RSTATE), h;
+ movl e, state_h4(RSTATE);
+ movl f, state_h5(RSTATE);
+ movl g, state_h6(RSTATE);
+ movl h, state_h7(RSTATE);
+
+ cmpq $0, RNBLKS;
+ jne .Loop;
+
+ vzeroall;
+
+ movq (STACK_REG_SAVE + 0 * 8)(%rsp), %rbx;
+ movq (STACK_REG_SAVE + 1 * 8)(%rsp), %r15;
+ movq (STACK_REG_SAVE + 2 * 8)(%rsp), %r14;
+ movq (STACK_REG_SAVE + 3 * 8)(%rsp), %r13;
+ movq (STACK_REG_SAVE + 4 * 8)(%rsp), %r12;
+
+ vmovdqa %xmm0, IW_W1_ADDR(0, 0);
+ vmovdqa %xmm0, IW_W1W2_ADDR(0, 0);
+ vmovdqa %xmm0, IW_W1_ADDR(4, 0);
+ vmovdqa %xmm0, IW_W1W2_ADDR(4, 0);
+ vmovdqa %xmm0, IW_W1_ADDR(8, 0);
+ vmovdqa %xmm0, IW_W1W2_ADDR(8, 0);
+
+ movq %rbp, %rsp;
+ popq %rbp;
+ ret;
+SYM_FUNC_END(sm3_transform_avx)
diff --git a/arch/x86/crypto/sm3_avx_glue.c b/arch/x86/crypto/sm3_avx_glue.c
new file mode 100644
index 000000000000..661b6f22ffcd
--- /dev/null
+++ b/arch/x86/crypto/sm3_avx_glue.c
@@ -0,0 +1,134 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * SM3 Secure Hash Algorithm, AVX assembler accelerated.
+ * specified in: https://datatracker.ietf.org/doc/html/draft-sca-cfrg-sm3-02
+ *
+ * Copyright (C) 2021 Tianjia Zhang <[email protected]>
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <crypto/internal/hash.h>
+#include <crypto/internal/simd.h>
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/types.h>
+#include <crypto/sm3.h>
+#include <crypto/sm3_base.h>
+#include <asm/simd.h>
+
+asmlinkage void sm3_transform_avx(struct sm3_state *state,
+ const u8 *data, int nblocks);
+
+static int sm3_avx_update(struct shash_desc *desc, const u8 *data,
+ unsigned int len)
+{
+ struct sm3_state *sctx = shash_desc_ctx(desc);
+
+ if (!crypto_simd_usable() ||
+ (sctx->count % SM3_BLOCK_SIZE) + len < SM3_BLOCK_SIZE) {
+ sm3_update(sctx, data, len);
+ return 0;
+ }
+
+ /*
+ * Make sure struct sm3_state begins directly with the SM3
+ * 256-bit internal state, as this is what the asm functions expect.
+ */
+ BUILD_BUG_ON(offsetof(struct sm3_state, state) != 0);
+
+ kernel_fpu_begin();
+ sm3_base_do_update(desc, data, len, sm3_transform_avx);
+ kernel_fpu_end();
+
+ return 0;
+}
+
+static int sm3_avx_finup(struct shash_desc *desc, const u8 *data,
+ unsigned int len, u8 *out)
+{
+ if (!crypto_simd_usable()) {
+ struct sm3_state *sctx = shash_desc_ctx(desc);
+
+ if (len)
+ sm3_update(sctx, data, len);
+
+ sm3_final(sctx, out);
+ return 0;
+ }
+
+ kernel_fpu_begin();
+ if (len)
+ sm3_base_do_update(desc, data, len, sm3_transform_avx);
+ sm3_base_do_finalize(desc, sm3_transform_avx);
+ kernel_fpu_end();
+
+ return sm3_base_finish(desc, out);
+}
+
+static int sm3_avx_final(struct shash_desc *desc, u8 *out)
+{
+ if (!crypto_simd_usable()) {
+ sm3_final(shash_desc_ctx(desc), out);
+ return 0;
+ }
+
+ kernel_fpu_begin();
+ sm3_base_do_finalize(desc, sm3_transform_avx);
+ kernel_fpu_end();
+
+ return sm3_base_finish(desc, out);
+}
+
+static struct shash_alg sm3_avx_alg = {
+ .digestsize = SM3_DIGEST_SIZE,
+ .init = sm3_base_init,
+ .update = sm3_avx_update,
+ .final = sm3_avx_final,
+ .finup = sm3_avx_finup,
+ .descsize = sizeof(struct sm3_state),
+ .base = {
+ .cra_name = "sm3",
+ .cra_driver_name = "sm3-avx",
+ .cra_priority = 300,
+ .cra_blocksize = SM3_BLOCK_SIZE,
+ .cra_module = THIS_MODULE,
+ }
+};
+
+static int __init sm3_avx_mod_init(void)
+{
+ const char *feature_name;
+
+ if (!boot_cpu_has(X86_FEATURE_AVX)) {
+ pr_info("AVX instruction are not detected.\n");
+ return -ENODEV;
+ }
+
+ if (!boot_cpu_has(X86_FEATURE_BMI2)) {
+ pr_info("BMI2 instruction are not detected.\n");
+ return -ENODEV;
+ }
+
+ if (!cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM,
+ &feature_name)) {
+ pr_info("CPU feature '%s' is not supported.\n", feature_name);
+ return -ENODEV;
+ }
+
+ return crypto_register_shash(&sm3_avx_alg);
+}
+
+static void __exit sm3_avx_mod_exit(void)
+{
+ crypto_unregister_shash(&sm3_avx_alg);
+}
+
+module_init(sm3_avx_mod_init);
+module_exit(sm3_avx_mod_exit);
+
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR("Tianjia Zhang <[email protected]>");
+MODULE_DESCRIPTION("SM3 Secure Hash Algorithm, AVX assembler accelerated");
+MODULE_ALIAS_CRYPTO("sm3");
+MODULE_ALIAS_CRYPTO("sm3-avx");
diff --git a/crypto/Kconfig b/crypto/Kconfig
index 8e780bf66c0a..0af5216c083a 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -1008,6 +1008,19 @@ config CRYPTO_SM3
http://www.oscca.gov.cn/UpFile/20101222141857786.pdf
https://datatracker.ietf.org/doc/html/draft-shen-sm3-hash

+config CRYPTO_SM3_AVX_X86_64
+ tristate "SM3 digest algorithm (x86_64/AVX)"
+ depends on X86 && 64BIT
+ select CRYPTO_HASH
+ select CRYPTO_LIB_SM3
+ help
+ SM3 secure hash function as defined by OSCCA GM/T 0004-2012 SM3).
+ It is part of the Chinese Commercial Cryptography suite. This is
+ SM3 optimized implementation using Advanced Vector Extensions (AVX)
+ when available.
+
+ If unsure, say N.
+
config CRYPTO_STREEBOG
tristate "Streebog Hash Function"
select CRYPTO_HASH
--
2.32.0


2021-12-22 04:50:43

by Tianjia Zhang

[permalink] [raw]
Subject: [PATCH v2 6/6] crypto: tcrypt - add asynchronous speed test for SM3

tcrypt supports testing of SM3 hash algorithms that use AVX
instruction acceleration.

In order to add the sm3 asynchronous test to the appropriate
position, shift the testcase sequence number of the multi buffer
backward and start from 450.

Signed-off-by: Tianjia Zhang <[email protected]>
---
crypto/tcrypt.c | 14 +++++++++-----
1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/crypto/tcrypt.c b/crypto/tcrypt.c
index 00149657a4bc..82b5eef2246a 100644
--- a/crypto/tcrypt.c
+++ b/crypto/tcrypt.c
@@ -2571,31 +2571,35 @@ static int do_test(const char *alg, u32 type, u32 mask, int m, u32 num_mb)
if (mode > 400 && mode < 500) break;
fallthrough;
case 422:
+ test_ahash_speed("sm3", sec, generic_hash_speed_template);
+ if (mode > 400 && mode < 500) break;
+ fallthrough;
+ case 450:
test_mb_ahash_speed("sha1", sec, generic_hash_speed_template,
num_mb);
if (mode > 400 && mode < 500) break;
fallthrough;
- case 423:
+ case 451:
test_mb_ahash_speed("sha256", sec, generic_hash_speed_template,
num_mb);
if (mode > 400 && mode < 500) break;
fallthrough;
- case 424:
+ case 452:
test_mb_ahash_speed("sha512", sec, generic_hash_speed_template,
num_mb);
if (mode > 400 && mode < 500) break;
fallthrough;
- case 425:
+ case 453:
test_mb_ahash_speed("sm3", sec, generic_hash_speed_template,
num_mb);
if (mode > 400 && mode < 500) break;
fallthrough;
- case 426:
+ case 454:
test_mb_ahash_speed("streebog256", sec,
generic_hash_speed_template, num_mb);
if (mode > 400 && mode < 500) break;
fallthrough;
- case 427:
+ case 455:
test_mb_ahash_speed("streebog512", sec,
generic_hash_speed_template, num_mb);
if (mode > 400 && mode < 500) break;
--
2.32.0


2021-12-22 04:51:06

by Tianjia Zhang

[permalink] [raw]
Subject: [PATCH v2 4/6] crypto: sm3 - make dependent on sm3 library

SM3 generic library is stand-alone implementation, it is necessary
making the sm3-generic implementation to depends on SM3 library.
The functions crypto_sm3_*() provided by sm3_generic is no longer
exported.

Signed-off-by: Tianjia Zhang <[email protected]>
---
crypto/Kconfig | 1 +
crypto/sm3_generic.c | 142 +++++--------------------------------------
include/crypto/sm3.h | 10 ---
3 files changed, 16 insertions(+), 137 deletions(-)

diff --git a/crypto/Kconfig b/crypto/Kconfig
index 60b252975dc4..8e780bf66c0a 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -999,6 +999,7 @@ config CRYPTO_SHA3
config CRYPTO_SM3
tristate "SM3 digest algorithm"
select CRYPTO_HASH
+ select CRYPTO_LIB_SM3
help
SM3 secure hash function as defined by OSCCA GM/T 0004-2012 SM3).
It is part of the Chinese Commercial Cryptography suite.
diff --git a/crypto/sm3_generic.c b/crypto/sm3_generic.c
index 193c4584bd00..a215c1c37e73 100644
--- a/crypto/sm3_generic.c
+++ b/crypto/sm3_generic.c
@@ -5,6 +5,7 @@
*
* Copyright (C) 2017 ARM Limited or its affiliates.
* Written by Gilad Ben-Yossef <[email protected]>
+ * Copyright (C) 2021 Tianjia Zhang <[email protected]>
*/

#include <crypto/internal/hash.h>
@@ -26,143 +27,29 @@ const u8 sm3_zero_message_hash[SM3_DIGEST_SIZE] = {
};
EXPORT_SYMBOL_GPL(sm3_zero_message_hash);

-static inline u32 p0(u32 x)
-{
- return x ^ rol32(x, 9) ^ rol32(x, 17);
-}
-
-static inline u32 p1(u32 x)
-{
- return x ^ rol32(x, 15) ^ rol32(x, 23);
-}
-
-static inline u32 ff(unsigned int n, u32 a, u32 b, u32 c)
-{
- return (n < 16) ? (a ^ b ^ c) : ((a & b) | (a & c) | (b & c));
-}
-
-static inline u32 gg(unsigned int n, u32 e, u32 f, u32 g)
-{
- return (n < 16) ? (e ^ f ^ g) : ((e & f) | ((~e) & g));
-}
-
-static inline u32 t(unsigned int n)
-{
- return (n < 16) ? SM3_T1 : SM3_T2;
-}
-
-static void sm3_expand(u32 *t, u32 *w, u32 *wt)
-{
- int i;
- unsigned int tmp;
-
- /* load the input */
- for (i = 0; i <= 15; i++)
- w[i] = get_unaligned_be32((__u32 *)t + i);
-
- for (i = 16; i <= 67; i++) {
- tmp = w[i - 16] ^ w[i - 9] ^ rol32(w[i - 3], 15);
- w[i] = p1(tmp) ^ (rol32(w[i - 13], 7)) ^ w[i - 6];
- }
-
- for (i = 0; i <= 63; i++)
- wt[i] = w[i] ^ w[i + 4];
-}
-
-static void sm3_compress(u32 *w, u32 *wt, u32 *m)
-{
- u32 ss1;
- u32 ss2;
- u32 tt1;
- u32 tt2;
- u32 a, b, c, d, e, f, g, h;
- int i;
-
- a = m[0];
- b = m[1];
- c = m[2];
- d = m[3];
- e = m[4];
- f = m[5];
- g = m[6];
- h = m[7];
-
- for (i = 0; i <= 63; i++) {
-
- ss1 = rol32((rol32(a, 12) + e + rol32(t(i), i & 31)), 7);
-
- ss2 = ss1 ^ rol32(a, 12);
-
- tt1 = ff(i, a, b, c) + d + ss2 + *wt;
- wt++;
-
- tt2 = gg(i, e, f, g) + h + ss1 + *w;
- w++;
-
- d = c;
- c = rol32(b, 9);
- b = a;
- a = tt1;
- h = g;
- g = rol32(f, 19);
- f = e;
- e = p0(tt2);
- }
-
- m[0] = a ^ m[0];
- m[1] = b ^ m[1];
- m[2] = c ^ m[2];
- m[3] = d ^ m[3];
- m[4] = e ^ m[4];
- m[5] = f ^ m[5];
- m[6] = g ^ m[6];
- m[7] = h ^ m[7];
-
- a = b = c = d = e = f = g = h = ss1 = ss2 = tt1 = tt2 = 0;
-}
-
-static void sm3_transform(struct sm3_state *sst, u8 const *src)
-{
- unsigned int w[68];
- unsigned int wt[64];
-
- sm3_expand((u32 *)src, w, wt);
- sm3_compress(w, wt, sst->state);
-
- memzero_explicit(w, sizeof(w));
- memzero_explicit(wt, sizeof(wt));
-}
-
-static void sm3_generic_block_fn(struct sm3_state *sst, u8 const *src,
- int blocks)
-{
- while (blocks--) {
- sm3_transform(sst, src);
- src += SM3_BLOCK_SIZE;
- }
-}
-
-int crypto_sm3_update(struct shash_desc *desc, const u8 *data,
+static int crypto_sm3_update(struct shash_desc *desc, const u8 *data,
unsigned int len)
{
- return sm3_base_do_update(desc, data, len, sm3_generic_block_fn);
+ sm3_update(shash_desc_ctx(desc), data, len);
+ return 0;
}
-EXPORT_SYMBOL(crypto_sm3_update);

-int crypto_sm3_final(struct shash_desc *desc, u8 *out)
+static int crypto_sm3_final(struct shash_desc *desc, u8 *out)
{
- sm3_base_do_finalize(desc, sm3_generic_block_fn);
- return sm3_base_finish(desc, out);
+ sm3_final(shash_desc_ctx(desc), out);
+ return 0;
}
-EXPORT_SYMBOL(crypto_sm3_final);

-int crypto_sm3_finup(struct shash_desc *desc, const u8 *data,
+static int crypto_sm3_finup(struct shash_desc *desc, const u8 *data,
unsigned int len, u8 *hash)
{
- sm3_base_do_update(desc, data, len, sm3_generic_block_fn);
- return crypto_sm3_final(desc, hash);
+ struct sm3_state *sctx = shash_desc_ctx(desc);
+
+ if (len)
+ sm3_update(sctx, data, len);
+ sm3_final(sctx, hash);
+ return 0;
}
-EXPORT_SYMBOL(crypto_sm3_finup);

static struct shash_alg sm3_alg = {
.digestsize = SM3_DIGEST_SIZE,
@@ -174,6 +61,7 @@ static struct shash_alg sm3_alg = {
.base = {
.cra_name = "sm3",
.cra_driver_name = "sm3-generic",
+ .cra_priority = 100,
.cra_blocksize = SM3_BLOCK_SIZE,
.cra_module = THIS_MODULE,
}
diff --git a/include/crypto/sm3.h b/include/crypto/sm3.h
index 315adaf38007..56b50330ef0a 100644
--- a/include/crypto/sm3.h
+++ b/include/crypto/sm3.h
@@ -34,16 +34,6 @@ struct sm3_state {
u8 buffer[SM3_BLOCK_SIZE];
};

-struct shash_desc;
-
-extern int crypto_sm3_update(struct shash_desc *desc, const u8 *data,
- unsigned int len);
-
-extern int crypto_sm3_final(struct shash_desc *desc, u8 *out);
-
-extern int crypto_sm3_finup(struct shash_desc *desc, const u8 *data,
- unsigned int len, u8 *hash);
-
/*
* Stand-alone implementation of the SM3 algorithm. It is designed to
* have as little dependencies as possible so it can be used in the
--
2.32.0


2021-12-22 06:59:25

by Gilad Ben-Yossef

[permalink] [raw]
Subject: Re: [PATCH v2 1/6] crypto: sm3 - create SM3 stand-alone library

On Wed, Dec 22, 2021 at 6:50 AM Tianjia Zhang
<[email protected]> wrote:
>
> Stand-alone implementation of the SM3 algorithm. It is designed
> to have as little dependencies as possible. In other cases you
> should generally use the hash APIs from include/crypto/hash.h.
> Especially when hashing large amounts of data as those APIs may
> be hw-accelerated. In the new SM3 stand-alone library,
> sm3_compress() has also been optimized, instead of simply using
> the code in sm3_generic.
>

I have a really minor nitpick: the commit message talks about changes
to sm3_compress() which was there in the original code but there is no
such function in the current code which is in a different patch and
file, so if you do another iteration for other reason, perhaps change
the commit message to refer to sm3_transform() instead? it's not
really important enough to warrant a new iteration on it's own...

Otherwise, I'm not smart enough to evaluate the changes to
sm3_transform() cryptographically but the overall approach of moving
to a standalone library seems sane to me.

So, for what it's worth -

Reviewed-by: Gilad Ben-Yossef <[email protected]>

Gilad


> Signed-off-by: Tianjia Zhang <[email protected]>
> ---
> include/crypto/sm3.h | 31 ++++++
> lib/crypto/Kconfig | 3 +
> lib/crypto/Makefile | 3 +
> lib/crypto/sm3.c | 246 +++++++++++++++++++++++++++++++++++++++++++
> 4 files changed, 283 insertions(+)
> create mode 100644 lib/crypto/sm3.c
>
> diff --git a/include/crypto/sm3.h b/include/crypto/sm3.h
> index 42ea21289ba9..315adaf38007 100644
> --- a/include/crypto/sm3.h
> +++ b/include/crypto/sm3.h
> @@ -1,5 +1,9 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> /*
> * Common values for SM3 algorithm
> + *
> + * Copyright (C) 2017 ARM Limited or its affiliates.
> + * Copyright (C) 2021 Tianjia Zhang <[email protected]>
> */
>
> #ifndef _CRYPTO_SM3_H
> @@ -39,4 +43,31 @@ extern int crypto_sm3_final(struct shash_desc *desc, u8 *out);
>
> extern int crypto_sm3_finup(struct shash_desc *desc, const u8 *data,
> unsigned int len, u8 *hash);
> +
> +/*
> + * Stand-alone implementation of the SM3 algorithm. It is designed to
> + * have as little dependencies as possible so it can be used in the
> + * kexec_file purgatory. In other cases you should generally use the
> + * hash APIs from include/crypto/hash.h. Especially when hashing large
> + * amounts of data as those APIs may be hw-accelerated.
> + *
> + * For details see lib/crypto/sm3.c
> + */
> +
> +static inline void sm3_init(struct sm3_state *sctx)
> +{
> + sctx->state[0] = SM3_IVA;
> + sctx->state[1] = SM3_IVB;
> + sctx->state[2] = SM3_IVC;
> + sctx->state[3] = SM3_IVD;
> + sctx->state[4] = SM3_IVE;
> + sctx->state[5] = SM3_IVF;
> + sctx->state[6] = SM3_IVG;
> + sctx->state[7] = SM3_IVH;
> + sctx->count = 0;
> +}
> +
> +void sm3_update(struct sm3_state *sctx, const u8 *data, unsigned int len);
> +void sm3_final(struct sm3_state *sctx, u8 *out);
> +
> #endif
> diff --git a/lib/crypto/Kconfig b/lib/crypto/Kconfig
> index 545ccbddf6a1..36963e8f4eaa 100644
> --- a/lib/crypto/Kconfig
> +++ b/lib/crypto/Kconfig
> @@ -129,5 +129,8 @@ config CRYPTO_LIB_CHACHA20POLY1305
> config CRYPTO_LIB_SHA256
> tristate
>
> +config CRYPTO_LIB_SM3
> + tristate
> +
> config CRYPTO_LIB_SM4
> tristate
> diff --git a/lib/crypto/Makefile b/lib/crypto/Makefile
> index 73205ed269ba..8149bc00b627 100644
> --- a/lib/crypto/Makefile
> +++ b/lib/crypto/Makefile
> @@ -38,6 +38,9 @@ libpoly1305-y += poly1305.o
> obj-$(CONFIG_CRYPTO_LIB_SHA256) += libsha256.o
> libsha256-y := sha256.o
>
> +obj-$(CONFIG_CRYPTO_LIB_SM3) += libsm3.o
> +libsm3-y := sm3.o
> +
> obj-$(CONFIG_CRYPTO_LIB_SM4) += libsm4.o
> libsm4-y := sm4.o
>
> diff --git a/lib/crypto/sm3.c b/lib/crypto/sm3.c
> new file mode 100644
> index 000000000000..d473e358a873
> --- /dev/null
> +++ b/lib/crypto/sm3.c
> @@ -0,0 +1,246 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * SM3 secure hash, as specified by OSCCA GM/T 0004-2012 SM3 and described
> + * at https://datatracker.ietf.org/doc/html/draft-sca-cfrg-sm3-02
> + *
> + * Copyright (C) 2017 ARM Limited or its affiliates.
> + * Copyright (C) 2017 Gilad Ben-Yossef <[email protected]>
> + * Copyright (C) 2021 Tianjia Zhang <[email protected]>
> + */
> +
> +#include <linux/module.h>
> +#include <asm/unaligned.h>
> +#include <crypto/sm3.h>
> +
> +static const u32 ____cacheline_aligned K[64] = {
> + 0x79cc4519, 0xf3988a32, 0xe7311465, 0xce6228cb,
> + 0x9cc45197, 0x3988a32f, 0x7311465e, 0xe6228cbc,
> + 0xcc451979, 0x988a32f3, 0x311465e7, 0x6228cbce,
> + 0xc451979c, 0x88a32f39, 0x11465e73, 0x228cbce6,
> + 0x9d8a7a87, 0x3b14f50f, 0x7629ea1e, 0xec53d43c,
> + 0xd8a7a879, 0xb14f50f3, 0x629ea1e7, 0xc53d43ce,
> + 0x8a7a879d, 0x14f50f3b, 0x29ea1e76, 0x53d43cec,
> + 0xa7a879d8, 0x4f50f3b1, 0x9ea1e762, 0x3d43cec5,
> + 0x7a879d8a, 0xf50f3b14, 0xea1e7629, 0xd43cec53,
> + 0xa879d8a7, 0x50f3b14f, 0xa1e7629e, 0x43cec53d,
> + 0x879d8a7a, 0x0f3b14f5, 0x1e7629ea, 0x3cec53d4,
> + 0x79d8a7a8, 0xf3b14f50, 0xe7629ea1, 0xcec53d43,
> + 0x9d8a7a87, 0x3b14f50f, 0x7629ea1e, 0xec53d43c,
> + 0xd8a7a879, 0xb14f50f3, 0x629ea1e7, 0xc53d43ce,
> + 0x8a7a879d, 0x14f50f3b, 0x29ea1e76, 0x53d43cec,
> + 0xa7a879d8, 0x4f50f3b1, 0x9ea1e762, 0x3d43cec5
> +};
> +
> +/*
> + * Transform the message X which consists of 16 32-bit-words. See
> + * GM/T 004-2012 for details.
> + */
> +#define R(i, a, b, c, d, e, f, g, h, t, w1, w2) \
> + do { \
> + ss1 = rol32((rol32((a), 12) + (e) + (t)), 7); \
> + ss2 = ss1 ^ rol32((a), 12); \
> + d += FF ## i(a, b, c) + ss2 + ((w1) ^ (w2)); \
> + h += GG ## i(e, f, g) + ss1 + (w1); \
> + b = rol32((b), 9); \
> + f = rol32((f), 19); \
> + h = P0((h)); \
> + } while (0)
> +
> +#define R1(a, b, c, d, e, f, g, h, t, w1, w2) \
> + R(1, a, b, c, d, e, f, g, h, t, w1, w2)
> +#define R2(a, b, c, d, e, f, g, h, t, w1, w2) \
> + R(2, a, b, c, d, e, f, g, h, t, w1, w2)
> +
> +#define FF1(x, y, z) (x ^ y ^ z)
> +#define FF2(x, y, z) ((x & y) | (x & z) | (y & z))
> +
> +#define GG1(x, y, z) FF1(x, y, z)
> +#define GG2(x, y, z) ((x & y) | (~x & z))
> +
> +/* Message expansion */
> +#define P0(x) ((x) ^ rol32((x), 9) ^ rol32((x), 17))
> +#define P1(x) ((x) ^ rol32((x), 15) ^ rol32((x), 23))
> +#define I(i) (W[i] = get_unaligned_be32(data + i * 4))
> +#define W1(i) (W[i & 0x0f])
> +#define W2(i) (W[i & 0x0f] = \
> + P1(W[i & 0x0f] \
> + ^ W[(i-9) & 0x0f] \
> + ^ rol32(W[(i-3) & 0x0f], 15)) \
> + ^ rol32(W[(i-13) & 0x0f], 7) \
> + ^ W[(i-6) & 0x0f])
> +
> +static void sm3_transform(struct sm3_state *sctx, u8 const *data, u32 W[16])
> +{
> + u32 a, b, c, d, e, f, g, h, ss1, ss2;
> +
> + a = sctx->state[0];
> + b = sctx->state[1];
> + c = sctx->state[2];
> + d = sctx->state[3];
> + e = sctx->state[4];
> + f = sctx->state[5];
> + g = sctx->state[6];
> + h = sctx->state[7];
> +
> + R1(a, b, c, d, e, f, g, h, K[0], I(0), I(4));
> + R1(d, a, b, c, h, e, f, g, K[1], I(1), I(5));
> + R1(c, d, a, b, g, h, e, f, K[2], I(2), I(6));
> + R1(b, c, d, a, f, g, h, e, K[3], I(3), I(7));
> + R1(a, b, c, d, e, f, g, h, K[4], W1(4), I(8));
> + R1(d, a, b, c, h, e, f, g, K[5], W1(5), I(9));
> + R1(c, d, a, b, g, h, e, f, K[6], W1(6), I(10));
> + R1(b, c, d, a, f, g, h, e, K[7], W1(7), I(11));
> + R1(a, b, c, d, e, f, g, h, K[8], W1(8), I(12));
> + R1(d, a, b, c, h, e, f, g, K[9], W1(9), I(13));
> + R1(c, d, a, b, g, h, e, f, K[10], W1(10), I(14));
> + R1(b, c, d, a, f, g, h, e, K[11], W1(11), I(15));
> + R1(a, b, c, d, e, f, g, h, K[12], W1(12), W2(16));
> + R1(d, a, b, c, h, e, f, g, K[13], W1(13), W2(17));
> + R1(c, d, a, b, g, h, e, f, K[14], W1(14), W2(18));
> + R1(b, c, d, a, f, g, h, e, K[15], W1(15), W2(19));
> +
> + R2(a, b, c, d, e, f, g, h, K[16], W1(16), W2(20));
> + R2(d, a, b, c, h, e, f, g, K[17], W1(17), W2(21));
> + R2(c, d, a, b, g, h, e, f, K[18], W1(18), W2(22));
> + R2(b, c, d, a, f, g, h, e, K[19], W1(19), W2(23));
> + R2(a, b, c, d, e, f, g, h, K[20], W1(20), W2(24));
> + R2(d, a, b, c, h, e, f, g, K[21], W1(21), W2(25));
> + R2(c, d, a, b, g, h, e, f, K[22], W1(22), W2(26));
> + R2(b, c, d, a, f, g, h, e, K[23], W1(23), W2(27));
> + R2(a, b, c, d, e, f, g, h, K[24], W1(24), W2(28));
> + R2(d, a, b, c, h, e, f, g, K[25], W1(25), W2(29));
> + R2(c, d, a, b, g, h, e, f, K[26], W1(26), W2(30));
> + R2(b, c, d, a, f, g, h, e, K[27], W1(27), W2(31));
> + R2(a, b, c, d, e, f, g, h, K[28], W1(28), W2(32));
> + R2(d, a, b, c, h, e, f, g, K[29], W1(29), W2(33));
> + R2(c, d, a, b, g, h, e, f, K[30], W1(30), W2(34));
> + R2(b, c, d, a, f, g, h, e, K[31], W1(31), W2(35));
> +
> + R2(a, b, c, d, e, f, g, h, K[32], W1(32), W2(36));
> + R2(d, a, b, c, h, e, f, g, K[33], W1(33), W2(37));
> + R2(c, d, a, b, g, h, e, f, K[34], W1(34), W2(38));
> + R2(b, c, d, a, f, g, h, e, K[35], W1(35), W2(39));
> + R2(a, b, c, d, e, f, g, h, K[36], W1(36), W2(40));
> + R2(d, a, b, c, h, e, f, g, K[37], W1(37), W2(41));
> + R2(c, d, a, b, g, h, e, f, K[38], W1(38), W2(42));
> + R2(b, c, d, a, f, g, h, e, K[39], W1(39), W2(43));
> + R2(a, b, c, d, e, f, g, h, K[40], W1(40), W2(44));
> + R2(d, a, b, c, h, e, f, g, K[41], W1(41), W2(45));
> + R2(c, d, a, b, g, h, e, f, K[42], W1(42), W2(46));
> + R2(b, c, d, a, f, g, h, e, K[43], W1(43), W2(47));
> + R2(a, b, c, d, e, f, g, h, K[44], W1(44), W2(48));
> + R2(d, a, b, c, h, e, f, g, K[45], W1(45), W2(49));
> + R2(c, d, a, b, g, h, e, f, K[46], W1(46), W2(50));
> + R2(b, c, d, a, f, g, h, e, K[47], W1(47), W2(51));
> +
> + R2(a, b, c, d, e, f, g, h, K[48], W1(48), W2(52));
> + R2(d, a, b, c, h, e, f, g, K[49], W1(49), W2(53));
> + R2(c, d, a, b, g, h, e, f, K[50], W1(50), W2(54));
> + R2(b, c, d, a, f, g, h, e, K[51], W1(51), W2(55));
> + R2(a, b, c, d, e, f, g, h, K[52], W1(52), W2(56));
> + R2(d, a, b, c, h, e, f, g, K[53], W1(53), W2(57));
> + R2(c, d, a, b, g, h, e, f, K[54], W1(54), W2(58));
> + R2(b, c, d, a, f, g, h, e, K[55], W1(55), W2(59));
> + R2(a, b, c, d, e, f, g, h, K[56], W1(56), W2(60));
> + R2(d, a, b, c, h, e, f, g, K[57], W1(57), W2(61));
> + R2(c, d, a, b, g, h, e, f, K[58], W1(58), W2(62));
> + R2(b, c, d, a, f, g, h, e, K[59], W1(59), W2(63));
> + R2(a, b, c, d, e, f, g, h, K[60], W1(60), W2(64));
> + R2(d, a, b, c, h, e, f, g, K[61], W1(61), W2(65));
> + R2(c, d, a, b, g, h, e, f, K[62], W1(62), W2(66));
> + R2(b, c, d, a, f, g, h, e, K[63], W1(63), W2(67));
> +
> + sctx->state[0] ^= a;
> + sctx->state[1] ^= b;
> + sctx->state[2] ^= c;
> + sctx->state[3] ^= d;
> + sctx->state[4] ^= e;
> + sctx->state[5] ^= f;
> + sctx->state[6] ^= g;
> + sctx->state[7] ^= h;
> +}
> +#undef R
> +#undef R1
> +#undef R2
> +#undef I
> +#undef W1
> +#undef W2
> +
> +static inline void sm3_block(struct sm3_state *sctx,
> + u8 const *data, int blocks, u32 W[16])
> +{
> + while (blocks--) {
> + sm3_transform(sctx, data, W);
> + data += SM3_BLOCK_SIZE;
> + }
> +}
> +
> +void sm3_update(struct sm3_state *sctx, const u8 *data, unsigned int len)
> +{
> + unsigned int partial = sctx->count % SM3_BLOCK_SIZE;
> + u32 W[16];
> +
> + sctx->count += len;
> +
> + if ((partial + len) >= SM3_BLOCK_SIZE) {
> + int blocks;
> +
> + if (partial) {
> + int p = SM3_BLOCK_SIZE - partial;
> +
> + memcpy(sctx->buffer + partial, data, p);
> + data += p;
> + len -= p;
> +
> + sm3_block(sctx, sctx->buffer, 1, W);
> + }
> +
> + blocks = len / SM3_BLOCK_SIZE;
> + len %= SM3_BLOCK_SIZE;
> +
> + if (blocks) {
> + sm3_block(sctx, data, blocks, W);
> + data += blocks * SM3_BLOCK_SIZE;
> + }
> +
> + memzero_explicit(W, sizeof(W));
> +
> + partial = 0;
> + }
> + if (len)
> + memcpy(sctx->buffer + partial, data, len);
> +}
> +EXPORT_SYMBOL_GPL(sm3_update);
> +
> +void sm3_final(struct sm3_state *sctx, u8 *out)
> +{
> + const int bit_offset = SM3_BLOCK_SIZE - sizeof(u64);
> + __be64 *bits = (__be64 *)(sctx->buffer + bit_offset);
> + __be32 *digest = (__be32 *)out;
> + unsigned int partial = sctx->count % SM3_BLOCK_SIZE;
> + u32 W[16];
> + int i;
> +
> + sctx->buffer[partial++] = 0x80;
> + if (partial > bit_offset) {
> + memset(sctx->buffer + partial, 0, SM3_BLOCK_SIZE - partial);
> + partial = 0;
> +
> + sm3_block(sctx, sctx->buffer, 1, W);
> + }
> +
> + memset(sctx->buffer + partial, 0, bit_offset - partial);
> + *bits = cpu_to_be64(sctx->count << 3);
> + sm3_block(sctx, sctx->buffer, 1, W);
> +
> + for (i = 0; i < 8; i++)
> + put_unaligned_be32(sctx->state[i], digest++);
> +
> + /* Zeroize sensitive information. */
> + memzero_explicit(W, sizeof(W));
> + memzero_explicit(sctx, sizeof(*sctx));
> +}
> +EXPORT_SYMBOL_GPL(sm3_final);
> +
> +MODULE_DESCRIPTION("Generic SM3 library");
> +MODULE_LICENSE("GPL v2");
> --
> 2.32.0
>


--
Gilad Ben-Yossef
Chief Coffee Drinker

values of β will give rise to dom!

2021-12-22 13:11:48

by Tianjia Zhang

[permalink] [raw]
Subject: Re: [PATCH v2 1/6] crypto: sm3 - create SM3 stand-alone library

Hi Gilad,

On 12/22/21 2:59 PM, Gilad Ben-Yossef wrote:
> On Wed, Dec 22, 2021 at 6:50 AM Tianjia Zhang
> <[email protected]> wrote:
>>
>> Stand-alone implementation of the SM3 algorithm. It is designed
>> to have as little dependencies as possible. In other cases you
>> should generally use the hash APIs from include/crypto/hash.h.
>> Especially when hashing large amounts of data as those APIs may
>> be hw-accelerated. In the new SM3 stand-alone library,
>> sm3_compress() has also been optimized, instead of simply using
>> the code in sm3_generic.
>>
>
> I have a really minor nitpick: the commit message talks about changes
> to sm3_compress() which was there in the original code but there is no
> such function in the current code which is in a different patch and
> file, so if you do another iteration for other reason, perhaps change
> the commit message to refer to sm3_transform() instead? it's not
> really important enough to warrant a new iteration on it's own...
>
> Otherwise, I'm not smart enough to evaluate the changes to
> sm3_transform() cryptographically but the overall approach of moving
> to a standalone library seems sane to me.
>
> So, for what it's worth -
>
> Reviewed-by: Gilad Ben-Yossef <[email protected]>
>
> Gilad
>

Thanks for your suggestion. I agree with you. In the implementation of
sm3_generic, sm3_compress() is a sub-function of sm3_transform(). The
optimization is indeed for sm3_transform(). I will fix it in the next patch.

Best regards,
Tianjia

2021-12-31 07:05:37

by liulongfang

[permalink] [raw]
Subject: Re: [PATCH v2 2/6] crypto: arm64/sm3-ce - make dependent on sm3 library

On 2021/12/22 12:50, Tianjia Zhang Wrote:
> SM3 generic library is stand-alone implementation, sm3-ce can depend
> on the SM3 library instead of sm3-generic.
>
> Signed-off-by: Tianjia Zhang <[email protected]>
> ---
> arch/arm64/crypto/Kconfig | 2 +-
> arch/arm64/crypto/sm3-ce-glue.c | 20 ++++++++++++++------
> 2 files changed, 15 insertions(+), 7 deletions(-)
>
> diff --git a/arch/arm64/crypto/Kconfig b/arch/arm64/crypto/Kconfig
> index addfa413650b..2a965aa0188d 100644
> --- a/arch/arm64/crypto/Kconfig
> +++ b/arch/arm64/crypto/Kconfig
> @@ -45,7 +45,7 @@ config CRYPTO_SM3_ARM64_CE
> tristate "SM3 digest algorithm (ARMv8.2 Crypto Extensions)"
> depends on KERNEL_MODE_NEON
> select CRYPTO_HASH
> - select CRYPTO_SM3
> + select CRYPTO_LIB_SM3
>
> config CRYPTO_SM4_ARM64_CE
> tristate "SM4 symmetric cipher (ARMv8.2 Crypto Extensions)"
> diff --git a/arch/arm64/crypto/sm3-ce-glue.c b/arch/arm64/crypto/sm3-ce-glue.c
> index d71faca322f2..3198f31c9446 100644
> --- a/arch/arm64/crypto/sm3-ce-glue.c
> +++ b/arch/arm64/crypto/sm3-ce-glue.c
> @@ -27,7 +27,7 @@ static int sm3_ce_update(struct shash_desc *desc, const u8 *data,
> unsigned int len)
> {
> if (!crypto_simd_usable())
> - return crypto_sm3_update(desc, data, len);
> + return sm3_update(shash_desc_ctx(desc), data, len);
>
> kernel_neon_begin();
> sm3_base_do_update(desc, data, len, sm3_ce_transform);
> @@ -39,7 +39,7 @@ static int sm3_ce_update(struct shash_desc *desc, const u8 *data,
> static int sm3_ce_final(struct shash_desc *desc, u8 *out)
> {
> if (!crypto_simd_usable())
> - return crypto_sm3_finup(desc, NULL, 0, out);
> + return sm3_final(shash_desc_ctx(desc), out);
>
> kernel_neon_begin();
> sm3_base_do_finalize(desc, sm3_ce_transform);
> @@ -51,14 +51,22 @@ static int sm3_ce_final(struct shash_desc *desc, u8 *out)
> static int sm3_ce_finup(struct shash_desc *desc, const u8 *data,
> unsigned int len, u8 *out)
> {
> - if (!crypto_simd_usable())
> - return crypto_sm3_finup(desc, data, len, out);
> + if (!crypto_simd_usable()) {
> + struct sm3_state *sctx = shash_desc_ctx(desc);
> +
> + if (len)
> + sm3_update(sctx, data, len);
> + sm3_final(sctx, out);
> + return 0;
> + }
>
> kernel_neon_begin();
> - sm3_base_do_update(desc, data, len, sm3_ce_transform);
> + if (len)
> + sm3_base_do_update(desc, data, len, sm3_ce_transform);
> + sm3_base_do_finalize(desc, sm3_ce_transform);
> kernel_neon_end();
>
> - return sm3_ce_final(desc, out);
> + return sm3_base_finish(desc, out);
> }
>
> static struct shash_alg sm3_alg = {
>You have modified the implementation of SM3 algorithm, so what benefits will be gained
after such modification?
What flaws are solved or can performance be improved?
Thanks.
Longfang.

2021-12-31 11:01:04

by Tianjia Zhang

[permalink] [raw]
Subject: Re: [PATCH v2 2/6] crypto: arm64/sm3-ce - make dependent on sm3 library

Hi

On 12/31/21 3:05 PM, liulongfang wrote:
> On 2021/12/22 12:50, Tianjia Zhang Wrote:
>> SM3 generic library is stand-alone implementation, sm3-ce can depend
>> on the SM3 library instead of sm3-generic.
>>
>> Signed-off-by: Tianjia Zhang <[email protected]>
>> ---
>> arch/arm64/crypto/Kconfig | 2 +-
>> arch/arm64/crypto/sm3-ce-glue.c | 20 ++++++++++++++------
>> 2 files changed, 15 insertions(+), 7 deletions(-)
>>
>> diff --git a/arch/arm64/crypto/Kconfig b/arch/arm64/crypto/Kconfig
>> index addfa413650b..2a965aa0188d 100644
>> --- a/arch/arm64/crypto/Kconfig
>> +++ b/arch/arm64/crypto/Kconfig
>> @@ -45,7 +45,7 @@ config CRYPTO_SM3_ARM64_CE
>> tristate "SM3 digest algorithm (ARMv8.2 Crypto Extensions)"
>> depends on KERNEL_MODE_NEON
>> select CRYPTO_HASH
>> - select CRYPTO_SM3
>> + select CRYPTO_LIB_SM3
>>
>> config CRYPTO_SM4_ARM64_CE
>> tristate "SM4 symmetric cipher (ARMv8.2 Crypto Extensions)"
>> diff --git a/arch/arm64/crypto/sm3-ce-glue.c b/arch/arm64/crypto/sm3-ce-glue.c
>> index d71faca322f2..3198f31c9446 100644
>> --- a/arch/arm64/crypto/sm3-ce-glue.c
>> +++ b/arch/arm64/crypto/sm3-ce-glue.c
>> @@ -27,7 +27,7 @@ static int sm3_ce_update(struct shash_desc *desc, const u8 *data,
>> unsigned int len)
>> {
>> if (!crypto_simd_usable())
>> - return crypto_sm3_update(desc, data, len);
>> + return sm3_update(shash_desc_ctx(desc), data, len);
>>
>> kernel_neon_begin();
>> sm3_base_do_update(desc, data, len, sm3_ce_transform);
>> @@ -39,7 +39,7 @@ static int sm3_ce_update(struct shash_desc *desc, const u8 *data,
>> static int sm3_ce_final(struct shash_desc *desc, u8 *out)
>> {
>> if (!crypto_simd_usable())
>> - return crypto_sm3_finup(desc, NULL, 0, out);
>> + return sm3_final(shash_desc_ctx(desc), out);
>>
>> kernel_neon_begin();
>> sm3_base_do_finalize(desc, sm3_ce_transform);
>> @@ -51,14 +51,22 @@ static int sm3_ce_final(struct shash_desc *desc, u8 *out)
>> static int sm3_ce_finup(struct shash_desc *desc, const u8 *data,
>> unsigned int len, u8 *out)
>> {
>> - if (!crypto_simd_usable())
>> - return crypto_sm3_finup(desc, data, len, out);
>> + if (!crypto_simd_usable()) {
>> + struct sm3_state *sctx = shash_desc_ctx(desc);
>> +
>> + if (len)
>> + sm3_update(sctx, data, len);
>> + sm3_final(sctx, out);
>> + return 0;
>> + }
>>
>> kernel_neon_begin();
>> - sm3_base_do_update(desc, data, len, sm3_ce_transform);
>> + if (len)
>> + sm3_base_do_update(desc, data, len, sm3_ce_transform);
>> + sm3_base_do_finalize(desc, sm3_ce_transform);
>> kernel_neon_end();
>>
>> - return sm3_ce_final(desc, out);
>> + return sm3_base_finish(desc, out);
>> }
>>
>> static struct shash_alg sm3_alg = {
>> You have modified the implementation of SM3 algorithm, so what benefits will be gained
> after such modification?
> What flaws are solved or can performance be improved?
> Thanks.
> Longfang.

This modification does not bring obvious performance improvement, but
makes the code logic more reasonable in terms of architecture and
calling level. The calling relationship before modification is:
sm3-ce -> sm3-generic -> sm3-lib,
after this modification is: sm3-ce -> sm3-lib.

Best regards,
Tianjia