Subject: [RFC PATCH 0/7] crypto: x86 - fix RCU stalls

This series attempts to fix the RCU stalls triggered
by the x86 crypto drivers discussed in
https://lore.kernel.org/all/MW5PR84MB18426EBBA3303770A8BC0BDFAB759@MW5PR84MB1842.NAMPRD84.PROD.OUTLOOK.COM/

The following x86 drivers already enforce a 4 KiB limit today,
using either the SZ_4K macro or a direct reference to 4096 bytes:
blake2s, chacha, nhpoly1305, poly1305, polyval
I've included a patch to make them all use the same macro name.

These are not currently limited, so I've included patches for
them:
sha*, crc*, sm3, ghash

I originally encountered some RCU stalls with tcrypt in aesni:
tcrypt: testing encryption speed of sync skcipher cts(cbc(aes)) using cts(cbc(aes-aesni))
tcrypt: testing encryption speed of sync skcipher cfb(aes) using cfb(aes-aesni)
tcrypt: testing decryption speed of sync skcipher cfb(aes) using cfb(aes-aesni)
but I don't see any problems in that driver's source code, so no
patch is proposed for it yet.

With various errors deliberately inserted, all the drivers failed
their self-tests or hung boot, so the self-tests do exercise the
changed code paths and the changes seem functionally correct.
I haven't done comprehensive tests of different data sizes and
alignments, so please consider this an RFC.

I added some counters (not posted) to the drivers to observe their
behavior. During boot, the finup function is actually called much
more often than update: 1541 calls processing about 2 GiB via finup
vs. 174 calls processing about 24 KiB via update. The patch breaks
that 2 GiB into roughly half a million 4 KiB chunks, as sketched
below.

/sys/module/sha512_ssse3/parameters/rob_call_finup:1541
/sys/module/sha512_ssse3/parameters/rob_call_finup_fpu:469325
/sys/module/sha512_ssse3/parameters/rob_call_update:174
/sys/module/sha512_ssse3/parameters/rob_call_update_fpu:32
/sys/module/sha512_ssse3/parameters/rob_len_finup:2123048456
/sys/module/sha512_ssse3/parameters/rob_len_update:24120
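
The chunking pattern applied by these patches looks roughly like this
(a simplified sketch using the FPU_BYTES macro introduced by the last
patch; the individual drivers differ in the details):

do {
        unsigned int chunk = min(len, FPU_BYTES);

        kernel_fpu_begin();
        /* process 'chunk' bytes with the CPU-optimized routine */
        kernel_fpu_end();

        data += chunk;
        len -= chunk;
} while (len);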


Robert Elliott (7):
rcu: correct CONFIG_RCU_EXP_CPU_STALL_TIMEOUT descriptions
crypto: x86/sha - limit FPU preemption
crypto: x86/crc - limit FPU preemption
crypto: x86/sm3 - limit FPU preemption
crypto: x86/ghash - restructure FPU context saving
crypto: x86/ghash - limit FPU preemption
crypto: x86 - use common macro for FPU limit

Documentation/RCU/stallwarn.rst | 9 +++---
arch/x86/crypto/blake2s-glue.c | 7 +++--
arch/x86/crypto/chacha_glue.c | 4 ++-
arch/x86/crypto/crc32-pclmul_glue.c | 18 ++++++++---
arch/x86/crypto/crc32c-intel_glue.c | 32 ++++++++++++++++----
arch/x86/crypto/crct10dif-pclmul_glue.c | 32 ++++++++++++++++----
arch/x86/crypto/ghash-clmulni-intel_glue.c | 32 +++++++++++++++-----
arch/x86/crypto/nhpoly1305-avx2-glue.c | 3 +-
arch/x86/crypto/nhpoly1305-sse2-glue.c | 4 ++-
arch/x86/crypto/poly1305_glue.c | 25 +++++++++-------
arch/x86/crypto/polyval-clmulni_glue.c | 5 ++--
arch/x86/crypto/sha1_ssse3_glue.c | 34 +++++++++++++++++----
arch/x86/crypto/sha256_ssse3_glue.c | 35 ++++++++++++++++++----
arch/x86/crypto/sha512_ssse3_glue.c | 35 ++++++++++++++++++----
arch/x86/crypto/sm3_avx_glue.c | 28 +++++++++++++----
kernel/rcu/Kconfig.debug | 2 +-
16 files changed, 237 insertions(+), 68 deletions(-)

--
2.37.3


Subject: [RFC PATCH 1/7] rcu: correct CONFIG_RCU_EXP_CPU_STALL_TIMEOUT descriptions

Make the descriptions of CONFIG_RCU_EXP_CPU_STALL_TIMEOUT
match the code:

- there is no longer a default of 20 ms for Android since
  commit 1045a06724f3 ("remove CONFIG_ANDROID"),

- the code enforces a maximum of 21 seconds, which becomes evident
  when specifying 0, meaning use the CONFIG_RCU_CPU_STALL_TIMEOUT
  value (60 seconds in the example below) converted to milliseconds.

Example .config:
CONFIG_RCU_CPU_STALL_TIMEOUT=60
CONFIG_RCU_EXP_CPU_STALL_TIMEOUT=0

leads to:
/sys/module/rcupdate/parameters/rcu_cpu_stall_timeout:60
/sys/module/rcupdate/parameters/rcu_exp_cpu_stall_timeout:21000
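
That behavior corresponds roughly to the following (a simplified
sketch of the conversion and cap, not the actual kernel code):

/* 0 means fall back to the normal stall timeout, converted to ms */
if (exp_timeout_ms == 0)
        exp_timeout_ms = rcu_cpu_stall_timeout * MSEC_PER_SEC; /* 60 s -> 60000 ms */

/* capped at 21 seconds */
exp_timeout_ms = min(exp_timeout_ms, 21 * MSEC_PER_SEC);       /* -> 21000 ms */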

Fixes: 1045a06724f3 ("remove CONFIG_ANDROID")
Signed-off-by: Robert Elliott <[email protected]>
---
Documentation/RCU/stallwarn.rst | 9 +++++----
kernel/rcu/Kconfig.debug | 2 +-
2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/Documentation/RCU/stallwarn.rst b/Documentation/RCU/stallwarn.rst
index e38c587067fc..d86a8b47504f 100644
--- a/Documentation/RCU/stallwarn.rst
+++ b/Documentation/RCU/stallwarn.rst
@@ -168,10 +168,11 @@ CONFIG_RCU_EXP_CPU_STALL_TIMEOUT
Same as the CONFIG_RCU_CPU_STALL_TIMEOUT parameter but only for
the expedited grace period. This parameter defines the period
of time that RCU will wait from the beginning of an expedited
- grace period until it issues an RCU CPU stall warning. This time
- period is normally 20 milliseconds on Android devices. A zero
- value causes the CONFIG_RCU_CPU_STALL_TIMEOUT value to be used,
- after conversion to milliseconds.
+ grace period until it issues an RCU CPU stall warning.
+
+ A zero value causes the CONFIG_RCU_CPU_STALL_TIMEOUT value to be
+ used, after conversion to milliseconds, limited to a maximum of
+ 21 seconds.

This configuration parameter may be changed at runtime via the
/sys/module/rcupdate/parameters/rcu_exp_cpu_stall_timeout, however
diff --git a/kernel/rcu/Kconfig.debug b/kernel/rcu/Kconfig.debug
index 1b0c41d490f0..4477eeb8a54f 100644
--- a/kernel/rcu/Kconfig.debug
+++ b/kernel/rcu/Kconfig.debug
@@ -93,7 +93,7 @@ config RCU_EXP_CPU_STALL_TIMEOUT
If the RCU grace period persists, additional CPU stall warnings
are printed at more widely spaced intervals. A value of zero
says to use the RCU_CPU_STALL_TIMEOUT value converted from
- seconds to milliseconds.
+ seconds to milliseconds, limited to a maximum of 21 seconds.

config RCU_TRACE
bool "Enable tracing for RCU"
--
2.37.3

Subject: [RFC PATCH 3/7] crypto: x86/crc - limit FPU preemption

As done by the ECB and CBC helpers in arch/x86/crypto/ecb_cbc_helpers.h,
limit the number of bytes processed between kernel_fpu_begin() and
kernel_fpu_end() calls.

Those functions call preempt_disable() and preempt_enable(), so
the CPU core is unavailable for scheduling while the SIMD code is
running, leading to stall warnings like:
rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: {12-... } 22 jiffies s: 277 root: 0x1/.

Fixes: 78c37d191dd6 ("crypto: crc32 - add crc32 pclmulqdq implementation and wrappers for table implementation")
Fixes: 6a8ce1ef3940 ("crypto: crc32c - Optimize CRC32C calculation with PCLMULQDQ instruction")
Fixes: 0b95a7f85718 ("crypto: crct10dif - Glue code to cast accelerated CRCT10DIF assembly as a crypto transform")
Suggested-by: Herbert Xu <[email protected]>
Signed-off-by: Robert Elliott <[email protected]>
---
arch/x86/crypto/crc32-pclmul_glue.c | 18 ++++++++++----
arch/x86/crypto/crc32c-intel_glue.c | 32 ++++++++++++++++++++-----
arch/x86/crypto/crct10dif-pclmul_glue.c | 32 ++++++++++++++++++++-----
3 files changed, 66 insertions(+), 16 deletions(-)

diff --git a/arch/x86/crypto/crc32-pclmul_glue.c b/arch/x86/crypto/crc32-pclmul_glue.c
index 288200fe7b4e..7cf65dc726c4 100644
--- a/arch/x86/crypto/crc32-pclmul_glue.c
+++ b/arch/x86/crypto/crc32-pclmul_glue.c
@@ -49,6 +49,8 @@
#define SCALE_F 16L /* size of xmm register */
#define SCALE_F_MASK (SCALE_F - 1)

+#define FPU_BYTES 4096U /* avoid kernel_fpu_begin/end scheduler/rcu stalls */
+
u32 crc32_pclmul_le_16(unsigned char const *buffer, size_t len, u32 crc32);

static u32 __attribute__((pure))
@@ -57,6 +59,7 @@ static u32 __attribute__((pure))
unsigned int iquotient;
unsigned int iremainder;
unsigned int prealign;
+ unsigned int chunk;

if (len < PCLMUL_MIN_LEN + SCALE_F_MASK || !crypto_simd_usable())
return crc32_le(crc, p, len);
@@ -73,12 +76,19 @@ static u32 __attribute__((pure))
iquotient = len & (~SCALE_F_MASK);
iremainder = len & SCALE_F_MASK;

- kernel_fpu_begin();
- crc = crc32_pclmul_le_16(p, iquotient, crc);
- kernel_fpu_end();
+ do {
+ chunk = min(iquotient, FPU_BYTES);
+ iquotient -= chunk;
+
+ kernel_fpu_begin();
+ crc = crc32_pclmul_le_16(p, chunk, crc);
+ kernel_fpu_end();
+
+ p += chunk;
+ } while (iquotient);

if (iremainder)
- crc = crc32_le(crc, p + iquotient, iremainder);
+ crc = crc32_le(crc, p, iremainder);

return crc;
}
diff --git a/arch/x86/crypto/crc32c-intel_glue.c b/arch/x86/crypto/crc32c-intel_glue.c
index c5c965b694c6..b277c215f0fb 100644
--- a/arch/x86/crypto/crc32c-intel_glue.c
+++ b/arch/x86/crypto/crc32c-intel_glue.c
@@ -44,6 +44,8 @@
*/
#define CRC32C_PCL_BREAKEVEN 512

+#define FPU_BYTES 4096U /* avoid kernel_fpu_begin/end scheduler/rcu stalls */
+
asmlinkage unsigned int crc_pcl(const u8 *buffer, int len,
unsigned int crc_init);
#endif /* CONFIG_X86_64 */
@@ -155,15 +157,23 @@ static int crc32c_pcl_intel_update(struct shash_desc *desc, const u8 *data,
unsigned int len)
{
u32 *crcp = shash_desc_ctx(desc);
+ unsigned int chunk;

/*
* use faster PCL version if datasize is large enough to
* overcome kernel fpu state save/restore overhead
*/
if (len >= CRC32C_PCL_BREAKEVEN && crypto_simd_usable()) {
- kernel_fpu_begin();
- *crcp = crc_pcl(data, len, *crcp);
- kernel_fpu_end();
+ do {
+ chunk = min(len, FPU_BYTES);
+ len -= chunk;
+
+ kernel_fpu_begin();
+ *crcp = crc_pcl(data, chunk, *crcp);
+ kernel_fpu_end();
+
+ data += chunk;
+ } while (len);
} else
*crcp = crc32c_intel_le_hw(*crcp, data, len);
return 0;
@@ -172,10 +182,20 @@ static int crc32c_pcl_intel_update(struct shash_desc *desc, const u8 *data,
static int __crc32c_pcl_intel_finup(u32 *crcp, const u8 *data, unsigned int len,
u8 *out)
{
+ unsigned int chunk;
+
if (len >= CRC32C_PCL_BREAKEVEN && crypto_simd_usable()) {
- kernel_fpu_begin();
- *(__le32 *)out = ~cpu_to_le32(crc_pcl(data, len, *crcp));
- kernel_fpu_end();
+ do {
+ chunk = min(len, FPU_BYTES);
+ len -= chunk;
+
+ kernel_fpu_begin();
+ *crcp = crc_pcl(data, chunk, *crcp);
+ kernel_fpu_end();
+
+ data += chunk;
+ } while (len);
+ *(__le32 *)out = ~cpu_to_le32(*crcp);
} else
*(__le32 *)out =
~cpu_to_le32(crc32c_intel_le_hw(*crcp, data, len));
diff --git a/arch/x86/crypto/crct10dif-pclmul_glue.c b/arch/x86/crypto/crct10dif-pclmul_glue.c
index 7c5a32282d51..bcd362df6b62 100644
--- a/arch/x86/crypto/crct10dif-pclmul_glue.c
+++ b/arch/x86/crypto/crct10dif-pclmul_glue.c
@@ -36,6 +36,8 @@
#include <asm/cpu_device_id.h>
#include <asm/simd.h>

+#define FPU_BYTES 4096U /* avoid kernel_fpu_begin/end scheduler/rcu stalls */
+
asmlinkage u16 crc_t10dif_pcl(u16 init_crc, const u8 *buf, size_t len);

struct chksum_desc_ctx {
@@ -55,11 +57,19 @@ static int chksum_update(struct shash_desc *desc, const u8 *data,
unsigned int length)
{
struct chksum_desc_ctx *ctx = shash_desc_ctx(desc);
+ unsigned int chunk;

if (length >= 16 && crypto_simd_usable()) {
- kernel_fpu_begin();
- ctx->crc = crc_t10dif_pcl(ctx->crc, data, length);
- kernel_fpu_end();
+ do {
+ chunk = min(length, FPU_BYTES);
+ length -= chunk;
+
+ kernel_fpu_begin();
+ ctx->crc = crc_t10dif_pcl(ctx->crc, data, chunk);
+ kernel_fpu_end();
+
+ data += chunk;
+ } while (length);
} else
ctx->crc = crc_t10dif_generic(ctx->crc, data, length);
return 0;
@@ -75,10 +85,20 @@ static int chksum_final(struct shash_desc *desc, u8 *out)

static int __chksum_finup(__u16 crc, const u8 *data, unsigned int len, u8 *out)
{
+ unsigned int chunk;
+
if (len >= 16 && crypto_simd_usable()) {
- kernel_fpu_begin();
- *(__u16 *)out = crc_t10dif_pcl(crc, data, len);
- kernel_fpu_end();
+ do {
+ chunk = min(len, FPU_BYTES);
+ len -= chunk;
+
+ kernel_fpu_begin();
+ crc = crc_t10dif_pcl(crc, data, chunk);
+ kernel_fpu_end();
+
+ data += chunk;
+ } while (len);
+ *(__u16 *)out = crc;
} else
*(__u16 *)out = crc_t10dif_generic(crc, data, len);
return 0;
--
2.37.3

Subject: [PATCH v2 00/19] crypto: x86 - fix RCU stalls

This series fixes the RCU stalls triggered by the x86 crypto
modules discussed in
https://lore.kernel.org/all/MW5PR84MB18426EBBA3303770A8BC0BDFAB759@MW5PR84MB1842.NAMPRD84.PROD.OUTLOOK.COM/

Two root causes were:
- too much data processed between kernel_fpu_begin and
  kernel_fpu_end calls (which are heavily used by the
  x86-optimized drivers)
- tcrypt not calling cond_resched during speed test loops

These problems have always been lurking, but improving the
loading of the x86/sha512 module caused them to happen frequently
during boot when SHA-512 is used for module signature checking.

Fixing these problems makes it safer to improve the loading of
the rest of the x86 modules in the same way as the sha512 module.

This series only handles the x86 modules.

Testing
=======
The most effective testing was done by enabling
CONFIG_CRYPTO_MANAGER_EXTRA_TESTS=y

which creates random test vectors and compares the results
of the CPU-optimized functions to the generic functions,
and by running two threads of repeated modprobe commands
to exercise those tests:
watch -n 0 modprobe tcrypt mode=200
watch -n 0 ./tcrypt_sweep

where tcrypt_sweep walks through all the test modes:
#!/usr/bin/perl
use strict;

my @modes;

open SOURCE, "<", "/home/me/linux/crypto/tcrypt.c" or die $!;
while (<SOURCE>) {
        if (/^\s+case ([0-9]+):$/) {
                push @modes, $1;
        }
}
close SOURCE;

foreach (@modes) {
        print "$_ ";

        # don't run mode 300, which runs 301-399
        # don't run mode 400, which runs 401-499
        if (($_ eq "0") || ($_ eq "300") || ($_ eq "400")) {
                system "echo \"===== Skipping special modprobe tcrypt mode=$_\" > /dev/kmsg";
        } else {
                system "echo \"Running modprobe tcrypt mode=$_\" > /dev/kmsg";
                system "modprobe tcrypt mode=$_";
        }
}



Robert Elliott (19):
crypto: tcrypt - test crc32
crypto: tcrypt - test nhpoly1305
crypto: tcrypt - reschedule during cycles speed tests
crypto: x86/sha - limit FPU preemption
crypto: x86/crc - limit FPU preemption
crypto: x86/sm3 - limit FPU preemption
crypto: x86/ghash - restructure FPU context saving
crypto: x86/ghash - limit FPU preemption
crypto: x86 - use common macro for FPU limit
crypto: x86/sha1, sha256 - load based on CPU features
crypto: x86/crc - load based on CPU features
crypto: x86/sm3 - load based on CPU features
crypto: x86/ghash - load based on CPU features
crypto: x86 - load based on CPU features
crypto: x86 - add pr_fmt to all modules
crypto: x86 - print CPU optimized loaded messages
crypto: x86 - standardize suboptimal prints
crypto: x86 - standardize not loaded prints
crypto: x86/sha - register only the best function

arch/x86/crypto/aegis128-aesni-glue.c | 21 ++-
arch/x86/crypto/aesni-intel_glue.c | 31 ++--
arch/x86/crypto/aria_aesni_avx_glue.c | 19 +-
arch/x86/crypto/blake2s-glue.c | 34 +++-
arch/x86/crypto/blowfish_glue.c | 19 +-
arch/x86/crypto/camellia_aesni_avx2_glue.c | 25 ++-
arch/x86/crypto/camellia_aesni_avx_glue.c | 24 ++-
arch/x86/crypto/camellia_glue.c | 20 ++-
arch/x86/crypto/cast5_avx_glue.c | 21 ++-
arch/x86/crypto/cast6_avx_glue.c | 21 ++-
arch/x86/crypto/chacha_glue.c | 35 +++-
arch/x86/crypto/crc32-pclmul_asm.S | 6 +-
arch/x86/crypto/crc32-pclmul_glue.c | 37 ++--
arch/x86/crypto/crc32c-intel_glue.c | 51 ++++--
arch/x86/crypto/crct10dif-pclmul_glue.c | 54 ++++--
arch/x86/crypto/curve25519-x86_64.c | 27 ++-
arch/x86/crypto/des3_ede_glue.c | 16 +-
arch/x86/crypto/ghash-clmulni-intel_glue.c | 40 +++--
arch/x86/crypto/nhpoly1305-avx2-glue.c | 27 ++-
arch/x86/crypto/nhpoly1305-sse2-glue.c | 23 ++-
arch/x86/crypto/poly1305_glue.c | 64 +++++--
arch/x86/crypto/polyval-clmulni_glue.c | 14 +-
arch/x86/crypto/serpent_avx2_glue.c | 25 ++-
arch/x86/crypto/serpent_avx_glue.c | 21 ++-
arch/x86/crypto/serpent_sse2_glue.c | 19 +-
arch/x86/crypto/sha1_ssse3_glue.c | 188 +++++++++++--------
arch/x86/crypto/sha256_ssse3_glue.c | 198 ++++++++++++---------
arch/x86/crypto/sha512_ssse3_glue.c | 154 +++++++++-------
arch/x86/crypto/sm3_avx_glue.c | 52 +++++-
arch/x86/crypto/sm4_aesni_avx2_glue.c | 25 ++-
arch/x86/crypto/sm4_aesni_avx_glue.c | 23 ++-
arch/x86/crypto/twofish_avx_glue.c | 25 ++-
arch/x86/crypto/twofish_glue.c | 19 +-
arch/x86/crypto/twofish_glue_3way.c | 26 ++-
crypto/tcrypt.c | 56 +++---
35 files changed, 1060 insertions(+), 400 deletions(-)

--
2.37.3

Subject: [PATCH v2 03/19] crypto: tcrypt - reschedule during cycles speed tests

commit 2af632996b89 ("crypto: tcrypt - reschedule during speed tests")
added cond_resched() calls to "Avoid RCU stalls in the case of
non-preemptible kernel and lengthy speed tests by rescheduling when
advancing from one block size to another."

It only makes those calls if the sec module parameter is used
(i.e., run the speed test for a certain number of seconds), not
in the default "cycles" mode.

Expand those calls to also run in "cycles" mode to reduce the rate
of RCU stall warnings like:
rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks:

Suggested-by: Herbert Xu <[email protected]>
Tested-by: Taehee Yoo <[email protected]>
Signed-off-by: Robert Elliott <[email protected]>
---
crypto/tcrypt.c | 44 ++++++++++++++++++--------------------------
1 file changed, 18 insertions(+), 26 deletions(-)

diff --git a/crypto/tcrypt.c b/crypto/tcrypt.c
index 7a6a56751043..c025ba26b663 100644
--- a/crypto/tcrypt.c
+++ b/crypto/tcrypt.c
@@ -408,14 +408,13 @@ static void test_mb_aead_speed(const char *algo, int enc, int secs,

}

- if (secs) {
+ if (secs)
ret = test_mb_aead_jiffies(data, enc, bs,
secs, num_mb);
- cond_resched();
- } else {
+ else
ret = test_mb_aead_cycles(data, enc, bs,
num_mb);
- }
+ cond_resched();

if (ret) {
pr_err("%s() failed return code=%d\n", e, ret);
@@ -661,13 +660,11 @@ static void test_aead_speed(const char *algo, int enc, unsigned int secs,
bs + (enc ? 0 : authsize),
iv);

- if (secs) {
- ret = test_aead_jiffies(req, enc, bs,
- secs);
- cond_resched();
- } else {
+ if (secs)
+ ret = test_aead_jiffies(req, enc, bs, secs);
+ else
ret = test_aead_cycles(req, enc, bs);
- }
+ cond_resched();

if (ret) {
pr_err("%s() failed return code=%d\n", e, ret);
@@ -917,14 +914,13 @@ static void test_ahash_speed_common(const char *algo, unsigned int secs,

ahash_request_set_crypt(req, sg, output, speed[i].plen);

- if (secs) {
+ if (secs)
ret = test_ahash_jiffies(req, speed[i].blen,
speed[i].plen, output, secs);
- cond_resched();
- } else {
+ else
ret = test_ahash_cycles(req, speed[i].blen,
speed[i].plen, output);
- }
+ cond_resched();

if (ret) {
pr_err("hashing failed ret=%d\n", ret);
@@ -1184,15 +1180,14 @@ static void test_mb_skcipher_speed(const char *algo, int enc, int secs,
cur->sg, bs, iv);
}

- if (secs) {
+ if (secs)
ret = test_mb_acipher_jiffies(data, enc,
bs, secs,
num_mb);
- cond_resched();
- } else {
+ else
ret = test_mb_acipher_cycles(data, enc,
bs, num_mb);
- }
+ cond_resched();

if (ret) {
pr_err("%s() failed flags=%x\n", e,
@@ -1401,14 +1396,11 @@ static void test_skcipher_speed(const char *algo, int enc, unsigned int secs,

skcipher_request_set_crypt(req, sg, sg, bs, iv);

- if (secs) {
- ret = test_acipher_jiffies(req, enc,
- bs, secs);
- cond_resched();
- } else {
- ret = test_acipher_cycles(req, enc,
- bs);
- }
+ if (secs)
+ ret = test_acipher_jiffies(req, enc, bs, secs);
+ else
+ ret = test_acipher_cycles(req, enc, bs);
+ cond_resched();

if (ret) {
pr_err("%s() failed flags=%x\n", e,
--
2.37.3

Subject: [PATCH v2 01/19] crypto: tcrypt - test crc32

Add a self-test and speed tests for crc32, paralleling those
offered for crc32c and crct10dif.

Signed-off-by: Robert Elliott <[email protected]>
---
crypto/tcrypt.c | 8 ++++++++
1 file changed, 8 insertions(+)

diff --git a/crypto/tcrypt.c b/crypto/tcrypt.c
index a82679b576bb..4426386dfb42 100644
--- a/crypto/tcrypt.c
+++ b/crypto/tcrypt.c
@@ -1711,6 +1711,10 @@ static int do_test(const char *alg, u32 type, u32 mask, int m, u32 num_mb)
ret += tcrypt_test("gcm(aria)");
break;

+ case 59:
+ ret += tcrypt_test("crc32");
+ break;
+
case 100:
ret += tcrypt_test("hmac(md5)");
break;
@@ -2317,6 +2321,10 @@ static int do_test(const char *alg, u32 type, u32 mask, int m, u32 num_mb)
generic_hash_speed_template);
if (mode > 300 && mode < 400) break;
fallthrough;
+ case 329:
+ test_hash_speed("crc32", sec, generic_hash_speed_template);
+ if (mode > 300 && mode < 400) break;
+ fallthrough;
case 399:
break;

--
2.37.3

Subject: [PATCH v2 09/19] crypto: x86 - use common macro for FPU limit

Use a common macro name (FPU_BYTES) for the limit on the number of
bytes processed between kernel_fpu_begin() and kernel_fpu_end() calls,
rather than the SZ_4K macro (which is a signed value) or a magic
value of 4096U.

Use unsigned int rather than size_t for some of the arguments to
avoid the type casts otherwise needed by the min() macro.
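
For example (an illustrative fragment, not from any one driver),
with an unsigned int length the plain min() macro works directly:

        unsigned int chunk = min(len, FPU_BYTES);

whereas a size_t length would need min_t(size_t, len, FPU_BYTES) or
an explicit cast, since min() requires matching types.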

Signed-off-by: Robert Elliott <[email protected]>
---
arch/x86/crypto/blake2s-glue.c | 7 ++++---
arch/x86/crypto/chacha_glue.c | 4 +++-
arch/x86/crypto/nhpoly1305-avx2-glue.c | 4 +++-
arch/x86/crypto/nhpoly1305-sse2-glue.c | 4 +++-
arch/x86/crypto/poly1305_glue.c | 25 ++++++++++++++-----------
arch/x86/crypto/polyval-clmulni_glue.c | 5 +++--
6 files changed, 30 insertions(+), 19 deletions(-)

diff --git a/arch/x86/crypto/blake2s-glue.c b/arch/x86/crypto/blake2s-glue.c
index aaba21230528..3054ee7fa219 100644
--- a/arch/x86/crypto/blake2s-glue.c
+++ b/arch/x86/crypto/blake2s-glue.c
@@ -16,6 +16,8 @@
#include <asm/processor.h>
#include <asm/simd.h>

+#define FPU_BYTES 4096U /* avoid kernel_fpu_begin/end scheduler/rcu stalls */
+
asmlinkage void blake2s_compress_ssse3(struct blake2s_state *state,
const u8 *block, const size_t nblocks,
const u32 inc);
@@ -29,8 +31,7 @@ static __ro_after_init DEFINE_STATIC_KEY_FALSE(blake2s_use_avx512);
void blake2s_compress(struct blake2s_state *state, const u8 *block,
size_t nblocks, const u32 inc)
{
- /* SIMD disables preemption, so relax after processing each page. */
- BUILD_BUG_ON(SZ_4K / BLAKE2S_BLOCK_SIZE < 8);
+ BUILD_BUG_ON(FPU_BYTES / BLAKE2S_BLOCK_SIZE < 8);

if (!static_branch_likely(&blake2s_use_ssse3) || !may_use_simd()) {
blake2s_compress_generic(state, block, nblocks, inc);
@@ -39,7 +40,7 @@ void blake2s_compress(struct blake2s_state *state, const u8 *block,

do {
const size_t blocks = min_t(size_t, nblocks,
- SZ_4K / BLAKE2S_BLOCK_SIZE);
+ FPU_BYTES / BLAKE2S_BLOCK_SIZE);

kernel_fpu_begin();
if (IS_ENABLED(CONFIG_AS_AVX512) &&
diff --git a/arch/x86/crypto/chacha_glue.c b/arch/x86/crypto/chacha_glue.c
index 7b3a1cf0984b..0d7e172862db 100644
--- a/arch/x86/crypto/chacha_glue.c
+++ b/arch/x86/crypto/chacha_glue.c
@@ -15,6 +15,8 @@
#include <linux/sizes.h>
#include <asm/simd.h>

+#define FPU_BYTES 4096U /* avoid kernel_fpu_begin/end scheduler/rcu stalls */
+
asmlinkage void chacha_block_xor_ssse3(u32 *state, u8 *dst, const u8 *src,
unsigned int len, int nrounds);
asmlinkage void chacha_4block_xor_ssse3(u32 *state, u8 *dst, const u8 *src,
@@ -147,7 +149,7 @@ void chacha_crypt_arch(u32 *state, u8 *dst, const u8 *src, unsigned int bytes,
return chacha_crypt_generic(state, dst, src, bytes, nrounds);

do {
- unsigned int todo = min_t(unsigned int, bytes, SZ_4K);
+ unsigned int todo = min(bytes, FPU_BYTES);

kernel_fpu_begin();
chacha_dosimd(state, dst, src, todo, nrounds);
diff --git a/arch/x86/crypto/nhpoly1305-avx2-glue.c b/arch/x86/crypto/nhpoly1305-avx2-glue.c
index 8ea5ab0f1ca7..59615ae95e86 100644
--- a/arch/x86/crypto/nhpoly1305-avx2-glue.c
+++ b/arch/x86/crypto/nhpoly1305-avx2-glue.c
@@ -13,6 +13,8 @@
#include <linux/sizes.h>
#include <asm/simd.h>

+#define FPU_BYTES 4096U /* avoid kernel_fpu_begin/end scheduler/rcu stalls */
+
asmlinkage void nh_avx2(const u32 *key, const u8 *message, size_t message_len,
u8 hash[NH_HASH_BYTES]);

@@ -30,7 +32,7 @@ static int nhpoly1305_avx2_update(struct shash_desc *desc,
return crypto_nhpoly1305_update(desc, src, srclen);

do {
- unsigned int n = min_t(unsigned int, srclen, SZ_4K);
+ unsigned int n = min(srclen, FPU_BYTES);

kernel_fpu_begin();
crypto_nhpoly1305_update_helper(desc, src, n, _nh_avx2);
diff --git a/arch/x86/crypto/nhpoly1305-sse2-glue.c b/arch/x86/crypto/nhpoly1305-sse2-glue.c
index 2b353d42ed13..bf91c375821a 100644
--- a/arch/x86/crypto/nhpoly1305-sse2-glue.c
+++ b/arch/x86/crypto/nhpoly1305-sse2-glue.c
@@ -13,6 +13,8 @@
#include <linux/sizes.h>
#include <asm/simd.h>

+#define FPU_BYTES 4096U /* avoid kernel_fpu_begin/end scheduler/rcu stalls */
+
asmlinkage void nh_sse2(const u32 *key, const u8 *message, size_t message_len,
u8 hash[NH_HASH_BYTES]);

@@ -30,7 +32,7 @@ static int nhpoly1305_sse2_update(struct shash_desc *desc,
return crypto_nhpoly1305_update(desc, src, srclen);

do {
- unsigned int n = min_t(unsigned int, srclen, SZ_4K);
+ unsigned int n = min(srclen, FPU_BYTES);

kernel_fpu_begin();
crypto_nhpoly1305_update_helper(desc, src, n, _nh_sse2);
diff --git a/arch/x86/crypto/poly1305_glue.c b/arch/x86/crypto/poly1305_glue.c
index 1dfb8af48a3c..3764301bdf1b 100644
--- a/arch/x86/crypto/poly1305_glue.c
+++ b/arch/x86/crypto/poly1305_glue.c
@@ -15,20 +15,24 @@
#include <asm/intel-family.h>
#include <asm/simd.h>

+#define FPU_BYTES 4096U /* avoid kernel_fpu_begin/end scheduler/rcu stalls */
+
asmlinkage void poly1305_init_x86_64(void *ctx,
const u8 key[POLY1305_BLOCK_SIZE]);
asmlinkage void poly1305_blocks_x86_64(void *ctx, const u8 *inp,
- const size_t len, const u32 padbit);
+ const unsigned int len,
+ const u32 padbit);
asmlinkage void poly1305_emit_x86_64(void *ctx, u8 mac[POLY1305_DIGEST_SIZE],
const u32 nonce[4]);
asmlinkage void poly1305_emit_avx(void *ctx, u8 mac[POLY1305_DIGEST_SIZE],
const u32 nonce[4]);
-asmlinkage void poly1305_blocks_avx(void *ctx, const u8 *inp, const size_t len,
- const u32 padbit);
-asmlinkage void poly1305_blocks_avx2(void *ctx, const u8 *inp, const size_t len,
- const u32 padbit);
+asmlinkage void poly1305_blocks_avx(void *ctx, const u8 *inp,
+ const unsigned int len, const u32 padbit);
+asmlinkage void poly1305_blocks_avx2(void *ctx, const u8 *inp,
+ const unsigned int len, const u32 padbit);
asmlinkage void poly1305_blocks_avx512(void *ctx, const u8 *inp,
- const size_t len, const u32 padbit);
+ const unsigned int len,
+ const u32 padbit);

static __ro_after_init DEFINE_STATIC_KEY_FALSE(poly1305_use_avx);
static __ro_after_init DEFINE_STATIC_KEY_FALSE(poly1305_use_avx2);
@@ -86,14 +90,13 @@ static void poly1305_simd_init(void *ctx, const u8 key[POLY1305_BLOCK_SIZE])
poly1305_init_x86_64(ctx, key);
}

-static void poly1305_simd_blocks(void *ctx, const u8 *inp, size_t len,
+static void poly1305_simd_blocks(void *ctx, const u8 *inp, unsigned int len,
const u32 padbit)
{
struct poly1305_arch_internal *state = ctx;

- /* SIMD disables preemption, so relax after processing each page. */
- BUILD_BUG_ON(SZ_4K < POLY1305_BLOCK_SIZE ||
- SZ_4K % POLY1305_BLOCK_SIZE);
+ BUILD_BUG_ON(FPU_BYTES < POLY1305_BLOCK_SIZE ||
+ FPU_BYTES % POLY1305_BLOCK_SIZE);

if (!static_branch_likely(&poly1305_use_avx) ||
(len < (POLY1305_BLOCK_SIZE * 18) && !state->is_base2_26) ||
@@ -104,7 +107,7 @@ static void poly1305_simd_blocks(void *ctx, const u8 *inp, size_t len,
}

do {
- const size_t bytes = min_t(size_t, len, SZ_4K);
+ const unsigned int bytes = min(len, FPU_BYTES);

kernel_fpu_begin();
if (IS_ENABLED(CONFIG_AS_AVX512) && static_branch_likely(&poly1305_use_avx512))
diff --git a/arch/x86/crypto/polyval-clmulni_glue.c b/arch/x86/crypto/polyval-clmulni_glue.c
index b7664d018851..2502964afef6 100644
--- a/arch/x86/crypto/polyval-clmulni_glue.c
+++ b/arch/x86/crypto/polyval-clmulni_glue.c
@@ -29,6 +29,8 @@

#define NUM_KEY_POWERS 8

+#define FPU_BYTES 4096U /* avoid kernel_fpu_begin/end scheduler/rcu stalls */
+
struct polyval_tfm_ctx {
/*
* These powers must be in the order h^8, ..., h^1.
@@ -123,8 +125,7 @@ static int polyval_x86_update(struct shash_desc *desc,
}

while (srclen >= POLYVAL_BLOCK_SIZE) {
- /* Allow rescheduling every 4K bytes. */
- nblocks = min(srclen, 4096U) / POLYVAL_BLOCK_SIZE;
+ nblocks = min(srclen, FPU_BYTES) / POLYVAL_BLOCK_SIZE;
internal_polyval_update(tctx, src, nblocks, dctx->buffer);
srclen -= nblocks * POLYVAL_BLOCK_SIZE;
src += nblocks * POLYVAL_BLOCK_SIZE;
--
2.37.3

Subject: [PATCH v2 08/19] crypto: x86/ghash - limit FPU preemption

As done by the ECB and CBC helpers in arch/x86/crypto/ecb_cbc_helpers.h,
limit the number of bytes processed between kernel_fpu_begin() and
kernel_fpu_end() calls.

Those functions call preempt_disable() and preempt_enable(), so
the CPU core is unavailable for scheduling while the SIMD code is
running, leading to stall warnings like:
rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: ...

Fixes: 0e1227d356e9 ("crypto: ghash - Add PCLMULQDQ accelerated implementation")
Suggested-by: Herbert Xu <[email protected]>
Signed-off-by: Robert Elliott <[email protected]>
---
arch/x86/crypto/ghash-clmulni-intel_glue.c | 26 ++++++++++++++++------
1 file changed, 19 insertions(+), 7 deletions(-)

diff --git a/arch/x86/crypto/ghash-clmulni-intel_glue.c b/arch/x86/crypto/ghash-clmulni-intel_glue.c
index 53aa286ec27f..a39fc405c7cf 100644
--- a/arch/x86/crypto/ghash-clmulni-intel_glue.c
+++ b/arch/x86/crypto/ghash-clmulni-intel_glue.c
@@ -23,6 +23,8 @@
#define GHASH_BLOCK_SIZE 16
#define GHASH_DIGEST_SIZE 16

+#define FPU_BYTES 4096U /* avoid kernel_fpu_begin/end scheduler/rcu stalls */
+
void clmul_ghash_mul(char *dst, const u128 *shash);

void clmul_ghash_update(char *dst, const char *src, unsigned int srclen,
@@ -82,7 +84,7 @@ static int ghash_update(struct shash_desc *desc,

if (dctx->bytes) {
int n = min(srclen, dctx->bytes);
- u8 *pos = dst + (GHASH_BLOCK_SIZE - dctx->bytes);
+ u8 *pos = dst + GHASH_BLOCK_SIZE - dctx->bytes;

dctx->bytes -= n;
srclen -= n;
@@ -97,13 +99,23 @@ static int ghash_update(struct shash_desc *desc,
}
}

- kernel_fpu_begin();
- clmul_ghash_update(dst, src, srclen, &ctx->shash);
- kernel_fpu_end();
+ while (srclen >= GHASH_BLOCK_SIZE) {
+ unsigned int fpulen = min(srclen, FPU_BYTES);
+
+ kernel_fpu_begin();
+ while (fpulen >= GHASH_BLOCK_SIZE) {
+ int n = min_t(unsigned int, fpulen, GHASH_BLOCK_SIZE);
+
+ clmul_ghash_update(dst, src, n, &ctx->shash);
+
+ srclen -= n;
+ fpulen -= n;
+ src += n;
+ }
+ kernel_fpu_end();
+ }

- if (srclen & 0xf) {
- src += srclen - (srclen & 0xf);
- srclen &= 0xf;
+ if (srclen) {
dctx->bytes = GHASH_BLOCK_SIZE - srclen;
while (srclen--)
*dst++ ^= *src++;
--
2.37.3

Subject: [PATCH v2 13/19] crypto: x86/ghash - load based on CPU features

Like commit aa031b8f702e ("crypto: x86/sha512 - load based on CPU
features"), this x86-optimized crypto module already has a
module alias based on CPU feature bits:
ghash

Rename its unique device table data structure to a generic name
so the code has the same pattern in all the modules.

This commit covers a module that created RCU stall issues
due to kernel_fpu_begin/kernel_fpu_end calls.

Signed-off-by: Robert Elliott <[email protected]>
---
arch/x86/crypto/ghash-clmulni-intel_glue.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/crypto/ghash-clmulni-intel_glue.c b/arch/x86/crypto/ghash-clmulni-intel_glue.c
index a39fc405c7cf..69945e41bc41 100644
--- a/arch/x86/crypto/ghash-clmulni-intel_glue.c
+++ b/arch/x86/crypto/ghash-clmulni-intel_glue.c
@@ -327,17 +327,17 @@ static struct ahash_alg ghash_async_alg = {
},
};

-static const struct x86_cpu_id pcmul_cpu_id[] = {
+static const struct x86_cpu_id module_cpu_ids[] = {
X86_MATCH_FEATURE(X86_FEATURE_PCLMULQDQ, NULL), /* Pickle-Mickle-Duck */
{}
};
-MODULE_DEVICE_TABLE(x86cpu, pcmul_cpu_id);
+MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);

static int __init ghash_pclmulqdqni_mod_init(void)
{
int err;

- if (!x86_match_cpu(pcmul_cpu_id))
+ if (!x86_match_cpu(module_cpu_ids))
return -ENODEV;

err = crypto_register_shash(&ghash_alg);
--
2.37.3

Subject: [PATCH v2 10/19] crypto: x86/sha1, sha256 - load based on CPU features

Like commit aa031b8f702e ("crypto: x86/sha512 - load based on CPU
features"), add module aliases for the x86-optimized crypto modules:
sha1, sha256
based on CPU feature bits, so udev gets a chance to load them later
in the boot process once the filesystems are all available.

This commit covers modules that created RCU stall issues
due to kernel_fpu_begin/kernel_fpu_end calls.

Signed-off-by: Robert Elliott <[email protected]>
---
arch/x86/crypto/sha1_ssse3_glue.c | 13 +++++++++++++
arch/x86/crypto/sha256_ssse3_glue.c | 13 +++++++++++++
2 files changed, 26 insertions(+)

diff --git a/arch/x86/crypto/sha1_ssse3_glue.c b/arch/x86/crypto/sha1_ssse3_glue.c
index a9f5779b41ca..edffc33bd12e 100644
--- a/arch/x86/crypto/sha1_ssse3_glue.c
+++ b/arch/x86/crypto/sha1_ssse3_glue.c
@@ -24,6 +24,7 @@
#include <linux/types.h>
#include <crypto/sha1.h>
#include <crypto/sha1_base.h>
+#include <asm/cpu_device_id.h>
#include <asm/simd.h>

#define FPU_BYTES 4096U /* avoid kernel_fpu_begin/end scheduler/rcu stalls */
@@ -310,6 +311,15 @@ static int register_sha1_ni(void)
return 0;
}

+static const struct x86_cpu_id module_cpu_ids[] = {
+ X86_MATCH_FEATURE(X86_FEATURE_SHA_NI, NULL),
+ X86_MATCH_FEATURE(X86_FEATURE_AVX2, NULL),
+ X86_MATCH_FEATURE(X86_FEATURE_AVX, NULL),
+ X86_MATCH_FEATURE(X86_FEATURE_SSSE3, NULL),
+ {}
+};
+MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);
+
static void unregister_sha1_ni(void)
{
if (boot_cpu_has(X86_FEATURE_SHA_NI))
@@ -326,6 +336,9 @@ static int __init sha1_ssse3_mod_init(void)
if (register_sha1_ssse3())
goto fail;

+ if (!x86_match_cpu(module_cpu_ids))
+ return -ENODEV;
+
if (register_sha1_avx()) {
unregister_sha1_ssse3();
goto fail;
diff --git a/arch/x86/crypto/sha256_ssse3_glue.c b/arch/x86/crypto/sha256_ssse3_glue.c
index 322c8aa907af..42e8cb1a6708 100644
--- a/arch/x86/crypto/sha256_ssse3_glue.c
+++ b/arch/x86/crypto/sha256_ssse3_glue.c
@@ -38,6 +38,7 @@
#include <crypto/sha2.h>
#include <crypto/sha256_base.h>
#include <linux/string.h>
+#include <asm/cpu_device_id.h>
#include <asm/simd.h>

#define FPU_BYTES 4096U /* avoid kernel_fpu_begin/end scheduler/rcu stalls */
@@ -366,6 +367,15 @@ static struct shash_alg sha256_ni_algs[] = { {
}
} };

+static const struct x86_cpu_id module_cpu_ids[] = {
+ X86_MATCH_FEATURE(X86_FEATURE_SHA_NI, NULL),
+ X86_MATCH_FEATURE(X86_FEATURE_AVX2, NULL),
+ X86_MATCH_FEATURE(X86_FEATURE_AVX, NULL),
+ X86_MATCH_FEATURE(X86_FEATURE_SSSE3, NULL),
+ {}
+};
+MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);
+
static int register_sha256_ni(void)
{
if (boot_cpu_has(X86_FEATURE_SHA_NI))
@@ -388,6 +398,9 @@ static inline void unregister_sha256_ni(void) { }

static int __init sha256_ssse3_mod_init(void)
{
+ if (!x86_match_cpu(module_cpu_ids))
+ return -ENODEV;
+
if (register_sha256_ssse3())
goto fail;

--
2.37.3

Subject: [PATCH v2 12/19] crypto: x86/sm3 - load based on CPU features

Like commit aa031b8f702e ("crypto: x86/sha512 - load based on CPU
features"), add a module alias for the x86-optimized crypto module:
sm3
based on CPU feature bits, so udev gets a chance to load it later
in the boot process once the filesystems are all available.

This commit covers a module that created RCU stall issues
due to kernel_fpu_begin/kernel_fpu_end calls.

Signed-off-by: Robert Elliott <[email protected]>
---
arch/x86/crypto/sm3_avx_glue.c | 10 ++++++++++
1 file changed, 10 insertions(+)

diff --git a/arch/x86/crypto/sm3_avx_glue.c b/arch/x86/crypto/sm3_avx_glue.c
index ffb6d2f409ef..475b9637a06d 100644
--- a/arch/x86/crypto/sm3_avx_glue.c
+++ b/arch/x86/crypto/sm3_avx_glue.c
@@ -15,6 +15,7 @@
#include <linux/types.h>
#include <crypto/sm3.h>
#include <crypto/sm3_base.h>
+#include <asm/cpu_device_id.h>
#include <asm/simd.h>

#define FPU_BYTES 4096U /* avoid kernel_fpu_begin/end scheduler/rcu stalls */
@@ -115,10 +116,19 @@ static struct shash_alg sm3_avx_alg = {
}
};

+static const struct x86_cpu_id module_cpu_ids[] = {
+ X86_MATCH_FEATURE(X86_FEATURE_AVX, NULL),
+ {}
+};
+MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);
+
static int __init sm3_avx_mod_init(void)
{
const char *feature_name;

+ if (!x86_match_cpu(module_cpu_ids))
+ return -ENODEV;
+
if (!boot_cpu_has(X86_FEATURE_AVX)) {
pr_info("AVX instruction are not detected.\n");
return -ENODEV;
--
2.37.3

Subject: [PATCH v2 14/19] crypto: x86 - load based on CPU features

x86-optimized crypto modules built as loadable modules rather than
built into the kernel end up as .ko files in the filesystem, e.g., in
/usr/lib/modules. If the filesystem driver itself is a module, these
might not be available when the crypto API is initialized, resulting
in the generic implementation being used (e.g., sha512_transform
rather than sha512_transform_avx2).

In one test case, CPU utilization in the sha512 function dropped
from 15.34% to 7.18% after forcing the optimized module to load.

Set module aliases for the x86-optimized crypto modules based on CPU
feature bits, so udev gets a chance to load them later in the boot
process once the filesystems are all available.
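
The per-module pattern is roughly the following (module_cpu_ids and
the X86_MATCH_FEATURE entries are as used in the diffs below; the
init function name is a placeholder, and the feature list varies per
module):

static const struct x86_cpu_id module_cpu_ids[] = {
        X86_MATCH_FEATURE(X86_FEATURE_AVX, NULL),
        {}
};
MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);

static int __init example_mod_init(void)
{
        if (!x86_match_cpu(module_cpu_ids))
                return -ENODEV;
        ...
}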

For example, with sha256, sha512, aesni_intel, and blake2s configured
as built-in and the rest configured as modules:

[ 13.749145] sha256_ssse3: CPU-optimized crypto module loaded (SSSE3=no, AVX=no, AVX2=yes, SHA-NI=no)
[ 13.758502] sha512_ssse3: CPU-optimized crypto module loaded (SSSE3=no, AVX=no, AVX2=yes)
[ 13.766939] libblake2s_x86_64: CPU-optimized crypto module loaded (SSSE3=yes, AVX512=yes)
[ 16.794502] aesni_intel: CPU-optimized crypto module loaded (GCM SSE=no, AVX=yes, AVX2=yes)(CTR AVX=yes)
...
[ 18.160648] Run /init as init process
...
[ 20.073484] twofish_x86_64: CPU-optimized crypto module loaded
[ 23.974029] serpent_sse2_x86_64: CPU-optimized crypto module loaded
[ 24.080749] serpent_avx_x86_64: CPU-optimized crypto module loaded
[ 24.187148] serpent_avx2: CPU-optimized crypto module loaded
[ 24.358980] des3_ede_x86_64: CPU-optimized crypto module loaded
[ 24.459257] camellia_x86_64: CPU-optimized crypto module loaded
[ 24.548487] camellia_aesni_avx_x86_64: CPU-optimized crypto module loaded
[ 24.630777] camellia_aesni_avx2: CPU-optimized crypto module loaded
[ 24.957134] blowfish_x86_64: CPU-optimized crypto module loaded
[ 25.063537] aegis128_aesni: CPU-optimized crypto module loaded
[ 25.174560] chacha_x86_64: CPU-optimized crypto module loaded (AVX2=yes, AVX512=yes)
[ 25.270084] sha1_ssse3: CPU-optimized crypto module loaded (SSSE3=no, AVX=no, AVX2=yes, SHA-NI=no)
[ 25.531724] ghash_clmulni_intel: CPU-optimized crypto module loaded
[ 25.596316] crc32c_intel: CPU-optimized crypto module loaded (PCLMULQDQ=yes)
[ 25.661693] crc32_pclmul: CPU-optimized crypto module loaded
[ 25.696388] crct10dif_pclmul: CPU-optimized crypto module loaded
[ 25.742040] poly1305_x86_64: CPU-optimized crypto module loaded (AVX=yes, AVX2=yes, AVX512=no)
[ 25.841364] nhpoly1305_avx2: CPU-optimized crypto module loaded
[ 25.856401] curve25519_x86_64: CPU-optimized crypto module loaded (ADX=yes)
[ 25.866615] sm3_avx_x86_64: CPU-optimized crypto module loaded

This commit covers modules that did not create RCU stall issues
due to kernel_fpu_begin/kernel_fpu_end calls.

Signed-off-by: Robert Elliott <[email protected]>
---
arch/x86/crypto/aegis128-aesni-glue.c | 9 +++++++++
arch/x86/crypto/aesni-intel_glue.c | 7 +++----
arch/x86/crypto/blake2s-glue.c | 11 ++++++++++-
arch/x86/crypto/blowfish_glue.c | 10 ++++++++++
arch/x86/crypto/camellia_aesni_avx2_glue.c | 12 ++++++++++++
arch/x86/crypto/camellia_aesni_avx_glue.c | 11 +++++++++++
arch/x86/crypto/camellia_glue.c | 9 +++++++++
arch/x86/crypto/cast5_avx_glue.c | 10 ++++++++++
arch/x86/crypto/cast6_avx_glue.c | 10 ++++++++++
arch/x86/crypto/chacha_glue.c | 12 ++++++++++--
arch/x86/crypto/curve25519-x86_64.c | 12 +++++++++++-
arch/x86/crypto/des3_ede_glue.c | 10 ++++++++++
arch/x86/crypto/nhpoly1305-avx2-glue.c | 10 ++++++++++
arch/x86/crypto/nhpoly1305-sse2-glue.c | 10 ++++++++++
arch/x86/crypto/poly1305_glue.c | 12 ++++++++++++
arch/x86/crypto/serpent_avx2_glue.c | 10 ++++++++++
arch/x86/crypto/serpent_avx_glue.c | 10 ++++++++++
arch/x86/crypto/serpent_sse2_glue.c | 10 ++++++++++
arch/x86/crypto/sm4_aesni_avx2_glue.c | 12 ++++++++++++
arch/x86/crypto/sm4_aesni_avx_glue.c | 11 +++++++++++
arch/x86/crypto/twofish_avx_glue.c | 10 ++++++++++
arch/x86/crypto/twofish_glue.c | 10 ++++++++++
arch/x86/crypto/twofish_glue_3way.c | 10 ++++++++++
23 files changed, 230 insertions(+), 8 deletions(-)

diff --git a/arch/x86/crypto/aegis128-aesni-glue.c b/arch/x86/crypto/aegis128-aesni-glue.c
index 4623189000d8..9e4ba031704d 100644
--- a/arch/x86/crypto/aegis128-aesni-glue.c
+++ b/arch/x86/crypto/aegis128-aesni-glue.c
@@ -263,10 +263,19 @@ static struct aead_alg crypto_aegis128_aesni_alg = {
}
};

+static const struct x86_cpu_id module_cpu_ids[] = {
+ X86_MATCH_FEATURE(X86_FEATURE_AES, NULL),
+ {}
+};
+MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);
+
static struct simd_aead_alg *simd_alg;

static int __init crypto_aegis128_aesni_module_init(void)
{
+ if (!x86_match_cpu(module_cpu_ids))
+ return -ENODEV;
+
if (!boot_cpu_has(X86_FEATURE_XMM2) ||
!boot_cpu_has(X86_FEATURE_AES) ||
!cpu_has_xfeatures(XFEATURE_MASK_SSE, NULL))
diff --git a/arch/x86/crypto/aesni-intel_glue.c b/arch/x86/crypto/aesni-intel_glue.c
index a5b0cb3efeba..4a530a558436 100644
--- a/arch/x86/crypto/aesni-intel_glue.c
+++ b/arch/x86/crypto/aesni-intel_glue.c
@@ -36,7 +36,6 @@
#include <linux/spinlock.h>
#include <linux/static_call.h>

-
#define AESNI_ALIGN 16
#define AESNI_ALIGN_ATTR __attribute__ ((__aligned__(AESNI_ALIGN)))
#define AES_BLOCK_MASK (~(AES_BLOCK_SIZE - 1))
@@ -1228,17 +1227,17 @@ static struct aead_alg aesni_aeads[0];

static struct simd_aead_alg *aesni_simd_aeads[ARRAY_SIZE(aesni_aeads)];

-static const struct x86_cpu_id aesni_cpu_id[] = {
+static const struct x86_cpu_id module_cpu_ids[] = {
X86_MATCH_FEATURE(X86_FEATURE_AES, NULL),
{}
};
-MODULE_DEVICE_TABLE(x86cpu, aesni_cpu_id);
+MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);

static int __init aesni_init(void)
{
int err;

- if (!x86_match_cpu(aesni_cpu_id))
+ if (!x86_match_cpu(module_cpu_ids))
return -ENODEV;
#ifdef CONFIG_X86_64
if (boot_cpu_has(X86_FEATURE_AVX2)) {
diff --git a/arch/x86/crypto/blake2s-glue.c b/arch/x86/crypto/blake2s-glue.c
index 3054ee7fa219..5153bb423dbe 100644
--- a/arch/x86/crypto/blake2s-glue.c
+++ b/arch/x86/crypto/blake2s-glue.c
@@ -10,7 +10,7 @@
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/sizes.h>
-
+#include <asm/cpu_device_id.h>
#include <asm/cpufeature.h>
#include <asm/fpu/api.h>
#include <asm/processor.h>
@@ -56,8 +56,17 @@ void blake2s_compress(struct blake2s_state *state, const u8 *block,
}
EXPORT_SYMBOL(blake2s_compress);

+static const struct x86_cpu_id module_cpu_ids[] = {
+ X86_MATCH_FEATURE(X86_FEATURE_ANY, NULL),
+ {}
+};
+MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);
+
static int __init blake2s_mod_init(void)
{
+ if (!x86_match_cpu(module_cpu_ids))
+ return -ENODEV;
+
if (boot_cpu_has(X86_FEATURE_SSSE3))
static_branch_enable(&blake2s_use_ssse3);

diff --git a/arch/x86/crypto/blowfish_glue.c b/arch/x86/crypto/blowfish_glue.c
index 019c64c1340a..4c0ead71b198 100644
--- a/arch/x86/crypto/blowfish_glue.c
+++ b/arch/x86/crypto/blowfish_glue.c
@@ -15,6 +15,7 @@
#include <linux/init.h>
#include <linux/module.h>
#include <linux/types.h>
+#include <asm/cpu_device_id.h>

/* regular block cipher functions */
asmlinkage void __blowfish_enc_blk(struct bf_ctx *ctx, u8 *dst, const u8 *src,
@@ -303,10 +304,19 @@ static int force;
module_param(force, int, 0);
MODULE_PARM_DESC(force, "Force module load, ignore CPU blacklist");

+static const struct x86_cpu_id module_cpu_ids[] = {
+ X86_MATCH_FEATURE(X86_FEATURE_ANY, NULL),
+ {}
+};
+MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);
+
static int __init blowfish_init(void)
{
int err;

+ if (!x86_match_cpu(module_cpu_ids))
+ return -ENODEV;
+
if (!force && is_blacklisted_cpu()) {
printk(KERN_INFO
"blowfish-x86_64: performance on this CPU "
diff --git a/arch/x86/crypto/camellia_aesni_avx2_glue.c b/arch/x86/crypto/camellia_aesni_avx2_glue.c
index e7e4d64e9577..8e3ac5be7cf6 100644
--- a/arch/x86/crypto/camellia_aesni_avx2_glue.c
+++ b/arch/x86/crypto/camellia_aesni_avx2_glue.c
@@ -11,6 +11,7 @@
#include <linux/err.h>
#include <linux/module.h>
#include <linux/types.h>
+#include <asm/cpu_device_id.h>

#include "camellia.h"
#include "ecb_cbc_helpers.h"
@@ -98,12 +99,23 @@ static struct skcipher_alg camellia_algs[] = {
},
};

+static const struct x86_cpu_id module_cpu_ids[] = {
+ X86_MATCH_FEATURE(X86_FEATURE_AVX2, NULL),
+ X86_MATCH_FEATURE(X86_FEATURE_AVX, NULL),
+ X86_MATCH_FEATURE(X86_FEATURE_AES, NULL),
+ {}
+};
+MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);
+
static struct simd_skcipher_alg *camellia_simd_algs[ARRAY_SIZE(camellia_algs)];

static int __init camellia_aesni_init(void)
{
const char *feature_name;

+ if (!x86_match_cpu(module_cpu_ids))
+ return -ENODEV;
+
if (!boot_cpu_has(X86_FEATURE_AVX) ||
!boot_cpu_has(X86_FEATURE_AVX2) ||
!boot_cpu_has(X86_FEATURE_AES) ||
diff --git a/arch/x86/crypto/camellia_aesni_avx_glue.c b/arch/x86/crypto/camellia_aesni_avx_glue.c
index c7ccf63e741e..54fcd86160ff 100644
--- a/arch/x86/crypto/camellia_aesni_avx_glue.c
+++ b/arch/x86/crypto/camellia_aesni_avx_glue.c
@@ -11,6 +11,7 @@
#include <linux/err.h>
#include <linux/module.h>
#include <linux/types.h>
+#include <asm/cpu_device_id.h>

#include "camellia.h"
#include "ecb_cbc_helpers.h"
@@ -98,12 +99,22 @@ static struct skcipher_alg camellia_algs[] = {
}
};

+static const struct x86_cpu_id module_cpu_ids[] = {
+ X86_MATCH_FEATURE(X86_FEATURE_AVX, NULL),
+ X86_MATCH_FEATURE(X86_FEATURE_AES, NULL),
+ {}
+};
+MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);
+
static struct simd_skcipher_alg *camellia_simd_algs[ARRAY_SIZE(camellia_algs)];

static int __init camellia_aesni_init(void)
{
const char *feature_name;

+ if (!x86_match_cpu(module_cpu_ids))
+ return -ENODEV;
+
if (!boot_cpu_has(X86_FEATURE_AVX) ||
!boot_cpu_has(X86_FEATURE_AES) ||
!boot_cpu_has(X86_FEATURE_OSXSAVE)) {
diff --git a/arch/x86/crypto/camellia_glue.c b/arch/x86/crypto/camellia_glue.c
index d45e9c0c42ac..e21d2d5b68f9 100644
--- a/arch/x86/crypto/camellia_glue.c
+++ b/arch/x86/crypto/camellia_glue.c
@@ -1377,10 +1377,19 @@ static int force;
module_param(force, int, 0);
MODULE_PARM_DESC(force, "Force module load, ignore CPU blacklist");

+static const struct x86_cpu_id module_cpu_ids[] = {
+ X86_MATCH_FEATURE(X86_FEATURE_ANY, NULL),
+ {}
+};
+MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);
+
static int __init camellia_init(void)
{
int err;

+ if (!x86_match_cpu(module_cpu_ids))
+ return -ENODEV;
+
if (!force && is_blacklisted_cpu()) {
printk(KERN_INFO
"camellia-x86_64: performance on this CPU "
diff --git a/arch/x86/crypto/cast5_avx_glue.c b/arch/x86/crypto/cast5_avx_glue.c
index 3976a87f92ad..bdc3c763334c 100644
--- a/arch/x86/crypto/cast5_avx_glue.c
+++ b/arch/x86/crypto/cast5_avx_glue.c
@@ -13,6 +13,7 @@
#include <linux/err.h>
#include <linux/module.h>
#include <linux/types.h>
+#include <asm/cpu_device_id.h>

#include "ecb_cbc_helpers.h"

@@ -93,12 +94,21 @@ static struct skcipher_alg cast5_algs[] = {
}
};

+static const struct x86_cpu_id module_cpu_ids[] = {
+ X86_MATCH_FEATURE(X86_FEATURE_AVX, NULL),
+ {}
+};
+MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);
+
static struct simd_skcipher_alg *cast5_simd_algs[ARRAY_SIZE(cast5_algs)];

static int __init cast5_init(void)
{
const char *feature_name;

+ if (!x86_match_cpu(module_cpu_ids))
+ return -ENODEV;
+
if (!cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM,
&feature_name)) {
pr_info("CPU feature '%s' is not supported.\n", feature_name);
diff --git a/arch/x86/crypto/cast6_avx_glue.c b/arch/x86/crypto/cast6_avx_glue.c
index 7e2aea372349..addca34b3511 100644
--- a/arch/x86/crypto/cast6_avx_glue.c
+++ b/arch/x86/crypto/cast6_avx_glue.c
@@ -15,6 +15,7 @@
#include <crypto/algapi.h>
#include <crypto/cast6.h>
#include <crypto/internal/simd.h>
+#include <asm/cpu_device_id.h>

#include "ecb_cbc_helpers.h"

@@ -93,12 +94,21 @@ static struct skcipher_alg cast6_algs[] = {
},
};

+static const struct x86_cpu_id module_cpu_ids[] = {
+ X86_MATCH_FEATURE(X86_FEATURE_AVX, NULL),
+ {}
+};
+MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);
+
static struct simd_skcipher_alg *cast6_simd_algs[ARRAY_SIZE(cast6_algs)];

static int __init cast6_init(void)
{
const char *feature_name;

+ if (!x86_match_cpu(module_cpu_ids))
+ return -ENODEV;
+
if (!cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM,
&feature_name)) {
pr_info("CPU feature '%s' is not supported.\n", feature_name);
diff --git a/arch/x86/crypto/chacha_glue.c b/arch/x86/crypto/chacha_glue.c
index 0d7e172862db..7275cae3380d 100644
--- a/arch/x86/crypto/chacha_glue.c
+++ b/arch/x86/crypto/chacha_glue.c
@@ -13,6 +13,7 @@
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/sizes.h>
+#include <asm/cpu_device_id.h>
#include <asm/simd.h>

#define FPU_BYTES 4096U /* avoid kernel_fpu_begin/end scheduler/rcu stalls */
@@ -278,10 +279,17 @@ static struct skcipher_alg algs[] = {
},
};

+static const struct x86_cpu_id module_cpu_ids[] = {
+ X86_MATCH_FEATURE(X86_FEATURE_SSSE3, NULL),
+ {}
+};
+MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);
+
static int __init chacha_simd_mod_init(void)
{
- if (!boot_cpu_has(X86_FEATURE_SSSE3))
- return 0;
+ if (!x86_match_cpu(module_cpu_ids))
+ return -ENODEV;
+

static_branch_enable(&chacha_use_simd);

diff --git a/arch/x86/crypto/curve25519-x86_64.c b/arch/x86/crypto/curve25519-x86_64.c
index d55fa9e9b9e6..7fe395dfa79d 100644
--- a/arch/x86/crypto/curve25519-x86_64.c
+++ b/arch/x86/crypto/curve25519-x86_64.c
@@ -12,7 +12,7 @@
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/scatterlist.h>
-
+#include <asm/cpu_device_id.h>
#include <asm/cpufeature.h>
#include <asm/processor.h>

@@ -1697,9 +1697,19 @@ static struct kpp_alg curve25519_alg = {
.max_size = curve25519_max_size,
};

+static const struct x86_cpu_id module_cpu_ids[] = {
+ X86_MATCH_FEATURE(X86_FEATURE_ADX, NULL),
+ X86_MATCH_FEATURE(X86_FEATURE_AVX, NULL),
+ X86_MATCH_FEATURE(X86_FEATURE_SSSE3, NULL),
+ {}
+};
+MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);

static int __init curve25519_mod_init(void)
{
+ if (!x86_match_cpu(module_cpu_ids))
+ return -ENODEV;
+
if (boot_cpu_has(X86_FEATURE_BMI2) && boot_cpu_has(X86_FEATURE_ADX))
static_branch_enable(&curve25519_use_bmi2_adx);
else
diff --git a/arch/x86/crypto/des3_ede_glue.c b/arch/x86/crypto/des3_ede_glue.c
index abb8b1fe123b..168cac5c6ca6 100644
--- a/arch/x86/crypto/des3_ede_glue.c
+++ b/arch/x86/crypto/des3_ede_glue.c
@@ -15,6 +15,7 @@
#include <linux/init.h>
#include <linux/module.h>
#include <linux/types.h>
+#include <asm/cpu_device_id.h>

struct des3_ede_x86_ctx {
struct des3_ede_ctx enc;
@@ -354,10 +355,19 @@ static int force;
module_param(force, int, 0);
MODULE_PARM_DESC(force, "Force module load, ignore CPU blacklist");

+static const struct x86_cpu_id module_cpu_ids[] = {
+ X86_MATCH_FEATURE(X86_FEATURE_ANY, NULL),
+ {}
+};
+MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);
+
static int __init des3_ede_x86_init(void)
{
int err;

+ if (!x86_match_cpu(module_cpu_ids))
+ return -ENODEV;
+
if (!force && is_blacklisted_cpu()) {
pr_info("des3_ede-x86_64: performance on this CPU would be suboptimal: disabling des3_ede-x86_64.\n");
return -ENODEV;
diff --git a/arch/x86/crypto/nhpoly1305-avx2-glue.c b/arch/x86/crypto/nhpoly1305-avx2-glue.c
index 59615ae95e86..a8046334ddca 100644
--- a/arch/x86/crypto/nhpoly1305-avx2-glue.c
+++ b/arch/x86/crypto/nhpoly1305-avx2-glue.c
@@ -11,6 +11,7 @@
#include <crypto/nhpoly1305.h>
#include <linux/module.h>
#include <linux/sizes.h>
+#include <asm/cpu_device_id.h>
#include <asm/simd.h>

#define FPU_BYTES 4096U /* avoid kernel_fpu_begin/end scheduler/rcu stalls */
@@ -57,8 +58,17 @@ static struct shash_alg nhpoly1305_alg = {
.descsize = sizeof(struct nhpoly1305_state),
};

+static const struct x86_cpu_id module_cpu_ids[] = {
+ X86_MATCH_FEATURE(X86_FEATURE_AVX2, NULL),
+ {}
+};
+MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);
+
static int __init nhpoly1305_mod_init(void)
{
+ if (!x86_match_cpu(module_cpu_ids))
+ return -ENODEV;
+
if (!boot_cpu_has(X86_FEATURE_AVX2) ||
!boot_cpu_has(X86_FEATURE_OSXSAVE))
return -ENODEV;
diff --git a/arch/x86/crypto/nhpoly1305-sse2-glue.c b/arch/x86/crypto/nhpoly1305-sse2-glue.c
index bf91c375821a..cdbe5df00927 100644
--- a/arch/x86/crypto/nhpoly1305-sse2-glue.c
+++ b/arch/x86/crypto/nhpoly1305-sse2-glue.c
@@ -11,6 +11,7 @@
#include <crypto/nhpoly1305.h>
#include <linux/module.h>
#include <linux/sizes.h>
+#include <asm/cpu_device_id.h>
#include <asm/simd.h>

#define FPU_BYTES 4096U /* avoid kernel_fpu_begin/end scheduler/rcu stalls */
@@ -57,8 +58,17 @@ static struct shash_alg nhpoly1305_alg = {
.descsize = sizeof(struct nhpoly1305_state),
};

+static const struct x86_cpu_id module_cpu_ids[] = {
+ X86_MATCH_FEATURE(X86_FEATURE_XMM2, NULL),
+ {}
+};
+MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);
+
static int __init nhpoly1305_mod_init(void)
{
+ if (!x86_match_cpu(module_cpu_ids))
+ return -ENODEV;
+
if (!boot_cpu_has(X86_FEATURE_XMM2))
return -ENODEV;

diff --git a/arch/x86/crypto/poly1305_glue.c b/arch/x86/crypto/poly1305_glue.c
index 3764301bdf1b..3e6ff505cd26 100644
--- a/arch/x86/crypto/poly1305_glue.c
+++ b/arch/x86/crypto/poly1305_glue.c
@@ -12,6 +12,7 @@
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/sizes.h>
+#include <asm/cpu_device_id.h>
#include <asm/intel-family.h>
#include <asm/simd.h>

@@ -260,8 +261,19 @@ static struct shash_alg alg = {
},
};

+static const struct x86_cpu_id module_cpu_ids[] = {
+ X86_MATCH_FEATURE(X86_FEATURE_AVX2, NULL),
+ X86_MATCH_FEATURE(X86_FEATURE_AVX, NULL),
+ X86_MATCH_FEATURE(X86_FEATURE_AVX512F, NULL),
+ {}
+};
+MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);
+
static int __init poly1305_simd_mod_init(void)
{
+ if (!x86_match_cpu(module_cpu_ids))
+ return -ENODEV;
+
if (boot_cpu_has(X86_FEATURE_AVX) &&
cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM, NULL))
static_branch_enable(&poly1305_use_avx);
diff --git a/arch/x86/crypto/serpent_avx2_glue.c b/arch/x86/crypto/serpent_avx2_glue.c
index 347e97f4b713..24741d33edaf 100644
--- a/arch/x86/crypto/serpent_avx2_glue.c
+++ b/arch/x86/crypto/serpent_avx2_glue.c
@@ -12,6 +12,7 @@
#include <crypto/algapi.h>
#include <crypto/internal/simd.h>
#include <crypto/serpent.h>
+#include <asm/cpu_device_id.h>

#include "serpent-avx.h"
#include "ecb_cbc_helpers.h"
@@ -94,12 +95,21 @@ static struct skcipher_alg serpent_algs[] = {
},
};

+static const struct x86_cpu_id module_cpu_ids[] = {
+ X86_MATCH_FEATURE(X86_FEATURE_AVX2, NULL),
+ {}
+};
+MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);
+
static struct simd_skcipher_alg *serpent_simd_algs[ARRAY_SIZE(serpent_algs)];

static int __init serpent_avx2_init(void)
{
const char *feature_name;

+ if (!x86_match_cpu(module_cpu_ids))
+ return -ENODEV;
+
if (!boot_cpu_has(X86_FEATURE_AVX2) || !boot_cpu_has(X86_FEATURE_OSXSAVE)) {
pr_info("AVX2 instructions are not detected.\n");
return -ENODEV;
diff --git a/arch/x86/crypto/serpent_avx_glue.c b/arch/x86/crypto/serpent_avx_glue.c
index 6c248e1ea4ef..0db18d99da50 100644
--- a/arch/x86/crypto/serpent_avx_glue.c
+++ b/arch/x86/crypto/serpent_avx_glue.c
@@ -15,6 +15,7 @@
#include <crypto/algapi.h>
#include <crypto/internal/simd.h>
#include <crypto/serpent.h>
+#include <asm/cpu_device_id.h>

#include "serpent-avx.h"
#include "ecb_cbc_helpers.h"
@@ -100,12 +101,21 @@ static struct skcipher_alg serpent_algs[] = {
},
};

+static const struct x86_cpu_id module_cpu_ids[] = {
+ X86_MATCH_FEATURE(X86_FEATURE_AVX, NULL),
+ {}
+};
+MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);
+
static struct simd_skcipher_alg *serpent_simd_algs[ARRAY_SIZE(serpent_algs)];

static int __init serpent_init(void)
{
const char *feature_name;

+ if (!x86_match_cpu(module_cpu_ids))
+ return -ENODEV;
+
if (!cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM,
&feature_name)) {
pr_info("CPU feature '%s' is not supported.\n", feature_name);
diff --git a/arch/x86/crypto/serpent_sse2_glue.c b/arch/x86/crypto/serpent_sse2_glue.c
index d78f37e9b2cf..5288441cc223 100644
--- a/arch/x86/crypto/serpent_sse2_glue.c
+++ b/arch/x86/crypto/serpent_sse2_glue.c
@@ -20,6 +20,7 @@
#include <crypto/b128ops.h>
#include <crypto/internal/simd.h>
#include <crypto/serpent.h>
+#include <asm/cpu_device_id.h>

#include "serpent-sse2.h"
#include "ecb_cbc_helpers.h"
@@ -103,10 +104,19 @@ static struct skcipher_alg serpent_algs[] = {
},
};

+static const struct x86_cpu_id module_cpu_ids[] = {
+ X86_MATCH_FEATURE(X86_FEATURE_XMM2, NULL),
+ {}
+};
+MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);
+
static struct simd_skcipher_alg *serpent_simd_algs[ARRAY_SIZE(serpent_algs)];

static int __init serpent_sse2_init(void)
{
+ if (!x86_match_cpu(module_cpu_ids))
+ return -ENODEV;
+
if (!boot_cpu_has(X86_FEATURE_XMM2)) {
printk(KERN_INFO "SSE2 instructions are not detected.\n");
return -ENODEV;
diff --git a/arch/x86/crypto/sm4_aesni_avx2_glue.c b/arch/x86/crypto/sm4_aesni_avx2_glue.c
index 84bc718f49a3..2e9fe76056b8 100644
--- a/arch/x86/crypto/sm4_aesni_avx2_glue.c
+++ b/arch/x86/crypto/sm4_aesni_avx2_glue.c
@@ -11,6 +11,7 @@
#include <linux/module.h>
#include <linux/crypto.h>
#include <linux/kernel.h>
+#include <asm/cpu_device_id.h>
#include <asm/simd.h>
#include <crypto/internal/simd.h>
#include <crypto/internal/skcipher.h>
@@ -126,6 +127,14 @@ static struct skcipher_alg sm4_aesni_avx2_skciphers[] = {
}
};

+static const struct x86_cpu_id module_cpu_ids[] = {
+ X86_MATCH_FEATURE(X86_FEATURE_AVX2, NULL),
+ X86_MATCH_FEATURE(X86_FEATURE_AVX, NULL),
+ X86_MATCH_FEATURE(X86_FEATURE_AES, NULL),
+ {}
+};
+MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);
+
static struct simd_skcipher_alg *
simd_sm4_aesni_avx2_skciphers[ARRAY_SIZE(sm4_aesni_avx2_skciphers)];

@@ -133,6 +142,9 @@ static int __init sm4_init(void)
{
const char *feature_name;

+ if (!x86_match_cpu(module_cpu_ids))
+ return -ENODEV;
+
if (!boot_cpu_has(X86_FEATURE_AVX) ||
!boot_cpu_has(X86_FEATURE_AVX2) ||
!boot_cpu_has(X86_FEATURE_AES) ||
diff --git a/arch/x86/crypto/sm4_aesni_avx_glue.c b/arch/x86/crypto/sm4_aesni_avx_glue.c
index 7800f77d68ad..f730822f203a 100644
--- a/arch/x86/crypto/sm4_aesni_avx_glue.c
+++ b/arch/x86/crypto/sm4_aesni_avx_glue.c
@@ -11,6 +11,7 @@
#include <linux/module.h>
#include <linux/crypto.h>
#include <linux/kernel.h>
+#include <asm/cpu_device_id.h>
#include <asm/simd.h>
#include <crypto/internal/simd.h>
#include <crypto/internal/skcipher.h>
@@ -445,6 +446,13 @@ static struct skcipher_alg sm4_aesni_avx_skciphers[] = {
}
};

+static const struct x86_cpu_id module_cpu_ids[] = {
+ X86_MATCH_FEATURE(X86_FEATURE_AVX, NULL),
+ X86_MATCH_FEATURE(X86_FEATURE_AES, NULL),
+ {}
+};
+MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);
+
static struct simd_skcipher_alg *
simd_sm4_aesni_avx_skciphers[ARRAY_SIZE(sm4_aesni_avx_skciphers)];

@@ -452,6 +460,9 @@ static int __init sm4_init(void)
{
const char *feature_name;

+ if (!x86_match_cpu(module_cpu_ids))
+ return -ENODEV;
+
if (!boot_cpu_has(X86_FEATURE_AVX) ||
!boot_cpu_has(X86_FEATURE_AES) ||
!boot_cpu_has(X86_FEATURE_OSXSAVE)) {
diff --git a/arch/x86/crypto/twofish_avx_glue.c b/arch/x86/crypto/twofish_avx_glue.c
index 3eb3440b477a..4657e6efc35d 100644
--- a/arch/x86/crypto/twofish_avx_glue.c
+++ b/arch/x86/crypto/twofish_avx_glue.c
@@ -15,6 +15,7 @@
#include <crypto/algapi.h>
#include <crypto/internal/simd.h>
#include <crypto/twofish.h>
+#include <asm/cpu_device_id.h>

#include "twofish.h"
#include "ecb_cbc_helpers.h"
@@ -103,12 +104,21 @@ static struct skcipher_alg twofish_algs[] = {
},
};

+static const struct x86_cpu_id module_cpu_ids[] = {
+ X86_MATCH_FEATURE(X86_FEATURE_AVX, NULL),
+ {}
+};
+MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);
+
static struct simd_skcipher_alg *twofish_simd_algs[ARRAY_SIZE(twofish_algs)];

static int __init twofish_init(void)
{
const char *feature_name;

+ if (!x86_match_cpu(module_cpu_ids))
+ return -ENODEV;
+
if (!cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM, &feature_name)) {
pr_info("CPU feature '%s' is not supported.\n", feature_name);
return -ENODEV;
diff --git a/arch/x86/crypto/twofish_glue.c b/arch/x86/crypto/twofish_glue.c
index f9c4adc27404..ade98aef3402 100644
--- a/arch/x86/crypto/twofish_glue.c
+++ b/arch/x86/crypto/twofish_glue.c
@@ -43,6 +43,7 @@
#include <linux/init.h>
#include <linux/module.h>
#include <linux/types.h>
+#include <asm/cpu_device_id.h>

asmlinkage void twofish_enc_blk(struct twofish_ctx *ctx, u8 *dst,
const u8 *src);
@@ -81,8 +82,17 @@ static struct crypto_alg alg = {
}
};

+static const struct x86_cpu_id module_cpu_ids[] = {
+ X86_MATCH_FEATURE(X86_FEATURE_ANY, NULL),
+ {}
+};
+MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);
+
static int __init twofish_glue_init(void)
{
+ if (!x86_match_cpu(module_cpu_ids))
+ return -ENODEV;
+
return crypto_register_alg(&alg);
}

diff --git a/arch/x86/crypto/twofish_glue_3way.c b/arch/x86/crypto/twofish_glue_3way.c
index 90454cf18e0d..790e5a59a9a7 100644
--- a/arch/x86/crypto/twofish_glue_3way.c
+++ b/arch/x86/crypto/twofish_glue_3way.c
@@ -11,6 +11,7 @@
#include <linux/init.h>
#include <linux/module.h>
#include <linux/types.h>
+#include <asm/cpu_device_id.h>

#include "twofish.h"
#include "ecb_cbc_helpers.h"
@@ -140,8 +141,17 @@ static int force;
module_param(force, int, 0);
MODULE_PARM_DESC(force, "Force module load, ignore CPU blacklist");

+static const struct x86_cpu_id module_cpu_ids[] = {
+ X86_MATCH_FEATURE(X86_FEATURE_ANY, NULL),
+ {}
+};
+MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);
+
static int __init twofish_3way_init(void)
{
+ if (!x86_match_cpu(module_cpu_ids))
+ return -ENODEV;
+
if (!force && is_blacklisted_cpu()) {
printk(KERN_INFO
"twofish-x86_64-3way: performance on this CPU "
--
2.37.3

Subject: [PATCH v2 07/19] crypto: x86/ghash - restructure FPU context saving

Wrap each of the calls to clmul_ghash_update and clmul_ghash_mul
in its own pair of kernel_fpu_begin() and kernel_fpu_end() calls,
preparing to limit the amount of data processed by each _update call
to avoid RCU stalls.

This is more like how polyval-clmulni_glue is structured.
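
As a rough sketch, the bulk-update path is expected to end up looking
something like this once the limit is applied (illustrative only; the
actual 4 KiB limit and loop bounds belong to the later patch that
limits the data processed per _update call):

	/* sketch: cap the bytes handled per kernel_fpu_begin/end section */
	while (srclen >= GHASH_BLOCK_SIZE) {
		unsigned int n = min(srclen, 4096U);

		n = round_down(n, GHASH_BLOCK_SIZE);

		kernel_fpu_begin();
		clmul_ghash_update(dst, src, n, &ctx->shash);
		kernel_fpu_end();

		src += n;
		srclen -= n;
	}
	/* any partial trailing block is buffered in dctx->buffer as before */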

Fixes: 0e1227d356e9 ("crypto: ghash - Add PCLMULQDQ accelerated implementation")
Suggested-by: Herbert Xu <[email protected]>
Signed-off-by: Robert Elliott <[email protected]>
---
arch/x86/crypto/ghash-clmulni-intel_glue.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/x86/crypto/ghash-clmulni-intel_glue.c b/arch/x86/crypto/ghash-clmulni-intel_glue.c
index 1f1a95f3dd0c..53aa286ec27f 100644
--- a/arch/x86/crypto/ghash-clmulni-intel_glue.c
+++ b/arch/x86/crypto/ghash-clmulni-intel_glue.c
@@ -80,7 +80,6 @@ static int ghash_update(struct shash_desc *desc,
struct ghash_ctx *ctx = crypto_shash_ctx(desc->tfm);
u8 *dst = dctx->buffer;

- kernel_fpu_begin();
if (dctx->bytes) {
int n = min(srclen, dctx->bytes);
u8 *pos = dst + (GHASH_BLOCK_SIZE - dctx->bytes);
@@ -91,10 +90,14 @@ static int ghash_update(struct shash_desc *desc,
while (n--)
*pos++ ^= *src++;

- if (!dctx->bytes)
+ if (!dctx->bytes) {
+ kernel_fpu_begin();
clmul_ghash_mul(dst, &ctx->shash);
+ kernel_fpu_end();
+ }
}

+ kernel_fpu_begin();
clmul_ghash_update(dst, src, srclen, &ctx->shash);
kernel_fpu_end();

--
2.37.3

Subject: [PATCH v2 16/19] crypto: x86 - print CPU optimized loaded messages

Print a positive message at the info level when the CPU-optimized module
is loaded, for all modules except the SHA modules.

Signed-off-by: Robert Elliott <[email protected]>
---
arch/x86/crypto/aegis128-aesni-glue.c | 8 +++++--
arch/x86/crypto/aesni-intel_glue.c | 22 +++++++++++++------
arch/x86/crypto/aria_aesni_avx_glue.c | 13 ++++++++---
arch/x86/crypto/blake2s-glue.c | 14 ++++++++++--
arch/x86/crypto/blowfish_glue.c | 2 ++
arch/x86/crypto/camellia_aesni_avx2_glue.c | 6 +++++-
arch/x86/crypto/camellia_aesni_avx_glue.c | 6 +++++-
arch/x86/crypto/camellia_glue.c | 3 +++
arch/x86/crypto/cast5_avx_glue.c | 6 +++++-
arch/x86/crypto/cast6_avx_glue.c | 6 +++++-
arch/x86/crypto/chacha_glue.c | 17 +++++++++++++--
arch/x86/crypto/crc32-pclmul_glue.c | 8 ++++++-
arch/x86/crypto/crc32c-intel_glue.c | 15 +++++++++++--
arch/x86/crypto/crct10dif-pclmul_glue.c | 7 +++++-
arch/x86/crypto/curve25519-x86_64.c | 13 +++++++++--
arch/x86/crypto/des3_ede_glue.c | 2 ++
arch/x86/crypto/ghash-clmulni-intel_glue.c | 1 +
arch/x86/crypto/nhpoly1305-avx2-glue.c | 7 +++++-
arch/x86/crypto/nhpoly1305-sse2-glue.c | 7 +++++-
arch/x86/crypto/poly1305_glue.c | 25 ++++++++++++++++++----
arch/x86/crypto/polyval-clmulni_glue.c | 7 +++++-
arch/x86/crypto/serpent_avx2_glue.c | 7 ++++--
arch/x86/crypto/serpent_avx_glue.c | 6 +++++-
arch/x86/crypto/serpent_sse2_glue.c | 7 +++++-
arch/x86/crypto/sm3_avx_glue.c | 6 +++++-
arch/x86/crypto/sm4_aesni_avx2_glue.c | 6 +++++-
arch/x86/crypto/sm4_aesni_avx_glue.c | 7 ++++--
arch/x86/crypto/twofish_avx_glue.c | 10 ++++++---
arch/x86/crypto/twofish_glue.c | 7 +++++-
arch/x86/crypto/twofish_glue_3way.c | 9 ++++++--
30 files changed, 213 insertions(+), 47 deletions(-)

diff --git a/arch/x86/crypto/aegis128-aesni-glue.c b/arch/x86/crypto/aegis128-aesni-glue.c
index 122bfd04ee47..e8eaf79ef220 100644
--- a/arch/x86/crypto/aegis128-aesni-glue.c
+++ b/arch/x86/crypto/aegis128-aesni-glue.c
@@ -275,7 +275,9 @@ static struct simd_aead_alg *simd_alg;

static int __init crypto_aegis128_aesni_module_init(void)
{
+ int ret;
+
if (!x86_match_cpu(module_cpu_ids))
return -ENODEV;

if (!boot_cpu_has(X86_FEATURE_XMM2) ||
@@ -283,8 +285,11 @@ static int __init crypto_aegis128_aesni_module_init(void)
!cpu_has_xfeatures(XFEATURE_MASK_SSE, NULL))
return -ENODEV;

- return simd_register_aeads_compat(&crypto_aegis128_aesni_alg, 1,
+ ret = simd_register_aeads_compat(&crypto_aegis128_aesni_alg, 1,
&simd_alg);
+ if (!ret)
+ pr_info("CPU-optimized crypto module loaded\n");
+ return ret;
}

static void __exit crypto_aegis128_aesni_module_exit(void)
diff --git a/arch/x86/crypto/aesni-intel_glue.c b/arch/x86/crypto/aesni-intel_glue.c
index df93cb44b4eb..56023ba70049 100644
--- a/arch/x86/crypto/aesni-intel_glue.c
+++ b/arch/x86/crypto/aesni-intel_glue.c
@@ -1238,25 +1238,28 @@ MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);
static int __init aesni_init(void)
{
int err;
+ int enabled_gcm_sse = 0;
+ int enabled_gcm_avx = 0;
+ int enabled_gcm_avx2 = 0;
+ int enabled_ctr_avx = 0;

if (!x86_match_cpu(module_cpu_ids))
return -ENODEV;
#ifdef CONFIG_X86_64
if (boot_cpu_has(X86_FEATURE_AVX2)) {
- pr_info("AVX2 version of gcm_enc/dec engaged.\n");
+ enabled_gcm_avx = 1;
+ enabled_gcm_avx2 = 1;
static_branch_enable(&gcm_use_avx);
static_branch_enable(&gcm_use_avx2);
- } else
- if (boot_cpu_has(X86_FEATURE_AVX)) {
- pr_info("AVX version of gcm_enc/dec engaged.\n");
+ } else if (boot_cpu_has(X86_FEATURE_AVX)) {
+ enabled_gcm_avx = 1;
static_branch_enable(&gcm_use_avx);
} else {
- pr_info("SSE version of gcm_enc/dec engaged.\n");
+ enabled_gcm_sse = 1;
}
if (boot_cpu_has(X86_FEATURE_AVX)) {
- /* optimize performance of ctr mode encryption transform */
+ enabled_ctr_avx = 1;
static_call_update(aesni_ctr_enc_tfm, aesni_ctr_enc_avx_tfm);
- pr_info("AES CTR mode by8 optimization enabled\n");
}
#endif /* CONFIG_X86_64 */

@@ -1283,6 +1286,11 @@ static int __init aesni_init(void)
goto unregister_aeads;
#endif /* CONFIG_X86_64 */

+ pr_info("CPU-optimized crypto module loaded (GCM SSE=%s, AVX=%s, AVX2=%s)(CTR AVX=%s)\n",
+ enabled_gcm_sse ? "yes" : "no",
+ enabled_gcm_avx ? "yes" : "no",
+ enabled_gcm_avx2 ? "yes" : "no",
+ enabled_ctr_avx ? "yes" : "no");
return 0;

#ifdef CONFIG_X86_64
diff --git a/arch/x86/crypto/aria_aesni_avx_glue.c b/arch/x86/crypto/aria_aesni_avx_glue.c
index 589097728bd1..d58fb995a266 100644
--- a/arch/x86/crypto/aria_aesni_avx_glue.c
+++ b/arch/x86/crypto/aria_aesni_avx_glue.c
@@ -170,6 +170,8 @@ static struct simd_skcipher_alg *aria_simd_algs[ARRAY_SIZE(aria_algs)];
static int __init aria_avx_init(void)
{
const char *feature_name;
+ int ret;
+ int enabled_gfni = 0;

if (!boot_cpu_has(X86_FEATURE_AVX) ||
!boot_cpu_has(X86_FEATURE_AES) ||
@@ -188,15 +190,20 @@ static int __init aria_avx_init(void)
aria_ops.aria_encrypt_16way = aria_aesni_avx_gfni_encrypt_16way;
aria_ops.aria_decrypt_16way = aria_aesni_avx_gfni_decrypt_16way;
aria_ops.aria_ctr_crypt_16way = aria_aesni_avx_gfni_ctr_crypt_16way;
+ enabled_gfni = 1;
} else {
aria_ops.aria_encrypt_16way = aria_aesni_avx_encrypt_16way;
aria_ops.aria_decrypt_16way = aria_aesni_avx_decrypt_16way;
aria_ops.aria_ctr_crypt_16way = aria_aesni_avx_ctr_crypt_16way;
}

- return simd_register_skciphers_compat(aria_algs,
- ARRAY_SIZE(aria_algs),
- aria_simd_algs);
+ ret = simd_register_skciphers_compat(aria_algs,
+ ARRAY_SIZE(aria_algs),
+ aria_simd_algs);
+ if (!ret)
+ pr_info("CPU-optimized crypto module loaded (GFNI=%s)\n",
+ enabled_gfni ? "yes" : "no");
+ return ret;
}

static void __exit aria_avx_exit(void)
diff --git a/arch/x86/crypto/blake2s-glue.c b/arch/x86/crypto/blake2s-glue.c
index ac7fb7a9922b..4f2f385f6674 100644
--- a/arch/x86/crypto/blake2s-glue.c
+++ b/arch/x86/crypto/blake2s-glue.c
@@ -66,11 +66,16 @@ MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);

static int __init blake2s_mod_init(void)
{
+ int enabled_ssse3 = 0;
+ int enabled_avx512 = 0;
+
if (!x86_match_cpu(module_cpu_ids))
return -ENODEV;

- if (boot_cpu_has(X86_FEATURE_SSSE3))
+ if (boot_cpu_has(X86_FEATURE_SSSE3)) {
+ enabled_ssse3 = 1;
static_branch_enable(&blake2s_use_ssse3);
+ }

if (IS_ENABLED(CONFIG_AS_AVX512) &&
boot_cpu_has(X86_FEATURE_AVX) &&
@@ -78,9 +83,14 @@ static int __init blake2s_mod_init(void)
boot_cpu_has(X86_FEATURE_AVX512F) &&
boot_cpu_has(X86_FEATURE_AVX512VL) &&
cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM |
- XFEATURE_MASK_AVX512, NULL))
+ XFEATURE_MASK_AVX512, NULL)) {
+ enabled_avx512 = 1;
static_branch_enable(&blake2s_use_avx512);
+ }

+ pr_info("CPU-optimized crypto module loaded (SSSE3=%s, AVX512=%s)\n",
+ enabled_ssse3 ? "yes" : "no",
+ enabled_avx512 ? "yes" : "no");
return 0;
}

diff --git a/arch/x86/crypto/blowfish_glue.c b/arch/x86/crypto/blowfish_glue.c
index 5cfcbb91c4ca..27b7aed9a488 100644
--- a/arch/x86/crypto/blowfish_glue.c
+++ b/arch/x86/crypto/blowfish_glue.c
@@ -336,6 +336,8 @@ static int __init blowfish_init(void)
if (err)
crypto_unregister_alg(&bf_cipher_alg);

+ if (!err)
+ pr_info("CPU-optimized crypto module loaded\n");
return err;
}

diff --git a/arch/x86/crypto/camellia_aesni_avx2_glue.c b/arch/x86/crypto/camellia_aesni_avx2_glue.c
index 851f2a29963c..e6c4ed1e40d2 100644
--- a/arch/x86/crypto/camellia_aesni_avx2_glue.c
+++ b/arch/x86/crypto/camellia_aesni_avx2_glue.c
@@ -114,6 +114,7 @@ static struct simd_skcipher_alg *camellia_simd_algs[ARRAY_SIZE(camellia_algs)];
static int __init camellia_aesni_init(void)
{
const char *feature_name;
+ int ret;

if (!x86_match_cpu(module_cpu_ids))
return -ENODEV;
@@ -132,9 +133,12 @@ static int __init camellia_aesni_init(void)
return -ENODEV;
}

- return simd_register_skciphers_compat(camellia_algs,
+ ret = simd_register_skciphers_compat(camellia_algs,
ARRAY_SIZE(camellia_algs),
camellia_simd_algs);
+ if (!ret)
+ pr_info("CPU-optimized crypto module loaded\n");
+ return ret;
}

static void __exit camellia_aesni_fini(void)
diff --git a/arch/x86/crypto/camellia_aesni_avx_glue.c b/arch/x86/crypto/camellia_aesni_avx_glue.c
index 8846493c92fb..6a9eadf0fe90 100644
--- a/arch/x86/crypto/camellia_aesni_avx_glue.c
+++ b/arch/x86/crypto/camellia_aesni_avx_glue.c
@@ -113,6 +113,7 @@ static struct simd_skcipher_alg *camellia_simd_algs[ARRAY_SIZE(camellia_algs)];
static int __init camellia_aesni_init(void)
{
const char *feature_name;
+ int ret;

if (!x86_match_cpu(module_cpu_ids))
return -ENODEV;
@@ -130,9 +131,12 @@ static int __init camellia_aesni_init(void)
return -ENODEV;
}

- return simd_register_skciphers_compat(camellia_algs,
+ ret = simd_register_skciphers_compat(camellia_algs,
ARRAY_SIZE(camellia_algs),
camellia_simd_algs);
+ if (!ret)
+ pr_info("CPU-optimized crypto module loaded\n");
+ return ret;
}

static void __exit camellia_aesni_fini(void)
diff --git a/arch/x86/crypto/camellia_glue.c b/arch/x86/crypto/camellia_glue.c
index 3c14a904af00..94dd2973bb47 100644
--- a/arch/x86/crypto/camellia_glue.c
+++ b/arch/x86/crypto/camellia_glue.c
@@ -1410,6 +1410,9 @@ static int __init camellia_init(void)
if (err)
crypto_unregister_alg(&camellia_cipher_alg);

+ if (!err)
+ pr_info("CPU-optimized crypto module loaded\n");
+
return err;
}

diff --git a/arch/x86/crypto/cast5_avx_glue.c b/arch/x86/crypto/cast5_avx_glue.c
index fdeec0849ab5..b5ae17c3ac53 100644
--- a/arch/x86/crypto/cast5_avx_glue.c
+++ b/arch/x86/crypto/cast5_avx_glue.c
@@ -107,6 +107,7 @@ static struct simd_skcipher_alg *cast5_simd_algs[ARRAY_SIZE(cast5_algs)];
static int __init cast5_init(void)
{
const char *feature_name;
+ int ret;

if (!x86_match_cpu(module_cpu_ids))
return -ENODEV;
@@ -117,9 +118,12 @@ static int __init cast5_init(void)
return -ENODEV;
}

- return simd_register_skciphers_compat(cast5_algs,
+ ret = simd_register_skciphers_compat(cast5_algs,
ARRAY_SIZE(cast5_algs),
cast5_simd_algs);
+ if (!ret)
+ pr_info("CPU-optimized crypto module loaded\n");
+ return ret;
}

static void __exit cast5_exit(void)
diff --git a/arch/x86/crypto/cast6_avx_glue.c b/arch/x86/crypto/cast6_avx_glue.c
index 9258082408eb..d1c14a5f80d7 100644
--- a/arch/x86/crypto/cast6_avx_glue.c
+++ b/arch/x86/crypto/cast6_avx_glue.c
@@ -107,6 +107,7 @@ static struct simd_skcipher_alg *cast6_simd_algs[ARRAY_SIZE(cast6_algs)];
static int __init cast6_init(void)
{
const char *feature_name;
+ int ret;

if (!x86_match_cpu(module_cpu_ids))
return -ENODEV;
@@ -117,9 +118,12 @@ static int __init cast6_init(void)
return -ENODEV;
}

- return simd_register_skciphers_compat(cast6_algs,
+ ret = simd_register_skciphers_compat(cast6_algs,
ARRAY_SIZE(cast6_algs),
cast6_simd_algs);
+ if (!ret)
+ pr_info("CPU-optimized crypto module loaded\n");
+ return ret;
}

static void __exit cast6_exit(void)
diff --git a/arch/x86/crypto/chacha_glue.c b/arch/x86/crypto/chacha_glue.c
index 8e5cadc808b4..de424fbe9f0e 100644
--- a/arch/x86/crypto/chacha_glue.c
+++ b/arch/x86/crypto/chacha_glue.c
@@ -289,6 +289,9 @@ MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);

static int __init chacha_simd_mod_init(void)
{
+ int ret;
+ int enabled_avx2 = 0;
+ int enabled_avx512 = 0;
if (!x86_match_cpu(module_cpu_ids))
return -ENODEV;

@@ -298,15 +301,25 @@ static int __init chacha_simd_mod_init(void)
if (boot_cpu_has(X86_FEATURE_AVX) &&
boot_cpu_has(X86_FEATURE_AVX2) &&
cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM, NULL)) {
+ enabled_avx2 = 1;
static_branch_enable(&chacha_use_avx2);

if (IS_ENABLED(CONFIG_AS_AVX512) &&
boot_cpu_has(X86_FEATURE_AVX512VL) &&
- boot_cpu_has(X86_FEATURE_AVX512BW)) /* kmovq */
+ boot_cpu_has(X86_FEATURE_AVX512BW)) { /* kmovq */
+ enabled_avx512 = 1;
static_branch_enable(&chacha_use_avx512vl);
+ }
}
- return IS_REACHABLE(CONFIG_CRYPTO_SKCIPHER) ?
+ ret = IS_REACHABLE(CONFIG_CRYPTO_SKCIPHER) ?
crypto_register_skciphers(algs, ARRAY_SIZE(algs)) : 0;
+ if (!ret)
+ pr_info("CPU-optimized crypto module loaded (AVX2=%s, AVX512=%s)\n",
+ enabled_avx2 ? "yes" : "no",
+ enabled_avx512 ? "yes" : "no");
+ else
+ pr_info("CPU-optimized crypto module not loaded");
+ return ret;
}

static void __exit chacha_simd_mod_fini(void)
diff --git a/arch/x86/crypto/crc32-pclmul_glue.c b/arch/x86/crypto/crc32-pclmul_glue.c
index bc2b31b04e05..c56d3d3ab0a0 100644
--- a/arch/x86/crypto/crc32-pclmul_glue.c
+++ b/arch/x86/crypto/crc32-pclmul_glue.c
@@ -190,9 +190,15 @@ MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);

static int __init crc32_pclmul_mod_init(void)
{
+ int ret;
+
if (!x86_match_cpu(module_cpu_ids))
return -ENODEV;
- return crypto_register_shash(&alg);
+
+ ret = crypto_register_shash(&alg);
+ if (!ret)
+ pr_info("CPU-optimized crypto module loaded\n");
+ return ret;
}

static void __exit crc32_pclmul_mod_fini(void)
diff --git a/arch/x86/crypto/crc32c-intel_glue.c b/arch/x86/crypto/crc32c-intel_glue.c
index ebf530934a3e..c633d303f19b 100644
--- a/arch/x86/crypto/crc32c-intel_glue.c
+++ b/arch/x86/crypto/crc32c-intel_glue.c
@@ -242,16 +242,27 @@ MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);

static int __init crc32c_intel_mod_init(void)
{
- if (!x86_match_cpu(module_cpu_ids))
+ int ret;
+ int pcl_enabled = 0;
+
+ if (!x86_match_cpu(module_cpu_ids)) {
+ pr_info("CPU-optimized crypto module not loaded, required CPU feature (SSE4.2) not supported\n");
return -ENODEV;
+ }
+
#ifdef CONFIG_X86_64
if (boot_cpu_has(X86_FEATURE_PCLMULQDQ)) {
+ pcl_enabled = 1;
alg.update = crc32c_pcl_intel_update;
alg.finup = crc32c_pcl_intel_finup;
alg.digest = crc32c_pcl_intel_digest;
}
#endif
- return crypto_register_shash(&alg);
+ ret = crypto_register_shash(&alg);
+ if (!ret)
+ pr_info("CPU-optimized crypto module loaded (PCLMULQDQ=%s)\n",
+ pcl_enabled ? "yes" : "no");
+ return ret;
}

static void __exit crc32c_intel_mod_fini(void)
diff --git a/arch/x86/crypto/crct10dif-pclmul_glue.c b/arch/x86/crypto/crct10dif-pclmul_glue.c
index 03e35a1b7677..4476b9af1e61 100644
--- a/arch/x86/crypto/crct10dif-pclmul_glue.c
+++ b/arch/x86/crypto/crct10dif-pclmul_glue.c
@@ -146,10 +146,15 @@ MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);

static int __init crct10dif_intel_mod_init(void)
{
+ int ret;
+
if (!x86_match_cpu(module_cpu_ids))
return -ENODEV;

- return crypto_register_shash(&alg);
+ ret = crypto_register_shash(&alg);
+ if (!ret)
+ pr_info("CPU-optimized crypto module loaded\n");
+ return ret;
}

static void __exit crct10dif_intel_mod_fini(void)
diff --git a/arch/x86/crypto/curve25519-x86_64.c b/arch/x86/crypto/curve25519-x86_64.c
index f9a1adb0c183..b9289feef375 100644
--- a/arch/x86/crypto/curve25519-x86_64.c
+++ b/arch/x86/crypto/curve25519-x86_64.c
@@ -1709,15 +1709,24 @@ MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);

static int __init curve25519_mod_init(void)
{
+ int ret;
+ int enabled_adx = 0;
+
if (!x86_match_cpu(module_cpu_ids))
return -ENODEV;

- if (boot_cpu_has(X86_FEATURE_BMI2) && boot_cpu_has(X86_FEATURE_ADX))
+ if (boot_cpu_has(X86_FEATURE_BMI2) && boot_cpu_has(X86_FEATURE_ADX)) {
+ enabled_adx = 1;
static_branch_enable(&curve25519_use_bmi2_adx);
+ }
else
return 0;
- return IS_REACHABLE(CONFIG_CRYPTO_KPP) ?
+ ret = IS_REACHABLE(CONFIG_CRYPTO_KPP) ?
crypto_register_kpp(&curve25519_alg) : 0;
+ if (!ret)
+ pr_info("CPU-optimized crypto module loaded (ADX=%s)\n",
+ enabled_adx ? "yes" : "no");
+ return ret;
}

static void __exit curve25519_mod_exit(void)
diff --git a/arch/x86/crypto/des3_ede_glue.c b/arch/x86/crypto/des3_ede_glue.c
index 83e686a6c2f3..7b4dd02007ed 100644
--- a/arch/x86/crypto/des3_ede_glue.c
+++ b/arch/x86/crypto/des3_ede_glue.c
@@ -384,6 +384,8 @@ static int __init des3_ede_x86_init(void)
if (err)
crypto_unregister_alg(&des3_ede_cipher);

+ if (!err)
+ pr_info("CPU-optimized crypto module loaded\n");
return err;
}

diff --git a/arch/x86/crypto/ghash-clmulni-intel_glue.c b/arch/x86/crypto/ghash-clmulni-intel_glue.c
index 3ad55144da48..496a410eaff7 100644
--- a/arch/x86/crypto/ghash-clmulni-intel_glue.c
+++ b/arch/x86/crypto/ghash-clmulni-intel_glue.c
@@ -349,6 +349,7 @@ static int __init ghash_pclmulqdqni_mod_init(void)
if (err)
goto err_shash;

+ pr_info("CPU-optimized crypto module loaded\n");
return 0;

err_shash:
diff --git a/arch/x86/crypto/nhpoly1305-avx2-glue.c b/arch/x86/crypto/nhpoly1305-avx2-glue.c
index 40f49107e5a9..2dc7b618771f 100644
--- a/arch/x86/crypto/nhpoly1305-avx2-glue.c
+++ b/arch/x86/crypto/nhpoly1305-avx2-glue.c
@@ -68,6 +68,8 @@ MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);

static int __init nhpoly1305_mod_init(void)
{
+ int ret;
+
if (!x86_match_cpu(module_cpu_ids))
return -ENODEV;

@@ -75,7 +77,10 @@ static int __init nhpoly1305_mod_init(void)
!boot_cpu_has(X86_FEATURE_OSXSAVE))
return -ENODEV;

- return crypto_register_shash(&nhpoly1305_alg);
+ ret = crypto_register_shash(&nhpoly1305_alg);
+ if (!ret)
+ pr_info("CPU-optimized crypto module loaded\n");
+ return ret;
}

static void __exit nhpoly1305_mod_exit(void)
diff --git a/arch/x86/crypto/nhpoly1305-sse2-glue.c b/arch/x86/crypto/nhpoly1305-sse2-glue.c
index bb40fed92c92..bf0f8ac7afd6 100644
--- a/arch/x86/crypto/nhpoly1305-sse2-glue.c
+++ b/arch/x86/crypto/nhpoly1305-sse2-glue.c
@@ -68,13 +68,18 @@ MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);

static int __init nhpoly1305_mod_init(void)
{
+ int ret;
+
if (!x86_match_cpu(module_cpu_ids))
return -ENODEV;

if (!boot_cpu_has(X86_FEATURE_XMM2))
return -ENODEV;

- return crypto_register_shash(&nhpoly1305_alg);
+ ret = crypto_register_shash(&nhpoly1305_alg);
+ if (!ret)
+ pr_info("CPU-optimized crypto module loaded\n");
+ return ret;
}

static void __exit nhpoly1305_mod_exit(void)
diff --git a/arch/x86/crypto/poly1305_glue.c b/arch/x86/crypto/poly1305_glue.c
index a2a7cb39cdec..c9ebb6b90d1f 100644
--- a/arch/x86/crypto/poly1305_glue.c
+++ b/arch/x86/crypto/poly1305_glue.c
@@ -273,22 +273,39 @@ MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);

static int __init poly1305_simd_mod_init(void)
{
+ int ret;
+ int enabled_avx = 0;
+ int enabled_avx2 = 0;
+ int enabled_avx512 = 0;
+
if (!x86_match_cpu(module_cpu_ids))
return -ENODEV;

if (boot_cpu_has(X86_FEATURE_AVX) &&
- cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM, NULL))
+ cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM, NULL)) {
+ enabled_avx = 1;
static_branch_enable(&poly1305_use_avx);
+ }
if (boot_cpu_has(X86_FEATURE_AVX) && boot_cpu_has(X86_FEATURE_AVX2) &&
- cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM, NULL))
+ cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM, NULL)) {
+ enabled_avx2 = 1;
static_branch_enable(&poly1305_use_avx2);
+ }
if (IS_ENABLED(CONFIG_AS_AVX512) && boot_cpu_has(X86_FEATURE_AVX) &&
boot_cpu_has(X86_FEATURE_AVX2) && boot_cpu_has(X86_FEATURE_AVX512F) &&
cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM | XFEATURE_MASK_AVX512, NULL) &&
/* Skylake downclocks unacceptably much when using zmm, but later generations are fast. */
- boot_cpu_data.x86_model != INTEL_FAM6_SKYLAKE_X)
+ boot_cpu_data.x86_model != INTEL_FAM6_SKYLAKE_X) {
+ enabled_avx512 = 1;
static_branch_enable(&poly1305_use_avx512);
- return IS_REACHABLE(CONFIG_CRYPTO_HASH) ? crypto_register_shash(&alg) : 0;
+ }
+ ret = IS_REACHABLE(CONFIG_CRYPTO_HASH) ? crypto_register_shash(&alg) : 0;
+ if (!ret)
+ pr_info("CPU-optimized crypto module loaded (AVX=%s, AVX2=%s, AVX512=%s)\n",
+ enabled_avx ? "yes" : "no",
+ enabled_avx2 ? "yes" : "no",
+ enabled_avx512 ? "yes" : "no");
+ return ret;
}

static void __exit poly1305_simd_mod_exit(void)
diff --git a/arch/x86/crypto/polyval-clmulni_glue.c b/arch/x86/crypto/polyval-clmulni_glue.c
index 5a345db20ca9..7a3a80085c90 100644
--- a/arch/x86/crypto/polyval-clmulni_glue.c
+++ b/arch/x86/crypto/polyval-clmulni_glue.c
@@ -183,13 +183,18 @@ MODULE_DEVICE_TABLE(x86cpu, pcmul_cpu_id);

static int __init polyval_clmulni_mod_init(void)
{
+ int ret;
+
if (!x86_match_cpu(pcmul_cpu_id))
return -ENODEV;

if (!boot_cpu_has(X86_FEATURE_AVX))
return -ENODEV;

- return crypto_register_shash(&polyval_alg);
+ ret = crypto_register_shash(&polyval_alg);
+ if (!ret)
+ pr_info("CPU-optimized crypto module loaded\n");
+ return ret;
}

static void __exit polyval_clmulni_mod_exit(void)
diff --git a/arch/x86/crypto/serpent_avx2_glue.c b/arch/x86/crypto/serpent_avx2_glue.c
index 5944bf5ead2e..bf59addaf804 100644
--- a/arch/x86/crypto/serpent_avx2_glue.c
+++ b/arch/x86/crypto/serpent_avx2_glue.c
@@ -108,8 +108,9 @@ static struct simd_skcipher_alg *serpent_simd_algs[ARRAY_SIZE(serpent_algs)];
static int __init serpent_avx2_init(void)
{
const char *feature_name;
+ int ret;

if (!x86_match_cpu(module_cpu_ids))
return -ENODEV;

if (!boot_cpu_has(X86_FEATURE_AVX2) || !boot_cpu_has(X86_FEATURE_OSXSAVE)) {
@@ -122,9 +123,12 @@ static int __init serpent_avx2_init(void)
return -ENODEV;
}

- return simd_register_skciphers_compat(serpent_algs,
+ ret = simd_register_skciphers_compat(serpent_algs,
ARRAY_SIZE(serpent_algs),
serpent_simd_algs);
+ if (!ret)
+ pr_info("CPU-optimized crypto module loaded\n");
+ return ret;
}

static void __exit serpent_avx2_fini(void)
diff --git a/arch/x86/crypto/serpent_avx_glue.c b/arch/x86/crypto/serpent_avx_glue.c
index 45713c7a4cb9..7b0c02a61552 100644
--- a/arch/x86/crypto/serpent_avx_glue.c
+++ b/arch/x86/crypto/serpent_avx_glue.c
@@ -114,6 +114,7 @@ static struct simd_skcipher_alg *serpent_simd_algs[ARRAY_SIZE(serpent_algs)];
static int __init serpent_init(void)
{
const char *feature_name;
+ int ret;

if (!x86_match_cpu(module_cpu_ids))
return -ENODEV;
@@ -124,9 +125,12 @@ static int __init serpent_init(void)
return -ENODEV;
}

- return simd_register_skciphers_compat(serpent_algs,
+ ret = simd_register_skciphers_compat(serpent_algs,
ARRAY_SIZE(serpent_algs),
serpent_simd_algs);
+ if (!ret)
+ pr_info("CPU-optimized crypto module loaded\n");
+ return ret;
}

static void __exit serpent_exit(void)
diff --git a/arch/x86/crypto/serpent_sse2_glue.c b/arch/x86/crypto/serpent_sse2_glue.c
index d8aa0d3fbf15..f82880ef6f10 100644
--- a/arch/x86/crypto/serpent_sse2_glue.c
+++ b/arch/x86/crypto/serpent_sse2_glue.c
@@ -116,6 +116,8 @@ static struct simd_skcipher_alg *serpent_simd_algs[ARRAY_SIZE(serpent_algs)];

static int __init serpent_sse2_init(void)
{
+ int ret;
+
if (!x86_match_cpu(module_cpu_ids))
return -ENODEV;

@@ -124,9 +126,12 @@ static int __init serpent_sse2_init(void)
return -ENODEV;
}

- return simd_register_skciphers_compat(serpent_algs,
+ ret = simd_register_skciphers_compat(serpent_algs,
ARRAY_SIZE(serpent_algs),
serpent_simd_algs);
+ if (!ret)
+ pr_info("CPU-optimized crypto module loaded\n");
+ return ret;
}

static void __exit serpent_sse2_exit(void)
diff --git a/arch/x86/crypto/sm3_avx_glue.c b/arch/x86/crypto/sm3_avx_glue.c
index 475b9637a06d..532f07b05745 100644
--- a/arch/x86/crypto/sm3_avx_glue.c
+++ b/arch/x86/crypto/sm3_avx_glue.c
@@ -125,6 +125,7 @@ MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);
static int __init sm3_avx_mod_init(void)
{
const char *feature_name;
+ int ret;

if (!x86_match_cpu(module_cpu_ids))
return -ENODEV;
@@ -145,7 +146,10 @@ static int __init sm3_avx_mod_init(void)
return -ENODEV;
}

- return crypto_register_shash(&sm3_avx_alg);
+ ret = crypto_register_shash(&sm3_avx_alg);
+ if (!ret)
+ pr_info("CPU-optimized crypto module loaded\n");
+ return ret;
}

static void __exit sm3_avx_mod_exit(void)
diff --git a/arch/x86/crypto/sm4_aesni_avx2_glue.c b/arch/x86/crypto/sm4_aesni_avx2_glue.c
index 3fe9e170b880..42819ee5d36d 100644
--- a/arch/x86/crypto/sm4_aesni_avx2_glue.c
+++ b/arch/x86/crypto/sm4_aesni_avx2_glue.c
@@ -143,6 +143,7 @@ simd_sm4_aesni_avx2_skciphers[ARRAY_SIZE(sm4_aesni_avx2_skciphers)];
static int __init sm4_init(void)
{
const char *feature_name;
+ int ret;

if (!x86_match_cpu(module_cpu_ids))
return -ENODEV;
@@ -161,9 +162,12 @@ static int __init sm4_init(void)
return -ENODEV;
}

- return simd_register_skciphers_compat(sm4_aesni_avx2_skciphers,
+ ret = simd_register_skciphers_compat(sm4_aesni_avx2_skciphers,
ARRAY_SIZE(sm4_aesni_avx2_skciphers),
simd_sm4_aesni_avx2_skciphers);
+ if (!ret)
+ pr_info("CPU-optimized crypto module loaded\n");
+ return ret;
}

static void __exit sm4_exit(void)
diff --git a/arch/x86/crypto/sm4_aesni_avx_glue.c b/arch/x86/crypto/sm4_aesni_avx_glue.c
index 14ae012948ae..8a25376d341f 100644
--- a/arch/x86/crypto/sm4_aesni_avx_glue.c
+++ b/arch/x86/crypto/sm4_aesni_avx_glue.c
@@ -461,8 +461,9 @@ simd_sm4_aesni_avx_skciphers[ARRAY_SIZE(sm4_aesni_avx_skciphers)];
static int __init sm4_init(void)
{
const char *feature_name;
+ int ret;

if (!x86_match_cpu(module_cpu_ids))
return -ENODEV;

if (!boot_cpu_has(X86_FEATURE_AVX) ||
@@ -478,9 +479,12 @@ static int __init sm4_init(void)
return -ENODEV;
}

- return simd_register_skciphers_compat(sm4_aesni_avx_skciphers,
+ ret = simd_register_skciphers_compat(sm4_aesni_avx_skciphers,
ARRAY_SIZE(sm4_aesni_avx_skciphers),
simd_sm4_aesni_avx_skciphers);
+ if (!ret)
+ pr_info("CPU-optimized crypto module loaded\n");
+ return ret;
}

static void __exit sm4_exit(void)
diff --git a/arch/x86/crypto/twofish_avx_glue.c b/arch/x86/crypto/twofish_avx_glue.c
index 044e4f92e2c0..ccf016bf6ef2 100644
--- a/arch/x86/crypto/twofish_avx_glue.c
+++ b/arch/x86/crypto/twofish_avx_glue.c
@@ -117,6 +117,7 @@ static struct simd_skcipher_alg *twofish_simd_algs[ARRAY_SIZE(twofish_algs)];
static int __init twofish_init(void)
{
const char *feature_name;
+ int ret;

if (!x86_match_cpu(module_cpu_ids))
return -ENODEV;
@@ -126,9 +127,12 @@ static int __init twofish_init(void)
return -ENODEV;
}

- return simd_register_skciphers_compat(twofish_algs,
- ARRAY_SIZE(twofish_algs),
- twofish_simd_algs);
+ ret = simd_register_skciphers_compat(twofish_algs,
+ ARRAY_SIZE(twofish_algs),
+ twofish_simd_algs);
+ if (!ret)
+ pr_info("CPU-optimized crypto module loaded\n");
+ return ret;
}

static void __exit twofish_exit(void)
diff --git a/arch/x86/crypto/twofish_glue.c b/arch/x86/crypto/twofish_glue.c
index 031ed290c755..5756b9cab982 100644
--- a/arch/x86/crypto/twofish_glue.c
+++ b/arch/x86/crypto/twofish_glue.c
@@ -92,10 +92,15 @@ MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);

static int __init twofish_glue_init(void)
{
+ int ret;
+
if (!x86_match_cpu(module_cpu_ids))
return -ENODEV;

- return crypto_register_alg(&alg);
+ ret = crypto_register_alg(&alg);
+ if (!ret)
+ pr_info("CPU-optimized crypto module loaded\n");
+ return ret;
}

static void __exit twofish_glue_fini(void)
diff --git a/arch/x86/crypto/twofish_glue_3way.c b/arch/x86/crypto/twofish_glue_3way.c
index 7e2a18e3abe7..2fde637b40c8 100644
--- a/arch/x86/crypto/twofish_glue_3way.c
+++ b/arch/x86/crypto/twofish_glue_3way.c
@@ -151,6 +151,8 @@ MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);

static int __init twofish_3way_init(void)
{
+ int ret;
+
if (!x86_match_cpu(module_cpu_ids))
return -ENODEV;

@@ -162,8 +164,11 @@ static int __init twofish_3way_init(void)
return -ENODEV;
}

- return crypto_register_skciphers(tf_skciphers,
- ARRAY_SIZE(tf_skciphers));
+ ret = crypto_register_skciphers(tf_skciphers,
+ ARRAY_SIZE(tf_skciphers));
+ if (!ret)
+ pr_info("CPU-optimized crypto module loaded\n");
+ return ret;
}

static void __exit twofish_3way_fini(void)
--
2.37.3

Subject: [PATCH v2 11/19] crypto: x86/crc - load based on CPU features

Like the module covered by commit aa031b8f702e ("crypto: x86/sha512 -
load based on CPU features"), these x86-optimized crypto modules
already have module aliases based on CPU feature bits:
crc32, crc32c, and crct10dif

Rename the unique device table data structure to a generic name
so the code has the same pattern in all the modules.

Remove the print that crc32 issued on a device table mismatch; the
other modules do not print in that case, and modules are not supposed
to print unless they are loaded and active.

This commit covers the modules that triggered RCU stall issues
due to long-running kernel_fpu_begin()/kernel_fpu_end() sections.
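
The aliases can be checked from userspace, for example (assuming the
driver is built as a module):

	modinfo crc32c-intel | grep alias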

Signed-off-by: Robert Elliott <[email protected]>
---
arch/x86/crypto/crc32-pclmul_glue.c | 9 +++------
arch/x86/crypto/crc32c-intel_glue.c | 6 +++---
arch/x86/crypto/crct10dif-pclmul_glue.c | 6 +++---
3 files changed, 9 insertions(+), 12 deletions(-)

diff --git a/arch/x86/crypto/crc32-pclmul_glue.c b/arch/x86/crypto/crc32-pclmul_glue.c
index 38539c6edfe5..d49a19dcee37 100644
--- a/arch/x86/crypto/crc32-pclmul_glue.c
+++ b/arch/x86/crypto/crc32-pclmul_glue.c
@@ -178,20 +178,17 @@ static struct shash_alg alg = {
}
};

-static const struct x86_cpu_id crc32pclmul_cpu_id[] = {
+static const struct x86_cpu_id module_cpu_ids[] = {
X86_MATCH_FEATURE(X86_FEATURE_PCLMULQDQ, NULL),
{}
};
-MODULE_DEVICE_TABLE(x86cpu, crc32pclmul_cpu_id);
+MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);


static int __init crc32_pclmul_mod_init(void)
{
-
- if (!x86_match_cpu(crc32pclmul_cpu_id)) {
- pr_info("PCLMULQDQ-NI instructions are not detected.\n");
+ if (!x86_match_cpu(module_cpu_ids))
return -ENODEV;
- }
return crypto_register_shash(&alg);
}

diff --git a/arch/x86/crypto/crc32c-intel_glue.c b/arch/x86/crypto/crc32c-intel_glue.c
index ece620227057..980c62929256 100644
--- a/arch/x86/crypto/crc32c-intel_glue.c
+++ b/arch/x86/crypto/crc32c-intel_glue.c
@@ -231,15 +231,15 @@ static struct shash_alg alg = {
}
};

-static const struct x86_cpu_id crc32c_cpu_id[] = {
+static const struct x86_cpu_id module_cpu_ids[] = {
X86_MATCH_FEATURE(X86_FEATURE_XMM4_2, NULL),
{}
};
-MODULE_DEVICE_TABLE(x86cpu, crc32c_cpu_id);
+MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);

static int __init crc32c_intel_mod_init(void)
{
- if (!x86_match_cpu(crc32c_cpu_id))
+ if (!x86_match_cpu(module_cpu_ids))
return -ENODEV;
#ifdef CONFIG_X86_64
if (boot_cpu_has(X86_FEATURE_PCLMULQDQ)) {
diff --git a/arch/x86/crypto/crct10dif-pclmul_glue.c b/arch/x86/crypto/crct10dif-pclmul_glue.c
index 54a537fc88ee..3b8e9394c40d 100644
--- a/arch/x86/crypto/crct10dif-pclmul_glue.c
+++ b/arch/x86/crypto/crct10dif-pclmul_glue.c
@@ -136,15 +136,15 @@ static struct shash_alg alg = {
}
};

-static const struct x86_cpu_id crct10dif_cpu_id[] = {
+static const struct x86_cpu_id module_cpu_ids[] = {
X86_MATCH_FEATURE(X86_FEATURE_PCLMULQDQ, NULL),
{}
};
-MODULE_DEVICE_TABLE(x86cpu, crct10dif_cpu_id);
+MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);

static int __init crct10dif_intel_mod_init(void)
{
- if (!x86_match_cpu(crct10dif_cpu_id))
+ if (!x86_match_cpu(module_cpu_ids))
return -ENODEV;

return crypto_register_shash(&alg);
--
2.37.3

Subject: [PATCH v2 02/19] crypto: tcrypt - test nhpoly1305

Add a tcrypt self-test mode for nhpoly1305.
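
The new case can then be exercised with something like:

	modprobe tcrypt mode=60

(tcrypt intentionally returns an error from its init function after
running its tests, so modprobe is expected to report a failure.)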

Signed-off-by: Robert Elliott <[email protected]>
---
crypto/tcrypt.c | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/crypto/tcrypt.c b/crypto/tcrypt.c
index 4426386dfb42..7a6a56751043 100644
--- a/crypto/tcrypt.c
+++ b/crypto/tcrypt.c
@@ -1715,6 +1715,10 @@ static int do_test(const char *alg, u32 type, u32 mask, int m, u32 num_mb)
ret += tcrypt_test("crc32");
break;

+ case 60:
+ ret += tcrypt_test("nhpoly1305");
+ break;
+
case 100:
ret += tcrypt_test("hmac(md5)");
break;
--
2.37.3

Subject: [PATCH v2 19/19] crypto: x86/sha - register only the best function

Don't register and unregister each of the implementations from least-
to most-optimized (SSSE3, then AVX, then AVX2, and SHA-NI where
available); determine the most-optimized implementation the CPU
supports and register only that one.
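
Condensed sketch of the resulting selection order, using the sha1 case
as the example (registration-failure handling and the "not engaged"
messages are omitted; see the diffs below for the full logic):

	if (boot_cpu_has(X86_FEATURE_SHA_NI))
		ret = crypto_register_shash(&sha1_ni_alg);
	else if (boot_cpu_has(X86_FEATURE_AVX2) &&
		 boot_cpu_has(X86_FEATURE_BMI1) &&
		 boot_cpu_has(X86_FEATURE_BMI2))
		ret = crypto_register_shash(&sha1_avx2_alg);
	else if (boot_cpu_has(X86_FEATURE_AVX) &&
		 cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM, NULL))
		ret = crypto_register_shash(&sha1_avx_alg);
	else if (boot_cpu_has(X86_FEATURE_SSSE3))
		ret = crypto_register_shash(&sha1_ssse3_alg);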

Suggested-by: Tim Chen <[email protected]>
Signed-off-by: Robert Elliott <[email protected]>
---
arch/x86/crypto/sha1_ssse3_glue.c | 139 ++++++++++++-------------
arch/x86/crypto/sha256_ssse3_glue.c | 154 ++++++++++++++--------------
arch/x86/crypto/sha512_ssse3_glue.c | 120 ++++++++++++----------
3 files changed, 210 insertions(+), 203 deletions(-)

diff --git a/arch/x86/crypto/sha1_ssse3_glue.c b/arch/x86/crypto/sha1_ssse3_glue.c
index edffc33bd12e..90a86d737bcf 100644
--- a/arch/x86/crypto/sha1_ssse3_glue.c
+++ b/arch/x86/crypto/sha1_ssse3_glue.c
@@ -123,17 +123,16 @@ static struct shash_alg sha1_ssse3_alg = {
}
};

-static int register_sha1_ssse3(void)
-{
- if (boot_cpu_has(X86_FEATURE_SSSE3))
- return crypto_register_shash(&sha1_ssse3_alg);
- return 0;
-}
-
+static bool sha1_ssse3_registered;
+static bool sha1_avx_registered;
+static bool sha1_avx2_registered;
+static bool sha1_ni_registered;
static void unregister_sha1_ssse3(void)
{
- if (boot_cpu_has(X86_FEATURE_SSSE3))
+ if (sha1_ssse3_registered) {
crypto_unregister_shash(&sha1_ssse3_alg);
+ sha1_ssse3_registered = 0;
+ }
}

asmlinkage void sha1_transform_avx(struct sha1_state *state,
@@ -172,28 +171,12 @@ static struct shash_alg sha1_avx_alg = {
}
};

-static bool avx_usable(void)
-{
- if (!cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM, NULL)) {
- if (boot_cpu_has(X86_FEATURE_AVX))
- pr_info("AVX detected but unusable.\n");
- return false;
- }
-
- return true;
-}
-
-static int register_sha1_avx(void)
-{
- if (avx_usable())
- return crypto_register_shash(&sha1_avx_alg);
- return 0;
-}
-
static void unregister_sha1_avx(void)
{
- if (avx_usable())
+ if (sha1_avx_registered) {
crypto_unregister_shash(&sha1_avx_alg);
+ sha1_avx_registered = 0;
+ }
}

#define SHA1_AVX2_BLOCK_OPTSIZE 4 /* optimal 4*64 bytes of SHA1 blocks */
@@ -201,16 +184,6 @@ static void unregister_sha1_avx(void)
asmlinkage void sha1_transform_avx2(struct sha1_state *state,
const u8 *data, int blocks);

-static bool avx2_usable(void)
-{
- if (avx_usable() && boot_cpu_has(X86_FEATURE_AVX2)
- && boot_cpu_has(X86_FEATURE_BMI1)
- && boot_cpu_has(X86_FEATURE_BMI2))
- return true;
-
- return false;
-}
-
static void sha1_apply_transform_avx2(struct sha1_state *state,
const u8 *data, int blocks)
{
@@ -254,17 +227,13 @@ static struct shash_alg sha1_avx2_alg = {
}
};

-static int register_sha1_avx2(void)
-{
- if (avx2_usable())
- return crypto_register_shash(&sha1_avx2_alg);
- return 0;
-}

static void unregister_sha1_avx2(void)
{
- if (avx2_usable())
+ if (sha1_avx2_registered) {
crypto_unregister_shash(&sha1_avx2_alg);
+ sha1_avx2_registered = 0;
+ }
}

#ifdef CONFIG_AS_SHA1_NI
@@ -304,13 +273,6 @@ static struct shash_alg sha1_ni_alg = {
}
};

-static int register_sha1_ni(void)
-{
- if (boot_cpu_has(X86_FEATURE_SHA_NI))
- return crypto_register_shash(&sha1_ni_alg);
- return 0;
-}
-
static const struct x86_cpu_id module_cpu_ids[] = {
X86_MATCH_FEATURE(X86_FEATURE_SHA_NI, NULL),
X86_MATCH_FEATURE(X86_FEATURE_AVX2, NULL),
@@ -322,44 +284,81 @@ MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);

static void unregister_sha1_ni(void)
{
- if (boot_cpu_has(X86_FEATURE_SHA_NI))
+ if (sha1_ni_registered) {
crypto_unregister_shash(&sha1_ni_alg);
+ sha1_ni_registered = 0;
+ }
}

#else
-static inline int register_sha1_ni(void) { return 0; }
static inline void unregister_sha1_ni(void) { }
#endif

static int __init sha1_ssse3_mod_init(void)
{
- if (register_sha1_ssse3())
- goto fail;
+ const char *feature_name;
+ const char *driver_name = NULL;
+ int ret;

if (!x86_match_cpu(module_cpu_ids))
return -ENODEV;

- if (register_sha1_avx()) {
- unregister_sha1_ssse3();
- goto fail;
- }
+ /* SHA-NI */
+ if (boot_cpu_has(X86_FEATURE_SHA_NI)) {

- if (register_sha1_avx2()) {
- unregister_sha1_avx();
- unregister_sha1_ssse3();
- goto fail;
- }
+ ret = crypto_register_shash(&sha1_ni_alg);
+ if (!ret) {
+ sha1_ni_registered = 1;
+ driver_name = sha1_ni_alg.base.cra_driver_name;
+ }

- if (register_sha1_ni()) {
- unregister_sha1_avx2();
- unregister_sha1_avx();
- unregister_sha1_ssse3();
- goto fail;
+ /* AVX2 */
+ } else if (boot_cpu_has(X86_FEATURE_AVX2)) {
+
+ if (boot_cpu_has(X86_FEATURE_BMI1) &&
+ boot_cpu_has(X86_FEATURE_BMI2)) {
+
+ ret = crypto_register_shash(&sha1_avx2_alg);
+ if (!ret) {
+ sha1_avx2_registered = 1;
+ driver_name = sha1_avx2_alg.base.cra_driver_name;
+ }
+ } else {
+ pr_info("AVX2-optimized version not engaged, all required features (AVX2, BMI1, BMI2) not supported\n");
+ }
+
+ /* AVX */
+ } else if (boot_cpu_has(X86_FEATURE_AVX)) {
+
+ if (cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM,
+ &feature_name)) {
+
+ ret = crypto_register_shash(&sha1_avx_alg);
+ if (!ret) {
+ sha1_avx_registered = 1;
+ driver_name = sha1_avx_alg.base.cra_driver_name;
+ }
+ } else {
+ pr_info("AVX-optimized version not engaged, CPU extended feature '%s' is not supported\n",
+ feature_name);
+ }
+
+ /* SSSE3 */
+ } else if (boot_cpu_has(X86_FEATURE_SSSE3)) {
+ ret = crypto_register_shash(&sha1_ssse3_alg);
+ if (!ret) {
+ sha1_ssse3_registered = 1;
+ driver_name = sha1_ssse3_alg.base.cra_driver_name;
+ }
}

+ pr_info("CPU-optimized crypto module loaded (SSSE3=%s, AVX=%s, AVX2=%s, SHA-NI=%s): driver=%s\n",
+ sha1_ssse3_registered ? "yes" : "no",
+ sha1_avx_registered ? "yes" : "no",
+ sha1_avx2_registered ? "yes" : "no",
+ sha1_ni_registered ? "yes" : "no",
+ driver_name);
return 0;
-fail:
- return -ENODEV;
}

static void __exit sha1_ssse3_mod_fini(void)
diff --git a/arch/x86/crypto/sha256_ssse3_glue.c b/arch/x86/crypto/sha256_ssse3_glue.c
index 8a0fb308fbba..cd7bf2b48f3d 100644
--- a/arch/x86/crypto/sha256_ssse3_glue.c
+++ b/arch/x86/crypto/sha256_ssse3_glue.c
@@ -150,19 +150,18 @@ static struct shash_alg sha256_ssse3_algs[] = { {
}
} };

-static int register_sha256_ssse3(void)
-{
- if (boot_cpu_has(X86_FEATURE_SSSE3))
- return crypto_register_shashes(sha256_ssse3_algs,
- ARRAY_SIZE(sha256_ssse3_algs));
- return 0;
-}
+static bool sha256_ssse3_registered;
+static bool sha256_avx_registered;
+static bool sha256_avx2_registered;
+static bool sha256_ni_registered;

static void unregister_sha256_ssse3(void)
{
- if (boot_cpu_has(X86_FEATURE_SSSE3))
+ if (sha256_ssse3_registered) {
crypto_unregister_shashes(sha256_ssse3_algs,
ARRAY_SIZE(sha256_ssse3_algs));
+ sha256_ssse3_registered = 0;
+ }
}

asmlinkage void sha256_transform_avx(struct sha256_state *state,
@@ -215,30 +214,13 @@ static struct shash_alg sha256_avx_algs[] = { {
}
} };

-static bool avx_usable(void)
-{
- if (!cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM, NULL)) {
- if (boot_cpu_has(X86_FEATURE_AVX))
- pr_info("AVX detected but unusable.\n");
- return false;
- }
-
- return true;
-}
-
-static int register_sha256_avx(void)
-{
- if (avx_usable())
- return crypto_register_shashes(sha256_avx_algs,
- ARRAY_SIZE(sha256_avx_algs));
- return 0;
-}
-
static void unregister_sha256_avx(void)
{
- if (avx_usable())
+ if (sha256_avx_registered) {
crypto_unregister_shashes(sha256_avx_algs,
ARRAY_SIZE(sha256_avx_algs));
+ sha256_avx_registered = 0;
+ }
}

asmlinkage void sha256_transform_rorx(struct sha256_state *state,
@@ -291,28 +273,13 @@ static struct shash_alg sha256_avx2_algs[] = { {
}
} };

-static bool avx2_usable(void)
-{
- if (avx_usable() && boot_cpu_has(X86_FEATURE_AVX2) &&
- boot_cpu_has(X86_FEATURE_BMI2))
- return true;
-
- return false;
-}
-
-static int register_sha256_avx2(void)
-{
- if (avx2_usable())
- return crypto_register_shashes(sha256_avx2_algs,
- ARRAY_SIZE(sha256_avx2_algs));
- return 0;
-}
-
static void unregister_sha256_avx2(void)
{
- if (avx2_usable())
+ if (sha256_avx2_registered) {
crypto_unregister_shashes(sha256_avx2_algs,
ARRAY_SIZE(sha256_avx2_algs));
+ sha256_avx2_registered = 0;
+ }
}

#ifdef CONFIG_AS_SHA256_NI
@@ -375,55 +342,92 @@ static const struct x86_cpu_id module_cpu_ids[] = {
};
MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);

-static int register_sha256_ni(void)
-{
- if (boot_cpu_has(X86_FEATURE_SHA_NI))
- return crypto_register_shashes(sha256_ni_algs,
- ARRAY_SIZE(sha256_ni_algs));
- return 0;
-}
-
static void unregister_sha256_ni(void)
{
- if (boot_cpu_has(X86_FEATURE_SHA_NI))
+ if (sha256_ni_registered) {
crypto_unregister_shashes(sha256_ni_algs,
ARRAY_SIZE(sha256_ni_algs));
+ sha256_ni_registered = 0;
+ }
}

#else
-static inline int register_sha256_ni(void) { return 0; }
static inline void unregister_sha256_ni(void) { }
#endif

static int __init sha256_ssse3_mod_init(void)
{
- if (!x86_match_cpu(module_cpu_ids))
+ const char *feature_name;
+ const char *driver_name = NULL;
+ const char *driver_name2 = NULL;
+ int ret;
+
+ if (!x86_match_cpu(module_cpu_ids)) {
+ pr_info("CPU-optimized crypto module not loaded, required CPU features (SSSE3, AVX, AVX2, or SHA-NI) not supported\n");
return -ENODEV;
+ }

- if (register_sha256_ssse3())
- goto fail;
+ /* SHA-NI */
+ if (boot_cpu_has(X86_FEATURE_SHA_NI)) {

- if (register_sha256_avx()) {
- unregister_sha256_ssse3();
- goto fail;
- }
+ ret = crypto_register_shashes(sha256_ni_algs,
+ ARRAY_SIZE(sha256_ni_algs));
+ if (!ret) {
+ sha256_ni_registered = 1;
+ driver_name = sha256_ni_algs[0].base.cra_driver_name;
+ driver_name2 = sha256_ni_algs[1].base.cra_driver_name;
+ }

- if (register_sha256_avx2()) {
- unregister_sha256_avx();
- unregister_sha256_ssse3();
- goto fail;
- }
+ /* AVX2 */
+ } else if (boot_cpu_has(X86_FEATURE_AVX2)) {
+
+ if (boot_cpu_has(X86_FEATURE_BMI2)) {
+ ret = crypto_register_shashes(sha256_avx2_algs,
+ ARRAY_SIZE(sha256_avx2_algs));
+ if (!ret) {
+ sha256_avx2_registered = 1;
+ driver_name = sha256_avx2_algs[0].base.cra_driver_name;
+ driver_name2 = sha256_avx2_algs[1].base.cra_driver_name;
+ }
+ } else {
+ pr_info("AVX2-optimized version not engaged, all required CPU features (AVX2, BMI2) not supported\n");
+ }

- if (register_sha256_ni()) {
- unregister_sha256_avx2();
- unregister_sha256_avx();
- unregister_sha256_ssse3();
- goto fail;
+ /* AVX */
+ } else if (boot_cpu_has(X86_FEATURE_AVX)) {
+
+ if (cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM,
+ &feature_name)) {
+ ret = crypto_register_shashes(sha256_avx_algs,
+ ARRAY_SIZE(sha256_avx_algs));
+ if (!ret) {
+ sha256_avx_registered = 1;
+ driver_name = sha256_avx_algs[0].base.cra_driver_name;
+ driver_name2 = sha256_avx_algs[1].base.cra_driver_name;
+ }
+ } else {
+ pr_info("AVX-optimized version not engaged, CPU extended feature '%s' is not supported\n",
+ feature_name);
+ }
+
+ /* SSSE3 */
+ } else if (boot_cpu_has(X86_FEATURE_SSSE3)) {
+ ret = crypto_register_shashes(sha256_ssse3_algs,
+ ARRAY_SIZE(sha256_ssse3_algs));
+ if (!ret) {
+ sha256_ssse3_registered = 1;
+ driver_name = sha256_ssse3_algs[0].base.cra_driver_name;
+ driver_name2 = sha256_ssse3_algs[1].base.cra_driver_name;
+ }
}

+ pr_info("CPU-optimized crypto module loaded (SSSE3=%s, AVX=%s, AVX2=%s, SHA-NI=%s): drivers=%s, %s\n",
+ sha256_ssse3_registered ? "yes" : "no",
+ sha256_avx_registered ? "yes" : "no",
+ sha256_avx2_registered ? "yes" : "no",
+ sha256_ni_registered ? "yes" : "no",
+ driver_name, driver_name2);
return 0;
-fail:
- return -ENODEV;
}

static void __exit sha256_ssse3_mod_fini(void)
diff --git a/arch/x86/crypto/sha512_ssse3_glue.c b/arch/x86/crypto/sha512_ssse3_glue.c
index fd5075a32613..df9f8207cc79 100644
--- a/arch/x86/crypto/sha512_ssse3_glue.c
+++ b/arch/x86/crypto/sha512_ssse3_glue.c
@@ -149,33 +149,21 @@ static struct shash_alg sha512_ssse3_algs[] = { {
}
} };

-static int register_sha512_ssse3(void)
-{
- if (boot_cpu_has(X86_FEATURE_SSSE3))
- return crypto_register_shashes(sha512_ssse3_algs,
- ARRAY_SIZE(sha512_ssse3_algs));
- return 0;
-}
+static bool sha512_ssse3_registered;
+static bool sha512_avx_registered;
+static bool sha512_avx2_registered;

static void unregister_sha512_ssse3(void)
{
- if (boot_cpu_has(X86_FEATURE_SSSE3))
+ if (sha512_ssse3_registered) {
crypto_unregister_shashes(sha512_ssse3_algs,
ARRAY_SIZE(sha512_ssse3_algs));
+ sha512_ssse3_registered = 0;
+ }
}

asmlinkage void sha512_transform_avx(struct sha512_state *state,
const u8 *data, int blocks);
-static bool avx_usable(void)
-{
- if (!cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM, NULL)) {
- if (boot_cpu_has(X86_FEATURE_AVX))
- pr_info("AVX detected but unusable.\n");
- return false;
- }
-
- return true;
-}

static int sha512_avx_update(struct shash_desc *desc, const u8 *data,
unsigned int len)
@@ -225,19 +213,13 @@ static struct shash_alg sha512_avx_algs[] = { {
}
} };

-static int register_sha512_avx(void)
-{
- if (avx_usable())
- return crypto_register_shashes(sha512_avx_algs,
- ARRAY_SIZE(sha512_avx_algs));
- return 0;
-}
-
static void unregister_sha512_avx(void)
{
- if (avx_usable())
+ if (sha512_avx_registered) {
crypto_unregister_shashes(sha512_avx_algs,
ARRAY_SIZE(sha512_avx_algs));
+ sha512_avx_registered = 0;
+ }
}

asmlinkage void sha512_transform_rorx(struct sha512_state *state,
@@ -291,22 +273,6 @@ static struct shash_alg sha512_avx2_algs[] = { {
}
} };

-static bool avx2_usable(void)
-{
- if (avx_usable() && boot_cpu_has(X86_FEATURE_AVX2) &&
- boot_cpu_has(X86_FEATURE_BMI2))
- return true;
-
- return false;
-}
-
-static int register_sha512_avx2(void)
-{
- if (avx2_usable())
- return crypto_register_shashes(sha512_avx2_algs,
- ARRAY_SIZE(sha512_avx2_algs));
- return 0;
-}
static const struct x86_cpu_id module_cpu_ids[] = {
X86_MATCH_FEATURE(X86_FEATURE_AVX2, NULL),
X86_MATCH_FEATURE(X86_FEATURE_AVX, NULL),
@@ -317,33 +283,73 @@ MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);

static void unregister_sha512_avx2(void)
{
- if (avx2_usable())
+ if (sha512_avx2_registered) {
crypto_unregister_shashes(sha512_avx2_algs,
ARRAY_SIZE(sha512_avx2_algs));
+ sha512_avx2_registered = 0;
+ }
}

static int __init sha512_ssse3_mod_init(void)
{
- if (!x86_match_cpu(module_cpu_ids))
+ const char *feature_name;
+ const char *driver_name = NULL;
+ const char *driver_name2 = NULL;
+ int ret;
+
+ if (!x86_match_cpu(module_cpu_ids)) {
+ pr_info("CPU-optimized crypto module not loaded, required CPU features (SSSE3, AVX, or AVX2) not supported\n");
return -ENODEV;
+ }

- if (register_sha512_ssse3())
- goto fail;
+ /* AVX2 */
+ if (boot_cpu_has(X86_FEATURE_AVX2)) {
+ if (boot_cpu_has(X86_FEATURE_BMI2)) {
+ ret = crypto_register_shashes(sha512_avx2_algs,
+ ARRAY_SIZE(sha512_avx2_algs));
+ if (!ret) {
+ sha512_avx2_registered = 1;
+ driver_name = sha512_avx2_algs[0].base.cra_driver_name;
+ driver_name2 = sha512_avx2_algs[1].base.cra_driver_name;
+ }
+ } else {
+ pr_info("AVX2-optimized version not engaged, all required CPU features (AVX2, BMI2) not supported\n");
+ }

- if (register_sha512_avx()) {
- unregister_sha512_ssse3();
- goto fail;
- }
+ /* AVX */
+ } else if (boot_cpu_has(X86_FEATURE_AVX)) {
+
+ if (cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM,
+ &feature_name)) {
+ ret = crypto_register_shashes(sha512_avx_algs,
+ ARRAY_SIZE(sha512_avx_algs));
+ if (!ret) {
+ sha512_avx_registered = 1;
+ driver_name = sha512_avx_algs[0].base.cra_driver_name;
+ driver_name2 = sha512_avx_algs[1].base.cra_driver_name;
+ }
+ } else {
+ pr_info("AVX-optimized version not engaged, CPU extended feature '%s' is not supported\n",
+ feature_name);
+ }

- if (register_sha512_avx2()) {
- unregister_sha512_avx();
- unregister_sha512_ssse3();
- goto fail;
+ /* SSSE3 */
+ } else if (boot_cpu_has(X86_FEATURE_SSSE3)) {
+ ret = crypto_register_shashes(sha512_ssse3_algs,
+ ARRAY_SIZE(sha512_ssse3_algs));
+ if (!ret) {
+ sha512_ssse3_registered = 1;
+ driver_name = sha512_ssse3_algs[0].base.cra_driver_name;
+ driver_name2 = sha512_ssse3_algs[1].base.cra_driver_name;
+ }
}

+ pr_info("CPU-optimized crypto module loaded (SSSE3=%s, AVX=%s, AVX2=%s): drivers=%s, %s\n",
+ sha512_ssse3_registered ? "yes" : "no",
+ sha512_avx_registered ? "yes" : "no",
+ sha512_avx2_registered ? "yes" : "no",
+ driver_name, driver_name2);
return 0;
-fail:
- return -ENODEV;
}

static void __exit sha512_ssse3_mod_fini(void)
--
2.37.3

Subject: [PATCH v2 05/19] crypto: x86/crc - limit FPU preemption

As done by the ECB and CBC helpers in arch/x86/crypto/ecb_cbc_helpers.h,
limit the number of bytes processed between kernel_fpu_begin() and
kernel_fpu_end() calls.

Those functions call preempt_disable() and preempt_enable(), so
the CPU core is unavailable for scheduling while the FPU section
runs, leading to:
rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: ...
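
The pattern applied in each driver looks roughly like this sketch
(FPU_BYTES matches the 4 KiB limit used below; do_crc_hw() is a
stand-in for the driver-specific assembly helper, not a real symbol):

	#define FPU_BYTES 4096U	/* max bytes per kernel_fpu_begin/end section */

	while (len) {
		unsigned int chunk = min(len, FPU_BYTES);

		kernel_fpu_begin();
		crc = do_crc_hw(crc, data, chunk);	/* driver-specific helper */
		kernel_fpu_end();

		data += chunk;
		len -= chunk;
	}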

Fixes: 78c37d191dd6 ("crypto: crc32 - add crc32 pclmulqdq implementation and wrappers for table implementation")
Fixes: 6a8ce1ef3940 ("crypto: crc32c - Optimize CRC32C calculation with PCLMULQDQ instruction")
Fixes: 0b95a7f85718 ("crypto: crct10dif - Glue code to cast accelerated CRCT10DIF assembly as a crypto transform")
Suggested-by: Herbert Xu <[email protected]>
Signed-off-by: Robert Elliott <[email protected]>
---
arch/x86/crypto/crc32-pclmul_asm.S | 6 ++--
arch/x86/crypto/crc32-pclmul_glue.c | 19 ++++++++----
arch/x86/crypto/crc32c-intel_glue.c | 29 ++++++++++++++----
arch/x86/crypto/crct10dif-pclmul_glue.c | 39 ++++++++++++++++++++-----
4 files changed, 71 insertions(+), 22 deletions(-)

diff --git a/arch/x86/crypto/crc32-pclmul_asm.S b/arch/x86/crypto/crc32-pclmul_asm.S
index ca53e96996ac..9abd861636c3 100644
--- a/arch/x86/crypto/crc32-pclmul_asm.S
+++ b/arch/x86/crypto/crc32-pclmul_asm.S
@@ -72,15 +72,15 @@
.text
/**
* Calculate crc32
- * BUF - buffer (16 bytes aligned)
- * LEN - sizeof buffer (16 bytes aligned), LEN should be grater than 63
+ * BUF - buffer - must be 16 bytes aligned
+ * LEN - sizeof buffer - must be multiple of 16 bytes and greater than 63
* CRC - initial crc32
* return %eax crc32
* uint crc32_pclmul_le_16(unsigned char const *buffer,
* size_t len, uint crc32)
*/

-SYM_FUNC_START(crc32_pclmul_le_16) /* buffer and buffer size are 16 bytes aligned */
+SYM_FUNC_START(crc32_pclmul_le_16)
movdqa (BUF), %xmm1
movdqa 0x10(BUF), %xmm2
movdqa 0x20(BUF), %xmm3
diff --git a/arch/x86/crypto/crc32-pclmul_glue.c b/arch/x86/crypto/crc32-pclmul_glue.c
index 98cf3b4e4c9f..38539c6edfe5 100644
--- a/arch/x86/crypto/crc32-pclmul_glue.c
+++ b/arch/x86/crypto/crc32-pclmul_glue.c
@@ -46,6 +46,8 @@
#define SCALE_F 16L /* size of xmm register */
#define SCALE_F_MASK (SCALE_F - 1)

+#define FPU_BYTES 4096U /* avoid kernel_fpu_begin/end scheduler/rcu stalls */
+
u32 crc32_pclmul_le_16(unsigned char const *buffer, size_t len, u32 crc32);

static u32 __attribute__((pure))
@@ -70,12 +72,19 @@ static u32 __attribute__((pure))
iquotient = len & (~SCALE_F_MASK);
iremainder = len & SCALE_F_MASK;

- kernel_fpu_begin();
- crc = crc32_pclmul_le_16(p, iquotient, crc);
- kernel_fpu_end();
+ do {
+ unsigned int chunk = min(iquotient, FPU_BYTES);
+
+ kernel_fpu_begin();
+ crc = crc32_pclmul_le_16(p, chunk, crc);
+ kernel_fpu_end();
+
+ iquotient -= chunk;
+ p += chunk;
+ } while (iquotient >= PCLMUL_MIN_LEN);

- if (iremainder)
- crc = crc32_le(crc, p + iquotient, iremainder);
+ if (iquotient || iremainder)
+ crc = crc32_le(crc, p, iquotient + iremainder);

return crc;
}
diff --git a/arch/x86/crypto/crc32c-intel_glue.c b/arch/x86/crypto/crc32c-intel_glue.c
index feccb5254c7e..ece620227057 100644
--- a/arch/x86/crypto/crc32c-intel_glue.c
+++ b/arch/x86/crypto/crc32c-intel_glue.c
@@ -41,6 +41,8 @@
*/
#define CRC32C_PCL_BREAKEVEN 512

+#define FPU_BYTES 4096U /* avoid kernel_fpu_begin/end scheduler/rcu stalls */
+
asmlinkage unsigned int crc_pcl(const u8 *buffer, int len,
unsigned int crc_init);
#endif /* CONFIG_X86_64 */
@@ -158,9 +160,16 @@ static int crc32c_pcl_intel_update(struct shash_desc *desc, const u8 *data,
* overcome kernel fpu state save/restore overhead
*/
if (len >= CRC32C_PCL_BREAKEVEN && crypto_simd_usable()) {
- kernel_fpu_begin();
- *crcp = crc_pcl(data, len, *crcp);
- kernel_fpu_end();
+ do {
+ unsigned int chunk = min(len, FPU_BYTES);
+
+ kernel_fpu_begin();
+ *crcp = crc_pcl(data, chunk, *crcp);
+ kernel_fpu_end();
+
+ len -= chunk;
+ data += chunk;
+ } while (len);
} else
*crcp = crc32c_intel_le_hw(*crcp, data, len);
return 0;
@@ -170,9 +179,17 @@ static int __crc32c_pcl_intel_finup(u32 *crcp, const u8 *data, unsigned int len,
u8 *out)
{
if (len >= CRC32C_PCL_BREAKEVEN && crypto_simd_usable()) {
- kernel_fpu_begin();
- *(__le32 *)out = ~cpu_to_le32(crc_pcl(data, len, *crcp));
- kernel_fpu_end();
+ do {
+ unsigned int chunk = min(len, FPU_BYTES);
+
+ kernel_fpu_begin();
+ *crcp = crc_pcl(data, chunk, *crcp);
+ kernel_fpu_end();
+
+ len -= chunk;
+ data += chunk;
+ } while (len);
+ *(__le32 *)out = ~cpu_to_le32(*crcp);
} else
*(__le32 *)out =
~cpu_to_le32(crc32c_intel_le_hw(*crcp, data, len));
diff --git a/arch/x86/crypto/crct10dif-pclmul_glue.c b/arch/x86/crypto/crct10dif-pclmul_glue.c
index 71291d5af9f4..54a537fc88ee 100644
--- a/arch/x86/crypto/crct10dif-pclmul_glue.c
+++ b/arch/x86/crypto/crct10dif-pclmul_glue.c
@@ -34,6 +34,10 @@
#include <asm/cpu_device_id.h>
#include <asm/simd.h>

+#define PCLMUL_MIN_LEN 16U /* minimum size of buffer for crc_t10dif_pcl */
+
+#define FPU_BYTES 4096U /* avoid kernel_fpu_begin/end scheduler/rcu stalls */
+
asmlinkage u16 crc_t10dif_pcl(u16 init_crc, const u8 *buf, size_t len);

struct chksum_desc_ctx {
@@ -54,10 +58,19 @@ static int chksum_update(struct shash_desc *desc, const u8 *data,
{
struct chksum_desc_ctx *ctx = shash_desc_ctx(desc);

- if (length >= 16 && crypto_simd_usable()) {
- kernel_fpu_begin();
- ctx->crc = crc_t10dif_pcl(ctx->crc, data, length);
- kernel_fpu_end();
+ if (length >= PCLMUL_MIN_LEN && crypto_simd_usable()) {
+ do {
+ unsigned int chunk = min(length, FPU_BYTES);
+
+ kernel_fpu_begin();
+ ctx->crc = crc_t10dif_pcl(ctx->crc, data, chunk);
+ kernel_fpu_end();
+
+ length -= chunk;
+ data += chunk;
+ } while (length >= PCLMUL_MIN_LEN);
+ if (length)
+ ctx->crc = crc_t10dif_generic(ctx->crc, data, length);
} else
ctx->crc = crc_t10dif_generic(ctx->crc, data, length);
return 0;
@@ -73,10 +86,20 @@ static int chksum_final(struct shash_desc *desc, u8 *out)

static int __chksum_finup(__u16 crc, const u8 *data, unsigned int len, u8 *out)
{
- if (len >= 16 && crypto_simd_usable()) {
- kernel_fpu_begin();
- *(__u16 *)out = crc_t10dif_pcl(crc, data, len);
- kernel_fpu_end();
+ if (len >= PCLMUL_MIN_LEN && crypto_simd_usable()) {
+ do {
+ unsigned int chunk = min(len, FPU_BYTES);
+
+ kernel_fpu_begin();
+ crc = crc_t10dif_pcl(crc, data, chunk);
+ kernel_fpu_end();
+
+ len -= chunk;
+ data += chunk;
+ } while (len >= PCLMUL_MIN_LEN);
+ if (len)
+ crc = crc_t10dif_generic(crc, data, len);
+ *(__u16 *)out = crc;
} else
*(__u16 *)out = crc_t10dif_generic(crc, data, len);
return 0;
--
2.37.3

Subject: [PATCH v2 04/19] crypto: x86/sha - limit FPU preemption

As done by the ECB and CBC helpers in arch/x86/crypto/ecb_cbc_helpers.h,
limit the number of bytes processed between kernel_fpu_begin() and
kernel_fpu_end() calls.

Those functions call preempt_disable() and preempt_enable(), so
the CPU core is unavailable for scheduling while running.

This leads to "rcu_preempt detected expedited stalls" with stack dumps
pointing to the optimized hash function if the module is loaded and
used a lot:
rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: ...

For example, that can occur during boot with the stack trace pointing
to the sha512-x86 function if the system is set to use SHA-512 for
module signing. The call trace includes:
module_sig_check
mod_verify_sig
pkcs7_verify
pkcs7_digest
sha512_finup
sha512_base_do_update

Fixes: 66be89515888 ("crypto: sha1 - SSSE3 based SHA1 implementation for x86-64")
Fixes: 8275d1aa6422 ("crypto: sha256 - Create module providing optimized SHA256 routines using SSSE3, AVX or AVX2 instructions.")
Fixes: 87de4579f92d ("crypto: sha512 - Create module providing optimized SHA512 routines using SSSE3, AVX or AVX2 instructions.")
Fixes: aa031b8f702e ("crypto: x86/sha512 - load based on CPU features")
Suggested-by: Herbert Xu <[email protected]>
Reviewed-by: Tim Chen <[email protected]>
Signed-off-by: Robert Elliott <[email protected]>
---
arch/x86/crypto/sha1_ssse3_glue.c | 32 ++++++++++++++++++++++++-----
arch/x86/crypto/sha256_ssse3_glue.c | 32 ++++++++++++++++++++++++-----
arch/x86/crypto/sha512_ssse3_glue.c | 32 ++++++++++++++++++++++++-----
3 files changed, 81 insertions(+), 15 deletions(-)

diff --git a/arch/x86/crypto/sha1_ssse3_glue.c b/arch/x86/crypto/sha1_ssse3_glue.c
index 44340a1139e0..a9f5779b41ca 100644
--- a/arch/x86/crypto/sha1_ssse3_glue.c
+++ b/arch/x86/crypto/sha1_ssse3_glue.c
@@ -26,6 +26,8 @@
#include <crypto/sha1_base.h>
#include <asm/simd.h>

+#define FPU_BYTES 4096U /* avoid kernel_fpu_begin/end scheduler/rcu stalls */
+
static int sha1_update(struct shash_desc *desc, const u8 *data,
unsigned int len, sha1_block_fn *sha1_xform)
{
@@ -41,9 +43,18 @@ static int sha1_update(struct shash_desc *desc, const u8 *data,
*/
BUILD_BUG_ON(offsetof(struct sha1_state, state) != 0);

- kernel_fpu_begin();
- sha1_base_do_update(desc, data, len, sha1_xform);
- kernel_fpu_end();
+ do {
+ unsigned int chunk = min(len, FPU_BYTES);
+
+ if (chunk) {
+ kernel_fpu_begin();
+ sha1_base_do_update(desc, data, chunk, sha1_xform);
+ kernel_fpu_end();
+ }
+
+ len -= chunk;
+ data += chunk;
+ } while (len);

return 0;
}
@@ -54,9 +65,20 @@ static int sha1_finup(struct shash_desc *desc, const u8 *data,
if (!crypto_simd_usable())
return crypto_sha1_finup(desc, data, len, out);

+ do {
+ unsigned int chunk = min(len, FPU_BYTES);
+
+ if (chunk) {
+ kernel_fpu_begin();
+ sha1_base_do_update(desc, data, chunk, sha1_xform);
+ kernel_fpu_end();
+ }
+
+ len -= chunk;
+ data += chunk;
+ } while (len);
+
kernel_fpu_begin();
- if (len)
- sha1_base_do_update(desc, data, len, sha1_xform);
sha1_base_do_finalize(desc, sha1_xform);
kernel_fpu_end();

diff --git a/arch/x86/crypto/sha256_ssse3_glue.c b/arch/x86/crypto/sha256_ssse3_glue.c
index 3a5f6be7dbba..322c8aa907af 100644
--- a/arch/x86/crypto/sha256_ssse3_glue.c
+++ b/arch/x86/crypto/sha256_ssse3_glue.c
@@ -40,6 +40,8 @@
#include <linux/string.h>
#include <asm/simd.h>

+#define FPU_BYTES 4096U /* avoid kernel_fpu_begin/end scheduler/rcu stalls */
+
asmlinkage void sha256_transform_ssse3(struct sha256_state *state,
const u8 *data, int blocks);

@@ -58,9 +60,18 @@ static int _sha256_update(struct shash_desc *desc, const u8 *data,
*/
BUILD_BUG_ON(offsetof(struct sha256_state, state) != 0);

- kernel_fpu_begin();
- sha256_base_do_update(desc, data, len, sha256_xform);
- kernel_fpu_end();
+ do {
+ unsigned int chunk = min(len, FPU_BYTES);
+
+ if (chunk) {
+ kernel_fpu_begin();
+ sha256_base_do_update(desc, data, chunk, sha256_xform);
+ kernel_fpu_end();
+ }
+
+ len -= chunk;
+ data += chunk;
+ } while (len);

return 0;
}
@@ -71,9 +82,20 @@ static int sha256_finup(struct shash_desc *desc, const u8 *data,
if (!crypto_simd_usable())
return crypto_sha256_finup(desc, data, len, out);

+ do {
+ unsigned int chunk = min(len, FPU_BYTES);
+
+ if (chunk) {
+ kernel_fpu_begin();
+ sha256_base_do_update(desc, data, chunk, sha256_xform);
+ kernel_fpu_end();
+ }
+
+ len -= chunk;
+ data += chunk;
+ } while (len);
+
kernel_fpu_begin();
- if (len)
- sha256_base_do_update(desc, data, len, sha256_xform);
sha256_base_do_finalize(desc, sha256_xform);
kernel_fpu_end();

diff --git a/arch/x86/crypto/sha512_ssse3_glue.c b/arch/x86/crypto/sha512_ssse3_glue.c
index 6d3b85e53d0e..fd5075a32613 100644
--- a/arch/x86/crypto/sha512_ssse3_glue.c
+++ b/arch/x86/crypto/sha512_ssse3_glue.c
@@ -39,6 +39,8 @@
#include <asm/cpu_device_id.h>
#include <asm/simd.h>

+#define FPU_BYTES 4096U /* avoid kernel_fpu_begin/end scheduler/rcu stalls */
+
asmlinkage void sha512_transform_ssse3(struct sha512_state *state,
const u8 *data, int blocks);

@@ -57,9 +59,18 @@ static int sha512_update(struct shash_desc *desc, const u8 *data,
*/
BUILD_BUG_ON(offsetof(struct sha512_state, state) != 0);

- kernel_fpu_begin();
- sha512_base_do_update(desc, data, len, sha512_xform);
- kernel_fpu_end();
+ do {
+ unsigned int chunk = min(len, FPU_BYTES);
+
+ if (chunk) {
+ kernel_fpu_begin();
+ sha512_base_do_update(desc, data, chunk, sha512_xform);
+ kernel_fpu_end();
+ }
+
+ len -= chunk;
+ data += chunk;
+ } while (len);

return 0;
}
@@ -70,9 +81,20 @@ static int sha512_finup(struct shash_desc *desc, const u8 *data,
if (!crypto_simd_usable())
return crypto_sha512_finup(desc, data, len, out);

+ do {
+ unsigned int chunk = min(len, FPU_BYTES);
+
+ if (chunk) {
+ kernel_fpu_begin();
+ sha512_base_do_update(desc, data, chunk, sha512_xform);
+ kernel_fpu_end();
+ }
+
+ len -= chunk;
+ data += chunk;
+ } while (len);
+
kernel_fpu_begin();
- if (len)
- sha512_base_do_update(desc, data, len, sha512_xform);
sha512_base_do_finalize(desc, sha512_xform);
kernel_fpu_end();

--
2.37.3

Subject: [PATCH v2 15/19] crypto: x86 - add pr_fmt to all modules

Add pr_fmt to all the modules so prints are prefixed by the Kconfig
system module name (which is usually similar to, but not an exact
match for, the runtime module name).
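
For example, with this change a pr_info("loaded\n") call in
aesni-intel_glue.c is printed as "aesni_intel: loaded" (the message
text here is only illustrative).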

Signed-off-by: Robert Elliott <[email protected]>
---
arch/x86/crypto/aegis128-aesni-glue.c | 2 ++
arch/x86/crypto/aesni-intel_glue.c | 2 ++
arch/x86/crypto/aria_aesni_avx_glue.c | 2 ++
arch/x86/crypto/blake2s-glue.c | 2 ++
arch/x86/crypto/blowfish_glue.c | 2 ++
arch/x86/crypto/camellia_aesni_avx2_glue.c | 2 ++
arch/x86/crypto/camellia_aesni_avx_glue.c | 2 ++
arch/x86/crypto/camellia_glue.c | 3 +++
arch/x86/crypto/cast5_avx_glue.c | 2 ++
arch/x86/crypto/cast6_avx_glue.c | 2 ++
arch/x86/crypto/chacha_glue.c | 2 ++
arch/x86/crypto/crc32-pclmul_glue.c | 3 +++
arch/x86/crypto/crc32c-intel_glue.c | 3 +++
arch/x86/crypto/crct10dif-pclmul_glue.c | 2 ++
arch/x86/crypto/curve25519-x86_64.c | 2 ++
arch/x86/crypto/des3_ede_glue.c | 2 ++
arch/x86/crypto/ghash-clmulni-intel_glue.c | 2 ++
arch/x86/crypto/nhpoly1305-avx2-glue.c | 2 ++
arch/x86/crypto/nhpoly1305-sse2-glue.c | 2 ++
arch/x86/crypto/poly1305_glue.c | 2 ++
arch/x86/crypto/polyval-clmulni_glue.c | 2 ++
arch/x86/crypto/serpent_avx2_glue.c | 2 ++
arch/x86/crypto/serpent_avx_glue.c | 2 ++
arch/x86/crypto/serpent_sse2_glue.c | 2 ++
arch/x86/crypto/sha256_ssse3_glue.c | 1 -
arch/x86/crypto/sm4_aesni_avx2_glue.c | 2 ++
arch/x86/crypto/sm4_aesni_avx_glue.c | 2 ++
arch/x86/crypto/twofish_avx_glue.c | 2 ++
arch/x86/crypto/twofish_glue.c | 2 ++
arch/x86/crypto/twofish_glue_3way.c | 2 ++
30 files changed, 61 insertions(+), 1 deletion(-)

diff --git a/arch/x86/crypto/aegis128-aesni-glue.c b/arch/x86/crypto/aegis128-aesni-glue.c
index 9e4ba031704d..122bfd04ee47 100644
--- a/arch/x86/crypto/aegis128-aesni-glue.c
+++ b/arch/x86/crypto/aegis128-aesni-glue.c
@@ -7,6 +7,8 @@
* Copyright (C) 2017-2018 Red Hat, Inc. All rights reserved.
*/

+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
#include <crypto/internal/aead.h>
#include <crypto/internal/simd.h>
#include <crypto/internal/skcipher.h>
diff --git a/arch/x86/crypto/aesni-intel_glue.c b/arch/x86/crypto/aesni-intel_glue.c
index 4a530a558436..df93cb44b4eb 100644
--- a/arch/x86/crypto/aesni-intel_glue.c
+++ b/arch/x86/crypto/aesni-intel_glue.c
@@ -15,6 +15,8 @@
* Copyright (c) 2010, Intel Corporation.
*/

+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
#include <linux/hardirq.h>
#include <linux/types.h>
#include <linux/module.h>
diff --git a/arch/x86/crypto/aria_aesni_avx_glue.c b/arch/x86/crypto/aria_aesni_avx_glue.c
index c561ea4fefa5..589097728bd1 100644
--- a/arch/x86/crypto/aria_aesni_avx_glue.c
+++ b/arch/x86/crypto/aria_aesni_avx_glue.c
@@ -5,6 +5,8 @@
* Copyright (c) 2022 Taehee Yoo <[email protected]>
*/

+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
#include <crypto/algapi.h>
#include <crypto/internal/simd.h>
#include <crypto/aria.h>
diff --git a/arch/x86/crypto/blake2s-glue.c b/arch/x86/crypto/blake2s-glue.c
index 5153bb423dbe..ac7fb7a9922b 100644
--- a/arch/x86/crypto/blake2s-glue.c
+++ b/arch/x86/crypto/blake2s-glue.c
@@ -3,6 +3,8 @@
* Copyright (C) 2015-2019 Jason A. Donenfeld <[email protected]>. All Rights Reserved.
*/

+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
#include <crypto/internal/blake2s.h>

#include <linux/types.h>
diff --git a/arch/x86/crypto/blowfish_glue.c b/arch/x86/crypto/blowfish_glue.c
index 4c0ead71b198..5cfcbb91c4ca 100644
--- a/arch/x86/crypto/blowfish_glue.c
+++ b/arch/x86/crypto/blowfish_glue.c
@@ -8,6 +8,8 @@
* Copyright (c) 2006 Herbert Xu <[email protected]>
*/

+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
#include <crypto/algapi.h>
#include <crypto/blowfish.h>
#include <crypto/internal/skcipher.h>
diff --git a/arch/x86/crypto/camellia_aesni_avx2_glue.c b/arch/x86/crypto/camellia_aesni_avx2_glue.c
index 8e3ac5be7cf6..851f2a29963c 100644
--- a/arch/x86/crypto/camellia_aesni_avx2_glue.c
+++ b/arch/x86/crypto/camellia_aesni_avx2_glue.c
@@ -5,6 +5,8 @@
* Copyright © 2013 Jussi Kivilinna <[email protected]>
*/

+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
#include <crypto/algapi.h>
#include <crypto/internal/simd.h>
#include <linux/crypto.h>
diff --git a/arch/x86/crypto/camellia_aesni_avx_glue.c b/arch/x86/crypto/camellia_aesni_avx_glue.c
index 54fcd86160ff..8846493c92fb 100644
--- a/arch/x86/crypto/camellia_aesni_avx_glue.c
+++ b/arch/x86/crypto/camellia_aesni_avx_glue.c
@@ -5,6 +5,8 @@
* Copyright © 2012-2013 Jussi Kivilinna <[email protected]>
*/

+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
#include <crypto/algapi.h>
#include <crypto/internal/simd.h>
#include <linux/crypto.h>
diff --git a/arch/x86/crypto/camellia_glue.c b/arch/x86/crypto/camellia_glue.c
index e21d2d5b68f9..3c14a904af00 100644
--- a/arch/x86/crypto/camellia_glue.c
+++ b/arch/x86/crypto/camellia_glue.c
@@ -8,6 +8,9 @@
* Copyright (C) 2006 NTT (Nippon Telegraph and Telephone Corporation)
*/

+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <asm/cpu_device_id.h>
#include <asm/unaligned.h>
#include <linux/crypto.h>
#include <linux/init.h>
diff --git a/arch/x86/crypto/cast5_avx_glue.c b/arch/x86/crypto/cast5_avx_glue.c
index bdc3c763334c..fdeec0849ab5 100644
--- a/arch/x86/crypto/cast5_avx_glue.c
+++ b/arch/x86/crypto/cast5_avx_glue.c
@@ -6,6 +6,8 @@
* <[email protected]>
*/

+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
#include <crypto/algapi.h>
#include <crypto/cast5.h>
#include <crypto/internal/simd.h>
diff --git a/arch/x86/crypto/cast6_avx_glue.c b/arch/x86/crypto/cast6_avx_glue.c
index addca34b3511..9258082408eb 100644
--- a/arch/x86/crypto/cast6_avx_glue.c
+++ b/arch/x86/crypto/cast6_avx_glue.c
@@ -8,6 +8,8 @@
* Copyright © 2013 Jussi Kivilinna <[email protected]>
*/

+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
#include <linux/module.h>
#include <linux/types.h>
#include <linux/crypto.h>
diff --git a/arch/x86/crypto/chacha_glue.c b/arch/x86/crypto/chacha_glue.c
index 7275cae3380d..8e5cadc808b4 100644
--- a/arch/x86/crypto/chacha_glue.c
+++ b/arch/x86/crypto/chacha_glue.c
@@ -6,6 +6,8 @@
* Copyright (C) 2015 Martin Willi
*/

+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
#include <crypto/algapi.h>
#include <crypto/internal/chacha.h>
#include <crypto/internal/simd.h>
diff --git a/arch/x86/crypto/crc32-pclmul_glue.c b/arch/x86/crypto/crc32-pclmul_glue.c
index d49a19dcee37..bc2b31b04e05 100644
--- a/arch/x86/crypto/crc32-pclmul_glue.c
+++ b/arch/x86/crypto/crc32-pclmul_glue.c
@@ -26,6 +26,9 @@
*
* Wrappers for kernel crypto shash api to pclmulqdq crc32 implementation.
*/
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
#include <linux/init.h>
#include <linux/module.h>
#include <linux/string.h>
diff --git a/arch/x86/crypto/crc32c-intel_glue.c b/arch/x86/crypto/crc32c-intel_glue.c
index 980c62929256..ebf530934a3e 100644
--- a/arch/x86/crypto/crc32c-intel_glue.c
+++ b/arch/x86/crypto/crc32c-intel_glue.c
@@ -11,6 +11,9 @@
* Authors: Austin Zhang <[email protected]>
* Kent Liu <[email protected]>
*/
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
#include <linux/init.h>
#include <linux/module.h>
#include <linux/string.h>
diff --git a/arch/x86/crypto/crct10dif-pclmul_glue.c b/arch/x86/crypto/crct10dif-pclmul_glue.c
index 3b8e9394c40d..03e35a1b7677 100644
--- a/arch/x86/crypto/crct10dif-pclmul_glue.c
+++ b/arch/x86/crypto/crct10dif-pclmul_glue.c
@@ -22,6 +22,8 @@
*
*/

+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
#include <linux/types.h>
#include <linux/module.h>
#include <linux/crc-t10dif.h>
diff --git a/arch/x86/crypto/curve25519-x86_64.c b/arch/x86/crypto/curve25519-x86_64.c
index 7fe395dfa79d..f9a1adb0c183 100644
--- a/arch/x86/crypto/curve25519-x86_64.c
+++ b/arch/x86/crypto/curve25519-x86_64.c
@@ -4,6 +4,8 @@
* Copyright (c) 2016-2020 INRIA, CMU and Microsoft Corporation
*/

+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
#include <crypto/curve25519.h>
#include <crypto/internal/kpp.h>

diff --git a/arch/x86/crypto/des3_ede_glue.c b/arch/x86/crypto/des3_ede_glue.c
index 168cac5c6ca6..83e686a6c2f3 100644
--- a/arch/x86/crypto/des3_ede_glue.c
+++ b/arch/x86/crypto/des3_ede_glue.c
@@ -8,6 +8,8 @@
* Copyright (c) 2006 Herbert Xu <[email protected]>
*/

+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
#include <crypto/algapi.h>
#include <crypto/des.h>
#include <crypto/internal/skcipher.h>
diff --git a/arch/x86/crypto/ghash-clmulni-intel_glue.c b/arch/x86/crypto/ghash-clmulni-intel_glue.c
index 69945e41bc41..3ad55144da48 100644
--- a/arch/x86/crypto/ghash-clmulni-intel_glue.c
+++ b/arch/x86/crypto/ghash-clmulni-intel_glue.c
@@ -7,6 +7,8 @@
* Author: Huang Ying <[email protected]>
*/

+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
#include <linux/err.h>
#include <linux/module.h>
#include <linux/init.h>
diff --git a/arch/x86/crypto/nhpoly1305-avx2-glue.c b/arch/x86/crypto/nhpoly1305-avx2-glue.c
index a8046334ddca..40f49107e5a9 100644
--- a/arch/x86/crypto/nhpoly1305-avx2-glue.c
+++ b/arch/x86/crypto/nhpoly1305-avx2-glue.c
@@ -6,6 +6,8 @@
* Copyright 2018 Google LLC
*/

+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
#include <crypto/internal/hash.h>
#include <crypto/internal/simd.h>
#include <crypto/nhpoly1305.h>
diff --git a/arch/x86/crypto/nhpoly1305-sse2-glue.c b/arch/x86/crypto/nhpoly1305-sse2-glue.c
index cdbe5df00927..bb40fed92c92 100644
--- a/arch/x86/crypto/nhpoly1305-sse2-glue.c
+++ b/arch/x86/crypto/nhpoly1305-sse2-glue.c
@@ -6,6 +6,8 @@
* Copyright 2018 Google LLC
*/

+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
#include <crypto/internal/hash.h>
#include <crypto/internal/simd.h>
#include <crypto/nhpoly1305.h>
diff --git a/arch/x86/crypto/poly1305_glue.c b/arch/x86/crypto/poly1305_glue.c
index 3e6ff505cd26..a2a7cb39cdec 100644
--- a/arch/x86/crypto/poly1305_glue.c
+++ b/arch/x86/crypto/poly1305_glue.c
@@ -3,6 +3,8 @@
* Copyright (C) 2015-2019 Jason A. Donenfeld <[email protected]>. All Rights Reserved.
*/

+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
#include <crypto/algapi.h>
#include <crypto/internal/hash.h>
#include <crypto/internal/poly1305.h>
diff --git a/arch/x86/crypto/polyval-clmulni_glue.c b/arch/x86/crypto/polyval-clmulni_glue.c
index 2502964afef6..5a345db20ca9 100644
--- a/arch/x86/crypto/polyval-clmulni_glue.c
+++ b/arch/x86/crypto/polyval-clmulni_glue.c
@@ -16,6 +16,8 @@
* operations.
*/

+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
#include <crypto/algapi.h>
#include <crypto/internal/hash.h>
#include <crypto/internal/simd.h>
diff --git a/arch/x86/crypto/serpent_avx2_glue.c b/arch/x86/crypto/serpent_avx2_glue.c
index 24741d33edaf..5944bf5ead2e 100644
--- a/arch/x86/crypto/serpent_avx2_glue.c
+++ b/arch/x86/crypto/serpent_avx2_glue.c
@@ -5,6 +5,8 @@
* Copyright © 2012-2013 Jussi Kivilinna <[email protected]>
*/

+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
#include <linux/module.h>
#include <linux/types.h>
#include <linux/crypto.h>
diff --git a/arch/x86/crypto/serpent_avx_glue.c b/arch/x86/crypto/serpent_avx_glue.c
index 0db18d99da50..45713c7a4cb9 100644
--- a/arch/x86/crypto/serpent_avx_glue.c
+++ b/arch/x86/crypto/serpent_avx_glue.c
@@ -8,6 +8,8 @@
* Copyright © 2011-2013 Jussi Kivilinna <[email protected]>
*/

+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
#include <linux/module.h>
#include <linux/types.h>
#include <linux/crypto.h>
diff --git a/arch/x86/crypto/serpent_sse2_glue.c b/arch/x86/crypto/serpent_sse2_glue.c
index 5288441cc223..d8aa0d3fbf15 100644
--- a/arch/x86/crypto/serpent_sse2_glue.c
+++ b/arch/x86/crypto/serpent_sse2_glue.c
@@ -12,6 +12,8 @@
* Copyright (c) 2006 Herbert Xu <[email protected]>
*/

+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
#include <linux/module.h>
#include <linux/types.h>
#include <linux/crypto.h>
diff --git a/arch/x86/crypto/sha256_ssse3_glue.c b/arch/x86/crypto/sha256_ssse3_glue.c
index 42e8cb1a6708..8a0fb308fbba 100644
--- a/arch/x86/crypto/sha256_ssse3_glue.c
+++ b/arch/x86/crypto/sha256_ssse3_glue.c
@@ -26,7 +26,6 @@
* SOFTWARE.
*/

-
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

#include <crypto/internal/hash.h>
diff --git a/arch/x86/crypto/sm4_aesni_avx2_glue.c b/arch/x86/crypto/sm4_aesni_avx2_glue.c
index 2e9fe76056b8..3fe9e170b880 100644
--- a/arch/x86/crypto/sm4_aesni_avx2_glue.c
+++ b/arch/x86/crypto/sm4_aesni_avx2_glue.c
@@ -8,6 +8,8 @@
* Copyright (c) 2021 Tianjia Zhang <[email protected]>
*/

+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
#include <linux/module.h>
#include <linux/crypto.h>
#include <linux/kernel.h>
diff --git a/arch/x86/crypto/sm4_aesni_avx_glue.c b/arch/x86/crypto/sm4_aesni_avx_glue.c
index f730822f203a..14ae012948ae 100644
--- a/arch/x86/crypto/sm4_aesni_avx_glue.c
+++ b/arch/x86/crypto/sm4_aesni_avx_glue.c
@@ -8,6 +8,8 @@
* Copyright (c) 2021 Tianjia Zhang <[email protected]>
*/

+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
#include <linux/module.h>
#include <linux/crypto.h>
#include <linux/kernel.h>
diff --git a/arch/x86/crypto/twofish_avx_glue.c b/arch/x86/crypto/twofish_avx_glue.c
index 4657e6efc35d..044e4f92e2c0 100644
--- a/arch/x86/crypto/twofish_avx_glue.c
+++ b/arch/x86/crypto/twofish_avx_glue.c
@@ -8,6 +8,8 @@
* Copyright © 2013 Jussi Kivilinna <[email protected]>
*/

+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
#include <linux/module.h>
#include <linux/types.h>
#include <linux/crypto.h>
diff --git a/arch/x86/crypto/twofish_glue.c b/arch/x86/crypto/twofish_glue.c
index ade98aef3402..031ed290c755 100644
--- a/arch/x86/crypto/twofish_glue.c
+++ b/arch/x86/crypto/twofish_glue.c
@@ -38,6 +38,8 @@
* Third Edition.
*/

+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
#include <crypto/twofish.h>
#include <linux/crypto.h>
#include <linux/init.h>
diff --git a/arch/x86/crypto/twofish_glue_3way.c b/arch/x86/crypto/twofish_glue_3way.c
index 790e5a59a9a7..7e2a18e3abe7 100644
--- a/arch/x86/crypto/twofish_glue_3way.c
+++ b/arch/x86/crypto/twofish_glue_3way.c
@@ -5,6 +5,8 @@
* Copyright (c) 2011 Jussi Kivilinna <[email protected]>
*/

+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
#include <crypto/algapi.h>
#include <crypto/twofish.h>
#include <linux/crypto.h>
--
2.37.3

Subject: [PATCH v2 18/19] crypto: x86 - standardize not loaded prints

Standardize the prints stating that additional required CPU features
are not present along with the main CPU features (e.g., OSXSAVE is
not present although AVX is).

Although modules are not supposed to print unless loaded and
active, these are existing exceptions.

Signed-off-by: Robert Elliott <[email protected]>
---
arch/x86/crypto/aegis128-aesni-glue.c | 4 +++-
arch/x86/crypto/aria_aesni_avx_glue.c | 4 ++--
arch/x86/crypto/camellia_aesni_avx2_glue.c | 5 +++--
arch/x86/crypto/camellia_aesni_avx_glue.c | 5 +++--
arch/x86/crypto/cast5_avx_glue.c | 3 ++-
arch/x86/crypto/cast6_avx_glue.c | 3 ++-
arch/x86/crypto/crc32-pclmul_glue.c | 4 +++-
arch/x86/crypto/nhpoly1305-avx2-glue.c | 4 +++-
arch/x86/crypto/serpent_avx2_glue.c | 8 +++++---
arch/x86/crypto/serpent_avx_glue.c | 3 ++-
arch/x86/crypto/sm3_avx_glue.c | 7 ++++---
arch/x86/crypto/sm4_aesni_avx2_glue.c | 5 +++--
arch/x86/crypto/sm4_aesni_avx_glue.c | 5 +++--
arch/x86/crypto/twofish_avx_glue.c | 3 ++-
14 files changed, 40 insertions(+), 23 deletions(-)

diff --git a/arch/x86/crypto/aegis128-aesni-glue.c b/arch/x86/crypto/aegis128-aesni-glue.c
index e8eaf79ef220..aa94b9f8703c 100644
--- a/arch/x86/crypto/aegis128-aesni-glue.c
+++ b/arch/x86/crypto/aegis128-aesni-glue.c
@@ -281,8 +281,10 @@ static int __init crypto_aegis128_aesni_module_init(void)

if (!boot_cpu_has(X86_FEATURE_XMM2) ||
!boot_cpu_has(X86_FEATURE_AES) ||
- !cpu_has_xfeatures(XFEATURE_MASK_SSE, NULL))
+ !cpu_has_xfeatures(XFEATURE_MASK_SSE, NULL)) {
+ pr_info("CPU-optimized crypto module not loaded, all required CPU features (SSE2, AESNI) not supported\n");
return -ENODEV;
+ }

ret = simd_register_aeads_compat(&crypto_aegis128_aesni_alg, 1,
&simd_alg);
diff --git a/arch/x86/crypto/aria_aesni_avx_glue.c b/arch/x86/crypto/aria_aesni_avx_glue.c
index d58fb995a266..24982450a125 100644
--- a/arch/x86/crypto/aria_aesni_avx_glue.c
+++ b/arch/x86/crypto/aria_aesni_avx_glue.c
@@ -176,13 +176,13 @@ static int __init aria_avx_init(void)
if (!boot_cpu_has(X86_FEATURE_AVX) ||
!boot_cpu_has(X86_FEATURE_AES) ||
!boot_cpu_has(X86_FEATURE_OSXSAVE)) {
- pr_info("AVX or AES-NI instructions are not detected.\n");
+ pr_info("CPU-optimized crypto module not loaded, all required CPU features (AVX, AES-NI, OSXSAVE) not supported\n");
return -ENODEV;
}

if (!cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM,
&feature_name)) {
- pr_info("CPU feature '%s' is not supported.\n", feature_name);
+ pr_info("CPU extended feature '%s' is not supported\n", feature_name);
return -ENODEV;
}

diff --git a/arch/x86/crypto/camellia_aesni_avx2_glue.c b/arch/x86/crypto/camellia_aesni_avx2_glue.c
index e6c4ed1e40d2..bc6862077984 100644
--- a/arch/x86/crypto/camellia_aesni_avx2_glue.c
+++ b/arch/x86/crypto/camellia_aesni_avx2_glue.c
@@ -123,13 +123,14 @@ static int __init camellia_aesni_init(void)
!boot_cpu_has(X86_FEATURE_AVX2) ||
!boot_cpu_has(X86_FEATURE_AES) ||
!boot_cpu_has(X86_FEATURE_OSXSAVE)) {
- pr_info("AVX2 or AES-NI instructions are not detected.\n");
+ pr_info("CPU-optimized crypto module not loaded, all required CPU features (AVX, AVX2, AESNI, OSXSAVE) not supported\n");
return -ENODEV;
}

if (!cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM,
&feature_name)) {
- pr_info("CPU feature '%s' is not supported.\n", feature_name);
+ pr_info("CPU-optimized crypto module not loaded, CPU extended feature '%s' is not supported\n",
+ feature_name);
return -ENODEV;
}

diff --git a/arch/x86/crypto/camellia_aesni_avx_glue.c b/arch/x86/crypto/camellia_aesni_avx_glue.c
index 6a9eadf0fe90..96e7e1accb6c 100644
--- a/arch/x86/crypto/camellia_aesni_avx_glue.c
+++ b/arch/x86/crypto/camellia_aesni_avx_glue.c
@@ -121,13 +121,14 @@ static int __init camellia_aesni_init(void)
if (!boot_cpu_has(X86_FEATURE_AVX) ||
!boot_cpu_has(X86_FEATURE_AES) ||
!boot_cpu_has(X86_FEATURE_OSXSAVE)) {
- pr_info("AVX or AES-NI instructions are not detected.\n");
+ pr_info("CPU-optimized crypto module not loaded, all required CPU features (AVX, AESNI, OSXSAVE) not supported\n");
return -ENODEV;
}

if (!cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM,
&feature_name)) {
- pr_info("CPU feature '%s' is not supported.\n", feature_name);
+ pr_info("CPU-optimized crypto module not loaded, CPU extended feature '%s' is not supported\n",
+ feature_name);
return -ENODEV;
}

diff --git a/arch/x86/crypto/cast5_avx_glue.c b/arch/x86/crypto/cast5_avx_glue.c
index b5ae17c3ac53..89650fffb550 100644
--- a/arch/x86/crypto/cast5_avx_glue.c
+++ b/arch/x86/crypto/cast5_avx_glue.c
@@ -114,7 +114,8 @@ static int __init cast5_init(void)

if (!cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM,
&feature_name)) {
- pr_info("CPU feature '%s' is not supported.\n", feature_name);
+ pr_info("CPU-optimized crypto module not loaded, CPU extended feature '%s' is not supported\n",
+ feature_name);
return -ENODEV;
}

diff --git a/arch/x86/crypto/cast6_avx_glue.c b/arch/x86/crypto/cast6_avx_glue.c
index d1c14a5f80d7..d69f62ac9553 100644
--- a/arch/x86/crypto/cast6_avx_glue.c
+++ b/arch/x86/crypto/cast6_avx_glue.c
@@ -114,7 +114,8 @@ static int __init cast6_init(void)

if (!cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM,
&feature_name)) {
- pr_info("CPU feature '%s' is not supported.\n", feature_name);
+ pr_info("CPU-optimized crypto module not loaded, CPU extended feature '%s' is not supported\n",
+ feature_name);
return -ENODEV;
}

diff --git a/arch/x86/crypto/crc32-pclmul_glue.c b/arch/x86/crypto/crc32-pclmul_glue.c
index c56d3d3ab0a0..4cf86f8f9428 100644
--- a/arch/x86/crypto/crc32-pclmul_glue.c
+++ b/arch/x86/crypto/crc32-pclmul_glue.c
@@ -192,8 +192,10 @@ static int __init crc32_pclmul_mod_init(void)
{
int ret;

- if (!x86_match_cpu(module_cpu_ids))
+ if (!x86_match_cpu(module_cpu_ids)) {
+ pr_info("CPU-optimized crypto module not loaded, required CPU feature (PCLMULQDQ) not supported\n");
return -ENODEV;
+ }

ret = crypto_register_shash(&alg);
if (!ret)
diff --git a/arch/x86/crypto/nhpoly1305-avx2-glue.c b/arch/x86/crypto/nhpoly1305-avx2-glue.c
index 2dc7b618771f..834bf64bb160 100644
--- a/arch/x86/crypto/nhpoly1305-avx2-glue.c
+++ b/arch/x86/crypto/nhpoly1305-avx2-glue.c
@@ -74,8 +74,10 @@ static int __init nhpoly1305_mod_init(void)
return -ENODEV;

if (!boot_cpu_has(X86_FEATURE_AVX2) ||
- !boot_cpu_has(X86_FEATURE_OSXSAVE))
+ !boot_cpu_has(X86_FEATURE_OSXSAVE)) {
+ pr_info("CPU-optimized crypto module not loaded, all required CPU features (AVX2, OSXSAVE) not supported\n");
return -ENODEV;
+ }

ret = crypto_register_shash(&nhpoly1305_alg);
if (!ret)
diff --git a/arch/x86/crypto/serpent_avx2_glue.c b/arch/x86/crypto/serpent_avx2_glue.c
index bf59addaf804..4bd59ccea69a 100644
--- a/arch/x86/crypto/serpent_avx2_glue.c
+++ b/arch/x86/crypto/serpent_avx2_glue.c
@@ -112,13 +112,15 @@ static int __init serpent_avx2_init(void)

return -ENODEV;

- if (!boot_cpu_has(X86_FEATURE_AVX2) || !boot_cpu_has(X86_FEATURE_OSXSAVE)) {
- pr_info("AVX2 instructions are not detected.\n");
+ if (!boot_cpu_has(X86_FEATURE_AVX2) ||
+ !boot_cpu_has(X86_FEATURE_OSXSAVE)) {
+ pr_info("CPU-optimized crypto module not loaded, all required CPU features (AVX2, OSXSAVE) not supported\n");
return -ENODEV;
}
if (!cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM,
&feature_name)) {
- pr_info("CPU feature '%s' is not supported.\n", feature_name);
+ pr_info("CPU-optimized crypto module not loaded, CPU extended feature '%s' is not supported\n",
+ feature_name);
return -ENODEV;
}

diff --git a/arch/x86/crypto/serpent_avx_glue.c b/arch/x86/crypto/serpent_avx_glue.c
index 7b0c02a61552..853b48677d2b 100644
--- a/arch/x86/crypto/serpent_avx_glue.c
+++ b/arch/x86/crypto/serpent_avx_glue.c
@@ -121,7 +121,8 @@ static int __init serpent_init(void)

if (!cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM,
&feature_name)) {
- pr_info("CPU feature '%s' is not supported.\n", feature_name);
+ pr_info("CPU-optimized crypto module not loaded, CPU extended feature '%s' is not supported\n",
+ feature_name);
return -ENODEV;
}

diff --git a/arch/x86/crypto/sm3_avx_glue.c b/arch/x86/crypto/sm3_avx_glue.c
index 532f07b05745..5250fee79147 100644
--- a/arch/x86/crypto/sm3_avx_glue.c
+++ b/arch/x86/crypto/sm3_avx_glue.c
@@ -131,18 +131,19 @@ static int __init sm3_avx_mod_init(void)
return -ENODEV;

if (!boot_cpu_has(X86_FEATURE_AVX)) {
- pr_info("AVX instruction are not detected.\n");
+ pr_info("CPU-optimized crypto module not loaded, required CPU feature (AVX) not supported\n");
return -ENODEV;
}

if (!boot_cpu_has(X86_FEATURE_BMI2)) {
- pr_info("BMI2 instruction are not detected.\n");
+ pr_info("CPU-optimized crypto module not loaded, required CPU feature (BMI2) not supported\n");
return -ENODEV;
}

if (!cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM,
&feature_name)) {
- pr_info("CPU feature '%s' is not supported.\n", feature_name);
+ pr_info("CPU-optimized crypto module not loaded, CPU extended feature '%s' is not supported\n",
+ feature_name);
return -ENODEV;
}

diff --git a/arch/x86/crypto/sm4_aesni_avx2_glue.c b/arch/x86/crypto/sm4_aesni_avx2_glue.c
index 42819ee5d36d..cdd7ca92ca61 100644
--- a/arch/x86/crypto/sm4_aesni_avx2_glue.c
+++ b/arch/x86/crypto/sm4_aesni_avx2_glue.c
@@ -152,13 +152,14 @@ static int __init sm4_init(void)
!boot_cpu_has(X86_FEATURE_AVX2) ||
!boot_cpu_has(X86_FEATURE_AES) ||
!boot_cpu_has(X86_FEATURE_OSXSAVE)) {
- pr_info("AVX2 or AES-NI instructions are not detected.\n");
+ pr_info("CPU-optimized crypto module not loaded, all required CPU features (AVX, AVX2, AESNI, OSXSAVE) not supported\n");
return -ENODEV;
}

if (!cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM,
&feature_name)) {
- pr_info("CPU feature '%s' is not supported.\n", feature_name);
+ pr_info("CPU-optimized crypto module not loaded, CPU extended feature '%s' is not supported\n",
+ feature_name);
return -ENODEV;
}

diff --git a/arch/x86/crypto/sm4_aesni_avx_glue.c b/arch/x86/crypto/sm4_aesni_avx_glue.c
index 8a25376d341f..a2ae3d1e0a4a 100644
--- a/arch/x86/crypto/sm4_aesni_avx_glue.c
+++ b/arch/x86/crypto/sm4_aesni_avx_glue.c
@@ -468,13 +468,14 @@ static int __init sm4_init(void)
if (!boot_cpu_has(X86_FEATURE_AVX) ||
!boot_cpu_has(X86_FEATURE_AES) ||
!boot_cpu_has(X86_FEATURE_OSXSAVE)) {
- pr_info("AVX or AES-NI instructions are not detected.\n");
+ pr_info("CPU-optimized crypto module not loaded, all required CPU features (AVX, AESNI, OSXSAVE) not supported\n");
return -ENODEV;
}

if (!cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM,
&feature_name)) {
- pr_info("CPU feature '%s' is not supported.\n", feature_name);
+ pr_info("CPU-optimized crypto module not loaded, CPU extended feature '%s' is not supported\n",
+ feature_name);
return -ENODEV;
}

diff --git a/arch/x86/crypto/twofish_avx_glue.c b/arch/x86/crypto/twofish_avx_glue.c
index ccf016bf6ef2..70167dd01816 100644
--- a/arch/x86/crypto/twofish_avx_glue.c
+++ b/arch/x86/crypto/twofish_avx_glue.c
@@ -123,7 +123,8 @@ static int __init twofish_init(void)
return -ENODEV;

if (!cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM, &feature_name)) {
- pr_info("CPU feature '%s' is not supported.\n", feature_name);
+ pr_info("CPU-optimized crypto module not loaded, CPU extended feature '%s' is not supported\n",
+ feature_name);
return -ENODEV;
}

--
2.37.3

2022-10-13 01:13:28

by Jason A. Donenfeld

Subject: Re: [PATCH v2 04/19] crypto: x86/sha - limit FPU preemption

On Wed, Oct 12, 2022 at 04:59:16PM -0500, Robert Elliott wrote:
> As done by the ECB and CBC helpers in arch/x86/crypto/ecb_cbc_helpers.h,
> limit the number of bytes processed between kernel_fpu_begin() and
> kernel_fpu_end() calls.
>
> Those functions call preempt_disable() and preempt_enable(), so
> the CPU core is unavailable for scheduling while running.
>
> This leads to "rcu_preempt detected expedited stalls" with stack dumps
> pointing to the optimized hash function if the module is loaded and
> used a lot:
> rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: ...
>
> For example, that can occur during boot with the stack trace pointing
> to the sha512-x86 function if the system is set to use SHA-512 for
> module signing. The call trace includes:
> module_sig_check
> mod_verify_sig
> pkcs7_verify
> pkcs7_digest
> sha512_finup
> sha512_base_do_update
>
> Fixes: 66be89515888 ("crypto: sha1 - SSSE3 based SHA1 implementation for x86-64")
> Fixes: 8275d1aa6422 ("crypto: sha256 - Create module providing optimized SHA256 routines using SSSE3, AVX or AVX2 instructions.")
> Fixes: 87de4579f92d ("crypto: sha512 - Create module providing optimized SHA512 routines using SSSE3, AVX or AVX2 instructions.")
> Fixes: aa031b8f702e ("crypto: x86/sha512 - load based on CPU features")
> Suggested-by: Herbert Xu <[email protected]>
> Reviewed-by: Tim Chen <[email protected]>
> Signed-off-by: Robert Elliott <[email protected]>
> ---
> arch/x86/crypto/sha1_ssse3_glue.c | 32 ++++++++++++++++++++++++-----
> arch/x86/crypto/sha256_ssse3_glue.c | 32 ++++++++++++++++++++++++-----
> arch/x86/crypto/sha512_ssse3_glue.c | 32 ++++++++++++++++++++++++-----
> 3 files changed, 81 insertions(+), 15 deletions(-)
>
> diff --git a/arch/x86/crypto/sha1_ssse3_glue.c b/arch/x86/crypto/sha1_ssse3_glue.c
> index 44340a1139e0..a9f5779b41ca 100644
> --- a/arch/x86/crypto/sha1_ssse3_glue.c
> +++ b/arch/x86/crypto/sha1_ssse3_glue.c
> @@ -26,6 +26,8 @@
> #include <crypto/sha1_base.h>
> #include <asm/simd.h>
>
> +#define FPU_BYTES 4096U /* avoid kernel_fpu_begin/end scheduler/rcu stalls */

Declare this inside the function it's used as an untyped enum, and give
it a better name, like BYTES_PER_FPU.
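
For example, something along these lines (rough sketch only; the
existing crypto_simd_usable() and block-count checks are omitted):

    static int sha1_update(struct shash_desc *desc, const u8 *data,
                           unsigned int len, sha1_block_fn *sha1_xform)
    {
            /* Bound the bytes handled per kernel_fpu_begin()/end() pair. */
            enum { BYTES_PER_FPU = 4096 };

            do {
                    unsigned int chunk = min_t(unsigned int, len, BYTES_PER_FPU);

                    kernel_fpu_begin();
                    sha1_base_do_update(desc, data, chunk, sha1_xform);
                    kernel_fpu_end();

                    len -= chunk;
                    data += chunk;
            } while (len);

            return 0;
    }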

2022-10-13 01:14:59

by Jason A. Donenfeld

Subject: Re: [PATCH v2 16/19] crypto: x86 - print CPU optimized loaded messages

On Wed, Oct 12, 2022 at 04:59:28PM -0500, Robert Elliott wrote:
> Print a positive message at the info level if the CPU-optimized module
> is loaded, for all modules except the sha modules.

Why!? This is just meaningless clutter. If the admin wants to see what
modules are loaded, he uses `lsmod`.

Also, what's special about sha?

Anyway, please don't do this.

Jason

2022-10-13 01:18:14

by Jason A. Donenfeld

Subject: Re: [PATCH v2 09/19] crypto: x86 - use common macro for FPU limit

On Wed, Oct 12, 2022 at 04:59:21PM -0500, Robert Elliott wrote:
> Use a common macro name (FPU_BYTES) for the limit of the number of bytes
> processed within kernel_fpu_begin and kernel_fpu_end rather than
> using SZ_4K (which is a signed value), or a magic value of 4096U.

Not sure I like this very much. The whole idea is that this is variable
per algorithm, since not all algorithms have the same performance
characteristics. So in that sense, it's better to put this close to
where it's actually used, rather than somewhere at the top of the file.
When you do that, it makes it seem like "FPU_BYTES" is some universal
constant, which of course it isn't.

Instead, declare this as an untyped enum value within the function. For
example:

diff --git a/arch/x86/crypto/blake2s-glue.c b/arch/x86/crypto/blake2s-glue.c
index aaba21230528..602883eee5f3 100644
--- a/arch/x86/crypto/blake2s-glue.c
+++ b/arch/x86/crypto/blake2s-glue.c
@@ -30,7 +30,8 @@ void blake2s_compress(struct blake2s_state *state, const u8 *block,
size_t nblocks, const u32 inc)
{
/* SIMD disables preemption, so relax after processing each page. */
- BUILD_BUG_ON(SZ_4K / BLAKE2S_BLOCK_SIZE < 8);
+ enum { BLOCKS_PER_FPU = SZ_4K / BLAKE2S_BLOCK_SIZE };
+ BUILD_BUG_ON(BLOCKS_PER_FPU < 8);

if (!static_branch_likely(&blake2s_use_ssse3) || !may_use_simd()) {
blake2s_compress_generic(state, block, nblocks, inc);
@@ -38,8 +39,7 @@ void blake2s_compress(struct blake2s_state *state, const u8 *block,
}

do {
- const size_t blocks = min_t(size_t, nblocks,
- SZ_4K / BLAKE2S_BLOCK_SIZE);
+ const size_t blocks = min_t(size_t, nblocks, BLOCKS_PER_FPU);

kernel_fpu_begin();
if (IS_ENABLED(CONFIG_AS_AVX512) &&
diff --git a/arch/x86/crypto/chacha_glue.c b/arch/x86/crypto/chacha_glue.c
index 7b3a1cf0984b..f8fd2b7025c1 100644
--- a/arch/x86/crypto/chacha_glue.c
+++ b/arch/x86/crypto/chacha_glue.c
@@ -142,12 +142,14 @@ EXPORT_SYMBOL(chacha_init_arch);
void chacha_crypt_arch(u32 *state, u8 *dst, const u8 *src, unsigned int bytes,
int nrounds)
{
+ enum { BYTES_PER_FPU = SZ_4K };
+
if (!static_branch_likely(&chacha_use_simd) || !crypto_simd_usable() ||
bytes <= CHACHA_BLOCK_SIZE)
return chacha_crypt_generic(state, dst, src, bytes, nrounds);

do {
- unsigned int todo = min_t(unsigned int, bytes, SZ_4K);
+ unsigned int todo = min_t(unsigned int, bytes, BYTES_PER_FPU);

kernel_fpu_begin();
chacha_dosimd(state, dst, src, todo, nrounds);


2022-10-13 02:09:33

by Herbert Xu

Subject: Re: [PATCH v2 05/19] crypto: x86/crc - limit FPU preemption

On Wed, Oct 12, 2022 at 04:59:17PM -0500, Robert Elliott wrote:
>
> @@ -170,9 +179,17 @@ static int __crc32c_pcl_intel_finup(u32 *crcp, const u8 *data, unsigned int len,
> u8 *out)
> {
> if (len >= CRC32C_PCL_BREAKEVEN && crypto_simd_usable()) {
> - kernel_fpu_begin();
> - *(__le32 *)out = ~cpu_to_le32(crc_pcl(data, len, *crcp));
> - kernel_fpu_end();
> + do {
> + unsigned int chunk = min(len, FPU_BYTES);
> +
> + kernel_fpu_begin();
> + *crcp = crc_pcl(data, chunk, *crcp);

How about storing the intermediate result in a local variable
instead of overwriting *crcp?
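
For example (untested sketch, mirroring the hunk above but keeping the
running value in a local variable):

    u32 crc = *crcp;

    do {
            unsigned int chunk = min(len, FPU_BYTES);

            kernel_fpu_begin();
            crc = crc_pcl(data, chunk, crc);
            kernel_fpu_end();

            len -= chunk;
            data += chunk;
    } while (len);

    *(__le32 *)out = ~cpu_to_le32(crc);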

Thanks,
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2022-10-13 06:02:12

by Eric Biggers

Subject: Re: [PATCH v2 04/19] crypto: x86/sha - limit FPU preemption

On Wed, Oct 12, 2022 at 04:59:16PM -0500, Robert Elliott wrote:
> diff --git a/arch/x86/crypto/sha1_ssse3_glue.c b/arch/x86/crypto/sha1_ssse3_glue.c
> index 44340a1139e0..a9f5779b41ca 100644
> --- a/arch/x86/crypto/sha1_ssse3_glue.c
> +++ b/arch/x86/crypto/sha1_ssse3_glue.c
> @@ -26,6 +26,8 @@
> #include <crypto/sha1_base.h>
> #include <asm/simd.h>
>
> +#define FPU_BYTES 4096U /* avoid kernel_fpu_begin/end scheduler/rcu stalls */
> +
> static int sha1_update(struct shash_desc *desc, const u8 *data,
> unsigned int len, sha1_block_fn *sha1_xform)
> {
> @@ -41,9 +43,18 @@ static int sha1_update(struct shash_desc *desc, const u8 *data,
> */
> BUILD_BUG_ON(offsetof(struct sha1_state, state) != 0);
>
> - kernel_fpu_begin();
> - sha1_base_do_update(desc, data, len, sha1_xform);
> - kernel_fpu_end();
> + do {
> + unsigned int chunk = min(len, FPU_BYTES);
> +
> + if (chunk) {
> + kernel_fpu_begin();
> + sha1_base_do_update(desc, data, chunk, sha1_xform);
> + kernel_fpu_end();
> + }
> +
> + len -= chunk;
> + data += chunk;
> + } while (len);

'len' can't be 0 at the beginning of this loop, so the 'if (chunk)' check isn't
needed. And it wouldn't make sense even if 'len' could be 0, since a while loop
could just be used in that case.
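
That is, something like this would handle both cases (sketch):

    while (len) {
            unsigned int chunk = min(len, FPU_BYTES);

            kernel_fpu_begin();
            sha1_base_do_update(desc, data, chunk, sha1_xform);
            kernel_fpu_end();

            len -= chunk;
            data += chunk;
    }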

- Eric

2022-10-13 06:05:58

by Eric Biggers

Subject: Re: [PATCH v2 08/19] crypto: x86/ghash - limit FPU preemption

On Wed, Oct 12, 2022 at 04:59:20PM -0500, Robert Elliott wrote:
> - kernel_fpu_begin();
> - clmul_ghash_update(dst, src, srclen, &ctx->shash);
> - kernel_fpu_end();
> + while (srclen >= GHASH_BLOCK_SIZE) {
> + unsigned int fpulen = min(srclen, FPU_BYTES);
> +
> + kernel_fpu_begin();
> + while (fpulen >= GHASH_BLOCK_SIZE) {
> + int n = min_t(unsigned int, fpulen, GHASH_BLOCK_SIZE);
> +
> + clmul_ghash_update(dst, src, n, &ctx->shash);
> +
> + srclen -= n;
> + fpulen -= n;
> + src += n;
> + }
> + kernel_fpu_end();
> + }

Another loop that doesn't make sense. Why is this only passing 16 bytes at a
time into the assembly code? There shouldn't be an inner loop here at all.
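
Something like this would hand each chunk to the assembly in a single
call (untested sketch, remainder handling left as in the original):

    while (srclen >= GHASH_BLOCK_SIZE) {
            unsigned int fpulen = min(srclen, FPU_BYTES);

            fpulen = round_down(fpulen, GHASH_BLOCK_SIZE);

            kernel_fpu_begin();
            clmul_ghash_update(dst, src, fpulen, &ctx->shash);
            kernel_fpu_end();

            srclen -= fpulen;
            src += fpulen;
    }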

- Eric

2022-10-13 06:06:25

by Herbert Xu

Subject: Re: [PATCH v2 04/19] crypto: x86/sha - limit FPU preemption

On Wed, Oct 12, 2022 at 10:57:04PM -0700, Eric Biggers wrote:
>
> 'len' can't be 0 at the beginning of this loop, so the 'if (chunk)' check isn't
> needed. And it wouldn't make sense even if 'len' could be 0, since a while loop
> could just be used in that case.

I don't see anything preventing len from being zero if this gets
called directly by a user of the Crypto API through crypto_shash_update.
But yes a while loop would be a lot cleaner.

Thanks,
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2022-10-13 06:08:44

by Eric Biggers

Subject: Re: [PATCH v2 19/19] crypto: x86/sha - register only the best function

On Wed, Oct 12, 2022 at 04:59:31PM -0500, Robert Elliott wrote:
> Don't register and unregister each of the functions from least-
> to most-optimized (SSSE3 then AVX then AVX2); determine the
> most-optimized function and load only that version.
>
> Suggested-by: Tim Chen <[email protected]>
> Signed-off-by: Robert Elliott <[email protected]>

I thought that it's done the way it is so that it's easy to run the self-tests
for all the different variants.

- Eric

2022-10-13 06:22:15

by Eric Biggers

[permalink] [raw]
Subject: Re: [PATCH v2 04/19] crypto: x86/sha - limit FPU preemption

On Thu, Oct 13, 2022 at 02:04:43PM +0800, Herbert Xu wrote:
> On Wed, Oct 12, 2022 at 10:57:04PM -0700, Eric Biggers wrote:
> >
> > 'len' can't be 0 at the beginning of this loop, so the 'if (chunk)' check isn't
> > needed. And it wouldn't make sense even if 'len' could be 0, since a while loop
> > could just be used in that case.
>
> I don't see anything preventing len from being zero if this gets
> called directly by a user of the Crypto API through crypto_shash_update.
> But yes a while loop would be a lot cleaner.
>

When len == 0, the following path is taken instead:

if (!crypto_simd_usable() ||
(sctx->count % SHA1_BLOCK_SIZE) + len < SHA1_BLOCK_SIZE)
return crypto_sha1_update(desc, data, len);

- Eric

2022-10-13 08:04:06

by Herbert Xu

Subject: Re: [PATCH v2 19/19] crypto: x86/sha - register only the best function

On Wed, Oct 12, 2022 at 11:07:43PM -0700, Eric Biggers wrote:
>
> I thought that it's done the way it is so that it's easy to run the self-tests
> for all the different variants.

Yes, we should keep it that way so that it's easy to test the
different code paths for correctness and/or speed.

Thanks,
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2022-10-13 08:06:47

by Herbert Xu

Subject: Re: [PATCH v2 04/19] crypto: x86/sha - limit FPU preemption

On Wed, Oct 12, 2022 at 11:08:53PM -0700, Eric Biggers wrote:
> When len == 0, the following path is taken instead:
>
> if (!crypto_simd_usable() ||
> (sctx->count % SHA1_BLOCK_SIZE) + len < SHA1_BLOCK_SIZE)
> return crypto_sha1_update(desc, data, len);

Good point, I missed that.

Thanks!
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2022-10-13 13:54:25

by kernel test robot

Subject: Re: [PATCH v2 16/19] crypto: x86 - print CPU optimized loaded messages

Hi Robert,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on herbert-cryptodev-2.6/master]
[also build test WARNING on herbert-crypto-2.6/master linus/master next-20221012]
[cannot apply to v6.0]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url: https://github.com/intel-lab-lkp/linux/commits/Robert-Elliott/crypto-tcrypt-test-crc32/20221013-065919
base: https://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git master
config: x86_64-allyesconfig
compiler: gcc-11 (Debian 11.3.0-8) 11.3.0
reproduce (this is a W=1 build):
# https://github.com/intel-lab-lkp/linux/commit/15a63fd12ab4d509e54c5db6daf2e8e81fdf0cf5
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review Robert-Elliott/crypto-tcrypt-test-crc32/20221013-065919
git checkout 15a63fd12ab4d509e54c5db6daf2e8e81fdf0cf5
# save the config file
mkdir build_dir && cp config build_dir/.config
make W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash arch/x86/crypto/

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <[email protected]>

All warnings (new ones prefixed by >>):

>> arch/x86/crypto/serpent_avx2_glue.c:100:32: warning: 'module_cpu_ids' defined but not used [-Wunused-const-variable=]
100 | static const struct x86_cpu_id module_cpu_ids[] = {
| ^~~~~~~~~~~~~~
--
>> arch/x86/crypto/aegis128-aesni-glue.c:268:32: warning: 'module_cpu_ids' defined but not used [-Wunused-const-variable=]
268 | static const struct x86_cpu_id module_cpu_ids[] = {
| ^~~~~~~~~~~~~~
--
>> arch/x86/crypto/sm4_aesni_avx_glue.c:451:32: warning: 'module_cpu_ids' defined but not used [-Wunused-const-variable=]
451 | static const struct x86_cpu_id module_cpu_ids[] = {
| ^~~~~~~~~~~~~~


vim +/module_cpu_ids +100 arch/x86/crypto/serpent_avx2_glue.c

e16bf974b3d965 Eric Biggers 2018-02-19 99
385e7cb709ad4a Robert Elliott 2022-10-12 @100 static const struct x86_cpu_id module_cpu_ids[] = {
385e7cb709ad4a Robert Elliott 2022-10-12 101 X86_MATCH_FEATURE(X86_FEATURE_AVX2, NULL),
385e7cb709ad4a Robert Elliott 2022-10-12 102 {}
385e7cb709ad4a Robert Elliott 2022-10-12 103 };
385e7cb709ad4a Robert Elliott 2022-10-12 104 MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);
385e7cb709ad4a Robert Elliott 2022-10-12 105

--
0-DAY CI Kernel Test Service
https://01.org/lkp



2022-10-13 13:57:07

by kernel test robot

Subject: Re: [PATCH v2 16/19] crypto: x86 - print CPU optimized loaded messages

Hi Robert,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on herbert-cryptodev-2.6/master]
[also build test WARNING on herbert-crypto-2.6/master linus/master next-20221012]
[cannot apply to v6.0]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url: https://github.com/intel-lab-lkp/linux/commits/Robert-Elliott/crypto-tcrypt-test-crc32/20221013-065919
base: https://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git master
config: x86_64-randconfig-m001
compiler: gcc-11 (Debian 11.3.0-8) 11.3.0

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <[email protected]>

smatch warnings:
arch/x86/crypto/serpent_avx2_glue.c:113 serpent_avx2_init() warn: inconsistent indenting
arch/x86/crypto/serpent_avx2_glue.c:115 serpent_avx2_init() warn: ignoring unreachable code.
arch/x86/crypto/aegis128-aesni-glue.c:280 crypto_aegis128_aesni_module_init() warn: inconsistent indenting
arch/x86/crypto/aegis128-aesni-glue.c:282 crypto_aegis128_aesni_module_init() warn: ignoring unreachable code.
arch/x86/crypto/sm4_aesni_avx_glue.c:466 sm4_init() warn: inconsistent indenting
arch/x86/crypto/sm4_aesni_avx_glue.c:468 sm4_init() warn: ignoring unreachable code.

vim +113 arch/x86/crypto/serpent_avx2_glue.c

56d76c96a9f3e3 Jussi Kivilinna 2013-04-13 107
f16a005cde3b1f Randy Dunlap 2022-03-16 108 static int __init serpent_avx2_init(void)
56d76c96a9f3e3 Jussi Kivilinna 2013-04-13 109 {
534ff06e39292b Ingo Molnar 2015-04-28 110 const char *feature_name;
15a63fd12ab4d5 Robert Elliott 2022-10-12 111 int ret;
56d76c96a9f3e3 Jussi Kivilinna 2013-04-13 112
385e7cb709ad4a Robert Elliott 2022-10-12 @113 return -ENODEV;
385e7cb709ad4a Robert Elliott 2022-10-12 114
abcfdfe07de75f Borislav Petkov 2016-04-04 @115 if (!boot_cpu_has(X86_FEATURE_AVX2) || !boot_cpu_has(X86_FEATURE_OSXSAVE)) {
b54b4bbbf5e931 Ingo Molnar 2015-05-22 116 pr_info("AVX2 instructions are not detected.\n");
b54b4bbbf5e931 Ingo Molnar 2015-05-22 117 return -ENODEV;
b54b4bbbf5e931 Ingo Molnar 2015-05-22 118 }
d91cab78133d33 Dave Hansen 2015-09-02 119 if (!cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM,
d91cab78133d33 Dave Hansen 2015-09-02 120 &feature_name)) {
534ff06e39292b Ingo Molnar 2015-04-28 121 pr_info("CPU feature '%s' is not supported.\n", feature_name);
56d76c96a9f3e3 Jussi Kivilinna 2013-04-13 122 return -ENODEV;
56d76c96a9f3e3 Jussi Kivilinna 2013-04-13 123 }
56d76c96a9f3e3 Jussi Kivilinna 2013-04-13 124
15a63fd12ab4d5 Robert Elliott 2022-10-12 125 ret = simd_register_skciphers_compat(serpent_algs,
e16bf974b3d965 Eric Biggers 2018-02-19 126 ARRAY_SIZE(serpent_algs),
e16bf974b3d965 Eric Biggers 2018-02-19 127 serpent_simd_algs);
15a63fd12ab4d5 Robert Elliott 2022-10-12 128 if (!ret)
15a63fd12ab4d5 Robert Elliott 2022-10-12 129 pr_info("CPU-optimized crypto module loaded\n");
15a63fd12ab4d5 Robert Elliott 2022-10-12 130 return ret;
56d76c96a9f3e3 Jussi Kivilinna 2013-04-13 131 }
56d76c96a9f3e3 Jussi Kivilinna 2013-04-13 132

--
0-DAY CI Kernel Test Service
https://01.org/lkp


Subject: RE: [PATCH v2 09/19] crypto: x86 - use common macro for FPU limit



> -----Original Message-----
> From: Jason A. Donenfeld <[email protected]>
> Sent: Wednesday, October 12, 2022 7:36 PM
> To: Elliott, Robert (Servers) <[email protected]>
> Cc: [email protected]; [email protected];
> [email protected]; [email protected]; [email protected]; linux-
> [email protected]; [email protected]
> Subject: Re: [PATCH v2 09/19] crypto: x86 - use common macro for FPU limit
>
> On Wed, Oct 12, 2022 at 04:59:21PM -0500, Robert Elliott wrote:
> > Use a common macro name (FPU_BYTES) for the limit of the number of bytes
> > processed within kernel_fpu_begin and kernel_fpu_end rather than
> > using SZ_4K (which is a signed value), or a magic value of 4096U.
>
> Not sure I like this very much. The whole idea is that this is variable
> per algorithm, since not all algorithms have the same performance
> characteristics.

Good point.

I noticed the powerpc aes, sha1, and sha256 modules include explanations
of how their macros serving a similar purpose were calculated.

arch/powerpc/crypto/aes-spe-glue.c:
* MAX_BYTES defines the number of bytes that are allowed to be processed
* between preempt_disable() and preempt_enable(). e500 cores can issue two
* instructions per clock cycle using one 32/64 bit unit (SU1) and one 32
* bit unit (SU2). One of these can be a memory access that is executed via
* a single load and store unit (LSU). XTS-AES-256 takes ~780 operations per
* 16 byte block or 25 cycles per byte. Thus 768 bytes of input data
* will need an estimated maximum of 20,000 cycles. Headroom for cache misses
* included. Even with the low end model clocked at 667 MHz this equals to a
* critical time window of less than 30us. The value has been chosen to
* process a 512 byte disk block in one or a large 1400 bytes IPsec network
* packet in two runs.
#define MAX_BYTES 768

and arch/powerpc/crypto/sha1-spe-glue.c:
* MAX_BYTES defines the number of bytes that are allowed to be processed
* between preempt_disable() and preempt_enable(). SHA1 takes ~1000
* operations per 64 bytes. e500 cores can issue two arithmetic instructions
* per clock cycle using one 32/64 bit unit (SU1) and one 32 bit unit (SU2).
* Thus 2KB of input data will need an estimated maximum of 18,000 cycles.
* Headroom for cache misses included. Even with the low end model clocked
* at 667 MHz this equals to a critical time window of less than 27us.
#define MAX_BYTES 2048

arch/powerpc/crypto/sha256-spe-glue.c:
* MAX_BYTES defines the number of bytes that are allowed to be processed
* between preempt_disable() and preempt_enable(). SHA256 takes ~2,000
* operations per 64 bytes. e500 cores can issue two arithmetic instructions
* per clock cycle using one 32/64 bit unit (SU1) and one 32 bit unit (SU2).
* Thus 1KB of input data will need an estimated maximum of 18,000 cycles.
* Headroom for cache misses included. Even with the low end model clocked
* at 667 MHz this equals to a critical time window of less than 27us.
#define MAX_BYTES 1024

Perhaps we should declare a time goal like "30 us," measure the actual
speed of each algorithm with a tcrypt speed test, and calculate the
nominal value assuming some slow x86 CPU core speed?
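
For example (made-up numbers): a 30 us budget on a 1 GHz core is about
30,000 cycles; at roughly 2 cycles per byte that allows ~15,000 bytes,
while a slower implementation at 5 cycles per byte would allow ~6,000
bytes - in the same ballpark as the current 4 KiB limit.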

That could be further adjusted at run-time based on the supposed
minimum CPU frequency (e.g., as reported in
/sys/devices/system/cpu/cpufreq/policy0/scaling_min_freq).

If values less than 4 KiB are necessary (e.g., like the powerpc
values), that will require changes in all the modules using
skcipher_walk too.

> So in that sense, it's better to put this close to
> where it's actually used, rather than somewhere at the top of the file.
> When you do that, it makes it seem like "FPU_BYTES" is some universal
> constant, which of course it isn't.
>
> Instead, declare this as an untyped enum value within the function.

Many of these modules use the same value for both an _update and
a _finup function (usually defined close to each other). Is it
important to avoid replication?


Subject: RE: [PATCH v2 18/19] crypto: x86 - standardize not loaded prints



> -----Original Message-----
> From: Jason A. Donenfeld <[email protected]>
> Sent: Wednesday, October 12, 2022 7:43 PM
> To: Elliott, Robert (Servers) <[email protected]>
> Cc: [email protected]; [email protected];
> [email protected]; [email protected]; [email protected]; linux-
> [email protected]; [email protected]
> Subject: Re: [PATCH v2 18/19] crypto: x86 - standardize not loaded prints
>
> On Wed, Oct 12, 2022 at 04:59:30PM -0500, Robert Elliott wrote:
> > Standardize the prints that additional required CPU features are not
> > present along with the main CPU features (e.g., OSXSAVE is not
> > present along with AVX).
> >
> > Although modules are not supposed to print unless loaded and
> > active, these are existing exceptions.
>
> Another comma splice. But also, don't do this. No need to clutter dmesg.
> `lsmod` is the job for this.

If module loading fails, modprobe gets back one errno value
and converts that to a string, but has no other clue what
is wrong.

The modprobe man page refers to dmesg:
... modprobe does not do anything to the module itself: the work of
resolving symbols and understanding parameters is done inside the
kernel. So module failure is sometimes accompanied by a kernel
message: see dmesg(8).

If the error happens to be -ENOENT, modprobe specifically recommends
looking at dmesg:
modprobe: ERROR: could not insert 'tcrypt': Unknown symbol in module, or unknown parameter (see dmesg)

A device table mismatch can be determined by comparing the modinfo
aliases for the module to /sys/devices/system/cpu/modalias:

cpu:type:x86,ven0000fam0006mod0055:feature:,0000,0001,0002,0003,0004,0005,0006,0007,0008,0009,000B,000C,000D,000E,000F,0010,0011,0013,0015,0016,0017,0018,0019,001A,001B,001C,001D,001F,002B,0034,003A,003B,003D,0068,006A,006B,006C,006D,006F,0070,0072,0074,0075,0076,0078,0079,007C,0080,0081,0082,0083,0084,0085,0086,0087,0088,0089,008B,008C,008D,008E,008F,0091,0092,0093,0094,0095,0096,0097,0098,0099,009A,009B,009C,009D,009E,00C0,00C5,00C8,00E1,00E3,00E4,00E6,00E7,00EA,00F0,00F1,00F2,00F3,00F5,00F9,00FA,00FB,00FE,00FF,0100,0101,0102,0103,0104,0111,0120,0121,0123,0125,0126,0127,0128,0129,012A,012C,012D,012E,012F,0130,0131,0132,0133,0134,0137,0138,0139,013C,013E,013F,0140,0141,0142,0143,0160,0161,0162,0163,0164,0165,0171,01C0,01C1,01C2,01C4,01C5,01C6,01C7,01C9,01CB,0203,0204,020B,024A,025A,025B,025C,025D,025F

modinfo aesni-intel:
alias: cpu:type:x86,ven*fam*mod*:feature:*0099*

so I'm comfortable not printing that one.

The checks for other combinations of features (e.g., sha512
also requiring BMI2) and for CPU extended features are not
so obvious. Nothing in modinfo explains what the module is
looking for, and nothing records what it didn't like. There
are currently 32 prints across the arch/x86/crypto modules, either
explaining errors or reporting which optional features have been
enabled.

The modprobe man page doesn't promise which log level will
explain the problem, so we could print these with pr_debug,
making them available only if you enable dynamic debug for
the module.

The positive messages about which optional features are
engaged could be reported as read-only module parameters.
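
A minimal sketch of both ideas (the feature tests, strings, and names
below are illustrative, not taken from any particular driver):

#include <linux/module.h>
#include <linux/string.h>
#include <asm/cpufeature.h>

static char active_impl[16] = "none";
module_param_string(active_impl, active_impl, sizeof(active_impl), 0444);
MODULE_PARM_DESC(active_impl, "Optional CPU feature variant in use");

static int __init example_mod_init(void)
{
        /* demote the failure print so it only appears with dynamic debug */
        if (!boot_cpu_has(X86_FEATURE_OSXSAVE)) {
                pr_debug("OSXSAVE is not available\n");
                return -ENODEV;
        }

        /* record the chosen variant instead of printing it to dmesg */
        if (boot_cpu_has(X86_FEATURE_AVX2))
                strscpy(active_impl, "avx2", sizeof(active_impl));
        else
                strscpy(active_impl, "avx", sizeof(active_impl));
        return 0;
}
module_init(example_mod_init);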


Subject: RE: [PATCH v2 05/19] crypto: x86/crc - limit FPU preemption



> -----Original Message-----
> From: Herbert Xu <[email protected]>
> Sent: Wednesday, October 12, 2022 9:00 PM
> To: Elliott, Robert (Servers) <[email protected]>
> Cc: [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected]
> Subject: Re: [PATCH v2 05/19] crypto: x86/crc - limit FPU preemption
>
> On Wed, Oct 12, 2022 at 04:59:17PM -0500, Robert Elliott wrote:
> >
> > @@ -170,9 +179,17 @@ static int __crc32c_pcl_intel_finup(u32 *crcp, const u8
> *data, unsigned int len,
> > u8 *out)
> > {
> > if (len >= CRC32C_PCL_BREAKEVEN && crypto_simd_usable()) {
> > - kernel_fpu_begin();
> > - *(__le32 *)out = ~cpu_to_le32(crc_pcl(data, len, *crcp));
> > - kernel_fpu_end();
> > + do {
> > + unsigned int chunk = min(len, FPU_BYTES);
> > +
> > + kernel_fpu_begin();
> > + *crcp = crc_pcl(data, chunk, *crcp);
>
> How about storing the intermediate result in a local variable
> instead of overwriting *crcp?
>
> Thanks,

The _update function writes the intermediate result through *crcp,
and the pointer isn't marked const here either, so it seemed
prudent to keep it up to date.

Do the callers understand it's no longer valid after finup, or
is there any case they might treat finup like an update and try
again?
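
For reference, the local-variable approach being suggested looks roughly
like the following sketch (the SIMD-usable fallback is omitted for
brevity; v3 of this patch later in the thread adopts this shape):

static int __crc32c_pcl_intel_finup(const u32 *crcp, const u8 *data,
                                    unsigned int len, u8 *out)
{
        u32 crc = *crcp;        /* work on a local copy; leave *crcp alone */

        while (len) {
                unsigned int chunk = min(len, FPU_BYTES);

                kernel_fpu_begin();
                crc = crc_pcl(data, chunk, crc);
                kernel_fpu_end();

                len -= chunk;
                data += chunk;
        }
        *(__le32 *)out = ~cpu_to_le32(crc);
        return 0;
}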

Subject: RE: Re: [PATCH v2 04/19] crypto: x86/sha - limit FPU preemption



> > diff --git a/arch/x86/crypto/sha1_ssse3_glue.c
...
> > + do {
> > + unsigned int chunk = min(len, FPU_BYTES);
> > +
> > + if (chunk) {
> > + kernel_fpu_begin();
> > + sha1_base_do_update(desc, data, chunk, sha1_xform);
> > + kernel_fpu_end();
> > + }
> > +
> > + len -= chunk;
> > + data += chunk;
> > + } while (len);
>
> 'len' can't be 0 at the beginning of this loop, so the 'if (chunk)' check
> isn't needed. And it wouldn't make sense even if 'len' could be 0, since
> a while loop could just be used in that case.

Thanks, I'll remove that 'if (chunk)' check from all the sha functions,
since they do have that protective length check upfront. I'll review the
zero-byte handling in all of them.

Subject: RE: [PATCH v2 08/19] crypto: x86/ghash - limit FPU preemption



> > + while (srclen >= GHASH_BLOCK_SIZE) {
> > + unsigned int fpulen = min(srclen, FPU_BYTES);
> > +
> > + kernel_fpu_begin();
> > + while (fpulen >= GHASH_BLOCK_SIZE) {
> > + int n = min_t(unsigned int, fpulen, GHASH_BLOCK_SIZE);
> > +
> > + clmul_ghash_update(dst, src, n, &ctx->shash);
> > +
> > + srclen -= n;
> > + fpulen -= n;
> > + src += n;
> > + }
> > + kernel_fpu_end();
> > + }
>
> Another loop that doesn't make sense. Why is this only passing 16 bytes at a
> time into the assembly code? There shouldn't be an inner loop here at all.

Thanks, I copied that pattern from another function whose assembly
routine had a size limit. clmul_ghash_update looks like it handles all
sizes, so I'll simplify that.
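
The simplified loop would look something like this sketch (assuming
clmul_ghash_update accepts any multiple of GHASH_BLOCK_SIZE):

        while (srclen >= GHASH_BLOCK_SIZE) {
                /* whole GHASH blocks, up to FPU_BYTES per FPU section */
                unsigned int fpulen = min(srclen, FPU_BYTES);

                fpulen = round_down(fpulen, GHASH_BLOCK_SIZE);

                kernel_fpu_begin();
                clmul_ghash_update(dst, src, fpulen, &ctx->shash);
                kernel_fpu_end();

                srclen -= fpulen;
                src += fpulen;
        }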

Subject: RE: [PATCH v2 19/19] crypto: x86/sha - register only the best function



> -----Original Message-----
> From: Herbert Xu <[email protected]>
> Sent: Thursday, October 13, 2022 2:52 AM
> To: Eric Biggers <[email protected]>
> Cc: Elliott, Robert (Servers) <[email protected]>; [email protected];
> [email protected]; [email protected]; [email protected]; linux-
> [email protected]; [email protected]
> Subject: Re: [PATCH v2 19/19] crypto: x86/sha - register only the best
> function
>
> On Wed, Oct 12, 2022 at 11:07:43PM -0700, Eric Biggers wrote:
> >
> > I thought that it's done the way it is so that it's easy to run the self-
> tests
> > for all the different variants.
>
> Yes, we should keep it that way so that it's easy to test the
> different code paths for correctness and/or speed.

I have done some testing with extra patches that do that for
that very reason. Is there much overhead from having a module
loaded and registered in the crypto system, but not being
chosen for use?

The current sha modules register SSSE3, then register AVX and
unregister SSSE3, then register AVX2 and unregister AVX...
good testing for the unregister function, but not really helpful
for users.


2022-10-14 01:41:30

by Jason A. Donenfeld

[permalink] [raw]
Subject: Re: [PATCH v2 09/19] crypto: x86 - use common macro for FPU limit

On Thu, Oct 13, 2022 at 3:48 PM Elliott, Robert (Servers)
<[email protected]> wrote:
> Perhaps we should declare a time goal like "30 us," measure the actual
> speed of each algorithm with a tcrypt speed test, and calculate the
> nominal value assuming some slow x86 CPU core speed?

Sure, pick something reasonable with good margin for a reasonable CPU.
It doesn't have to be perfect, but just vaguely right for supported
hardware.

> That could be further adjusted at run-time based on the supposed
> minimum CPU frequency (e.g., as reported in
> /sys/devices/system/cpu/cpufreq/policy0/scaling_min_freq).

Oh no, please no. Not another runtime knob. That also will make the
loop less efficient.

2022-10-14 04:07:03

by David Laight

[permalink] [raw]
Subject: RE: [PATCH v2 05/19] crypto: x86/crc - limit FPU preemption

From: Robert Elliott
> Sent: 12 October 2022 22:59
>
> As done by the ECB and CBC helpers in arch/x86/crypt/ecb_cbc_helpers.h,
> limit the number of bytes processed between kernel_fpu_begin() and
> kernel_fpu_end() calls.
>
> Those functions call preempt_disable() and preempt_enable(), so
> the CPU core is unavailable for scheduling while running, leading to:
> rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: ...

How long were the buffers being processed when the rcu stall was reported?
It looks like you are adding kernel_fpu_end(); kernel_fpu_begin()
pairs every 4096 bytes.
I'd guess the crc instruction runs at 4 bytes/clock
(or at least gets somewhere near that).
So you are talking about a few thousand clocks at most.
A PCI read from a device can easily take much longer than that.
So I'm surprised you need to use such small buffers to avoid
RCU stalls.

The kernel_fpu_end(); kernel_fpu_begin() pair will also have a cost.
(Maybe not as much as the first kernel_fpu_begin()?)

Some performance figures might be enlightening.

David


2022-10-14 08:26:41

by Herbert Xu

[permalink] [raw]
Subject: Re: [PATCH v2 19/19] crypto: x86/sha - register only the best function

On Thu, Oct 13, 2022 at 10:59:08PM +0000, Elliott, Robert (Servers) wrote:
>
> I have done some testing with extra patches that do that for
> that very reason. Is there much overhead from having a module
> loaded and registered in the crypto system, but not being
> chosen for use?

I don't think it's a big deal. The system is designed to cope
with multiple implementations and picking the best option.

IOW if the overhead is an issue then that's something we'd need to
address in the core API code rather than trying to paper over it
by reducing the number of registered algorithms.

Cheers,
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2022-10-14 11:03:55

by David Laight

[permalink] [raw]
Subject: RE: [PATCH v2 04/19] crypto: x86/sha - limit FPU preemption

From: Jason A. Donenfeld
> Sent: 13 October 2022 01:42
...
> > diff --git a/arch/x86/crypto/sha1_ssse3_glue.c b/arch/x86/crypto/sha1_ssse3_glue.c
> > index 44340a1139e0..a9f5779b41ca 100644
> > --- a/arch/x86/crypto/sha1_ssse3_glue.c
> > +++ b/arch/x86/crypto/sha1_ssse3_glue.c
> > @@ -26,6 +26,8 @@
> > #include <crypto/sha1_base.h>
> > #include <asm/simd.h>
> >
> > +#define FPU_BYTES 4096U /* avoid kernel_fpu_begin/end scheduler/rcu stalls */
>
> Declare this inside the function it's used as an untyped enum, and give
> it a better name, like BYTES_PER_FPU.

Isn't 'bytes' the wrong unit anyway?
At least it ought to be 'clocks' so it can be divided by the
(approximate) 'clocks per byte' of the algorithm.

Something like a crc is likely to be far faster than AES.

Clearly the actual required units are microseconds.
But depending on the actual CPU frequency is a bit hard.
And people running faster CPUs may want lower latency anyway.
So a typical slow CPU frequency is probably ok.

The actual architecture-dependent constant really ought
to be defined with kernel_fpu_begin().
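
Concretely, that might mean something like the following in the FPU
header itself (purely illustrative; no such constant exists today):

/* e.g. in arch/x86/include/asm/fpu/api.h: upper bound on the work a
 * caller should do inside one kernel_fpu_begin()/end() section */
#define KERNEL_FPU_MAX_SECTION_CYCLES   (30 * 2200)     /* ~30 us at 2.2 GHz */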

David


Subject: RE: [PATCH v2 14/19] crypto: x86 - load based on CPU features


> Subject: [PATCH v2 14/19] crypto: x86 - load based on CPU features
> 23 files changed, 230 insertions(+), 8 deletions(-)

Here are some things I've noticed on this patch that will be
addressed in v3.

- Add aria device table (new algorithm added at end of 6.0)

- Change camellia_avx2 device table to not match on AVX (just AVX2
and AES-NI). There's a separate module for AVX.

- Remove ADX from the curve25519 device table. That is optional,
not mandatory.

- Remove AVX from the sm4-avx2 device table. There's a separate
module for AVX.

Here is a script to review the device table aliases:

modinfo /lib/modules/6.0.0+/kernel/arch/x86/crypto/* | grep -E "filename|alias.*cpu" |
sed 's/.013D./\tSHA-NI/' |
sed 's/.0133./\tADX/' |
sed 's/.0130./\t\tAVX512-F/' |
sed 's/.0125./\t\tAVX2/' |
sed 's/.009C./\t\tAVX/' |
sed 's/.0099./\tAES-NI/' |
sed 's/.0094./\tXMM4.2/' |
sed 's/.0089./\t\tSSSE3/' |
sed 's/.0081./\tPCLMULQDQ/' |
sed 's/.001A./\tXMM2/' | # aka sse2
cat

Subject: RE: [PATCH v2 09/19] crypto: x86 - use common macro for FPU limit



> -----Original Message-----
> From: Jason A. Donenfeld <[email protected]>
> Sent: Thursday, October 13, 2022 8:27 PM
> Subject: Re: [PATCH v2 09/19] crypto: x86 - use common macro for FPU
> limit
>
> On Thu, Oct 13, 2022 at 3:48 PM Elliott, Robert (Servers)
> <[email protected]> wrote:
> > Perhaps we should declare a time goal like "30 us," measure the actual
> > speed of each algorithm with a tcrypt speed test, and calculate the
> > nominal value assuming some slow x86 CPU core speed?
>
> Sure, pick something reasonable with good margin for a reasonable CPU.
> It doesn't have to be perfect, but just vaguely right for supported
> hardware.
>
> > That could be further adjusted at run-time based on the supposed
> > minimum CPU frequency (e.g., as reported in
> > /sys/devices/system/cpu/cpufreq/policy0/scaling_min_freq).
>
> Oh no, please no. Not another runtime knob. That also will make the
> loop less efficient.

Here's some stats measuring the time in CPU cycles between
kernel_fpu_begin() and kernel_fpu_end() for every x86 crypto
module using those function calls. This is before any
patches to enforce any new limits.

Driver boot tcrypt-sweep average
====== ==== ============ =======
aegis128_aesni 6240 | 8214 433
aesni_intel 22218 | 150558 68
aria_aesni_avx_x86_64 0 > 95560 1282
camellia_aesni_avx2 52300 52300 4300
camellia_aesni_avx_x86_64 20920 20920 5915
camellia_x86_64 0 0 0
cast5_avx_x86_64 41854 | 108996 6602
cast6_avx_x86_64 39270 | 119476 10596
chacha_x86_64 3516 | 58112 349
crc32c_intel 1458 | 2702 235
crc32_pclmul 1610 | 3130 210
crct10dif_pclmul 1928 | 2096 82
ghash_clmulni_intel 9154 | 56632 336
libblake2s_x86_64 7514 7514 897
nhpoly1305_avx2 1360 | 5408 301
poly1305_x86_64 20656 | 21688 409
polyval_clmulni 13972 13972 34
serpent_avx2 45686 | 74824 4185
serpent_avx_x86_64 47436 47436 7120
serpent_sse2_x86_64 38492 38492 7400
sha1_ssse3 20950 | 38310 512
sha256_ssse3 46554 | 57162 1201
sha512_ssse3 157051800 157051800 167728
sm3_avx_x86_64 82372 82372 2017
sm4_aesni_avx_x86_64 66350 66350 2019
twofish_avx_x86_64 104598 | 163894 6633
twofish_x86_64_3way 0 0 0

Comparing a few of the hash functions with tcrypt test 16
(4 KiB of data with 1 update) shows a 35x difference from the
fastest to slowest:
crc32c 695 cycles/operation
crct10dif 2197
sha1-avx2 8825
sha224-avx2 24816
sha256-avx2 21179
sha384-avx2 14939
sha512-avx2 14584


Test notes
==========
Measurement points:
- after booting, with
- CONFIG_MODULE_SIG_SHA512=y (use SHA-512 for module signing)
- CONFIG_CRYPTO_MANAGER_EXTRA_TESTS=y (compares results
with generic module during init)
- # CONFIG_CRYPTO_MANAGER_DISABLE_TESTS is not set
(run self-tests during module load)
- after sweeping through tcrypt test modes 1 to 999
- except 0, 300, and 400 which run combinations of the others
- measured on a system with Intel Cascade Lake CPUs at 2.2 GHz

This run did not report any RCU stalls.

The hash functions are the main problem, since they are subjected to
huge data sizes during module signature checking. sha1 or sha256 would
face the same problem if either had been selected.

The self-tests are limited to 2 * PAGE_SIZE, so they don't stress
the drivers anywhere near as much as booting does. This run did
include the tcrypt patch to call cond_resched during speed
tests, so the speed-test-induced problem is out of the way.

aria_aesni_avx_x86_64 0 > 95560 1282

This run didn't have the patch to load aria based on the
device table, so it wasn't loaded until tcrypt asked for it.

camellia_x86_64 0 0 0
twofish_x86_64_3way 0 0 0

Those use the ecb_cbc_helper macros, but pass along -1 to
not use kernel_fpu_begin/end, so the debug instrumentation
is there but unused.

Next steps
==========
I'll try to add a test with long data, and work on scaling the
loops based on relative performance (e.g., if sha512 needs
4 KiB, then crc32c should be fine with 80 KiB).
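
A sketch of that scaling, using the example numbers above (4 KiB for
sha512, a crc32c assumed to be about 20x cheaper per byte, giving 80 KiB):

/* Illustrative only: scale each algorithm's FPU budget by its measured
 * speed relative to the slowest algorithm (sha512 here). */
#define BYTES_PER_FPU_SHA512    (4 * 1024)
#define CRC32C_RELATIVE_SPEED   20      /* assumed cycles/byte ratio */
#define BYTES_PER_FPU_CRC32C    (BYTES_PER_FPU_SHA512 * CRC32C_RELATIVE_SPEED)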

2022-10-24 02:09:33

by Yujie Liu

[permalink] [raw]
Subject: Re: [PATCH v2 05/19] crypto: x86/crc - limit FPU preemption

Greeting,

FYI, we noticed ltp.fsopen01.fail due to commit (built with gcc-11):

commit: 0c664cbc906012f02c5bf128cf2dff854cca65c7 ("[PATCH v2 05/19] crypto: x86/crc - limit FPU preemption")
url: https://github.com/intel-lab-lkp/linux/commits/Robert-Elliott/crypto-tcrypt-test-crc32/20221013-065919
base: https://git.kernel.org/cgit/linux/kernel/git/herbert/cryptodev-2.6.git master
patch link: https://lore.kernel.org/linux-crypto/[email protected]
patch subject: [PATCH v2 05/19] crypto: x86/crc - limit FPU preemption

in testcase: ltp
version: ltp-x86_64-14c1f76-1_20221009
with following parameters:

disk: 1HDD
fs: ext4
test: syscalls-07

test-description: The LTP testsuite contains a collection of tools for testing the Linux kernel and related features.
test-url: http://linux-test-project.github.io/

on test machine: 4 threads Intel(R) Core(TM) i5-6500 CPU @ 3.20GHz (Skylake) with 32G memory

caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):


<<<test_start>>>
tag=fsopen01 stime=1666383665
cmdline="fsopen01"
contacts=""
analysis=exit
<<<test_output>>>
...
tst_test.c:1599: TINFO: === Testing on btrfs ===
tst_test.c:1064: TINFO: Formatting /dev/loop0 with btrfs opts='' extra opts=''
fsopen01.c:42: TFAIL: fsconfig(FSCONFIG_CMD_CREATE) failed: EINVAL (22)
fsopen01.c:42: TFAIL: fsconfig(FSCONFIG_CMD_CREATE) failed: EINVAL (22)
...

Summary:
passed 12
failed 2
broken 0
skipped 0
warnings 0
<<<execution_status>>>
initiation_status="ok"
duration=3 termination_type=exited termination_id=1 corefile=no
cutime=3 cstime=47
<<<test_end>>>


[ 152.413919][ T4912] BTRFS: device fsid 05e51863-81c3-4c32-9e24-3d49d849f724 devid 1 transid 6 /dev/loop0 scanned by mkfs.btrfs (4912)
[ 152.429076][ T4851] BTRFS info (device loop0): using crc32c (crc32c-intel) checksum algorithm
[ 152.438743][ T4851] BTRFS info (device loop0): using free space tree
[ 152.449103][ T8] BTRFS warning (device loop0): checksum verify failed on logical 22036480 mirror 1 wanted 0xc4a1f4f3 found 0x76f09a51 level 0
[ 152.463363][ T35] BTRFS warning (device loop0): checksum verify failed on logical 22036480 mirror 2 wanted 0xc4a1f4f3 found 0x76f09a51 level 0
[ 152.477446][ T4851] BTRFS error (device loop0): failed to read chunk root
[ 152.486164][ T4851] BTRFS error (device loop0): open_ctree failed


If you fix the issue, kindly add following tag
| Reported-by: kernel test robot <[email protected]>
| Link: https://lore.kernel.org/r/[email protected]


To reproduce:

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
sudo bin/lkp install job.yaml # job file is attached in this email
bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
sudo bin/lkp run generated-yaml-file

# if come across any failure that blocks the test,
# please remove ~/.lkp and /lkp dir to run from a clean state.


--
0-DAY CI Kernel Test Service
https://01.org/lkp


Subject: RE: [PATCH v2 00/19] crypto: x86 - fix RCU stalls



> -----Original Message-----
> From: Elliott, Robert (Servers) <[email protected]>
> Sent: Wednesday, October 12, 2022 4:59 PM
> To: [email protected]; [email protected];
> [email protected]; [email protected]; [email protected]; linux-
> [email protected]; [email protected]
> Cc: Elliott, Robert (Servers) <[email protected]>
> Subject: [PATCH v2 00/19] crypto: x86 - fix RCU stalls
>
> This series fixes the RCU stalls triggered by the x86 crypto
> modules discussed in
> https://lore.kernel.org/all/MW5PR84MB18426EBBA3303770A8BC0BDFAB759@MW5PR84
> MB1842.NAMPRD84.PROD.OUTLOOK.COM/

I've instrumented all the x86 crypto modules, including ways to
experiment with different loop sizes. Here are some results with
the hash functions.

Key:
calls = number of kernel_fpu_begin()/end() calls made by the module
cost = number of CPU cycles consumed by those calls (overhead)
maxcycles = number of CPU cycles between those calls in FPU context
bpf = bytes_per_fpu loop size
KiB = bpf expressed in KiB
maxlen = maximum number of bytes per loop via update()
maxlen2 = maximum number of bytes per loop via finup()

This is on a 2.2 GHz Cascade Lake CPU, where each cycle is nominally
0.45 ns. The CPU does not support SHA-NI instructions, so those
results are missing.

Here are the results from a boot with the avx2 bytes_per_fpu values set
to 0 (unlimited - original behavior).

Booting includes:
- processing 2.3 GB of SHA-512 kernel module hashes
- crypto self-tests
- crypto extra self-tests (CONFIG_CRYPTO_MANAGER_EXTRA_TESTS=y)

calls cost maxcycles bpf KiB maxlen maxlen2 algorithm module
======== =========== ============ ======== ==== ======== ======== ======================== ============================
3641 177182 10230 0 0 4096 0 __ghash-pclmulqdqni ghash_clmulni_intel
2242 150516 1684 0 0 8112 0 crc32-pclmul crc32_pclmul
1008 43800 22404 0 0 8068 8105 crc32c-intel crc32c_intel
2565 179734 4286 0 0 7791 8027 crct10dif-pclmul crct10dif_pclmul
1603 77112 2414 0 0 8132 0 nhpoly1305-avx2 nhpoly1305_avx2
1671 81108 9390 203776 199 8109 0 nhpoly1305-sse2 nhpoly1305_sse2
1977 103598 5314 0 0 8112 0 poly1305-simd poly1305_x86_64
26744 1251756 2046 0 0 8096 0 polyval-clmulni polyval_clmulni
14669 682428 65462 30720 30 251 8096 sha1-avx sha1_ssse3
14669 682428 65462 0 0 7170 0 sha1-avx2 sha1_ssse3
14669 682428 65462 34816 34 0 0 sha1-shani sha1_ssse3
14669 682428 65462 26624 26 8089 8164 sha1-ssse3 sha1_ssse3
26768 1230100 144902 11264 11 8130 8159 sha224-avx sha256_ssse3
26768 1230100 144902 13312 13 8078 8146 sha224-avx2 sha256_ssse3
26768 1230100 144902 13312 13 0 0 sha224-shani sha256_ssse3
26768 1230100 144902 11264 11 8068 8168 sha224-ssse3 sha256_ssse3
26768 1230100 144902 11264 11 8130 8159 sha256-avx sha256_ssse3
26768 1230100 144902 13312 13 8078 8146 sha256-avx2 sha256_ssse3
26768 1230100 144902 13312 13 0 0 sha256-shani sha256_ssse3
26768 1230100 144902 11264 11 8068 8168 sha256-ssse3 sha256_ssse3
29157 2044882 164510724 17408 17 0 8127 sha384-avx sha512_ssse3
29157 2044882 164510724 0 0 0 48175432 sha384-avx2 sha512_ssse3
29157 2044882 164510724 17408 17 0 8055 sha384-ssse3 sha512_ssse3
29157 2044882 164510724 17408 17 0 8127 sha512-avx sha512_ssse3
29157 2044882 164510724 0 0 0 48175432 sha512-avx2 sha512_ssse3
29157 2044882 164510724 17408 17 0 8055 sha512-ssse3 sha512_ssse3
4314 193456 124918 0 0 7672 8101 sm3-avx sm3_avx_x86_64

The self-tests only test small data sets (even the extra tests
limit themselves to PAGE_SIZE * 2) so only the sha512_ssse3
module was stressed with large requests.

The cost of the kernel_fpu_begin()/end() calls (2044882 cycles) was
929 us, and the longest time in FPU context (164510724) was 75 ms.
I think the biggest file it encounters is:
-rw-r--r--. 1 root root 48186713 Nov 1 13:14 /lib/modules/6.0.0+/kernel/fs/xfs/xfs.ko


I added tcrypt tests to exercise each driver ten times with 1 MiB data,
and that exposes all the drivers to larger requests.

bigbuf tests with no limits:
calls cost maxcycles bpf KiB maxlen maxlen2 algorithm module
======== =========== ============ ======== ==== ======== ======== ======================== ============================
1000 156354 1484434 0 0 1048576 0 __ghash-pclmulqdqni ghash_clmulni_intel
1000 150386 221710 0 0 1048576 0 crc32-pclmul crc32_pclmul
1000 104890 114000 0 0 1048576 0 crc32c-intel crc32c_intel
1000 169596 182904 0 0 1048576 0 crct10dif-pclmul crct10dif_pclmul
1000 122842 267568 0 0 1048576 0 nhpoly1305-avx2 nhpoly1305_avx2
1000 190530 453118 0 0 1048576 0 nhpoly1305-sse2 nhpoly1305_sse2
1000 134682 431264 0 0 1048576 0 poly1305-simd poly1305_x86_64
8000 387206 215922 0 0 1048576 0 polyval-clmulni polyval_clmulni
6000 562932 2831190 0 0 1048576 0 sha1-avx sha1_ssse3
6000 562932 2831190 0 0 1048576 0 sha1-avx2 sha1_ssse3
6000 562932 2831190 34816 34 0 0 sha1-shani sha1_ssse3
6000 562932 2831190 0 0 1048576 0 sha1-ssse3 sha1_ssse3
12000 1212742 6558712 0 0 1048576 0 sha224-avx sha256_ssse3
12000 1212742 6558712 0 0 1048576 0 sha224-avx2 sha256_ssse3
12000 1212742 6558712 13312 13 0 0 sha224-shani sha256_ssse3
12000 1212742 6558712 0 0 1048576 0 sha224-ssse3 sha256_ssse3
12000 1212742 6558712 0 0 1048576 0 sha256-avx sha256_ssse3
12000 1212742 6558712 0 0 1048576 0 sha256-avx2 sha256_ssse3
12000 1212742 6558712 13312 13 0 0 sha256-shani sha256_ssse3
12000 1212742 6558712 0 0 1048576 0 sha256-ssse3 sha256_ssse3
12006 1250296 4621038 0 0 1048576 0 sha384-avx sha512_ssse3
12006 1250296 4621038 0 0 1048576 1037416 sha384-avx2 sha512_ssse3
12006 1250296 4621038 0 0 1048576 0 sha384-ssse3 sha512_ssse3
12006 1250296 4621038 0 0 1048576 0 sha512-avx sha512_ssse3
12006 1250296 4621038 0 0 1048576 1037416 sha512-avx2 sha512_ssse3
12006 1250296 4621038 0 0 1048576 0 sha512-ssse3 sha512_ssse3
2000 221468 6236756 0 0 1048576 0 sm3-avx sm3_avx_x86_64

Setting bpf limits based on those results narrows the maxcycles in
FPU context. I've seen results vary from 81912 cycles (37 us) up to
around 102 us - not really tight, but much better than ranging up
to 75 ms.

bigbuf tests with bytes_per_fpu limits as shown:
calls cost maxcycles bpf KiB maxlen maxlen2 algorithm module
======== =========== ============ ======== ==== ======== ======== ======================== ============================
21000 1002372 138558 51200 50 51200 0 __ghash-pclmulqdqni ghash_clmulni_intel
2000 220666 226806 646912 631 646912 0 crc32-pclmul crc32_pclmul
2000 255110 105968 895232 874 895232 0 crc32c-intel crc32c_intel
2000 218942 107930 626944 612 626944 0 crct10dif-pclmul crct10dif_pclmul
4000 208170 141356 345088 337 345088 0 nhpoly1305-avx2 nhpoly1305_avx2
6000 285286 105072 203520 198 203520 0 nhpoly1305-sse2 nhpoly1305_sse2
5000 368866 162262 222976 217 222976 0 poly1305-simd poly1305_x86_64
10000 457010 142362 402688 393 402688 0 polyval-clmulni polyval_clmulni
108000 6048076 160670 30720 30 30720 0 sha1-avx sha1_ssse3
108000 6048076 160670 34816 34 34816 0 sha1-avx2 sha1_ssse3
108000 6048076 160670 27392 26 27392 0 sha1-ssse3 sha1_ssse3
520000 23646576 196462 11520 11 11520 0 sha224-avx sha256_ssse3
520000 23646576 196462 14080 13 14080 0 sha224-avx2 sha256_ssse3
520000 23646576 196462 11776 11 11776 0 sha224-ssse3 sha256_ssse3
520000 23646576 196462 11520 11 11520 0 sha256-avx sha256_ssse3
520000 23646576 196462 14080 13 14080 0 sha256-avx2 sha256_ssse3
520000 23646576 196462 11776 11 11776 0 sha256-ssse3 sha256_ssse3
356156 18242860 226538 17152 16 17152 0 sha384-avx sha512_ssse3
356156 18242860 226538 20480 20 20480 20480 sha384-avx2 sha512_ssse3
356156 18242860 226538 17408 17 17408 0 sha384-ssse3 sha512_ssse3
356156 18242860 226538 17152 16 17152 0 sha512-avx sha512_ssse3
356156 18242860 226538 20480 20 20480 20480 sha512-avx2 sha512_ssse3
356156 18242860 226538 17408 17 17408 0 sha512-ssse3 sha512_ssse3
93000 4537164 138924 11520 11 11520 0 sm3-avx sm3_avx_x86_64

If I reboot with sha512-avx2 set to 20 KiB, the longest sha512-avx2
time in FPU context can still be long (e.g., 2 ms). That's much
better than the original 75 ms, but still not in the 50 us range.

I set /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor to
"performance" in .bash_profile, but that's not effective during
boot, so maybe that is the source of variability.

Example boot with 20 KiB limit:
calls cost maxcycles bpf KiB maxlen maxlen2 algorithm module
======== =========== ============ ======== ==== ======== ======== ======================== ============================
161011 16232280 4049644 20480 20 0 20480 sha512-avx2 sha512_ssse3

Limiting it to 1 KiB does reduce maxcycles to the us range, but
the cost of all the extra calls soars.

So, for v3 of the series, I plan to propose values ranging from:
- 11 to 20 KiB for sha* and sm3
- 200 to 400 KiB for *poly*
- 600 to 800 KiB for crc*

v3 will only cover the hash functions - skcipher and aead
have some unique challenges that we can tackle later.


Subject: [PATCH v3 00/17] crypto: x86 - fix RCU stalls

This series fixes the RCU stalls triggered by the x86 crypto
modules discussed in
https://lore.kernel.org/all/MW5PR84MB18426EBBA3303770A8BC0BDFAB759@MW5PR84MB1842.NAMPRD84.PROD.OUTLOOK.COM/

Two root causes were:
- too much data processed between kernel_fpu_begin and
kernel_fpu_end calls (which are heavily used by the x86
optimized drivers)
- tcrypt not calling cond_resched during speed test loops

These problems have always been lurking, but improving the
loading of the x86/sha512 module led to it happening a lot
during boot when using SHA-512 for module signature checking.

Fixing these problems makes it safer to improve loading
the rest of the x86 modules like the sha512 module.

This series only handles the x86 modules.

Except for the tcrypt change, v3 only tackles the hash functions
as discussed in
https://lore.kernel.org/lkml/MW5PR84MB184284FBED63E2D043C93A6FAB369@MW5PR84MB1842.NAMPRD84.PROD.OUTLOOK.COM/

The limits are implemented as static const unsigned ints at the
module level, which makes them easy to expose as module parameters
for testing like this:
-static const unsigned int bytes_per_fpu = 655 * 1024;
+static unsigned int bytes_per_fpu = 655 * 1024;
+module_param(bytes_per_fpu, uint, 0644);
+MODULE_PARM_DESC(bytes_per_fpu, "Bytes per FPU context");


Robert Elliott (17):
crypto: tcrypt - test crc32
crypto: tcrypt - test nhpoly1305
crypto: tcrypt - reschedule during cycles speed tests
crypto: x86/sha - limit FPU preemption
crypto: x86/crc - limit FPU preemption
crypto: x86/sm3 - limit FPU preemption
crypto: x86/ghash - use u8 rather than char
crypto: x86/ghash - restructure FPU context saving
crypto: x86/ghash - limit FPU preemption
crypto: x86/*poly* - limit FPU preemption
crypto: x86/sha - register all variations
crypto: x86/sha - minimize time in FPU context
crypto: x86/sha1, sha256 - load based on CPU features
crypto: x86/crc - load based on CPU features
crypto: x86/sm3 - load based on CPU features
crypto: x86/ghash,polyval - load based on CPU features
crypto: x86/nhpoly1305, poly1305 - load based on CPU features

arch/x86/crypto/crc32-pclmul_asm.S | 6 +-
arch/x86/crypto/crc32-pclmul_glue.c | 36 ++-
arch/x86/crypto/crc32c-intel_glue.c | 58 +++--
arch/x86/crypto/crct10dif-pclmul_glue.c | 54 ++--
arch/x86/crypto/ghash-clmulni-intel_asm.S | 4 +-
arch/x86/crypto/ghash-clmulni-intel_glue.c | 43 ++--
arch/x86/crypto/nhpoly1305-avx2-glue.c | 21 +-
arch/x86/crypto/nhpoly1305-sse2-glue.c | 21 +-
arch/x86/crypto/poly1305_glue.c | 49 +++-
arch/x86/crypto/polyval-clmulni_glue.c | 14 +-
arch/x86/crypto/sha1_ssse3_glue.c | 276 +++++++++++++--------
arch/x86/crypto/sha256_ssse3_glue.c | 268 +++++++++++++-------
arch/x86/crypto/sha512_ssse3_glue.c | 191 ++++++++------
arch/x86/crypto/sm3_avx_glue.c | 45 +++-
crypto/tcrypt.c | 56 +++--
15 files changed, 764 insertions(+), 378 deletions(-)

--
2.37.3


Subject: [PATCH v3 05/17] crypto: x86/crc - limit FPU preemption

Limit the number of bytes processed between kernel_fpu_begin() and
kernel_fpu_end() calls.

Those functions call preempt_disable() and preempt_enable(), so
the CPU core is unavailable for scheduling while running, leading to:
rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: ...

Fixes: 78c37d191dd6 ("crypto: crc32 - add crc32 pclmulqdq implementation and wrappers for table implementation")
Fixes: 6a8ce1ef3940 ("crypto: crc32c - Optimize CRC32C calculation with PCLMULQDQ instruction")
Fixes: 0b95a7f85718 ("crypto: crct10dif - Glue code to cast accelerated CRCT10DIF assembly as a crypto transform")
Suggested-by: Herbert Xu <[email protected]>
Signed-off-by: Robert Elliott <[email protected]>

---
v3: use while loops and a static const unsigned int, simplify one of the
loop structures, add algorithm-specific limits, and use a local stack
variable in the crc32c finup function rather than the context pointer
that update uses
---
arch/x86/crypto/crc32-pclmul_asm.S | 6 +--
arch/x86/crypto/crc32-pclmul_glue.c | 27 +++++++++----
arch/x86/crypto/crc32c-intel_glue.c | 52 ++++++++++++++++++-------
arch/x86/crypto/crct10dif-pclmul_glue.c | 48 +++++++++++++++++------
4 files changed, 99 insertions(+), 34 deletions(-)

diff --git a/arch/x86/crypto/crc32-pclmul_asm.S b/arch/x86/crypto/crc32-pclmul_asm.S
index ca53e96996ac..9abd861636c3 100644
--- a/arch/x86/crypto/crc32-pclmul_asm.S
+++ b/arch/x86/crypto/crc32-pclmul_asm.S
@@ -72,15 +72,15 @@
.text
/**
* Calculate crc32
- * BUF - buffer (16 bytes aligned)
- * LEN - sizeof buffer (16 bytes aligned), LEN should be grater than 63
+ * BUF - buffer - must be 16 bytes aligned
+ * LEN - sizeof buffer - must be multiple of 16 bytes and greater than 63
* CRC - initial crc32
* return %eax crc32
* uint crc32_pclmul_le_16(unsigned char const *buffer,
* size_t len, uint crc32)
*/

-SYM_FUNC_START(crc32_pclmul_le_16) /* buffer and buffer size are 16 bytes aligned */
+SYM_FUNC_START(crc32_pclmul_le_16)
movdqa (BUF), %xmm1
movdqa 0x10(BUF), %xmm2
movdqa 0x20(BUF), %xmm3
diff --git a/arch/x86/crypto/crc32-pclmul_glue.c b/arch/x86/crypto/crc32-pclmul_glue.c
index 98cf3b4e4c9f..df3dbc754818 100644
--- a/arch/x86/crypto/crc32-pclmul_glue.c
+++ b/arch/x86/crypto/crc32-pclmul_glue.c
@@ -46,6 +46,9 @@
#define SCALE_F 16L /* size of xmm register */
#define SCALE_F_MASK (SCALE_F - 1)

+/* avoid kernel_fpu_begin/end scheduler/rcu stalls */
+static const unsigned int bytes_per_fpu = 655 * 1024;
+
u32 crc32_pclmul_le_16(unsigned char const *buffer, size_t len, u32 crc32);

static u32 __attribute__((pure))
@@ -55,6 +58,9 @@ static u32 __attribute__((pure))
unsigned int iremainder;
unsigned int prealign;

+ BUILD_BUG_ON(bytes_per_fpu < PCLMUL_MIN_LEN);
+ BUILD_BUG_ON(bytes_per_fpu & SCALE_F_MASK);
+
if (len < PCLMUL_MIN_LEN + SCALE_F_MASK || !crypto_simd_usable())
return crc32_le(crc, p, len);

@@ -70,12 +76,19 @@ static u32 __attribute__((pure))
iquotient = len & (~SCALE_F_MASK);
iremainder = len & SCALE_F_MASK;

- kernel_fpu_begin();
- crc = crc32_pclmul_le_16(p, iquotient, crc);
- kernel_fpu_end();
+ while (iquotient >= PCLMUL_MIN_LEN) {
+ unsigned int chunk = min(iquotient, bytes_per_fpu);
+
+ kernel_fpu_begin();
+ crc = crc32_pclmul_le_16(p, chunk, crc);
+ kernel_fpu_end();
+
+ iquotient -= chunk;
+ p += chunk;
+ }

- if (iremainder)
- crc = crc32_le(crc, p + iquotient, iremainder);
+ if (iquotient || iremainder)
+ crc = crc32_le(crc, p, iquotient + iremainder);

return crc;
}
@@ -120,8 +133,8 @@ static int crc32_pclmul_update(struct shash_desc *desc, const u8 *data,
}

/* No final XOR 0xFFFFFFFF, like crc32_le */
-static int __crc32_pclmul_finup(u32 *crcp, const u8 *data, unsigned int len,
- u8 *out)
+static int __crc32_pclmul_finup(const u32 *crcp, const u8 *data,
+ unsigned int len, u8 *out)
{
*(__le32 *)out = cpu_to_le32(crc32_pclmul_le(*crcp, data, len));
return 0;
diff --git a/arch/x86/crypto/crc32c-intel_glue.c b/arch/x86/crypto/crc32c-intel_glue.c
index feccb5254c7e..f08ed68ec93d 100644
--- a/arch/x86/crypto/crc32c-intel_glue.c
+++ b/arch/x86/crypto/crc32c-intel_glue.c
@@ -45,7 +45,10 @@ asmlinkage unsigned int crc_pcl(const u8 *buffer, int len,
unsigned int crc_init);
#endif /* CONFIG_X86_64 */

-static u32 crc32c_intel_le_hw_byte(u32 crc, unsigned char const *data, size_t length)
+/* avoid kernel_fpu_begin/end scheduler/rcu stalls */
+static const unsigned int bytes_per_fpu = 868 * 1024;
+
+static u32 crc32c_intel_le_hw_byte(u32 crc, const unsigned char *data, size_t length)
{
while (length--) {
asm("crc32b %1, %0"
@@ -56,7 +59,7 @@ static u32 crc32c_intel_le_hw_byte(u32 crc, unsigned char const *data, size_t le
return crc;
}

-static u32 __pure crc32c_intel_le_hw(u32 crc, unsigned char const *p, size_t len)
+static u32 __pure crc32c_intel_le_hw(u32 crc, const unsigned char *p, size_t len)
{
unsigned int iquotient = len / SCALE_F;
unsigned int iremainder = len % SCALE_F;
@@ -110,8 +113,8 @@ static int crc32c_intel_update(struct shash_desc *desc, const u8 *data,
return 0;
}

-static int __crc32c_intel_finup(u32 *crcp, const u8 *data, unsigned int len,
- u8 *out)
+static int __crc32c_intel_finup(const u32 *crcp, const u8 *data,
+ unsigned int len, u8 *out)
{
*(__le32 *)out = ~cpu_to_le32(crc32c_intel_le_hw(*crcp, data, len));
return 0;
@@ -153,29 +156,52 @@ static int crc32c_pcl_intel_update(struct shash_desc *desc, const u8 *data,
{
u32 *crcp = shash_desc_ctx(desc);

+ BUILD_BUG_ON(bytes_per_fpu < CRC32C_PCL_BREAKEVEN);
+ BUILD_BUG_ON(bytes_per_fpu % SCALE_F);
+
/*
* use faster PCL version if datasize is large enough to
* overcome kernel fpu state save/restore overhead
*/
if (len >= CRC32C_PCL_BREAKEVEN && crypto_simd_usable()) {
- kernel_fpu_begin();
- *crcp = crc_pcl(data, len, *crcp);
- kernel_fpu_end();
+ while (len) {
+ unsigned int chunk = min(len, bytes_per_fpu);
+
+ kernel_fpu_begin();
+ *crcp = crc_pcl(data, chunk, *crcp);
+ kernel_fpu_end();
+
+ len -= chunk;
+ data += chunk;
+ }
} else
*crcp = crc32c_intel_le_hw(*crcp, data, len);
return 0;
}

-static int __crc32c_pcl_intel_finup(u32 *crcp, const u8 *data, unsigned int len,
- u8 *out)
+static int __crc32c_pcl_intel_finup(const u32 *crcp, const u8 *data,
+ unsigned int len, u8 *out)
{
+ u32 crc = *crcp;
+
+ BUILD_BUG_ON(bytes_per_fpu < CRC32C_PCL_BREAKEVEN);
+ BUILD_BUG_ON(bytes_per_fpu % SCALE_F);
+
if (len >= CRC32C_PCL_BREAKEVEN && crypto_simd_usable()) {
- kernel_fpu_begin();
- *(__le32 *)out = ~cpu_to_le32(crc_pcl(data, len, *crcp));
- kernel_fpu_end();
+ while (len) {
+ unsigned int chunk = min(len, bytes_per_fpu);
+
+ kernel_fpu_begin();
+ crc = crc_pcl(data, chunk, crc);
+ kernel_fpu_end();
+
+ len -= chunk;
+ data += chunk;
+ }
+ *(__le32 *)out = ~cpu_to_le32(crc);
} else
*(__le32 *)out =
- ~cpu_to_le32(crc32c_intel_le_hw(*crcp, data, len));
+ ~cpu_to_le32(crc32c_intel_le_hw(crc, data, len));
return 0;
}

diff --git a/arch/x86/crypto/crct10dif-pclmul_glue.c b/arch/x86/crypto/crct10dif-pclmul_glue.c
index 71291d5af9f4..4f6b8c727d88 100644
--- a/arch/x86/crypto/crct10dif-pclmul_glue.c
+++ b/arch/x86/crypto/crct10dif-pclmul_glue.c
@@ -34,6 +34,11 @@
#include <asm/cpu_device_id.h>
#include <asm/simd.h>

+#define PCLMUL_MIN_LEN 16U /* minimum size of buffer for crc_t10dif_pcl */
+
+/* avoid kernel_fpu_begin/end scheduler/rcu stalls */
+static const unsigned int bytes_per_fpu = 614 * 1024;
+
asmlinkage u16 crc_t10dif_pcl(u16 init_crc, const u8 *buf, size_t len);

struct chksum_desc_ctx {
@@ -54,11 +59,21 @@ static int chksum_update(struct shash_desc *desc, const u8 *data,
{
struct chksum_desc_ctx *ctx = shash_desc_ctx(desc);

- if (length >= 16 && crypto_simd_usable()) {
- kernel_fpu_begin();
- ctx->crc = crc_t10dif_pcl(ctx->crc, data, length);
- kernel_fpu_end();
- } else
+ BUILD_BUG_ON(bytes_per_fpu < PCLMUL_MIN_LEN);
+
+ if (length >= PCLMUL_MIN_LEN && crypto_simd_usable()) {
+ while (length >= PCLMUL_MIN_LEN) {
+ unsigned int chunk = min(length, bytes_per_fpu);
+
+ kernel_fpu_begin();
+ ctx->crc = crc_t10dif_pcl(ctx->crc, data, chunk);
+ kernel_fpu_end();
+
+ length -= chunk;
+ data += chunk;
+ }
+ }
+ if (length)
ctx->crc = crc_t10dif_generic(ctx->crc, data, length);
return 0;
}
@@ -73,12 +88,23 @@ static int chksum_final(struct shash_desc *desc, u8 *out)

static int __chksum_finup(__u16 crc, const u8 *data, unsigned int len, u8 *out)
{
- if (len >= 16 && crypto_simd_usable()) {
- kernel_fpu_begin();
- *(__u16 *)out = crc_t10dif_pcl(crc, data, len);
- kernel_fpu_end();
- } else
- *(__u16 *)out = crc_t10dif_generic(crc, data, len);
+ BUILD_BUG_ON(bytes_per_fpu < PCLMUL_MIN_LEN);
+
+ if (len >= PCLMUL_MIN_LEN && crypto_simd_usable()) {
+ while (len >= PCLMUL_MIN_LEN) {
+ unsigned int chunk = min(len, bytes_per_fpu);
+
+ kernel_fpu_begin();
+ crc = crc_t10dif_pcl(crc, data, chunk);
+ kernel_fpu_end();
+
+ len -= chunk;
+ data += chunk;
+ }
+ }
+ if (len)
+ crc = crc_t10dif_generic(crc, data, len);
+ *(__u16 *)out = crc;
return 0;
}

--
2.37.3


Subject: [PATCH v3 02/17] crypto: tcrypt - test nhpoly1305

Add self-test mode for nhpoly1305.

Signed-off-by: Robert Elliott <[email protected]>
---
crypto/tcrypt.c | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/crypto/tcrypt.c b/crypto/tcrypt.c
index 4426386dfb42..7a6a56751043 100644
--- a/crypto/tcrypt.c
+++ b/crypto/tcrypt.c
@@ -1715,6 +1715,10 @@ static int do_test(const char *alg, u32 type, u32 mask, int m, u32 num_mb)
ret += tcrypt_test("crc32");
break;

+ case 60:
+ ret += tcrypt_test("nhpoly1305");
+ break;
+
case 100:
ret += tcrypt_test("hmac(md5)");
break;
--
2.37.3


Subject: [PATCH v3 01/17] crypto: tcrypt - test crc32

Add self-test and speed tests for crc32, paralleling those
offered for crc32c and crct10dif.

Signed-off-by: Robert Elliott <[email protected]>
---
crypto/tcrypt.c | 8 ++++++++
1 file changed, 8 insertions(+)

diff --git a/crypto/tcrypt.c b/crypto/tcrypt.c
index a82679b576bb..4426386dfb42 100644
--- a/crypto/tcrypt.c
+++ b/crypto/tcrypt.c
@@ -1711,6 +1711,10 @@ static int do_test(const char *alg, u32 type, u32 mask, int m, u32 num_mb)
ret += tcrypt_test("gcm(aria)");
break;

+ case 59:
+ ret += tcrypt_test("crc32");
+ break;
+
case 100:
ret += tcrypt_test("hmac(md5)");
break;
@@ -2317,6 +2321,10 @@ static int do_test(const char *alg, u32 type, u32 mask, int m, u32 num_mb)
generic_hash_speed_template);
if (mode > 300 && mode < 400) break;
fallthrough;
+ case 329:
+ test_hash_speed("crc32", sec, generic_hash_speed_template);
+ if (mode > 300 && mode < 400) break;
+ fallthrough;
case 399:
break;

--
2.37.3


Subject: [PATCH v3 08/17] crypto: x86/ghash - restructure FPU context saving

Wrap each of the calls to clmul_hash_update and clmul_ghash__mul
in its own set of kernel_fpu_begin and kernel_fpu_end calls, preparing
to limit the amount of data processed by each _update call to avoid
RCU stalls.

This is more like how polyval-clmulni_glue is structured.

Fixes: 0e1227d356e9 ("crypto: ghash - Add PCLMULQDQ accelerated implementation")
Suggested-by: Herbert Xu <[email protected]>
Signed-off-by: Robert Elliott <[email protected]>
---
arch/x86/crypto/ghash-clmulni-intel_glue.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/x86/crypto/ghash-clmulni-intel_glue.c b/arch/x86/crypto/ghash-clmulni-intel_glue.c
index e996627c6583..22367e363d72 100644
--- a/arch/x86/crypto/ghash-clmulni-intel_glue.c
+++ b/arch/x86/crypto/ghash-clmulni-intel_glue.c
@@ -80,7 +80,6 @@ static int ghash_update(struct shash_desc *desc,
struct ghash_ctx *ctx = crypto_shash_ctx(desc->tfm);
u8 *dst = dctx->buffer;

- kernel_fpu_begin();
if (dctx->bytes) {
int n = min(srclen, dctx->bytes);
u8 *pos = dst + (GHASH_BLOCK_SIZE - dctx->bytes);
@@ -91,10 +90,14 @@ static int ghash_update(struct shash_desc *desc,
while (n--)
*pos++ ^= *src++;

- if (!dctx->bytes)
+ if (!dctx->bytes) {
+ kernel_fpu_begin();
clmul_ghash_mul(dst, &ctx->shash);
+ kernel_fpu_end();
+ }
}

+ kernel_fpu_begin();
clmul_ghash_update(dst, src, srclen, &ctx->shash);
kernel_fpu_end();

--
2.37.3


Subject: [PATCH v3 10/17] crypto: x86/*poly* - limit FPU preemption

Use a static const unsigned int for the limit of the number of bytes
processed between kernel_fpu_begin() and kernel_fpu_end() rather than
using the SZ_4K macro (which is a signed value), or a magic value
of 4096U embedded in the C code.

Use unsigned int rather than size_t for some of the arguments to
avoid typecasting for the min() macro.

Signed-off-by: Robert Elliott <[email protected]>

---
v3: use a static const unsigned int rather than a macro, change to
while loops rather than do/while loops
---
arch/x86/crypto/nhpoly1305-avx2-glue.c | 11 +++++---
arch/x86/crypto/nhpoly1305-sse2-glue.c | 11 +++++---
arch/x86/crypto/poly1305_glue.c | 37 +++++++++++++++++---------
arch/x86/crypto/polyval-clmulni_glue.c | 8 ++++--
4 files changed, 46 insertions(+), 21 deletions(-)

diff --git a/arch/x86/crypto/nhpoly1305-avx2-glue.c b/arch/x86/crypto/nhpoly1305-avx2-glue.c
index 8ea5ab0f1ca7..f7dc9c563bb5 100644
--- a/arch/x86/crypto/nhpoly1305-avx2-glue.c
+++ b/arch/x86/crypto/nhpoly1305-avx2-glue.c
@@ -13,6 +13,9 @@
#include <linux/sizes.h>
#include <asm/simd.h>

+/* avoid kernel_fpu_begin/end scheduler/rcu stalls */
+static const unsigned int bytes_per_fpu = 337 * 1024;
+
asmlinkage void nh_avx2(const u32 *key, const u8 *message, size_t message_len,
u8 hash[NH_HASH_BYTES]);

@@ -26,18 +29,20 @@ static void _nh_avx2(const u32 *key, const u8 *message, size_t message_len,
static int nhpoly1305_avx2_update(struct shash_desc *desc,
const u8 *src, unsigned int srclen)
{
+ BUILD_BUG_ON(bytes_per_fpu == 0);
+
if (srclen < 64 || !crypto_simd_usable())
return crypto_nhpoly1305_update(desc, src, srclen);

- do {
- unsigned int n = min_t(unsigned int, srclen, SZ_4K);
+ while (srclen) {
+ unsigned int n = min(srclen, bytes_per_fpu);

kernel_fpu_begin();
crypto_nhpoly1305_update_helper(desc, src, n, _nh_avx2);
kernel_fpu_end();
src += n;
srclen -= n;
- } while (srclen);
+ }
return 0;
}

diff --git a/arch/x86/crypto/nhpoly1305-sse2-glue.c b/arch/x86/crypto/nhpoly1305-sse2-glue.c
index 2b353d42ed13..daffcc7019ad 100644
--- a/arch/x86/crypto/nhpoly1305-sse2-glue.c
+++ b/arch/x86/crypto/nhpoly1305-sse2-glue.c
@@ -13,6 +13,9 @@
#include <linux/sizes.h>
#include <asm/simd.h>

+/* avoid kernel_fpu_begin/end scheduler/rcu stalls */
+static const unsigned int bytes_per_fpu = 199 * 1024;
+
asmlinkage void nh_sse2(const u32 *key, const u8 *message, size_t message_len,
u8 hash[NH_HASH_BYTES]);

@@ -26,18 +29,20 @@ static void _nh_sse2(const u32 *key, const u8 *message, size_t message_len,
static int nhpoly1305_sse2_update(struct shash_desc *desc,
const u8 *src, unsigned int srclen)
{
+ BUILD_BUG_ON(bytes_per_fpu == 0);
+
if (srclen < 64 || !crypto_simd_usable())
return crypto_nhpoly1305_update(desc, src, srclen);

- do {
- unsigned int n = min_t(unsigned int, srclen, SZ_4K);
+ while (srclen) {
+ unsigned int n = min(srclen, bytes_per_fpu);

kernel_fpu_begin();
crypto_nhpoly1305_update_helper(desc, src, n, _nh_sse2);
kernel_fpu_end();
src += n;
srclen -= n;
- } while (srclen);
+ }
return 0;
}

diff --git a/arch/x86/crypto/poly1305_glue.c b/arch/x86/crypto/poly1305_glue.c
index 1dfb8af48a3c..16831c036d71 100644
--- a/arch/x86/crypto/poly1305_glue.c
+++ b/arch/x86/crypto/poly1305_glue.c
@@ -15,20 +15,27 @@
#include <asm/intel-family.h>
#include <asm/simd.h>

+#define POLY1305_BLOCK_SIZE_MASK (~(POLY1305_BLOCK_SIZE - 1))
+
+/* avoid kernel_fpu_begin/end scheduler/rcu stalls */
+static const unsigned int bytes_per_fpu = 217 * 1024;
+
asmlinkage void poly1305_init_x86_64(void *ctx,
const u8 key[POLY1305_BLOCK_SIZE]);
asmlinkage void poly1305_blocks_x86_64(void *ctx, const u8 *inp,
- const size_t len, const u32 padbit);
+ const unsigned int len,
+ const u32 padbit);
asmlinkage void poly1305_emit_x86_64(void *ctx, u8 mac[POLY1305_DIGEST_SIZE],
const u32 nonce[4]);
asmlinkage void poly1305_emit_avx(void *ctx, u8 mac[POLY1305_DIGEST_SIZE],
const u32 nonce[4]);
-asmlinkage void poly1305_blocks_avx(void *ctx, const u8 *inp, const size_t len,
- const u32 padbit);
-asmlinkage void poly1305_blocks_avx2(void *ctx, const u8 *inp, const size_t len,
- const u32 padbit);
+asmlinkage void poly1305_blocks_avx(void *ctx, const u8 *inp,
+ const unsigned int len, const u32 padbit);
+asmlinkage void poly1305_blocks_avx2(void *ctx, const u8 *inp,
+ const unsigned int len, const u32 padbit);
asmlinkage void poly1305_blocks_avx512(void *ctx, const u8 *inp,
- const size_t len, const u32 padbit);
+ const unsigned int len,
+ const u32 padbit);

static __ro_after_init DEFINE_STATIC_KEY_FALSE(poly1305_use_avx);
static __ro_after_init DEFINE_STATIC_KEY_FALSE(poly1305_use_avx2);
@@ -86,14 +93,12 @@ static void poly1305_simd_init(void *ctx, const u8 key[POLY1305_BLOCK_SIZE])
poly1305_init_x86_64(ctx, key);
}

-static void poly1305_simd_blocks(void *ctx, const u8 *inp, size_t len,
+static void poly1305_simd_blocks(void *ctx, const u8 *inp, unsigned int len,
const u32 padbit)
{
struct poly1305_arch_internal *state = ctx;

- /* SIMD disables preemption, so relax after processing each page. */
- BUILD_BUG_ON(SZ_4K < POLY1305_BLOCK_SIZE ||
- SZ_4K % POLY1305_BLOCK_SIZE);
+ BUILD_BUG_ON(bytes_per_fpu < POLY1305_BLOCK_SIZE);

if (!static_branch_likely(&poly1305_use_avx) ||
(len < (POLY1305_BLOCK_SIZE * 18) && !state->is_base2_26) ||
@@ -103,8 +108,14 @@ static void poly1305_simd_blocks(void *ctx, const u8 *inp, size_t len,
return;
}

- do {
- const size_t bytes = min_t(size_t, len, SZ_4K);
+ while (len) {
+ unsigned int bytes;
+
+ if (len < POLY1305_BLOCK_SIZE)
+ bytes = len;
+ else
+ bytes = min(len,
+ bytes_per_fpu & POLY1305_BLOCK_SIZE_MASK);

kernel_fpu_begin();
if (IS_ENABLED(CONFIG_AS_AVX512) && static_branch_likely(&poly1305_use_avx512))
@@ -117,7 +128,7 @@ static void poly1305_simd_blocks(void *ctx, const u8 *inp, size_t len,

len -= bytes;
inp += bytes;
- } while (len);
+ }
}

static void poly1305_simd_emit(void *ctx, u8 mac[POLY1305_DIGEST_SIZE],
diff --git a/arch/x86/crypto/polyval-clmulni_glue.c b/arch/x86/crypto/polyval-clmulni_glue.c
index b7664d018851..de1c908f7412 100644
--- a/arch/x86/crypto/polyval-clmulni_glue.c
+++ b/arch/x86/crypto/polyval-clmulni_glue.c
@@ -29,6 +29,9 @@

#define NUM_KEY_POWERS 8

+/* avoid kernel_fpu_begin/end scheduler/rcu stalls */
+static const unsigned int bytes_per_fpu = 393 * 1024;
+
struct polyval_tfm_ctx {
/*
* These powers must be in the order h^8, ..., h^1.
@@ -107,6 +110,8 @@ static int polyval_x86_update(struct shash_desc *desc,
unsigned int nblocks;
unsigned int n;

+ BUILD_BUG_ON(bytes_per_fpu < POLYVAL_BLOCK_SIZE);
+
if (dctx->bytes) {
n = min(srclen, dctx->bytes);
pos = dctx->buffer + POLYVAL_BLOCK_SIZE - dctx->bytes;
@@ -123,8 +128,7 @@ static int polyval_x86_update(struct shash_desc *desc,
}

while (srclen >= POLYVAL_BLOCK_SIZE) {
- /* Allow rescheduling every 4K bytes. */
- nblocks = min(srclen, 4096U) / POLYVAL_BLOCK_SIZE;
+ nblocks = min(srclen, bytes_per_fpu) / POLYVAL_BLOCK_SIZE;
internal_polyval_update(tctx, src, nblocks, dctx->buffer);
srclen -= nblocks * POLYVAL_BLOCK_SIZE;
src += nblocks * POLYVAL_BLOCK_SIZE;
--
2.37.3


Subject: [PATCH v3 04/17] crypto: x86/sha - limit FPU preemption

Limit the number of bytes processed between kernel_fpu_begin() and
kernel_fpu_end() calls.

Those functions call preempt_disable() and preempt_enable(), so
the CPU core is unavailable for scheduling while running.

This leads to "rcu_preempt detected expedited stalls" with stack dumps
pointing to the optimized hash function if the module is loaded and
used a lot:
rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: ...

For example, that can occur during boot with the stack track pointing
to the sha512-x86 function if the system set to use SHA-512 for
module signing. The call trace includes:
module_sig_check
mod_verify_sig
pkcs7_verify
pkcs7_digest
sha512_finup
sha512_base_do_update

Fixes: 66be89515888 ("crypto: sha1 - SSSE3 based SHA1 implementation for x86-64")
Fixes: 8275d1aa6422 ("crypto: sha256 - Create module providing optimized SHA256 routines using SSSE3, AVX or AVX2 instructions.")
Fixes: 87de4579f92d ("crypto: sha512 - Create module providing optimized SHA512 routines using SSSE3, AVX or AVX2 instructions.")
Fixes: aa031b8f702e ("crypto: x86/sha512 - load based on CPU features")
Suggested-by: Herbert Xu <[email protected]>
Reviewed-by: Tim Chen <[email protected]>
Signed-off-by: Robert Elliott <[email protected]>

---
v3 simplify to while loops rather than do..while loops, avoid
redundant checks for zero length, rename the limit macro and
change into a const, vary the limit for each algo
---
arch/x86/crypto/sha1_ssse3_glue.c | 64 ++++++++++++++++++++++-------
arch/x86/crypto/sha256_ssse3_glue.c | 64 ++++++++++++++++++++++-------
arch/x86/crypto/sha512_ssse3_glue.c | 55 +++++++++++++++++++------
3 files changed, 140 insertions(+), 43 deletions(-)

diff --git a/arch/x86/crypto/sha1_ssse3_glue.c b/arch/x86/crypto/sha1_ssse3_glue.c
index 44340a1139e0..4bc77c84b0fb 100644
--- a/arch/x86/crypto/sha1_ssse3_glue.c
+++ b/arch/x86/crypto/sha1_ssse3_glue.c
@@ -26,8 +26,17 @@
#include <crypto/sha1_base.h>
#include <asm/simd.h>

+/* avoid kernel_fpu_begin/end scheduler/rcu stalls */
+#ifdef CONFIG_AS_SHA1_NI
+static const unsigned int bytes_per_fpu_shani = 34 * 1024;
+#endif
+static const unsigned int bytes_per_fpu_avx2 = 34 * 1024;
+static const unsigned int bytes_per_fpu_avx = 30 * 1024;
+static const unsigned int bytes_per_fpu_ssse3 = 26 * 1024;
+
static int sha1_update(struct shash_desc *desc, const u8 *data,
- unsigned int len, sha1_block_fn *sha1_xform)
+ unsigned int len, unsigned int bytes_per_fpu,
+ sha1_block_fn *sha1_xform)
{
struct sha1_state *sctx = shash_desc_ctx(desc);

@@ -41,22 +50,39 @@ static int sha1_update(struct shash_desc *desc, const u8 *data,
*/
BUILD_BUG_ON(offsetof(struct sha1_state, state) != 0);

- kernel_fpu_begin();
- sha1_base_do_update(desc, data, len, sha1_xform);
- kernel_fpu_end();
+ while (len) {
+ unsigned int chunk = min(len, bytes_per_fpu);
+
+ kernel_fpu_begin();
+ sha1_base_do_update(desc, data, chunk, sha1_xform);
+ kernel_fpu_end();
+
+ len -= chunk;
+ data += chunk;
+ }

return 0;
}

static int sha1_finup(struct shash_desc *desc, const u8 *data,
- unsigned int len, u8 *out, sha1_block_fn *sha1_xform)
+ unsigned int len, unsigned int bytes_per_fpu,
+ u8 *out, sha1_block_fn *sha1_xform)
{
if (!crypto_simd_usable())
return crypto_sha1_finup(desc, data, len, out);

+ while (len) {
+ unsigned int chunk = min(len, bytes_per_fpu);
+
+ kernel_fpu_begin();
+ sha1_base_do_update(desc, data, chunk, sha1_xform);
+ kernel_fpu_end();
+
+ len -= chunk;
+ data += chunk;
+ }
+
kernel_fpu_begin();
- if (len)
- sha1_base_do_update(desc, data, len, sha1_xform);
sha1_base_do_finalize(desc, sha1_xform);
kernel_fpu_end();

@@ -69,13 +95,15 @@ asmlinkage void sha1_transform_ssse3(struct sha1_state *state,
static int sha1_ssse3_update(struct shash_desc *desc, const u8 *data,
unsigned int len)
{
- return sha1_update(desc, data, len, sha1_transform_ssse3);
+ return sha1_update(desc, data, len, bytes_per_fpu_ssse3,
+ sha1_transform_ssse3);
}

static int sha1_ssse3_finup(struct shash_desc *desc, const u8 *data,
unsigned int len, u8 *out)
{
- return sha1_finup(desc, data, len, out, sha1_transform_ssse3);
+ return sha1_finup(desc, data, len, bytes_per_fpu_ssse3, out,
+ sha1_transform_ssse3);
}

/* Add padding and return the message digest. */
@@ -119,13 +147,15 @@ asmlinkage void sha1_transform_avx(struct sha1_state *state,
static int sha1_avx_update(struct shash_desc *desc, const u8 *data,
unsigned int len)
{
- return sha1_update(desc, data, len, sha1_transform_avx);
+ return sha1_update(desc, data, len, bytes_per_fpu_avx,
+ sha1_transform_avx);
}

static int sha1_avx_finup(struct shash_desc *desc, const u8 *data,
unsigned int len, u8 *out)
{
- return sha1_finup(desc, data, len, out, sha1_transform_avx);
+ return sha1_finup(desc, data, len, bytes_per_fpu_avx, out,
+ sha1_transform_avx);
}

static int sha1_avx_final(struct shash_desc *desc, u8 *out)
@@ -201,13 +231,15 @@ static void sha1_apply_transform_avx2(struct sha1_state *state,
static int sha1_avx2_update(struct shash_desc *desc, const u8 *data,
unsigned int len)
{
- return sha1_update(desc, data, len, sha1_apply_transform_avx2);
+ return sha1_update(desc, data, len, bytes_per_fpu_avx2,
+ sha1_apply_transform_avx2);
}

static int sha1_avx2_finup(struct shash_desc *desc, const u8 *data,
unsigned int len, u8 *out)
{
- return sha1_finup(desc, data, len, out, sha1_apply_transform_avx2);
+ return sha1_finup(desc, data, len, bytes_per_fpu_avx2, out,
+ sha1_apply_transform_avx2);
}

static int sha1_avx2_final(struct shash_desc *desc, u8 *out)
@@ -251,13 +283,15 @@ asmlinkage void sha1_ni_transform(struct sha1_state *digest, const u8 *data,
static int sha1_ni_update(struct shash_desc *desc, const u8 *data,
unsigned int len)
{
- return sha1_update(desc, data, len, sha1_ni_transform);
+ return sha1_update(desc, data, len, bytes_per_fpu_shani,
+ sha1_ni_transform);
}

static int sha1_ni_finup(struct shash_desc *desc, const u8 *data,
unsigned int len, u8 *out)
{
- return sha1_finup(desc, data, len, out, sha1_ni_transform);
+ return sha1_finup(desc, data, len, bytes_per_fpu_shani, out,
+ sha1_ni_transform);
}

static int sha1_ni_final(struct shash_desc *desc, u8 *out)
diff --git a/arch/x86/crypto/sha256_ssse3_glue.c b/arch/x86/crypto/sha256_ssse3_glue.c
index 3a5f6be7dbba..cdcdf5a80ffe 100644
--- a/arch/x86/crypto/sha256_ssse3_glue.c
+++ b/arch/x86/crypto/sha256_ssse3_glue.c
@@ -40,11 +40,20 @@
#include <linux/string.h>
#include <asm/simd.h>

+/* avoid kernel_fpu_begin/end scheduler/rcu stalls */
+#ifdef CONFIG_AS_SHA256_NI
+static const unsigned int bytes_per_fpu_shani = 13 * 1024;
+#endif
+static const unsigned int bytes_per_fpu_avx2 = 13 * 1024;
+static const unsigned int bytes_per_fpu_avx = 11 * 1024;
+static const unsigned int bytes_per_fpu_ssse3 = 11 * 1024;
+
asmlinkage void sha256_transform_ssse3(struct sha256_state *state,
const u8 *data, int blocks);

static int _sha256_update(struct shash_desc *desc, const u8 *data,
- unsigned int len, sha256_block_fn *sha256_xform)
+ unsigned int len, unsigned int bytes_per_fpu,
+ sha256_block_fn *sha256_xform)
{
struct sha256_state *sctx = shash_desc_ctx(desc);

@@ -58,22 +67,39 @@ static int _sha256_update(struct shash_desc *desc, const u8 *data,
*/
BUILD_BUG_ON(offsetof(struct sha256_state, state) != 0);

- kernel_fpu_begin();
- sha256_base_do_update(desc, data, len, sha256_xform);
- kernel_fpu_end();
+ while (len) {
+ unsigned int chunk = min(len, bytes_per_fpu);
+
+ kernel_fpu_begin();
+ sha256_base_do_update(desc, data, chunk, sha256_xform);
+ kernel_fpu_end();
+
+ len -= chunk;
+ data += chunk;
+ }

return 0;
}

static int sha256_finup(struct shash_desc *desc, const u8 *data,
- unsigned int len, u8 *out, sha256_block_fn *sha256_xform)
+ unsigned int len, unsigned int bytes_per_fpu,
+ u8 *out, sha256_block_fn *sha256_xform)
{
if (!crypto_simd_usable())
return crypto_sha256_finup(desc, data, len, out);

+ while (len) {
+ unsigned int chunk = min(len, bytes_per_fpu);
+
+ kernel_fpu_begin();
+ sha256_base_do_update(desc, data, chunk, sha256_xform);
+ kernel_fpu_end();
+
+ len -= chunk;
+ data += chunk;
+ }
+
kernel_fpu_begin();
- if (len)
- sha256_base_do_update(desc, data, len, sha256_xform);
sha256_base_do_finalize(desc, sha256_xform);
kernel_fpu_end();

@@ -83,13 +109,15 @@ static int sha256_finup(struct shash_desc *desc, const u8 *data,
static int sha256_ssse3_update(struct shash_desc *desc, const u8 *data,
unsigned int len)
{
- return _sha256_update(desc, data, len, sha256_transform_ssse3);
+ return _sha256_update(desc, data, len, bytes_per_fpu_ssse3,
+ sha256_transform_ssse3);
}

static int sha256_ssse3_finup(struct shash_desc *desc, const u8 *data,
unsigned int len, u8 *out)
{
- return sha256_finup(desc, data, len, out, sha256_transform_ssse3);
+ return sha256_finup(desc, data, len, bytes_per_fpu_ssse3,
+ out, sha256_transform_ssse3);
}

/* Add padding and return the message digest. */
@@ -149,13 +177,15 @@ asmlinkage void sha256_transform_avx(struct sha256_state *state,
static int sha256_avx_update(struct shash_desc *desc, const u8 *data,
unsigned int len)
{
- return _sha256_update(desc, data, len, sha256_transform_avx);
+ return _sha256_update(desc, data, len, bytes_per_fpu_avx,
+ sha256_transform_avx);
}

static int sha256_avx_finup(struct shash_desc *desc, const u8 *data,
unsigned int len, u8 *out)
{
- return sha256_finup(desc, data, len, out, sha256_transform_avx);
+ return sha256_finup(desc, data, len, bytes_per_fpu_avx,
+ out, sha256_transform_avx);
}

static int sha256_avx_final(struct shash_desc *desc, u8 *out)
@@ -225,13 +255,15 @@ asmlinkage void sha256_transform_rorx(struct sha256_state *state,
static int sha256_avx2_update(struct shash_desc *desc, const u8 *data,
unsigned int len)
{
- return _sha256_update(desc, data, len, sha256_transform_rorx);
+ return _sha256_update(desc, data, len, bytes_per_fpu_avx2,
+ sha256_transform_rorx);
}

static int sha256_avx2_finup(struct shash_desc *desc, const u8 *data,
unsigned int len, u8 *out)
{
- return sha256_finup(desc, data, len, out, sha256_transform_rorx);
+ return sha256_finup(desc, data, len, bytes_per_fpu_avx2,
+ out, sha256_transform_rorx);
}

static int sha256_avx2_final(struct shash_desc *desc, u8 *out)
@@ -300,13 +332,15 @@ asmlinkage void sha256_ni_transform(struct sha256_state *digest,
static int sha256_ni_update(struct shash_desc *desc, const u8 *data,
unsigned int len)
{
- return _sha256_update(desc, data, len, sha256_ni_transform);
+ return _sha256_update(desc, data, len, bytes_per_fpu_shani,
+ sha256_ni_transform);
}

static int sha256_ni_finup(struct shash_desc *desc, const u8 *data,
unsigned int len, u8 *out)
{
- return sha256_finup(desc, data, len, out, sha256_ni_transform);
+ return sha256_finup(desc, data, len, bytes_per_fpu_shani,
+ out, sha256_ni_transform);
}

static int sha256_ni_final(struct shash_desc *desc, u8 *out)
diff --git a/arch/x86/crypto/sha512_ssse3_glue.c b/arch/x86/crypto/sha512_ssse3_glue.c
index 6d3b85e53d0e..c7036cfe2a7e 100644
--- a/arch/x86/crypto/sha512_ssse3_glue.c
+++ b/arch/x86/crypto/sha512_ssse3_glue.c
@@ -39,11 +39,17 @@
#include <asm/cpu_device_id.h>
#include <asm/simd.h>

+/* avoid kernel_fpu_begin/end scheduler/rcu stalls */
+static const unsigned int bytes_per_fpu_avx2 = 20 * 1024;
+static const unsigned int bytes_per_fpu_avx = 17 * 1024;
+static const unsigned int bytes_per_fpu_ssse3 = 17 * 1024;
+
asmlinkage void sha512_transform_ssse3(struct sha512_state *state,
const u8 *data, int blocks);

static int sha512_update(struct shash_desc *desc, const u8 *data,
- unsigned int len, sha512_block_fn *sha512_xform)
+ unsigned int len, unsigned int bytes_per_fpu,
+ sha512_block_fn *sha512_xform)
{
struct sha512_state *sctx = shash_desc_ctx(desc);

@@ -57,22 +63,39 @@ static int sha512_update(struct shash_desc *desc, const u8 *data,
*/
BUILD_BUG_ON(offsetof(struct sha512_state, state) != 0);

- kernel_fpu_begin();
- sha512_base_do_update(desc, data, len, sha512_xform);
- kernel_fpu_end();
+ while (len) {
+ unsigned int chunk = min(len, bytes_per_fpu);
+
+ kernel_fpu_begin();
+ sha512_base_do_update(desc, data, chunk, sha512_xform);
+ kernel_fpu_end();
+
+ len -= chunk;
+ data += chunk;
+ }

return 0;
}

static int sha512_finup(struct shash_desc *desc, const u8 *data,
- unsigned int len, u8 *out, sha512_block_fn *sha512_xform)
+ unsigned int len, unsigned int bytes_per_fpu,
+ u8 *out, sha512_block_fn *sha512_xform)
{
if (!crypto_simd_usable())
return crypto_sha512_finup(desc, data, len, out);

+ while (len) {
+ unsigned int chunk = min(len, bytes_per_fpu);
+
+ kernel_fpu_begin();
+ sha512_base_do_update(desc, data, chunk, sha512_xform);
+ kernel_fpu_end();
+
+ len -= chunk;
+ data += chunk;
+ }
+
kernel_fpu_begin();
- if (len)
- sha512_base_do_update(desc, data, len, sha512_xform);
sha512_base_do_finalize(desc, sha512_xform);
kernel_fpu_end();

@@ -82,13 +105,15 @@ static int sha512_finup(struct shash_desc *desc, const u8 *data,
static int sha512_ssse3_update(struct shash_desc *desc, const u8 *data,
unsigned int len)
{
- return sha512_update(desc, data, len, sha512_transform_ssse3);
+ return sha512_update(desc, data, len, bytes_per_fpu_ssse3,
+ sha512_transform_ssse3);
}

static int sha512_ssse3_finup(struct shash_desc *desc, const u8 *data,
unsigned int len, u8 *out)
{
- return sha512_finup(desc, data, len, out, sha512_transform_ssse3);
+ return sha512_finup(desc, data, len, bytes_per_fpu_ssse3,
+ out, sha512_transform_ssse3);
}

/* Add padding and return the message digest. */
@@ -158,13 +183,15 @@ static bool avx_usable(void)
static int sha512_avx_update(struct shash_desc *desc, const u8 *data,
unsigned int len)
{
- return sha512_update(desc, data, len, sha512_transform_avx);
+ return sha512_update(desc, data, len, bytes_per_fpu_avx,
+ sha512_transform_avx);
}

static int sha512_avx_finup(struct shash_desc *desc, const u8 *data,
unsigned int len, u8 *out)
{
- return sha512_finup(desc, data, len, out, sha512_transform_avx);
+ return sha512_finup(desc, data, len, bytes_per_fpu_avx,
+ out, sha512_transform_avx);
}

/* Add padding and return the message digest. */
@@ -224,13 +251,15 @@ asmlinkage void sha512_transform_rorx(struct sha512_state *state,
static int sha512_avx2_update(struct shash_desc *desc, const u8 *data,
unsigned int len)
{
- return sha512_update(desc, data, len, sha512_transform_rorx);
+ return sha512_update(desc, data, len, bytes_per_fpu_avx2,
+ sha512_transform_rorx);
}

static int sha512_avx2_finup(struct shash_desc *desc, const u8 *data,
unsigned int len, u8 *out)
{
- return sha512_finup(desc, data, len, out, sha512_transform_rorx);
+ return sha512_finup(desc, data, len, bytes_per_fpu_avx2,
+ out, sha512_transform_rorx);
}

/* Add padding and return the message digest. */
--
2.37.3


Subject: [PATCH v3 11/17] crypto: x86/sha - register all variations

Rather than registering the implementations one at a time from least-
to most-optimized (e.g., SSSE3, then AVX, then AVX2) and unwinding all
of them if any registration fails, register every supported variation
independently.

This enables selecting those other algorithms if needed,
such as for testing with:
modprobe tcrypt mode=300 alg=sha512-avx
modprobe tcrypt mode=400 alg=sha512-avx

Suggested-by: Tim Chen <[email protected]>
Suggested-by: Herbert Xu <[email protected]>
Signed-off-by: Robert Elliott <[email protected]>

---
v3: register all the variations, not just the best one, per
Herbert's feedback. Return -ENODEV if none register successfully,
0 if any succeed.
---
arch/x86/crypto/sha1_ssse3_glue.c | 135 +++++++++++++------------
arch/x86/crypto/sha256_ssse3_glue.c | 146 ++++++++++++++--------------
arch/x86/crypto/sha512_ssse3_glue.c | 108 ++++++++++----------
3 files changed, 193 insertions(+), 196 deletions(-)

diff --git a/arch/x86/crypto/sha1_ssse3_glue.c b/arch/x86/crypto/sha1_ssse3_glue.c
index 4bc77c84b0fb..89aa5b787f2f 100644
--- a/arch/x86/crypto/sha1_ssse3_glue.c
+++ b/arch/x86/crypto/sha1_ssse3_glue.c
@@ -128,17 +128,17 @@ static struct shash_alg sha1_ssse3_alg = {
}
};

-static int register_sha1_ssse3(void)
-{
- if (boot_cpu_has(X86_FEATURE_SSSE3))
- return crypto_register_shash(&sha1_ssse3_alg);
- return 0;
-}
+static bool sha1_ssse3_registered;
+static bool sha1_avx_registered;
+static bool sha1_avx2_registered;
+static bool sha1_ni_registered;

static void unregister_sha1_ssse3(void)
{
- if (boot_cpu_has(X86_FEATURE_SSSE3))
+ if (sha1_ssse3_registered) {
crypto_unregister_shash(&sha1_ssse3_alg);
+ sha1_ssse3_registered = 0;
+ }
}

asmlinkage void sha1_transform_avx(struct sha1_state *state,
@@ -179,28 +179,12 @@ static struct shash_alg sha1_avx_alg = {
}
};

-static bool avx_usable(void)
-{
- if (!cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM, NULL)) {
- if (boot_cpu_has(X86_FEATURE_AVX))
- pr_info("AVX detected but unusable.\n");
- return false;
- }
-
- return true;
-}
-
-static int register_sha1_avx(void)
-{
- if (avx_usable())
- return crypto_register_shash(&sha1_avx_alg);
- return 0;
-}
-
static void unregister_sha1_avx(void)
{
- if (avx_usable())
+ if (sha1_avx_registered) {
crypto_unregister_shash(&sha1_avx_alg);
+ sha1_avx_registered = 0;
+ }
}

#define SHA1_AVX2_BLOCK_OPTSIZE 4 /* optimal 4*64 bytes of SHA1 blocks */
@@ -208,16 +192,6 @@ static void unregister_sha1_avx(void)
asmlinkage void sha1_transform_avx2(struct sha1_state *state,
const u8 *data, int blocks);

-static bool avx2_usable(void)
-{
- if (avx_usable() && boot_cpu_has(X86_FEATURE_AVX2)
- && boot_cpu_has(X86_FEATURE_BMI1)
- && boot_cpu_has(X86_FEATURE_BMI2))
- return true;
-
- return false;
-}
-
static void sha1_apply_transform_avx2(struct sha1_state *state,
const u8 *data, int blocks)
{
@@ -263,17 +237,12 @@ static struct shash_alg sha1_avx2_alg = {
}
};

-static int register_sha1_avx2(void)
-{
- if (avx2_usable())
- return crypto_register_shash(&sha1_avx2_alg);
- return 0;
-}
-
static void unregister_sha1_avx2(void)
{
- if (avx2_usable())
+ if (sha1_avx2_registered) {
crypto_unregister_shash(&sha1_avx2_alg);
+ sha1_avx2_registered = 0;
+ }
}

#ifdef CONFIG_AS_SHA1_NI
@@ -315,49 +284,77 @@ static struct shash_alg sha1_ni_alg = {
}
};

-static int register_sha1_ni(void)
-{
- if (boot_cpu_has(X86_FEATURE_SHA_NI))
- return crypto_register_shash(&sha1_ni_alg);
- return 0;
-}
-
static void unregister_sha1_ni(void)
{
- if (boot_cpu_has(X86_FEATURE_SHA_NI))
+ if (sha1_ni_registered) {
crypto_unregister_shash(&sha1_ni_alg);
+ sha1_ni_registered = 0;
+ }
}

#else
-static inline int register_sha1_ni(void) { return 0; }
static inline void unregister_sha1_ni(void) { }
#endif

static int __init sha1_ssse3_mod_init(void)
{
- if (register_sha1_ssse3())
- goto fail;
+ const char *feature_name;
+ const char *driver_name = NULL;
+ int ret;
+
+#ifdef CONFIG_AS_SHA1_NI
+ /* SHA-NI */
+ if (boot_cpu_has(X86_FEATURE_SHA_NI)) {

- if (register_sha1_avx()) {
- unregister_sha1_ssse3();
- goto fail;
+ ret = crypto_register_shash(&sha1_ni_alg);
+ if (!ret)
+ sha1_ni_registered = 1;
+ }
+#endif
+
+ /* AVX2 */
+ if (boot_cpu_has(X86_FEATURE_AVX2)) {
+
+ if (boot_cpu_has(X86_FEATURE_BMI1) &&
+ boot_cpu_has(X86_FEATURE_BMI2)) {
+
+ ret = crypto_register_shash(&sha1_avx2_alg);
+ if (!ret) {
+ sha1_avx2_registered = 1;
+ driver_name = sha1_avx2_alg.base.cra_driver_name;
+ }
+ }
}

- if (register_sha1_avx2()) {
- unregister_sha1_avx();
- unregister_sha1_ssse3();
- goto fail;
+ /* AVX */
+ if (boot_cpu_has(X86_FEATURE_AVX)) {
+
+ if (cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM,
+ &feature_name)) {
+
+ ret = crypto_register_shash(&sha1_avx_alg);
+ if (!ret) {
+ sha1_avx_registered = 1;
+ driver_name = sha1_avx_alg.base.cra_driver_name;
+ }
+ }
}

- if (register_sha1_ni()) {
- unregister_sha1_avx2();
- unregister_sha1_avx();
- unregister_sha1_ssse3();
- goto fail;
+ /* SSSE3 */
+ if (boot_cpu_has(X86_FEATURE_SSSE3)) {
+ ret = crypto_register_shash(&sha1_ssse3_alg);
+ if (!ret) {
+ sha1_ssse3_registered = 1;
+ driver_name = sha1_ssse3_alg.base.cra_driver_name;
+ }
}

- return 0;
-fail:
+#ifdef CONFIG_AS_SHA1_NI
+ if (sha1_ni_registered)
+ return 0;
+#endif
+ if (sha1_avx2_registered || sha1_avx_registered || sha1_ssse3_registered)
+ return 0;
return -ENODEV;
}

diff --git a/arch/x86/crypto/sha256_ssse3_glue.c b/arch/x86/crypto/sha256_ssse3_glue.c
index cdcdf5a80ffe..de320973e473 100644
--- a/arch/x86/crypto/sha256_ssse3_glue.c
+++ b/arch/x86/crypto/sha256_ssse3_glue.c
@@ -156,19 +156,18 @@ static struct shash_alg sha256_ssse3_algs[] = { {
}
} };

-static int register_sha256_ssse3(void)
-{
- if (boot_cpu_has(X86_FEATURE_SSSE3))
- return crypto_register_shashes(sha256_ssse3_algs,
- ARRAY_SIZE(sha256_ssse3_algs));
- return 0;
-}
+static bool sha256_ssse3_registered;
+static bool sha256_avx_registered;
+static bool sha256_avx2_registered;
+static bool sha256_ni_registered;

static void unregister_sha256_ssse3(void)
{
- if (boot_cpu_has(X86_FEATURE_SSSE3))
+ if (sha256_ssse3_registered) {
crypto_unregister_shashes(sha256_ssse3_algs,
ARRAY_SIZE(sha256_ssse3_algs));
+ sha256_ssse3_registered = 0;
+ }
}

asmlinkage void sha256_transform_avx(struct sha256_state *state,
@@ -223,30 +222,13 @@ static struct shash_alg sha256_avx_algs[] = { {
}
} };

-static bool avx_usable(void)
-{
- if (!cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM, NULL)) {
- if (boot_cpu_has(X86_FEATURE_AVX))
- pr_info("AVX detected but unusable.\n");
- return false;
- }
-
- return true;
-}
-
-static int register_sha256_avx(void)
-{
- if (avx_usable())
- return crypto_register_shashes(sha256_avx_algs,
- ARRAY_SIZE(sha256_avx_algs));
- return 0;
-}
-
static void unregister_sha256_avx(void)
{
- if (avx_usable())
+ if (sha256_avx_registered) {
crypto_unregister_shashes(sha256_avx_algs,
ARRAY_SIZE(sha256_avx_algs));
+ sha256_avx_registered = 0;
+ }
}

asmlinkage void sha256_transform_rorx(struct sha256_state *state,
@@ -301,28 +283,13 @@ static struct shash_alg sha256_avx2_algs[] = { {
}
} };

-static bool avx2_usable(void)
-{
- if (avx_usable() && boot_cpu_has(X86_FEATURE_AVX2) &&
- boot_cpu_has(X86_FEATURE_BMI2))
- return true;
-
- return false;
-}
-
-static int register_sha256_avx2(void)
-{
- if (avx2_usable())
- return crypto_register_shashes(sha256_avx2_algs,
- ARRAY_SIZE(sha256_avx2_algs));
- return 0;
-}
-
static void unregister_sha256_avx2(void)
{
- if (avx2_usable())
+ if (sha256_avx2_registered) {
crypto_unregister_shashes(sha256_avx2_algs,
ARRAY_SIZE(sha256_avx2_algs));
+ sha256_avx2_registered = 0;
+ }
}

#ifdef CONFIG_AS_SHA256_NI
@@ -378,51 +345,86 @@ static struct shash_alg sha256_ni_algs[] = { {
}
} };

-static int register_sha256_ni(void)
-{
- if (boot_cpu_has(X86_FEATURE_SHA_NI))
- return crypto_register_shashes(sha256_ni_algs,
- ARRAY_SIZE(sha256_ni_algs));
- return 0;
-}
-
static void unregister_sha256_ni(void)
{
- if (boot_cpu_has(X86_FEATURE_SHA_NI))
+ if (sha256_ni_registered) {
crypto_unregister_shashes(sha256_ni_algs,
ARRAY_SIZE(sha256_ni_algs));
+ sha256_ni_registered = 0;
+ }
}

#else
-static inline int register_sha256_ni(void) { return 0; }
static inline void unregister_sha256_ni(void) { }
#endif

static int __init sha256_ssse3_mod_init(void)
{
- if (register_sha256_ssse3())
- goto fail;
+ const char *feature_name;
+ const char *driver_name = NULL;
+ const char *driver_name2 = NULL;
+ int ret;

- if (register_sha256_avx()) {
- unregister_sha256_ssse3();
- goto fail;
+#ifdef CONFIG_AS_SHA256_NI
+ /* SHA-NI */
+ if (boot_cpu_has(X86_FEATURE_SHA_NI)) {
+
+ ret = crypto_register_shashes(sha256_ni_algs,
+ ARRAY_SIZE(sha256_ni_algs));
+ if (!ret) {
+ sha256_ni_registered = 1;
+ driver_name = sha256_ni_algs[0].base.cra_driver_name;
+ driver_name2 = sha256_ni_algs[1].base.cra_driver_name;
+ }
}
+#endif

- if (register_sha256_avx2()) {
- unregister_sha256_avx();
- unregister_sha256_ssse3();
- goto fail;
+ /* AVX2 */
+ if (boot_cpu_has(X86_FEATURE_AVX2)) {
+
+ if (boot_cpu_has(X86_FEATURE_BMI2)) {
+ ret = crypto_register_shashes(sha256_avx2_algs,
+ ARRAY_SIZE(sha256_avx2_algs));
+ if (!ret) {
+ sha256_avx2_registered = 1;
+ driver_name = sha256_avx2_algs[0].base.cra_driver_name;
+ driver_name2 = sha256_avx2_algs[1].base.cra_driver_name;
+ }
+ }
}

- if (register_sha256_ni()) {
- unregister_sha256_avx2();
- unregister_sha256_avx();
- unregister_sha256_ssse3();
- goto fail;
+ /* AVX */
+ if (boot_cpu_has(X86_FEATURE_AVX)) {
+
+ if (cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM,
+ &feature_name)) {
+ ret = crypto_register_shashes(sha256_avx_algs,
+ ARRAY_SIZE(sha256_avx_algs));
+ if (!ret) {
+ sha256_avx_registered = 1;
+ driver_name = sha256_avx_algs[0].base.cra_driver_name;
+ driver_name2 = sha256_avx_algs[1].base.cra_driver_name;
+ }
+ }
}

- return 0;
-fail:
+ /* SSSE3 */
+ if (boot_cpu_has(X86_FEATURE_SSSE3)) {
+ ret = crypto_register_shashes(sha256_ssse3_algs,
+ ARRAY_SIZE(sha256_ssse3_algs));
+ if (!ret) {
+ sha256_ssse3_registered = 1;
+ driver_name = sha256_ssse3_algs[0].base.cra_driver_name;
+ driver_name2 = sha256_ssse3_algs[1].base.cra_driver_name;
+ }
+ }
+
+#ifdef CONFIG_AS_SHA256_NI
+ if (sha256_ni_registered)
+ return 0;
+#endif
+ if (sha256_avx2_registered || sha256_avx_registered || sha256_ssse3_registered)
+ return 0;
return -ENODEV;
}

diff --git a/arch/x86/crypto/sha512_ssse3_glue.c b/arch/x86/crypto/sha512_ssse3_glue.c
index c7036cfe2a7e..3e96fe51f1a0 100644
--- a/arch/x86/crypto/sha512_ssse3_glue.c
+++ b/arch/x86/crypto/sha512_ssse3_glue.c
@@ -152,33 +152,21 @@ static struct shash_alg sha512_ssse3_algs[] = { {
}
} };

-static int register_sha512_ssse3(void)
-{
- if (boot_cpu_has(X86_FEATURE_SSSE3))
- return crypto_register_shashes(sha512_ssse3_algs,
- ARRAY_SIZE(sha512_ssse3_algs));
- return 0;
-}
+static bool sha512_ssse3_registered;
+static bool sha512_avx_registered;
+static bool sha512_avx2_registered;

static void unregister_sha512_ssse3(void)
{
- if (boot_cpu_has(X86_FEATURE_SSSE3))
+ if (sha512_ssse3_registered) {
crypto_unregister_shashes(sha512_ssse3_algs,
ARRAY_SIZE(sha512_ssse3_algs));
+ sha512_ssse3_registered = 0;
+ }
}

asmlinkage void sha512_transform_avx(struct sha512_state *state,
const u8 *data, int blocks);
-static bool avx_usable(void)
-{
- if (!cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM, NULL)) {
- if (boot_cpu_has(X86_FEATURE_AVX))
- pr_info("AVX detected but unusable.\n");
- return false;
- }
-
- return true;
-}

static int sha512_avx_update(struct shash_desc *desc, const u8 *data,
unsigned int len)
@@ -230,19 +218,13 @@ static struct shash_alg sha512_avx_algs[] = { {
}
} };

-static int register_sha512_avx(void)
-{
- if (avx_usable())
- return crypto_register_shashes(sha512_avx_algs,
- ARRAY_SIZE(sha512_avx_algs));
- return 0;
-}
-
static void unregister_sha512_avx(void)
{
- if (avx_usable())
+ if (sha512_avx_registered) {
crypto_unregister_shashes(sha512_avx_algs,
ARRAY_SIZE(sha512_avx_algs));
+ sha512_avx_registered = 0;
+ }
}

asmlinkage void sha512_transform_rorx(struct sha512_state *state,
@@ -298,22 +280,6 @@ static struct shash_alg sha512_avx2_algs[] = { {
}
} };

-static bool avx2_usable(void)
-{
- if (avx_usable() && boot_cpu_has(X86_FEATURE_AVX2) &&
- boot_cpu_has(X86_FEATURE_BMI2))
- return true;
-
- return false;
-}
-
-static int register_sha512_avx2(void)
-{
- if (avx2_usable())
- return crypto_register_shashes(sha512_avx2_algs,
- ARRAY_SIZE(sha512_avx2_algs));
- return 0;
-}
static const struct x86_cpu_id module_cpu_ids[] = {
X86_MATCH_FEATURE(X86_FEATURE_AVX2, NULL),
X86_MATCH_FEATURE(X86_FEATURE_AVX, NULL),
@@ -324,32 +290,64 @@ MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);

static void unregister_sha512_avx2(void)
{
- if (avx2_usable())
+ if (sha512_avx2_registered) {
crypto_unregister_shashes(sha512_avx2_algs,
ARRAY_SIZE(sha512_avx2_algs));
+ sha512_avx2_registered = 0;
+ }
}

static int __init sha512_ssse3_mod_init(void)
{
+ const char *feature_name;
+ const char *driver_name = NULL;
+ const char *driver_name2 = NULL;
+ int ret;
+
if (!x86_match_cpu(module_cpu_ids))
return -ENODEV;

- if (register_sha512_ssse3())
- goto fail;
+ /* AVX2 */
+ if (boot_cpu_has(X86_FEATURE_AVX2)) {
+ if (boot_cpu_has(X86_FEATURE_BMI2)) {
+ ret = crypto_register_shashes(sha512_avx2_algs,
+ ARRAY_SIZE(sha512_avx2_algs));
+ if (!ret) {
+ sha512_avx2_registered = 1;
+ driver_name = sha512_avx2_algs[0].base.cra_driver_name;
+ driver_name2 = sha512_avx2_algs[1].base.cra_driver_name;
+ }
+ }
+ }

- if (register_sha512_avx()) {
- unregister_sha512_ssse3();
- goto fail;
+ /* AVX */
+ if (boot_cpu_has(X86_FEATURE_AVX)) {
+
+ if (cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM,
+ &feature_name)) {
+ ret = crypto_register_shashes(sha512_avx_algs,
+ ARRAY_SIZE(sha512_avx_algs));
+ if (!ret) {
+ sha512_avx_registered = 1;
+ driver_name = sha512_avx_algs[0].base.cra_driver_name;
+ driver_name2 = sha512_avx_algs[1].base.cra_driver_name;
+ }
+ }
}

- if (register_sha512_avx2()) {
- unregister_sha512_avx();
- unregister_sha512_ssse3();
- goto fail;
+ /* SSSE3 */
+ if (boot_cpu_has(X86_FEATURE_SSSE3)) {
+ ret = crypto_register_shashes(sha512_ssse3_algs,
+ ARRAY_SIZE(sha512_ssse3_algs));
+ if (!ret) {
+ sha512_ssse3_registered = 1;
+ driver_name = sha512_ssse3_algs[0].base.cra_driver_name;
+ driver_name2 = sha512_ssse3_algs[1].base.cra_driver_name;
+ }
}

- return 0;
-fail:
+ if (sha512_avx2_registered || sha512_avx_registered || sha512_ssse3_registered)
+ return 0;
return -ENODEV;
}

--
2.37.3


Subject: [PATCH v3 15/17] crypto: x86/sm3 - load based on CPU features

Like commit aa031b8f702e ("crypto: x86/sha512 - load based on CPU
features"), add module aliases for x86-optimized crypto modules:
sm3
based on CPU feature bits so udev gets a chance to load them later in
the boot process when the filesystems are all running.

Signed-off-by: Robert Elliott <[email protected]>
---
arch/x86/crypto/sm3_avx_glue.c | 10 ++++++++++
1 file changed, 10 insertions(+)

diff --git a/arch/x86/crypto/sm3_avx_glue.c b/arch/x86/crypto/sm3_avx_glue.c
index 483aaed996ba..26256cc0cbb6 100644
--- a/arch/x86/crypto/sm3_avx_glue.c
+++ b/arch/x86/crypto/sm3_avx_glue.c
@@ -15,6 +15,7 @@
#include <linux/types.h>
#include <crypto/sm3.h>
#include <crypto/sm3_base.h>
+#include <asm/cpu_device_id.h>
#include <asm/simd.h>

/* avoid kernel_fpu_begin/end scheduler/rcu stalls */
@@ -119,10 +120,19 @@ static struct shash_alg sm3_avx_alg = {
}
};

+static const struct x86_cpu_id module_cpu_ids[] = {
+ X86_MATCH_FEATURE(X86_FEATURE_AVX, NULL),
+ {}
+};
+MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);
+
static int __init sm3_avx_mod_init(void)
{
const char *feature_name;

+ if (!x86_match_cpu(module_cpu_ids))
+ return -ENODEV;
+
if (!boot_cpu_has(X86_FEATURE_AVX)) {
pr_info("AVX instruction are not detected.\n");
return -ENODEV;
--
2.37.3


Subject: [PATCH v3 14/17] crypto: x86/crc - load based on CPU features

Like commit aa031b8f702e ("crypto: x86/sha512 - load based on CPU
features"), these x86-optimized crypto modules already have
module aliases based on CPU feature bits:
crc32, crc32c, and crct10dif

Rename the unique device table data structure to a generic name
so the code has the same pattern in all the modules.

Remove the print on a device table mismatch from crc32; the other
modules do not print one. Modules are not supposed to print unless
they are active.

Signed-off-by: Robert Elliott <[email protected]>
---
arch/x86/crypto/crc32-pclmul_glue.c | 9 +++------
arch/x86/crypto/crc32c-intel_glue.c | 6 +++---
arch/x86/crypto/crct10dif-pclmul_glue.c | 6 +++---
3 files changed, 9 insertions(+), 12 deletions(-)

diff --git a/arch/x86/crypto/crc32-pclmul_glue.c b/arch/x86/crypto/crc32-pclmul_glue.c
index df3dbc754818..f6d8a933641f 100644
--- a/arch/x86/crypto/crc32-pclmul_glue.c
+++ b/arch/x86/crypto/crc32-pclmul_glue.c
@@ -182,20 +182,17 @@ static struct shash_alg alg = {
}
};

-static const struct x86_cpu_id crc32pclmul_cpu_id[] = {
+static const struct x86_cpu_id module_cpu_ids[] = {
X86_MATCH_FEATURE(X86_FEATURE_PCLMULQDQ, NULL),
{}
};
-MODULE_DEVICE_TABLE(x86cpu, crc32pclmul_cpu_id);
+MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);


static int __init crc32_pclmul_mod_init(void)
{
-
- if (!x86_match_cpu(crc32pclmul_cpu_id)) {
- pr_info("PCLMULQDQ-NI instructions are not detected.\n");
+ if (!x86_match_cpu(module_cpu_ids))
return -ENODEV;
- }
return crypto_register_shash(&alg);
}

diff --git a/arch/x86/crypto/crc32c-intel_glue.c b/arch/x86/crypto/crc32c-intel_glue.c
index f08ed68ec93d..aff132e925ea 100644
--- a/arch/x86/crypto/crc32c-intel_glue.c
+++ b/arch/x86/crypto/crc32c-intel_glue.c
@@ -240,15 +240,15 @@ static struct shash_alg alg = {
}
};

-static const struct x86_cpu_id crc32c_cpu_id[] = {
+static const struct x86_cpu_id module_cpu_ids[] = {
X86_MATCH_FEATURE(X86_FEATURE_XMM4_2, NULL),
{}
};
-MODULE_DEVICE_TABLE(x86cpu, crc32c_cpu_id);
+MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);

static int __init crc32c_intel_mod_init(void)
{
- if (!x86_match_cpu(crc32c_cpu_id))
+ if (!x86_match_cpu(module_cpu_ids))
return -ENODEV;
#ifdef CONFIG_X86_64
if (boot_cpu_has(X86_FEATURE_PCLMULQDQ)) {
diff --git a/arch/x86/crypto/crct10dif-pclmul_glue.c b/arch/x86/crypto/crct10dif-pclmul_glue.c
index 4f6b8c727d88..a26dbd27da96 100644
--- a/arch/x86/crypto/crct10dif-pclmul_glue.c
+++ b/arch/x86/crypto/crct10dif-pclmul_glue.c
@@ -139,15 +139,15 @@ static struct shash_alg alg = {
}
};

-static const struct x86_cpu_id crct10dif_cpu_id[] = {
+static const struct x86_cpu_id module_cpu_ids[] = {
X86_MATCH_FEATURE(X86_FEATURE_PCLMULQDQ, NULL),
{}
};
-MODULE_DEVICE_TABLE(x86cpu, crct10dif_cpu_id);
+MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);

static int __init crct10dif_intel_mod_init(void)
{
- if (!x86_match_cpu(crct10dif_cpu_id))
+ if (!x86_match_cpu(module_cpu_ids))
return -ENODEV;

return crypto_register_shash(&alg);
--
2.37.3


Subject: [PATCH v3 17/17] crypto: x86/nhpoly1305, poly1305 - load based on CPU features

Like commit aa031b8f702e ("crypto: x86/sha512 - load based on CPU
features"), add module aliases for x86-optimized crypto modules:
nhpoly1305, poly1305
based on CPU feature bits so udev gets a chance to load them later
in the boot process when the filesystems are all running.

Signed-off-by: Robert Elliott <[email protected]>
---
arch/x86/crypto/nhpoly1305-avx2-glue.c | 10 ++++++++++
arch/x86/crypto/nhpoly1305-sse2-glue.c | 10 ++++++++++
arch/x86/crypto/poly1305_glue.c | 12 ++++++++++++
3 files changed, 32 insertions(+)

diff --git a/arch/x86/crypto/nhpoly1305-avx2-glue.c b/arch/x86/crypto/nhpoly1305-avx2-glue.c
index f7dc9c563bb5..15f98b53bfda 100644
--- a/arch/x86/crypto/nhpoly1305-avx2-glue.c
+++ b/arch/x86/crypto/nhpoly1305-avx2-glue.c
@@ -11,6 +11,7 @@
#include <crypto/nhpoly1305.h>
#include <linux/module.h>
#include <linux/sizes.h>
+#include <asm/cpu_device_id.h>
#include <asm/simd.h>

/* avoid kernel_fpu_begin/end scheduler/rcu stalls */
@@ -60,8 +61,17 @@ static struct shash_alg nhpoly1305_alg = {
.descsize = sizeof(struct nhpoly1305_state),
};

+static const struct x86_cpu_id module_cpu_ids[] = {
+ X86_MATCH_FEATURE(X86_FEATURE_AVX2, NULL),
+ {}
+};
+MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);
+
static int __init nhpoly1305_mod_init(void)
{
+ if (!x86_match_cpu(module_cpu_ids))
+ return -ENODEV;
+
if (!boot_cpu_has(X86_FEATURE_AVX2) ||
!boot_cpu_has(X86_FEATURE_OSXSAVE))
return -ENODEV;
diff --git a/arch/x86/crypto/nhpoly1305-sse2-glue.c b/arch/x86/crypto/nhpoly1305-sse2-glue.c
index daffcc7019ad..533db3e0e06f 100644
--- a/arch/x86/crypto/nhpoly1305-sse2-glue.c
+++ b/arch/x86/crypto/nhpoly1305-sse2-glue.c
@@ -11,6 +11,7 @@
#include <crypto/nhpoly1305.h>
#include <linux/module.h>
#include <linux/sizes.h>
+#include <asm/cpu_device_id.h>
#include <asm/simd.h>

/* avoid kernel_fpu_begin/end scheduler/rcu stalls */
@@ -60,8 +61,17 @@ static struct shash_alg nhpoly1305_alg = {
.descsize = sizeof(struct nhpoly1305_state),
};

+static const struct x86_cpu_id module_cpu_ids[] = {
+ X86_MATCH_FEATURE(X86_FEATURE_XMM2, NULL),
+ {}
+};
+MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);
+
static int __init nhpoly1305_mod_init(void)
{
+ if (!x86_match_cpu(module_cpu_ids))
+ return -ENODEV;
+
if (!boot_cpu_has(X86_FEATURE_XMM2))
return -ENODEV;

diff --git a/arch/x86/crypto/poly1305_glue.c b/arch/x86/crypto/poly1305_glue.c
index 16831c036d71..2ff4358e4b3f 100644
--- a/arch/x86/crypto/poly1305_glue.c
+++ b/arch/x86/crypto/poly1305_glue.c
@@ -12,6 +12,7 @@
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/sizes.h>
+#include <asm/cpu_device_id.h>
#include <asm/intel-family.h>
#include <asm/simd.h>

@@ -268,8 +269,19 @@ static struct shash_alg alg = {
},
};

+static const struct x86_cpu_id module_cpu_ids[] = {
+ X86_MATCH_FEATURE(X86_FEATURE_AVX2, NULL),
+ X86_MATCH_FEATURE(X86_FEATURE_AVX, NULL),
+ X86_MATCH_FEATURE(X86_FEATURE_AVX512F, NULL),
+ {}
+};
+MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);
+
static int __init poly1305_simd_mod_init(void)
{
+ if (!x86_match_cpu(module_cpu_ids))
+ return -ENODEV;
+
if (boot_cpu_has(X86_FEATURE_AVX) &&
cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM, NULL))
static_branch_enable(&poly1305_use_avx);
--
2.37.3


Subject: [PATCH v3 16/17] crypto: x86/ghash,polyval - load based on CPU features

Like commit aa031b8f702e ("crypto: x86/sha512 - load based on CPU
features"), these x86-optimized crypto modules already have
module aliases based on CPU feature bits:
ghash, polyval

Rename the unique device table data structure to a generic name
so the code has the same pattern in all the modules.

Signed-off-by: Robert Elliott <[email protected]>
---
arch/x86/crypto/ghash-clmulni-intel_glue.c | 6 +++---
arch/x86/crypto/polyval-clmulni_glue.c | 6 +++---
2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/x86/crypto/ghash-clmulni-intel_glue.c b/arch/x86/crypto/ghash-clmulni-intel_glue.c
index 0f24c3b23fd2..d19a8e9b34a6 100644
--- a/arch/x86/crypto/ghash-clmulni-intel_glue.c
+++ b/arch/x86/crypto/ghash-clmulni-intel_glue.c
@@ -325,17 +325,17 @@ static struct ahash_alg ghash_async_alg = {
},
};

-static const struct x86_cpu_id pcmul_cpu_id[] = {
+static const struct x86_cpu_id module_cpu_ids[] = {
X86_MATCH_FEATURE(X86_FEATURE_PCLMULQDQ, NULL), /* Pickle-Mickle-Duck */
{}
};
-MODULE_DEVICE_TABLE(x86cpu, pcmul_cpu_id);
+MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);

static int __init ghash_pclmulqdqni_mod_init(void)
{
int err;

- if (!x86_match_cpu(pcmul_cpu_id))
+ if (!x86_match_cpu(module_cpu_ids))
return -ENODEV;

err = crypto_register_shash(&ghash_alg);
diff --git a/arch/x86/crypto/polyval-clmulni_glue.c b/arch/x86/crypto/polyval-clmulni_glue.c
index de1c908f7412..375c845695a4 100644
--- a/arch/x86/crypto/polyval-clmulni_glue.c
+++ b/arch/x86/crypto/polyval-clmulni_glue.c
@@ -176,15 +176,15 @@ static struct shash_alg polyval_alg = {
},
};

-__maybe_unused static const struct x86_cpu_id pcmul_cpu_id[] = {
+__maybe_unused static const struct x86_cpu_id module_cpu_ids[] = {
X86_MATCH_FEATURE(X86_FEATURE_PCLMULQDQ, NULL),
{}
};
-MODULE_DEVICE_TABLE(x86cpu, pcmul_cpu_id);
+MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);

static int __init polyval_clmulni_mod_init(void)
{
- if (!x86_match_cpu(pcmul_cpu_id))
+ if (!x86_match_cpu(module_cpu_ids))
return -ENODEV;

if (!boot_cpu_has(X86_FEATURE_AVX))
--
2.37.3


Subject: [PATCH v3 07/17] crypto: x86/ghash - use u8 rather than char

Use consistent, unambiguous types for the source and destination
buffer pointer arguments.

Signed-off-by: Robert Elliott <[email protected]>
---
arch/x86/crypto/ghash-clmulni-intel_asm.S | 4 ++--
arch/x86/crypto/ghash-clmulni-intel_glue.c | 4 ++--
2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/crypto/ghash-clmulni-intel_asm.S b/arch/x86/crypto/ghash-clmulni-intel_asm.S
index 2bf871899920..c7b8542facee 100644
--- a/arch/x86/crypto/ghash-clmulni-intel_asm.S
+++ b/arch/x86/crypto/ghash-clmulni-intel_asm.S
@@ -88,7 +88,7 @@ SYM_FUNC_START_LOCAL(__clmul_gf128mul_ble)
RET
SYM_FUNC_END(__clmul_gf128mul_ble)

-/* void clmul_ghash_mul(char *dst, const u128 *shash) */
+/* void clmul_ghash_mul(u8 *dst, const u128 *shash) */
SYM_FUNC_START(clmul_ghash_mul)
FRAME_BEGIN
movups (%rdi), DATA
@@ -103,7 +103,7 @@ SYM_FUNC_START(clmul_ghash_mul)
SYM_FUNC_END(clmul_ghash_mul)

/*
- * void clmul_ghash_update(char *dst, const char *src, unsigned int srclen,
+ * void clmul_ghash_update(u8 *dst, const u8 *src, unsigned int srclen,
* const u128 *shash);
*/
SYM_FUNC_START(clmul_ghash_update)
diff --git a/arch/x86/crypto/ghash-clmulni-intel_glue.c b/arch/x86/crypto/ghash-clmulni-intel_glue.c
index 1f1a95f3dd0c..e996627c6583 100644
--- a/arch/x86/crypto/ghash-clmulni-intel_glue.c
+++ b/arch/x86/crypto/ghash-clmulni-intel_glue.c
@@ -23,9 +23,9 @@
#define GHASH_BLOCK_SIZE 16
#define GHASH_DIGEST_SIZE 16

-void clmul_ghash_mul(char *dst, const u128 *shash);
+void clmul_ghash_mul(u8 *dst, const u128 *shash);

-void clmul_ghash_update(char *dst, const char *src, unsigned int srclen,
+void clmul_ghash_update(u8 *dst, const u8 *src, unsigned int srclen,
const u128 *shash);

struct ghash_async_ctx {
--
2.37.3


Subject: [PATCH v3 13/17] crypto: x86/sha1, sha256 - load based on CPU features

Like commit aa031b8f702e ("crypto: x86/sha512 - load based on CPU
features"), add module aliases for x86-optimized crypto modules:
sha1, sha256
based on CPU feature bits so udev gets a chance to load them later in
the boot process when the filesystems are all running.

Signed-off-by: Robert Elliott <[email protected]>

---
v3: put device table SHA_NI entries inside CONFIG_AS_SHAn_NI ifdefs;
ensure the build works with arch/x86/Kconfig.assembler changed
to not set CONFIG_AS_SHA*_NI
---
arch/x86/crypto/sha1_ssse3_glue.c | 15 +++++++++++++++
arch/x86/crypto/sha256_ssse3_glue.c | 15 +++++++++++++++
2 files changed, 30 insertions(+)

diff --git a/arch/x86/crypto/sha1_ssse3_glue.c b/arch/x86/crypto/sha1_ssse3_glue.c
index cd390083451f..7269beaa9291 100644
--- a/arch/x86/crypto/sha1_ssse3_glue.c
+++ b/arch/x86/crypto/sha1_ssse3_glue.c
@@ -24,6 +24,7 @@
#include <linux/types.h>
#include <crypto/sha1.h>
#include <crypto/sha1_base.h>
+#include <asm/cpu_device_id.h>
#include <asm/simd.h>

/* avoid kernel_fpu_begin/end scheduler/rcu stalls */
@@ -326,12 +327,26 @@ static void unregister_sha1_ni(void)
static inline void unregister_sha1_ni(void) { }
#endif

+static const struct x86_cpu_id module_cpu_ids[] = {
+#ifdef CONFIG_AS_SHA1_NI
+ X86_MATCH_FEATURE(X86_FEATURE_SHA_NI, NULL),
+#endif
+ X86_MATCH_FEATURE(X86_FEATURE_AVX2, NULL),
+ X86_MATCH_FEATURE(X86_FEATURE_AVX, NULL),
+ X86_MATCH_FEATURE(X86_FEATURE_SSSE3, NULL),
+ {}
+};
+MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);
+
static int __init sha1_ssse3_mod_init(void)
{
const char *feature_name;
const char *driver_name = NULL;
int ret;

+ if (!x86_match_cpu(module_cpu_ids))
+ return -ENODEV;
+
#ifdef CONFIG_AS_SHA1_NI
/* SHA-NI */
if (boot_cpu_has(X86_FEATURE_SHA_NI)) {
diff --git a/arch/x86/crypto/sha256_ssse3_glue.c b/arch/x86/crypto/sha256_ssse3_glue.c
index 692d6f010a4d..5ce42f1d228b 100644
--- a/arch/x86/crypto/sha256_ssse3_glue.c
+++ b/arch/x86/crypto/sha256_ssse3_glue.c
@@ -38,6 +38,7 @@
#include <crypto/sha2.h>
#include <crypto/sha256_base.h>
#include <linux/string.h>
+#include <asm/cpu_device_id.h>
#include <asm/simd.h>

/* avoid kernel_fpu_begin/end scheduler/rcu stalls */
@@ -397,6 +398,17 @@ static void unregister_sha256_ni(void)
static inline void unregister_sha256_ni(void) { }
#endif

+static const struct x86_cpu_id module_cpu_ids[] = {
+#ifdef CONFIG_AS_SHA256_NI
+ X86_MATCH_FEATURE(X86_FEATURE_SHA_NI, NULL),
+#endif
+ X86_MATCH_FEATURE(X86_FEATURE_AVX2, NULL),
+ X86_MATCH_FEATURE(X86_FEATURE_AVX, NULL),
+ X86_MATCH_FEATURE(X86_FEATURE_SSSE3, NULL),
+ {}
+};
+MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);
+
static int __init sha256_ssse3_mod_init(void)
{
const char *feature_name;
@@ -404,6 +416,9 @@ static int __init sha256_ssse3_mod_init(void)
const char *driver_name2 = NULL;
int ret;

+ if (!x86_match_cpu(module_cpu_ids))
+ return -ENODEV;
+
#ifdef CONFIG_AS_SHA256_NI
/* SHA-NI */
if (boot_cpu_has(X86_FEATURE_SHA_NI)) {
--
2.37.3


Subject: [PATCH v3 12/17] crypto: x86/sha - minimize time in FPU context

Narrow the kernel_fpu_begin()/kernel_fpu_end() to just wrap the
assembly functions, not any extra C code around them (which includes
several memcpy() calls).

This reduces unnecessary time in FPU context, in which the scheduler
is prevented from preempting and the RCU subsystem is kept from
doing its work.

Example results measuring a boot, in which SHA-512 is used to check all
module signatures using finup() calls:

Before:
calls maxcycles bpf update finup algorithm module
======== ============ ======== ======== ======== =========== ==============
168390 1233188 19456 0 19456 sha512-avx2 sha512_ssse3

After:
182694 1007224 19456 0 19456 sha512-avx2 sha512_ssse3

That means it stayed in FPU context for 226k fewer clock cycles (which
is 102 microseconds on this system, 18% less).

Signed-off-by: Robert Elliott <[email protected]>
---
arch/x86/crypto/sha1_ssse3_glue.c | 82 ++++++++++++++++++++---------
arch/x86/crypto/sha256_ssse3_glue.c | 67 ++++++++++++++++++-----
arch/x86/crypto/sha512_ssse3_glue.c | 48 ++++++++++++-----
3 files changed, 145 insertions(+), 52 deletions(-)

diff --git a/arch/x86/crypto/sha1_ssse3_glue.c b/arch/x86/crypto/sha1_ssse3_glue.c
index 89aa5b787f2f..cd390083451f 100644
--- a/arch/x86/crypto/sha1_ssse3_glue.c
+++ b/arch/x86/crypto/sha1_ssse3_glue.c
@@ -34,6 +34,54 @@ static const unsigned int bytes_per_fpu_avx2 = 34 * 1024;
static const unsigned int bytes_per_fpu_avx = 30 * 1024;
static const unsigned int bytes_per_fpu_ssse3 = 26 * 1024;

+asmlinkage void sha1_transform_ssse3(struct sha1_state *state,
+ const u8 *data, int blocks);
+
+asmlinkage void sha1_transform_avx(struct sha1_state *state,
+ const u8 *data, int blocks);
+
+asmlinkage void sha1_transform_avx2(struct sha1_state *state,
+ const u8 *data, int blocks);
+
+#ifdef CONFIG_AS_SHA1_NI
+asmlinkage void sha1_ni_transform(struct sha1_state *digest, const u8 *data,
+ int rounds);
+#endif
+
+static void fpu_sha1_transform_ssse3(struct sha1_state *state,
+ const u8 *data, int blocks)
+{
+ kernel_fpu_begin();
+ sha1_transform_ssse3(state, data, blocks);
+ kernel_fpu_end();
+}
+
+static void fpu_sha1_transform_avx(struct sha1_state *state,
+ const u8 *data, int blocks)
+{
+ kernel_fpu_begin();
+ sha1_transform_avx(state, data, blocks);
+ kernel_fpu_end();
+}
+
+static void fpu_sha1_transform_avx2(struct sha1_state *state,
+ const u8 *data, int blocks)
+{
+ kernel_fpu_begin();
+ sha1_transform_avx2(state, data, blocks);
+ kernel_fpu_end();
+}
+
+#ifdef CONFIG_AS_SHA1_NI
+static void fpu_sha1_transform_shani(struct sha1_state *state,
+ const u8 *data, int blocks)
+{
+ kernel_fpu_begin();
+ sha1_ni_transform(state, data, blocks);
+ kernel_fpu_end();
+}
+#endif
+
static int sha1_update(struct shash_desc *desc, const u8 *data,
unsigned int len, unsigned int bytes_per_fpu,
sha1_block_fn *sha1_xform)
@@ -53,9 +101,7 @@ static int sha1_update(struct shash_desc *desc, const u8 *data,
while (len) {
unsigned int chunk = min(len, bytes_per_fpu);

- kernel_fpu_begin();
sha1_base_do_update(desc, data, chunk, sha1_xform);
- kernel_fpu_end();

len -= chunk;
data += chunk;
@@ -74,36 +120,29 @@ static int sha1_finup(struct shash_desc *desc, const u8 *data,
while (len) {
unsigned int chunk = min(len, bytes_per_fpu);

- kernel_fpu_begin();
sha1_base_do_update(desc, data, chunk, sha1_xform);
- kernel_fpu_end();

len -= chunk;
data += chunk;
}

- kernel_fpu_begin();
sha1_base_do_finalize(desc, sha1_xform);
- kernel_fpu_end();

return sha1_base_finish(desc, out);
}

-asmlinkage void sha1_transform_ssse3(struct sha1_state *state,
- const u8 *data, int blocks);
-
static int sha1_ssse3_update(struct shash_desc *desc, const u8 *data,
unsigned int len)
{
return sha1_update(desc, data, len, bytes_per_fpu_ssse3,
- sha1_transform_ssse3);
+ fpu_sha1_transform_ssse3);
}

static int sha1_ssse3_finup(struct shash_desc *desc, const u8 *data,
unsigned int len, u8 *out)
{
return sha1_finup(desc, data, len, bytes_per_fpu_ssse3, out,
- sha1_transform_ssse3);
+ fpu_sha1_transform_ssse3);
}

/* Add padding and return the message digest. */
@@ -141,21 +180,18 @@ static void unregister_sha1_ssse3(void)
}
}

-asmlinkage void sha1_transform_avx(struct sha1_state *state,
- const u8 *data, int blocks);
-
static int sha1_avx_update(struct shash_desc *desc, const u8 *data,
unsigned int len)
{
return sha1_update(desc, data, len, bytes_per_fpu_avx,
- sha1_transform_avx);
+ fpu_sha1_transform_avx);
}

static int sha1_avx_finup(struct shash_desc *desc, const u8 *data,
unsigned int len, u8 *out)
{
return sha1_finup(desc, data, len, bytes_per_fpu_avx, out,
- sha1_transform_avx);
+ fpu_sha1_transform_avx);
}

static int sha1_avx_final(struct shash_desc *desc, u8 *out)
@@ -189,17 +225,14 @@ static void unregister_sha1_avx(void)

#define SHA1_AVX2_BLOCK_OPTSIZE 4 /* optimal 4*64 bytes of SHA1 blocks */

-asmlinkage void sha1_transform_avx2(struct sha1_state *state,
- const u8 *data, int blocks);
-
static void sha1_apply_transform_avx2(struct sha1_state *state,
const u8 *data, int blocks)
{
/* Select the optimal transform based on data block size */
if (blocks >= SHA1_AVX2_BLOCK_OPTSIZE)
- sha1_transform_avx2(state, data, blocks);
+ fpu_sha1_transform_avx2(state, data, blocks);
else
- sha1_transform_avx(state, data, blocks);
+ fpu_sha1_transform_avx(state, data, blocks);
}

static int sha1_avx2_update(struct shash_desc *desc, const u8 *data,
@@ -246,21 +279,18 @@ static void unregister_sha1_avx2(void)
}

#ifdef CONFIG_AS_SHA1_NI
-asmlinkage void sha1_ni_transform(struct sha1_state *digest, const u8 *data,
- int rounds);
-
static int sha1_ni_update(struct shash_desc *desc, const u8 *data,
unsigned int len)
{
return sha1_update(desc, data, len, bytes_per_fpu_shani,
- sha1_ni_transform);
+ fpu_sha1_transform_shani);
}

static int sha1_ni_finup(struct shash_desc *desc, const u8 *data,
unsigned int len, u8 *out)
{
return sha1_finup(desc, data, len, bytes_per_fpu_shani, out,
- sha1_ni_transform);
+ fpu_sha1_transform_shani);
}

static int sha1_ni_final(struct shash_desc *desc, u8 *out)
diff --git a/arch/x86/crypto/sha256_ssse3_glue.c b/arch/x86/crypto/sha256_ssse3_glue.c
index de320973e473..692d6f010a4d 100644
--- a/arch/x86/crypto/sha256_ssse3_glue.c
+++ b/arch/x86/crypto/sha256_ssse3_glue.c
@@ -51,6 +51,51 @@ static const unsigned int bytes_per_fpu_ssse3 = 11 * 1024;
asmlinkage void sha256_transform_ssse3(struct sha256_state *state,
const u8 *data, int blocks);

+asmlinkage void sha256_transform_avx(struct sha256_state *state,
+ const u8 *data, int blocks);
+
+asmlinkage void sha256_transform_rorx(struct sha256_state *state,
+ const u8 *data, int blocks);
+
+#ifdef CONFIG_AS_SHA256_NI
+asmlinkage void sha256_ni_transform(struct sha256_state *digest,
+ const u8 *data, int rounds);
+#endif
+
+static void fpu_sha256_transform_ssse3(struct sha256_state *state,
+ const u8 *data, int blocks)
+{
+ kernel_fpu_begin();
+ sha256_transform_ssse3(state, data, blocks);
+ kernel_fpu_end();
+}
+
+static void fpu_sha256_transform_avx(struct sha256_state *state,
+ const u8 *data, int blocks)
+{
+ kernel_fpu_begin();
+ sha256_transform_avx(state, data, blocks);
+ kernel_fpu_end();
+}
+
+static void fpu_sha256_transform_avx2(struct sha256_state *state,
+ const u8 *data, int blocks)
+{
+ kernel_fpu_begin();
+ sha256_transform_rorx(state, data, blocks);
+ kernel_fpu_end();
+}
+
+#ifdef CONFIG_AS_SHA256_NI
+static void fpu_sha256_transform_shani(struct sha256_state *state,
+ const u8 *data, int blocks)
+{
+ kernel_fpu_begin();
+ sha256_ni_transform(state, data, blocks);
+ kernel_fpu_end();
+}
+#endif
+
static int _sha256_update(struct shash_desc *desc, const u8 *data,
unsigned int len, unsigned int bytes_per_fpu,
sha256_block_fn *sha256_xform)
@@ -70,9 +115,7 @@ static int _sha256_update(struct shash_desc *desc, const u8 *data,
while (len) {
unsigned int chunk = min(len, bytes_per_fpu);

- kernel_fpu_begin();
sha256_base_do_update(desc, data, chunk, sha256_xform);
- kernel_fpu_end();

len -= chunk;
data += chunk;
@@ -91,17 +134,13 @@ static int sha256_finup(struct shash_desc *desc, const u8 *data,
while (len) {
unsigned int chunk = min(len, bytes_per_fpu);

- kernel_fpu_begin();
sha256_base_do_update(desc, data, chunk, sha256_xform);
- kernel_fpu_end();

len -= chunk;
data += chunk;
}

- kernel_fpu_begin();
sha256_base_do_finalize(desc, sha256_xform);
- kernel_fpu_end();

return sha256_base_finish(desc, out);
}
@@ -110,14 +149,14 @@ static int sha256_ssse3_update(struct shash_desc *desc, const u8 *data,
unsigned int len)
{
return _sha256_update(desc, data, len, bytes_per_fpu_ssse3,
- sha256_transform_ssse3);
+ fpu_sha256_transform_ssse3);
}

static int sha256_ssse3_finup(struct shash_desc *desc, const u8 *data,
unsigned int len, u8 *out)
{
return sha256_finup(desc, data, len, bytes_per_fpu_ssse3,
- out, sha256_transform_ssse3);
+ out, fpu_sha256_transform_ssse3);
}

/* Add padding and return the message digest. */
@@ -177,14 +216,14 @@ static int sha256_avx_update(struct shash_desc *desc, const u8 *data,
unsigned int len)
{
return _sha256_update(desc, data, len, bytes_per_fpu_avx,
- sha256_transform_avx);
+ fpu_sha256_transform_avx);
}

static int sha256_avx_finup(struct shash_desc *desc, const u8 *data,
unsigned int len, u8 *out)
{
return sha256_finup(desc, data, len, bytes_per_fpu_avx,
- out, sha256_transform_avx);
+ out, fpu_sha256_transform_avx);
}

static int sha256_avx_final(struct shash_desc *desc, u8 *out)
@@ -238,14 +277,14 @@ static int sha256_avx2_update(struct shash_desc *desc, const u8 *data,
unsigned int len)
{
return _sha256_update(desc, data, len, bytes_per_fpu_avx2,
- sha256_transform_rorx);
+ fpu_sha256_transform_avx2);
}

static int sha256_avx2_finup(struct shash_desc *desc, const u8 *data,
unsigned int len, u8 *out)
{
return sha256_finup(desc, data, len, bytes_per_fpu_avx2,
- out, sha256_transform_rorx);
+ out, fpu_sha256_transform_avx2);
}

static int sha256_avx2_final(struct shash_desc *desc, u8 *out)
@@ -300,14 +339,14 @@ static int sha256_ni_update(struct shash_desc *desc, const u8 *data,
unsigned int len)
{
return _sha256_update(desc, data, len, bytes_per_fpu_shani,
- sha256_ni_transform);
+ fpu_sha256_transform_shani);
}

static int sha256_ni_finup(struct shash_desc *desc, const u8 *data,
unsigned int len, u8 *out)
{
return sha256_finup(desc, data, len, bytes_per_fpu_shani,
- out, sha256_ni_transform);
+ out, fpu_sha256_transform_shani);
}

static int sha256_ni_final(struct shash_desc *desc, u8 *out)
diff --git a/arch/x86/crypto/sha512_ssse3_glue.c b/arch/x86/crypto/sha512_ssse3_glue.c
index 3e96fe51f1a0..e2698545bf47 100644
--- a/arch/x86/crypto/sha512_ssse3_glue.c
+++ b/arch/x86/crypto/sha512_ssse3_glue.c
@@ -47,6 +47,36 @@ static const unsigned int bytes_per_fpu_ssse3 = 17 * 1024;
asmlinkage void sha512_transform_ssse3(struct sha512_state *state,
const u8 *data, int blocks);

+asmlinkage void sha512_transform_avx(struct sha512_state *state,
+ const u8 *data, int blocks);
+
+asmlinkage void sha512_transform_rorx(struct sha512_state *state,
+ const u8 *data, int blocks);
+
+static void fpu_sha512_transform_ssse3(struct sha512_state *state,
+ const u8 *data, int blocks)
+{
+ kernel_fpu_begin();
+ sha512_transform_ssse3(state, data, blocks);
+ kernel_fpu_end();
+}
+
+static void fpu_sha512_transform_avx(struct sha512_state *state,
+ const u8 *data, int blocks)
+{
+ kernel_fpu_begin();
+ sha512_transform_avx(state, data, blocks);
+ kernel_fpu_end();
+}
+
+static void fpu_sha512_transform_avx2(struct sha512_state *state,
+ const u8 *data, int blocks)
+{
+ kernel_fpu_begin();
+ sha512_transform_rorx(state, data, blocks);
+ kernel_fpu_end();
+}
+
static int sha512_update(struct shash_desc *desc, const u8 *data,
unsigned int len, unsigned int bytes_per_fpu,
sha512_block_fn *sha512_xform)
@@ -66,9 +96,7 @@ static int sha512_update(struct shash_desc *desc, const u8 *data,
while (len) {
unsigned int chunk = min(len, bytes_per_fpu);

- kernel_fpu_begin();
sha512_base_do_update(desc, data, chunk, sha512_xform);
- kernel_fpu_end();

len -= chunk;
data += chunk;
@@ -87,17 +115,13 @@ static int sha512_finup(struct shash_desc *desc, const u8 *data,
while (len) {
unsigned int chunk = min(len, bytes_per_fpu);

- kernel_fpu_begin();
sha512_base_do_update(desc, data, chunk, sha512_xform);
- kernel_fpu_end();

len -= chunk;
data += chunk;
}

- kernel_fpu_begin();
sha512_base_do_finalize(desc, sha512_xform);
- kernel_fpu_end();

return sha512_base_finish(desc, out);
}
@@ -106,14 +130,14 @@ static int sha512_ssse3_update(struct shash_desc *desc, const u8 *data,
unsigned int len)
{
return sha512_update(desc, data, len, bytes_per_fpu_ssse3,
- sha512_transform_ssse3);
+ fpu_sha512_transform_ssse3);
}

static int sha512_ssse3_finup(struct shash_desc *desc, const u8 *data,
unsigned int len, u8 *out)
{
return sha512_finup(desc, data, len, bytes_per_fpu_ssse3,
- out, sha512_transform_ssse3);
+ out, fpu_sha512_transform_ssse3);
}

/* Add padding and return the message digest. */
@@ -172,14 +196,14 @@ static int sha512_avx_update(struct shash_desc *desc, const u8 *data,
unsigned int len)
{
return sha512_update(desc, data, len, bytes_per_fpu_avx,
- sha512_transform_avx);
+ fpu_sha512_transform_avx);
}

static int sha512_avx_finup(struct shash_desc *desc, const u8 *data,
unsigned int len, u8 *out)
{
return sha512_finup(desc, data, len, bytes_per_fpu_avx,
- out, sha512_transform_avx);
+ out, fpu_sha512_transform_avx);
}

/* Add padding and return the message digest. */
@@ -234,14 +258,14 @@ static int sha512_avx2_update(struct shash_desc *desc, const u8 *data,
unsigned int len)
{
return sha512_update(desc, data, len, bytes_per_fpu_avx2,
- sha512_transform_rorx);
+ fpu_sha512_transform_avx2);
}

static int sha512_avx2_finup(struct shash_desc *desc, const u8 *data,
unsigned int len, u8 *out)
{
return sha512_finup(desc, data, len, bytes_per_fpu_avx2,
- out, sha512_transform_rorx);
+ out, fpu_sha512_transform_avx2);
}

/* Add padding and return the message digest. */
--
2.37.3


Subject: [PATCH v3 09/17] crypto: x86/ghash - limit FPU preemption

Limit the number of bytes processed between kernel_fpu_begin() and
kernel_fpu_end() calls.

Those functions call preempt_disable() and preempt_enable(), so
the CPU core is unavailable for scheduling while running, leading to:
rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: ...

Fixes: 0e1227d356e9 ("crypto: ghash - Add PCLMULQDQ accelerated implementation")
Suggested-by: Herbert Xu <[email protected]>
Signed-off-by: Robert Elliott <[email protected]>

---
v3: change the limit to a static const, simplify the while loop
---
arch/x86/crypto/ghash-clmulni-intel_glue.c | 28 +++++++++++++++-------
1 file changed, 19 insertions(+), 9 deletions(-)

diff --git a/arch/x86/crypto/ghash-clmulni-intel_glue.c b/arch/x86/crypto/ghash-clmulni-intel_glue.c
index 22367e363d72..0f24c3b23fd2 100644
--- a/arch/x86/crypto/ghash-clmulni-intel_glue.c
+++ b/arch/x86/crypto/ghash-clmulni-intel_glue.c
@@ -20,8 +20,11 @@
#include <asm/cpu_device_id.h>
#include <asm/simd.h>

-#define GHASH_BLOCK_SIZE 16
-#define GHASH_DIGEST_SIZE 16
+#define GHASH_BLOCK_SIZE 16U
+#define GHASH_DIGEST_SIZE 16U
+
+/* avoid kernel_fpu_begin/end scheduler/rcu stalls */
+static const unsigned int bytes_per_fpu = 50 * 1024;

void clmul_ghash_mul(u8 *dst, const u128 *shash);

@@ -80,9 +83,11 @@ static int ghash_update(struct shash_desc *desc,
struct ghash_ctx *ctx = crypto_shash_ctx(desc->tfm);
u8 *dst = dctx->buffer;

+ BUILD_BUG_ON(bytes_per_fpu < GHASH_BLOCK_SIZE);
+
if (dctx->bytes) {
int n = min(srclen, dctx->bytes);
- u8 *pos = dst + (GHASH_BLOCK_SIZE - dctx->bytes);
+ u8 *pos = dst + GHASH_BLOCK_SIZE - dctx->bytes;

dctx->bytes -= n;
srclen -= n;
@@ -97,13 +102,18 @@ static int ghash_update(struct shash_desc *desc,
}
}

- kernel_fpu_begin();
- clmul_ghash_update(dst, src, srclen, &ctx->shash);
- kernel_fpu_end();
+ while (srclen >= GHASH_BLOCK_SIZE) {
+ unsigned int chunk = min(srclen, bytes_per_fpu);
+
+ kernel_fpu_begin();
+ clmul_ghash_update(dst, src, chunk, &ctx->shash);
+ kernel_fpu_end();
+
+ src += chunk & ~(GHASH_BLOCK_SIZE - 1);
+ srclen -= chunk & ~(GHASH_BLOCK_SIZE - 1);
+ }

- if (srclen & 0xf) {
- src += srclen - (srclen & 0xf);
- srclen &= 0xf;
+ if (srclen) {
dctx->bytes = GHASH_BLOCK_SIZE - srclen;
while (srclen--)
*dst++ ^= *src++;
--
2.37.3


Date: 2022-11-03 09:29:58
From: kernel test robot
Subject: Re: [PATCH v3 11/17] crypto: x86/sha - register all variations

Hi Robert,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on v6.1-rc2]
[also build test WARNING on next-20221103]
[cannot apply to herbert-cryptodev-2.6/master herbert-crypto-2.6/master crng-random/master linus/master v6.1-rc3]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url: https://github.com/intel-lab-lkp/linux/commits/Robert-Elliott/crypto-tcrypt-test-crc32/20221103-132823
patch link: https://lore.kernel.org/r/20221103042740.6556-12-elliott%40hpe.com
patch subject: [PATCH v3 11/17] crypto: x86/sha - register all variations
config: x86_64-allyesconfig
compiler: gcc-11 (Debian 11.3.0-8) 11.3.0
reproduce (this is a W=1 build):
# https://github.com/intel-lab-lkp/linux/commit/70c392540055e5e494696b9f4fb6db52f2b6574c
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review Robert-Elliott/crypto-tcrypt-test-crc32/20221103-132823
git checkout 70c392540055e5e494696b9f4fb6db52f2b6574c
# save the config file
mkdir build_dir && cp config build_dir/.config
make W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash arch/x86/crypto/

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <[email protected]>

All warnings (new ones prefixed by >>):

arch/x86/crypto/sha1_ssse3_glue.c: In function 'sha1_ssse3_mod_init':
>> arch/x86/crypto/sha1_ssse3_glue.c:302:21: warning: variable 'driver_name' set but not used [-Wunused-but-set-variable]
302 | const char *driver_name = NULL;
| ^~~~~~~~~~~
--
arch/x86/crypto/sha256_ssse3_glue.c: In function 'sha256_ssse3_mod_init':
>> arch/x86/crypto/sha256_ssse3_glue.c:365:21: warning: variable 'driver_name2' set but not used [-Wunused-but-set-variable]
365 | const char *driver_name2 = NULL;
| ^~~~~~~~~~~~
>> arch/x86/crypto/sha256_ssse3_glue.c:364:21: warning: variable 'driver_name' set but not used [-Wunused-but-set-variable]
364 | const char *driver_name = NULL;
| ^~~~~~~~~~~
--
arch/x86/crypto/sha512_ssse3_glue.c: In function 'sha512_ssse3_mod_init':
>> arch/x86/crypto/sha512_ssse3_glue.c:304:21: warning: variable 'driver_name2' set but not used [-Wunused-but-set-variable]
304 | const char *driver_name2 = NULL;
| ^~~~~~~~~~~~
>> arch/x86/crypto/sha512_ssse3_glue.c:303:21: warning: variable 'driver_name' set but not used [-Wunused-but-set-variable]
303 | const char *driver_name = NULL;
| ^~~~~~~~~~~


vim +/driver_name +302 arch/x86/crypto/sha1_ssse3_glue.c

298
299 static int __init sha1_ssse3_mod_init(void)
300 {
301 const char *feature_name;
> 302 const char *driver_name = NULL;
303 int ret;
304

--
0-DAY CI Kernel Test Service
https://01.org/lkp


Subject: RE: [PATCH v2 18/19] crypto: x86 - standardize not loaded prints


> The positive messages about which optional features are
> engaged could be reported as read-only module parameters.

I've experimented with this approach.

There are two constructions for the modules:
1. modules that enable different behavior in the drivers
(e.g., aesni_intel enabling avx or avx2 within each driver)

I named these parameters "engaged_<feature>"

2. modules that register separate drivers for different behavior
(e.g., sha1 registering separate drivers for avx2, avx, and ssse3)

I named these parameters "registered_<feature>"

It looks like this:

$ grep -Hs . /sys/module/*/parameters/engaged*
/sys/module/aesni_intel/parameters/engaged_avx:1
/sys/module/aesni_intel/parameters/engaged_avx2:1
/sys/module/aria_aesni_avx_x86_64/parameters/engaged_gfni:0
/sys/module/chacha_x86_64/parameters/engaged_avx2:1
/sys/module/chacha_x86_64/parameters/engaged_avx512:1
/sys/module/crc32c_intel/parameters/engaged_pclmulqdq:1
/sys/module/curve25519_x86_64/parameters/engaged_adx:1
/sys/module/libblake2s_x86_64/parameters/engaged_avx512:1
/sys/module/libblake2s_x86_64/parameters/engaged_ssse3:1
/sys/module/poly1305_x86_64/parameters/engaged_avx:1
/sys/module/poly1305_x86_64/parameters/engaged_avx2:1
/sys/module/poly1305_x86_64/parameters/engaged_avx512:0

with modinfo descriptions like:
parm: engaged_avx2:Using x86 instruction set extensions: AVX2 (for GCM mode) (int)
parm: engaged_avx:Using x86 instruction set extensions: AVX (for CTR and GCM modes) (int)

$ grep -Hs . /sys/module/*/parameters/registered*
/sys/module/sha1_ssse3/parameters/registered_avx:1
/sys/module/sha1_ssse3/parameters/registered_avx2:1
/sys/module/sha1_ssse3/parameters/registered_shani:0
/sys/module/sha1_ssse3/parameters/registered_ssse3:1
/sys/module/sha256_ssse3/parameters/registered_avx:1
/sys/module/sha256_ssse3/parameters/registered_avx2:1
/sys/module/sha256_ssse3/parameters/registered_shani:0
/sys/module/sha256_ssse3/parameters/registered_ssse3:1
/sys/module/sha512_ssse3/parameters/registered_avx:1
/sys/module/sha512_ssse3/parameters/registered_avx2:1
/sys/module/sha512_ssse3/parameters/registered_ssse3:1

with modinfo descriptions like:
parm: registered_shani:Registered crypto driver using x86 instruction set extensions: SHA-NI (int)
parm: registered_ssse3:Registered crypto driver using x86 instruction set extensions: SSSE3 (int)
parm: registered_avx:Registered crypto driver using x86 instruction set extensions: AVX (int)
parm: registered_avx2:Registered crypto driver using x86 instruction set extensions: AVX2 (int)

That would eliminate these prints in aesni_intel, so all the
modules load silently (but you can figure out what they're
doing if needed).

pr_info("AVX2 version of gcm_enc/dec engaged.\n");
pr_info("AVX version of gcm_enc/dec engaged.\n");
pr_info("SSE version of gcm_enc/dec engaged.\n");
pr_info("AES CTR mode by8 optimization enabled\n");


Subject: [PATCH v4 00/24] crypto: fix RCU stalls

This series fixes the RCU stalls triggered by the x86 crypto
modules discussed in
https://lore.kernel.org/all/MW5PR84MB18426EBBA3303770A8BC0BDFAB759@MW5PR84MB1842.NAMPRD84.PROD.OUTLOOK.COM/

Two root causes were:
- too much data processed between kernel_fpu_begin and
kernel_fpu_end calls (which are heavily used by the x86
optimized drivers)
- tcrypt not calling cond_resched during speed test loops
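
The tcrypt half of the fix just adds a resched point inside the
long-running measurement loops, roughly like this (illustrative
sketch, not the exact tcrypt code; do_one_speed_test is a
placeholder):

	for (i = 0; i < iterations; i++) {
		ret = do_one_speed_test(tfm, buf, blen);
		if (ret)
			break;
		cond_resched();
	}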

These problems have always been lurking, but improving the
loading of the x86/sha512 module caused them to show up
frequently during boot when SHA-512 is used for module
signature checking.

Fixing these problems makes it safer to apply the same
loading improvements to the rest of the x86 modules.

This series only handles the x86 modules.

Version 4 tackles lingering comments from version 2.

1. Unlike the hash functions, skcipher and aead functions
accept pointers to scatter-gather lists, and the helper
functions that walk through those lists limit processing
to a page size at a time.

The aegis module did everything inside one pair of
kernel_fpu_begin() and kernel_fpu_end() calls, including
walking through the sglist, so it could hold the CPU with
preemption disabled for an unbounded time.

The aesni aead functions for gcm process the additional
data (data that is included in the authentication tag
calculation but not encrypted) in one FPU context, so a
large amount of additional data can still be a problem.
Fixing that will require some asm changes. However, I
don't think that is a typical use case, so this series
defers fixing it.
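
For contrast, a minimal sketch of the walk-based pattern that the
skcipher helpers encourage, where each FPU section covers at most
one walk step (my_simd_crypt is a placeholder for a driver's asm
routine; the walk and FPU helpers are the real kernel APIs):

	struct skcipher_walk walk;
	int err;

	err = skcipher_walk_virt(&walk, req, false);

	while (walk.nbytes > 0) {
		unsigned int nbytes = walk.nbytes;

		if (nbytes < walk.total)
			nbytes = round_down(nbytes, walk.stride);

		kernel_fpu_begin();
		my_simd_crypt(walk.dst.virt.addr, walk.src.virt.addr, nbytes);
		kernel_fpu_end();

		err = skcipher_walk_done(&walk, walk.nbytes - nbytes);
	}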

The series adds device table matching for all the x86
crypto modules.

2. I replaced all the positive and negative prints with
module parameters, including enough clues in modinfo
descriptions that a user can determine what is
working and not working.


Robert Elliott (24):
crypto: tcrypt - test crc32
crypto: tcrypt - test nhpoly1305
crypto: tcrypt - reschedule during cycles speed tests
crypto: x86/sha - limit FPU preemption
crypto: x86/crc - limit FPU preemption
crypto: x86/sm3 - limit FPU preemption
crypto: x86/ghash - use u8 rather than char
crypto: x86/ghash - restructure FPU context saving
crypto: x86/ghash - limit FPU preemption
crypto: x86/poly - limit FPU preemption
crypto: x86/aegis - limit FPU preemption
crypto: x86/sha - register all variations
crypto: x86/sha - minimize time in FPU context
crypto: x86/sha - load based on CPU features
crypto: x86/crc - load based on CPU features
crypto: x86/sm3 - load based on CPU features
crypto: x86/poly - load based on CPU features
crypto: x86/ghash - load based on CPU features
crypto: x86/aesni - avoid type conversions
crypto: x86/ciphers - load based on CPU features
crypto: x86 - report used CPU features via module parameters
crypto: x86 - report missing CPU features via module parameters
crypto: x86 - report suboptimal CPUs via module parameters
crypto: x86 - standardize module descriptions

arch/x86/crypto/aegis128-aesni-glue.c | 66 +++--
arch/x86/crypto/aesni-intel_glue.c | 45 ++--
arch/x86/crypto/aria_aesni_avx_glue.c | 43 ++-
arch/x86/crypto/blake2s-glue.c | 18 +-
arch/x86/crypto/blowfish_glue.c | 39 ++-
arch/x86/crypto/camellia_aesni_avx2_glue.c | 40 ++-
arch/x86/crypto/camellia_aesni_avx_glue.c | 38 ++-
arch/x86/crypto/camellia_glue.c | 37 ++-
arch/x86/crypto/cast5_avx_glue.c | 30 ++-
arch/x86/crypto/cast6_avx_glue.c | 30 ++-
arch/x86/crypto/chacha_glue.c | 18 +-
arch/x86/crypto/crc32-pclmul_asm.S | 6 +-
arch/x86/crypto/crc32-pclmul_glue.c | 39 ++-
arch/x86/crypto/crc32c-intel_glue.c | 66 +++--
arch/x86/crypto/crct10dif-pclmul_glue.c | 56 ++--
arch/x86/crypto/curve25519-x86_64.c | 29 +-
arch/x86/crypto/des3_ede_glue.c | 36 ++-
arch/x86/crypto/ghash-clmulni-intel_asm.S | 4 +-
arch/x86/crypto/ghash-clmulni-intel_glue.c | 45 ++--
arch/x86/crypto/nhpoly1305-avx2-glue.c | 36 ++-
arch/x86/crypto/nhpoly1305-sse2-glue.c | 22 +-
arch/x86/crypto/poly1305_glue.c | 56 +++-
arch/x86/crypto/polyval-clmulni_glue.c | 31 ++-
arch/x86/crypto/serpent_avx2_glue.c | 36 ++-
arch/x86/crypto/serpent_avx_glue.c | 31 ++-
arch/x86/crypto/serpent_sse2_glue.c | 13 +-
arch/x86/crypto/sha1_ssse3_glue.c | 298 ++++++++++++++-------
arch/x86/crypto/sha256_ssse3_glue.c | 294 +++++++++++++-------
arch/x86/crypto/sha512_ssse3_glue.c | 205 +++++++++-----
arch/x86/crypto/sm3_avx_glue.c | 70 +++--
arch/x86/crypto/sm4_aesni_avx2_glue.c | 37 ++-
arch/x86/crypto/sm4_aesni_avx_glue.c | 39 ++-
arch/x86/crypto/twofish_avx_glue.c | 29 +-
arch/x86/crypto/twofish_glue.c | 12 +-
arch/x86/crypto/twofish_glue_3way.c | 36 ++-
crypto/aes_ti.c | 2 +-
crypto/blake2b_generic.c | 2 +-
crypto/blowfish_common.c | 2 +-
crypto/crct10dif_generic.c | 2 +-
crypto/curve25519-generic.c | 1 +
crypto/sha256_generic.c | 2 +-
crypto/sha512_generic.c | 2 +-
crypto/sm3.c | 2 +-
crypto/sm4.c | 2 +-
crypto/tcrypt.c | 56 ++--
crypto/twofish_common.c | 2 +-
crypto/twofish_generic.c | 2 +-
47 files changed, 1377 insertions(+), 630 deletions(-)

--
2.38.1


Subject: [PATCH v4 17/24] crypto: x86/poly - load based on CPU features

Like commit aa031b8f702e ("crypto: x86/sha512 - load based on CPU
features"), these x86-optimized crypto modules already have
module aliases based on CPU feature bits:
nhpoly1305
poly1305
polyval

Rename the unique device table data structure to a generic name
so the code has the same pattern in all the modules.

Remove the __maybe_unused attribute from polyval since it is
always used.

Signed-off-by: Robert Elliott <[email protected]>

---
v4 Removed CPU feature checks that are unreachable because
the x86_match_cpu call already handles them.

Made poly1305 match on all features since it does provide
an x86_64 asm function if avx, avx2, and avx512f are not
available.

Move polyval into this patch rather than pair with ghash.

Remove __maybe_unused from polyval.
---
arch/x86/crypto/nhpoly1305-avx2-glue.c | 13 +++++++++++--
arch/x86/crypto/nhpoly1305-sse2-glue.c | 9 ++++++++-
arch/x86/crypto/poly1305_glue.c | 10 ++++++++++
arch/x86/crypto/polyval-clmulni_glue.c | 6 +++---
4 files changed, 32 insertions(+), 6 deletions(-)

diff --git a/arch/x86/crypto/nhpoly1305-avx2-glue.c b/arch/x86/crypto/nhpoly1305-avx2-glue.c
index f7dc9c563bb5..fa415fec5793 100644
--- a/arch/x86/crypto/nhpoly1305-avx2-glue.c
+++ b/arch/x86/crypto/nhpoly1305-avx2-glue.c
@@ -11,6 +11,7 @@
#include <crypto/nhpoly1305.h>
#include <linux/module.h>
#include <linux/sizes.h>
+#include <asm/cpu_device_id.h>
#include <asm/simd.h>

/* avoid kernel_fpu_begin/end scheduler/rcu stalls */
@@ -60,10 +61,18 @@ static struct shash_alg nhpoly1305_alg = {
.descsize = sizeof(struct nhpoly1305_state),
};

+static const struct x86_cpu_id module_cpu_ids[] = {
+ X86_MATCH_FEATURE(X86_FEATURE_AVX2, NULL),
+ {}
+};
+MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);
+
static int __init nhpoly1305_mod_init(void)
{
- if (!boot_cpu_has(X86_FEATURE_AVX2) ||
- !boot_cpu_has(X86_FEATURE_OSXSAVE))
+ if (!x86_match_cpu(module_cpu_ids))
+ return -ENODEV;
+
+ if (!boot_cpu_has(X86_FEATURE_OSXSAVE))
return -ENODEV;

return crypto_register_shash(&nhpoly1305_alg);
diff --git a/arch/x86/crypto/nhpoly1305-sse2-glue.c b/arch/x86/crypto/nhpoly1305-sse2-glue.c
index daffcc7019ad..c47765e46236 100644
--- a/arch/x86/crypto/nhpoly1305-sse2-glue.c
+++ b/arch/x86/crypto/nhpoly1305-sse2-glue.c
@@ -11,6 +11,7 @@
#include <crypto/nhpoly1305.h>
#include <linux/module.h>
#include <linux/sizes.h>
+#include <asm/cpu_device_id.h>
#include <asm/simd.h>

/* avoid kernel_fpu_begin/end scheduler/rcu stalls */
@@ -60,9 +61,15 @@ static struct shash_alg nhpoly1305_alg = {
.descsize = sizeof(struct nhpoly1305_state),
};

+static const struct x86_cpu_id module_cpu_ids[] = {
+ X86_MATCH_FEATURE(X86_FEATURE_XMM2, NULL),
+ {}
+};
+MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);
+
static int __init nhpoly1305_mod_init(void)
{
- if (!boot_cpu_has(X86_FEATURE_XMM2))
+ if (!x86_match_cpu(module_cpu_ids))
return -ENODEV;

return crypto_register_shash(&nhpoly1305_alg);
diff --git a/arch/x86/crypto/poly1305_glue.c b/arch/x86/crypto/poly1305_glue.c
index 16831c036d71..f1e39e23b2a3 100644
--- a/arch/x86/crypto/poly1305_glue.c
+++ b/arch/x86/crypto/poly1305_glue.c
@@ -12,6 +12,7 @@
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/sizes.h>
+#include <asm/cpu_device_id.h>
#include <asm/intel-family.h>
#include <asm/simd.h>

@@ -268,8 +269,17 @@ static struct shash_alg alg = {
},
};

+static const struct x86_cpu_id module_cpu_ids[] = {
+ X86_MATCH_FEATURE(X86_FEATURE_ANY, NULL),
+ {}
+};
+MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);
+
static int __init poly1305_simd_mod_init(void)
{
+ if (!x86_match_cpu(module_cpu_ids))
+ return -ENODEV;
+
if (boot_cpu_has(X86_FEATURE_AVX) &&
cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM, NULL))
static_branch_enable(&poly1305_use_avx);
diff --git a/arch/x86/crypto/polyval-clmulni_glue.c b/arch/x86/crypto/polyval-clmulni_glue.c
index de1c908f7412..b98e32f8e2a4 100644
--- a/arch/x86/crypto/polyval-clmulni_glue.c
+++ b/arch/x86/crypto/polyval-clmulni_glue.c
@@ -176,15 +176,15 @@ static struct shash_alg polyval_alg = {
},
};

-__maybe_unused static const struct x86_cpu_id pcmul_cpu_id[] = {
+static const struct x86_cpu_id module_cpu_ids[] = {
X86_MATCH_FEATURE(X86_FEATURE_PCLMULQDQ, NULL),
{}
};
-MODULE_DEVICE_TABLE(x86cpu, pcmul_cpu_id);
+MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);

static int __init polyval_clmulni_mod_init(void)
{
- if (!x86_match_cpu(pcmul_cpu_id))
+ if (!x86_match_cpu(module_cpu_ids))
return -ENODEV;

if (!boot_cpu_has(X86_FEATURE_AVX))
--
2.38.1


Subject: [PATCH v4 07/24] crypto: x86/ghash - use u8 rather than char

Use more consistent, unambiguous types for the source and destination
buffer pointer arguments.
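
One reason u8 is generally preferred for raw byte buffers is that
plain char has implementation-defined signedness; a purely
illustrative example of the ambiguity this avoids:

	char c = 0x80;		/* value is ABI-dependent: -128 or 128 */
	unsigned int v = c;	/* may sign-extend to 0xffffff80 */

	u8 b = 0x80;		/* always 128 */
	unsigned int w = b;	/* always 0x80 */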

Signed-off-by: Robert Elliott <[email protected]>
---
arch/x86/crypto/ghash-clmulni-intel_asm.S | 4 ++--
arch/x86/crypto/ghash-clmulni-intel_glue.c | 4 ++--
2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/crypto/ghash-clmulni-intel_asm.S b/arch/x86/crypto/ghash-clmulni-intel_asm.S
index 2bf871899920..c7b8542facee 100644
--- a/arch/x86/crypto/ghash-clmulni-intel_asm.S
+++ b/arch/x86/crypto/ghash-clmulni-intel_asm.S
@@ -88,7 +88,7 @@ SYM_FUNC_START_LOCAL(__clmul_gf128mul_ble)
RET
SYM_FUNC_END(__clmul_gf128mul_ble)

-/* void clmul_ghash_mul(char *dst, const u128 *shash) */
+/* void clmul_ghash_mul(u8 *dst, const u128 *shash) */
SYM_FUNC_START(clmul_ghash_mul)
FRAME_BEGIN
movups (%rdi), DATA
@@ -103,7 +103,7 @@ SYM_FUNC_START(clmul_ghash_mul)
SYM_FUNC_END(clmul_ghash_mul)

/*
- * void clmul_ghash_update(char *dst, const char *src, unsigned int srclen,
+ * void clmul_ghash_update(u8 *dst, const u8 *src, unsigned int srclen,
* const u128 *shash);
*/
SYM_FUNC_START(clmul_ghash_update)
diff --git a/arch/x86/crypto/ghash-clmulni-intel_glue.c b/arch/x86/crypto/ghash-clmulni-intel_glue.c
index 1f1a95f3dd0c..e996627c6583 100644
--- a/arch/x86/crypto/ghash-clmulni-intel_glue.c
+++ b/arch/x86/crypto/ghash-clmulni-intel_glue.c
@@ -23,9 +23,9 @@
#define GHASH_BLOCK_SIZE 16
#define GHASH_DIGEST_SIZE 16

-void clmul_ghash_mul(char *dst, const u128 *shash);
+void clmul_ghash_mul(u8 *dst, const u128 *shash);

-void clmul_ghash_update(char *dst, const char *src, unsigned int srclen,
+void clmul_ghash_update(u8 *dst, const u8 *src, unsigned int srclen,
const u128 *shash);

struct ghash_async_ctx {
--
2.38.1


Subject: [PATCH v4 05/24] crypto: x86/crc - limit FPU preemption

Limit the number of bytes processed between kernel_fpu_begin() and
kernel_fpu_end() calls.

Those functions call preempt_disable() and preempt_enable(), so
the CPU core is unavailable for scheduling while the FPU code is
running, which can lead to:
rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: ...
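
The fix applies the same simple pattern in each driver: process the
data in bounded chunks, with one kernel_fpu_begin()/kernel_fpu_end()
pair per chunk. A simplified sketch of the loops added below
(crc_update_fn stands for the driver's asm helper, e.g. crc_pcl()):

	while (len) {
		unsigned int chunk = min(len, bytes_per_fpu);

		kernel_fpu_begin();
		crc = crc_update_fn(crc, data, chunk);
		kernel_fpu_end();

		len -= chunk;
		data += chunk;
	}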

Fixes: 78c37d191dd6 ("crypto: crc32 - add crc32 pclmulqdq implementation and wrappers for table implementation")
Fixes: 6a8ce1ef3940 ("crypto: crc32c - Optimize CRC32C calculation with PCLMULQDQ instruction")
Fixes: 0b95a7f85718 ("crypto: crct10dif - Glue code to cast accelerated CRCT10DIF assembly as a crypto transform")
Suggested-by: Herbert Xu <[email protected]>
Signed-off-by: Robert Elliott <[email protected]>

---
v3 use while loops and static int, simplify one of the loop structures,
add algorithm-specific limits, use local stack variable in crc32 finup
rather than the context pointer like update uses
---
arch/x86/crypto/crc32-pclmul_asm.S | 6 +--
arch/x86/crypto/crc32-pclmul_glue.c | 27 +++++++++----
arch/x86/crypto/crc32c-intel_glue.c | 52 ++++++++++++++++++-------
arch/x86/crypto/crct10dif-pclmul_glue.c | 48 +++++++++++++++++------
4 files changed, 99 insertions(+), 34 deletions(-)

diff --git a/arch/x86/crypto/crc32-pclmul_asm.S b/arch/x86/crypto/crc32-pclmul_asm.S
index ca53e96996ac..9abd861636c3 100644
--- a/arch/x86/crypto/crc32-pclmul_asm.S
+++ b/arch/x86/crypto/crc32-pclmul_asm.S
@@ -72,15 +72,15 @@
.text
/**
* Calculate crc32
- * BUF - buffer (16 bytes aligned)
- * LEN - sizeof buffer (16 bytes aligned), LEN should be grater than 63
+ * BUF - buffer - must be 16 bytes aligned
+ * LEN - sizeof buffer - must be multiple of 16 bytes and greater than 63
* CRC - initial crc32
* return %eax crc32
* uint crc32_pclmul_le_16(unsigned char const *buffer,
* size_t len, uint crc32)
*/

-SYM_FUNC_START(crc32_pclmul_le_16) /* buffer and buffer size are 16 bytes aligned */
+SYM_FUNC_START(crc32_pclmul_le_16)
movdqa (BUF), %xmm1
movdqa 0x10(BUF), %xmm2
movdqa 0x20(BUF), %xmm3
diff --git a/arch/x86/crypto/crc32-pclmul_glue.c b/arch/x86/crypto/crc32-pclmul_glue.c
index 98cf3b4e4c9f..df3dbc754818 100644
--- a/arch/x86/crypto/crc32-pclmul_glue.c
+++ b/arch/x86/crypto/crc32-pclmul_glue.c
@@ -46,6 +46,9 @@
#define SCALE_F 16L /* size of xmm register */
#define SCALE_F_MASK (SCALE_F - 1)

+/* avoid kernel_fpu_begin/end scheduler/rcu stalls */
+static const unsigned int bytes_per_fpu = 655 * 1024;
+
u32 crc32_pclmul_le_16(unsigned char const *buffer, size_t len, u32 crc32);

static u32 __attribute__((pure))
@@ -55,6 +58,9 @@ static u32 __attribute__((pure))
unsigned int iremainder;
unsigned int prealign;

+ BUILD_BUG_ON(bytes_per_fpu < PCLMUL_MIN_LEN);
+ BUILD_BUG_ON(bytes_per_fpu & SCALE_F_MASK);
+
if (len < PCLMUL_MIN_LEN + SCALE_F_MASK || !crypto_simd_usable())
return crc32_le(crc, p, len);

@@ -70,12 +76,19 @@ static u32 __attribute__((pure))
iquotient = len & (~SCALE_F_MASK);
iremainder = len & SCALE_F_MASK;

- kernel_fpu_begin();
- crc = crc32_pclmul_le_16(p, iquotient, crc);
- kernel_fpu_end();
+ while (iquotient >= PCLMUL_MIN_LEN) {
+ unsigned int chunk = min(iquotient, bytes_per_fpu);
+
+ kernel_fpu_begin();
+ crc = crc32_pclmul_le_16(p, chunk, crc);
+ kernel_fpu_end();
+
+ iquotient -= chunk;
+ p += chunk;
+ }

- if (iremainder)
- crc = crc32_le(crc, p + iquotient, iremainder);
+ if (iquotient || iremainder)
+ crc = crc32_le(crc, p, iquotient + iremainder);

return crc;
}
@@ -120,8 +133,8 @@ static int crc32_pclmul_update(struct shash_desc *desc, const u8 *data,
}

/* No final XOR 0xFFFFFFFF, like crc32_le */
-static int __crc32_pclmul_finup(u32 *crcp, const u8 *data, unsigned int len,
- u8 *out)
+static int __crc32_pclmul_finup(const u32 *crcp, const u8 *data,
+ unsigned int len, u8 *out)
{
*(__le32 *)out = cpu_to_le32(crc32_pclmul_le(*crcp, data, len));
return 0;
diff --git a/arch/x86/crypto/crc32c-intel_glue.c b/arch/x86/crypto/crc32c-intel_glue.c
index feccb5254c7e..f08ed68ec93d 100644
--- a/arch/x86/crypto/crc32c-intel_glue.c
+++ b/arch/x86/crypto/crc32c-intel_glue.c
@@ -45,7 +45,10 @@ asmlinkage unsigned int crc_pcl(const u8 *buffer, int len,
unsigned int crc_init);
#endif /* CONFIG_X86_64 */

-static u32 crc32c_intel_le_hw_byte(u32 crc, unsigned char const *data, size_t length)
+/* avoid kernel_fpu_begin/end scheduler/rcu stalls */
+static const unsigned int bytes_per_fpu = 868 * 1024;
+
+static u32 crc32c_intel_le_hw_byte(u32 crc, const unsigned char *data, size_t length)
{
while (length--) {
asm("crc32b %1, %0"
@@ -56,7 +59,7 @@ static u32 crc32c_intel_le_hw_byte(u32 crc, unsigned char const *data, size_t le
return crc;
}

-static u32 __pure crc32c_intel_le_hw(u32 crc, unsigned char const *p, size_t len)
+static u32 __pure crc32c_intel_le_hw(u32 crc, const unsigned char *p, size_t len)
{
unsigned int iquotient = len / SCALE_F;
unsigned int iremainder = len % SCALE_F;
@@ -110,8 +113,8 @@ static int crc32c_intel_update(struct shash_desc *desc, const u8 *data,
return 0;
}

-static int __crc32c_intel_finup(u32 *crcp, const u8 *data, unsigned int len,
- u8 *out)
+static int __crc32c_intel_finup(const u32 *crcp, const u8 *data,
+ unsigned int len, u8 *out)
{
*(__le32 *)out = ~cpu_to_le32(crc32c_intel_le_hw(*crcp, data, len));
return 0;
@@ -153,29 +156,52 @@ static int crc32c_pcl_intel_update(struct shash_desc *desc, const u8 *data,
{
u32 *crcp = shash_desc_ctx(desc);

+ BUILD_BUG_ON(bytes_per_fpu < CRC32C_PCL_BREAKEVEN);
+ BUILD_BUG_ON(bytes_per_fpu % SCALE_F);
+
/*
* use faster PCL version if datasize is large enough to
* overcome kernel fpu state save/restore overhead
*/
if (len >= CRC32C_PCL_BREAKEVEN && crypto_simd_usable()) {
- kernel_fpu_begin();
- *crcp = crc_pcl(data, len, *crcp);
- kernel_fpu_end();
+ while (len) {
+ unsigned int chunk = min(len, bytes_per_fpu);
+
+ kernel_fpu_begin();
+ *crcp = crc_pcl(data, chunk, *crcp);
+ kernel_fpu_end();
+
+ len -= chunk;
+ data += chunk;
+ }
} else
*crcp = crc32c_intel_le_hw(*crcp, data, len);
return 0;
}

-static int __crc32c_pcl_intel_finup(u32 *crcp, const u8 *data, unsigned int len,
- u8 *out)
+static int __crc32c_pcl_intel_finup(const u32 *crcp, const u8 *data,
+ unsigned int len, u8 *out)
{
+ u32 crc = *crcp;
+
+ BUILD_BUG_ON(bytes_per_fpu < CRC32C_PCL_BREAKEVEN);
+ BUILD_BUG_ON(bytes_per_fpu % SCALE_F);
+
if (len >= CRC32C_PCL_BREAKEVEN && crypto_simd_usable()) {
- kernel_fpu_begin();
- *(__le32 *)out = ~cpu_to_le32(crc_pcl(data, len, *crcp));
- kernel_fpu_end();
+ while (len) {
+ unsigned int chunk = min(len, bytes_per_fpu);
+
+ kernel_fpu_begin();
+ crc = crc_pcl(data, chunk, crc);
+ kernel_fpu_end();
+
+ len -= chunk;
+ data += chunk;
+ }
+ *(__le32 *)out = ~cpu_to_le32(crc);
} else
*(__le32 *)out =
- ~cpu_to_le32(crc32c_intel_le_hw(*crcp, data, len));
+ ~cpu_to_le32(crc32c_intel_le_hw(crc, data, len));
return 0;
}

diff --git a/arch/x86/crypto/crct10dif-pclmul_glue.c b/arch/x86/crypto/crct10dif-pclmul_glue.c
index 71291d5af9f4..4f6b8c727d88 100644
--- a/arch/x86/crypto/crct10dif-pclmul_glue.c
+++ b/arch/x86/crypto/crct10dif-pclmul_glue.c
@@ -34,6 +34,11 @@
#include <asm/cpu_device_id.h>
#include <asm/simd.h>

+#define PCLMUL_MIN_LEN 16U /* minimum size of buffer for crc_t10dif_pcl */
+
+/* avoid kernel_fpu_begin/end scheduler/rcu stalls */
+static const unsigned int bytes_per_fpu = 614 * 1024;
+
asmlinkage u16 crc_t10dif_pcl(u16 init_crc, const u8 *buf, size_t len);

struct chksum_desc_ctx {
@@ -54,11 +59,21 @@ static int chksum_update(struct shash_desc *desc, const u8 *data,
{
struct chksum_desc_ctx *ctx = shash_desc_ctx(desc);

- if (length >= 16 && crypto_simd_usable()) {
- kernel_fpu_begin();
- ctx->crc = crc_t10dif_pcl(ctx->crc, data, length);
- kernel_fpu_end();
- } else
+ BUILD_BUG_ON(bytes_per_fpu < PCLMUL_MIN_LEN);
+
+ if (length >= PCLMUL_MIN_LEN && crypto_simd_usable()) {
+ while (length >= PCLMUL_MIN_LEN) {
+ unsigned int chunk = min(length, bytes_per_fpu);
+
+ kernel_fpu_begin();
+ ctx->crc = crc_t10dif_pcl(ctx->crc, data, chunk);
+ kernel_fpu_end();
+
+ length -= chunk;
+ data += chunk;
+ }
+ }
+ if (length)
ctx->crc = crc_t10dif_generic(ctx->crc, data, length);
return 0;
}
@@ -73,12 +88,23 @@ static int chksum_final(struct shash_desc *desc, u8 *out)

static int __chksum_finup(__u16 crc, const u8 *data, unsigned int len, u8 *out)
{
- if (len >= 16 && crypto_simd_usable()) {
- kernel_fpu_begin();
- *(__u16 *)out = crc_t10dif_pcl(crc, data, len);
- kernel_fpu_end();
- } else
- *(__u16 *)out = crc_t10dif_generic(crc, data, len);
+ BUILD_BUG_ON(bytes_per_fpu < PCLMUL_MIN_LEN);
+
+ if (len >= PCLMUL_MIN_LEN && crypto_simd_usable()) {
+ while (len >= PCLMUL_MIN_LEN) {
+ unsigned int chunk = min(len, bytes_per_fpu);
+
+ kernel_fpu_begin();
+ crc = crc_t10dif_pcl(crc, data, chunk);
+ kernel_fpu_end();
+
+ len -= chunk;
+ data += chunk;
+ }
+ }
+ if (len)
+ crc = crc_t10dif_generic(crc, data, len);
+ *(__u16 *)out = crc;
return 0;
}

--
2.38.1


Subject: [PATCH v4 21/24] crypto: x86 - report used CPU features via module parameters

For modules that support multiple implementations, add read-only module
parameters reporting which CPU features each module is using.

The parameters show up as follows for modules that modify the behavior
of their registered drivers or register additional drivers for
each choice:
/sys/module/aesni_intel/parameters/using_x86_avx:1
/sys/module/aesni_intel/parameters/using_x86_avx2:1
/sys/module/aria_aesni_avx_x86_64/parameters/using_x86_gfni:0
/sys/module/chacha_x86_64/parameters/using_x86_avx2:1
/sys/module/chacha_x86_64/parameters/using_x86_avx512:1
/sys/module/crc32c_intel/parameters/using_x86_pclmulqdq:1
/sys/module/curve25519_x86_64/parameters/using_x86_adx:1
/sys/module/libblake2s_x86_64/parameters/using_x86_avx512:1
/sys/module/libblake2s_x86_64/parameters/using_x86_ssse3:1
/sys/module/poly1305_x86_64/parameters/using_x86_avx:1
/sys/module/poly1305_x86_64/parameters/using_x86_avx2:1
/sys/module/poly1305_x86_64/parameters/using_x86_avx512:0
/sys/module/sha1_ssse3/parameters/using_x86_avx:1
/sys/module/sha1_ssse3/parameters/using_x86_avx2:1
/sys/module/sha1_ssse3/parameters/using_x86_shani:0
/sys/module/sha1_ssse3/parameters/using_x86_ssse3:1
/sys/module/sha256_ssse3/parameters/using_x86_avx:1
/sys/module/sha256_ssse3/parameters/using_x86_avx2:1
/sys/module/sha256_ssse3/parameters/using_x86_shani:0
/sys/module/sha256_ssse3/parameters/using_x86_ssse3:1
/sys/module/sha512_ssse3/parameters/using_x86_avx:1
/sys/module/sha512_ssse3/parameters/using_x86_avx2:1
/sys/module/sha512_ssse3/parameters/using_x86_ssse3:1

Delete the aesni_intel prints reporting those selections:
pr_info("AVX2 version of gcm_enc/dec engaged.\n");

Signed-off-by: Robert Elliott <[email protected]>
---
arch/x86/crypto/aesni-intel_glue.c | 19 ++++++++-----------
arch/x86/crypto/aria_aesni_avx_glue.c | 6 ++++++
arch/x86/crypto/blake2s-glue.c | 5 +++++
arch/x86/crypto/chacha_glue.c | 5 +++++
arch/x86/crypto/crc32c-intel_glue.c | 6 ++++++
arch/x86/crypto/curve25519-x86_64.c | 3 +++
arch/x86/crypto/poly1305_glue.c | 7 +++++++
arch/x86/crypto/sha1_ssse3_glue.c | 11 +++++++++++
arch/x86/crypto/sha256_ssse3_glue.c | 20 +++++++++++---------
arch/x86/crypto/sha512_ssse3_glue.c | 7 +++++++
10 files changed, 69 insertions(+), 20 deletions(-)

diff --git a/arch/x86/crypto/aesni-intel_glue.c b/arch/x86/crypto/aesni-intel_glue.c
index 0505d4f9d2a2..80dbf98c53fd 100644
--- a/arch/x86/crypto/aesni-intel_glue.c
+++ b/arch/x86/crypto/aesni-intel_glue.c
@@ -1228,6 +1228,11 @@ static struct aead_alg aesni_aeads[0];

static struct simd_aead_alg *aesni_simd_aeads[ARRAY_SIZE(aesni_aeads)];

+module_param_named(using_x86_avx2, gcm_use_avx2.key.enabled.counter, int, 0444);
+module_param_named(using_x86_avx, gcm_use_avx.key.enabled.counter, int, 0444);
+MODULE_PARM_DESC(using_x86_avx2, "Using x86 instruction set extensions: AVX2 (for GCM mode)");
+MODULE_PARM_DESC(using_x86_avx, "Using x86 instruction set extensions: AVX (for CTR and GCM modes)");
+
static const struct x86_cpu_id module_cpu_ids[] = {
X86_MATCH_FEATURE(X86_FEATURE_AES, NULL),
{}
@@ -1241,22 +1246,14 @@ static int __init aesni_init(void)
if (!x86_match_cpu(module_cpu_ids))
return -ENODEV;
#ifdef CONFIG_X86_64
- if (boot_cpu_has(X86_FEATURE_AVX2)) {
- pr_info("AVX2 version of gcm_enc/dec engaged.\n");
- static_branch_enable(&gcm_use_avx);
+ if (boot_cpu_has(X86_FEATURE_AVX2))
static_branch_enable(&gcm_use_avx2);
- } else
+
if (boot_cpu_has(X86_FEATURE_AVX)) {
- pr_info("AVX version of gcm_enc/dec engaged.\n");
static_branch_enable(&gcm_use_avx);
- } else {
- pr_info("SSE version of gcm_enc/dec engaged.\n");
- }
- if (boot_cpu_has(X86_FEATURE_AVX)) {
- /* optimize performance of ctr mode encryption transform */
static_call_update(aesni_ctr_enc_tfm, aesni_ctr_enc_avx_tfm);
- pr_info("AES CTR mode by8 optimization enabled\n");
}
+
#endif /* CONFIG_X86_64 */

err = crypto_register_alg(&aesni_cipher_alg);
diff --git a/arch/x86/crypto/aria_aesni_avx_glue.c b/arch/x86/crypto/aria_aesni_avx_glue.c
index 6a135203a767..9fd3d1fe1105 100644
--- a/arch/x86/crypto/aria_aesni_avx_glue.c
+++ b/arch/x86/crypto/aria_aesni_avx_glue.c
@@ -166,6 +166,10 @@ static struct skcipher_alg aria_algs[] = {

static struct simd_skcipher_alg *aria_simd_algs[ARRAY_SIZE(aria_algs)];

+static int using_x86_gfni;
+module_param(using_x86_gfni, int, 0444);
+MODULE_PARM_DESC(using_x86_gfni, "Using x86 instruction set extensions: GF-NI");
+
static const struct x86_cpu_id module_cpu_ids[] = {
X86_MATCH_FEATURE(X86_FEATURE_AVX, NULL),
{}
@@ -192,6 +196,7 @@ static int __init aria_avx_init(void)
}

if (boot_cpu_has(X86_FEATURE_GFNI)) {
+ using_x86_gfni = 1;
aria_ops.aria_encrypt_16way = aria_aesni_avx_gfni_encrypt_16way;
aria_ops.aria_decrypt_16way = aria_aesni_avx_gfni_decrypt_16way;
aria_ops.aria_ctr_crypt_16way = aria_aesni_avx_gfni_ctr_crypt_16way;
@@ -210,6 +215,7 @@ static void __exit aria_avx_exit(void)
{
simd_unregister_skciphers(aria_algs, ARRAY_SIZE(aria_algs),
aria_simd_algs);
+ using_x86_gfni = 0;
}

module_init(aria_avx_init);
diff --git a/arch/x86/crypto/blake2s-glue.c b/arch/x86/crypto/blake2s-glue.c
index df757d18a35a..781cf9471cb6 100644
--- a/arch/x86/crypto/blake2s-glue.c
+++ b/arch/x86/crypto/blake2s-glue.c
@@ -55,6 +55,11 @@ void blake2s_compress(struct blake2s_state *state, const u8 *block,
}
EXPORT_SYMBOL(blake2s_compress);

+module_param_named(using_x86_ssse3, blake2s_use_ssse3.key.enabled.counter, int, 0444);
+module_param_named(using_x86_avx512vl, blake2s_use_avx512.key.enabled.counter, int, 0444);
+MODULE_PARM_DESC(using_x86_ssse3, "Using x86 instruction set extensions: SSSE3");
+MODULE_PARM_DESC(using_x86_avx512vl, "Using x86 instruction set extensions: AVX-512VL");
+
static const struct x86_cpu_id module_cpu_ids[] = {
X86_MATCH_FEATURE(X86_FEATURE_SSSE3, NULL),
X86_MATCH_FEATURE(X86_FEATURE_AVX512VL, NULL),
diff --git a/arch/x86/crypto/chacha_glue.c b/arch/x86/crypto/chacha_glue.c
index 546ab0abf30c..ec7461412c5e 100644
--- a/arch/x86/crypto/chacha_glue.c
+++ b/arch/x86/crypto/chacha_glue.c
@@ -277,6 +277,11 @@ static struct skcipher_alg algs[] = {
},
};

+module_param_named(using_x86_avx512vl, chacha_use_avx512vl.key.enabled.counter, int, 0444);
+module_param_named(using_x86_avx2, chacha_use_avx2.key.enabled.counter, int, 0444);
+MODULE_PARM_DESC(using_x86_avx512vl, "Using x86 instruction set extensions: AVX-512VL");
+MODULE_PARM_DESC(using_x86_avx2, "Using x86 instruction set extensions: AVX2");
+
static const struct x86_cpu_id module_cpu_ids[] = {
X86_MATCH_FEATURE(X86_FEATURE_SSSE3, NULL),
{}
diff --git a/arch/x86/crypto/crc32c-intel_glue.c b/arch/x86/crypto/crc32c-intel_glue.c
index aff132e925ea..3c2bf7032667 100644
--- a/arch/x86/crypto/crc32c-intel_glue.c
+++ b/arch/x86/crypto/crc32c-intel_glue.c
@@ -240,6 +240,10 @@ static struct shash_alg alg = {
}
};

+static int using_x86_pclmulqdq;
+module_param(using_x86_pclmulqdq, int, 0444);
+MODULE_PARM_DESC(using_x86_pclmulqdq, "Using x86 instruction set extensions: PCLMULQDQ");
+
static const struct x86_cpu_id module_cpu_ids[] = {
X86_MATCH_FEATURE(X86_FEATURE_XMM4_2, NULL),
{}
@@ -252,6 +256,7 @@ static int __init crc32c_intel_mod_init(void)
return -ENODEV;
#ifdef CONFIG_X86_64
if (boot_cpu_has(X86_FEATURE_PCLMULQDQ)) {
+ using_x86_pclmulqdq = 1;
alg.update = crc32c_pcl_intel_update;
alg.finup = crc32c_pcl_intel_finup;
alg.digest = crc32c_pcl_intel_digest;
@@ -263,6 +268,7 @@ static int __init crc32c_intel_mod_init(void)
static void __exit crc32c_intel_mod_fini(void)
{
crypto_unregister_shash(&alg);
+ using_x86_pclmulqdq = 0;
}

module_init(crc32c_intel_mod_init);
diff --git a/arch/x86/crypto/curve25519-x86_64.c b/arch/x86/crypto/curve25519-x86_64.c
index ae7536b17bf9..6d222849e409 100644
--- a/arch/x86/crypto/curve25519-x86_64.c
+++ b/arch/x86/crypto/curve25519-x86_64.c
@@ -1697,6 +1697,9 @@ static struct kpp_alg curve25519_alg = {
.max_size = curve25519_max_size,
};

+module_param_named(using_x86_adx, curve25519_use_bmi2_adx.key.enabled.counter, int, 0444);
+MODULE_PARM_DESC(using_x86_adx, "Using x86 instruction set extensions: ADX");
+
static const struct x86_cpu_id module_cpu_ids[] = {
X86_MATCH_FEATURE(X86_FEATURE_ADX, NULL),
{}
diff --git a/arch/x86/crypto/poly1305_glue.c b/arch/x86/crypto/poly1305_glue.c
index f1e39e23b2a3..d3c0d5b335ea 100644
--- a/arch/x86/crypto/poly1305_glue.c
+++ b/arch/x86/crypto/poly1305_glue.c
@@ -269,6 +269,13 @@ static struct shash_alg alg = {
},
};

+module_param_named(using_x86_avx, poly1305_use_avx.key.enabled.counter, int, 0444);
+module_param_named(using_x86_avx2, poly1305_use_avx2.key.enabled.counter, int, 0444);
+module_param_named(using_x86_avx512f, poly1305_use_avx512.key.enabled.counter, int, 0444);
+MODULE_PARM_DESC(using_x86_avx, "Using x86 instruction set extensions: AVX");
+MODULE_PARM_DESC(using_x86_avx2, "Using x86 instruction set extensions: AVX2");
+MODULE_PARM_DESC(using_x86_avx512f, "Using x86 instruction set extensions: AVX-512F");
+
static const struct x86_cpu_id module_cpu_ids[] = {
X86_MATCH_FEATURE(X86_FEATURE_ANY, NULL),
{}
diff --git a/arch/x86/crypto/sha1_ssse3_glue.c b/arch/x86/crypto/sha1_ssse3_glue.c
index 806463f57b6d..2445648cf234 100644
--- a/arch/x86/crypto/sha1_ssse3_glue.c
+++ b/arch/x86/crypto/sha1_ssse3_glue.c
@@ -90,6 +90,17 @@ static int using_x86_avx2;
static int using_x86_shani;
#endif

+#ifdef CONFIG_AS_SHA1_NI
+module_param(using_x86_shani, int, 0444);
+MODULE_PARM_DESC(using_x86_shani, "Using x86 instruction set extensions: SHA-NI");
+#endif
+module_param(using_x86_ssse3, int, 0444);
+module_param(using_x86_avx, int, 0444);
+module_param(using_x86_avx2, int, 0444);
+MODULE_PARM_DESC(using_x86_ssse3, "Using x86 instruction set extensions: SSSE3");
+MODULE_PARM_DESC(using_x86_avx, "Using x86 instruction set extensions: AVX");
+MODULE_PARM_DESC(using_x86_avx2, "Using x86 instruction set extensions: AVX2");
+
static int sha1_update(struct shash_desc *desc, const u8 *data,
unsigned int len, unsigned int bytes_per_fpu,
sha1_block_fn *sha1_xform)
diff --git a/arch/x86/crypto/sha256_ssse3_glue.c b/arch/x86/crypto/sha256_ssse3_glue.c
index 30c8c50c1123..1464e6ccf912 100644
--- a/arch/x86/crypto/sha256_ssse3_glue.c
+++ b/arch/x86/crypto/sha256_ssse3_glue.c
@@ -104,6 +104,17 @@ static int using_x86_avx2;
static int using_x86_shani;
#endif

+#ifdef CONFIG_AS_SHA256_NI
+module_param(using_x86_shani, int, 0444);
+MODULE_PARM_DESC(using_x86_shani, "Using x86 instruction set extensions: SHA-NI");
+#endif
+module_param(using_x86_ssse3, int, 0444);
+module_param(using_x86_avx, int, 0444);
+module_param(using_x86_avx2, int, 0444);
+MODULE_PARM_DESC(using_x86_ssse3, "Using x86 instruction set extensions: SSSE3");
+MODULE_PARM_DESC(using_x86_avx, "Using x86 instruction set extensions: AVX");
+MODULE_PARM_DESC(using_x86_avx2, "Using x86 instruction set extensions: AVX2");
+
static int _sha256_update(struct shash_desc *desc, const u8 *data,
unsigned int len, unsigned int bytes_per_fpu,
sha256_block_fn *sha256_xform)
@@ -212,9 +223,6 @@ static void unregister_sha256_ssse3(void)
}
}

-asmlinkage void sha256_transform_avx(struct sha256_state *state,
- const u8 *data, int blocks);
-
static int sha256_avx_update(struct shash_desc *desc, const u8 *data,
unsigned int len)
{
@@ -273,9 +281,6 @@ static void unregister_sha256_avx(void)
}
}

-asmlinkage void sha256_transform_rorx(struct sha256_state *state,
- const u8 *data, int blocks);
-
static int sha256_avx2_update(struct shash_desc *desc, const u8 *data,
unsigned int len)
{
@@ -335,9 +340,6 @@ static void unregister_sha256_avx2(void)
}

#ifdef CONFIG_AS_SHA256_NI
-asmlinkage void sha256_ni_transform(struct sha256_state *digest,
- const u8 *data, int rounds);
-
static int sha256_ni_update(struct shash_desc *desc, const u8 *data,
unsigned int len)
{
diff --git a/arch/x86/crypto/sha512_ssse3_glue.c b/arch/x86/crypto/sha512_ssse3_glue.c
index 48586ab40d55..04e2af951a3e 100644
--- a/arch/x86/crypto/sha512_ssse3_glue.c
+++ b/arch/x86/crypto/sha512_ssse3_glue.c
@@ -81,6 +81,13 @@ static int using_x86_ssse3;
static int using_x86_avx;
static int using_x86_avx2;

+module_param(using_x86_ssse3, int, 0444);
+module_param(using_x86_avx, int, 0444);
+module_param(using_x86_avx2, int, 0444);
+MODULE_PARM_DESC(using_x86_ssse3, "Using x86 instruction set extensions: SSSE3");
+MODULE_PARM_DESC(using_x86_avx, "Using x86 instruction set extensions: AVX");
+MODULE_PARM_DESC(using_x86_avx2, "Using x86 instruction set extensions: AVX2");
+
static int sha512_update(struct shash_desc *desc, const u8 *data,
unsigned int len, unsigned int bytes_per_fpu,
sha512_block_fn *sha512_xform)
--
2.38.1


Subject: [PATCH v4 20/24] crypto: x86/ciphers - load based on CPU features

Like commit aa031b8f702e ("crypto: x86/sha512 - load based on CPU
features"), add module aliases based on CPU feature bits for
modules not implementing hash algorithms (the shared pattern is
sketched after this list):
aegis, aesni, aria
blake2s, blowfish
camellia, cast5, cast6, chacha, curve25519
des3_ede
serpent, sm4
twofish
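
All of these conversions follow the same basic pattern, sketched
here with a placeholder feature bit (the actual feature and any
remaining runtime checks vary per module):

#include <asm/cpu_device_id.h>

static const struct x86_cpu_id module_cpu_ids[] = {
	X86_MATCH_FEATURE(X86_FEATURE_AVX, NULL),
	{}
};
MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);

static int __init example_mod_init(void)
{
	if (!x86_match_cpu(module_cpu_ids))
		return -ENODEV;

	/*
	 * checks that x86_match_cpu cannot express (e.g. OSXSAVE,
	 * cpu_has_xfeatures) remain here
	 */

	return 0;	/* placeholder: register the algorithms here */
}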

Signed-off-by: Robert Elliott <[email protected]>

---
v4 Remove CPU feature checks that are unreachable because
x86_match_cpu already handles them. Make curve25519 match
on ADX and check BMI2.
---
arch/x86/crypto/aegis128-aesni-glue.c | 10 +++++++++-
arch/x86/crypto/aesni-intel_glue.c | 6 +++---
arch/x86/crypto/aria_aesni_avx_glue.c | 15 ++++++++++++---
arch/x86/crypto/blake2s-glue.c | 12 +++++++++++-
arch/x86/crypto/blowfish_glue.c | 10 ++++++++++
arch/x86/crypto/camellia_aesni_avx2_glue.c | 17 +++++++++++++----
arch/x86/crypto/camellia_aesni_avx_glue.c | 15 ++++++++++++---
arch/x86/crypto/camellia_glue.c | 10 ++++++++++
arch/x86/crypto/cast5_avx_glue.c | 10 ++++++++++
arch/x86/crypto/cast6_avx_glue.c | 10 ++++++++++
arch/x86/crypto/chacha_glue.c | 11 +++++++++--
arch/x86/crypto/curve25519-x86_64.c | 19 ++++++++++++++-----
arch/x86/crypto/des3_ede_glue.c | 10 ++++++++++
arch/x86/crypto/serpent_avx2_glue.c | 14 ++++++++++++--
arch/x86/crypto/serpent_avx_glue.c | 10 ++++++++++
arch/x86/crypto/serpent_sse2_glue.c | 11 ++++++++---
arch/x86/crypto/sm4_aesni_avx2_glue.c | 13 +++++++++++--
arch/x86/crypto/sm4_aesni_avx_glue.c | 15 ++++++++++++---
arch/x86/crypto/twofish_avx_glue.c | 10 ++++++++++
arch/x86/crypto/twofish_glue.c | 10 ++++++++++
arch/x86/crypto/twofish_glue_3way.c | 10 ++++++++++
21 files changed, 216 insertions(+), 32 deletions(-)

diff --git a/arch/x86/crypto/aegis128-aesni-glue.c b/arch/x86/crypto/aegis128-aesni-glue.c
index 6e96bdda2811..a3ebd018953c 100644
--- a/arch/x86/crypto/aegis128-aesni-glue.c
+++ b/arch/x86/crypto/aegis128-aesni-glue.c
@@ -282,12 +282,20 @@ static struct aead_alg crypto_aegis128_aesni_alg = {
}
};

+static const struct x86_cpu_id module_cpu_ids[] = {
+ X86_MATCH_FEATURE(X86_FEATURE_AES, NULL),
+ {}
+};
+MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);
+
static struct simd_aead_alg *simd_alg;

static int __init crypto_aegis128_aesni_module_init(void)
{
+ if (!x86_match_cpu(module_cpu_ids))
+ return -ENODEV;
+
if (!boot_cpu_has(X86_FEATURE_XMM2) ||
- !boot_cpu_has(X86_FEATURE_AES) ||
!cpu_has_xfeatures(XFEATURE_MASK_SSE, NULL))
return -ENODEV;

diff --git a/arch/x86/crypto/aesni-intel_glue.c b/arch/x86/crypto/aesni-intel_glue.c
index 921680373855..0505d4f9d2a2 100644
--- a/arch/x86/crypto/aesni-intel_glue.c
+++ b/arch/x86/crypto/aesni-intel_glue.c
@@ -1228,17 +1228,17 @@ static struct aead_alg aesni_aeads[0];

static struct simd_aead_alg *aesni_simd_aeads[ARRAY_SIZE(aesni_aeads)];

-static const struct x86_cpu_id aesni_cpu_id[] = {
+static const struct x86_cpu_id module_cpu_ids[] = {
X86_MATCH_FEATURE(X86_FEATURE_AES, NULL),
{}
};
-MODULE_DEVICE_TABLE(x86cpu, aesni_cpu_id);
+MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);

static int __init aesni_init(void)
{
int err;

- if (!x86_match_cpu(aesni_cpu_id))
+ if (!x86_match_cpu(module_cpu_ids))
return -ENODEV;
#ifdef CONFIG_X86_64
if (boot_cpu_has(X86_FEATURE_AVX2)) {
diff --git a/arch/x86/crypto/aria_aesni_avx_glue.c b/arch/x86/crypto/aria_aesni_avx_glue.c
index c561ea4fefa5..6a135203a767 100644
--- a/arch/x86/crypto/aria_aesni_avx_glue.c
+++ b/arch/x86/crypto/aria_aesni_avx_glue.c
@@ -5,6 +5,7 @@
* Copyright (c) 2022 Taehee Yoo <[email protected]>
*/

+#include <asm/cpu_device_id.h>
#include <crypto/algapi.h>
#include <crypto/internal/simd.h>
#include <crypto/aria.h>
@@ -165,14 +166,22 @@ static struct skcipher_alg aria_algs[] = {

static struct simd_skcipher_alg *aria_simd_algs[ARRAY_SIZE(aria_algs)];

+static const struct x86_cpu_id module_cpu_ids[] = {
+ X86_MATCH_FEATURE(X86_FEATURE_AVX, NULL),
+ {}
+};
+MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);
+
static int __init aria_avx_init(void)
{
const char *feature_name;

- if (!boot_cpu_has(X86_FEATURE_AVX) ||
- !boot_cpu_has(X86_FEATURE_AES) ||
+ if (!x86_match_cpu(module_cpu_ids))
+ return -ENODEV;
+
+ if (!boot_cpu_has(X86_FEATURE_AES) ||
!boot_cpu_has(X86_FEATURE_OSXSAVE)) {
- pr_info("AVX or AES-NI instructions are not detected.\n");
+ pr_info("AES or OSXSAVE instructions are not detected.\n");
return -ENODEV;
}

diff --git a/arch/x86/crypto/blake2s-glue.c b/arch/x86/crypto/blake2s-glue.c
index aaba21230528..df757d18a35a 100644
--- a/arch/x86/crypto/blake2s-glue.c
+++ b/arch/x86/crypto/blake2s-glue.c
@@ -10,7 +10,7 @@
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/sizes.h>
-
+#include <asm/cpu_device_id.h>
#include <asm/cpufeature.h>
#include <asm/fpu/api.h>
#include <asm/processor.h>
@@ -55,8 +55,18 @@ void blake2s_compress(struct blake2s_state *state, const u8 *block,
}
EXPORT_SYMBOL(blake2s_compress);

+static const struct x86_cpu_id module_cpu_ids[] = {
+ X86_MATCH_FEATURE(X86_FEATURE_SSSE3, NULL),
+ X86_MATCH_FEATURE(X86_FEATURE_AVX512VL, NULL),
+ {}
+};
+MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);
+
static int __init blake2s_mod_init(void)
{
+ if (!x86_match_cpu(module_cpu_ids))
+ return -ENODEV;
+
if (boot_cpu_has(X86_FEATURE_SSSE3))
static_branch_enable(&blake2s_use_ssse3);

diff --git a/arch/x86/crypto/blowfish_glue.c b/arch/x86/crypto/blowfish_glue.c
index 019c64c1340a..4c0ead71b198 100644
--- a/arch/x86/crypto/blowfish_glue.c
+++ b/arch/x86/crypto/blowfish_glue.c
@@ -15,6 +15,7 @@
#include <linux/init.h>
#include <linux/module.h>
#include <linux/types.h>
+#include <asm/cpu_device_id.h>

/* regular block cipher functions */
asmlinkage void __blowfish_enc_blk(struct bf_ctx *ctx, u8 *dst, const u8 *src,
@@ -303,10 +304,19 @@ static int force;
module_param(force, int, 0);
MODULE_PARM_DESC(force, "Force module load, ignore CPU blacklist");

+static const struct x86_cpu_id module_cpu_ids[] = {
+ X86_MATCH_FEATURE(X86_FEATURE_ANY, NULL),
+ {}
+};
+MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);
+
static int __init blowfish_init(void)
{
int err;

+ if (!x86_match_cpu(module_cpu_ids))
+ return -ENODEV;
+
if (!force && is_blacklisted_cpu()) {
printk(KERN_INFO
"blowfish-x86_64: performance on this CPU "
diff --git a/arch/x86/crypto/camellia_aesni_avx2_glue.c b/arch/x86/crypto/camellia_aesni_avx2_glue.c
index e7e4d64e9577..6c48fc9f3fde 100644
--- a/arch/x86/crypto/camellia_aesni_avx2_glue.c
+++ b/arch/x86/crypto/camellia_aesni_avx2_glue.c
@@ -11,6 +11,7 @@
#include <linux/err.h>
#include <linux/module.h>
#include <linux/types.h>
+#include <asm/cpu_device_id.h>

#include "camellia.h"
#include "ecb_cbc_helpers.h"
@@ -98,17 +99,25 @@ static struct skcipher_alg camellia_algs[] = {
},
};

+static const struct x86_cpu_id module_cpu_ids[] = {
+ X86_MATCH_FEATURE(X86_FEATURE_AVX2, NULL),
+ {}
+};
+MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);
+
static struct simd_skcipher_alg *camellia_simd_algs[ARRAY_SIZE(camellia_algs)];

static int __init camellia_aesni_init(void)
{
const char *feature_name;

- if (!boot_cpu_has(X86_FEATURE_AVX) ||
- !boot_cpu_has(X86_FEATURE_AVX2) ||
- !boot_cpu_has(X86_FEATURE_AES) ||
+ if (!x86_match_cpu(module_cpu_ids))
+ return -ENODEV;
+
+ if (!boot_cpu_has(X86_FEATURE_AES) ||
+ !boot_cpu_has(X86_FEATURE_AVX) ||
!boot_cpu_has(X86_FEATURE_OSXSAVE)) {
- pr_info("AVX2 or AES-NI instructions are not detected.\n");
+ pr_info("AES-NI, AVX, or OSXSAVE instructions are not detected.\n");
return -ENODEV;
}

diff --git a/arch/x86/crypto/camellia_aesni_avx_glue.c b/arch/x86/crypto/camellia_aesni_avx_glue.c
index c7ccf63e741e..6d7fc96d242e 100644
--- a/arch/x86/crypto/camellia_aesni_avx_glue.c
+++ b/arch/x86/crypto/camellia_aesni_avx_glue.c
@@ -11,6 +11,7 @@
#include <linux/err.h>
#include <linux/module.h>
#include <linux/types.h>
+#include <asm/cpu_device_id.h>

#include "camellia.h"
#include "ecb_cbc_helpers.h"
@@ -98,16 +99,24 @@ static struct skcipher_alg camellia_algs[] = {
}
};

+static const struct x86_cpu_id module_cpu_ids[] = {
+ X86_MATCH_FEATURE(X86_FEATURE_AVX, NULL),
+ {}
+};
+MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);
+
static struct simd_skcipher_alg *camellia_simd_algs[ARRAY_SIZE(camellia_algs)];

static int __init camellia_aesni_init(void)
{
const char *feature_name;

- if (!boot_cpu_has(X86_FEATURE_AVX) ||
- !boot_cpu_has(X86_FEATURE_AES) ||
+ if (!x86_match_cpu(module_cpu_ids))
+ return -ENODEV;
+
+ if (!boot_cpu_has(X86_FEATURE_AES) ||
!boot_cpu_has(X86_FEATURE_OSXSAVE)) {
- pr_info("AVX or AES-NI instructions are not detected.\n");
+ pr_info("AES-NI or OSXSAVE instructions are not detected.\n");
return -ENODEV;
}

diff --git a/arch/x86/crypto/camellia_glue.c b/arch/x86/crypto/camellia_glue.c
index d45e9c0c42ac..a3df1043ed73 100644
--- a/arch/x86/crypto/camellia_glue.c
+++ b/arch/x86/crypto/camellia_glue.c
@@ -8,6 +8,7 @@
* Copyright (C) 2006 NTT (Nippon Telegraph and Telephone Corporation)
*/

+#include <asm/cpu_device_id.h>
#include <asm/unaligned.h>
#include <linux/crypto.h>
#include <linux/init.h>
@@ -1377,10 +1378,19 @@ static int force;
module_param(force, int, 0);
MODULE_PARM_DESC(force, "Force module load, ignore CPU blacklist");

+static const struct x86_cpu_id module_cpu_ids[] = {
+ X86_MATCH_FEATURE(X86_FEATURE_ANY, NULL),
+ {}
+};
+MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);
+
static int __init camellia_init(void)
{
int err;

+ if (!x86_match_cpu(module_cpu_ids))
+ return -ENODEV;
+
if (!force && is_blacklisted_cpu()) {
printk(KERN_INFO
"camellia-x86_64: performance on this CPU "
diff --git a/arch/x86/crypto/cast5_avx_glue.c b/arch/x86/crypto/cast5_avx_glue.c
index 3976a87f92ad..bdc3c763334c 100644
--- a/arch/x86/crypto/cast5_avx_glue.c
+++ b/arch/x86/crypto/cast5_avx_glue.c
@@ -13,6 +13,7 @@
#include <linux/err.h>
#include <linux/module.h>
#include <linux/types.h>
+#include <asm/cpu_device_id.h>

#include "ecb_cbc_helpers.h"

@@ -93,12 +94,21 @@ static struct skcipher_alg cast5_algs[] = {
}
};

+static const struct x86_cpu_id module_cpu_ids[] = {
+ X86_MATCH_FEATURE(X86_FEATURE_AVX, NULL),
+ {}
+};
+MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);
+
static struct simd_skcipher_alg *cast5_simd_algs[ARRAY_SIZE(cast5_algs)];

static int __init cast5_init(void)
{
const char *feature_name;

+ if (!x86_match_cpu(module_cpu_ids))
+ return -ENODEV;
+
if (!cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM,
&feature_name)) {
pr_info("CPU feature '%s' is not supported.\n", feature_name);
diff --git a/arch/x86/crypto/cast6_avx_glue.c b/arch/x86/crypto/cast6_avx_glue.c
index 7e2aea372349..addca34b3511 100644
--- a/arch/x86/crypto/cast6_avx_glue.c
+++ b/arch/x86/crypto/cast6_avx_glue.c
@@ -15,6 +15,7 @@
#include <crypto/algapi.h>
#include <crypto/cast6.h>
#include <crypto/internal/simd.h>
+#include <asm/cpu_device_id.h>

#include "ecb_cbc_helpers.h"

@@ -93,12 +94,21 @@ static struct skcipher_alg cast6_algs[] = {
},
};

+static const struct x86_cpu_id module_cpu_ids[] = {
+ X86_MATCH_FEATURE(X86_FEATURE_AVX, NULL),
+ {}
+};
+MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);
+
static struct simd_skcipher_alg *cast6_simd_algs[ARRAY_SIZE(cast6_algs)];

static int __init cast6_init(void)
{
const char *feature_name;

+ if (!x86_match_cpu(module_cpu_ids))
+ return -ENODEV;
+
if (!cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM,
&feature_name)) {
pr_info("CPU feature '%s' is not supported.\n", feature_name);
diff --git a/arch/x86/crypto/chacha_glue.c b/arch/x86/crypto/chacha_glue.c
index 7b3a1cf0984b..546ab0abf30c 100644
--- a/arch/x86/crypto/chacha_glue.c
+++ b/arch/x86/crypto/chacha_glue.c
@@ -13,6 +13,7 @@
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/sizes.h>
+#include <asm/cpu_device_id.h>
#include <asm/simd.h>

asmlinkage void chacha_block_xor_ssse3(u32 *state, u8 *dst, const u8 *src,
@@ -276,10 +277,16 @@ static struct skcipher_alg algs[] = {
},
};

+static const struct x86_cpu_id module_cpu_ids[] = {
+ X86_MATCH_FEATURE(X86_FEATURE_SSSE3, NULL),
+ {}
+};
+MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);
+
static int __init chacha_simd_mod_init(void)
{
- if (!boot_cpu_has(X86_FEATURE_SSSE3))
- return 0;
+ if (!x86_match_cpu(module_cpu_ids))
+ return -ENODEV;

static_branch_enable(&chacha_use_simd);

diff --git a/arch/x86/crypto/curve25519-x86_64.c b/arch/x86/crypto/curve25519-x86_64.c
index d55fa9e9b9e6..ae7536b17bf9 100644
--- a/arch/x86/crypto/curve25519-x86_64.c
+++ b/arch/x86/crypto/curve25519-x86_64.c
@@ -12,7 +12,7 @@
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/scatterlist.h>
-
+#include <asm/cpu_device_id.h>
#include <asm/cpufeature.h>
#include <asm/processor.h>

@@ -1697,13 +1697,22 @@ static struct kpp_alg curve25519_alg = {
.max_size = curve25519_max_size,
};

+static const struct x86_cpu_id module_cpu_ids[] = {
+ X86_MATCH_FEATURE(X86_FEATURE_ADX, NULL),
+ {}
+};
+MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);

static int __init curve25519_mod_init(void)
{
- if (boot_cpu_has(X86_FEATURE_BMI2) && boot_cpu_has(X86_FEATURE_ADX))
- static_branch_enable(&curve25519_use_bmi2_adx);
- else
- return 0;
+ if (!x86_match_cpu(module_cpu_ids))
+ return -ENODEV;
+
+ if (!boot_cpu_has(X86_FEATURE_BMI2))
+ return -ENODEV;
+
+ static_branch_enable(&curve25519_use_bmi2_adx);
+
return IS_REACHABLE(CONFIG_CRYPTO_KPP) ?
crypto_register_kpp(&curve25519_alg) : 0;
}
diff --git a/arch/x86/crypto/des3_ede_glue.c b/arch/x86/crypto/des3_ede_glue.c
index abb8b1fe123b..168cac5c6ca6 100644
--- a/arch/x86/crypto/des3_ede_glue.c
+++ b/arch/x86/crypto/des3_ede_glue.c
@@ -15,6 +15,7 @@
#include <linux/init.h>
#include <linux/module.h>
#include <linux/types.h>
+#include <asm/cpu_device_id.h>

struct des3_ede_x86_ctx {
struct des3_ede_ctx enc;
@@ -354,10 +355,19 @@ static int force;
module_param(force, int, 0);
MODULE_PARM_DESC(force, "Force module load, ignore CPU blacklist");

+static const struct x86_cpu_id module_cpu_ids[] = {
+ X86_MATCH_FEATURE(X86_FEATURE_ANY, NULL),
+ {}
+};
+MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);
+
static int __init des3_ede_x86_init(void)
{
int err;

+ if (!x86_match_cpu(module_cpu_ids))
+ return -ENODEV;
+
if (!force && is_blacklisted_cpu()) {
pr_info("des3_ede-x86_64: performance on this CPU would be suboptimal: disabling des3_ede-x86_64.\n");
return -ENODEV;
diff --git a/arch/x86/crypto/serpent_avx2_glue.c b/arch/x86/crypto/serpent_avx2_glue.c
index 347e97f4b713..bc18149fb928 100644
--- a/arch/x86/crypto/serpent_avx2_glue.c
+++ b/arch/x86/crypto/serpent_avx2_glue.c
@@ -12,6 +12,7 @@
#include <crypto/algapi.h>
#include <crypto/internal/simd.h>
#include <crypto/serpent.h>
+#include <asm/cpu_device_id.h>

#include "serpent-avx.h"
#include "ecb_cbc_helpers.h"
@@ -94,14 +95,23 @@ static struct skcipher_alg serpent_algs[] = {
},
};

+static const struct x86_cpu_id module_cpu_ids[] = {
+ X86_MATCH_FEATURE(X86_FEATURE_AVX2, NULL),
+ {}
+};
+MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);
+
static struct simd_skcipher_alg *serpent_simd_algs[ARRAY_SIZE(serpent_algs)];

static int __init serpent_avx2_init(void)
{
const char *feature_name;

- if (!boot_cpu_has(X86_FEATURE_AVX2) || !boot_cpu_has(X86_FEATURE_OSXSAVE)) {
- pr_info("AVX2 instructions are not detected.\n");
+ if (!x86_match_cpu(module_cpu_ids))
+ return -ENODEV;
+
+ if (!boot_cpu_has(X86_FEATURE_OSXSAVE)) {
+ pr_info("OSXSAVE instructions are not detected.\n");
return -ENODEV;
}
if (!cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM,
diff --git a/arch/x86/crypto/serpent_avx_glue.c b/arch/x86/crypto/serpent_avx_glue.c
index 6c248e1ea4ef..0db18d99da50 100644
--- a/arch/x86/crypto/serpent_avx_glue.c
+++ b/arch/x86/crypto/serpent_avx_glue.c
@@ -15,6 +15,7 @@
#include <crypto/algapi.h>
#include <crypto/internal/simd.h>
#include <crypto/serpent.h>
+#include <asm/cpu_device_id.h>

#include "serpent-avx.h"
#include "ecb_cbc_helpers.h"
@@ -100,12 +101,21 @@ static struct skcipher_alg serpent_algs[] = {
},
};

+static const struct x86_cpu_id module_cpu_ids[] = {
+ X86_MATCH_FEATURE(X86_FEATURE_AVX, NULL),
+ {}
+};
+MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);
+
static struct simd_skcipher_alg *serpent_simd_algs[ARRAY_SIZE(serpent_algs)];

static int __init serpent_init(void)
{
const char *feature_name;

+ if (!x86_match_cpu(module_cpu_ids))
+ return -ENODEV;
+
if (!cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM,
&feature_name)) {
pr_info("CPU feature '%s' is not supported.\n", feature_name);
diff --git a/arch/x86/crypto/serpent_sse2_glue.c b/arch/x86/crypto/serpent_sse2_glue.c
index d78f37e9b2cf..74f0c89f55ef 100644
--- a/arch/x86/crypto/serpent_sse2_glue.c
+++ b/arch/x86/crypto/serpent_sse2_glue.c
@@ -20,6 +20,7 @@
#include <crypto/b128ops.h>
#include <crypto/internal/simd.h>
#include <crypto/serpent.h>
+#include <asm/cpu_device_id.h>

#include "serpent-sse2.h"
#include "ecb_cbc_helpers.h"
@@ -103,14 +104,18 @@ static struct skcipher_alg serpent_algs[] = {
},
};

+static const struct x86_cpu_id module_cpu_ids[] = {
+ X86_MATCH_FEATURE(X86_FEATURE_XMM2, NULL),
+ {}
+};
+MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);
+
static struct simd_skcipher_alg *serpent_simd_algs[ARRAY_SIZE(serpent_algs)];

static int __init serpent_sse2_init(void)
{
- if (!boot_cpu_has(X86_FEATURE_XMM2)) {
- printk(KERN_INFO "SSE2 instructions are not detected.\n");
+ if (!x86_match_cpu(module_cpu_ids))
return -ENODEV;
- }

return simd_register_skciphers_compat(serpent_algs,
ARRAY_SIZE(serpent_algs),
diff --git a/arch/x86/crypto/sm4_aesni_avx2_glue.c b/arch/x86/crypto/sm4_aesni_avx2_glue.c
index 84bc718f49a3..125b00db89b1 100644
--- a/arch/x86/crypto/sm4_aesni_avx2_glue.c
+++ b/arch/x86/crypto/sm4_aesni_avx2_glue.c
@@ -11,6 +11,7 @@
#include <linux/module.h>
#include <linux/crypto.h>
#include <linux/kernel.h>
+#include <asm/cpu_device_id.h>
#include <asm/simd.h>
#include <crypto/internal/simd.h>
#include <crypto/internal/skcipher.h>
@@ -126,6 +127,12 @@ static struct skcipher_alg sm4_aesni_avx2_skciphers[] = {
}
};

+static const struct x86_cpu_id module_cpu_ids[] = {
+ X86_MATCH_FEATURE(X86_FEATURE_AVX2, NULL),
+ {}
+};
+MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);
+
static struct simd_skcipher_alg *
simd_sm4_aesni_avx2_skciphers[ARRAY_SIZE(sm4_aesni_avx2_skciphers)];

@@ -133,11 +140,13 @@ static int __init sm4_init(void)
{
const char *feature_name;

+ if (!x86_match_cpu(module_cpu_ids))
+ return -ENODEV;
+
if (!boot_cpu_has(X86_FEATURE_AVX) ||
- !boot_cpu_has(X86_FEATURE_AVX2) ||
!boot_cpu_has(X86_FEATURE_AES) ||
!boot_cpu_has(X86_FEATURE_OSXSAVE)) {
- pr_info("AVX2 or AES-NI instructions are not detected.\n");
+ pr_info("AVX, AES-NI, and/or OSXSAVE instructions are not detected.\n");
return -ENODEV;
}

diff --git a/arch/x86/crypto/sm4_aesni_avx_glue.c b/arch/x86/crypto/sm4_aesni_avx_glue.c
index 7800f77d68ad..ac8182b197cf 100644
--- a/arch/x86/crypto/sm4_aesni_avx_glue.c
+++ b/arch/x86/crypto/sm4_aesni_avx_glue.c
@@ -11,6 +11,7 @@
#include <linux/module.h>
#include <linux/crypto.h>
#include <linux/kernel.h>
+#include <asm/cpu_device_id.h>
#include <asm/simd.h>
#include <crypto/internal/simd.h>
#include <crypto/internal/skcipher.h>
@@ -445,6 +446,12 @@ static struct skcipher_alg sm4_aesni_avx_skciphers[] = {
}
};

+static const struct x86_cpu_id module_cpu_ids[] = {
+ X86_MATCH_FEATURE(X86_FEATURE_AVX, NULL),
+ {}
+};
+MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);
+
static struct simd_skcipher_alg *
simd_sm4_aesni_avx_skciphers[ARRAY_SIZE(sm4_aesni_avx_skciphers)];

@@ -452,10 +459,12 @@ static int __init sm4_init(void)
{
const char *feature_name;

- if (!boot_cpu_has(X86_FEATURE_AVX) ||
- !boot_cpu_has(X86_FEATURE_AES) ||
+ if (!x86_match_cpu(module_cpu_ids))
+ return -ENODEV;
+
+ if (!boot_cpu_has(X86_FEATURE_AES) ||
!boot_cpu_has(X86_FEATURE_OSXSAVE)) {
- pr_info("AVX or AES-NI instructions are not detected.\n");
+ pr_info("AES-NI or OSXSAVE instructions are not detected.\n");
return -ENODEV;
}

diff --git a/arch/x86/crypto/twofish_avx_glue.c b/arch/x86/crypto/twofish_avx_glue.c
index 3eb3440b477a..4657e6efc35d 100644
--- a/arch/x86/crypto/twofish_avx_glue.c
+++ b/arch/x86/crypto/twofish_avx_glue.c
@@ -15,6 +15,7 @@
#include <crypto/algapi.h>
#include <crypto/internal/simd.h>
#include <crypto/twofish.h>
+#include <asm/cpu_device_id.h>

#include "twofish.h"
#include "ecb_cbc_helpers.h"
@@ -103,12 +104,21 @@ static struct skcipher_alg twofish_algs[] = {
},
};

+static const struct x86_cpu_id module_cpu_ids[] = {
+ X86_MATCH_FEATURE(X86_FEATURE_AVX, NULL),
+ {}
+};
+MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);
+
static struct simd_skcipher_alg *twofish_simd_algs[ARRAY_SIZE(twofish_algs)];

static int __init twofish_init(void)
{
const char *feature_name;

+ if (!x86_match_cpu(module_cpu_ids))
+ return -ENODEV;
+
if (!cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM, &feature_name)) {
pr_info("CPU feature '%s' is not supported.\n", feature_name);
return -ENODEV;
diff --git a/arch/x86/crypto/twofish_glue.c b/arch/x86/crypto/twofish_glue.c
index f9c4adc27404..ade98aef3402 100644
--- a/arch/x86/crypto/twofish_glue.c
+++ b/arch/x86/crypto/twofish_glue.c
@@ -43,6 +43,7 @@
#include <linux/init.h>
#include <linux/module.h>
#include <linux/types.h>
+#include <asm/cpu_device_id.h>

asmlinkage void twofish_enc_blk(struct twofish_ctx *ctx, u8 *dst,
const u8 *src);
@@ -81,8 +82,17 @@ static struct crypto_alg alg = {
}
};

+static const struct x86_cpu_id module_cpu_ids[] = {
+ X86_MATCH_FEATURE(X86_FEATURE_ANY, NULL),
+ {}
+};
+MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);
+
static int __init twofish_glue_init(void)
{
+ if (!x86_match_cpu(module_cpu_ids))
+ return -ENODEV;
+
return crypto_register_alg(&alg);
}

diff --git a/arch/x86/crypto/twofish_glue_3way.c b/arch/x86/crypto/twofish_glue_3way.c
index 90454cf18e0d..790e5a59a9a7 100644
--- a/arch/x86/crypto/twofish_glue_3way.c
+++ b/arch/x86/crypto/twofish_glue_3way.c
@@ -11,6 +11,7 @@
#include <linux/init.h>
#include <linux/module.h>
#include <linux/types.h>
+#include <asm/cpu_device_id.h>

#include "twofish.h"
#include "ecb_cbc_helpers.h"
@@ -140,8 +141,17 @@ static int force;
module_param(force, int, 0);
MODULE_PARM_DESC(force, "Force module load, ignore CPU blacklist");

+static const struct x86_cpu_id module_cpu_ids[] = {
+ X86_MATCH_FEATURE(X86_FEATURE_ANY, NULL),
+ {}
+};
+MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);
+
static int __init twofish_3way_init(void)
{
+ if (!x86_match_cpu(module_cpu_ids))
+ return -ENODEV;
+
if (!force && is_blacklisted_cpu()) {
printk(KERN_INFO
"twofish-x86_64-3way: performance on this CPU "
--
2.38.1


Subject: [PATCH v4 19/24] crypto: x86/aesni - avoid type conversions

Change the type of the GCM auth_tag_len argument and derivative
variables from unsigned long to unsigned int, so they preserve the
type returned by crypto_aead_authsize().

Continue to pass it to the asm functions as an unsigned long,
but let those function calls be the place where the conversion
to the possibly larger type occurs.

This avoids possible truncation for calculations like:
scatterwalk_map_and_copy(auth_tag_msg, req->src,
req->assoclen + req->cryptlen - auth_tag_len,
auth_tag_len, 0);

whose third argument is an unsigned int. If unsigned long is
wider than unsigned int, the result of that expression is
computed in the wider type and silently truncated when passed.

Use unsigned int rather than int for intermediate variables
containing byte counts and block counts, since all the functions
using them accept unsigned int arguments.

Signed-off-by: Robert Elliott <[email protected]>
---
arch/x86/crypto/aesni-intel_glue.c | 18 +++++++++---------
1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/arch/x86/crypto/aesni-intel_glue.c b/arch/x86/crypto/aesni-intel_glue.c
index a5b0cb3efeba..921680373855 100644
--- a/arch/x86/crypto/aesni-intel_glue.c
+++ b/arch/x86/crypto/aesni-intel_glue.c
@@ -381,7 +381,7 @@ static int cts_cbc_encrypt(struct skcipher_request *req)
{
struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
struct crypto_aes_ctx *ctx = aes_ctx(crypto_skcipher_ctx(tfm));
- int cbc_blocks = DIV_ROUND_UP(req->cryptlen, AES_BLOCK_SIZE) - 2;
+ unsigned int cbc_blocks = DIV_ROUND_UP(req->cryptlen, AES_BLOCK_SIZE) - 2;
struct scatterlist *src = req->src, *dst = req->dst;
struct scatterlist sg_src[2], sg_dst[2];
struct skcipher_request subreq;
@@ -437,7 +437,7 @@ static int cts_cbc_decrypt(struct skcipher_request *req)
{
struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
struct crypto_aes_ctx *ctx = aes_ctx(crypto_skcipher_ctx(tfm));
- int cbc_blocks = DIV_ROUND_UP(req->cryptlen, AES_BLOCK_SIZE) - 2;
+ unsigned int cbc_blocks = DIV_ROUND_UP(req->cryptlen, AES_BLOCK_SIZE) - 2;
struct scatterlist *src = req->src, *dst = req->dst;
struct scatterlist sg_src[2], sg_dst[2];
struct skcipher_request subreq;
@@ -671,11 +671,11 @@ static int generic_gcmaes_set_authsize(struct crypto_aead *tfm,
static int gcmaes_crypt_by_sg(bool enc, struct aead_request *req,
unsigned int assoclen, u8 *hash_subkey,
u8 *iv, void *aes_ctx, u8 *auth_tag,
- unsigned long auth_tag_len)
+ unsigned int auth_tag_len)
{
u8 databuf[sizeof(struct gcm_context_data) + (AESNI_ALIGN - 8)] __aligned(8);
struct gcm_context_data *data = PTR_ALIGN((void *)databuf, AESNI_ALIGN);
- unsigned long left = req->cryptlen;
+ unsigned int left = req->cryptlen;
struct scatter_walk assoc_sg_walk;
struct skcipher_walk walk;
bool do_avx, do_avx2;
@@ -782,7 +782,7 @@ static int gcmaes_encrypt(struct aead_request *req, unsigned int assoclen,
u8 *hash_subkey, u8 *iv, void *aes_ctx)
{
struct crypto_aead *tfm = crypto_aead_reqtfm(req);
- unsigned long auth_tag_len = crypto_aead_authsize(tfm);
+ unsigned int auth_tag_len = crypto_aead_authsize(tfm);
u8 auth_tag[16];
int err;

@@ -801,7 +801,7 @@ static int gcmaes_decrypt(struct aead_request *req, unsigned int assoclen,
u8 *hash_subkey, u8 *iv, void *aes_ctx)
{
struct crypto_aead *tfm = crypto_aead_reqtfm(req);
- unsigned long auth_tag_len = crypto_aead_authsize(tfm);
+ unsigned int auth_tag_len = crypto_aead_authsize(tfm);
u8 auth_tag_msg[16];
u8 auth_tag[16];
int err;
@@ -907,7 +907,7 @@ static int xts_crypt(struct skcipher_request *req, bool encrypt)
{
struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
struct aesni_xts_ctx *ctx = crypto_skcipher_ctx(tfm);
- int tail = req->cryptlen % AES_BLOCK_SIZE;
+ unsigned int tail = req->cryptlen % AES_BLOCK_SIZE;
struct skcipher_request subreq;
struct skcipher_walk walk;
int err;
@@ -920,7 +920,7 @@ static int xts_crypt(struct skcipher_request *req, bool encrypt)
return err;

if (unlikely(tail > 0 && walk.nbytes < walk.total)) {
- int blocks = DIV_ROUND_UP(req->cryptlen, AES_BLOCK_SIZE) - 2;
+ unsigned int blocks = DIV_ROUND_UP(req->cryptlen, AES_BLOCK_SIZE) - 2;

skcipher_walk_abort(&walk);

@@ -945,7 +945,7 @@ static int xts_crypt(struct skcipher_request *req, bool encrypt)
aesni_enc(aes_ctx(ctx->raw_tweak_ctx), walk.iv, walk.iv);

while (walk.nbytes > 0) {
- int nbytes = walk.nbytes;
+ unsigned int nbytes = walk.nbytes;

if (nbytes < walk.total)
nbytes &= ~(AES_BLOCK_SIZE - 1);
--
2.38.1


Subject: [PATCH v4 18/24] crypto: x86/ghash - load based on CPU features

Like commit aa031b8f702e ("crypto: x86/sha512 - load based on CPU
features"), this x86-optimized crypto module already has a
module alias based on CPU feature bits:
ghash

Rename the driver-specific x86_cpu_id table to the generic name
module_cpu_ids so the code follows the same pattern as the other modules.

Signed-off-by: Robert Elliott <[email protected]>

---
v4 move polyval into a separate patch
---
arch/x86/crypto/ghash-clmulni-intel_glue.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/crypto/ghash-clmulni-intel_glue.c b/arch/x86/crypto/ghash-clmulni-intel_glue.c
index 0f24c3b23fd2..d19a8e9b34a6 100644
--- a/arch/x86/crypto/ghash-clmulni-intel_glue.c
+++ b/arch/x86/crypto/ghash-clmulni-intel_glue.c
@@ -325,17 +325,17 @@ static struct ahash_alg ghash_async_alg = {
},
};

-static const struct x86_cpu_id pcmul_cpu_id[] = {
+static const struct x86_cpu_id module_cpu_ids[] = {
X86_MATCH_FEATURE(X86_FEATURE_PCLMULQDQ, NULL), /* Pickle-Mickle-Duck */
{}
};
-MODULE_DEVICE_TABLE(x86cpu, pcmul_cpu_id);
+MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);

static int __init ghash_pclmulqdqni_mod_init(void)
{
int err;

- if (!x86_match_cpu(pcmul_cpu_id))
+ if (!x86_match_cpu(module_cpu_ids))
return -ENODEV;

err = crypto_register_shash(&ghash_alg);
--
2.38.1


Subject: [PATCH v4 23/24] crypto: x86 - report suboptimal CPUs via module parameters

Don't refuse to load modules on certain CPUs while printing a message
to the console. Instead, load the module but don't register the
crypto algorithms, and report this condition via a new
suboptimal_x86 module parameter with this description:
Crypto driver not registered because performance on this CPU would be suboptimal

Reword the description of the existing force module parameter in
each driver to match this modified behavior:
force: Force crypto driver registration on suboptimal CPUs

Make the new module parameters readable via sysfs:
/sys/module/blowfish_x86_64/parameters/suboptimal_x86:0
/sys/module/camellia_x86_64/parameters/suboptimal_x86:0
/sys/module/des3_ede_x86_64/parameters/suboptimal_x86:1
/sys/module/twofish_x86_64_3way/parameters/suboptimal_x86:1

If the module has been loaded and is reporting suboptimal_x86=1,
remove it and reload it with force=1 to register the algorithms anyway:
modprobe -r blowfish_x86_64
modprobe blowfish_x86_64 force=1

or specify the force parameter on the kernel command line:
blowfish_x86_64.force=1
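
For reference, a condensed sketch of the pattern applied to each of
these modules (illustrative only; the algorithm array, its size, and
the init/exit names are placeholders):

  static int force;
  module_param(force, int, 0);
  MODULE_PARM_DESC(force, "Force crypto driver registration on suboptimal CPUs");

  static int suboptimal_x86;
  module_param(suboptimal_x86, int, 0444);
  MODULE_PARM_DESC(suboptimal_x86,
          "Crypto driver not registered because performance on this CPU would be suboptimal");

  static int __init example_init(void)
  {
          if (!x86_match_cpu(module_cpu_ids))
                  return -ENODEV;

          if (!force && is_suboptimal_cpu()) {
                  /* module stays loaded; nothing is registered */
                  suboptimal_x86 = 1;
                  return 0;
          }

          return crypto_register_skciphers(algs, ARRAY_SIZE(algs));
  }

  static void __exit example_fini(void)
  {
          /* only unregister what init actually registered */
          if (!suboptimal_x86)
                  crypto_unregister_skciphers(algs, ARRAY_SIZE(algs));
          suboptimal_x86 = 0;
  }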

Signed-off-by: Robert Elliott <[email protected]>
---
arch/x86/crypto/blowfish_glue.c | 29 +++++++++++++++++------------
arch/x86/crypto/camellia_glue.c | 27 ++++++++++++++++-----------
arch/x86/crypto/des3_ede_glue.c | 26 +++++++++++++++++---------
arch/x86/crypto/twofish_glue_3way.c | 26 +++++++++++++++-----------
4 files changed, 65 insertions(+), 43 deletions(-)

diff --git a/arch/x86/crypto/blowfish_glue.c b/arch/x86/crypto/blowfish_glue.c
index 4c0ead71b198..8e4de7859e34 100644
--- a/arch/x86/crypto/blowfish_glue.c
+++ b/arch/x86/crypto/blowfish_glue.c
@@ -283,7 +283,7 @@ static struct skcipher_alg bf_skcipher_algs[] = {
},
};

-static bool is_blacklisted_cpu(void)
+static bool is_suboptimal_cpu(void)
{
if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
return false;
@@ -292,7 +292,7 @@ static bool is_blacklisted_cpu(void)
/*
* On Pentium 4, blowfish-x86_64 is slower than generic C
* implementation because use of 64bit rotates (which are really
- * slow on P4). Therefore blacklist P4s.
+ * slow on P4).
*/
return true;
}
@@ -302,7 +302,12 @@ static bool is_blacklisted_cpu(void)

static int force;
module_param(force, int, 0);
-MODULE_PARM_DESC(force, "Force module load, ignore CPU blacklist");
+MODULE_PARM_DESC(force, "Force crypto driver registration on suboptimal CPUs");
+
+static int suboptimal_x86;
+module_param(suboptimal_x86, int, 0444);
+MODULE_PARM_DESC(suboptimal_x86,
+ "Crypto driver not registered because performance on this CPU would be suboptimal");

static const struct x86_cpu_id module_cpu_ids[] = {
X86_MATCH_FEATURE(X86_FEATURE_ANY, NULL),
@@ -317,12 +322,9 @@ static int __init blowfish_init(void)
if (!x86_match_cpu(module_cpu_ids))
return -ENODEV;

- if (!force && is_blacklisted_cpu()) {
- printk(KERN_INFO
- "blowfish-x86_64: performance on this CPU "
- "would be suboptimal: disabling "
- "blowfish-x86_64.\n");
- return -ENODEV;
+ if (!force && is_suboptimal_cpu()) {
+ suboptimal_x86 = 1;
+ return 0;
}

err = crypto_register_alg(&bf_cipher_alg);
@@ -339,9 +341,12 @@ static int __init blowfish_init(void)

static void __exit blowfish_fini(void)
{
- crypto_unregister_alg(&bf_cipher_alg);
- crypto_unregister_skciphers(bf_skcipher_algs,
- ARRAY_SIZE(bf_skcipher_algs));
+ if (!suboptimal_x86) {
+ crypto_unregister_alg(&bf_cipher_alg);
+ crypto_unregister_skciphers(bf_skcipher_algs,
+ ARRAY_SIZE(bf_skcipher_algs));
+ }
+ suboptimal_x86 = 0;
}

module_init(blowfish_init);
diff --git a/arch/x86/crypto/camellia_glue.c b/arch/x86/crypto/camellia_glue.c
index a3df1043ed73..2cb9b24d9437 100644
--- a/arch/x86/crypto/camellia_glue.c
+++ b/arch/x86/crypto/camellia_glue.c
@@ -1356,7 +1356,7 @@ static struct skcipher_alg camellia_skcipher_algs[] = {
}
};

-static bool is_blacklisted_cpu(void)
+static bool is_suboptimal_cpu(void)
{
if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
return false;
@@ -1376,7 +1376,12 @@ static bool is_blacklisted_cpu(void)

static int force;
module_param(force, int, 0);
-MODULE_PARM_DESC(force, "Force module load, ignore CPU blacklist");
+MODULE_PARM_DESC(force, "Force crypto driver registration on suboptimal CPUs");
+
+static int suboptimal_x86;
+module_param(suboptimal_x86, int, 0444);
+MODULE_PARM_DESC(suboptimal_x86,
+ "Crypto driver not registered because performance on this CPU would be suboptimal");

static const struct x86_cpu_id module_cpu_ids[] = {
X86_MATCH_FEATURE(X86_FEATURE_ANY, NULL),
@@ -1391,12 +1396,9 @@ static int __init camellia_init(void)
if (!x86_match_cpu(module_cpu_ids))
return -ENODEV;

- if (!force && is_blacklisted_cpu()) {
- printk(KERN_INFO
- "camellia-x86_64: performance on this CPU "
- "would be suboptimal: disabling "
- "camellia-x86_64.\n");
- return -ENODEV;
+ if (!force && is_suboptimal_cpu()) {
+ suboptimal_x86 = 1;
+ return 0;
}

err = crypto_register_alg(&camellia_cipher_alg);
@@ -1413,9 +1415,12 @@ static int __init camellia_init(void)

static void __exit camellia_fini(void)
{
- crypto_unregister_alg(&camellia_cipher_alg);
- crypto_unregister_skciphers(camellia_skcipher_algs,
- ARRAY_SIZE(camellia_skcipher_algs));
+ if (!suboptimal_x86) {
+ crypto_unregister_alg(&camellia_cipher_alg);
+ crypto_unregister_skciphers(camellia_skcipher_algs,
+ ARRAY_SIZE(camellia_skcipher_algs));
+ }
+ suboptimal_x86 = 0;
}

module_init(camellia_init);
diff --git a/arch/x86/crypto/des3_ede_glue.c b/arch/x86/crypto/des3_ede_glue.c
index 168cac5c6ca6..a4cac5129148 100644
--- a/arch/x86/crypto/des3_ede_glue.c
+++ b/arch/x86/crypto/des3_ede_glue.c
@@ -334,7 +334,7 @@ static struct skcipher_alg des3_ede_skciphers[] = {
}
};

-static bool is_blacklisted_cpu(void)
+static bool is_suboptimal_cpu(void)
{
if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
return false;
@@ -343,7 +343,7 @@ static bool is_blacklisted_cpu(void)
/*
* On Pentium 4, des3_ede-x86_64 is slower than generic C
* implementation because use of 64bit rotates (which are really
- * slow on P4). Therefore blacklist P4s.
+ * slow on P4).
*/
return true;
}
@@ -353,7 +353,12 @@ static bool is_blacklisted_cpu(void)

static int force;
module_param(force, int, 0);
-MODULE_PARM_DESC(force, "Force module load, ignore CPU blacklist");
+MODULE_PARM_DESC(force, "Force crypto driver registration on suboptimal CPUs");
+
+static int suboptimal_x86;
+module_param(suboptimal_x86, int, 0444);
+MODULE_PARM_DESC(suboptimal_x86,
+ "Crypto driver not registered because performance on this CPU would be suboptimal");

static const struct x86_cpu_id module_cpu_ids[] = {
X86_MATCH_FEATURE(X86_FEATURE_ANY, NULL),
@@ -368,9 +373,9 @@ static int __init des3_ede_x86_init(void)
if (!x86_match_cpu(module_cpu_ids))
return -ENODEV;

- if (!force && is_blacklisted_cpu()) {
- pr_info("des3_ede-x86_64: performance on this CPU would be suboptimal: disabling des3_ede-x86_64.\n");
- return -ENODEV;
+ if (!force && is_suboptimal_cpu()) {
+ suboptimal_x86 = 1;
+ return 0;
}

err = crypto_register_alg(&des3_ede_cipher);
@@ -387,9 +392,12 @@ static int __init des3_ede_x86_init(void)

static void __exit des3_ede_x86_fini(void)
{
- crypto_unregister_alg(&des3_ede_cipher);
- crypto_unregister_skciphers(des3_ede_skciphers,
- ARRAY_SIZE(des3_ede_skciphers));
+ if (!suboptimal_x86) {
+ crypto_unregister_alg(&des3_ede_cipher);
+ crypto_unregister_skciphers(des3_ede_skciphers,
+ ARRAY_SIZE(des3_ede_skciphers));
+ }
+ suboptimal_x86 = 0;
}

module_init(des3_ede_x86_init);
diff --git a/arch/x86/crypto/twofish_glue_3way.c b/arch/x86/crypto/twofish_glue_3way.c
index 790e5a59a9a7..8db2f23b3056 100644
--- a/arch/x86/crypto/twofish_glue_3way.c
+++ b/arch/x86/crypto/twofish_glue_3way.c
@@ -103,7 +103,7 @@ static struct skcipher_alg tf_skciphers[] = {
},
};

-static bool is_blacklisted_cpu(void)
+static bool is_suboptimal_cpu(void)
{
if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
return false;
@@ -118,8 +118,7 @@ static bool is_blacklisted_cpu(void)
* storing blocks in 64bit registers to allow three blocks to
* be processed parallel. Parallel operation then allows gaining
* more performance than was trade off, on out-of-order CPUs.
- * However Atom does not benefit from this parallelism and
- * should be blacklisted.
+ * However Atom does not benefit from this parallelism.
*/
return true;
}
@@ -139,7 +138,12 @@ static bool is_blacklisted_cpu(void)

static int force;
module_param(force, int, 0);
-MODULE_PARM_DESC(force, "Force module load, ignore CPU blacklist");
+MODULE_PARM_DESC(force, "Force crypto driver registration on suboptimal CPUs");
+
+static int suboptimal_x86;
+module_param(suboptimal_x86, int, 0444);
+MODULE_PARM_DESC(suboptimal_x86,
+ "Crypto driver not registered because performance on this CPU would be suboptimal");

static const struct x86_cpu_id module_cpu_ids[] = {
X86_MATCH_FEATURE(X86_FEATURE_ANY, NULL),
@@ -152,12 +156,9 @@ static int __init twofish_3way_init(void)
if (!x86_match_cpu(module_cpu_ids))
return -ENODEV;

- if (!force && is_blacklisted_cpu()) {
- printk(KERN_INFO
- "twofish-x86_64-3way: performance on this CPU "
- "would be suboptimal: disabling "
- "twofish-x86_64-3way.\n");
- return -ENODEV;
+ if (!force && is_suboptimal_cpu()) {
+ suboptimal_x86 = 1;
+ return 0;
}

return crypto_register_skciphers(tf_skciphers,
@@ -166,7 +167,10 @@ static int __init twofish_3way_init(void)

static void __exit twofish_3way_fini(void)
{
- crypto_unregister_skciphers(tf_skciphers, ARRAY_SIZE(tf_skciphers));
+ if (!suboptimal_x86)
+ crypto_unregister_skciphers(tf_skciphers, ARRAY_SIZE(tf_skciphers));
+
+ suboptimal_x86 = 0;
}

module_init(twofish_3way_init);
--
2.38.1


Subject: [PATCH v4 22/24] crypto: x86 - report missing CPU features via module parameters

Don't refuse to load modules based on missing additional x86 features
(e.g., OSXSAVE) or x86 XSAVE features (e.g., YMM). Instead, load the
module but don't register any crypto drivers. Report that one or more
features are missing via a new missing_x86_features module parameter
(0 = no problems, 1 = something is missing; each parameter's
description lists all the features it requires).

For the SHA modules, which register up to four drivers based on CPU
features, report a separate module parameter for each feature set
(sketched below):
missing_x86_features_avx2
missing_x86_features_avx
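
A schematic sketch of that per-feature-set reporting (abbreviated from
the sha1/sha256/sha512 changes below; the real code also covers the
SSSE3 and SHA-NI variants and uses slightly different helper-feature
checks per module):

  static int __init sha_example_init(void)
  {
          if (!x86_match_cpu(module_cpu_ids))
                  return -ENODEV;

          if (boot_cpu_has(X86_FEATURE_AVX2)) {
                  if (boot_cpu_has(X86_FEATURE_BMI2)) {
                          if (!crypto_register_shash(&sha_avx2_alg))
                                  using_x86_avx2 = 1;
                  } else {
                          /* AVX2 present, but a required helper feature is not */
                          missing_x86_features_avx2 = 1;
                  }
          }

          if (boot_cpu_has(X86_FEATURE_AVX)) {
                  if (cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM, NULL)) {
                          if (!crypto_register_shash(&sha_avx_alg))
                                  using_x86_avx = 1;
                  } else {
                          /* the OS does not save/restore the YMM state */
                          missing_x86_features_avx = 1;
                  }
          }

          /* SSSE3 and SHA-NI registration elided */
          return 0;
  }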

Signed-off-by: Robert Elliott <[email protected]>
---
arch/x86/crypto/aegis128-aesni-glue.c | 15 ++++++++++---
arch/x86/crypto/aria_aesni_avx_glue.c | 24 +++++++++++---------
arch/x86/crypto/camellia_aesni_avx2_glue.c | 25 ++++++++++++---------
arch/x86/crypto/camellia_aesni_avx_glue.c | 25 ++++++++++++---------
arch/x86/crypto/cast5_avx_glue.c | 20 ++++++++++-------
arch/x86/crypto/cast6_avx_glue.c | 20 ++++++++++-------
arch/x86/crypto/curve25519-x86_64.c | 12 ++++++++--
arch/x86/crypto/nhpoly1305-avx2-glue.c | 14 +++++++++---
arch/x86/crypto/polyval-clmulni_glue.c | 15 ++++++++++---
arch/x86/crypto/serpent_avx2_glue.c | 24 +++++++++++---------
arch/x86/crypto/serpent_avx_glue.c | 21 ++++++++++-------
arch/x86/crypto/sha1_ssse3_glue.c | 20 +++++++++++++----
arch/x86/crypto/sha256_ssse3_glue.c | 18 +++++++++++++--
arch/x86/crypto/sha512_ssse3_glue.c | 18 +++++++++++++--
arch/x86/crypto/sm3_avx_glue.c | 22 ++++++++++--------
arch/x86/crypto/sm4_aesni_avx2_glue.c | 26 +++++++++++++---------
arch/x86/crypto/sm4_aesni_avx_glue.c | 26 +++++++++++++---------
arch/x86/crypto/twofish_avx_glue.c | 19 ++++++++++------
18 files changed, 243 insertions(+), 121 deletions(-)

diff --git a/arch/x86/crypto/aegis128-aesni-glue.c b/arch/x86/crypto/aegis128-aesni-glue.c
index a3ebd018953c..e0312ecf34a8 100644
--- a/arch/x86/crypto/aegis128-aesni-glue.c
+++ b/arch/x86/crypto/aegis128-aesni-glue.c
@@ -288,6 +288,11 @@ static const struct x86_cpu_id module_cpu_ids[] = {
};
MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);

+static int missing_x86_features;
+module_param(missing_x86_features, int, 0444);
+MODULE_PARM_DESC(missing_x86_features,
+ "Missing x86 instruction set extensions (SSE2) and/or XSAVE features (SSE)");
+
static struct simd_aead_alg *simd_alg;

static int __init crypto_aegis128_aesni_module_init(void)
@@ -296,8 +301,10 @@ static int __init crypto_aegis128_aesni_module_init(void)
return -ENODEV;

if (!boot_cpu_has(X86_FEATURE_XMM2) ||
- !cpu_has_xfeatures(XFEATURE_MASK_SSE, NULL))
- return -ENODEV;
+ !cpu_has_xfeatures(XFEATURE_MASK_SSE, NULL)) {
+ missing_x86_features = 1;
+ return 0;
+ }

return simd_register_aeads_compat(&crypto_aegis128_aesni_alg, 1,
&simd_alg);
@@ -305,7 +312,9 @@ static int __init crypto_aegis128_aesni_module_init(void)

static void __exit crypto_aegis128_aesni_module_exit(void)
{
- simd_unregister_aeads(&crypto_aegis128_aesni_alg, 1, &simd_alg);
+ if (!missing_x86_features)
+ simd_unregister_aeads(&crypto_aegis128_aesni_alg, 1, &simd_alg);
+ missing_x86_features = 0;
}

module_init(crypto_aegis128_aesni_module_init);
diff --git a/arch/x86/crypto/aria_aesni_avx_glue.c b/arch/x86/crypto/aria_aesni_avx_glue.c
index 9fd3d1fe1105..ebb9760967b5 100644
--- a/arch/x86/crypto/aria_aesni_avx_glue.c
+++ b/arch/x86/crypto/aria_aesni_avx_glue.c
@@ -176,23 +176,25 @@ static const struct x86_cpu_id module_cpu_ids[] = {
};
MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);

+static int missing_x86_features;
+module_param(missing_x86_features, int, 0444);
+MODULE_PARM_DESC(missing_x86_features,
+ "Missing x86 instruction set extensions (AES-NI, OSXSAVE) and/or XSAVE features (SSE, YMM)");
+
static int __init aria_avx_init(void)
{
- const char *feature_name;
-
if (!x86_match_cpu(module_cpu_ids))
return -ENODEV;

if (!boot_cpu_has(X86_FEATURE_AES) ||
!boot_cpu_has(X86_FEATURE_OSXSAVE)) {
- pr_info("AES or OSXSAVE instructions are not detected.\n");
- return -ENODEV;
+ missing_x86_features = 1;
+ return 0;
}

- if (!cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM,
- &feature_name)) {
- pr_info("CPU feature '%s' is not supported.\n", feature_name);
- return -ENODEV;
+ if (!cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM, NULL)) {
+ missing_x86_features = 1;
+ return 0;
}

if (boot_cpu_has(X86_FEATURE_GFNI)) {
@@ -213,8 +215,10 @@ static int __init aria_avx_init(void)

static void __exit aria_avx_exit(void)
{
- simd_unregister_skciphers(aria_algs, ARRAY_SIZE(aria_algs),
- aria_simd_algs);
+ if (!missing_x86_features)
+ simd_unregister_skciphers(aria_algs, ARRAY_SIZE(aria_algs),
+ aria_simd_algs);
+ missing_x86_features = 0;
using_x86_gfni = 0;
}

diff --git a/arch/x86/crypto/camellia_aesni_avx2_glue.c b/arch/x86/crypto/camellia_aesni_avx2_glue.c
index 6c48fc9f3fde..e8ae1e1a801d 100644
--- a/arch/x86/crypto/camellia_aesni_avx2_glue.c
+++ b/arch/x86/crypto/camellia_aesni_avx2_glue.c
@@ -105,26 +105,28 @@ static const struct x86_cpu_id module_cpu_ids[] = {
};
MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);

+static int missing_x86_features;
+module_param(missing_x86_features, int, 0444);
+MODULE_PARM_DESC(missing_x86_features,
+ "Missing x86 instruction set extensions (AES-NI, AVX, OSXSAVE) and/or XSAVE features (SSE, YMM)");
+
static struct simd_skcipher_alg *camellia_simd_algs[ARRAY_SIZE(camellia_algs)];

static int __init camellia_aesni_init(void)
{
- const char *feature_name;
-
if (!x86_match_cpu(module_cpu_ids))
return -ENODEV;

if (!boot_cpu_has(X86_FEATURE_AES) ||
!boot_cpu_has(X86_FEATURE_AVX) ||
!boot_cpu_has(X86_FEATURE_OSXSAVE)) {
- pr_info("AES-NI, AVX, or OSXSAVE instructions are not detected.\n");
- return -ENODEV;
+ missing_x86_features = 1;
+ return 0;
}

- if (!cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM,
- &feature_name)) {
- pr_info("CPU feature '%s' is not supported.\n", feature_name);
- return -ENODEV;
+ if (!cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM, NULL)) {
+ missing_x86_features = 1;
+ return 0;
}

return simd_register_skciphers_compat(camellia_algs,
@@ -134,8 +136,11 @@ static int __init camellia_aesni_init(void)

static void __exit camellia_aesni_fini(void)
{
- simd_unregister_skciphers(camellia_algs, ARRAY_SIZE(camellia_algs),
- camellia_simd_algs);
+ if (!missing_x86_features)
+ simd_unregister_skciphers(camellia_algs,
+ ARRAY_SIZE(camellia_algs),
+ camellia_simd_algs);
+ missing_x86_features = 0;
}

module_init(camellia_aesni_init);
diff --git a/arch/x86/crypto/camellia_aesni_avx_glue.c b/arch/x86/crypto/camellia_aesni_avx_glue.c
index 6d7fc96d242e..6784d631575c 100644
--- a/arch/x86/crypto/camellia_aesni_avx_glue.c
+++ b/arch/x86/crypto/camellia_aesni_avx_glue.c
@@ -105,25 +105,27 @@ static const struct x86_cpu_id module_cpu_ids[] = {
};
MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);

+static int missing_x86_features;
+module_param(missing_x86_features, int, 0444);
+MODULE_PARM_DESC(missing_x86_features,
+ "Missing x86 instruction set extensions (AES-NI, OSXSAVE) and/or XSAVE features (SSE, YMM)");
+
static struct simd_skcipher_alg *camellia_simd_algs[ARRAY_SIZE(camellia_algs)];

static int __init camellia_aesni_init(void)
{
- const char *feature_name;
-
if (!x86_match_cpu(module_cpu_ids))
return -ENODEV;

if (!boot_cpu_has(X86_FEATURE_AES) ||
!boot_cpu_has(X86_FEATURE_OSXSAVE)) {
- pr_info("AES-NI or OSXSAVE instructions are not detected.\n");
- return -ENODEV;
+ missing_x86_features = 1;
+ return 0;
}

- if (!cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM,
- &feature_name)) {
- pr_info("CPU feature '%s' is not supported.\n", feature_name);
- return -ENODEV;
+ if (!cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM, NULL)) {
+ missing_x86_features = 1;
+ return 0;
}

return simd_register_skciphers_compat(camellia_algs,
@@ -133,8 +135,11 @@ static int __init camellia_aesni_init(void)

static void __exit camellia_aesni_fini(void)
{
- simd_unregister_skciphers(camellia_algs, ARRAY_SIZE(camellia_algs),
- camellia_simd_algs);
+ if (!missing_x86_features)
+ simd_unregister_skciphers(camellia_algs,
+ ARRAY_SIZE(camellia_algs),
+ camellia_simd_algs);
+ missing_x86_features = 0;
}

module_init(camellia_aesni_init);
diff --git a/arch/x86/crypto/cast5_avx_glue.c b/arch/x86/crypto/cast5_avx_glue.c
index bdc3c763334c..34ef032bb8d0 100644
--- a/arch/x86/crypto/cast5_avx_glue.c
+++ b/arch/x86/crypto/cast5_avx_glue.c
@@ -100,19 +100,21 @@ static const struct x86_cpu_id module_cpu_ids[] = {
};
MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);

+static int missing_x86_features;
+module_param(missing_x86_features, int, 0444);
+MODULE_PARM_DESC(missing_x86_features,
+ "Missing x86 XSAVE features (SSE, YMM)");
+
static struct simd_skcipher_alg *cast5_simd_algs[ARRAY_SIZE(cast5_algs)];

static int __init cast5_init(void)
{
- const char *feature_name;
-
if (!x86_match_cpu(module_cpu_ids))
return -ENODEV;

- if (!cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM,
- &feature_name)) {
- pr_info("CPU feature '%s' is not supported.\n", feature_name);
- return -ENODEV;
+ if (!cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM, NULL)) {
+ missing_x86_features = 1;
+ return 0;
}

return simd_register_skciphers_compat(cast5_algs,
@@ -122,8 +124,10 @@ static int __init cast5_init(void)

static void __exit cast5_exit(void)
{
- simd_unregister_skciphers(cast5_algs, ARRAY_SIZE(cast5_algs),
- cast5_simd_algs);
+ if (!missing_x86_features)
+ simd_unregister_skciphers(cast5_algs, ARRAY_SIZE(cast5_algs),
+ cast5_simd_algs);
+ missing_x86_features = 0;
}

module_init(cast5_init);
diff --git a/arch/x86/crypto/cast6_avx_glue.c b/arch/x86/crypto/cast6_avx_glue.c
index addca34b3511..71559fd3ea87 100644
--- a/arch/x86/crypto/cast6_avx_glue.c
+++ b/arch/x86/crypto/cast6_avx_glue.c
@@ -100,19 +100,21 @@ static const struct x86_cpu_id module_cpu_ids[] = {
};
MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);

+static int missing_x86_features;
+module_param(missing_x86_features, int, 0444);
+MODULE_PARM_DESC(missing_x86_features,
+ "Missing x86 XSAVE features (SSE, YMM)");
+
static struct simd_skcipher_alg *cast6_simd_algs[ARRAY_SIZE(cast6_algs)];

static int __init cast6_init(void)
{
- const char *feature_name;
-
if (!x86_match_cpu(module_cpu_ids))
return -ENODEV;

- if (!cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM,
- &feature_name)) {
- pr_info("CPU feature '%s' is not supported.\n", feature_name);
- return -ENODEV;
+ if (!cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM, NULL)) {
+ missing_x86_features = 1;
+ return 0;
}

return simd_register_skciphers_compat(cast6_algs,
@@ -122,8 +124,10 @@ static int __init cast6_init(void)

static void __exit cast6_exit(void)
{
- simd_unregister_skciphers(cast6_algs, ARRAY_SIZE(cast6_algs),
- cast6_simd_algs);
+ if (!missing_x86_features)
+ simd_unregister_skciphers(cast6_algs, ARRAY_SIZE(cast6_algs),
+ cast6_simd_algs);
+ missing_x86_features = 0;
}

module_init(cast6_init);
diff --git a/arch/x86/crypto/curve25519-x86_64.c b/arch/x86/crypto/curve25519-x86_64.c
index 6d222849e409..74672351e534 100644
--- a/arch/x86/crypto/curve25519-x86_64.c
+++ b/arch/x86/crypto/curve25519-x86_64.c
@@ -1706,13 +1706,20 @@ static const struct x86_cpu_id module_cpu_ids[] = {
};
MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);

+static int missing_x86_features;
+module_param(missing_x86_features, int, 0444);
+MODULE_PARM_DESC(missing_x86_features,
+ "Missing x86 instruction set extensions (BMI2)");
+
static int __init curve25519_mod_init(void)
{
if (!x86_match_cpu(module_cpu_ids))
return -ENODEV;

- if (!boot_cpu_has(X86_FEATURE_BMI2))
- return -ENODEV;
+ if (!boot_cpu_has(X86_FEATURE_BMI2)) {
+ missing_x86_features = 1;
+ return 0;
+ }

static_branch_enable(&curve25519_use_bmi2_adx);

@@ -1725,6 +1732,7 @@ static void __exit curve25519_mod_exit(void)
if (IS_REACHABLE(CONFIG_CRYPTO_KPP) &&
static_branch_likely(&curve25519_use_bmi2_adx))
crypto_unregister_kpp(&curve25519_alg);
+ missing_x86_features = 0;
}

module_init(curve25519_mod_init);
diff --git a/arch/x86/crypto/nhpoly1305-avx2-glue.c b/arch/x86/crypto/nhpoly1305-avx2-glue.c
index fa415fec5793..2e63947bc9fa 100644
--- a/arch/x86/crypto/nhpoly1305-avx2-glue.c
+++ b/arch/x86/crypto/nhpoly1305-avx2-glue.c
@@ -67,20 +67,28 @@ static const struct x86_cpu_id module_cpu_ids[] = {
};
MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);

+static int missing_x86_features;
+module_param(missing_x86_features, int, 0444);
+MODULE_PARM_DESC(missing_x86_features,
+ "Missing x86 instruction set extensions (OSXSAVE)");
+
static int __init nhpoly1305_mod_init(void)
{
if (!x86_match_cpu(module_cpu_ids))
return -ENODEV;

- if (!boot_cpu_has(X86_FEATURE_OSXSAVE))
- return -ENODEV;
+ if (!boot_cpu_has(X86_FEATURE_OSXSAVE)) {
+ missing_x86_features = 1;
+ return 0;
+ }

return crypto_register_shash(&nhpoly1305_alg);
}

static void __exit nhpoly1305_mod_exit(void)
{
- crypto_unregister_shash(&nhpoly1305_alg);
+ if (!missing_x86_features)
+ crypto_unregister_shash(&nhpoly1305_alg);
}

module_init(nhpoly1305_mod_init);
diff --git a/arch/x86/crypto/polyval-clmulni_glue.c b/arch/x86/crypto/polyval-clmulni_glue.c
index b98e32f8e2a4..20d4a68ec1d7 100644
--- a/arch/x86/crypto/polyval-clmulni_glue.c
+++ b/arch/x86/crypto/polyval-clmulni_glue.c
@@ -182,20 +182,29 @@ static const struct x86_cpu_id module_cpu_ids[] = {
};
MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);

+static int missing_x86_features;
+module_param(missing_x86_features, int, 0444);
+MODULE_PARM_DESC(missing_x86_features,
+ "Missing x86 instruction set extensions (AVX)");
+
static int __init polyval_clmulni_mod_init(void)
{
if (!x86_match_cpu(module_cpu_ids))
return -ENODEV;

- if (!boot_cpu_has(X86_FEATURE_AVX))
- return -ENODEV;
+ if (!boot_cpu_has(X86_FEATURE_AVX)) {
+ missing_x86_features = 1;
+ return 0;
+ }

return crypto_register_shash(&polyval_alg);
}

static void __exit polyval_clmulni_mod_exit(void)
{
- crypto_unregister_shash(&polyval_alg);
+ if (!missing_x86_features)
+ crypto_unregister_shash(&polyval_alg);
+ missing_x86_features = 0;
}

module_init(polyval_clmulni_mod_init);
diff --git a/arch/x86/crypto/serpent_avx2_glue.c b/arch/x86/crypto/serpent_avx2_glue.c
index bc18149fb928..2aa62c93a16f 100644
--- a/arch/x86/crypto/serpent_avx2_glue.c
+++ b/arch/x86/crypto/serpent_avx2_glue.c
@@ -101,23 +101,25 @@ static const struct x86_cpu_id module_cpu_ids[] = {
};
MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);

+static int missing_x86_features;
+module_param(missing_x86_features, int, 0444);
+MODULE_PARM_DESC(missing_x86_features,
+ "Missing x86 instruction set extensions (OSXSAVE) and/or XSAVE features (SSE, YMM)");
+
static struct simd_skcipher_alg *serpent_simd_algs[ARRAY_SIZE(serpent_algs)];

static int __init serpent_avx2_init(void)
{
- const char *feature_name;
-
if (!x86_match_cpu(module_cpu_ids))
return -ENODEV;

if (!boot_cpu_has(X86_FEATURE_OSXSAVE)) {
- pr_info("OSXSAVE instructions are not detected.\n");
- return -ENODEV;
+ missing_x86_features = 1;
+ return 0;
}
- if (!cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM,
- &feature_name)) {
- pr_info("CPU feature '%s' is not supported.\n", feature_name);
- return -ENODEV;
+ if (!cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM, NULL)) {
+ missing_x86_features = 1;
+ return 0;
}

return simd_register_skciphers_compat(serpent_algs,
@@ -127,8 +129,10 @@ static int __init serpent_avx2_init(void)

static void __exit serpent_avx2_fini(void)
{
- simd_unregister_skciphers(serpent_algs, ARRAY_SIZE(serpent_algs),
- serpent_simd_algs);
+ if (!missing_x86_features)
+ simd_unregister_skciphers(serpent_algs, ARRAY_SIZE(serpent_algs),
+ serpent_simd_algs);
+ missing_x86_features = 0;
}

module_init(serpent_avx2_init);
diff --git a/arch/x86/crypto/serpent_avx_glue.c b/arch/x86/crypto/serpent_avx_glue.c
index 0db18d99da50..28ee9717df49 100644
--- a/arch/x86/crypto/serpent_avx_glue.c
+++ b/arch/x86/crypto/serpent_avx_glue.c
@@ -107,19 +107,21 @@ static const struct x86_cpu_id module_cpu_ids[] = {
};
MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);

+static int missing_x86_features;
+module_param(missing_x86_features, int, 0444);
+MODULE_PARM_DESC(missing_x86_features,
+ "Missing x86 XSAVE features (SSE, YMM)");
+
static struct simd_skcipher_alg *serpent_simd_algs[ARRAY_SIZE(serpent_algs)];

static int __init serpent_init(void)
{
- const char *feature_name;
-
if (!x86_match_cpu(module_cpu_ids))
return -ENODEV;

- if (!cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM,
- &feature_name)) {
- pr_info("CPU feature '%s' is not supported.\n", feature_name);
- return -ENODEV;
+ if (!cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM, NULL)) {
+ missing_x86_features = 1;
+ return 0;
}

return simd_register_skciphers_compat(serpent_algs,
@@ -129,8 +131,11 @@ static int __init serpent_init(void)

static void __exit serpent_exit(void)
{
- simd_unregister_skciphers(serpent_algs, ARRAY_SIZE(serpent_algs),
- serpent_simd_algs);
+ if (!missing_x86_features)
+ simd_unregister_skciphers(serpent_algs,
+ ARRAY_SIZE(serpent_algs),
+ serpent_simd_algs);
+ missing_x86_features = 0;
}

module_init(serpent_init);
diff --git a/arch/x86/crypto/sha1_ssse3_glue.c b/arch/x86/crypto/sha1_ssse3_glue.c
index 2445648cf234..405af5e14b67 100644
--- a/arch/x86/crypto/sha1_ssse3_glue.c
+++ b/arch/x86/crypto/sha1_ssse3_glue.c
@@ -351,9 +351,17 @@ static const struct x86_cpu_id module_cpu_ids[] = {
};
MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);

+static int missing_x86_features_avx2;
+static int missing_x86_features_avx;
+module_param(missing_x86_features_avx2, int, 0444);
+module_param(missing_x86_features_avx, int, 0444);
+MODULE_PARM_DESC(missing_x86_features_avx2,
+ "Missing x86 instruction set extensions (BMI1, BMI2) to support AVX2");
+MODULE_PARM_DESC(missing_x86_features_avx,
+ "Missing x86 XSAVE features (SSE, YMM) to support AVX");
+
static int __init sha1_ssse3_mod_init(void)
{
- const char *feature_name;
int ret;

if (!x86_match_cpu(module_cpu_ids))
@@ -374,10 +382,11 @@ static int __init sha1_ssse3_mod_init(void)

if (boot_cpu_has(X86_FEATURE_BMI1) &&
boot_cpu_has(X86_FEATURE_BMI2)) {
-
ret = crypto_register_shash(&sha1_avx2_alg);
if (!ret)
using_x86_avx2 = 1;
+ } else {
+ missing_x86_features_avx2 = 1;
}
}

@@ -385,11 +394,12 @@ static int __init sha1_ssse3_mod_init(void)
if (boot_cpu_has(X86_FEATURE_AVX)) {

if (cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM,
- &feature_name)) {
-
+ NULL)) {
ret = crypto_register_shash(&sha1_avx_alg);
if (!ret)
using_x86_avx = 1;
+ } else {
+ missing_x86_features_avx = 1;
}
}

@@ -415,6 +425,8 @@ static void __exit sha1_ssse3_mod_fini(void)
unregister_sha1_avx2();
unregister_sha1_avx();
unregister_sha1_ssse3();
+ missing_x86_features_avx2 = 0;
+ missing_x86_features_avx = 0;
}

module_init(sha1_ssse3_mod_init);
diff --git a/arch/x86/crypto/sha256_ssse3_glue.c b/arch/x86/crypto/sha256_ssse3_glue.c
index 1464e6ccf912..293cf7085dd3 100644
--- a/arch/x86/crypto/sha256_ssse3_glue.c
+++ b/arch/x86/crypto/sha256_ssse3_glue.c
@@ -413,9 +413,17 @@ static const struct x86_cpu_id module_cpu_ids[] = {
};
MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);

+static int missing_x86_features_avx2;
+static int missing_x86_features_avx;
+module_param(missing_x86_features_avx2, int, 0444);
+module_param(missing_x86_features_avx, int, 0444);
+MODULE_PARM_DESC(missing_x86_features_avx2,
+ "Missing x86 instruction set extensions (BMI2) to support AVX2");
+MODULE_PARM_DESC(missing_x86_features_avx,
+ "Missing x86 XSAVE features (SSE, YMM) to support AVX");
+
static int __init sha256_ssse3_mod_init(void)
{
- const char *feature_name;
int ret;

if (!x86_match_cpu(module_cpu_ids))
@@ -440,6 +448,8 @@ static int __init sha256_ssse3_mod_init(void)
ARRAY_SIZE(sha256_avx2_algs));
if (!ret)
using_x86_avx2 = 1;
+ } else {
+ missing_x86_features_avx2 = 1;
}
}

@@ -447,11 +457,13 @@ static int __init sha256_ssse3_mod_init(void)
if (boot_cpu_has(X86_FEATURE_AVX)) {

if (cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM,
- &feature_name)) {
+ NULL)) {
ret = crypto_register_shashes(sha256_avx_algs,
ARRAY_SIZE(sha256_avx_algs));
if (!ret)
using_x86_avx = 1;
+ } else {
+ missing_x86_features_avx = 1;
}
}

@@ -478,6 +490,8 @@ static void __exit sha256_ssse3_mod_fini(void)
unregister_sha256_avx2();
unregister_sha256_avx();
unregister_sha256_ssse3();
+ missing_x86_features_avx2 = 0;
+ missing_x86_features_avx = 0;
}

module_init(sha256_ssse3_mod_init);
diff --git a/arch/x86/crypto/sha512_ssse3_glue.c b/arch/x86/crypto/sha512_ssse3_glue.c
index 04e2af951a3e..9f13baf7dda9 100644
--- a/arch/x86/crypto/sha512_ssse3_glue.c
+++ b/arch/x86/crypto/sha512_ssse3_glue.c
@@ -319,6 +319,15 @@ static const struct x86_cpu_id module_cpu_ids[] = {
};
MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);

+static int missing_x86_features_avx2;
+static int missing_x86_features_avx;
+module_param(missing_x86_features_avx2, int, 0444);
+module_param(missing_x86_features_avx, int, 0444);
+MODULE_PARM_DESC(missing_x86_features_avx2,
+ "Missing x86 instruction set extensions (BMI2) to support AVX2");
+MODULE_PARM_DESC(missing_x86_features_avx,
+ "Missing x86 XSAVE features (SSE, YMM) to support AVX");
+
static void unregister_sha512_avx2(void)
{
if (using_x86_avx2) {
@@ -330,7 +339,6 @@ static void unregister_sha512_avx2(void)

static int __init sha512_ssse3_mod_init(void)
{
- const char *feature_name;
int ret;

if (!x86_match_cpu(module_cpu_ids))
@@ -343,6 +351,8 @@ static int __init sha512_ssse3_mod_init(void)
ARRAY_SIZE(sha512_avx2_algs));
if (!ret)
using_x86_avx2 = 1;
+ } else {
+ missing_x86_features_avx2 = 1;
}
}

@@ -350,11 +360,13 @@ static int __init sha512_ssse3_mod_init(void)
if (boot_cpu_has(X86_FEATURE_AVX)) {

if (cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM,
- &feature_name)) {
+ NULL)) {
ret = crypto_register_shashes(sha512_avx_algs,
ARRAY_SIZE(sha512_avx_algs));
if (!ret)
using_x86_avx = 1;
+ } else {
+ missing_x86_features_avx = 1;
}
}

@@ -376,6 +388,8 @@ static void __exit sha512_ssse3_mod_fini(void)
unregister_sha512_avx2();
unregister_sha512_avx();
unregister_sha512_ssse3();
+ missing_x86_features_avx2 = 0;
+ missing_x86_features_avx = 0;
}

module_init(sha512_ssse3_mod_init);
diff --git a/arch/x86/crypto/sm3_avx_glue.c b/arch/x86/crypto/sm3_avx_glue.c
index c7786874319c..169ba6a2c806 100644
--- a/arch/x86/crypto/sm3_avx_glue.c
+++ b/arch/x86/crypto/sm3_avx_glue.c
@@ -126,22 +126,24 @@ static const struct x86_cpu_id module_cpu_ids[] = {
};
MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);

+static int missing_x86_features;
+module_param(missing_x86_features, int, 0444);
+MODULE_PARM_DESC(missing_x86_features,
+ "Missing x86 instruction set extensions (BMI2) and/or XSAVE features (SSE, YMM)");
+
static int __init sm3_avx_mod_init(void)
{
- const char *feature_name;
-
if (!x86_match_cpu(module_cpu_ids))
return -ENODEV;

if (!boot_cpu_has(X86_FEATURE_BMI2)) {
- pr_info("BMI2 instruction are not detected.\n");
- return -ENODEV;
+ missing_x86_features = 1;
+ return 0;
}

- if (!cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM,
- &feature_name)) {
- pr_info("CPU feature '%s' is not supported.\n", feature_name);
- return -ENODEV;
+ if (!cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM, NULL)) {
+ missing_x86_features = 1;
+ return 0;
}

return crypto_register_shash(&sm3_avx_alg);
@@ -149,7 +151,9 @@ static int __init sm3_avx_mod_init(void)

static void __exit sm3_avx_mod_exit(void)
{
- crypto_unregister_shash(&sm3_avx_alg);
+ if (!missing_x86_features)
+ crypto_unregister_shash(&sm3_avx_alg);
+ missing_x86_features = 0;
}

module_init(sm3_avx_mod_init);
diff --git a/arch/x86/crypto/sm4_aesni_avx2_glue.c b/arch/x86/crypto/sm4_aesni_avx2_glue.c
index 125b00db89b1..6bcf78231888 100644
--- a/arch/x86/crypto/sm4_aesni_avx2_glue.c
+++ b/arch/x86/crypto/sm4_aesni_avx2_glue.c
@@ -133,27 +133,29 @@ static const struct x86_cpu_id module_cpu_ids[] = {
};
MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);

+static int missing_x86_features;
+module_param(missing_x86_features, int, 0444);
+MODULE_PARM_DESC(missing_x86_features,
+ "Missing x86 instruction set extensions (AES-NI, AVX, OSXSAVE) and/or XSAVE features (SSE, YMM)");
+
static struct simd_skcipher_alg *
simd_sm4_aesni_avx2_skciphers[ARRAY_SIZE(sm4_aesni_avx2_skciphers)];

static int __init sm4_init(void)
{
- const char *feature_name;
-
if (!x86_match_cpu(module_cpu_ids))
return -ENODEV;

if (!boot_cpu_has(X86_FEATURE_AVX) ||
!boot_cpu_has(X86_FEATURE_AES) ||
!boot_cpu_has(X86_FEATURE_OSXSAVE)) {
- pr_info("AVX, AES-NI, and/or OSXSAVE instructions are not detected.\n");
- return -ENODEV;
+ missing_x86_features = 1;
+ return 0;
}

- if (!cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM,
- &feature_name)) {
- pr_info("CPU feature '%s' is not supported.\n", feature_name);
- return -ENODEV;
+ if (!cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM, NULL)) {
+ missing_x86_features = 1;
+ return 0;
}

return simd_register_skciphers_compat(sm4_aesni_avx2_skciphers,
@@ -163,9 +165,11 @@ static int __init sm4_init(void)

static void __exit sm4_exit(void)
{
- simd_unregister_skciphers(sm4_aesni_avx2_skciphers,
- ARRAY_SIZE(sm4_aesni_avx2_skciphers),
- simd_sm4_aesni_avx2_skciphers);
+ if (!missing_x86_features)
+ simd_unregister_skciphers(sm4_aesni_avx2_skciphers,
+ ARRAY_SIZE(sm4_aesni_avx2_skciphers),
+ simd_sm4_aesni_avx2_skciphers);
+ missing_x86_features = 0;
}

module_init(sm4_init);
diff --git a/arch/x86/crypto/sm4_aesni_avx_glue.c b/arch/x86/crypto/sm4_aesni_avx_glue.c
index ac8182b197cf..03775b1079dc 100644
--- a/arch/x86/crypto/sm4_aesni_avx_glue.c
+++ b/arch/x86/crypto/sm4_aesni_avx_glue.c
@@ -452,26 +452,28 @@ static const struct x86_cpu_id module_cpu_ids[] = {
};
MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);

+static int missing_x86_features;
+module_param(missing_x86_features, int, 0444);
+MODULE_PARM_DESC(missing_x86_features,
+ "Missing x86 instruction set extensions (AES-NI, OSXSAVE) and/or XSAVE features (SSE, YMM)");
+
static struct simd_skcipher_alg *
simd_sm4_aesni_avx_skciphers[ARRAY_SIZE(sm4_aesni_avx_skciphers)];

static int __init sm4_init(void)
{
- const char *feature_name;
-
if (!x86_match_cpu(module_cpu_ids))
return -ENODEV;

if (!boot_cpu_has(X86_FEATURE_AES) ||
!boot_cpu_has(X86_FEATURE_OSXSAVE)) {
- pr_info("AES-NI or OSXSAVE instructions are not detected.\n");
- return -ENODEV;
+ missing_x86_features = 1;
+ return 0;
}

- if (!cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM,
- &feature_name)) {
- pr_info("CPU feature '%s' is not supported.\n", feature_name);
- return -ENODEV;
+ if (!cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM, NULL)) {
+ missing_x86_features = 1;
+ return 0;
}

return simd_register_skciphers_compat(sm4_aesni_avx_skciphers,
@@ -481,9 +483,11 @@ static int __init sm4_init(void)

static void __exit sm4_exit(void)
{
- simd_unregister_skciphers(sm4_aesni_avx_skciphers,
- ARRAY_SIZE(sm4_aesni_avx_skciphers),
- simd_sm4_aesni_avx_skciphers);
+ if (!missing_x86_features)
+ simd_unregister_skciphers(sm4_aesni_avx_skciphers,
+ ARRAY_SIZE(sm4_aesni_avx_skciphers),
+ simd_sm4_aesni_avx_skciphers);
+ missing_x86_features = 0;
}

module_init(sm4_init);
diff --git a/arch/x86/crypto/twofish_avx_glue.c b/arch/x86/crypto/twofish_avx_glue.c
index 4657e6efc35d..ae3cc4ad6f4f 100644
--- a/arch/x86/crypto/twofish_avx_glue.c
+++ b/arch/x86/crypto/twofish_avx_glue.c
@@ -110,18 +110,21 @@ static const struct x86_cpu_id module_cpu_ids[] = {
};
MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);

+static int missing_x86_features;
+module_param(missing_x86_features, int, 0444);
+MODULE_PARM_DESC(missing_x86_features,
+ "Missing x86 XSAVE features (SSE, YMM)");
+
static struct simd_skcipher_alg *twofish_simd_algs[ARRAY_SIZE(twofish_algs)];

static int __init twofish_init(void)
{
- const char *feature_name;
-
if (!x86_match_cpu(module_cpu_ids))
return -ENODEV;

- if (!cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM, &feature_name)) {
- pr_info("CPU feature '%s' is not supported.\n", feature_name);
- return -ENODEV;
+ if (!cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM, NULL)) {
+ missing_x86_features = 1;
+ return 0;
}

return simd_register_skciphers_compat(twofish_algs,
@@ -131,8 +134,10 @@ static int __init twofish_init(void)

static void __exit twofish_exit(void)
{
- simd_unregister_skciphers(twofish_algs, ARRAY_SIZE(twofish_algs),
- twofish_simd_algs);
+ if (!missing_x86_features)
+ simd_unregister_skciphers(twofish_algs, ARRAY_SIZE(twofish_algs),
+ twofish_simd_algs);
+ missing_x86_features = 0;
}

module_init(twofish_init);
--
2.38.1


Subject: [PATCH v4 24/24] crypto: x86 - standardize module descriptions

Make the module descriptions for the x86-optimized crypto modules match
the descriptions of the generic modules and the names in Kconfig.

End each description with "-- accelerated for x86 with <feature name>",
listing the features used for module matching, for example:
"-- accelerated for x86 with AVX2"

Mention any other required CPU features:
"(also required: AES-NI)"

Mention any CPU features that are not required but enable additional
acceleration:
"(optional: GF-NI)"

Signed-off-by: Robert Elliott <[email protected]>
---
arch/x86/crypto/aegis128-aesni-glue.c | 2 +-
arch/x86/crypto/aesni-intel_glue.c | 2 +-
arch/x86/crypto/aria_aesni_avx_glue.c | 2 +-
arch/x86/crypto/blake2s-glue.c | 1 +
arch/x86/crypto/blowfish_glue.c | 2 +-
arch/x86/crypto/camellia_aesni_avx2_glue.c | 2 +-
arch/x86/crypto/camellia_aesni_avx_glue.c | 2 +-
arch/x86/crypto/camellia_glue.c | 2 +-
arch/x86/crypto/cast5_avx_glue.c | 2 +-
arch/x86/crypto/cast6_avx_glue.c | 2 +-
arch/x86/crypto/chacha_glue.c | 2 +-
arch/x86/crypto/crc32-pclmul_glue.c | 2 +-
arch/x86/crypto/crc32c-intel_glue.c | 2 +-
arch/x86/crypto/crct10dif-pclmul_glue.c | 2 +-
arch/x86/crypto/curve25519-x86_64.c | 1 +
arch/x86/crypto/des3_ede_glue.c | 2 +-
arch/x86/crypto/ghash-clmulni-intel_glue.c | 2 +-
arch/x86/crypto/nhpoly1305-avx2-glue.c | 2 +-
arch/x86/crypto/nhpoly1305-sse2-glue.c | 2 +-
arch/x86/crypto/poly1305_glue.c | 2 +-
arch/x86/crypto/polyval-clmulni_glue.c | 2 +-
arch/x86/crypto/serpent_avx2_glue.c | 2 +-
arch/x86/crypto/serpent_avx_glue.c | 2 +-
arch/x86/crypto/serpent_sse2_glue.c | 2 +-
arch/x86/crypto/sha1_ssse3_glue.c | 2 +-
arch/x86/crypto/sha256_ssse3_glue.c | 2 +-
arch/x86/crypto/sha512_ssse3_glue.c | 2 +-
arch/x86/crypto/sm3_avx_glue.c | 2 +-
arch/x86/crypto/sm4_aesni_avx2_glue.c | 2 +-
arch/x86/crypto/sm4_aesni_avx_glue.c | 2 +-
arch/x86/crypto/twofish_avx_glue.c | 2 +-
arch/x86/crypto/twofish_glue.c | 2 +-
arch/x86/crypto/twofish_glue_3way.c | 2 +-
crypto/aes_ti.c | 2 +-
crypto/blake2b_generic.c | 2 +-
crypto/blowfish_common.c | 2 +-
crypto/crct10dif_generic.c | 2 +-
crypto/curve25519-generic.c | 1 +
crypto/sha256_generic.c | 2 +-
crypto/sha512_generic.c | 2 +-
crypto/sm3.c | 2 +-
crypto/sm4.c | 2 +-
crypto/twofish_common.c | 2 +-
crypto/twofish_generic.c | 2 +-
44 files changed, 44 insertions(+), 41 deletions(-)

diff --git a/arch/x86/crypto/aegis128-aesni-glue.c b/arch/x86/crypto/aegis128-aesni-glue.c
index e0312ecf34a8..e72ae7ba5f12 100644
--- a/arch/x86/crypto/aegis128-aesni-glue.c
+++ b/arch/x86/crypto/aegis128-aesni-glue.c
@@ -322,6 +322,6 @@ module_exit(crypto_aegis128_aesni_module_exit);

MODULE_LICENSE("GPL");
MODULE_AUTHOR("Ondrej Mosnacek <[email protected]>");
-MODULE_DESCRIPTION("AEGIS-128 AEAD algorithm -- AESNI+SSE2 implementation");
+MODULE_DESCRIPTION("AEGIS-128 AEAD algorithm -- accelerated for x86 with AES-NI (also required: SEE2)");
MODULE_ALIAS_CRYPTO("aegis128");
MODULE_ALIAS_CRYPTO("aegis128-aesni");
diff --git a/arch/x86/crypto/aesni-intel_glue.c b/arch/x86/crypto/aesni-intel_glue.c
index 80dbf98c53fd..3d8508598e76 100644
--- a/arch/x86/crypto/aesni-intel_glue.c
+++ b/arch/x86/crypto/aesni-intel_glue.c
@@ -1311,6 +1311,6 @@ static void __exit aesni_exit(void)
late_initcall(aesni_init);
module_exit(aesni_exit);

-MODULE_DESCRIPTION("Rijndael (AES) Cipher Algorithm, Intel AES-NI instructions optimized");
+MODULE_DESCRIPTION("Rijndael (AES) Cipher Algorithm -- accelerated for x86 with AES-NI (optional: AVX, AVX2)");
MODULE_LICENSE("GPL");
MODULE_ALIAS_CRYPTO("aes");
diff --git a/arch/x86/crypto/aria_aesni_avx_glue.c b/arch/x86/crypto/aria_aesni_avx_glue.c
index ebb9760967b5..1d23c7ef7aef 100644
--- a/arch/x86/crypto/aria_aesni_avx_glue.c
+++ b/arch/x86/crypto/aria_aesni_avx_glue.c
@@ -227,6 +227,6 @@ module_exit(aria_avx_exit);

MODULE_LICENSE("GPL");
MODULE_AUTHOR("Taehee Yoo <[email protected]>");
-MODULE_DESCRIPTION("ARIA Cipher Algorithm, AVX/AES-NI/GFNI optimized");
+MODULE_DESCRIPTION("ARIA Cipher Algorithm -- accelerated for x86 with AVX (also required: AES-NI, OSXSAVE)(optional: GF-NI)");
MODULE_ALIAS_CRYPTO("aria");
MODULE_ALIAS_CRYPTO("aria-aesni-avx");
diff --git a/arch/x86/crypto/blake2s-glue.c b/arch/x86/crypto/blake2s-glue.c
index 781cf9471cb6..0618f0d31fae 100644
--- a/arch/x86/crypto/blake2s-glue.c
+++ b/arch/x86/crypto/blake2s-glue.c
@@ -90,3 +90,4 @@ static int __init blake2s_mod_init(void)
module_init(blake2s_mod_init);

MODULE_LICENSE("GPL v2");
+MODULE_DESCRIPTION("BLAKE2s hash algorithm -- accelerated for x86 with SSSE3 or AVX-512VL");
diff --git a/arch/x86/crypto/blowfish_glue.c b/arch/x86/crypto/blowfish_glue.c
index 8e4de7859e34..67f7562d2d02 100644
--- a/arch/x86/crypto/blowfish_glue.c
+++ b/arch/x86/crypto/blowfish_glue.c
@@ -353,6 +353,6 @@ module_init(blowfish_init);
module_exit(blowfish_fini);

MODULE_LICENSE("GPL");
-MODULE_DESCRIPTION("Blowfish Cipher Algorithm, asm optimized");
+MODULE_DESCRIPTION("Blowfish Cipher Algorithm -- accelerated for x86");
MODULE_ALIAS_CRYPTO("blowfish");
MODULE_ALIAS_CRYPTO("blowfish-asm");
diff --git a/arch/x86/crypto/camellia_aesni_avx2_glue.c b/arch/x86/crypto/camellia_aesni_avx2_glue.c
index e8ae1e1a801d..da89fef184d2 100644
--- a/arch/x86/crypto/camellia_aesni_avx2_glue.c
+++ b/arch/x86/crypto/camellia_aesni_avx2_glue.c
@@ -147,6 +147,6 @@ module_init(camellia_aesni_init);
module_exit(camellia_aesni_fini);

MODULE_LICENSE("GPL");
-MODULE_DESCRIPTION("Camellia Cipher Algorithm, AES-NI/AVX2 optimized");
+MODULE_DESCRIPTION("Camellia Cipher Algorithm -- accelerated for x86 with AVX2 (also required: AES-NI, AVX, OSXSAVE)");
MODULE_ALIAS_CRYPTO("camellia");
MODULE_ALIAS_CRYPTO("camellia-asm");
diff --git a/arch/x86/crypto/camellia_aesni_avx_glue.c b/arch/x86/crypto/camellia_aesni_avx_glue.c
index 6784d631575c..0eebb56bc440 100644
--- a/arch/x86/crypto/camellia_aesni_avx_glue.c
+++ b/arch/x86/crypto/camellia_aesni_avx_glue.c
@@ -146,6 +146,6 @@ module_init(camellia_aesni_init);
module_exit(camellia_aesni_fini);

MODULE_LICENSE("GPL");
-MODULE_DESCRIPTION("Camellia Cipher Algorithm, AES-NI/AVX optimized");
+MODULE_DESCRIPTION("Camellia Cipher Algorithm -- accelerated for x86 with AVX (also required: AES-NI, OSXSAVE)");
MODULE_ALIAS_CRYPTO("camellia");
MODULE_ALIAS_CRYPTO("camellia-asm");
diff --git a/arch/x86/crypto/camellia_glue.c b/arch/x86/crypto/camellia_glue.c
index 2cb9b24d9437..b8cad1655c66 100644
--- a/arch/x86/crypto/camellia_glue.c
+++ b/arch/x86/crypto/camellia_glue.c
@@ -1427,6 +1427,6 @@ module_init(camellia_init);
module_exit(camellia_fini);

MODULE_LICENSE("GPL");
-MODULE_DESCRIPTION("Camellia Cipher Algorithm, asm optimized");
+MODULE_DESCRIPTION("Camellia Cipher Algorithm -- accelerated for x86");
MODULE_ALIAS_CRYPTO("camellia");
MODULE_ALIAS_CRYPTO("camellia-asm");
diff --git a/arch/x86/crypto/cast5_avx_glue.c b/arch/x86/crypto/cast5_avx_glue.c
index 34ef032bb8d0..4a11d3ea9838 100644
--- a/arch/x86/crypto/cast5_avx_glue.c
+++ b/arch/x86/crypto/cast5_avx_glue.c
@@ -133,6 +133,6 @@ static void __exit cast5_exit(void)
module_init(cast5_init);
module_exit(cast5_exit);

-MODULE_DESCRIPTION("Cast5 Cipher Algorithm, AVX optimized");
+MODULE_DESCRIPTION("Cast5 Cipher Algorithm -- accelerated for x86 with AVX");
MODULE_LICENSE("GPL");
MODULE_ALIAS_CRYPTO("cast5");
diff --git a/arch/x86/crypto/cast6_avx_glue.c b/arch/x86/crypto/cast6_avx_glue.c
index 71559fd3ea87..53a92999a234 100644
--- a/arch/x86/crypto/cast6_avx_glue.c
+++ b/arch/x86/crypto/cast6_avx_glue.c
@@ -133,6 +133,6 @@ static void __exit cast6_exit(void)
module_init(cast6_init);
module_exit(cast6_exit);

-MODULE_DESCRIPTION("Cast6 Cipher Algorithm, AVX optimized");
+MODULE_DESCRIPTION("Cast6 Cipher Algorithm -- accelerated for x86 with AVX");
MODULE_LICENSE("GPL");
MODULE_ALIAS_CRYPTO("cast6");
diff --git a/arch/x86/crypto/chacha_glue.c b/arch/x86/crypto/chacha_glue.c
index ec7461412c5e..563546d0bc2a 100644
--- a/arch/x86/crypto/chacha_glue.c
+++ b/arch/x86/crypto/chacha_glue.c
@@ -320,7 +320,7 @@ module_exit(chacha_simd_mod_fini);

MODULE_LICENSE("GPL");
MODULE_AUTHOR("Martin Willi <[email protected]>");
-MODULE_DESCRIPTION("ChaCha and XChaCha stream ciphers (x64 SIMD accelerated)");
+MODULE_DESCRIPTION("ChaCha and XChaCha stream ciphers -- accelerated for x86 with SSSE3 (optional: AVX, AVX2, AVX-512VL and AVX-512BW)");
MODULE_ALIAS_CRYPTO("chacha20");
MODULE_ALIAS_CRYPTO("chacha20-simd");
MODULE_ALIAS_CRYPTO("xchacha20");
diff --git a/arch/x86/crypto/crc32-pclmul_glue.c b/arch/x86/crypto/crc32-pclmul_glue.c
index d5e889c24bea..1c297fae5d39 100644
--- a/arch/x86/crypto/crc32-pclmul_glue.c
+++ b/arch/x86/crypto/crc32-pclmul_glue.c
@@ -207,6 +207,6 @@ module_exit(crc32_pclmul_mod_fini);

MODULE_AUTHOR("Alexander Boyko <[email protected]>");
MODULE_LICENSE("GPL");
-
+MODULE_DESCRIPTION("CRC32 -- accelerated for x86 with PCLMULQDQ");
MODULE_ALIAS_CRYPTO("crc32");
MODULE_ALIAS_CRYPTO("crc32-pclmul");
diff --git a/arch/x86/crypto/crc32c-intel_glue.c b/arch/x86/crypto/crc32c-intel_glue.c
index 3c2bf7032667..ba7899d04bb1 100644
--- a/arch/x86/crypto/crc32c-intel_glue.c
+++ b/arch/x86/crypto/crc32c-intel_glue.c
@@ -275,7 +275,7 @@ module_init(crc32c_intel_mod_init);
module_exit(crc32c_intel_mod_fini);

MODULE_AUTHOR("Austin Zhang <[email protected]>, Kent Liu <[email protected]>");
-MODULE_DESCRIPTION("CRC32c (Castagnoli) optimization using Intel Hardware.");
+MODULE_DESCRIPTION("CRC32c (Castagnoli) -- accelerated for x86 with SSE4.2 (optional: PCLMULQDQ)");
MODULE_LICENSE("GPL");

MODULE_ALIAS_CRYPTO("crc32c");
diff --git a/arch/x86/crypto/crct10dif-pclmul_glue.c b/arch/x86/crypto/crct10dif-pclmul_glue.c
index a26dbd27da96..df9f81ee97a3 100644
--- a/arch/x86/crypto/crct10dif-pclmul_glue.c
+++ b/arch/x86/crypto/crct10dif-pclmul_glue.c
@@ -162,7 +162,7 @@ module_init(crct10dif_intel_mod_init);
module_exit(crct10dif_intel_mod_fini);

MODULE_AUTHOR("Tim Chen <[email protected]>");
-MODULE_DESCRIPTION("T10 DIF CRC calculation accelerated with PCLMULQDQ.");
+MODULE_DESCRIPTION("T10 DIF CRC -- accelerated for x86 with PCLMULQDQ");
MODULE_LICENSE("GPL");

MODULE_ALIAS_CRYPTO("crct10dif");
diff --git a/arch/x86/crypto/curve25519-x86_64.c b/arch/x86/crypto/curve25519-x86_64.c
index 74672351e534..078508f53ff0 100644
--- a/arch/x86/crypto/curve25519-x86_64.c
+++ b/arch/x86/crypto/curve25519-x86_64.c
@@ -1742,3 +1742,4 @@ MODULE_ALIAS_CRYPTO("curve25519");
MODULE_ALIAS_CRYPTO("curve25519-x86");
MODULE_LICENSE("GPL v2");
MODULE_AUTHOR("Jason A. Donenfeld <[email protected]>");
+MODULE_DESCRIPTION("Curve25519 algorithm -- accelerated for x86 with ADX (also requires BMI2)");
diff --git a/arch/x86/crypto/des3_ede_glue.c b/arch/x86/crypto/des3_ede_glue.c
index a4cac5129148..fc90c0a076e3 100644
--- a/arch/x86/crypto/des3_ede_glue.c
+++ b/arch/x86/crypto/des3_ede_glue.c
@@ -404,7 +404,7 @@ module_init(des3_ede_x86_init);
module_exit(des3_ede_x86_fini);

MODULE_LICENSE("GPL");
-MODULE_DESCRIPTION("Triple DES EDE Cipher Algorithm, asm optimized");
+MODULE_DESCRIPTION("Triple DES EDE Cipher Algorithm -- accelerated for x86");
MODULE_ALIAS_CRYPTO("des3_ede");
MODULE_ALIAS_CRYPTO("des3_ede-asm");
MODULE_AUTHOR("Jussi Kivilinna <[email protected]>");
diff --git a/arch/x86/crypto/ghash-clmulni-intel_glue.c b/arch/x86/crypto/ghash-clmulni-intel_glue.c
index d19a8e9b34a6..30f4966df4de 100644
--- a/arch/x86/crypto/ghash-clmulni-intel_glue.c
+++ b/arch/x86/crypto/ghash-clmulni-intel_glue.c
@@ -363,5 +363,5 @@ module_init(ghash_pclmulqdqni_mod_init);
module_exit(ghash_pclmulqdqni_mod_exit);

MODULE_LICENSE("GPL");
-MODULE_DESCRIPTION("GHASH hash function, accelerated by PCLMULQDQ-NI");
+MODULE_DESCRIPTION("GHASH hash function -- accelerated for x86 with PCLMULQDQ");
MODULE_ALIAS_CRYPTO("ghash");
diff --git a/arch/x86/crypto/nhpoly1305-avx2-glue.c b/arch/x86/crypto/nhpoly1305-avx2-glue.c
index 2e63947bc9fa..ed6209f027e7 100644
--- a/arch/x86/crypto/nhpoly1305-avx2-glue.c
+++ b/arch/x86/crypto/nhpoly1305-avx2-glue.c
@@ -94,7 +94,7 @@ static void __exit nhpoly1305_mod_exit(void)
module_init(nhpoly1305_mod_init);
module_exit(nhpoly1305_mod_exit);

-MODULE_DESCRIPTION("NHPoly1305 ε-almost-∆-universal hash function (AVX2-accelerated)");
+MODULE_DESCRIPTION("NHPoly1305 ε-almost-∆-universal hash function -- accelerated for x86 with AVX2 (also required: OSXSAVE)");
MODULE_LICENSE("GPL v2");
MODULE_AUTHOR("Eric Biggers <[email protected]>");
MODULE_ALIAS_CRYPTO("nhpoly1305");
diff --git a/arch/x86/crypto/nhpoly1305-sse2-glue.c b/arch/x86/crypto/nhpoly1305-sse2-glue.c
index c47765e46236..d09156e702dd 100644
--- a/arch/x86/crypto/nhpoly1305-sse2-glue.c
+++ b/arch/x86/crypto/nhpoly1305-sse2-glue.c
@@ -83,7 +83,7 @@ static void __exit nhpoly1305_mod_exit(void)
module_init(nhpoly1305_mod_init);
module_exit(nhpoly1305_mod_exit);

-MODULE_DESCRIPTION("NHPoly1305 ε-almost-∆-universal hash function (SSE2-accelerated)");
+MODULE_DESCRIPTION("NHPoly1305 ε-almost-∆-universal hash function -- accelerated for x86 with SSE2");
MODULE_LICENSE("GPL v2");
MODULE_AUTHOR("Eric Biggers <[email protected]>");
MODULE_ALIAS_CRYPTO("nhpoly1305");
diff --git a/arch/x86/crypto/poly1305_glue.c b/arch/x86/crypto/poly1305_glue.c
index d3c0d5b335ea..78f88be4a22a 100644
--- a/arch/x86/crypto/poly1305_glue.c
+++ b/arch/x86/crypto/poly1305_glue.c
@@ -313,6 +313,6 @@ module_exit(poly1305_simd_mod_exit);

MODULE_LICENSE("GPL");
MODULE_AUTHOR("Jason A. Donenfeld <[email protected]>");
-MODULE_DESCRIPTION("Poly1305 authenticator");
+MODULE_DESCRIPTION("Poly1305 authenticator -- accelerated for x86 (optional: AVX, AVX2, AVX-512F)");
MODULE_ALIAS_CRYPTO("poly1305");
MODULE_ALIAS_CRYPTO("poly1305-simd");
diff --git a/arch/x86/crypto/polyval-clmulni_glue.c b/arch/x86/crypto/polyval-clmulni_glue.c
index 20d4a68ec1d7..447f0f219759 100644
--- a/arch/x86/crypto/polyval-clmulni_glue.c
+++ b/arch/x86/crypto/polyval-clmulni_glue.c
@@ -211,6 +211,6 @@ module_init(polyval_clmulni_mod_init);
module_exit(polyval_clmulni_mod_exit);

MODULE_LICENSE("GPL");
-MODULE_DESCRIPTION("POLYVAL hash function accelerated by PCLMULQDQ-NI");
+MODULE_DESCRIPTION("POLYVAL hash function - accelerated for x86 with PCLMULQDQ (also required: AVX)");
MODULE_ALIAS_CRYPTO("polyval");
MODULE_ALIAS_CRYPTO("polyval-clmulni");
diff --git a/arch/x86/crypto/serpent_avx2_glue.c b/arch/x86/crypto/serpent_avx2_glue.c
index 2aa62c93a16f..0a57779a7559 100644
--- a/arch/x86/crypto/serpent_avx2_glue.c
+++ b/arch/x86/crypto/serpent_avx2_glue.c
@@ -139,6 +139,6 @@ module_init(serpent_avx2_init);
module_exit(serpent_avx2_fini);

MODULE_LICENSE("GPL");
-MODULE_DESCRIPTION("Serpent Cipher Algorithm, AVX2 optimized");
+MODULE_DESCRIPTION("Serpent Cipher Algorithm -- accelerated for x86 with AVX2 (also required: OSXSAVE)");
MODULE_ALIAS_CRYPTO("serpent");
MODULE_ALIAS_CRYPTO("serpent-asm");
diff --git a/arch/x86/crypto/serpent_avx_glue.c b/arch/x86/crypto/serpent_avx_glue.c
index 28ee9717df49..9d03fb25537f 100644
--- a/arch/x86/crypto/serpent_avx_glue.c
+++ b/arch/x86/crypto/serpent_avx_glue.c
@@ -141,6 +141,6 @@ static void __exit serpent_exit(void)
module_init(serpent_init);
module_exit(serpent_exit);

-MODULE_DESCRIPTION("Serpent Cipher Algorithm, AVX optimized");
+MODULE_DESCRIPTION("Serpent Cipher Algorithm -- accelerated for x86 with AVX");
MODULE_LICENSE("GPL");
MODULE_ALIAS_CRYPTO("serpent");
diff --git a/arch/x86/crypto/serpent_sse2_glue.c b/arch/x86/crypto/serpent_sse2_glue.c
index 74f0c89f55ef..287b19527105 100644
--- a/arch/x86/crypto/serpent_sse2_glue.c
+++ b/arch/x86/crypto/serpent_sse2_glue.c
@@ -131,6 +131,6 @@ static void __exit serpent_sse2_exit(void)
module_init(serpent_sse2_init);
module_exit(serpent_sse2_exit);

-MODULE_DESCRIPTION("Serpent Cipher Algorithm, SSE2 optimized");
+MODULE_DESCRIPTION("Serpent Cipher Algorithm -- accelerated for x86 with SSE2");
MODULE_LICENSE("GPL");
MODULE_ALIAS_CRYPTO("serpent");
diff --git a/arch/x86/crypto/sha1_ssse3_glue.c b/arch/x86/crypto/sha1_ssse3_glue.c
index 405af5e14b67..113756544d4e 100644
--- a/arch/x86/crypto/sha1_ssse3_glue.c
+++ b/arch/x86/crypto/sha1_ssse3_glue.c
@@ -433,7 +433,7 @@ module_init(sha1_ssse3_mod_init);
module_exit(sha1_ssse3_mod_fini);

MODULE_LICENSE("GPL");
-MODULE_DESCRIPTION("SHA1 Secure Hash Algorithm, Supplemental SSE3 accelerated");
+MODULE_DESCRIPTION("SHA1 Secure Hash Algorithm -- accelerated for x86 with SSSE3, AVX, AVX2, or SHA-NI");

MODULE_ALIAS_CRYPTO("sha1");
MODULE_ALIAS_CRYPTO("sha1-ssse3");
diff --git a/arch/x86/crypto/sha256_ssse3_glue.c b/arch/x86/crypto/sha256_ssse3_glue.c
index 293cf7085dd3..78fa25d2e4ba 100644
--- a/arch/x86/crypto/sha256_ssse3_glue.c
+++ b/arch/x86/crypto/sha256_ssse3_glue.c
@@ -498,7 +498,7 @@ module_init(sha256_ssse3_mod_init);
module_exit(sha256_ssse3_mod_fini);

MODULE_LICENSE("GPL");
-MODULE_DESCRIPTION("SHA256 Secure Hash Algorithm, Supplemental SSE3 accelerated");
+MODULE_DESCRIPTION("SHA-224 and SHA-256 Secure Hash Algorithms -- accelerated for x86 with SSSE3, AVX, AVX2, or SHA-NI");

MODULE_ALIAS_CRYPTO("sha256");
MODULE_ALIAS_CRYPTO("sha256-ssse3");
diff --git a/arch/x86/crypto/sha512_ssse3_glue.c b/arch/x86/crypto/sha512_ssse3_glue.c
index 9f13baf7dda9..2fa951069604 100644
--- a/arch/x86/crypto/sha512_ssse3_glue.c
+++ b/arch/x86/crypto/sha512_ssse3_glue.c
@@ -396,7 +396,7 @@ module_init(sha512_ssse3_mod_init);
module_exit(sha512_ssse3_mod_fini);

MODULE_LICENSE("GPL");
-MODULE_DESCRIPTION("SHA512 Secure Hash Algorithm, Supplemental SSE3 accelerated");
+MODULE_DESCRIPTION("SHA-384 and SHA-512 Secure Hash Algorithms -- accelerated for x86 with SSSE3, AVX, or AVX2");

MODULE_ALIAS_CRYPTO("sha512");
MODULE_ALIAS_CRYPTO("sha512-ssse3");
diff --git a/arch/x86/crypto/sm3_avx_glue.c b/arch/x86/crypto/sm3_avx_glue.c
index 169ba6a2c806..9e1177fbf032 100644
--- a/arch/x86/crypto/sm3_avx_glue.c
+++ b/arch/x86/crypto/sm3_avx_glue.c
@@ -161,6 +161,6 @@ module_exit(sm3_avx_mod_exit);

MODULE_LICENSE("GPL v2");
MODULE_AUTHOR("Tianjia Zhang <[email protected]>");
-MODULE_DESCRIPTION("SM3 Secure Hash Algorithm, AVX assembler accelerated");
+MODULE_DESCRIPTION("SM3 Secure Hash Algorithm -- accelerated for x86 with AVX (also required: BMI2)");
MODULE_ALIAS_CRYPTO("sm3");
MODULE_ALIAS_CRYPTO("sm3-avx");
diff --git a/arch/x86/crypto/sm4_aesni_avx2_glue.c b/arch/x86/crypto/sm4_aesni_avx2_glue.c
index 6bcf78231888..b497a6006c8d 100644
--- a/arch/x86/crypto/sm4_aesni_avx2_glue.c
+++ b/arch/x86/crypto/sm4_aesni_avx2_glue.c
@@ -177,6 +177,6 @@ module_exit(sm4_exit);

MODULE_LICENSE("GPL v2");
MODULE_AUTHOR("Tianjia Zhang <[email protected]>");
-MODULE_DESCRIPTION("SM4 Cipher Algorithm, AES-NI/AVX2 optimized");
+MODULE_DESCRIPTION("SM4 Cipher Algorithm -- accelerated for x86 with AVX2 (also required: AES-NI, AVX, OSXSAVE)");
MODULE_ALIAS_CRYPTO("sm4");
MODULE_ALIAS_CRYPTO("sm4-aesni-avx2");
diff --git a/arch/x86/crypto/sm4_aesni_avx_glue.c b/arch/x86/crypto/sm4_aesni_avx_glue.c
index 03775b1079dc..e583ee0948af 100644
--- a/arch/x86/crypto/sm4_aesni_avx_glue.c
+++ b/arch/x86/crypto/sm4_aesni_avx_glue.c
@@ -495,6 +495,6 @@ module_exit(sm4_exit);

MODULE_LICENSE("GPL v2");
MODULE_AUTHOR("Tianjia Zhang <[email protected]>");
-MODULE_DESCRIPTION("SM4 Cipher Algorithm, AES-NI/AVX optimized");
+MODULE_DESCRIPTION("SM4 Cipher Algorithm -- accelerated for x86 with AVX (also required: AES-NI, OSXSAVE)");
MODULE_ALIAS_CRYPTO("sm4");
MODULE_ALIAS_CRYPTO("sm4-aesni-avx");
diff --git a/arch/x86/crypto/twofish_avx_glue.c b/arch/x86/crypto/twofish_avx_glue.c
index ae3cc4ad6f4f..7b405c66d5fa 100644
--- a/arch/x86/crypto/twofish_avx_glue.c
+++ b/arch/x86/crypto/twofish_avx_glue.c
@@ -143,6 +143,6 @@ static void __exit twofish_exit(void)
module_init(twofish_init);
module_exit(twofish_exit);

-MODULE_DESCRIPTION("Twofish Cipher Algorithm, AVX optimized");
+MODULE_DESCRIPTION("Twofish Cipher Algorithm -- accelerated for x86 with AVX");
MODULE_LICENSE("GPL");
MODULE_ALIAS_CRYPTO("twofish");
diff --git a/arch/x86/crypto/twofish_glue.c b/arch/x86/crypto/twofish_glue.c
index ade98aef3402..10729675e79c 100644
--- a/arch/x86/crypto/twofish_glue.c
+++ b/arch/x86/crypto/twofish_glue.c
@@ -105,6 +105,6 @@ module_init(twofish_glue_init);
module_exit(twofish_glue_fini);

MODULE_LICENSE("GPL");
-MODULE_DESCRIPTION ("Twofish Cipher Algorithm, asm optimized");
+MODULE_DESCRIPTION("Twofish Cipher Algorithm -- accelerated for x86");
MODULE_ALIAS_CRYPTO("twofish");
MODULE_ALIAS_CRYPTO("twofish-asm");
diff --git a/arch/x86/crypto/twofish_glue_3way.c b/arch/x86/crypto/twofish_glue_3way.c
index 8db2f23b3056..43f428b59684 100644
--- a/arch/x86/crypto/twofish_glue_3way.c
+++ b/arch/x86/crypto/twofish_glue_3way.c
@@ -177,6 +177,6 @@ module_init(twofish_3way_init);
module_exit(twofish_3way_fini);

MODULE_LICENSE("GPL");
-MODULE_DESCRIPTION("Twofish Cipher Algorithm, 3-way parallel asm optimized");
+MODULE_DESCRIPTION("Twofish Cipher Algorithm -- accelerated for x86 (3-way parallel)");
MODULE_ALIAS_CRYPTO("twofish");
MODULE_ALIAS_CRYPTO("twofish-asm");
diff --git a/crypto/aes_ti.c b/crypto/aes_ti.c
index 205c2c257d49..3cff553495ad 100644
--- a/crypto/aes_ti.c
+++ b/crypto/aes_ti.c
@@ -78,6 +78,6 @@ static void __exit aes_fini(void)
module_init(aes_init);
module_exit(aes_fini);

-MODULE_DESCRIPTION("Generic fixed time AES");
+MODULE_DESCRIPTION("Rijndael (AES) Cipher Algorithm -- generic fixed time");
MODULE_AUTHOR("Ard Biesheuvel <[email protected]>");
MODULE_LICENSE("GPL v2");
diff --git a/crypto/blake2b_generic.c b/crypto/blake2b_generic.c
index 6704c0355889..ee53f25ff254 100644
--- a/crypto/blake2b_generic.c
+++ b/crypto/blake2b_generic.c
@@ -175,7 +175,7 @@ subsys_initcall(blake2b_mod_init);
module_exit(blake2b_mod_fini);

MODULE_AUTHOR("David Sterba <[email protected]>");
-MODULE_DESCRIPTION("BLAKE2b generic implementation");
+MODULE_DESCRIPTION("BLAKE2b hash algorithm");
MODULE_LICENSE("GPL");
MODULE_ALIAS_CRYPTO("blake2b-160");
MODULE_ALIAS_CRYPTO("blake2b-160-generic");
diff --git a/crypto/blowfish_common.c b/crypto/blowfish_common.c
index 1c072012baff..8c75fdfcd09c 100644
--- a/crypto/blowfish_common.c
+++ b/crypto/blowfish_common.c
@@ -394,4 +394,4 @@ int blowfish_setkey(struct crypto_tfm *tfm, const u8 *key, unsigned int keylen)
EXPORT_SYMBOL_GPL(blowfish_setkey);

MODULE_LICENSE("GPL");
-MODULE_DESCRIPTION("Blowfish Cipher common functions");
+MODULE_DESCRIPTION("Blowfish Cipher Algorithm common functions");
diff --git a/crypto/crct10dif_generic.c b/crypto/crct10dif_generic.c
index e843982073bb..81c131c8ccd0 100644
--- a/crypto/crct10dif_generic.c
+++ b/crypto/crct10dif_generic.c
@@ -116,7 +116,7 @@ subsys_initcall(crct10dif_mod_init);
module_exit(crct10dif_mod_fini);

MODULE_AUTHOR("Tim Chen <[email protected]>");
-MODULE_DESCRIPTION("T10 DIF CRC calculation.");
+MODULE_DESCRIPTION("T10 DIF CRC calculation");
MODULE_LICENSE("GPL");
MODULE_ALIAS_CRYPTO("crct10dif");
MODULE_ALIAS_CRYPTO("crct10dif-generic");
diff --git a/crypto/curve25519-generic.c b/crypto/curve25519-generic.c
index d055b0784c77..4f96583b31dd 100644
--- a/crypto/curve25519-generic.c
+++ b/crypto/curve25519-generic.c
@@ -88,3 +88,4 @@ module_exit(curve25519_exit);
MODULE_ALIAS_CRYPTO("curve25519");
MODULE_ALIAS_CRYPTO("curve25519-generic");
MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("Curve25519 algorithm");
diff --git a/crypto/sha256_generic.c b/crypto/sha256_generic.c
index bf147b01e313..141430c25e15 100644
--- a/crypto/sha256_generic.c
+++ b/crypto/sha256_generic.c
@@ -102,7 +102,7 @@ subsys_initcall(sha256_generic_mod_init);
module_exit(sha256_generic_mod_fini);

MODULE_LICENSE("GPL");
-MODULE_DESCRIPTION("SHA-224 and SHA-256 Secure Hash Algorithm");
+MODULE_DESCRIPTION("SHA-224 and SHA-256 Secure Hash Algorithms");

MODULE_ALIAS_CRYPTO("sha224");
MODULE_ALIAS_CRYPTO("sha224-generic");
diff --git a/crypto/sha512_generic.c b/crypto/sha512_generic.c
index be70e76d6d86..63c5616ec770 100644
--- a/crypto/sha512_generic.c
+++ b/crypto/sha512_generic.c
@@ -219,7 +219,7 @@ subsys_initcall(sha512_generic_mod_init);
module_exit(sha512_generic_mod_fini);

MODULE_LICENSE("GPL");
-MODULE_DESCRIPTION("SHA-512 and SHA-384 Secure Hash Algorithms");
+MODULE_DESCRIPTION("SHA-384 and SHA-512 Secure Hash Algorithms");

MODULE_ALIAS_CRYPTO("sha384");
MODULE_ALIAS_CRYPTO("sha384-generic");
diff --git a/crypto/sm3.c b/crypto/sm3.c
index d473e358a873..2a400eb69e66 100644
--- a/crypto/sm3.c
+++ b/crypto/sm3.c
@@ -242,5 +242,5 @@ void sm3_final(struct sm3_state *sctx, u8 *out)
}
EXPORT_SYMBOL_GPL(sm3_final);

-MODULE_DESCRIPTION("Generic SM3 library");
+MODULE_DESCRIPTION("SM3 Secure Hash Algorithm generic library");
MODULE_LICENSE("GPL v2");
diff --git a/crypto/sm4.c b/crypto/sm4.c
index 2c44193bc27e..d46b598b41cd 100644
--- a/crypto/sm4.c
+++ b/crypto/sm4.c
@@ -180,5 +180,5 @@ void sm4_crypt_block(const u32 *rk, u8 *out, const u8 *in)
}
EXPORT_SYMBOL_GPL(sm4_crypt_block);

-MODULE_DESCRIPTION("Generic SM4 library");
+MODULE_DESCRIPTION("SM4 Cipher Algorithm generic library");
MODULE_LICENSE("GPL v2");
diff --git a/crypto/twofish_common.c b/crypto/twofish_common.c
index f921f30334f4..daa28045069d 100644
--- a/crypto/twofish_common.c
+++ b/crypto/twofish_common.c
@@ -690,4 +690,4 @@ int twofish_setkey(struct crypto_tfm *tfm, const u8 *key, unsigned int key_len)
EXPORT_SYMBOL_GPL(twofish_setkey);

MODULE_LICENSE("GPL");
-MODULE_DESCRIPTION("Twofish cipher common functions");
+MODULE_DESCRIPTION("Twofish Cipher Algorithm common functions");
diff --git a/crypto/twofish_generic.c b/crypto/twofish_generic.c
index 86b2f067a416..4fe42b4ac82d 100644
--- a/crypto/twofish_generic.c
+++ b/crypto/twofish_generic.c
@@ -191,6 +191,6 @@ subsys_initcall(twofish_mod_init);
module_exit(twofish_mod_fini);

MODULE_LICENSE("GPL");
-MODULE_DESCRIPTION ("Twofish Cipher Algorithm");
+MODULE_DESCRIPTION("Twofish Cipher Algorithm");
MODULE_ALIAS_CRYPTO("twofish");
MODULE_ALIAS_CRYPTO("twofish-generic");
--
2.38.1


2022-11-16 11:34:50

by Jason A. Donenfeld

Subject: Re: [PATCH v4 17/24] crypto: x86/poly - load based on CPU features

On Tue, Nov 15, 2022 at 10:13:35PM -0600, Robert Elliott wrote:
> +static const struct x86_cpu_id module_cpu_ids[] = {
> + X86_MATCH_FEATURE(X86_FEATURE_ANY, NULL),
> + {}
> +};
> +MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);
> +
> static int __init poly1305_simd_mod_init(void)
> {
> + if (!x86_match_cpu(module_cpu_ids))
> + return -ENODEV;

What exactly does this accomplish? Isn't this just a no-op?

Jason
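
A note on why that check looks like a no-op: x86_match_cpu() skips the
feature test for X86_FEATURE_ANY, so a table containing only that entry
matches every x86 CPU and the -ENODEV branch can never be taken; the
table's practical effect is limited to the MODULE_DEVICE_TABLE() modalias
used for udev autoloading. A minimal illustration of that behavior (not
from the patch; the names here are made up):

#include <linux/module.h>
#include <asm/cpu_device_id.h>
#include <asm/cpufeature.h>

static const struct x86_cpu_id any_cpu_ids[] = {
	X86_MATCH_FEATURE(X86_FEATURE_ANY, NULL),	/* matches every x86 CPU */
	{}
};
MODULE_DEVICE_TABLE(x86cpu, any_cpu_ids);	/* only effect: modalias for autoloading */

static int __init any_cpu_example_init(void)
{
	/* Cannot fail: x86_match_cpu() does not test the CPU feature bit
	 * for X86_FEATURE_ANY entries, so this table matches unconditionally.
	 */
	if (!x86_match_cpu(any_cpu_ids))
		return -ENODEV;
	return 0;
}
module_init(any_cpu_example_init);
MODULE_LICENSE("GPL");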

2022-11-16 11:39:09

by Jason A. Donenfeld

Subject: Re: [PATCH v4 21/24] crypto: x86 - report used CPU features via module parameters

On Tue, Nov 15, 2022 at 10:13:39PM -0600, Robert Elliott wrote:
> For modules that have multiple choices, add read-only module parameters
> reporting which CPU features a module is using.
>
> The parameters show up as follows for modules that modify the behavior
> of their registered drivers or register additional drivers for
> each choice:
> /sys/module/aesni_intel/parameters/using_x86_avx:1
> /sys/module/aesni_intel/parameters/using_x86_avx2:1
> /sys/module/aria_aesni_avx_x86_64/parameters/using_x86_gfni:0
> /sys/module/chacha_x86_64/parameters/using_x86_avx2:1
> /sys/module/chacha_x86_64/parameters/using_x86_avx512:1
> /sys/module/crc32c_intel/parameters/using_x86_pclmulqdq:1
> /sys/module/curve25519_x86_64/parameters/using_x86_adx:1
> /sys/module/libblake2s_x86_64/parameters/using_x86_avx512:1
> /sys/module/libblake2s_x86_64/parameters/using_x86_ssse3:1
> /sys/module/poly1305_x86_64/parameters/using_x86_avx:1
> /sys/module/poly1305_x86_64/parameters/using_x86_avx2:1
> /sys/module/poly1305_x86_64/parameters/using_x86_avx512:0
> /sys/module/sha1_ssse3/parameters/using_x86_avx:1
> /sys/module/sha1_ssse3/parameters/using_x86_avx2:1
> /sys/module/sha1_ssse3/parameters/using_x86_shani:0
> /sys/module/sha1_ssse3/parameters/using_x86_ssse3:1
> /sys/module/sha256_ssse3/parameters/using_x86_avx:1
> /sys/module/sha256_ssse3/parameters/using_x86_avx2:1
> /sys/module/sha256_ssse3/parameters/using_x86_shani:0
> /sys/module/sha256_ssse3/parameters/using_x86_ssse3:1
> /sys/module/sha512_ssse3/parameters/using_x86_avx:1
> /sys/module/sha512_ssse3/parameters/using_x86_avx2:1
> /sys/module/sha512_ssse3/parameters/using_x86_ssse3:1

Isn't chacha missing?

However, what's the point of any of this? Who benefits from this info?
If something seems slow, I'll generally look at perf top, which provides
this same thing.

Also, "using" isn't quite correct. Some AVX2 machines will never use any
ssse3 instructions, despite the code being executable.

>
> Delete the aesni_intel prints reporting those selections:
> pr_info("AVX2 version of gcm_enc/dec engaged.\n");

This part I like.

> +module_param_named(using_x86_adx, curve25519_use_bmi2_adx.key.enabled.counter, int, 0444);
> +MODULE_PARM_DESC(using_x86_adx, "Using x86 instruction set extensions: ADX");

And BMI2, not just ADX.
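
If the parameter is kept at all, one way to reflect that feedback is to
name and describe it after both features the static key gates -- a sketch
only, with a hypothetical parameter name (the member access is copied from
the quoted patch):

/* The static key is enabled only when both BMI2 and ADX are present. */
module_param_named(using_x86_bmi2_adx,
		   curve25519_use_bmi2_adx.key.enabled.counter, int, 0444);
MODULE_PARM_DESC(using_x86_bmi2_adx,
		 "Using x86 instruction set extensions: BMI2, ADX");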

2022-11-16 11:51:44

by Jason A. Donenfeld

Subject: Re: [PATCH v4 20/24] crypto: x86/ciphers - load based on CPU features

On Tue, Nov 15, 2022 at 10:13:38PM -0600, Robert Elliott wrote:
> diff --git a/arch/x86/crypto/curve25519-x86_64.c b/arch/x86/crypto/curve25519-x86_64.c
> index d55fa9e9b9e6..ae7536b17bf9 100644
> --- a/arch/x86/crypto/curve25519-x86_64.c
> +++ b/arch/x86/crypto/curve25519-x86_64.c
> @@ -12,7 +12,7 @@
> #include <linux/kernel.h>
> #include <linux/module.h>
> #include <linux/scatterlist.h>
> -
> +#include <asm/cpu_device_id.h>
> #include <asm/cpufeature.h>
> #include <asm/processor.h>
>
> @@ -1697,13 +1697,22 @@ static struct kpp_alg curve25519_alg = {
> .max_size = curve25519_max_size,
> };
>
> +static const struct x86_cpu_id module_cpu_ids[] = {
> + X86_MATCH_FEATURE(X86_FEATURE_ADX, NULL),
> + {}
> +};
> +MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);
>
> static int __init curve25519_mod_init(void)
> {
> - if (boot_cpu_has(X86_FEATURE_BMI2) && boot_cpu_has(X86_FEATURE_ADX))
> - static_branch_enable(&curve25519_use_bmi2_adx);
> - else
> - return 0;
> + if (!x86_match_cpu(module_cpu_ids))
> + return -ENODEV;
> +
> + if (!boot_cpu_has(X86_FEATURE_BMI2))
> + return -ENODEV;
> +
> + static_branch_enable(&curve25519_use_bmi2_adx);

Can the user still insmod this? If so, you can't remove the ADX check.
Ditto for rest of patch.

2022-11-17 04:01:24

by Herbert Xu

Subject: Re: [PATCH v4 00/24] crypto: fix RCU stalls

On Tue, Nov 15, 2022 at 10:13:18PM -0600, Robert Elliott wrote:
> This series fixes the RCU stalls triggered by the x86 crypto
> modules discussed in
> https://lore.kernel.org/all/MW5PR84MB18426EBBA3303770A8BC0BDFAB759@MW5PR84MB1842.NAMPRD84.PROD.OUTLOOK.COM/
>
> Two root causes were:
> - too much data processed between kernel_fpu_begin and
> kernel_fpu_end calls (which are heavily used by the x86
> optimized drivers)
> - tcrypt not calling cond_resched during speed test loops
>
> These problems have always been lurking, but improving the
> loading of the x86/sha512 module led to it happening a lot
> during boot when using SHA-512 for module signature checking.

Can we split this series up please? The fixes to the stalls should
stand separately from the changes to how modules are loaded. The
latter is more of an improvement while the former should be applied
ASAP.

Thanks,
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

Subject: RE: [PATCH v4 00/24] crypto: fix RCU stalls



> -----Original Message-----
> From: Herbert Xu <[email protected]>
> Sent: Wednesday, November 16, 2022 9:59 PM
> Subject: Re: [PATCH v4 00/24] crypto: fix RCU stalls
>
> On Tue, Nov 15, 2022 at 10:13:18PM -0600, Robert Elliott wrote:
...
> > These problems have always been lurking, but improving the
> > loading of the x86/sha512 module led to it happening a lot
> > during boot when using SHA-512 for module signature checking.
>
> Can we split this series up please? The fixes to the stalls should
> stand separately from the changes to how modules are loaded. The
> latter is more of an improvement while the former should be applied
> ASAP.

Yes. With the v4 patch numbers:
[PATCH v4 01/24] crypto: tcrypt - test crc32
[PATCH v4 02/24] crypto: tcrypt - test nhpoly1305

Those ensure the changes to those hash modules are testable.

[PATCH v4 03/24] crypto: tcrypt - reschedule during cycles speed

That's only for tcrypt so not urgent for users, but pretty
simple.

[PATCH v4 04/24] crypto: x86/sha - limit FPU preemption
[PATCH v4 05/24] crypto: x86/crc - limit FPU preemption
[PATCH v4 06/24] crypto: x86/sm3 - limit FPU preemption
[PATCH v4 07/24] crypto: x86/ghash - use u8 rather than char
[PATCH v4 08/24] crypto: x86/ghash - restructure FPU context saving
[PATCH v4 09/24] crypto: x86/ghash - limit FPU preemption
[PATCH v4 10/24] crypto: x86/poly - limit FPU preemption
[PATCH v4 11/24] crypto: x86/aegis - limit FPU preemption
[PATCH v4 12/24] crypto: x86/sha - register all variations
[PATCH v4 13/24] crypto: x86/sha - minimize time in FPU context

That's the end of the fixes set.

[PATCH v4 14/24] crypto: x86/sha - load based on CPU features
[PATCH v4 15/24] crypto: x86/crc - load based on CPU features
[PATCH v4 16/24] crypto: x86/sm3 - load based on CPU features
[PATCH v4 17/24] crypto: x86/poly - load based on CPU features
[PATCH v4 18/24] crypto: x86/ghash - load based on CPU features
[PATCH v4 19/24] crypto: x86/aesni - avoid type conversions
[PATCH v4 20/24] crypto: x86/ciphers - load based on CPU features
[PATCH v4 21/24] crypto: x86 - report used CPU features via module
[PATCH v4 22/24] crypto: x86 - report missing CPU features via module
[PATCH v4 23/24] crypto: x86 - report suboptimal CPUs via module
[PATCH v4 24/24] crypto: x86 - standardize module descriptions

I'll put those in a new series.

For 6.1, I still suggest reverting aa031b8f702e ("crypto: x86/sha512 -
load based on CPU features") since that exposed the problem. Target
the fixes for 6.2 and module loading for 6.2 or 6.3.
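
The "limit FPU preemption" patches in the fixes set above share one basic
pattern: cap the number of bytes processed per kernel_fpu_begin()/
kernel_fpu_end() pair so that preemption (and RCU) get a chance to run
between chunks. A rough sketch of that pattern, assuming the 4 KiB limit
some of the drivers already use; the helper name and context pointer are
placeholders, and alignment to the algorithm's block size is omitted:

#include <linux/minmax.h>
#include <linux/types.h>
#include <asm/fpu/api.h>	/* kernel_fpu_begin(), kernel_fpu_end() */

#define FPU_BYTES	4096U	/* max bytes per FPU context, matching the SZ_4K users */

/* example_transform() stands in for a driver's assembly routine. */
void example_transform(void *ctx, const u8 *data, unsigned int len);

static void example_update(void *ctx, const u8 *data, unsigned int len)
{
	while (len) {
		unsigned int chunk = min(len, FPU_BYTES);

		kernel_fpu_begin();
		example_transform(ctx, data, chunk);
		kernel_fpu_end();	/* preemption point between chunks */

		data += chunk;
		len -= chunk;
	}
}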


2022-11-17 15:25:45

by Jason A. Donenfeld

Subject: Re: [PATCH v4 00/24] crypto: fix RCU stalls

On Thu, Nov 17, 2022 at 4:14 PM Elliott, Robert (Servers)
<[email protected]> wrote:
> ...
> I'll put those in a new series.

Thanks. Please take into account my review feedback this time for your
next series.

Jason