Subject: [PATCH 0/8] crypto: kernel-doc for assembly language

Clean up the existing kernel-doc headers in the crypto subsystem,
then add support for kernel-doc headers in assembly language files
for functions called from C code.

This provides a place to document the assumptions that the
assembly language functions make about their arguments (e.g., how
they handle length values that are zero, smaller than some
minimum, or not a multiple of some block size).

Not all of the assembly language files are tackled yet - just some
of the x86 files that have pending changes related to
kernel_fpu_begin/end.

Example man page formatted output for one of them:
---
$ nroff -man /tmp/man/sha1_transform_avx2.9
sha1_transform_avx2(9) Kernel Hacker's Manual sha1_transform_avx2(9)



NAME
sha1_transform_avx2 - Calculate SHA1 hash using the x86 AVX2 feature
set

SYNOPSIS
void sha1_transform_avx2 (u32 *digest , const u8 *data , int blocks );

ARGUMENTS
digest address of current 20-byte hash value (rdi, CTX macro)

data address of data (rsi, BUF macro); data size must be a mul‐
tiple of 64 bytes

blocks number of 64-byte blocks (rdx, CNT macro)

DESCRIPTION
This function supports 64-bit CPUs.

RETURN
none

PROTOTYPE
asmlinkage void sha1_transform_avx2(u32 *digest, const u8 *data, int
blocks)



December 2022 sha1_transform_avx2 sha1_transform_avx2(9)



Robert Elliott (8):
crypto: clean up kernel-doc headers
doc: support kernel-doc for asm functions
crypto: x86/sha - add kernel-doc comments to assembly
crypto: x86/crc - add kernel-doc comments to assembly
crypto: x86/sm3 - add kernel-doc comments to assembly
crypto: x86/ghash - add kernel-doc comments to assembly
crypto: x86/blake2s - add kernel-doc comments to assembly
crypto: x86/chacha - add kernel-doc comments to assembly

.../mips/cavium-octeon/crypto/octeon-crypto.c | 19 ++--
arch/x86/crypto/blake2s-core.S | 26 +++++
arch/x86/crypto/chacha-avx2-x86_64.S | 90 ++++++++++++------
arch/x86/crypto/chacha-avx512vl-x86_64.S | 94 ++++++++++++-------
arch/x86/crypto/chacha-ssse3-x86_64.S | 75 ++++++++++-----
arch/x86/crypto/crc32-pclmul_asm.S | 24 ++---
arch/x86/crypto/crc32c-pcl-intel-asm_64.S | 29 +++---
arch/x86/crypto/crct10dif-pcl-asm_64.S | 24 +++--
arch/x86/crypto/ghash-clmulni-intel_asm.S | 27 +++++-
arch/x86/crypto/sha1_avx2_x86_64_asm.S | 32 +++----
arch/x86/crypto/sha1_ni_asm.S | 22 +++--
arch/x86/crypto/sha1_ssse3_asm.S | 33 ++++---
arch/x86/crypto/sha256-avx-asm.S | 24 +++--
arch/x86/crypto/sha256-avx2-asm.S | 25 +++--
arch/x86/crypto/sha256-ssse3-asm.S | 26 ++---
arch/x86/crypto/sha256_ni_asm.S | 25 ++---
arch/x86/crypto/sha512-avx-asm.S | 33 +++----
arch/x86/crypto/sha512-avx2-asm.S | 34 +++----
arch/x86/crypto/sha512-ssse3-asm.S | 36 ++++---
arch/x86/crypto/sm3-avx-asm_64.S | 18 ++--
crypto/asymmetric_keys/verify_pefile.c | 2 +-
crypto/async_tx/async_pq.c | 11 +--
crypto/async_tx/async_tx.c | 4 +-
crypto/crypto_engine.c | 2 +-
include/crypto/acompress.h | 2 +-
include/crypto/des.h | 4 +-
include/crypto/if_alg.h | 26 ++---
include/crypto/internal/ecc.h | 8 +-
include/crypto/internal/rsa.h | 2 +-
include/crypto/kdf_sp800108.h | 39 ++++----
scripts/kernel-doc | 48 +++++++++-
31 files changed, 545 insertions(+), 319 deletions(-)

--
2.38.1


Subject: [PATCH 4/8] crypto: x86/crc - add kernel-doc comments to assembly

Add kernel-doc comments for assembly language functions exported to
C glue code.

Signed-off-by: Robert Elliott <[email protected]>
---
arch/x86/crypto/crc32-pclmul_asm.S | 24 ++++++++++---------
arch/x86/crypto/crc32c-pcl-intel-asm_64.S | 29 ++++++++++++++---------
arch/x86/crypto/crct10dif-pcl-asm_64.S | 24 +++++++++++++------
3 files changed, 48 insertions(+), 29 deletions(-)

diff --git a/arch/x86/crypto/crc32-pclmul_asm.S b/arch/x86/crypto/crc32-pclmul_asm.S
index ca53e96996ac..f704b2067a80 100644
--- a/arch/x86/crypto/crc32-pclmul_asm.S
+++ b/arch/x86/crypto/crc32-pclmul_asm.S
@@ -17,7 +17,6 @@

#include <linux/linkage.h>

-
.section .rodata
.align 16
/*
@@ -67,19 +66,22 @@
#define CRC %ecx
#endif

-
-
.text
/**
- * Calculate crc32
- * BUF - buffer (16 bytes aligned)
- * LEN - sizeof buffer (16 bytes aligned), LEN should be grater than 63
- * CRC - initial crc32
- * return %eax crc32
- * uint crc32_pclmul_le_16(unsigned char const *buffer,
- * size_t len, uint crc32)
+ * crc32_pclmul_le_16 - Calculate CRC32 using x86 PCLMULQDQ instructions
+ * @buffer: address of data (32-bit %eax/64-bit %rdi, BUF macro);
+ * must be aligned to a multiple of 16
+ * @len: data size (32-bit %edx/64 bit %rsi, LEN macro);
+ * must be a multiple of 16 and greater than 63
+ * @crc32: initial CRC32 value (32-bit %ecx/64-bit %edx, CRC macro);
+ * only uses lower 32 bits
+ *
+ * This function supports both 32-bit and 64-bit CPUs.
+ * It requires the data to be 16-byte aligned and at least 64 bytes long.
+ *
+ * Return: (32-bit %eax/64-bit %rax) CRC32 value (in lower 32 bits)
+ * Prototype: asmlinkage u32 crc32_pclmul_le_16(const u8 *buffer, size_t len, u32 crc32);
*/
-
SYM_FUNC_START(crc32_pclmul_le_16) /* buffer and buffer size are 16 bytes aligned */
movdqa (BUF), %xmm1
movdqa 0x10(BUF), %xmm2
diff --git a/arch/x86/crypto/crc32c-pcl-intel-asm_64.S b/arch/x86/crypto/crc32c-pcl-intel-asm_64.S
index ec35915f0901..3d646011d84b 100644
--- a/arch/x86/crypto/crc32c-pcl-intel-asm_64.S
+++ b/arch/x86/crypto/crc32c-pcl-intel-asm_64.S
@@ -70,22 +70,30 @@
.error "SMALL_ SIZE must be < 256"
.endif

-# unsigned int crc_pcl(u8 *buffer, int len, unsigned int crc_init);
-
.text
+/**
+ * crc_pcl - Calculate CRC32C using x86 CRC32 and PCLMULQDQ instructions
+ * @buffer: address of data (%rdi, bufp macro)
+ * @len: data size (%rsi, len macro)
+ * @crc_init: initial CRC32C value (%rdx, crc_init_arg macro);
+ * only uses lower 32 bits
+ *
+ * This function supports 64-bit CPUs.
+ * It loops on 8-byte aligned QWORDs, but also supports unaligned
+ * addresses and all length values.
+ *
+ * Return: CRC32C value (upper 32 bits zero) (%rax)
+ * Prototype: asmlinkage unsigned int crc_pcl(const u8 *buffer, unsigned int len,
+ *            unsigned int crc_init);
+ */
SYM_FUNC_START(crc_pcl)
#define bufp rdi
-#define bufp_dw %edi
-#define bufp_w %di
-#define bufp_b %dil
#define bufptmp %rcx
#define block_0 %rcx
#define block_1 %rdx
#define block_2 %r11
#define len %rsi
#define len_dw %esi
-#define len_w %si
-#define len_b %sil
#define crc_init_arg %rdx
#define tmp %rbx
#define crc_init %r8
@@ -97,7 +105,7 @@ SYM_FUNC_START(crc_pcl)
pushq %rdi
pushq %rsi

- ## Move crc_init for Linux to a different
+ ## Move crc_init for Linux to a different register
mov crc_init_arg, crc_init

################################################################
@@ -216,7 +224,7 @@ LABEL crc_ %i
## 4) Combine three results:
################################################################

- lea (K_table-8)(%rip), %bufp # first entry is for idx 1
+ lea (K_table-8)(%rip), %bufp # first entry is for idx 1
shlq $3, %rax # rax *= 8
pmovzxdq (%bufp,%rax), %xmm0 # 2 consts: K1:K2
leal (%eax,%eax,2), %eax # rax *= 3 (total *24)
@@ -326,10 +334,9 @@ JMPTBL_ENTRY %i
i=i+1
.endr

-
################################################################
## PCLMULQDQ tables
- ## Table is 128 entries x 2 words (8 bytes) each
+ ## Table is 128 entries x 8 bytes each
################################################################
.align 8
K_table:
diff --git a/arch/x86/crypto/crct10dif-pcl-asm_64.S b/arch/x86/crypto/crct10dif-pcl-asm_64.S
index 721474abfb71..88af572703b2 100644
--- a/arch/x86/crypto/crct10dif-pcl-asm_64.S
+++ b/arch/x86/crypto/crct10dif-pcl-asm_64.S
@@ -52,8 +52,6 @@

#include <linux/linkage.h>

-.text
-
#define init_crc %edi
#define buf %rsi
#define len %rdx
@@ -89,11 +87,23 @@
xorps \src_reg, \dst_reg
.endm

-#
-# u16 crc_t10dif_pcl(u16 init_crc, const *u8 buf, size_t len);
-#
-# Assumes len >= 16.
-#
+.text
+/**
+ * crc_t10dif_pcl - Calculate CRC16 per T10 DIF (data integrity format)
+ * using x86 PCLMULQDQ instructions
+ * @init_crc: initial CRC16 value (%rdi, init_crc macro);
+ * only uses lower 16 bits
+ * @buf: address of data (%rsi, buf macro);
+ * data buffer must be at least 16 bytes
+ * @len: data size (%rdx, len macro);
+ * must be >= 16
+ *
+ * This function supports 64-bit CPUs.
+ * It allows data to be at any offset.
+ *
+ * Return: (%rax) CRC16 value (upper 48 bits zero)
+ * Prototype: asmlinkage u16 crc_t10dif_pcl(u16 init_crc, const u8 *buf, size_t len);
+ */
.align 16
SYM_FUNC_START(crc_t10dif_pcl)

--
2.38.1

Subject: [PATCH 5/8] crypto: x86/sm3 - add kernel-doc comments to assembly

Add kernel-doc comments for assembly language functions exported to
C glue code.

Signed-off-by: Robert Elliott <[email protected]>
---
arch/x86/crypto/sm3-avx-asm_64.S | 18 +++++++++---------
1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/arch/x86/crypto/sm3-avx-asm_64.S b/arch/x86/crypto/sm3-avx-asm_64.S
index b12b9efb5ec5..d02ebe5e0bb5 100644
--- a/arch/x86/crypto/sm3-avx-asm_64.S
+++ b/arch/x86/crypto/sm3-avx-asm_64.S
@@ -321,19 +321,19 @@

.text

-/*
- * Transform nblocks*64 bytes (nblocks*16 32-bit words) at DATA.
+/**
+ * sm3_transform_avx - Calculate SM3 hash using x86 AVX feature set
+ * @state: address of 32-byte context (%rdi, RSTATE macro)
+ * @data: address of data (%rsi, RDATA macro);
+ * must be at least 64 bytes and a multiple of 64 bytes
+ * @nblocks: number of 64-byte blocks (%rdx, RNBLKS macro);
+ * must be >= 1
*
- * void sm3_transform_avx(struct sm3_state *state,
- * const u8 *data, int nblocks);
+ * Return: none. However, the @state buffer is updated.
+ * Prototype: asmlinkage void sm3_transform_avx(u32 *state, const u8 *data, int nblocks);
*/
.align 16
SYM_FUNC_START(sm3_transform_avx)
- /* input:
- * %rdi: ctx, CTX
- * %rsi: data (64*nblks bytes)
- * %rdx: nblocks
- */
vzeroupper;

pushq %rbp;
--
2.38.1

Subject: [PATCH 1/8] crypto: clean up kernel-doc headers

Fix these problems in the kernel-doc function header comments, as
reported by running:
scripts/kernel-doc -man \
$(git grep -l '/\*\*' -- :^Documentation :^tools) \
| scripts/split-man.pl /tmp/man 2> err.log
cat err.log | grep crypto | grep -v drivers

arch/mips/cavium-octeon/crypto/octeon-crypto.c:18: warning: This comment starts with '/**', but isn't a kernel-doc comment.
arch/mips/cavium-octeon/crypto/octeon-crypto.c:50: warning: This comment starts with '/**', but isn't a kernel-doc comment.
arch/mips/cavium-octeon/crypto/octeon-crypto.c:60: warning: Function parameter or member 'crypto_flags' not described in 'octeon_crypto_disable'
arch/mips/cavium-octeon/crypto/octeon-crypto.c:60: warning: Excess function parameter 'flags' description in 'octeon_crypto_disable'

crypto/asymmetric_keys/verify_pefile.c:420: warning: Function parameter or member 'trusted_keys' not described in 'verify_pefile_signature'
crypto/asymmetric_keys/verify_pefile.c:420: warning: Excess function parameter 'trust_keys' description in 'verify_pefile_signature'

crypto/async_tx/async_pq.c:19: warning: cannot understand function prototype: 'struct page *pq_scribble_page; '
crypto/async_tx/async_pq.c:40: warning: Function parameter or member 'chan' not described in 'do_async_gen_syndrome'
crypto/async_tx/async_pq.c:40: warning: Function parameter or member 'scfs' not described in 'do_async_gen_syndrome'
crypto/async_tx/async_pq.c:40: warning: Function parameter or member 'disks' not described in 'do_async_gen_syndrome'
crypto/async_tx/async_pq.c:40: warning: Function parameter or member 'unmap' not described in 'do_async_gen_syndrome'
crypto/async_tx/async_pq.c:40: warning: Function parameter or member 'dma_flags' not described in 'do_async_gen_syndrome'
crypto/async_tx/async_pq.c:40: warning: Function parameter or member 'submit' not described in 'do_async_gen_syndrome'
crypto/async_tx/async_pq.c:109: warning: Function parameter or member 'blocks' not described in 'do_sync_gen_syndrome'
crypto/async_tx/async_pq.c:109: warning: Function parameter or member 'offsets' not described in 'do_sync_gen_syndrome'
crypto/async_tx/async_pq.c:109: warning: Function parameter or member 'disks' not described in 'do_sync_gen_syndrome'
crypto/async_tx/async_pq.c:109: warning: Function parameter or member 'len' not described in 'do_sync_gen_syndrome'
crypto/async_tx/async_pq.c:109: warning: Function parameter or member 'submit' not described in 'do_sync_gen_syndrome'
crypto/async_tx/async_pq.c:302: warning: Excess function parameter 'offset' description in 'async_syndrome_val'

crypto/async_tx/async_tx.c:137: warning: cannot understand function prototype: 'enum submit_disposition '
crypto/async_tx/async_tx.c:265: warning: Function parameter or member 'tx' not described in 'async_tx_quiesce'

crypto/crypto_engine.c:514: warning: Excess function parameter 'engine' description in 'crypto_engine_alloc_init_and_set'

include/crypto/acompress.h:219: warning: Function parameter or member 'cmpl' not described in 'acomp_request_set_callback'
include/crypto/acompress.h:219: warning: Excess function parameter 'cmlp' description in 'acomp_request_set_callback'

include/crypto/des.h:43: warning: Function parameter or member 'keylen' not described in 'des_expand_key'
include/crypto/des.h:43: warning: Excess function parameter 'len' description in 'des_expand_key'
include/crypto/des.h:56: warning: Function parameter or member 'keylen' not described in 'des3_ede_expand_key'
include/crypto/des.h:56: warning: Excess function parameter 'len' description in 'des3_ede_expand_key'

include/crypto/if_alg.h:160: warning: Function parameter or member 'wait' not described in 'af_alg_ctx'
include/crypto/if_alg.h:178: warning: This comment starts with '/**', but isn't a kernel-doc comment.
include/crypto/if_alg.h:193: warning: This comment starts with '/**', but isn't a kernel-doc comment.
include/crypto/if_alg.h:204: warning: This comment starts with '/**', but isn't a kernel-doc comment.
include/crypto/if_alg.h:219: warning: This comment starts with '/**', but isn't a kernel-doc comment.
include/crypto/if_alg.h:184: warning: Function parameter or member 'sk' not described in 'af_alg_sndbuf'
include/crypto/if_alg.h:199: warning: Function parameter or member 'sk' not described in 'af_alg_writable'
include/crypto/if_alg.h:210: warning: Function parameter or member 'sk' not described in 'af_alg_rcvbuf'
include/crypto/if_alg.h:225: warning: Function parameter or member 'sk' not described in 'af_alg_readable'

include/crypto/internal/ecc.h:85: warning: Function parameter or member 'privkey' not described in 'ecc_gen_privkey'
include/crypto/internal/ecc.h:85: warning: Excess function parameter 'private_key' description in 'ecc_gen_privkey'
include/crypto/internal/ecc.h:184: warning: Function parameter or member 'right' not described in 'vli_sub'
include/crypto/internal/ecc.h:246: warning: expecting prototype for ecc_aloc_point(). Prototype was for ecc_alloc_point() instead
include/crypto/internal/ecc.h:262: warning: Function parameter or member 'point' not described in 'ecc_point_is_zero'
include/crypto/internal/ecc.h:262: warning: Excess function parameter 'p' description in 'ecc_point_is_zero'

include/crypto/internal/rsa.h:32: warning: cannot understand function prototype: 'struct rsa_key '

include/crypto/kdf_sp800108.h:34: warning: Function parameter or member 'kmd' not described in 'crypto_kdf108_ctr_generate'
include/crypto/kdf_sp800108.h:34: warning: Function parameter or member 'info' not described in 'crypto_kdf108_ctr_generate'
include/crypto/kdf_sp800108.h:34: warning: Function parameter or member 'info_nvec' not described in 'crypto_kdf108_ctr_generate'
include/crypto/kdf_sp800108.h:34: warning: Function parameter or member 'dst' not described in 'crypto_kdf108_ctr_generate'
include/crypto/kdf_sp800108.h:34: warning: Function parameter or member 'dlen' not described in 'crypto_kdf108_ctr_generate'
include/crypto/kdf_sp800108.h:34: warning: expecting prototype for Counter KDF generate operation according to SP800(). Prototype was for crypto_kdf108_ctr_generate() instead
include/crypto/kdf_sp800108.h:37: warning: This comment starts with '/**', but isn't a kernel-doc comment.
include/crypto/kdf_sp800108.h:37: warning: Function parameter or member 'info_nvec' not described in 'crypto_kdf108_ctr_generate'
include/crypto/kdf_sp800108.h:37: warning: Excess function parameter 'info_vec' description in 'crypto_kdf108_ctr_generate'

Signed-off-by: Robert Elliott <[email protected]>
---
.../mips/cavium-octeon/crypto/octeon-crypto.c | 19 ++++-----
crypto/asymmetric_keys/verify_pefile.c | 2 +-
crypto/async_tx/async_pq.c | 11 +++---
crypto/async_tx/async_tx.c | 4 +-
crypto/crypto_engine.c | 2 +-
include/crypto/acompress.h | 2 +-
include/crypto/des.h | 4 +-
include/crypto/if_alg.h | 26 ++++++-------
include/crypto/internal/ecc.h | 8 ++--
include/crypto/internal/rsa.h | 2 +-
include/crypto/kdf_sp800108.h | 39 +++++++++++--------
11 files changed, 62 insertions(+), 57 deletions(-)

diff --git a/arch/mips/cavium-octeon/crypto/octeon-crypto.c b/arch/mips/cavium-octeon/crypto/octeon-crypto.c
index cfb4a146cf17..c4badbf756b5 100644
--- a/arch/mips/cavium-octeon/crypto/octeon-crypto.c
+++ b/arch/mips/cavium-octeon/crypto/octeon-crypto.c
@@ -14,10 +14,11 @@
#include "octeon-crypto.h"

/**
- * Enable access to Octeon's COP2 crypto hardware for kernel use. Wrap any
- * crypto operations in calls to octeon_crypto_enable/disable in order to make
- * sure the state of COP2 isn't corrupted if userspace is also performing
- * hardware crypto operations. Allocate the state parameter on the stack.
+ * octeon_crypto_enable - Enable access to Octeon's COP2 crypto hardware for kernel use.
+ * Wrap any crypto operations in calls to octeon_crypto_enable/disable
+ * in order to make sure the state of COP2 isn't corrupted if userspace
+ * is also performing hardware crypto operations.
+ * Allocate the state parameter on the stack.
* Returns with preemption disabled.
*
* @state: Pointer to state structure to store current COP2 state in.
@@ -46,12 +47,12 @@ unsigned long octeon_crypto_enable(struct octeon_cop2_state *state)
EXPORT_SYMBOL_GPL(octeon_crypto_enable);

/**
- * Disable access to Octeon's COP2 crypto hardware in the kernel. This must be
- * called after an octeon_crypto_enable() before any context switch or return to
- * userspace.
+ * octeon_crypto_disable - Disable access to Octeon's COP2 crypto hardware in the kernel.
+ * This must be called after an octeon_crypto_enable() before any
+ * context switch or return to userspace.
*
- * @state: Pointer to COP2 state to restore
- * @flags: Return value from octeon_crypto_enable()
+ * @state: Pointer to COP2 state to restore
+ * @crypto_flags: Return value from octeon_crypto_enable()
*/
void octeon_crypto_disable(struct octeon_cop2_state *state,
unsigned long crypto_flags)
diff --git a/crypto/asymmetric_keys/verify_pefile.c b/crypto/asymmetric_keys/verify_pefile.c
index 7553ab18db89..148cad70fe79 100644
--- a/crypto/asymmetric_keys/verify_pefile.c
+++ b/crypto/asymmetric_keys/verify_pefile.c
@@ -387,7 +387,7 @@ static int pefile_digest_pe(const void *pebuf, unsigned int pelen,
* verify_pefile_signature - Verify the signature on a PE binary image
* @pebuf: Buffer containing the PE binary image
* @pelen: Length of the binary image
- * @trust_keys: Signing certificate(s) to use as starting points
+ * @trusted_keys: Signing certificate(s) to use as starting points
* @usage: The use to which the key is being put.
*
* Validate that the certificate chain inside the PKCS#7 message inside the PE
diff --git a/crypto/async_tx/async_pq.c b/crypto/async_tx/async_pq.c
index f9cdc5e91664..c95908d70f7e 100644
--- a/crypto/async_tx/async_pq.c
+++ b/crypto/async_tx/async_pq.c
@@ -11,9 +11,8 @@
#include <linux/async_tx.h>
#include <linux/gfp.h>

-/**
- * pq_scribble_page - space to hold throwaway P or Q buffer for
- * synchronous gen_syndrome
+/*
+ * space to hold throwaway P or Q buffer for synchronous gen_syndrome
*/
static struct page *pq_scribble_page;

@@ -28,7 +27,7 @@ static struct page *pq_scribble_page;

#define MAX_DISKS 255

-/**
+/*
* do_async_gen_syndrome - asynchronously calculate P and/or Q
*/
static __async_inline struct dma_async_tx_descriptor *
@@ -100,7 +99,7 @@ do_async_gen_syndrome(struct dma_chan *chan,
return tx;
}

-/**
+/*
* do_sync_gen_syndrome - synchronously calculate a raid6 syndrome
*/
static void
@@ -281,7 +280,7 @@ pq_val_chan(struct async_submit_ctl *submit, struct page **blocks, int disks, si
/**
* async_syndrome_val - asynchronously validate a raid6 syndrome
* @blocks: source blocks from idx 0..disks-3, P @ disks-2 and Q @ disks-1
- * @offset: common offset into each block (src and dest) to start transaction
+ * @offsets: common offset into each block (src and dest) to start transaction
* @disks: number of blocks (including missing P or Q, see below)
* @len: length of operation in bytes
* @pqres: on val failure SUM_CHECK_P_RESULT and/or SUM_CHECK_Q_RESULT are set
diff --git a/crypto/async_tx/async_tx.c b/crypto/async_tx/async_tx.c
index 9256934312d7..ad72057a5e0d 100644
--- a/crypto/async_tx/async_tx.c
+++ b/crypto/async_tx/async_tx.c
@@ -124,7 +124,7 @@ async_tx_channel_switch(struct dma_async_tx_descriptor *depend_tx,


/**
- * submit_disposition - flags for routing an incoming operation
+ * enum submit_disposition - flags for routing an incoming operation
* @ASYNC_TX_SUBMITTED: we were able to append the new operation under the lock
* @ASYNC_TX_CHANNEL_SWITCH: when the lock is dropped schedule a channel switch
* @ASYNC_TX_DIRECT_SUBMIT: when the lock is dropped submit directly
@@ -258,7 +258,7 @@ EXPORT_SYMBOL_GPL(async_trigger_callback);

/**
* async_tx_quiesce - ensure tx is complete and freeable upon return
- * @tx - transaction to quiesce
+ * @tx: transaction to quiesce
*/
void async_tx_quiesce(struct dma_async_tx_descriptor **tx)
{
diff --git a/crypto/crypto_engine.c b/crypto/crypto_engine.c
index bb8e77077f02..64dc9aa3ca24 100644
--- a/crypto/crypto_engine.c
+++ b/crypto/crypto_engine.c
@@ -499,7 +499,7 @@ EXPORT_SYMBOL_GPL(crypto_engine_stop);
* This has the form:
* callback(struct crypto_engine *engine)
* where:
- * @engine: the crypto engine structure.
+ * engine: the crypto engine structure.
* @rt: whether this queue is set to run as a realtime task
* @qlen: maximum size of the crypto-engine queue
*
diff --git a/include/crypto/acompress.h b/include/crypto/acompress.h
index cb3d6b1c655d..8d50c19e0f8e 100644
--- a/include/crypto/acompress.h
+++ b/include/crypto/acompress.h
@@ -208,7 +208,7 @@ void acomp_request_free(struct acomp_req *req);
*
* @req: request that the callback will be set for
* @flgs: specify for instance if the operation may backlog
- * @cmlp: callback which will be called
+ * @cmpl: callback which will be called
* @data: private data used by the caller
*/
static inline void acomp_request_set_callback(struct acomp_req *req,
diff --git a/include/crypto/des.h b/include/crypto/des.h
index 7812b4331ae4..2fcc72988843 100644
--- a/include/crypto/des.h
+++ b/include/crypto/des.h
@@ -34,7 +34,7 @@ void des3_ede_decrypt(const struct des3_ede_ctx *dctx, u8 *dst, const u8 *src);
* des_expand_key - Expand a DES input key into a key schedule
* @ctx: the key schedule
* @key: buffer containing the input key
- * @len: size of the buffer contents
+ * @keylen: size of the buffer contents
*
* Returns 0 on success, -EINVAL if the input key is rejected and -ENOKEY if
* the key is accepted but has been found to be weak.
@@ -45,7 +45,7 @@ int des_expand_key(struct des_ctx *ctx, const u8 *key, unsigned int keylen);
* des3_ede_expand_key - Expand a triple DES input key into a key schedule
* @ctx: the key schedule
* @key: buffer containing the input key
- * @len: size of the buffer contents
+ * @keylen: size of the buffer contents
*
* Returns 0 on success, -EINVAL if the input key is rejected and -ENOKEY if
* the key is accepted but has been found to be weak. Note that weak keys will
diff --git a/include/crypto/if_alg.h b/include/crypto/if_alg.h
index a5db86670bdf..da66314d9bc7 100644
--- a/include/crypto/if_alg.h
+++ b/include/crypto/if_alg.h
@@ -124,7 +124,7 @@ struct af_alg_async_req {
* @tsgl_list: Link to TX SGL
* @iv: IV for cipher operation
* @aead_assoclen: Length of AAD for AEAD cipher operations
- * @completion: Work queue for synchronous operation
+ * @wait: helper structure for async operation
* @used: TX bytes sent to kernel. This variable is used to
* ensure that user space cannot cause the kernel
* to allocate too much memory in sendmsg operation.
@@ -174,10 +174,10 @@ static inline struct alg_sock *alg_sk(struct sock *sk)
}

/**
- * Size of available buffer for sending data from user space to kernel.
+ * af_alg_sndbuf - Size of available buffer for sending data from user space to kernel.
*
- * @sk socket of connection to user space
- * @return number of bytes still available
+ * @sk: socket of connection to user space
+ * @return: number of bytes still available
*/
static inline int af_alg_sndbuf(struct sock *sk)
{
@@ -189,10 +189,10 @@ static inline int af_alg_sndbuf(struct sock *sk)
}

/**
- * Can the send buffer still be written to?
+ * af_alg_writable - Can the send buffer still be written to?
*
- * @sk socket of connection to user space
- * @return true => writable, false => not writable
+ * @sk: socket of connection to user space
+ * @return: true => writable, false => not writable
*/
static inline bool af_alg_writable(struct sock *sk)
{
@@ -200,10 +200,10 @@ static inline bool af_alg_writable(struct sock *sk)
}

/**
- * Size of available buffer used by kernel for the RX user space operation.
+ * af_alg_rcvbuf - Size of available buffer used by kernel for the RX user space operation.
*
- * @sk socket of connection to user space
- * @return number of bytes still available
+ * @sk: socket of connection to user space
+ * @return: number of bytes still available
*/
static inline int af_alg_rcvbuf(struct sock *sk)
{
@@ -215,10 +215,10 @@ static inline int af_alg_rcvbuf(struct sock *sk)
}

/**
- * Can the RX buffer still be written to?
+ * af_alg_readable - Can the RX buffer still be written to?
*
- * @sk socket of connection to user space
- * @return true => writable, false => not writable
+ * @sk: socket of connection to user space
+ * @return: true => writable, false => not writable
*/
static inline bool af_alg_readable(struct sock *sk)
{
diff --git a/include/crypto/internal/ecc.h b/include/crypto/internal/ecc.h
index 4f6c1a68882f..4b8155fea03c 100644
--- a/include/crypto/internal/ecc.h
+++ b/include/crypto/internal/ecc.h
@@ -76,7 +76,7 @@ int ecc_is_key_valid(unsigned int curve_id, unsigned int ndigits,
* point G.
* @curve_id: id representing the curve to use
* @ndigits: curve number of digits
- * @private_key: buffer for storing the generated private key
+ * @privkey: buffer for storing the generated private key
*
* Returns 0 if the private key was generated successfully, a negative value
* if an error occurred.
@@ -172,7 +172,7 @@ int vli_cmp(const u64 *left, const u64 *right, unsigned int ndigits);
*
* @result: where to write result
* @left: vli
- * @right vli
+ * @right: vli
* @ndigits: length of all vlis
*
* Note: can modify in-place.
@@ -236,7 +236,7 @@ void vli_mod_mult_slow(u64 *result, const u64 *left, const u64 *right,
unsigned int vli_num_bits(const u64 *vli, unsigned int ndigits);

/**
- * ecc_aloc_point() - Allocate ECC point.
+ * ecc_alloc_point() - Allocate ECC point.
*
* @ndigits: Length of vlis in u64 qwords.
*
@@ -254,7 +254,7 @@ void ecc_free_point(struct ecc_point *p);
/**
* ecc_point_is_zero() - Check if point is zero.
*
- * @p: Point to check for zero.
+ * @point: Point to check for zero.
*
* Return: true if point is the point at infinity, false otherwise.
*/
diff --git a/include/crypto/internal/rsa.h b/include/crypto/internal/rsa.h
index e870133f4b77..78a7544aaa11 100644
--- a/include/crypto/internal/rsa.h
+++ b/include/crypto/internal/rsa.h
@@ -10,7 +10,7 @@
#include <linux/types.h>

/**
- * rsa_key - RSA key structure
+ * struct rsa_key - RSA key structure
* @n : RSA modulus raw byte stream
* @e : RSA public exponent raw byte stream
* @d : RSA private exponent raw byte stream
diff --git a/include/crypto/kdf_sp800108.h b/include/crypto/kdf_sp800108.h
index b7b20a778fb7..1c16343cd3fd 100644
--- a/include/crypto/kdf_sp800108.h
+++ b/include/crypto/kdf_sp800108.h
@@ -11,17 +11,20 @@
#include <linux/uio.h>

/**
- * Counter KDF generate operation according to SP800-108 section 5.1
- * as well as SP800-56A section 5.8.1 (Single-step KDF).
+ * crypto_kdf108_ctr_generate - Counter KDF generate operation
+ * according to SP800-108 section 5.1
+ * as well as SP800-56A section 5.8.1
+ * (Single-step KDF).
*
- * @kmd Keyed message digest whose key was set with crypto_kdf108_setkey or
- * unkeyed message digest
- * @info optional context and application specific information - this may be
- * NULL
- * @info_vec number of optional context/application specific information entries
- * @dst destination buffer that the caller already allocated
- * @dlen length of the destination buffer - the KDF derives that amount of
- * bytes.
+ * @kmd: Keyed message digest whose key was set with
+ * crypto_kdf108_setkey or unkeyed message digest
+ * @info: optional context and application specific information -
+ * this may be NULL
+ * @info_nvec: number of optional context/application specific
+ * information entries
+ * @dst: destination buffer that the caller already allocated
+ * @dlen: length of the destination buffer -
+ * the KDF derives that number of bytes.
*
* To comply with SP800-108, the caller must provide Label || 0x00 || Context
* in the info parameter.
@@ -33,14 +36,16 @@ int crypto_kdf108_ctr_generate(struct crypto_shash *kmd,
u8 *dst, unsigned int dlen);

/**
- * Counter KDF setkey operation
+ * crypto_kdf108_setkey - Counter KDF setkey operation
*
- * @kmd Keyed message digest allocated by the caller. The key should not have
- * been set.
- * @key Seed key to be used to initialize the keyed message digest context.
- * @keylen This length of the key buffer.
- * @ikm The SP800-108 KDF does not support IKM - this parameter must be NULL
- * @ikmlen This parameter must be 0.
+ * @kmd: Keyed message digest allocated by the caller.
+ * The key should not have been set.
+ * @key: Seed key to be used to initialize the
+ * keyed message digest context.
+ * @keylen: The length of the key buffer.
+ * @ikm: The SP800-108 KDF does not support IKM -
+ * this parameter must be NULL
+ * @ikmlen: This parameter must be 0.
*
* According to SP800-108 section 7.2, the seed key must be at least as large as
* the message digest size of the used keyed message digest. This limitation
--
2.38.1

Subject: [PATCH 6/8] crypto: x86/ghash - add kernel-doc comments to assembly

Add kernel-doc comments for assembly language functions exported to
C glue code.

Signed-off-by: Robert Elliott <[email protected]>
---
arch/x86/crypto/ghash-clmulni-intel_asm.S | 27 +++++++++++++++++++----
1 file changed, 23 insertions(+), 4 deletions(-)

diff --git a/arch/x86/crypto/ghash-clmulni-intel_asm.S b/arch/x86/crypto/ghash-clmulni-intel_asm.S
index 2bf871899920..09cf9271b83a 100644
--- a/arch/x86/crypto/ghash-clmulni-intel_asm.S
+++ b/arch/x86/crypto/ghash-clmulni-intel_asm.S
@@ -88,7 +88,16 @@ SYM_FUNC_START_LOCAL(__clmul_gf128mul_ble)
RET
SYM_FUNC_END(__clmul_gf128mul_ble)

-/* void clmul_ghash_mul(char *dst, const u128 *shash) */
+/**
+ * clmul_ghash_mul - Calculate GHASH final multiplication using x86 PCLMULQDQ instructions
+ * @dst: address of hash value to update (%rdi)
+ * @shash: address of hash context (%rsi)
+ *
+ * This function supports 64-bit CPUs.
+ *
+ * Return: none (but @dst is updated)
+ * Prototype: asmlinkage void clmul_ghash_mul(char *dst, const u128 *shash)
+ */
SYM_FUNC_START(clmul_ghash_mul)
FRAME_BEGIN
movups (%rdi), DATA
@@ -102,9 +111,19 @@ SYM_FUNC_START(clmul_ghash_mul)
RET
SYM_FUNC_END(clmul_ghash_mul)

-/*
- * void clmul_ghash_update(char *dst, const char *src, unsigned int srclen,
- * const u128 *shash);
+/**
+ * clmul_ghash_update - Calculate GHASH using x86 PCLMULQDQ instructions
+ * @dst: address of hash value to update (%rdi)
+ * @src: address of data to hash (%rsi)
+ * @srclen: number of bytes in data buffer (%rdx);
+ * function does nothing and returns if below 16
+ * @shash: address of hash context (%rcx)
+ *
+ * This supports 64-bit CPUs.
+ *
+ * Return: none (but @dst is updated)
+ * Prototype: asmlinkage void clmul_ghash_update(char *dst, const char *src,
+ * unsigned int srclen, const u128 *shash);
*/
SYM_FUNC_START(clmul_ghash_update)
FRAME_BEGIN
--
2.38.1

Subject: [PATCH 8/8] crypto: x86/chacha - add kernel-doc comments to assembly

Add kernel-doc comments for assembly language functions exported to
C glue code.

Signed-off-by: Robert Elliott <[email protected]>
---
arch/x86/crypto/chacha-avx2-x86_64.S | 90 +++++++++++++++--------
arch/x86/crypto/chacha-avx512vl-x86_64.S | 94 +++++++++++++++---------
arch/x86/crypto/chacha-ssse3-x86_64.S | 75 ++++++++++++-------
3 files changed, 170 insertions(+), 89 deletions(-)

diff --git a/arch/x86/crypto/chacha-avx2-x86_64.S b/arch/x86/crypto/chacha-avx2-x86_64.S
index f3d8fc018249..5ebced6f32c3 100644
--- a/arch/x86/crypto/chacha-avx2-x86_64.S
+++ b/arch/x86/crypto/chacha-avx2-x86_64.S
@@ -34,18 +34,26 @@ CTR4BL: .octa 0x00000000000000000000000000000002

.text

+/**
+ * chacha_2block_xor_avx2 - Encrypt 2 blocks using the x86 AVX2 feature set
+ * @state: address of input state matrix, s (%rdi)
+ * @dst: address of up to 2 data blocks output, o (%rsi)
+ * @src: address of up to 2 data blocks input, i (%rdx)
+ * @len: input/output length in bytes (%rcx)
+ * @nrounds: number of rounds (%r8d)
+ *
+ * This function supports 64-bit CPUs.
+ *
+ * This function encrypts two ChaCha blocks by loading the state
+ * matrix twice across four AVX registers. It performs matrix operations
+ * on four words in each matrix in parallel, but requires shuffling to
+ * rearrange the words after each round.
+ *
+ * Return: none
+ * Prototype: asmlinkage void chacha_2block_xor_avx2(u32 *state, u8 *dst, const u8 *src,
+ * unsigned int len, int nrounds);
+ */
SYM_FUNC_START(chacha_2block_xor_avx2)
- # %rdi: Input state matrix, s
- # %rsi: up to 2 data blocks output, o
- # %rdx: up to 2 data blocks input, i
- # %rcx: input/output length in bytes
- # %r8d: nrounds
-
- # This function encrypts two ChaCha blocks by loading the state
- # matrix twice across four AVX registers. It performs matrix operations
- # on four words in each matrix in parallel, but requires shuffling to
- # rearrange the words after each round.
-
vzeroupper

# x0..3[0-2] = s0..3
@@ -226,20 +234,28 @@ SYM_FUNC_START(chacha_2block_xor_avx2)

SYM_FUNC_END(chacha_2block_xor_avx2)

+/**
+ * chacha_4block_xor_avx2 - Encrypt 4 blocks using the x86 AVX2 feature set
+ * @state: address of input state matrix, s (%rdi)
+ * @dst: address of up to 4 data blocks output, o (%rsi)
+ * @src: address of up to 4 data blocks input, i (%rdx)
+ * @len: input/output length in bytes (%rcx)
+ * @nrounds: number of rounds (%r8d)
+ *
+ * This function supports 64-bit CPUs.
+ *
+ * This function encrypts four ChaCha blocks by loading the state
+ * matrix four times across eight AVX registers. It performs matrix
+ * operations on four words in two matrices in parallel, sequentially
+ * to the operations on the four words of the other two matrices.
+ * Because the required word shuffling has a rather high latency, we
+ * can do the arithmetic on two matrix-pairs without much slowdown.
+ *
+ * Return: none
+ * Prototype: asmlinkage void chacha_4block_xor_avx2(u32 *state, u8 *dst, const u8 *src,
+ * unsigned int len, int nrounds);
+ */
SYM_FUNC_START(chacha_4block_xor_avx2)
- # %rdi: Input state matrix, s
- # %rsi: up to 4 data blocks output, o
- # %rdx: up to 4 data blocks input, i
- # %rcx: input/output length in bytes
- # %r8d: nrounds
-
- # This function encrypts four ChaCha blocks by loading the state
- # matrix four times across eight AVX registers. It performs matrix
- # operations on four words in two matrices in parallel, sequentially
- # to the operations on the four words of the other two matrices. The
- # required word shuffling has a rather high latency, we can do the
- # arithmetic on two matrix-pairs without much slowdown.
-
vzeroupper

# x0..3[0-4] = s0..3
@@ -531,12 +547,28 @@ SYM_FUNC_START(chacha_4block_xor_avx2)

SYM_FUNC_END(chacha_4block_xor_avx2)

+/**
+ * chacha_8block_xor_avx2 - Encrypt 8 blocks using the x86 AVX2 feature set
+ * @state: address of input state matrix, s (%rdi)
+ * @dst: address of up to 8 data blocks output, o (%rsi)
+ * @src: address of up to 8 data blocks input, i (%rdx)
+ * @len: input/output length in bytes (%rcx)
+ * @nrounds: number of rounds (%r8d)
+ *
+ * This function supports 64-bit CPUs.
+ *
+ * This function encrypts eight consecutive ChaCha blocks by loading
+ * the state matrix in AVX registers eight times. As we need some
+ * scratch registers, we save the first four registers on the stack.
+ * The algorithm performs each operation on the corresponding word of
+ * each state matrix, hence requires no word shuffling after each
+ * round.
+ *
+ * Return: none
+ * Prototype: asmlinkage void chacha_8block_xor_avx2(u32 *state, u8 *dst, const u8 *src,
+ * unsigned int len, int nrounds);
+ */
SYM_FUNC_START(chacha_8block_xor_avx2)
- # %rdi: Input state matrix, s
- # %rsi: up to 8 data blocks output, o
- # %rdx: up to 8 data blocks input, i
- # %rcx: input/output length in bytes
- # %r8d: nrounds

# This function encrypts eight consecutive ChaCha blocks by loading
# the state matrix in AVX registers eight times. As we need some
diff --git a/arch/x86/crypto/chacha-avx512vl-x86_64.S b/arch/x86/crypto/chacha-avx512vl-x86_64.S
index 259383e1ad44..b4a85365e164 100644
--- a/arch/x86/crypto/chacha-avx512vl-x86_64.S
+++ b/arch/x86/crypto/chacha-avx512vl-x86_64.S
@@ -24,18 +24,26 @@ CTR8BL: .octa 0x00000003000000020000000100000000

.text

+/**
+ * chacha_2block_xor_avx512vl - Encrypt 2 blocks using the x86 AVX512VL feature set
+ * @state: address of input state matrix, s (%rdi)
+ * @dst: address of up to 2 data blocks output, o (%rsi)
+ * @src: address of up to 2 data blocks input, i (%rdx)
+ * @len: input/output length in bytes (%rcx)
+ * @nrounds: number of rounds (%r8d)
+ *
+ * This function supports 64-bit CPUs.
+ *
+ * This function encrypts two ChaCha blocks by loading the state
+ * matrix twice across four AVX registers. It performs matrix operations
+ * on four words in each matrix in parallel, but requires shuffling to
+ * rearrange the words after each round.
+ *
+ * Return: none
+ * Prototype: asmlinkage void chacha_2block_xor_avx512vl(u32 *state, u8 *dst, const u8 *src,
+ * unsigned int len, int nrounds);
+ */
SYM_FUNC_START(chacha_2block_xor_avx512vl)
- # %rdi: Input state matrix, s
- # %rsi: up to 2 data blocks output, o
- # %rdx: up to 2 data blocks input, i
- # %rcx: input/output length in bytes
- # %r8d: nrounds
-
- # This function encrypts two ChaCha blocks by loading the state
- # matrix twice across four AVX registers. It performs matrix operations
- # on four words in each matrix in parallel, but requires shuffling to
- # rearrange the words after each round.
-
vzeroupper

# x0..3[0-2] = s0..3
@@ -189,20 +197,28 @@ SYM_FUNC_START(chacha_2block_xor_avx512vl)

SYM_FUNC_END(chacha_2block_xor_avx512vl)

+/**
+ * chacha_4block_xor_avx512vl - Encrypt 4 blocks using the x86 AVX512VL feature set
+ * @state: address of input state matrix, s (%rdi)
+ * @dst: address of up to 4 data blocks output, o (%rsi)
+ * @src: address of up to 4 data blocks input, i (%rdx)
+ * @len: input/output length in bytes (%rcx)
+ * @nrounds: number of rounds (%r8d)
+ *
+ * This function supports 64-bit CPUs.
+ *
+ * This function encrypts four ChaCha blocks by loading the state
+ * matrix four times across eight AVX registers. It performs matrix
+ * operations on four words in two matrices in parallel, sequentially
+ * to the operations on the four words of the other two matrices.
+ * Because the required word shuffling has a rather high latency, we
+ * can do the arithmetic on two matrix-pairs without much slowdown.
+ *
+ * Return: none
+ * Prototype: asmlinkage void chacha_4block_xor_avx512vl(u32 *state, u8 *dst, const u8 *src,
+ * unsigned int len, int nrounds);
+ */
SYM_FUNC_START(chacha_4block_xor_avx512vl)
- # %rdi: Input state matrix, s
- # %rsi: up to 4 data blocks output, o
- # %rdx: up to 4 data blocks input, i
- # %rcx: input/output length in bytes
- # %r8d: nrounds
-
- # This function encrypts four ChaCha blocks by loading the state
- # matrix four times across eight AVX registers. It performs matrix
- # operations on four words in two matrices in parallel, sequentially
- # to the operations on the four words of the other two matrices. The
- # required word shuffling has a rather high latency, we can do the
- # arithmetic on two matrix-pairs without much slowdown.
-
vzeroupper

# x0..3[0-4] = s0..3
@@ -455,18 +471,26 @@ SYM_FUNC_START(chacha_4block_xor_avx512vl)

SYM_FUNC_END(chacha_4block_xor_avx512vl)

+/**
+ * chacha_8block_xor_avx512vl - Encrypt 8 blocks using the x86 AVX512VL feature set
+ * @state: address of input state matrix, s (%rdi)
+ * @dst: address of up to 8 data blocks output, o (%rsi)
+ * @src: address of up to 8 data blocks input, i (%rdx)
+ * @len: input/output length in bytes (%rcx)
+ * @nrounds: number of rounds (%r8d)
+ *
+ * This function supports 64-bit CPUs.
+ *
+ * This function encrypts eight consecutive ChaCha blocks by loading
+ * the state matrix in AVX registers eight times. Compared to AVX2, this
+ * mostly benefits from the new rotate instructions in VL and the
+ * additional registers.
+ *
+ * Return: none
+ * Prototype: asmlinkage void chacha_8block_xor_avx512vl(u32 *state, u8 *dst, const u8 *src,
+ * unsigned int len, int nrounds);
+ */
SYM_FUNC_START(chacha_8block_xor_avx512vl)
- # %rdi: Input state matrix, s
- # %rsi: up to 8 data blocks output, o
- # %rdx: up to 8 data blocks input, i
- # %rcx: input/output length in bytes
- # %r8d: nrounds
-
- # This function encrypts eight consecutive ChaCha blocks by loading
- # the state matrix in AVX registers eight times. Compared to AVX2, this
- # mostly benefits from the new rotate instructions in VL and the
- # additional registers.
-
vzeroupper

# x0..15[0-7] = s[0..15]
diff --git a/arch/x86/crypto/chacha-ssse3-x86_64.S b/arch/x86/crypto/chacha-ssse3-x86_64.S
index 7111949cd5b9..6f5395ba54ab 100644
--- a/arch/x86/crypto/chacha-ssse3-x86_64.S
+++ b/arch/x86/crypto/chacha-ssse3-x86_64.S
@@ -34,7 +34,6 @@ CTRINC: .octa 0x00000003000000020000000100000000
* Clobbers: %r8d, %xmm4-%xmm7
*/
SYM_FUNC_START_LOCAL(chacha_permute)
-
movdqa ROT8(%rip),%xmm4
movdqa ROT16(%rip),%xmm5

@@ -111,12 +110,21 @@ SYM_FUNC_START_LOCAL(chacha_permute)
RET
SYM_FUNC_END(chacha_permute)

+/**
+ * chacha_block_xor_ssse3 - Encrypt 1 block using the x86 SSSE3 feature set
+ * @state: address of input state matrix, s (%rdi)
+ * @dst: address of up to 1 data block output, o (%rsi)
+ * @src: address of up to 1 data block input, i (%rdx)
+ * @len: input/output length in bytes (%rcx)
+ * @nrounds: number of rounds (%r8d)
+ *
+ * This function supports 64-bit CPUs.
+ *
+ * Return: none
+ * Prototype: asmlinkage void chacha_block_xor_ssse3(u32 *state, u8 *dst, const u8 *src,
+ * unsigned int len, int nrounds);
+ */
SYM_FUNC_START(chacha_block_xor_ssse3)
- # %rdi: Input state matrix, s
- # %rsi: up to 1 data block output, o
- # %rdx: up to 1 data block input, i
- # %rcx: input/output length in bytes
- # %r8d: nrounds
FRAME_BEGIN

# x0..3 = s0..3
@@ -199,10 +207,19 @@ SYM_FUNC_START(chacha_block_xor_ssse3)

SYM_FUNC_END(chacha_block_xor_ssse3)

+/**
+ * hchacha_block_ssse3 - Calculate the HChaCha block function using the x86 SSSE3 feature set
+ * @state: address of input state matrix, s (%rdi)
+ * @out: address of output (8 32-bit words)(%rsi)
+ * @nrounds: number of rounds (%edx);
+ * only uses lower 32 bits
+ *
+ * This function supports 64-bit CPUs.
+ *
+ * Return: none
+ * Prototype: asmlinkage void hchacha_block_ssse3(const u32 *state, u32 *out, int nrounds);
+ */
SYM_FUNC_START(hchacha_block_ssse3)
- # %rdi: Input state matrix, s
- # %rsi: output (8 32-bit words)
- # %edx: nrounds
FRAME_BEGIN

movdqu 0x00(%rdi),%xmm0
@@ -220,23 +237,31 @@ SYM_FUNC_START(hchacha_block_ssse3)
RET
SYM_FUNC_END(hchacha_block_ssse3)

+/**
+ * chacha_4block_xor_ssse3 - Encrypt 4 blocks using the x86 SSSE3 feature set
+ * @state: address of input state matrix, s (%rdi)
+ * @dst: address of up to 4 data blocks output, o (%rsi)
+ * @src: address of up to 4 data blocks input, i (%rdx)
+ * @len: input/output length in bytes (%rcx)
+ * @nrounds: number of rounds (%r8d)
+ *
+ * This function supports 64-bit CPUs.
+ *
+ * This function encrypts four consecutive ChaCha blocks by loading the
+ * state matrix in SSE registers four times. As we need some scratch
+ * registers, we save the first four registers on the stack. The
+ * algorithm performs each operation on the corresponding word of each
+ * state matrix, hence requires no word shuffling. For the final XOR
+ * step we transpose the matrix by interleaving 32- and then 64-bit
+ * words, which allows us to do XOR in SSE registers. 8/16-bit word
+ * rotation is done with the slightly better performing SSSE3 byte
+ * shuffling; 7/12-bit word rotation uses traditional shift+OR.
+ *
+ * Return: none
+ * Prototype: asmlinkage void chacha_4block_xor_ssse3(u32 *state, u8 *dst, const u8 *src,
+ * unsigned int len, int nrounds);
+ */
SYM_FUNC_START(chacha_4block_xor_ssse3)
- # %rdi: Input state matrix, s
- # %rsi: up to 4 data blocks output, o
- # %rdx: up to 4 data blocks input, i
- # %rcx: input/output length in bytes
- # %r8d: nrounds
-
- # This function encrypts four consecutive ChaCha blocks by loading the
- # the state matrix in SSE registers four times. As we need some scratch
- # registers, we save the first four registers on the stack. The
- # algorithm performs each operation on the corresponding word of each
- # state matrix, hence requires no word shuffling. For final XORing step
- # we transpose the matrix by interleaving 32- and then 64-bit words,
- # which allows us to do XOR in SSE registers. 8/16-bit word rotation is
- # done with the slightly better performing SSSE3 byte shuffling,
- # 7/12-bit word rotation uses traditional shift+OR.
-
lea 8(%rsp),%r10
sub $0x80,%rsp
and $~63,%rsp
--
2.38.1

Subject: [PATCH 7/8] crypto: x86/blake2s - add kernel-doc comments to assembly

Add kernel-doc comments for assembly language functions exported to
C glue code.

Signed-off-by: Robert Elliott <[email protected]>
---
arch/x86/crypto/blake2s-core.S | 26 ++++++++++++++++++++++++++
1 file changed, 26 insertions(+)

diff --git a/arch/x86/crypto/blake2s-core.S b/arch/x86/crypto/blake2s-core.S
index b50b35ff1fdb..7605e7d94fd2 100644
--- a/arch/x86/crypto/blake2s-core.S
+++ b/arch/x86/crypto/blake2s-core.S
@@ -46,6 +46,19 @@ SIGMA2:
#endif /* CONFIG_AS_AVX512 */

.text
+/**
+ * blake2s_compress_ssse3 - Calculate BLAKE2s hash using the x86 SSSE3 feature set
+ * @state: address of 48-byte state (%rdi)
+ * @data: address of data (%rsi)
+ * @nblocks: number of 64-byte blocks of data (%rdx)
+ * @inc: counter increment value (%rcx)
+ *
+ * This function supports 64-bit CPUs.
+ *
+ * Return: none (but modifies state)
+ * Prototype: asmlinkage void blake2s_compress_ssse3(struct blake2s_state *state, const u8 *data,
+ * unsigned int nblocks, u32 inc);
+ */
SYM_FUNC_START(blake2s_compress_ssse3)
testq %rdx,%rdx
je .Lendofloop
@@ -175,6 +188,19 @@ SYM_FUNC_START(blake2s_compress_ssse3)
SYM_FUNC_END(blake2s_compress_ssse3)

#ifdef CONFIG_AS_AVX512
+/**
+ * blake2s_compress_avx512 - Calculate BLAKE2s hash using the x86 AVX-512VL feature set
+ * @state: address of 48-byte state (%rdi)
+ * @data: address of data (%rsi)
+ * @nblocks: number of 64-byte blocks of data (%rdx)
+ * @inc: counter increment value (%rcx)
+ *
+ * This function supports 64-bit CPUs.
+ *
+ * Return: none (but modifies state)
+ * Prototype: asmlinkage void blake2s_compress_avx512(struct blake2s_state *state, const u8 *data,
+ * unsigned int nblocks, u32 inc);
+ */
SYM_FUNC_START(blake2s_compress_avx512)
vmovdqu (%rdi),%xmm0
vmovdqu 0x10(%rdi),%xmm1
--
2.38.1

Subject: [PATCH 3/8] crypto: x86/sha - add kernel-doc comments to assembly

Add kernel-doc comments for assembly language functions exported to
C glue code.

Remove .align directives that are overridden by SYM_FUNC_START
(which includes .align 4).

Signed-off-by: Robert Elliott <[email protected]>
---
arch/x86/crypto/sha1_avx2_x86_64_asm.S | 32 +++++++++++------------
arch/x86/crypto/sha1_ni_asm.S | 22 +++++++++-------
arch/x86/crypto/sha1_ssse3_asm.S | 33 +++++++++++++++--------
arch/x86/crypto/sha256-avx-asm.S | 24 ++++++++++-------
arch/x86/crypto/sha256-avx2-asm.S | 25 +++++++++++-------
arch/x86/crypto/sha256-ssse3-asm.S | 26 +++++++++++--------
arch/x86/crypto/sha256_ni_asm.S | 25 +++++++++---------
arch/x86/crypto/sha512-avx-asm.S | 33 +++++++++++------------
arch/x86/crypto/sha512-avx2-asm.S | 34 ++++++++++++------------
arch/x86/crypto/sha512-ssse3-asm.S | 36 ++++++++++++--------------
10 files changed, 161 insertions(+), 129 deletions(-)

diff --git a/arch/x86/crypto/sha1_avx2_x86_64_asm.S b/arch/x86/crypto/sha1_avx2_x86_64_asm.S
index a96b2fd26dab..c3ee9334cb0f 100644
--- a/arch/x86/crypto/sha1_avx2_x86_64_asm.S
+++ b/arch/x86/crypto/sha1_avx2_x86_64_asm.S
@@ -62,11 +62,6 @@
*Visit http://software.intel.com/en-us/articles/
*and refer to improving-the-performance-of-the-secure-hash-algorithm-1/
*
- *Updates 20-byte SHA-1 record at start of 'state', from 'input', for
- *even number of 'blocks' consecutive 64-byte blocks.
- *
- *extern "C" void sha1_transform_avx2(
- * struct sha1_state *state, const u8* input, int blocks );
*/

#include <linux/linkage.h>
@@ -629,13 +624,22 @@ _loop3:
_end:

.endm
-/*
- * macro implements SHA-1 function's body for several 64-byte blocks
- * param: function's name
- */
-.macro SHA1_VECTOR_ASM name
- SYM_FUNC_START(\name)

+.text
+
+/**
+ * sha1_transform_avx2 - Calculate SHA1 hash using the x86 AVX2 feature set
+ * @digest: address of current 20-byte hash value (%rdi, CTX macro)
+ * @data: address of data (%rsi, BUF macro);
+ * data size must be a multiple of 64 bytes
+ * @blocks: number of 64-byte blocks (%rdx, CNT macro)
+ *
+ * This function supports 64-bit CPUs.
+ *
+ * Return: none
+ * Prototype: asmlinkage void sha1_transform_avx2(u32 *digest, const u8 *data, int blocks)
+ */
+SYM_FUNC_START(sha1_transform_avx2)
push %rbx
push %r12
push %r13
@@ -675,9 +679,7 @@ _loop3:
pop %rbx

RET
-
- SYM_FUNC_END(\name)
-.endm
+SYM_FUNC_END(sha1_transform_avx2)

.section .rodata

@@ -706,6 +708,4 @@ BSWAP_SHUFB_CTL:
.long 0x04050607
.long 0x08090a0b
.long 0x0c0d0e0f
-.text

-SHA1_VECTOR_ASM sha1_transform_avx2
diff --git a/arch/x86/crypto/sha1_ni_asm.S b/arch/x86/crypto/sha1_ni_asm.S
index 2f94ec0e763b..4aa8507b15b4 100644
--- a/arch/x86/crypto/sha1_ni_asm.S
+++ b/arch/x86/crypto/sha1_ni_asm.S
@@ -71,9 +71,16 @@
#define MSG3 %xmm6
#define SHUF_MASK %xmm7

+.text

-/*
- * Intel SHA Extensions optimized implementation of a SHA-1 update function
+/**
+ * sha1_transform_ni - Calculate SHA1 hash using the x86 SHA-NI feature set
+ * @digest: address of current 20-byte hash value (%rdi, DIGEST_PTR macro)
+ * @data: address of data (%rsi, DATA_PTR macro);
+ * data size must be a multiple of 64 bytes
+ * @blocks: number of 64-byte blocks (%rdx, NUM_BLKS macro)
+ *
+ * This function supports 64-bit CPUs.
*
* The function takes a pointer to the current hash values, a pointer to the
* input data, and a number of 64 byte blocks to process. Once all blocks have
@@ -85,15 +92,10 @@
* The indented lines in the loop are instructions related to rounds processing.
* The non-indented lines are instructions related to the message schedule.
*
- * void sha1_ni_transform(uint32_t *digest, const void *data,
- uint32_t numBlocks)
- * digest : pointer to digest
- * data: pointer to input data
- * numBlocks: Number of blocks to process
+ * Return: none
+ * Prototype: asmlinkage void sha1_transform_ni(u32 *digest, const u8 *data, int blocks)
*/
-.text
-.align 32
-SYM_FUNC_START(sha1_ni_transform)
+SYM_FUNC_START(sha1_transform_ni)
push %rbp
mov %rsp, %rbp
sub $FRAME_SIZE, %rsp
diff --git a/arch/x86/crypto/sha1_ssse3_asm.S b/arch/x86/crypto/sha1_ssse3_asm.S
index 263f916362e0..8151a079ba6c 100644
--- a/arch/x86/crypto/sha1_ssse3_asm.S
+++ b/arch/x86/crypto/sha1_ssse3_asm.S
@@ -450,20 +450,24 @@ BSWAP_SHUFB_CTL:
.long 0x0c0d0e0f


-.section .text
-
W_PRECALC_SSSE3
.macro xmm_mov a, b
movdqu \a,\b
.endm

-/*
- * SSSE3 optimized implementation:
+.text
+
+/**
+ * sha1_transform_ssse3 - Calculate SHA1 hash using the x86 SSSE3 feature set
+ * @digest: address of current 20-byte hash value (%rdi, CTX macro)
+ * @data: address of data (%rsi, BUF macro);
+ * data size must be a multiple of 64 bytes
+ * @blocks: number of 64-byte blocks (%rdx, CNT macro)
*
- * extern "C" void sha1_transform_ssse3(struct sha1_state *state,
- * const u8 *data, int blocks);
+ * This function supports 64-bit CPUs.
*
- * Note that struct sha1_state is assumed to begin with u32 state[5].
+ * Return: none
+ * Prototype: asmlinkage void sha1_transform_ssse3(u32 *digest, const u8 *data, int blocks)
*/
SHA1_VECTOR_ASM sha1_transform_ssse3

@@ -545,9 +549,16 @@ W_PRECALC_AVX
vmovdqu \a,\b
.endm

-
-/* AVX optimized implementation:
- * extern "C" void sha1_transform_avx(struct sha1_state *state,
- * const u8 *data, int blocks);
+/**
+ * sha1_transform_avx - Calculate SHA1 hash using the x86 AVX feature set
+ * @digest: address of current 20-byte hash value (%rdi, CTX macro)
+ * @data: address of data (%rsi, BUF macro);
+ * data size must be a multiple of 64 bytes
+ * @blocks: number of 64-byte blocks (%rdx, CNT macro)
+ *
+ * This function supports 64-bit CPUs.
+ *
+ * Return: none
+ * Prototype: asmlinkage void sha1_transform_avx(u32 *digest, const u8 *data, int blocks)
*/
SHA1_VECTOR_ASM sha1_transform_avx
diff --git a/arch/x86/crypto/sha256-avx-asm.S b/arch/x86/crypto/sha256-avx-asm.S
index 3baa1ec39097..2a60af20a3ff 100644
--- a/arch/x86/crypto/sha256-avx-asm.S
+++ b/arch/x86/crypto/sha256-avx-asm.S
@@ -94,9 +94,9 @@ SHUF_00BA = %xmm10 # shuffle xBxA -> 00BA
SHUF_DC00 = %xmm12 # shuffle xDxC -> DC00
BYTE_FLIP_MASK = %xmm13

-NUM_BLKS = %rdx # 3rd arg
-INP = %rsi # 2nd arg
CTX = %rdi # 1st arg
+INP = %rsi # 2nd arg
+NUM_BLKS = %rdx # 3rd arg

SRND = %rsi # clobbers INP
c = %ecx
@@ -339,15 +339,21 @@ a = TMP_
ROTATE_ARGS
.endm

-########################################################################
-## void sha256_transform_avx(state sha256_state *state, const u8 *data, int blocks)
-## arg 1 : pointer to state
-## arg 2 : pointer to input data
-## arg 3 : Num blocks
-########################################################################
.text
+
+/**
+ * sha256_transform_avx - Calculate SHA256 hash using the x86 AVX feature set
+ * @digest: address of current 32-byte hash value (%rdi, CTX macro)
+ * @data: address of data (%rsi, INP macro);
+ * data size must be a multiple of 64 bytes
+ * @blocks: number of 64-byte blocks (%rdx, NUM_BLKS macro)
+ *
+ * This function supports 64-bit CPUs.
+ *
+ * Return: none
+ * Prototype: asmlinkage void sha256_transform_avx(u32 *digest, const u8 *data, int blocks)
+ */
SYM_FUNC_START(sha256_transform_avx)
-.align 32
pushq %rbx
pushq %r12
pushq %r13
diff --git a/arch/x86/crypto/sha256-avx2-asm.S b/arch/x86/crypto/sha256-avx2-asm.S
index 9bcdbc47b8b4..2f2d332f41a4 100644
--- a/arch/x86/crypto/sha256-avx2-asm.S
+++ b/arch/x86/crypto/sha256-avx2-asm.S
@@ -89,9 +89,9 @@ BYTE_FLIP_MASK = %ymm13

X_BYTE_FLIP_MASK = %xmm13 # XMM version of BYTE_FLIP_MASK

-NUM_BLKS = %rdx # 3rd arg
-INP = %rsi # 2nd arg
CTX = %rdi # 1st arg
+INP = %rsi # 2nd arg
+NUM_BLKS = %rdx # 3rd arg
c = %ecx
d = %r8d
e = %edx # clobbers NUM_BLKS
@@ -516,15 +516,22 @@ STACK_SIZE = _CTX + _CTX_SIZE

.endm

-########################################################################
-## void sha256_transform_rorx(struct sha256_state *state, const u8 *data, int blocks)
-## arg 1 : pointer to state
-## arg 2 : pointer to input data
-## arg 3 : Num blocks
-########################################################################
.text
+
+/**
+ * sha256_transform_rorx - Calculate SHA256 hash using the x86 AVX2 feature set
+ * including the RORX (rotate right logical without affecting flags) instruction
+ * @digest: address of current 32-byte hash value (%rdi, CTX macro)
+ * @data: address of data (%rsi, INP macro);
+ * data size must be a multiple of 64 bytes
+ * @blocks: number of 64-byte blocks (%rdx, NUM_BLKS macro)
+ *
+ * This function supports 64-bit CPUs.
+ *
+ * Return: none
+ * Prototype: asmlinkage void sha256_transform_rorx(u32 *digest, const u8 *data, int blocks)
+ */
SYM_FUNC_START(sha256_transform_rorx)
-.align 32
pushq %rbx
pushq %r12
pushq %r13
diff --git a/arch/x86/crypto/sha256-ssse3-asm.S b/arch/x86/crypto/sha256-ssse3-asm.S
index c4a5db612c32..087d03fb10e1 100644
--- a/arch/x86/crypto/sha256-ssse3-asm.S
+++ b/arch/x86/crypto/sha256-ssse3-asm.S
@@ -87,9 +87,9 @@ SHUF_00BA = %xmm10 # shuffle xBxA -> 00BA
SHUF_DC00 = %xmm11 # shuffle xDxC -> DC00
BYTE_FLIP_MASK = %xmm12

-NUM_BLKS = %rdx # 3rd arg
-INP = %rsi # 2nd arg
CTX = %rdi # 1st arg
+INP = %rsi # 2nd arg
+NUM_BLKS = %rdx # 3rd arg

SRND = %rsi # clobbers INP
c = %ecx
@@ -346,17 +346,21 @@ a = TMP_
ROTATE_ARGS
.endm

-########################################################################
-## void sha256_transform_ssse3(struct sha256_state *state, const u8 *data,
-## int blocks);
-## arg 1 : pointer to state
-## (struct sha256_state is assumed to begin with u32 state[8])
-## arg 2 : pointer to input data
-## arg 3 : Num blocks
-########################################################################
.text
+
+/**
+ * sha256_transform_ssse3 - Calculate SHA256 hash using the x86 SSSE3 feature set
+ * @digest: address of current 32-byte hash value (%rdi, CTX macro)
+ * @data: address of data (%rsi, INP macro);
+ * data size must be a multiple of 64 bytes
+ * @blocks: number of 64-byte blocks (%rdx, NUM_BLKS macro)
+ *
+ * This function supports 64-bit CPUs.
+ *
+ * Return: none
+ * Prototype: asmlinkage void sha256_transform_ssse3(u32 *digest, const u8 *data, int blocks)
+ */
SYM_FUNC_START(sha256_transform_ssse3)
-.align 32
pushq %rbx
pushq %r12
pushq %r13
diff --git a/arch/x86/crypto/sha256_ni_asm.S b/arch/x86/crypto/sha256_ni_asm.S
index 94d50dd27cb5..a7b3f86cc127 100644
--- a/arch/x86/crypto/sha256_ni_asm.S
+++ b/arch/x86/crypto/sha256_ni_asm.S
@@ -75,8 +75,16 @@
#define ABEF_SAVE %xmm9
#define CDGH_SAVE %xmm10

-/*
- * Intel SHA Extensions optimized implementation of a SHA-256 update function
+.text
+
+/**
+ * sha256_transform_ni - Calculate SHA256 hash using the x86 SHA-NI feature set
+ * @digest: address of current 32-byte hash value (%rdi, DIGEST_PTR macro)
+ * @data: address of data (%rsi, DATA_PTR macro);
+ * data size must be a multiple of 64 bytes
+ * @blocks: number of 64-byte blocks (%rdx, NUM_BLKS macro)
+ *
+ * This function supports 64-bit CPUs.
*
* The function takes a pointer to the current hash values, a pointer to the
* input data, and a number of 64 byte blocks to process. Once all blocks have
@@ -88,17 +96,10 @@
* The indented lines in the loop are instructions related to rounds processing.
* The non-indented lines are instructions related to the message schedule.
*
- * void sha256_ni_transform(uint32_t *digest, const void *data,
- uint32_t numBlocks);
- * digest : pointer to digest
- * data: pointer to input data
- * numBlocks: Number of blocks to process
+ * Return: none
+ * Prototype: asmlinkage void sha256_transform_ni(u32 *digest, const u8 *data, int blocks)
*/
-
-.text
-.align 32
-SYM_FUNC_START(sha256_ni_transform)
-
+SYM_FUNC_START(sha256_transform_ni)
shl $6, NUM_BLKS /* convert to bytes */
jz .Ldone_hash
add DATA_PTR, NUM_BLKS /* pointer to end of data */
diff --git a/arch/x86/crypto/sha512-avx-asm.S b/arch/x86/crypto/sha512-avx-asm.S
index 1fefe6dd3a9e..145534a0c6f7 100644
--- a/arch/x86/crypto/sha512-avx-asm.S
+++ b/arch/x86/crypto/sha512-avx-asm.S
@@ -49,15 +49,10 @@

#include <linux/linkage.h>

-.text
-
# Virtual Registers
-# ARG1
-digest = %rdi
-# ARG2
-msg = %rsi
-# ARG3
-msglen = %rdx
+digest = %rdi # ARG1
+msg = %rsi # ARG2
+msglen = %rdx # ARG3
T1 = %rcx
T2 = %r8
a_64 = %r9
@@ -265,14 +260,20 @@ frame_size = frame_WK + WK_SIZE
RotateState
.endm

-########################################################################
-# void sha512_transform_avx(sha512_state *state, const u8 *data, int blocks)
-# Purpose: Updates the SHA512 digest stored at "state" with the message
-# stored in "data".
-# The size of the message pointed to by "data" must be an integer multiple
-# of SHA512 message blocks.
-# "blocks" is the message length in SHA512 blocks
-########################################################################
+.text
+
+/**
+ * sha512_transform_avx - Calculate SHA512 hash using the x86 AVX feature set
+ * @digest: address of current 64-byte hash value (%rdi, digest macro)
+ * @data: address of data (%rsi, msg macro);
+ * data size must be a multiple of 128 bytes
+ * @blocks: number of 128-byte blocks (%rdx, msglen macro)
+ *
+ * This function supports 64-bit CPUs.
+ *
+ * Return: none
+ * Prototype: asmlinkage void sha512_transform_avx(u64 *digest, const u8 *data, int blocks)
+ */
SYM_FUNC_START(sha512_transform_avx)
test msglen, msglen
je nowork
diff --git a/arch/x86/crypto/sha512-avx2-asm.S b/arch/x86/crypto/sha512-avx2-asm.S
index 5cdaab7d6901..bd9e08c8643d 100644
--- a/arch/x86/crypto/sha512-avx2-asm.S
+++ b/arch/x86/crypto/sha512-avx2-asm.S
@@ -51,8 +51,6 @@

#include <linux/linkage.h>

-.text
-
# Virtual Registers
Y_0 = %ymm4
Y_1 = %ymm5
@@ -68,13 +66,10 @@ XFER = YTMP0

BYTE_FLIP_MASK = %ymm9

-# 1st arg is %rdi, which is saved to the stack and accessed later via %r12
-CTX1 = %rdi
+CTX1 = %rdi # 1st arg, which is saved to the stack and accessed later via %r12
CTX2 = %r12
-# 2nd arg
-INP = %rsi
-# 3rd arg
-NUM_BLKS = %rdx
+INP = %rsi # 2nd arg
+NUM_BLKS = %rdx # 3rd arg

c = %rcx
d = %r8
@@ -557,14 +552,21 @@ frame_size = frame_CTX + CTX_SIZE

.endm

-########################################################################
-# void sha512_transform_rorx(sha512_state *state, const u8 *data, int blocks)
-# Purpose: Updates the SHA512 digest stored at "state" with the message
-# stored in "data".
-# The size of the message pointed to by "data" must be an integer multiple
-# of SHA512 message blocks.
-# "blocks" is the message length in SHA512 blocks
-########################################################################
+.text
+
+/**
+ * sha512_transform_rorx - Calculate SHA512 hash using the x86 AVX2 feature set
+ * including the RORX (rotate right logical without affecting flags) instruction
+ * @digest: address of 64-byte hash value (%rdi, CTX1 macro)
+ * @data: address of data (%rsi, INP macro);
+ * data size must be a multiple of 128 bytes
+ * @blocks: number of 128-byte blocks (%rdx, NUM_BLKS macro)
+ *
+ * This function supports 64-bit CPUs.
+ *
+ * Return: none
+ * Prototype: asmlinkage void sha512_transform_rorx(u64 *digest, const u8 *data, int blocks)
+ */
SYM_FUNC_START(sha512_transform_rorx)
# Save GPRs
push %rbx
diff --git a/arch/x86/crypto/sha512-ssse3-asm.S b/arch/x86/crypto/sha512-ssse3-asm.S
index b84c22e06c5f..cd6a0455d548 100644
--- a/arch/x86/crypto/sha512-ssse3-asm.S
+++ b/arch/x86/crypto/sha512-ssse3-asm.S
@@ -49,15 +49,10 @@

#include <linux/linkage.h>

-.text
-
# Virtual Registers
-# ARG1
-digest = %rdi
-# ARG2
-msg = %rsi
-# ARG3
-msglen = %rdx
+digest = %rdi # ARG1
+msg = %rsi # ARG2
+msglen = %rdx # ARG3
T1 = %rcx
T2 = %r8
a_64 = %r9
@@ -264,18 +259,21 @@ frame_size = frame_WK + WK_SIZE
RotateState
.endm

-########################################################################
-## void sha512_transform_ssse3(struct sha512_state *state, const u8 *data,
-## int blocks);
-# (struct sha512_state is assumed to begin with u64 state[8])
-# Purpose: Updates the SHA512 digest stored at "state" with the message
-# stored in "data".
-# The size of the message pointed to by "data" must be an integer multiple
-# of SHA512 message blocks.
-# "blocks" is the message length in SHA512 blocks.
-########################################################################
-SYM_FUNC_START(sha512_transform_ssse3)
+.text

+/**
+ * sha512_transform_ssse3 - Calculate SHA512 hash using x86 SSSE3 feature set
+ * @digest: address of current 64-byte hash value (%rdi, digest macro)
+ * @data: address of data (%rsi, msg macro);
+ * data size must be a multiple of 128 bytes
+ * @blocks: number of 128-byte blocks (%rdx, msglen macro)
+ *
+ * This function supports 64-bit CPUs.
+ *
+ * Return: none
+ * Prototype: asmlinkage void sha512_transform_ssse3(u32 *digest, const u8 *data, int blocks)
+ */
+SYM_FUNC_START(sha512_transform_ssse3)
test msglen, msglen
je nowork

--
2.38.1

2022-12-15 10:49:54

by kernel test robot

Subject: Re: [PATCH 3/8] crypto: x86/sha - add kernel-doc comments to assembly

Hi Robert,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on lwn/docs-next]
[also build test ERROR on v6.1]
[cannot apply to herbert-cryptodev-2.6/master herbert-crypto-2.6/master linus/master next-20221215]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url: https://github.com/intel-lab-lkp/linux/commits/Robert-Elliott/crypto-kernel-doc-for-assembly-language/20221215-144624
base: git://git.lwn.net/linux.git docs-next
patch link: https://lore.kernel.org/r/20221215063857.161665-4-elliott%40hpe.com
patch subject: [PATCH 3/8] crypto: x86/sha - add kernel-doc comments to assembly
config: x86_64-randconfig-a013
compiler: gcc-11 (Debian 11.3.0-8) 11.3.0
reproduce (this is a W=1 build):
# https://github.com/intel-lab-lkp/linux/commit/ae551f9c28c6734d5b3f8c35412e64c72796d9aa
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review Robert-Elliott/crypto-kernel-doc-for-assembly-language/20221215-144624
git checkout ae551f9c28c6734d5b3f8c35412e64c72796d9aa
# save the config file
mkdir build_dir && cp config build_dir/.config
make W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <[email protected]>

All errors (new ones prefixed by >>):

/tmp/ccqiXWUb.s: Assembler messages:
>> /tmp/ccqiXWUb.s: Error: invalid operands (.text and *UND* sections) for `-' when setting `.L__sym_size_sha1_ni_transform'

--
0-DAY CI Kernel Test Service
https://01.org/lkp



2022-12-15 11:30:41

by kernel test robot

Subject: Re: [PATCH 3/8] crypto: x86/sha - add kernel-doc comments to assembly

Hi Robert,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on lwn/docs-next]
[also build test ERROR on v6.1]
[cannot apply to herbert-cryptodev-2.6/master herbert-crypto-2.6/master linus/master next-20221215]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url: https://github.com/intel-lab-lkp/linux/commits/Robert-Elliott/crypto-kernel-doc-for-assembly-language/20221215-144624
base: git://git.lwn.net/linux.git docs-next
patch link: https://lore.kernel.org/r/20221215063857.161665-4-elliott%40hpe.com
patch subject: [PATCH 3/8] crypto: x86/sha - add kernel-doc comments to assembly
config: x86_64-randconfig-a011
compiler: gcc-11 (Debian 11.3.0-8) 11.3.0
reproduce (this is a W=1 build):
# https://github.com/intel-lab-lkp/linux/commit/ae551f9c28c6734d5b3f8c35412e64c72796d9aa
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review Robert-Elliott/crypto-kernel-doc-for-assembly-language/20221215-144624
git checkout ae551f9c28c6734d5b3f8c35412e64c72796d9aa
# save the config file
mkdir build_dir && cp config build_dir/.config
make W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <[email protected]>

All errors (new ones prefixed by >>):

/tmp/cc2fL2SU.s: Assembler messages:
>> /tmp/cc2fL2SU.s: Error: invalid operands (.text and *UND* sections) for `-' when setting `.L__sym_size_sha256_ni_transform'

--
0-DAY CI Kernel Test Service
https://01.org/lkp


Subject: [PATCH v2 0/8] crypto: kernel-doc for assembly language

Clean up the existing kernel-doc headers in the crypto subsystem,
then add support for kernel-doc headers in assembly language files
for functions called from C code.

This provides a place to document the assumptions made by the
assembly language functions about their arguments (e.g., how
they handle length values of 0, less than some value, not
multiples of some value, etc.).
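These assumptions are exactly what the C glue code has to uphold before calling into assembly. As a userspace sketch (the function names and the 128-byte block size here are illustrative, not taken from any particular file in the series), the caller typically derives a whole-block count and relies on the documented zero-length behavior:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static unsigned int blocks_hashed;

/* Stand-in for an assembly routine whose kernel-doc header documents
 * that data must be a whole number of 128-byte blocks and that a
 * block count of 0 is a no-op. */
static void fake_sha512_transform(uint64_t *digest, const uint8_t *data,
				  int blocks)
{
	(void)digest;
	(void)data;
	if (blocks <= 0)		/* documented: 0 blocks is tolerated */
		return;
	blocks_hashed += blocks;	/* real code would hash here */
}

/* Glue-side guard: hand only whole blocks to the assembly routine and
 * report how many tail bytes the caller still has to buffer. */
static size_t feed_whole_blocks(uint64_t *digest, const uint8_t *data,
				size_t len)
{
	size_t blocks = len / 128;

	fake_sha512_transform(digest, data, (int)blocks);
	return len % 128;
}
```

The kernel-doc headers added by this series put that contract (data size a multiple of the block size, behavior at length 0) where both sides of the call can see it.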

Not all the assembly language files are tackled yet - just some
of the x86 files pending changes related to kernel_fpu_begin/end.

Changes in patch series v2:
* rebased to upstream after Herbert's v6.2-p1 tag was merged
* made not dependent on any of the proposed FPU patches
(instead, those will assume this is in place)
* adds Documentation for the new kernel-doc Prototype line

Example man page formatted output for one of them:

$ nroff -man /tmp/man/sha1_transform_avx2.9
sha1_transform_avx2(9) Kernel Hacker's Manual sha1_transform_avx2(9)

NAME
sha1_transform_avx2 - Calculate SHA1 hash using the x86 AVX2 feature
set

SYNOPSIS
void sha1_transform_avx2 (u32 *digest , const u8 *data , int blocks );

ARGUMENTS
digest address of current 20-byte hash value (rdi, CTX macro)

data address of data (rsi, BUF macro); data size must be a mul‐
tiple of 64 bytes

blocks number of 64-byte blocks (rdx, CNT macro)

DESCRIPTION
This function supports 64-bit CPUs.

RETURN
none

PROTOTYPE
asmlinkage void sha1_transform_avx2(u32 *digest, const u8 *data, int
blocks)

December 2022 sha1_transform_avx2 sha1_transform_avx2(9)



Robert Elliott (8):
crypto: clean up kernel-doc headers
doc: support kernel-doc for asm functions
crypto: x86/sha - add kernel-doc comments to assembly
crypto: x86/crc - add kernel-doc comments to assembly
crypto: x86/sm3 - add kernel-doc comments to assembly
crypto: x86/ghash - add kernel-doc comments to assembly
crypto: x86/blake2s - add kernel-doc comments to assembly
crypto: x86/chacha - add kernel-doc comments to assembly

Documentation/doc-guide/kernel-doc.rst | 79 ++++++++++++++++
.../mips/cavium-octeon/crypto/octeon-crypto.c | 19 ++--
arch/x86/crypto/blake2s-core.S | 26 +++++
arch/x86/crypto/chacha-avx2-x86_64.S | 90 ++++++++++++------
arch/x86/crypto/chacha-avx512vl-x86_64.S | 94 ++++++++++++-------
arch/x86/crypto/chacha-ssse3-x86_64.S | 75 ++++++++++-----
arch/x86/crypto/crc32-pclmul_asm.S | 24 ++---
arch/x86/crypto/crc32c-pcl-intel-asm_64.S | 29 +++---
arch/x86/crypto/crct10dif-pcl-asm_64.S | 24 +++--
arch/x86/crypto/ghash-clmulni-intel_asm.S | 27 +++++-
arch/x86/crypto/sha1_avx2_x86_64_asm.S | 32 +++----
arch/x86/crypto/sha1_ni_asm.S | 19 ++--
arch/x86/crypto/sha1_ssse3_asm.S | 33 ++++---
arch/x86/crypto/sha256-avx-asm.S | 23 +++--
arch/x86/crypto/sha256-avx2-asm.S | 24 +++--
arch/x86/crypto/sha256-ssse3-asm.S | 25 +++--
arch/x86/crypto/sha256_ni_asm.S | 22 +++--
arch/x86/crypto/sha512-avx-asm.S | 33 +++----
arch/x86/crypto/sha512-avx2-asm.S | 34 +++----
arch/x86/crypto/sha512-ssse3-asm.S | 36 ++++---
arch/x86/crypto/sm3-avx-asm_64.S | 18 ++--
crypto/asymmetric_keys/verify_pefile.c | 2 +-
crypto/async_tx/async_pq.c | 11 +--
crypto/async_tx/async_tx.c | 4 +-
crypto/crypto_engine.c | 2 +-
include/crypto/acompress.h | 2 +-
include/crypto/des.h | 4 +-
include/crypto/if_alg.h | 26 ++---
include/crypto/internal/ecc.h | 8 +-
include/crypto/internal/rsa.h | 2 +-
include/crypto/kdf_sp800108.h | 39 ++++----
scripts/kernel-doc | 49 +++++++++-
32 files changed, 623 insertions(+), 312 deletions(-)

--
2.38.1

Subject: [PATCH v2 1/8] crypto: clean up kernel-doc headers

Fix these problems in the kernel-doc function header comments, as
reported by running:
scripts/kernel-doc -man \
$(git grep -l '/\*\*' -- :^Documentation :^tools) \
| scripts/split-man.pl /tmp/man 2> err.log
cat err.log | grep crypto | grep -v drivers

arch/mips/cavium-octeon/crypto/octeon-crypto.c:18: warning: This comment starts with '/**', but isn't a kernel-doc comment.
arch/mips/cavium-octeon/crypto/octeon-crypto.c:50: warning: This comment starts with '/**', but isn't a kernel-doc comment.
arch/mips/cavium-octeon/crypto/octeon-crypto.c:60: warning: Function parameter or member 'crypto_flags' not described in 'octeon_crypto_disable'
arch/mips/cavium-octeon/crypto/octeon-crypto.c:60: warning: Excess function parameter 'flags' description in 'octeon_crypto_disable'

crypto/asymmetric_keys/verify_pefile.c:420: warning: Function parameter or member 'trusted_keys' not described in 'verify_pefile_signature'
crypto/asymmetric_keys/verify_pefile.c:420: warning: Excess function parameter 'trust_keys' description in 'verify_pefile_signature'

crypto/async_tx/async_pq.c:19: warning: cannot understand function prototype: 'struct page *pq_scribble_page; '
crypto/async_tx/async_pq.c:40: warning: Function parameter or member 'chan' not described in 'do_async_gen_syndrome'
crypto/async_tx/async_pq.c:40: warning: Function parameter or member 'scfs' not described in 'do_async_gen_syndrome'
crypto/async_tx/async_pq.c:40: warning: Function parameter or member 'disks' not described in 'do_async_gen_syndrome'
crypto/async_tx/async_pq.c:40: warning: Function parameter or member 'unmap' not described in 'do_async_gen_syndrome'
crypto/async_tx/async_pq.c:40: warning: Function parameter or member 'dma_flags' not described in 'do_async_gen_syndrome'
crypto/async_tx/async_pq.c:40: warning: Function parameter or member 'submit' not described in 'do_async_gen_syndrome'
crypto/async_tx/async_pq.c:109: warning: Function parameter or member 'blocks' not described in 'do_sync_gen_syndrome'
crypto/async_tx/async_pq.c:109: warning: Function parameter or member 'offsets' not described in 'do_sync_gen_syndrome'
crypto/async_tx/async_pq.c:109: warning: Function parameter or member 'disks' not described in 'do_sync_gen_syndrome'
crypto/async_tx/async_pq.c:109: warning: Function parameter or member 'len' not described in 'do_sync_gen_syndrome'
crypto/async_tx/async_pq.c:109: warning: Function parameter or member 'submit' not described in 'do_sync_gen_syndrome'
crypto/async_tx/async_pq.c:302: warning: Excess function parameter 'offset' description in 'async_syndrome_val'

crypto/async_tx/async_tx.c:137: warning: cannot understand function prototype: 'enum submit_disposition '
crypto/async_tx/async_tx.c:265: warning: Function parameter or member 'tx' not described in 'async_tx_quiesce'

crypto/crypto_engine.c:514: warning: Excess function parameter 'engine' description in 'crypto_engine_alloc_init_and_set'

include/crypto/acompress.h:219: warning: Function parameter or member 'cmpl' not described in 'acomp_request_set_callback'
include/crypto/acompress.h:219: warning: Excess function parameter 'cmlp' description in 'acomp_request_set_callback'

include/crypto/des.h:43: warning: Function parameter or member 'keylen' not described in 'des_expand_key'
include/crypto/des.h:43: warning: Excess function parameter 'len' description in 'des_expand_key'
include/crypto/des.h:56: warning: Function parameter or member 'keylen' not described in 'des3_ede_expand_key'
include/crypto/des.h:56: warning: Excess function parameter 'len' description in 'des3_ede_expand_key'

include/crypto/if_alg.h:160: warning: Function parameter or member 'wait' not described in 'af_alg_ctx'
include/crypto/if_alg.h:178: warning: This comment starts with '/**', but isn't a kernel-doc comment.
include/crypto/if_alg.h:193: warning: This comment starts with '/**', but isn't a kernel-doc comment.
include/crypto/if_alg.h:204: warning: This comment starts with '/**', but isn't a kernel-doc comment.
include/crypto/if_alg.h:219: warning: This comment starts with '/**', but isn't a kernel-doc comment.
include/crypto/if_alg.h:184: warning: Function parameter or member 'sk' not described in 'af_alg_sndbuf'
include/crypto/if_alg.h:199: warning: Function parameter or member 'sk' not described in 'af_alg_writable'
include/crypto/if_alg.h:210: warning: Function parameter or member 'sk' not described in 'af_alg_rcvbuf'
include/crypto/if_alg.h:225: warning: Function parameter or member 'sk' not described in 'af_alg_readable'

include/crypto/internal/ecc.h:85: warning: Function parameter or member 'privkey' not described in 'ecc_gen_privkey'
include/crypto/internal/ecc.h:85: warning: Excess function parameter 'private_key' description in 'ecc_gen_privkey'
include/crypto/internal/ecc.h:184: warning: Function parameter or member 'right' not described in 'vli_sub'
include/crypto/internal/ecc.h:246: warning: expecting prototype for ecc_aloc_point(). Prototype was for ecc_alloc_point() instead
include/crypto/internal/ecc.h:262: warning: Function parameter or member 'point' not described in 'ecc_point_is_zero'
include/crypto/internal/ecc.h:262: warning: Excess function parameter 'p' description in 'ecc_point_is_zero'

include/crypto/internal/rsa.h:32: warning: cannot understand function prototype: 'struct rsa_key '

include/crypto/kdf_sp800108.h:34: warning: Function parameter or member 'kmd' not described in 'crypto_kdf108_ctr_generate'
include/crypto/kdf_sp800108.h:34: warning: Function parameter or member 'info' not described in 'crypto_kdf108_ctr_generate'
include/crypto/kdf_sp800108.h:34: warning: Function parameter or member 'info_nvec' not described in 'crypto_kdf108_ctr_generate'
include/crypto/kdf_sp800108.h:34: warning: Function parameter or member 'dst' not described in 'crypto_kdf108_ctr_generate'
include/crypto/kdf_sp800108.h:34: warning: Function parameter or member 'dlen' not described in 'crypto_kdf108_ctr_generate'
include/crypto/kdf_sp800108.h:34: warning: expecting prototype for Counter KDF generate operation according to SP800(). Prototype was for crypto_kdf108_ctr_generate() instead
include/crypto/kdf_sp800108.h:37: warning: This comment starts with '/**', but isn't a kernel-doc comment.
include/crypto/kdf_sp800108.h:37: warning: Function parameter or member 'info_nvec' not described in 'crypto_kdf108_ctr_generate'
include/crypto/kdf_sp800108.h:37: warning: Excess function parameter 'info_vec' description in 'crypto_kdf108_ctr_generate'
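Nearly all of these warnings reduce to two mistakes: an @name that does not match an actual parameter, or a comment that lacks the "name - summary" first line. A minimal header that kernel-doc accepts looks like this (the function itself is a throwaway example, not kernel code):

```c
#include <assert.h>

/**
 * clamp_len - Clamp a requested length to an upper bound
 * @len: length requested by the caller
 * @maxlen: upper bound to enforce
 *
 * Every @name above matches a parameter in the prototype below, and the
 * first line has the "name - summary" shape; those two points account
 * for most of the warnings listed in this patch.
 *
 * Return: @len if it does not exceed @maxlen, otherwise @maxlen.
 */
static unsigned int clamp_len(unsigned int len, unsigned int maxlen)
{
	return len > maxlen ? maxlen : len;
}
```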

Signed-off-by: Robert Elliott <[email protected]>
---
.../mips/cavium-octeon/crypto/octeon-crypto.c | 19 ++++-----
crypto/asymmetric_keys/verify_pefile.c | 2 +-
crypto/async_tx/async_pq.c | 11 +++---
crypto/async_tx/async_tx.c | 4 +-
crypto/crypto_engine.c | 2 +-
include/crypto/acompress.h | 2 +-
include/crypto/des.h | 4 +-
include/crypto/if_alg.h | 26 ++++++-------
include/crypto/internal/ecc.h | 8 ++--
include/crypto/internal/rsa.h | 2 +-
include/crypto/kdf_sp800108.h | 39 +++++++++++--------
11 files changed, 62 insertions(+), 57 deletions(-)

diff --git a/arch/mips/cavium-octeon/crypto/octeon-crypto.c b/arch/mips/cavium-octeon/crypto/octeon-crypto.c
index cfb4a146cf17..c4badbf756b5 100644
--- a/arch/mips/cavium-octeon/crypto/octeon-crypto.c
+++ b/arch/mips/cavium-octeon/crypto/octeon-crypto.c
@@ -14,10 +14,11 @@
#include "octeon-crypto.h"

/**
- * Enable access to Octeon's COP2 crypto hardware for kernel use. Wrap any
- * crypto operations in calls to octeon_crypto_enable/disable in order to make
- * sure the state of COP2 isn't corrupted if userspace is also performing
- * hardware crypto operations. Allocate the state parameter on the stack.
+ * octeon_crypto_enable - Enable access to Octeon's COP2 crypto hardware for kernel use.
+ * Wrap any crypto operations in calls to octeon_crypto_enable/disable
+ * in order to make sure the state of COP2 isn't corrupted if userspace
+ * is also performing hardware crypto operations.
+ * Allocate the state parameter on the stack.
* Returns with preemption disabled.
*
* @state: Pointer to state structure to store current COP2 state in.
@@ -46,12 +47,12 @@ unsigned long octeon_crypto_enable(struct octeon_cop2_state *state)
EXPORT_SYMBOL_GPL(octeon_crypto_enable);

/**
- * Disable access to Octeon's COP2 crypto hardware in the kernel. This must be
- * called after an octeon_crypto_enable() before any context switch or return to
- * userspace.
+ * octeon_crypto_disable - Disable access to Octeon's COP2 crypto hardware in the kernel.
+ * This must be called after an octeon_crypto_enable() before any
+ * context switch or return to userspace.
*
- * @state: Pointer to COP2 state to restore
- * @flags: Return value from octeon_crypto_enable()
+ * @state: Pointer to COP2 state to restore
+ * @crypto_flags: Return value from octeon_crypto_enable()
*/
void octeon_crypto_disable(struct octeon_cop2_state *state,
unsigned long crypto_flags)
diff --git a/crypto/asymmetric_keys/verify_pefile.c b/crypto/asymmetric_keys/verify_pefile.c
index 7553ab18db89..148cad70fe79 100644
--- a/crypto/asymmetric_keys/verify_pefile.c
+++ b/crypto/asymmetric_keys/verify_pefile.c
@@ -387,7 +387,7 @@ static int pefile_digest_pe(const void *pebuf, unsigned int pelen,
* verify_pefile_signature - Verify the signature on a PE binary image
* @pebuf: Buffer containing the PE binary image
* @pelen: Length of the binary image
- * @trust_keys: Signing certificate(s) to use as starting points
+ * @trusted_keys: Signing certificate(s) to use as starting points
* @usage: The use to which the key is being put.
*
* Validate that the certificate chain inside the PKCS#7 message inside the PE
diff --git a/crypto/async_tx/async_pq.c b/crypto/async_tx/async_pq.c
index f9cdc5e91664..c95908d70f7e 100644
--- a/crypto/async_tx/async_pq.c
+++ b/crypto/async_tx/async_pq.c
@@ -11,9 +11,8 @@
#include <linux/async_tx.h>
#include <linux/gfp.h>

-/**
- * pq_scribble_page - space to hold throwaway P or Q buffer for
- * synchronous gen_syndrome
+/*
+ * space to hold throwaway P or Q buffer for synchronous gen_syndrome
*/
static struct page *pq_scribble_page;

@@ -28,7 +27,7 @@ static struct page *pq_scribble_page;

#define MAX_DISKS 255

-/**
+/*
* do_async_gen_syndrome - asynchronously calculate P and/or Q
*/
static __async_inline struct dma_async_tx_descriptor *
@@ -100,7 +99,7 @@ do_async_gen_syndrome(struct dma_chan *chan,
return tx;
}

-/**
+/*
* do_sync_gen_syndrome - synchronously calculate a raid6 syndrome
*/
static void
@@ -281,7 +280,7 @@ pq_val_chan(struct async_submit_ctl *submit, struct page **blocks, int disks, si
/**
* async_syndrome_val - asynchronously validate a raid6 syndrome
* @blocks: source blocks from idx 0..disks-3, P @ disks-2 and Q @ disks-1
- * @offset: common offset into each block (src and dest) to start transaction
+ * @offsets: common offset into each block (src and dest) to start transaction
* @disks: number of blocks (including missing P or Q, see below)
* @len: length of operation in bytes
* @pqres: on val failure SUM_CHECK_P_RESULT and/or SUM_CHECK_Q_RESULT are set
diff --git a/crypto/async_tx/async_tx.c b/crypto/async_tx/async_tx.c
index 9256934312d7..ad72057a5e0d 100644
--- a/crypto/async_tx/async_tx.c
+++ b/crypto/async_tx/async_tx.c
@@ -124,7 +124,7 @@ async_tx_channel_switch(struct dma_async_tx_descriptor *depend_tx,


/**
- * submit_disposition - flags for routing an incoming operation
+ * enum submit_disposition - flags for routing an incoming operation
* @ASYNC_TX_SUBMITTED: we were able to append the new operation under the lock
* @ASYNC_TX_CHANNEL_SWITCH: when the lock is dropped schedule a channel switch
* @ASYNC_TX_DIRECT_SUBMIT: when the lock is dropped submit directly
@@ -258,7 +258,7 @@ EXPORT_SYMBOL_GPL(async_trigger_callback);

/**
* async_tx_quiesce - ensure tx is complete and freeable upon return
- * @tx - transaction to quiesce
+ * @tx: transaction to quiesce
*/
void async_tx_quiesce(struct dma_async_tx_descriptor **tx)
{
diff --git a/crypto/crypto_engine.c b/crypto/crypto_engine.c
index bb8e77077f02..64dc9aa3ca24 100644
--- a/crypto/crypto_engine.c
+++ b/crypto/crypto_engine.c
@@ -499,7 +499,7 @@ EXPORT_SYMBOL_GPL(crypto_engine_stop);
* This has the form:
* callback(struct crypto_engine *engine)
* where:
- * @engine: the crypto engine structure.
+ * engine: the crypto engine structure.
* @rt: whether this queue is set to run as a realtime task
* @qlen: maximum size of the crypto-engine queue
*
diff --git a/include/crypto/acompress.h b/include/crypto/acompress.h
index e4bc96528902..77870c3aeec5 100644
--- a/include/crypto/acompress.h
+++ b/include/crypto/acompress.h
@@ -209,7 +209,7 @@ void acomp_request_free(struct acomp_req *req);
*
* @req: request that the callback will be set for
* @flgs: specify for instance if the operation may backlog
- * @cmlp: callback which will be called
+ * @cmpl: callback which will be called
* @data: private data used by the caller
*/
static inline void acomp_request_set_callback(struct acomp_req *req,
diff --git a/include/crypto/des.h b/include/crypto/des.h
index 7812b4331ae4..2fcc72988843 100644
--- a/include/crypto/des.h
+++ b/include/crypto/des.h
@@ -34,7 +34,7 @@ void des3_ede_decrypt(const struct des3_ede_ctx *dctx, u8 *dst, const u8 *src);
* des_expand_key - Expand a DES input key into a key schedule
* @ctx: the key schedule
* @key: buffer containing the input key
- * @len: size of the buffer contents
+ * @keylen: size of the buffer contents
*
* Returns 0 on success, -EINVAL if the input key is rejected and -ENOKEY if
* the key is accepted but has been found to be weak.
@@ -45,7 +45,7 @@ int des_expand_key(struct des_ctx *ctx, const u8 *key, unsigned int keylen);
* des3_ede_expand_key - Expand a triple DES input key into a key schedule
* @ctx: the key schedule
* @key: buffer containing the input key
- * @len: size of the buffer contents
+ * @keylen: size of the buffer contents
*
* Returns 0 on success, -EINVAL if the input key is rejected and -ENOKEY if
* the key is accepted but has been found to be weak. Note that weak keys will
diff --git a/include/crypto/if_alg.h b/include/crypto/if_alg.h
index a5db86670bdf..da66314d9bc7 100644
--- a/include/crypto/if_alg.h
+++ b/include/crypto/if_alg.h
@@ -124,7 +124,7 @@ struct af_alg_async_req {
* @tsgl_list: Link to TX SGL
* @iv: IV for cipher operation
* @aead_assoclen: Length of AAD for AEAD cipher operations
- * @completion: Work queue for synchronous operation
+ * @wait: helper structure for async operation
* @used: TX bytes sent to kernel. This variable is used to
* ensure that user space cannot cause the kernel
* to allocate too much memory in sendmsg operation.
@@ -174,10 +174,10 @@ static inline struct alg_sock *alg_sk(struct sock *sk)
}

/**
- * Size of available buffer for sending data from user space to kernel.
+ * af_alg_sndbuf - Size of available buffer for sending data from user space to kernel.
*
- * @sk socket of connection to user space
- * @return number of bytes still available
+ * @sk: socket of connection to user space
+ * Return: number of bytes still available
*/
static inline int af_alg_sndbuf(struct sock *sk)
{
@@ -189,10 +189,10 @@ static inline int af_alg_sndbuf(struct sock *sk)
}

/**
- * Can the send buffer still be written to?
+ * af_alg_writable - Can the send buffer still be written to?
*
- * @sk socket of connection to user space
- * @return true => writable, false => not writable
+ * @sk: socket of connection to user space
+ * Return: true => writable, false => not writable
*/
static inline bool af_alg_writable(struct sock *sk)
{
@@ -200,10 +200,10 @@ static inline bool af_alg_writable(struct sock *sk)
}

/**
- * Size of available buffer used by kernel for the RX user space operation.
+ * af_alg_rcvbuf - Size of available buffer used by kernel for the RX user space operation.
*
- * @sk socket of connection to user space
- * @return number of bytes still available
+ * @sk: socket of connection to user space
+ * Return: number of bytes still available
*/
static inline int af_alg_rcvbuf(struct sock *sk)
{
@@ -215,10 +215,10 @@ static inline int af_alg_rcvbuf(struct sock *sk)
}

/**
- * Can the RX buffer still be written to?
+ * af_alg_readable - Can the RX buffer still be written to?
*
- * @sk socket of connection to user space
- * @return true => writable, false => not writable
+ * @sk: socket of connection to user space
+ * Return: true => writable, false => not writable
*/
static inline bool af_alg_readable(struct sock *sk)
{
diff --git a/include/crypto/internal/ecc.h b/include/crypto/internal/ecc.h
index 4f6c1a68882f..4b8155fea03c 100644
--- a/include/crypto/internal/ecc.h
+++ b/include/crypto/internal/ecc.h
@@ -76,7 +76,7 @@ int ecc_is_key_valid(unsigned int curve_id, unsigned int ndigits,
* point G.
* @curve_id: id representing the curve to use
* @ndigits: curve number of digits
- * @private_key: buffer for storing the generated private key
+ * @privkey: buffer for storing the generated private key
*
* Returns 0 if the private key was generated successfully, a negative value
* if an error occurred.
@@ -172,7 +172,7 @@ int vli_cmp(const u64 *left, const u64 *right, unsigned int ndigits);
*
* @result: where to write result
* @left: vli
- * @right vli
+ * @right: vli
* @ndigits: length of all vlis
*
* Note: can modify in-place.
@@ -236,7 +236,7 @@ void vli_mod_mult_slow(u64 *result, const u64 *left, const u64 *right,
unsigned int vli_num_bits(const u64 *vli, unsigned int ndigits);

/**
- * ecc_aloc_point() - Allocate ECC point.
+ * ecc_alloc_point() - Allocate ECC point.
*
* @ndigits: Length of vlis in u64 qwords.
*
@@ -254,7 +254,7 @@ void ecc_free_point(struct ecc_point *p);
/**
* ecc_point_is_zero() - Check if point is zero.
*
- * @p: Point to check for zero.
+ * @point: Point to check for zero.
*
* Return: true if point is the point at infinity, false otherwise.
*/
diff --git a/include/crypto/internal/rsa.h b/include/crypto/internal/rsa.h
index e870133f4b77..78a7544aaa11 100644
--- a/include/crypto/internal/rsa.h
+++ b/include/crypto/internal/rsa.h
@@ -10,7 +10,7 @@
#include <linux/types.h>

/**
- * rsa_key - RSA key structure
+ * struct rsa_key - RSA key structure
* @n : RSA modulus raw byte stream
* @e : RSA public exponent raw byte stream
* @d : RSA private exponent raw byte stream
diff --git a/include/crypto/kdf_sp800108.h b/include/crypto/kdf_sp800108.h
index b7b20a778fb7..1c16343cd3fd 100644
--- a/include/crypto/kdf_sp800108.h
+++ b/include/crypto/kdf_sp800108.h
@@ -11,17 +11,20 @@
#include <linux/uio.h>

/**
- * Counter KDF generate operation according to SP800-108 section 5.1
- * as well as SP800-56A section 5.8.1 (Single-step KDF).
+ * crypto_kdf108_ctr_generate - Counter KDF generate operation
+ * according to SP800-108 section 5.1
+ * as well as SP800-56A section 5.8.1
+ * (Single-step KDF).
*
- * @kmd Keyed message digest whose key was set with crypto_kdf108_setkey or
- * unkeyed message digest
- * @info optional context and application specific information - this may be
- * NULL
- * @info_vec number of optional context/application specific information entries
- * @dst destination buffer that the caller already allocated
- * @dlen length of the destination buffer - the KDF derives that amount of
- * bytes.
+ * @kmd: Keyed message digest whose key was set with
+ * crypto_kdf108_setkey or unkeyed message digest
+ * @info: optional context and application specific information -
+ * this may be NULL
+ * @info_nvec: number of optional context/application specific
+ * information entries
+ * @dst: destination buffer that the caller already allocated
+ * @dlen: length of the destination buffer -
+ * the KDF derives that amount of bytes.
*
* To comply with SP800-108, the caller must provide Label || 0x00 || Context
* in the info parameter.
@@ -33,14 +36,16 @@ int crypto_kdf108_ctr_generate(struct crypto_shash *kmd,
u8 *dst, unsigned int dlen);

/**
- * Counter KDF setkey operation
+ * crypto_kdf108_setkey - Counter KDF setkey operation
*
- * @kmd Keyed message digest allocated by the caller. The key should not have
- * been set.
- * @key Seed key to be used to initialize the keyed message digest context.
- * @keylen This length of the key buffer.
- * @ikm The SP800-108 KDF does not support IKM - this parameter must be NULL
- * @ikmlen This parameter must be 0.
+ * @kmd: Keyed message digest allocated by the caller.
+ * The key should not have been set.
+ * @key: Seed key to be used to initialize the
+ * keyed message digest context.
+ * @keylen: The length of the key buffer.
+ * @ikm: The SP800-108 KDF does not support IKM -
+ * this parameter must be NULL
+ * @ikmlen: This parameter must be 0.
*
* According to SP800-108 section 7.2, the seed key must be at least as large as
* the message digest size of the used keyed message digest. This limitation
--
2.38.1

Subject: [PATCH v2 2/8] doc: support kernel-doc for asm functions

Support kernel-doc comments in assembly language files for functions
called by C functions.

The comment must include a line containing:
* Prototype: asmlinkage ... rest of C prototype...

and that function name must match the name used in a line like:
SYM_FUNC_START(name)

or
SOMETHING name

which is used in a few places in which SYM_FUNC_START is nested.
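The name-matching rule can be pictured with a small check. This is a sketch in C for illustration only (scripts/kernel-doc itself is Perl); it assumes a single-word return type and handles only the plain SYM_FUNC_START() form:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Extract the function name from a "Prototype:" comment line and from a
 * SYM_FUNC_START() line, and check that they agree. */
static int names_match(const char *proto_line, const char *sym_line)
{
	char proto_name[64] = "", sym_name[64] = "";

	/* " * Prototype: asmlinkage <type> name(args...)" */
	sscanf(proto_line, " * Prototype: asmlinkage %*s %63[^(]", proto_name);
	/* "SYM_FUNC_START(name)" */
	sscanf(sym_line, "SYM_FUNC_START(%63[^)])", sym_name);
	return proto_name[0] && strcmp(proto_name, sym_name) == 0;
}
```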

Signed-off-by: Robert Elliott <[email protected]>

---
v2 Add documentation for the new kernel-doc Prototype line.
Rebased onto 6.1.0.
Support new SYM_TYPED_FUNC_START macro.
---
Documentation/doc-guide/kernel-doc.rst | 79 ++++++++++++++++++++++++++
scripts/kernel-doc | 49 +++++++++++++++-
2 files changed, 126 insertions(+), 2 deletions(-)

diff --git a/Documentation/doc-guide/kernel-doc.rst b/Documentation/doc-guide/kernel-doc.rst
index 1dcbd7332476..554694a15586 100644
--- a/Documentation/doc-guide/kernel-doc.rst
+++ b/Documentation/doc-guide/kernel-doc.rst
@@ -93,6 +93,9 @@ The brief description following the function name may span multiple lines, and
ends with an argument description, a blank comment line, or the end of the
comment block.

+This may also be used to describe a function in an assembly language file,
+provided that a Prototype line is also present (see below).
+
Function parameters
~~~~~~~~~~~~~~~~~~~

@@ -171,6 +174,82 @@ named ``Return``.
as a new section heading, which probably won't produce the desired
effect.

+Prototypes
+~~~~~~~~~~
+
+In assembly language files (.S files), functions callable by
+C code are defined with::
+
+ SYM_FUNC_START(function_name)
+ assembly language code ...
+
+This does not list the arguments like a C function definition; that
+information is implicit in the assembly language instructions that follow.
+
+To document that usage and how the function should be referenced by
+C code, include the recommended Prototype like this::
+
+ /**
+ * crc_pcl - Calculate CRC32C using x86 CRC32 and PCLMULQDQ instructions
+ * @buffer: address of data (%rdi, bufp macro)
+ * @len: data size (%rsi, len macro)
+ * @crc_init: initial CRC32C value (%rdx, crc_init_arg macro);
+ * only using lower 32 bits
+ *
+ * This function supports 64-bit CPUs.
+ * It loops on 8-byte aligned QWORDs, but also supports unaligned
+ * addresses and all length values.
+ *
+ * Return: CRC32C value (upper 32 bits zero)(%rax)
+ * Prototype: asmlinkage unsigned int crc_pcl(const u8 *buffer,
+ * unsigned int len,
+ * unsigned int crc_init);
+ */
+ SYM_FUNC_START(crc_pcl)
+ assembly language code ...
+
+scripts/kernel-doc ensures that the arguments match those in the
+prototype and that the function name matches everywhere.
+
+Variants of SYM_FUNC_START like SYM_TYPED_FUNC_START and
+SYM_FUNC_START_WEAK are also supported.
+
+In a few cases, a macro is defined that contains the SYM_FUNC_START()
+macro and code. scripts/kernel-doc recognizes that format as well::
+
+ .macro SHA1_VECTOR_ASM name
+ SYM_FUNC_START(\name)
+ assembly language code ...
+
+ /**
+ * sha1_transform_ssse3 - Calculate SHA1 hash using the x86 SSSE3 feature set
+ * @digest: address of current 20-byte hash value (%rdi, CTX macro)
+ * @data: address of data (%rsi, BUF macro);
+ * data size must be a multiple of 64 bytes
+ * @blocks: number of 64-byte blocks (%rdx, CNT macro)
+ *
+ * This function supports 64-bit CPUs.
+ *
+ * Return: none
+ * Prototype: asmlinkage void sha1_transform_ssse3(u32 *digest, const u8 *data, int blocks)
+ */
+ SHA1_VECTOR_ASM sha1_transform_ssse3
+
+ /**
+ * sha1_transform_avx - Calculate SHA1 hash using the x86 AVX feature set
+ * @digest: address of current 20-byte hash value (%rdi, CTX macro)
+ * @data: address of data (%rsi, BUF macro);
+ * data size must be a multiple of 64 bytes
+ * @blocks: number of 64-byte blocks (%rdx, CNT macro)
+ *
+ * This function supports 64-bit CPUs.
+ *
+ * Return: none
+ * Prototype: asmlinkage void sha1_transform_avx(u32 *digest, const u8 *data, int blocks)
+ */
+ SHA1_VECTOR_ASM sha1_transform_avx
+
+
Structure, union, and enumeration documentation
-----------------------------------------------

diff --git a/scripts/kernel-doc b/scripts/kernel-doc
index 54b0893cae66..e23591d3c78c 100755
--- a/scripts/kernel-doc
+++ b/scripts/kernel-doc
@@ -174,6 +174,7 @@ my %nosymbol_table = ();
my $declaration_start_line;
my ($type, $declaration_name, $return_type);
my ($newsection, $newcontents, $prototype, $brcount, %source_map);
+my %asmprototypes;

if (defined($ENV{'KBUILD_VERBOSE'})) {
$verbose = "$ENV{'KBUILD_VERBOSE'}";
@@ -248,7 +249,7 @@ my $doc_decl = $doc_com . '(\w+)';
# while trying to not match literal block starts like "example::"
#
my $doc_sect = $doc_com .
- '\s*(\@[.\w]+|\@\.\.\.|description|context|returns?|notes?|examples?)\s*:([^:].*)?$';
+ '\s*(\@[.\w]+|\@\.\.\.|description|context|returns?|prototype|notes?|examples?)\s*:([^:].*)?$';
my $doc_content = $doc_com_body . '(.*)';
my $doc_block = $doc_com . 'DOC:\s*(.*)?';
my $doc_inline_start = '^\s*/\*\*\s*$';
@@ -278,6 +279,7 @@ my $section_intro = "Introduction";
my $section = $section_default;
my $section_context = "Context";
my $section_return = "Return";
+my $section_asmprototype = "Prototype";

my $undescribed = "-- undescribed --";

@@ -469,6 +471,13 @@ sub dump_section {
$new_start_line = 0;
}
}
+
+ if ($name eq $section_asmprototype) {
+ # extract the function name for future matching to SYM.*FUNC_START.*(name)
+ # since that doesn't include arguments like a C function call
+ my ($func) = ($contents =~ /^.*\s+(\S+)\(/);
+ $asmprototypes{$func} = $contents;
+ }
}

##
@@ -1865,9 +1874,32 @@ sub syscall_munge() {
sub process_proto_function($$) {
my $x = shift;
my $file = shift;
+ my $funcname;

$x =~ s@\/\/.*$@@gos; # strip C99-style comments to end of line

+ # support asm functions declared with one of these starting in
+ # the first column:
+ # SYM_FUNC_START(name)
+ # SYM_FUNC_START_LOCAL(name)
+ # SYM_FUNC_START_WEAK(name)
+ # SYM_TYPED_FUNC_START(name)
+ # or for nested macros:
+ # SOMESTRING<whitespace>name
+ if ($file =~ /\.S$/) {
+ if ($x =~ /^SYM.*FUNC_START/) {
+ ($funcname) = ($x =~ /^SYM.*FUNC_START.*\((.*)\)/);
+ } elsif ($x =~ /^[A-Za-z0-9_]+\s+[A-Za-z0-9_]+/) {
+ ($funcname) = ($x =~ /^[A-Za-z0-9_]+\s+([A-Za-z0-9_]+)/);
+ }
+ }
+ if (defined $funcname) {
+ $prototype = $asmprototypes{$funcname};
+ dump_function($asmprototypes{$funcname}, $file);
+ reset_state();
+ return;
+ }
+
if ($x =~ m#\s*/\*\s+MACDOC\s*#io || ($x =~ /^#/ && $x !~ /^#\s*define/)) {
# do nothing
}
@@ -2106,6 +2138,8 @@ sub process_body($$) {
$newsection = $section_default;
} elsif ($newsection =~ m/^context$/i) {
$newsection = $section_context;
+ } elsif ($newsection =~ m/^prototype$/i) {
+ $newsection = $section_asmprototype;
} elsif ($newsection =~ m/^returns?$/i) {
$newsection = $section_return;
} elsif ($newsection =~ m/^\@return$/) {
@@ -2156,6 +2190,16 @@ sub process_body($$) {
$contents = "";
$new_start_line = $.;
$state = STATE_BODY;
+ } elsif ($section eq $section_asmprototype) {
+ my ($protoline) = /Prototype:\s+(.+)$/;
+ my ($funcname) = ($protoline =~ /^.*\s+(\S+)\(/);
+
+ $asmprototypes{$funcname} = $protoline;
+ dump_section($file, $section, $contents);
+ $section = $section_default;
+ $contents = "";
+ $new_start_line = $.;
+ $state = STATE_BODY;
} else {
if ($section ne $section_default) {
$state = STATE_BODY_WITH_BLANK_LINE;
@@ -2171,7 +2215,7 @@ sub process_body($$) {
$declaration_purpose =~ s/\s+/ /g;
} else {
my $cont = $1;
- if ($section =~ m/^@/ || $section eq $section_context) {
+ if ($section =~ m/^@/ || $section eq $section_context || $section eq $section_asmprototype) {
if (!defined $leading_space) {
if ($cont =~ m/^(\s+)/) {
$leading_space = $1;
@@ -2307,6 +2351,7 @@ sub process_file($) {
}
# Replace tabs by spaces
while ($_ =~ s/\t+/' ' x (length($&) * 8 - length($`) % 8)/e) {};
+
# Hand this line to the appropriate state handler
if ($state == STATE_NORMAL) {
process_normal();
--
2.38.1
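
[Editorial note, not part of the patch: the companion step above — pulling
the function name out of a recorded ``Prototype:`` section so it can later
be matched against SYM_FUNC_START — uses the Perl pattern
``/^.*\s+(\S+)\(/``, i.e. the last whitespace-delimited token before the
opening parenthesis. A hedged Python sketch of that extraction, using a
prototype from the documentation example:]

```python
import re

def prototype_func_name(contents: str):
    """Extract the function name from a 'Prototype:' section body:
    the last whitespace-delimited token before the opening '('."""
    m = re.match(r'.*\s+(\S+)\(', contents)
    return m.group(1) if m else None

proto = ("asmlinkage unsigned int crc_pcl(const u8 *buffer, "
         "unsigned int len, unsigned int crc_init);")
print(prototype_func_name(proto))  # crc_pcl
```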

Subject: [PATCH v2 3/8] crypto: x86/sha - add kernel-doc comments to assembly

Add kernel-doc comments for assembly language functions exported to
C glue code.

Remove .align directives that are overridden by SYM_FUNC_START
(which includes .align 4).

Signed-off-by: Robert Elliott <[email protected]>
---
arch/x86/crypto/sha1_avx2_x86_64_asm.S | 32 +++++++++++------------
arch/x86/crypto/sha1_ni_asm.S | 19 ++++++++------
arch/x86/crypto/sha1_ssse3_asm.S | 33 +++++++++++++++--------
arch/x86/crypto/sha256-avx-asm.S | 23 ++++++++++------
arch/x86/crypto/sha256-avx2-asm.S | 24 +++++++++++------
arch/x86/crypto/sha256-ssse3-asm.S | 25 +++++++++++-------
arch/x86/crypto/sha256_ni_asm.S | 22 +++++++++-------
arch/x86/crypto/sha512-avx-asm.S | 33 +++++++++++------------
arch/x86/crypto/sha512-avx2-asm.S | 34 ++++++++++++------------
arch/x86/crypto/sha512-ssse3-asm.S | 36 ++++++++++++--------------
10 files changed, 159 insertions(+), 122 deletions(-)

diff --git a/arch/x86/crypto/sha1_avx2_x86_64_asm.S b/arch/x86/crypto/sha1_avx2_x86_64_asm.S
index a96b2fd26dab..c3ee9334cb0f 100644
--- a/arch/x86/crypto/sha1_avx2_x86_64_asm.S
+++ b/arch/x86/crypto/sha1_avx2_x86_64_asm.S
@@ -62,11 +62,6 @@
*Visit http://software.intel.com/en-us/articles/
*and refer to improving-the-performance-of-the-secure-hash-algorithm-1/
*
- *Updates 20-byte SHA-1 record at start of 'state', from 'input', for
- *even number of 'blocks' consecutive 64-byte blocks.
- *
- *extern "C" void sha1_transform_avx2(
- * struct sha1_state *state, const u8* input, int blocks );
*/

#include <linux/linkage.h>
@@ -629,13 +624,22 @@ _loop3:
_end:

.endm
-/*
- * macro implements SHA-1 function's body for several 64-byte blocks
- * param: function's name
- */
-.macro SHA1_VECTOR_ASM name
- SYM_FUNC_START(\name)

+.text
+
+/**
+ * sha1_transform_avx2 - Calculate SHA1 hash using the x86 AVX2 feature set
+ * @digest: address of current 20-byte hash value (%rdi, CTX macro)
+ * @data: address of data (%rsi, BUF macro);
+ * data size must be a multiple of 64 bytes
+ * @blocks: number of 64-byte blocks (%rdx, CNT macro)
+ *
+ * This function supports 64-bit CPUs.
+ *
+ * Return: none
+ * Prototype: asmlinkage void sha1_transform_avx2(u32 *digest, const u8 *data, int blocks)
+ */
+SYM_FUNC_START(sha1_transform_avx2)
push %rbx
push %r12
push %r13
@@ -675,9 +679,7 @@ _loop3:
pop %rbx

RET
-
- SYM_FUNC_END(\name)
-.endm
+SYM_FUNC_END(sha1_transform_avx2)

.section .rodata

@@ -706,6 +708,4 @@ BSWAP_SHUFB_CTL:
.long 0x04050607
.long 0x08090a0b
.long 0x0c0d0e0f
-.text

-SHA1_VECTOR_ASM sha1_transform_avx2
diff --git a/arch/x86/crypto/sha1_ni_asm.S b/arch/x86/crypto/sha1_ni_asm.S
index cade913d4882..a69595b033c8 100644
--- a/arch/x86/crypto/sha1_ni_asm.S
+++ b/arch/x86/crypto/sha1_ni_asm.S
@@ -72,9 +72,16 @@
#define MSG3 %xmm6
#define SHUF_MASK %xmm7

+.text

-/*
- * Intel SHA Extensions optimized implementation of a SHA-1 update function
+/**
+ * sha1_ni_transform - Calculate SHA1 hash using the x86 SHA-NI feature set
+ * @digest: address of current 20-byte hash value (%rdi, DIGEST_PTR macro)
+ * @data: address of data (%rsi, DATA_PTR macro);
+ * data size must be a multiple of 64 bytes
+ * @blocks: number of 64-byte blocks (%rdx, NUM_BLKS macro)
+ *
+ * This function supports 64-bit CPUs.
*
* The function takes a pointer to the current hash values, a pointer to the
* input data, and a number of 64 byte blocks to process. Once all blocks have
@@ -86,13 +93,9 @@
* The indented lines in the loop are instructions related to rounds processing.
* The non-indented lines are instructions related to the message schedule.
*
- * void sha1_ni_transform(uint32_t *digest, const void *data,
- uint32_t numBlocks)
- * digest : pointer to digest
- * data: pointer to input data
- * numBlocks: Number of blocks to process
+ * Return: none
+ * Prototype: asmlinkage void sha1_ni_transform(u32 *digest, const u8 *data, int blocks)
*/
-.text
SYM_TYPED_FUNC_START(sha1_ni_transform)
push %rbp
mov %rsp, %rbp
diff --git a/arch/x86/crypto/sha1_ssse3_asm.S b/arch/x86/crypto/sha1_ssse3_asm.S
index f54988c80eb4..1472fe35dfae 100644
--- a/arch/x86/crypto/sha1_ssse3_asm.S
+++ b/arch/x86/crypto/sha1_ssse3_asm.S
@@ -451,20 +451,24 @@ BSWAP_SHUFB_CTL:
.long 0x0c0d0e0f


-.section .text
-
W_PRECALC_SSSE3
.macro xmm_mov a, b
movdqu \a,\b
.endm

-/*
- * SSSE3 optimized implementation:
+.text
+
+/**
+ * sha1_transform_ssse3 - Calculate SHA1 hash using the x86 SSSE3 feature set
+ * @digest: address of current 20-byte hash value (%rdi, CTX macro)
+ * @data: address of data (%rsi, BUF macro);
+ * data size must be a multiple of 64 bytes
+ * @blocks: number of 64-byte blocks (%rdx, CNT macro)
*
- * extern "C" void sha1_transform_ssse3(struct sha1_state *state,
- * const u8 *data, int blocks);
+ * This function supports 64-bit CPUs.
*
- * Note that struct sha1_state is assumed to begin with u32 state[5].
+ * Return: none
+ * Prototype: asmlinkage void sha1_transform_ssse3(u32 *digest, const u8 *data, int blocks)
*/
SHA1_VECTOR_ASM sha1_transform_ssse3

@@ -546,9 +550,16 @@ W_PRECALC_AVX
vmovdqu \a,\b
.endm

-
-/* AVX optimized implementation:
- * extern "C" void sha1_transform_avx(struct sha1_state *state,
- * const u8 *data, int blocks);
+/**
+ * sha1_transform_avx - Calculate SHA1 hash using the x86 AVX feature set
+ * @digest: address of current 20-byte hash value (%rdi, CTX macro)
+ * @data: address of data (%rsi, BUF macro);
+ * data size must be a multiple of 64 bytes
+ * @blocks: number of 64-byte blocks (%rdx, CNT macro)
+ *
+ * This function supports 64-bit CPUs.
+ *
+ * Return: none
+ * Prototype: asmlinkage void sha1_transform_avx(u32 *digest, const u8 *data, int blocks)
*/
SHA1_VECTOR_ASM sha1_transform_avx
diff --git a/arch/x86/crypto/sha256-avx-asm.S b/arch/x86/crypto/sha256-avx-asm.S
index 5555b5d5215a..73615c5bbb5f 100644
--- a/arch/x86/crypto/sha256-avx-asm.S
+++ b/arch/x86/crypto/sha256-avx-asm.S
@@ -95,9 +95,9 @@ SHUF_00BA = %xmm10 # shuffle xBxA -> 00BA
SHUF_DC00 = %xmm12 # shuffle xDxC -> DC00
BYTE_FLIP_MASK = %xmm13

-NUM_BLKS = %rdx # 3rd arg
-INP = %rsi # 2nd arg
CTX = %rdi # 1st arg
+INP = %rsi # 2nd arg
+NUM_BLKS = %rdx # 3rd arg

SRND = %rsi # clobbers INP
c = %ecx
@@ -340,13 +340,20 @@ a = TMP_
ROTATE_ARGS
.endm

-########################################################################
-## void sha256_transform_avx(state sha256_state *state, const u8 *data, int blocks)
-## arg 1 : pointer to state
-## arg 2 : pointer to input data
-## arg 3 : Num blocks
-########################################################################
.text
+
+/**
+ * sha256_transform_avx - Calculate SHA256 hash using the x86 AVX feature set
+ * @digest: address of current 32-byte hash value (%rdi, CTX macro)
+ * @data: address of data (%rsi, INP macro);
+ * data size must be a multiple of 64 bytes
+ * @blocks: number of 64-byte blocks (%rdx, NUM_BLKS macro)
+ *
+ * This function supports 64-bit CPUs.
+ *
+ * Return: none
+ * Prototype: asmlinkage void sha256_transform_avx(u32 *digest, const u8 *data, int blocks)
+ */
SYM_TYPED_FUNC_START(sha256_transform_avx)
pushq %rbx
pushq %r12
diff --git a/arch/x86/crypto/sha256-avx2-asm.S b/arch/x86/crypto/sha256-avx2-asm.S
index 3eada9416852..73cbe0322e7d 100644
--- a/arch/x86/crypto/sha256-avx2-asm.S
+++ b/arch/x86/crypto/sha256-avx2-asm.S
@@ -90,9 +90,9 @@ BYTE_FLIP_MASK = %ymm13

X_BYTE_FLIP_MASK = %xmm13 # XMM version of BYTE_FLIP_MASK

-NUM_BLKS = %rdx # 3rd arg
-INP = %rsi # 2nd arg
CTX = %rdi # 1st arg
+INP = %rsi # 2nd arg
+NUM_BLKS = %rdx # 3rd arg
c = %ecx
d = %r8d
e = %edx # clobbers NUM_BLKS
@@ -517,13 +517,21 @@ STACK_SIZE = _CTX + _CTX_SIZE

.endm

-########################################################################
-## void sha256_transform_rorx(struct sha256_state *state, const u8 *data, int blocks)
-## arg 1 : pointer to state
-## arg 2 : pointer to input data
-## arg 3 : Num blocks
-########################################################################
.text
+
+/**
+ * sha256_transform_rorx - Calculate SHA256 hash using the x86 AVX2 feature set
+ * including the RORX (rotate right logical without affecting flags) instruction
+ * @digest: address of current 32-byte hash value (%rdi, CTX macro)
+ * @data: address of data (%rsi, INP macro);
+ * data size must be a multiple of 64 bytes
+ * @blocks: number of 64-byte blocks (%rdx, NUM_BLKS macro)
+ *
+ * This function supports 64-bit CPUs.
+ *
+ * Return: none
+ * Prototype: asmlinkage void sha256_transform_rorx(u32 *digest, const u8 *data, int blocks)
+ */
SYM_TYPED_FUNC_START(sha256_transform_rorx)
pushq %rbx
pushq %r12
diff --git a/arch/x86/crypto/sha256-ssse3-asm.S b/arch/x86/crypto/sha256-ssse3-asm.S
index 959288eecc68..1b2a5bb405a7 100644
--- a/arch/x86/crypto/sha256-ssse3-asm.S
+++ b/arch/x86/crypto/sha256-ssse3-asm.S
@@ -88,9 +88,9 @@ SHUF_00BA = %xmm10 # shuffle xBxA -> 00BA
SHUF_DC00 = %xmm11 # shuffle xDxC -> DC00
BYTE_FLIP_MASK = %xmm12

-NUM_BLKS = %rdx # 3rd arg
-INP = %rsi # 2nd arg
CTX = %rdi # 1st arg
+INP = %rsi # 2nd arg
+NUM_BLKS = %rdx # 3rd arg

SRND = %rsi # clobbers INP
c = %ecx
@@ -347,15 +347,20 @@ a = TMP_
ROTATE_ARGS
.endm

-########################################################################
-## void sha256_transform_ssse3(struct sha256_state *state, const u8 *data,
-## int blocks);
-## arg 1 : pointer to state
-## (struct sha256_state is assumed to begin with u32 state[8])
-## arg 2 : pointer to input data
-## arg 3 : Num blocks
-########################################################################
.text
+
+/**
+ * sha256_transform_ssse3 - Calculate SHA256 hash using the x86 SSSE3 feature set
+ * @digest: address of current 32-byte hash value (%rdi, CTX macro)
+ * @data: address of data (%rsi, INP macro);
+ * data size must be a multiple of 64 bytes
+ * @blocks: number of 64-byte blocks (%rdx, NUM_BLKS macro)
+ *
+ * This function supports 64-bit CPUs.
+ *
+ * Return: none
+ * Prototype: asmlinkage void sha256_transform_ssse3(u32 *digest, const u8 *data, int blocks)
+ */
SYM_TYPED_FUNC_START(sha256_transform_ssse3)
pushq %rbx
pushq %r12
diff --git a/arch/x86/crypto/sha256_ni_asm.S b/arch/x86/crypto/sha256_ni_asm.S
index 537b6dcd7ed8..e7a3b9939327 100644
--- a/arch/x86/crypto/sha256_ni_asm.S
+++ b/arch/x86/crypto/sha256_ni_asm.S
@@ -76,8 +76,16 @@
#define ABEF_SAVE %xmm9
#define CDGH_SAVE %xmm10

-/*
- * Intel SHA Extensions optimized implementation of a SHA-256 update function
+.text
+
+/**
+ * sha256_ni_transform - Calculate SHA256 hash using the x86 SHA-NI feature set
+ * @digest: address of current 32-byte hash value (%rdi, DIGEST_PTR macro)
+ * @data: address of data (%rsi, DATA_PTR macro);
+ * data size must be a multiple of 64 bytes
+ * @blocks: number of 64-byte blocks (%rdx, NUM_BLKS macro)
+ *
+ * This function supports 64-bit CPUs.
*
* The function takes a pointer to the current hash values, a pointer to the
* input data, and a number of 64 byte blocks to process. Once all blocks have
@@ -89,16 +97,10 @@
* The indented lines in the loop are instructions related to rounds processing.
* The non-indented lines are instructions related to the message schedule.
*
- * void sha256_ni_transform(uint32_t *digest, const void *data,
- uint32_t numBlocks);
- * digest : pointer to digest
- * data: pointer to input data
- * numBlocks: Number of blocks to process
+ * Return: none
+ * Prototype: asmlinkage void sha256_ni_transform(u32 *digest, const u8 *data, int blocks)
*/
-
-.text
SYM_TYPED_FUNC_START(sha256_ni_transform)
-
shl $6, NUM_BLKS /* convert to bytes */
jz .Ldone_hash
add DATA_PTR, NUM_BLKS /* pointer to end of data */
diff --git a/arch/x86/crypto/sha512-avx-asm.S b/arch/x86/crypto/sha512-avx-asm.S
index b0984f19fdb4..958e355915d0 100644
--- a/arch/x86/crypto/sha512-avx-asm.S
+++ b/arch/x86/crypto/sha512-avx-asm.S
@@ -50,15 +50,10 @@
#include <linux/linkage.h>
#include <linux/cfi_types.h>

-.text
-
# Virtual Registers
-# ARG1
-digest = %rdi
-# ARG2
-msg = %rsi
-# ARG3
-msglen = %rdx
+digest = %rdi # ARG1
+msg = %rsi # ARG2
+msglen = %rdx # ARG3
T1 = %rcx
T2 = %r8
a_64 = %r9
@@ -266,14 +261,20 @@ frame_size = frame_WK + WK_SIZE
RotateState
.endm

-########################################################################
-# void sha512_transform_avx(sha512_state *state, const u8 *data, int blocks)
-# Purpose: Updates the SHA512 digest stored at "state" with the message
-# stored in "data".
-# The size of the message pointed to by "data" must be an integer multiple
-# of SHA512 message blocks.
-# "blocks" is the message length in SHA512 blocks
-########################################################################
+.text
+
+/**
+ * sha512_transform_avx - Calculate SHA512 hash using the x86 AVX feature set
+ * @digest: address of current 64-byte hash value (%rdi, digest macro)
+ * @data: address of data (%rsi, msg macro);
+ * data size must be a multiple of 128 bytes
+ * @blocks: number of 128-byte blocks (%rdx, msglen macro)
+ *
+ * This function supports 64-bit CPUs.
+ *
+ * Return: none
+ * Prototype: asmlinkage void sha512_transform_avx(u32 *digest, const u8 *data, int blocks)
+ */
SYM_TYPED_FUNC_START(sha512_transform_avx)
test msglen, msglen
je nowork
diff --git a/arch/x86/crypto/sha512-avx2-asm.S b/arch/x86/crypto/sha512-avx2-asm.S
index b1ca99055ef9..e896b2f54120 100644
--- a/arch/x86/crypto/sha512-avx2-asm.S
+++ b/arch/x86/crypto/sha512-avx2-asm.S
@@ -52,8 +52,6 @@
#include <linux/linkage.h>
#include <linux/cfi_types.h>

-.text
-
# Virtual Registers
Y_0 = %ymm4
Y_1 = %ymm5
@@ -69,13 +67,10 @@ XFER = YTMP0

BYTE_FLIP_MASK = %ymm9

-# 1st arg is %rdi, which is saved to the stack and accessed later via %r12
-CTX1 = %rdi
+CTX1 = %rdi # 1st arg, which is saved to the stack and accessed later via %r12
CTX2 = %r12
-# 2nd arg
-INP = %rsi
-# 3rd arg
-NUM_BLKS = %rdx
+INP = %rsi # 2nd arg
+NUM_BLKS = %rdx # 3rd arg

c = %rcx
d = %r8
@@ -558,14 +553,21 @@ frame_size = frame_CTX + CTX_SIZE

.endm

-########################################################################
-# void sha512_transform_rorx(sha512_state *state, const u8 *data, int blocks)
-# Purpose: Updates the SHA512 digest stored at "state" with the message
-# stored in "data".
-# The size of the message pointed to by "data" must be an integer multiple
-# of SHA512 message blocks.
-# "blocks" is the message length in SHA512 blocks
-########################################################################
+.text
+
+/**
+ * sha512_transform_rorx - Calculate SHA512 hash using the x86 AVX2 feature set
+ * including the RORX (rotate right logical without affecting flags) instruction
+ * @digest: address of 64-byte hash value (%rdi, CTX1 macro)
+ * @data: address of data (%rsi, INP macro);
+ * data size must be a multiple of 128 bytes
+ * @blocks: number of 128-byte blocks (%rdx, NUM_BLKS macro)
+ *
+ * This function supports 64-bit CPUs.
+ *
+ * Return: none
+ * Prototype: asmlinkage void sha512_transform_rorx(u32 *digest, const u8 *data, int blocks)
+ */
SYM_TYPED_FUNC_START(sha512_transform_rorx)
# Save GPRs
push %rbx
diff --git a/arch/x86/crypto/sha512-ssse3-asm.S b/arch/x86/crypto/sha512-ssse3-asm.S
index c06afb5270e5..3bbf164e07fb 100644
--- a/arch/x86/crypto/sha512-ssse3-asm.S
+++ b/arch/x86/crypto/sha512-ssse3-asm.S
@@ -50,15 +50,10 @@
#include <linux/linkage.h>
#include <linux/cfi_types.h>

-.text
-
# Virtual Registers
-# ARG1
-digest = %rdi
-# ARG2
-msg = %rsi
-# ARG3
-msglen = %rdx
+digest = %rdi # ARG1
+msg = %rsi # ARG2
+msglen = %rdx # ARG3
T1 = %rcx
T2 = %r8
a_64 = %r9
@@ -265,18 +260,21 @@ frame_size = frame_WK + WK_SIZE
RotateState
.endm

-########################################################################
-## void sha512_transform_ssse3(struct sha512_state *state, const u8 *data,
-## int blocks);
-# (struct sha512_state is assumed to begin with u64 state[8])
-# Purpose: Updates the SHA512 digest stored at "state" with the message
-# stored in "data".
-# The size of the message pointed to by "data" must be an integer multiple
-# of SHA512 message blocks.
-# "blocks" is the message length in SHA512 blocks.
-########################################################################
-SYM_TYPED_FUNC_START(sha512_transform_ssse3)
+.text

+/**
+ * sha512_transform_ssse3 - Calculate SHA512 hash using x86 SSSE3 feature set
+ * @digest: address of current 64-byte hash value (%rdi, digest macro)
+ * @data: address of data (%rsi, msg macro);
+ * data size must be a multiple of 128 bytes
+ * @blocks: number of 128-byte blocks (%rdx, msglen macro)
+ *
+ * This function supports 64-bit CPUs.
+ *
+ * Return: none
+ * Prototype: asmlinkage void sha512_transform_ssse3(u32 *digest, const u8 *data, int blocks)
+ */
+SYM_TYPED_FUNC_START(sha512_transform_ssse3)
test msglen, msglen
je nowork

--
2.38.1

Subject: [PATCH v2 5/8] crypto: x86/sm3 - add kernel-doc comments to assembly

Add kernel-doc comments for assembly language functions exported to
C glue code.

Signed-off-by: Robert Elliott <[email protected]>
---
arch/x86/crypto/sm3-avx-asm_64.S | 18 +++++++++---------
1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/arch/x86/crypto/sm3-avx-asm_64.S b/arch/x86/crypto/sm3-avx-asm_64.S
index 503bab450a91..30e51c681c07 100644
--- a/arch/x86/crypto/sm3-avx-asm_64.S
+++ b/arch/x86/crypto/sm3-avx-asm_64.S
@@ -322,18 +322,18 @@

.text

-/*
- * Transform nblocks*64 bytes (nblocks*16 32-bit words) at DATA.
+/**
+ * sm3_transform_avx - Calculate SM3 hash using x86 AVX feature set
+ * @state: address of 32-byte context (%rdi, RSTATE macro)
+ * @data: address of data (%rsi, RDATA macro);
+ * must be at least 64 bytes and a multiple of 64 bytes
+ * @nblocks: number of 64-byte blocks (%rdx, RNBLKS macro);
+ * must be >= 1
*
- * void sm3_transform_avx(struct sm3_state *state,
- * const u8 *data, int nblocks);
+ * Return: none. However, the @state buffer is updated.
+ * Prototype: asmlinkage void sm3_transform_avx(u32 *state, const u8 *data, int nblocks);
*/
SYM_TYPED_FUNC_START(sm3_transform_avx)
- /* input:
- * %rdi: ctx, CTX
- * %rsi: data (64*nblks bytes)
- * %rdx: nblocks
- */
vzeroupper;

pushq %rbp;
--
2.38.1

Subject: [PATCH v2 6/8] crypto: x86/ghash - add kernel-doc comments to assembly

Add kernel-doc comments for assembly language functions exported to
C glue code.

Signed-off-by: Robert Elliott <[email protected]>
---
arch/x86/crypto/ghash-clmulni-intel_asm.S | 27 +++++++++++++++++++----
1 file changed, 23 insertions(+), 4 deletions(-)

diff --git a/arch/x86/crypto/ghash-clmulni-intel_asm.S b/arch/x86/crypto/ghash-clmulni-intel_asm.S
index 2bf871899920..09cf9271b83a 100644
--- a/arch/x86/crypto/ghash-clmulni-intel_asm.S
+++ b/arch/x86/crypto/ghash-clmulni-intel_asm.S
@@ -88,7 +88,16 @@ SYM_FUNC_START_LOCAL(__clmul_gf128mul_ble)
RET
SYM_FUNC_END(__clmul_gf128mul_ble)

-/* void clmul_ghash_mul(char *dst, const u128 *shash) */
+/**
+ * clmul_ghash_mul - Calculate GHASH final multiplication using x86 PCLMULQDQ instructions
+ * @dst: address of hash value to update (%rdi)
+ * @shash: address of hash context (%rsi)
+ *
+ * This function supports 64-bit CPUs.
+ *
+ * Return: none (but @dst is updated)
+ * Prototype: asmlinkage void clmul_ghash_mul(char *dst, const u128 *shash)
+ */
SYM_FUNC_START(clmul_ghash_mul)
FRAME_BEGIN
movups (%rdi), DATA
@@ -102,9 +111,19 @@ SYM_FUNC_START(clmul_ghash_mul)
RET
SYM_FUNC_END(clmul_ghash_mul)

-/*
- * void clmul_ghash_update(char *dst, const char *src, unsigned int srclen,
- * const u128 *shash);
+/**
+ * clmul_ghash_update - Calculate GHASH using x86 PCLMULQDQ instructions
+ * @dst: address of hash value to update (%rdi)
+ * @src: address of data to hash (%rsi)
+ * @srclen: number of bytes in data buffer (%rdx);
+ * function does nothing and returns if below 16
+ * @shash: address of hash context (%rcx)
+ *
+ * This function supports 64-bit CPUs.
+ *
+ * Return: none (but @dst is updated)
+ * Prototype: asmlinkage void clmul_ghash_update(char *dst, const char *src,
+ * unsigned int srclen, const u128 *shash);
*/
SYM_FUNC_START(clmul_ghash_update)
FRAME_BEGIN
--
2.38.1

Subject: [PATCH v2 4/8] crypto: x86/crc - add kernel-doc comments to assembly

Add kernel-doc comments for assembly language functions exported to
C glue code.

Signed-off-by: Robert Elliott <[email protected]>
---
arch/x86/crypto/crc32-pclmul_asm.S | 24 ++++++++++---------
arch/x86/crypto/crc32c-pcl-intel-asm_64.S | 29 ++++++++++++++---------
arch/x86/crypto/crct10dif-pcl-asm_64.S | 24 +++++++++++++------
3 files changed, 48 insertions(+), 29 deletions(-)

diff --git a/arch/x86/crypto/crc32-pclmul_asm.S b/arch/x86/crypto/crc32-pclmul_asm.S
index ca53e96996ac..f704b2067a80 100644
--- a/arch/x86/crypto/crc32-pclmul_asm.S
+++ b/arch/x86/crypto/crc32-pclmul_asm.S
@@ -17,7 +17,6 @@

#include <linux/linkage.h>

-
.section .rodata
.align 16
/*
@@ -67,19 +66,22 @@
#define CRC %ecx
#endif

-
-
.text
/**
- * Calculate crc32
- * BUF - buffer (16 bytes aligned)
- * LEN - sizeof buffer (16 bytes aligned), LEN should be grater than 63
- * CRC - initial crc32
- * return %eax crc32
- * uint crc32_pclmul_le_16(unsigned char const *buffer,
- * size_t len, uint crc32)
+ * crc32_pclmul_le_16 - Calculate CRC32 using x86 PCLMULQDQ instructions
+ * @buffer: address of data (32-bit %eax/64-bit %rdi, BUF macro);
+ * must be aligned to a multiple of 16
+ * @len: data size (32-bit %edx/64 bit %rsi, LEN macro);
+ * must be a multiple of 16 and greater than 63
+ * @crc32: initial CRC32 value (32-bit %ecx/64-bit %edx, CRC macro);
+ * only uses lower 32 bits
+ *
+ * This function supports both 32-bit and 64-bit CPUs.
+ * It requires data to be aligned and a minimum size.
+ *
+ * Return: (32-bit %eax/64-bit %rax) CRC32 value (in lower 32 bits)
+ * Prototype: asmlinkage u32 crc32_pclmul_le_16(const u8 *buffer, size_t len, u32 crc32);
*/
-
SYM_FUNC_START(crc32_pclmul_le_16) /* buffer and buffer size are 16 bytes aligned */
movdqa (BUF), %xmm1
movdqa 0x10(BUF), %xmm2
diff --git a/arch/x86/crypto/crc32c-pcl-intel-asm_64.S b/arch/x86/crypto/crc32c-pcl-intel-asm_64.S
index ec35915f0901..3d646011d84b 100644
--- a/arch/x86/crypto/crc32c-pcl-intel-asm_64.S
+++ b/arch/x86/crypto/crc32c-pcl-intel-asm_64.S
@@ -70,22 +70,30 @@
.error "SMALL_ SIZE must be < 256"
.endif

-# unsigned int crc_pcl(u8 *buffer, int len, unsigned int crc_init);
-
.text
+/**
+ * crc_pcl - Calculate CRC32C using x86 CRC32 and PCLMULQDQ instructions
+ * @buffer: address of data (%rdi, bufp macro)
+ * @len: data size (%rsi, len macro)
+ * @crc_init: initial CRC32C value (%rdx, crc_init_arg macro);
+ * only using lower 32 bits
+ *
+ * This function supports 64-bit CPUs.
+ * It loops on 8-byte aligned QWORDs, but also supports unaligned
+ * addresses and all length values.
+ *
+ * Return: CRC32C value (upper 32 bits zero)(%rax)
+ * Prototype: asmlinkage unsigned int crc_pcl(const u8 *buffer, unsigned int len,
+ *                    unsigned int crc_init);
+ */
SYM_FUNC_START(crc_pcl)
#define bufp rdi
-#define bufp_dw %edi
-#define bufp_w %di
-#define bufp_b %dil
#define bufptmp %rcx
#define block_0 %rcx
#define block_1 %rdx
#define block_2 %r11
#define len %rsi
#define len_dw %esi
-#define len_w %si
-#define len_b %sil
#define crc_init_arg %rdx
#define tmp %rbx
#define crc_init %r8
@@ -97,7 +105,7 @@ SYM_FUNC_START(crc_pcl)
pushq %rdi
pushq %rsi

- ## Move crc_init for Linux to a different
+ ## Move crc_init for Linux to a different register
mov crc_init_arg, crc_init

################################################################
@@ -216,7 +224,7 @@ LABEL crc_ %i
## 4) Combine three results:
################################################################

- lea (K_table-8)(%rip), %bufp # first entry is for idx 1
+ lea (K_table-8)(%rip), %bufp # first entry is for idx 1
shlq $3, %rax # rax *= 8
pmovzxdq (%bufp,%rax), %xmm0 # 2 consts: K1:K2
leal (%eax,%eax,2), %eax # rax *= 3 (total *24)
@@ -326,10 +334,9 @@ JMPTBL_ENTRY %i
i=i+1
.endr

-
################################################################
## PCLMULQDQ tables
- ## Table is 128 entries x 2 words (8 bytes) each
+ ## Table is 128 entries x 8 bytes each
################################################################
.align 8
K_table:
diff --git a/arch/x86/crypto/crct10dif-pcl-asm_64.S b/arch/x86/crypto/crct10dif-pcl-asm_64.S
index 5286db5b8165..6903713f7e1b 100644
--- a/arch/x86/crypto/crct10dif-pcl-asm_64.S
+++ b/arch/x86/crypto/crct10dif-pcl-asm_64.S
@@ -52,8 +52,6 @@

#include <linux/linkage.h>

-.text
-
#define init_crc %edi
#define buf %rsi
#define len %rdx
@@ -89,11 +87,23 @@
xorps \src_reg, \dst_reg
.endm

-#
-# u16 crc_t10dif_pcl(u16 init_crc, const *u8 buf, size_t len);
-#
-# Assumes len >= 16.
-#
+.text
+/**
+ * crc_t10dif_pcl - Calculate CRC16 per T10 DIF (data integrity format)
+ * using x86 PCLMULQDQ instructions
+ * @init_crc: initial CRC16 value (%rdi, init_crc macro);
+ * only uses lower 16 bits
+ * @buf: address of data (%rsi, buf macro);
+ * data buffer must be at least 16 bytes
+ * @len: data size (%rdx, len macro);
+ * must be >= 16
+ *
+ * This function supports 64-bit CPUs.
+ * It allows data to be at any offset.
+ *
+ * Return: (%rax) CRC16 value (upper 48 bits zero)
+ * Prototype: asmlinkage u16 crc_t10dif_pcl(u16 init_crc, const u8 *buf, size_t len);
+ */
SYM_FUNC_START(crc_t10dif_pcl)

movdqa .Lbswap_mask(%rip), BSWAP_MASK
--
2.38.1

Subject: [PATCH v2 8/8] crypto: x86/chacha - add kernel-doc comments to assembly

Add kernel-doc comments for assembly language functions exported to
C glue code.

Signed-off-by: Robert Elliott <[email protected]>
---
arch/x86/crypto/chacha-avx2-x86_64.S | 90 +++++++++++++++--------
arch/x86/crypto/chacha-avx512vl-x86_64.S | 94 +++++++++++++++---------
arch/x86/crypto/chacha-ssse3-x86_64.S | 75 ++++++++++++-------
3 files changed, 170 insertions(+), 89 deletions(-)

diff --git a/arch/x86/crypto/chacha-avx2-x86_64.S b/arch/x86/crypto/chacha-avx2-x86_64.S
index f3d8fc018249..5ebced6f32c3 100644
--- a/arch/x86/crypto/chacha-avx2-x86_64.S
+++ b/arch/x86/crypto/chacha-avx2-x86_64.S
@@ -34,18 +34,26 @@ CTR4BL: .octa 0x00000000000000000000000000000002

.text

+/**
+ * chacha_2block_xor_avx2 - Encrypt 2 blocks using the x86 AVX2 feature set
+ * @state: address of input state matrix, s (%rdi)
+ * @dst: address of up to 2 data blocks output, o (%rsi)
+ * @src: address of up to 2 data blocks input, i (%rdx)
+ * @len: input/output length in bytes (%rcx)
+ * @nrounds: number of rounds (%r8d)
+ *
+ * This function supports 64-bit CPUs.
+ *
+ * This function encrypts two ChaCha blocks by loading the state
+ * matrix twice across four AVX registers. It performs matrix operations
+ * on four words in each matrix in parallel, but requires shuffling to
+ * rearrange the words after each round.
+ *
+ * Return: none
+ * Prototype: asmlinkage void chacha_2block_xor_avx2(u32 *state, u8 *dst, const u8 *src,
+ * unsigned int len, int nrounds);
+ */
SYM_FUNC_START(chacha_2block_xor_avx2)
- # %rdi: Input state matrix, s
- # %rsi: up to 2 data blocks output, o
- # %rdx: up to 2 data blocks input, i
- # %rcx: input/output length in bytes
- # %r8d: nrounds
-
- # This function encrypts two ChaCha blocks by loading the state
- # matrix twice across four AVX registers. It performs matrix operations
- # on four words in each matrix in parallel, but requires shuffling to
- # rearrange the words after each round.
-
vzeroupper

# x0..3[0-2] = s0..3
@@ -226,20 +234,28 @@ SYM_FUNC_START(chacha_2block_xor_avx2)

SYM_FUNC_END(chacha_2block_xor_avx2)

+/**
+ * chacha_4block_xor_avx2 - Encrypt 4 blocks using the x86 AVX2 feature set
+ * @state: address of input state matrix, s (%rdi)
+ * @dst: address of up to 4 data blocks output, o (%rsi)
+ * @src: address of up to 4 data blocks input, i (%rdx)
+ * @len: input/output length in bytes (%rcx)
+ * @nrounds: number of rounds (%r8d)
+ *
+ * This function supports 64-bit CPUs.
+ *
+ * This function encrypts four ChaCha blocks by loading the state
+ * matrix four times across eight AVX registers. It performs matrix
+ * operations on four words in two matrices in parallel, sequentially
+ * to the operations on the four words of the other two matrices. The
+ * required word shuffling has a rather high latency, but we can do the
+ * arithmetic on two matrix-pairs without much slowdown.
+ *
+ * Return: none
+ * Prototype: asmlinkage void chacha_4block_xor_avx2(u32 *state, u8 *dst, const u8 *src,
+ * unsigned int len, int nrounds);
+ */
SYM_FUNC_START(chacha_4block_xor_avx2)
- # %rdi: Input state matrix, s
- # %rsi: up to 4 data blocks output, o
- # %rdx: up to 4 data blocks input, i
- # %rcx: input/output length in bytes
- # %r8d: nrounds
-
- # This function encrypts four ChaCha blocks by loading the state
- # matrix four times across eight AVX registers. It performs matrix
- # operations on four words in two matrices in parallel, sequentially
- # to the operations on the four words of the other two matrices. The
- # required word shuffling has a rather high latency, we can do the
- # arithmetic on two matrix-pairs without much slowdown.
-
vzeroupper

# x0..3[0-4] = s0..3
@@ -531,12 +547,28 @@ SYM_FUNC_START(chacha_4block_xor_avx2)

SYM_FUNC_END(chacha_4block_xor_avx2)

+/**
+ * chacha_8block_xor_avx2 - Encrypt 8 blocks using the x86 AVX2 feature set
+ * @state: address of input state matrix, s (%rdi)
+ * @dst: address of up to 8 data blocks output, o (%rsi)
+ * @src: address of up to 8 data blocks input, i (%rdx)
+ * @len: input/output length in bytes (%rcx)
+ * @nrounds: number of rounds (%r8d)
+ *
+ * This function supports 64-bit CPUs.
+ *
+ * This function encrypts eight consecutive ChaCha blocks by loading
+ * the state matrix in AVX registers eight times.
+ *
+ * Return: none
+ * Prototype: asmlinkage void chacha_8block_xor_avx2(u32 *state, u8 *dst, const u8 *src,
+ * unsigned int len, int nrounds);
+ */
SYM_FUNC_START(chacha_8block_xor_avx2)
- # %rdi: Input state matrix, s
- # %rsi: up to 8 data blocks output, o
- # %rdx: up to 8 data blocks input, i
- # %rcx: input/output length in bytes
- # %r8d: nrounds

# This function encrypts eight consecutive ChaCha blocks by loading
# the state matrix in AVX registers eight times. As we need some
diff --git a/arch/x86/crypto/chacha-avx512vl-x86_64.S b/arch/x86/crypto/chacha-avx512vl-x86_64.S
index 259383e1ad44..b4a85365e164 100644
--- a/arch/x86/crypto/chacha-avx512vl-x86_64.S
+++ b/arch/x86/crypto/chacha-avx512vl-x86_64.S
@@ -24,18 +24,26 @@ CTR8BL: .octa 0x00000003000000020000000100000000

.text

+/**
+ * chacha_2block_xor_avx512vl - Encrypt 2 blocks using the x86 AVX512VL feature set
+ * @state: address of input state matrix, s (%rdi)
+ * @dst: address of up to 2 data blocks output, o (%rsi)
+ * @src: address of up to 2 data blocks input, i (%rdx)
+ * @len: input/output length in bytes (%rcx)
+ * @nrounds: number of rounds (%r8d)
+ *
+ * This function supports 64-bit CPUs.
+ *
+ * This function encrypts two ChaCha blocks by loading the state
+ * matrix twice across four AVX registers. It performs matrix operations
+ * on four words in each matrix in parallel, but requires shuffling to
+ * rearrange the words after each round.
+ *
+ * Return: none
+ * Prototype: asmlinkage void chacha_2block_xor_avx512vl(u32 *state, u8 *dst, const u8 *src,
+ * unsigned int len, int nrounds);
+ */
SYM_FUNC_START(chacha_2block_xor_avx512vl)
- # %rdi: Input state matrix, s
- # %rsi: up to 2 data blocks output, o
- # %rdx: up to 2 data blocks input, i
- # %rcx: input/output length in bytes
- # %r8d: nrounds
-
- # This function encrypts two ChaCha blocks by loading the state
- # matrix twice across four AVX registers. It performs matrix operations
- # on four words in each matrix in parallel, but requires shuffling to
- # rearrange the words after each round.
-
vzeroupper

# x0..3[0-2] = s0..3
@@ -189,20 +197,28 @@ SYM_FUNC_START(chacha_2block_xor_avx512vl)

SYM_FUNC_END(chacha_2block_xor_avx512vl)

+/**
+ * chacha_4block_xor_avx512vl - Encrypt 4 blocks using the x86 AVX512VL feature set
+ * @state: address of input state matrix, s (%rdi)
+ * @dst: address of up to 4 data blocks output, o (%rsi)
+ * @src: address of up to 4 data blocks input, i (%rdx)
+ * @len: input/output length in bytes (%rcx)
+ * @nrounds: number of rounds (%r8d)
+ *
+ * This function supports 64-bit CPUs.
+ *
+ * This function encrypts four ChaCha blocks by loading the state
+ * matrix four times across eight AVX registers. It performs matrix
+ * operations on four words in two matrices in parallel, sequentially
+ * to the operations on the four words of the other two matrices. The
+ * required word shuffling has a rather high latency, but we can do the
+ * arithmetic on two matrix-pairs without much slowdown.
+ *
+ * Return: none
+ * Prototype: asmlinkage void chacha_4block_xor_avx512vl(u32 *state, u8 *dst, const u8 *src,
+ * unsigned int len, int nrounds);
+ */
SYM_FUNC_START(chacha_4block_xor_avx512vl)
- # %rdi: Input state matrix, s
- # %rsi: up to 4 data blocks output, o
- # %rdx: up to 4 data blocks input, i
- # %rcx: input/output length in bytes
- # %r8d: nrounds
-
- # This function encrypts four ChaCha blocks by loading the state
- # matrix four times across eight AVX registers. It performs matrix
- # operations on four words in two matrices in parallel, sequentially
- # to the operations on the four words of the other two matrices. The
- # required word shuffling has a rather high latency, we can do the
- # arithmetic on two matrix-pairs without much slowdown.
-
vzeroupper

# x0..3[0-4] = s0..3
@@ -455,18 +471,26 @@ SYM_FUNC_START(chacha_4block_xor_avx512vl)

SYM_FUNC_END(chacha_4block_xor_avx512vl)

+/**
+ * chacha_8block_xor_avx512vl - Encrypt 8 blocks using the x86 AVX512VL feature set
+ * @state: address of input state matrix, s (%rdi)
+ * @dst: address of up to 8 data blocks output, o (%rsi)
+ * @src: address of up to 8 data blocks input, i (%rdx)
+ * @len: input/output length in bytes (%rcx)
+ * @nrounds: number of rounds (%r8d)
+ *
+ * This function supports 64-bit CPUs.
+ *
+ * This function encrypts eight consecutive ChaCha blocks by loading
+ * the state matrix in AVX registers eight times. Compared to AVX2, this
+ * mostly benefits from the new rotate instructions in VL and the
+ * additional registers.
+ *
+ * Return: none
+ * Prototype: asmlinkage void chacha_8block_xor_avx512vl(u32 *state, u8 *dst, const u8 *src,
+ * unsigned int len, int nrounds);
+ */
SYM_FUNC_START(chacha_8block_xor_avx512vl)
- # %rdi: Input state matrix, s
- # %rsi: up to 8 data blocks output, o
- # %rdx: up to 8 data blocks input, i
- # %rcx: input/output length in bytes
- # %r8d: nrounds
-
- # This function encrypts eight consecutive ChaCha blocks by loading
- # the state matrix in AVX registers eight times. Compared to AVX2, this
- # mostly benefits from the new rotate instructions in VL and the
- # additional registers.
-
vzeroupper

# x0..15[0-7] = s[0..15]
diff --git a/arch/x86/crypto/chacha-ssse3-x86_64.S b/arch/x86/crypto/chacha-ssse3-x86_64.S
index 7111949cd5b9..6f5395ba54ab 100644
--- a/arch/x86/crypto/chacha-ssse3-x86_64.S
+++ b/arch/x86/crypto/chacha-ssse3-x86_64.S
@@ -34,7 +34,6 @@ CTRINC: .octa 0x00000003000000020000000100000000
* Clobbers: %r8d, %xmm4-%xmm7
*/
SYM_FUNC_START_LOCAL(chacha_permute)
-
movdqa ROT8(%rip),%xmm4
movdqa ROT16(%rip),%xmm5

@@ -111,12 +110,21 @@ SYM_FUNC_START_LOCAL(chacha_permute)
RET
SYM_FUNC_END(chacha_permute)

+/**
+ * chacha_block_xor_ssse3 - Encrypt 1 block using the x86 SSSE3 feature set
+ * @state: address of input state matrix, s (%rdi)
+ * @dst: address of up to 1 data block output, o (%rsi)
+ * @src: address of up to 1 data block input, i (%rdx)
+ * @len: input/output length in bytes (%rcx)
+ * @nrounds: number of rounds (%r8d)
+ *
+ * This function supports 64-bit CPUs.
+ *
+ * Return: none
+ * Prototype: asmlinkage void chacha_block_xor_ssse3(u32 *state, u8 *dst, const u8 *src,
+ * unsigned int len, int nrounds);
+ */
SYM_FUNC_START(chacha_block_xor_ssse3)
- # %rdi: Input state matrix, s
- # %rsi: up to 1 data block output, o
- # %rdx: up to 1 data block input, i
- # %rcx: input/output length in bytes
- # %r8d: nrounds
FRAME_BEGIN

# x0..3 = s0..3
@@ -199,10 +207,19 @@ SYM_FUNC_START(chacha_block_xor_ssse3)

SYM_FUNC_END(chacha_block_xor_ssse3)

+/**
+ * hchacha_block_ssse3 - Calculate the HChaCha block function using the x86 SSSE3 feature set
+ * @state: address of input state matrix, s (%rdi)
+ * @out: address of output (8 32-bit words) (%rsi)
+ * @nrounds: number of rounds (%edx);
+ * only uses lower 32 bits
+ *
+ * This function supports 64-bit CPUs.
+ *
+ * Return: none
+ * Prototype: asmlinkage void hchacha_block_ssse3(const u32 *state, u32 *out, int nrounds);
+ */
SYM_FUNC_START(hchacha_block_ssse3)
- # %rdi: Input state matrix, s
- # %rsi: output (8 32-bit words)
- # %edx: nrounds
FRAME_BEGIN

movdqu 0x00(%rdi),%xmm0
@@ -220,23 +237,31 @@ SYM_FUNC_START(hchacha_block_ssse3)
RET
SYM_FUNC_END(hchacha_block_ssse3)

+/**
+ * chacha_4block_xor_ssse3 - Encrypt 4 blocks using the x86 SSSE3 feature set
+ * @state: address of input state matrix, s (%rdi)
+ * @dst: address of up to 4 data blocks output, o (%rsi)
+ * @src: address of up to 4 data blocks input, i (%rdx)
+ * @len: input/output length in bytes (%rcx)
+ * @nrounds: number of rounds (%r8d)
+ *
+ * This function supports 64-bit CPUs.
+ *
+ * This function encrypts four consecutive ChaCha blocks by loading the
+ * state matrix in SSE registers four times. As we need some scratch
+ * registers, we save the first four registers on the stack. The
+ * algorithm performs each operation on the corresponding word of each
+ * state matrix, hence requires no word shuffling. For final XORing step
+ * we transpose the matrix by interleaving 32- and then 64-bit words,
+ * which allows us to do XOR in SSE registers. 8/16-bit word rotation is
+ * done with the slightly better performing SSSE3 byte shuffling,
+ * 7/12-bit word rotation uses traditional shift+OR.
+ *
+ * Return: none
+ * Prototype: asmlinkage void chacha_4block_xor_ssse3(u32 *state, u8 *dst, const u8 *src,
+ * unsigned int len, int nrounds);
+ */
SYM_FUNC_START(chacha_4block_xor_ssse3)
- # %rdi: Input state matrix, s
- # %rsi: up to 4 data blocks output, o
- # %rdx: up to 4 data blocks input, i
- # %rcx: input/output length in bytes
- # %r8d: nrounds
-
- # This function encrypts four consecutive ChaCha blocks by loading the
- # the state matrix in SSE registers four times. As we need some scratch
- # registers, we save the first four registers on the stack. The
- # algorithm performs each operation on the corresponding word of each
- # state matrix, hence requires no word shuffling. For final XORing step
- # we transpose the matrix by interleaving 32- and then 64-bit words,
- # which allows us to do XOR in SSE registers. 8/16-bit word rotation is
- # done with the slightly better performing SSSE3 byte shuffling,
- # 7/12-bit word rotation uses traditional shift+OR.
-
lea 8(%rsp),%r10
sub $0x80,%rsp
and $~63,%rsp
--
2.38.1

Subject: [PATCH v2 7/8] crypto: x86/blake2s - add kernel-doc comments to assembly

Add kernel-doc comments for assembly language functions exported to
C glue code.

Signed-off-by: Robert Elliott <[email protected]>
---
arch/x86/crypto/blake2s-core.S | 26 ++++++++++++++++++++++++++
1 file changed, 26 insertions(+)

diff --git a/arch/x86/crypto/blake2s-core.S b/arch/x86/crypto/blake2s-core.S
index b50b35ff1fdb..7605e7d94fd2 100644
--- a/arch/x86/crypto/blake2s-core.S
+++ b/arch/x86/crypto/blake2s-core.S
@@ -46,6 +46,19 @@ SIGMA2:
#endif /* CONFIG_AS_AVX512 */

.text
+/**
+ * blake2s_compress_ssse3 - Calculate BLAKE2s hash using the x86 SSSE3 feature set
+ * @state: pointer to 48-byte state (%rdi)
+ * @data: pointer to data (%rsi)
+ * @nblocks: number of 64-byte blocks of data (%rdx)
+ * @inc: counter increment value (%rcx)
+ *
+ * This function supports 64-bit CPUs.
+ *
+ * Return: none (but modifies state)
+ * Prototype: asmlinkage void blake2s_compress_ssse3(struct blake2s_state *state, const u8 *data,
+ * unsigned int nblocks, u32 inc);
+ */
SYM_FUNC_START(blake2s_compress_ssse3)
testq %rdx,%rdx
je .Lendofloop
@@ -175,6 +188,19 @@ SYM_FUNC_START(blake2s_compress_ssse3)
SYM_FUNC_END(blake2s_compress_ssse3)

#ifdef CONFIG_AS_AVX512
+/**
+ * blake2s_compress_avx512 - Calculate BLAKE2s hash using the x86 AVX-512VL feature set
+ * @state: address of 48-byte state (%rdi)
+ * @data: address of data (%rsi)
+ * @nblocks: number of 64-byte blocks of data (%rdx)
+ * @inc: counter increment value (%rcx)
+ *
+ * This function supports 64-bit CPUs.
+ *
+ * Return: none (but modifies state)
+ * Prototype: asmlinkage void blake2s_compress_avx512(struct blake2s_state *state, const u8 *data,
+ * unsigned int nblocks, u32 inc);
+ */
SYM_FUNC_START(blake2s_compress_avx512)
vmovdqu (%rdi),%xmm0
vmovdqu 0x10(%rdi),%xmm1
--
2.38.1

2022-12-20 06:03:19

by Eric Biggers

Subject: Re: [PATCH v2 6/8] crypto: x86/ghash - add kernel-doc comments to assembly

On Mon, Dec 19, 2022 at 12:55:53PM -0600, Robert Elliott wrote:
> +/**
> + * clmul_ghash_mul - Calculate GHASH final multiplication using x86 PCLMULQDQ instructions

Well, it does one multiplication. It's not necessarily the final one.

> + * @dst: address of hash value to update (%rdi)
> + * @shash: address of hash context (%rsi)

This terminology is confusing. I would call these the accumulator and key,
respectively.

> +/**
> + * clmul_ghash_update - Calculate GHASH using x86 PCLMULQDQ instructions
> + * @dst: address of hash value to update (%rdi)
> + * @src: address of data to hash (%rsi)
> + * @srclen: number of bytes in data buffer (%rdx);
> + * function does nothing and returns if below 16
> + * @shash: address of hash context (%rcx)
> + *
> + * This supports 64-bit CPUs.
> + *
> + * Return: none (but @dst is updated)
> + * Prototype: asmlinkage clmul_ghash_update(char *dst, const char *src,
> + * unsigned int srclen, const u128 *shash);

"function does nothing and returns if below 16" =>
"function processes round_down(srclen, 16) bytes".

- Eric

2022-12-20 06:19:10

by Eric Biggers

Subject: Re: [PATCH v2 8/8] crypto: x86/chacha - add kernel-doc comments to assembly

On Mon, Dec 19, 2022 at 12:55:55PM -0600, Robert Elliott wrote:
> +/**
> + * chacha_2block_xor_avx2 - Encrypt 2 blocks using the x86 AVX2 feature set
> + * @state: address of input state matrix, s (%rdi)
> + * @dst: address of up to 2 data blocks output, o (%rsi)
> + * @src: address of up to 2 data blocks input, i (%rdx)
> + * @len: input/output length in bytes (%rcx)
> + * @nrounds: number of rounds (%r8d)
> + *
> + * This function encrypts two ChaCha blocks by loading the state
> + * matrix twice across four AVX registers. It performs matrix operations
> + * on four words in each matrix in parallel, but requires shuffling to
> + * rearrange the words after each round.

2 blocks, or up to 2 blocks? What does that mean?

> + *
> + * Return: none
> + * Prototype: asmlinkage void chacha_2block_xor_avx2(u32 *state, u8 *dst, const u8 *src,
> + * unsigned int len, int nrounds);

When the return type is void, there is no need to write "Return: none".

- Eric