2022-09-19 12:13:42

by Yang Shen

Subject: [RFC PATCH 0/6] crypto: benchmark - add the crypto benchmark

Add a crypto benchmark: a tool that helps users quickly measure the
performance of an algorithm registered with the crypto subsystem.

The tool drives every algorithm class through the same set of callbacks,
so the test flow is unified while each algorithm class can still do its
own private work inside the callbacks. Users see one common set of
configuration parameters rather than a separate set for each algorithm.

The tool lets users measure algorithm performance in specific scenarios.
At present, the following parameters are configurable: block size, block
number, thread number, NUMA binding and the number of requests per tfm.
These parameters help users approximate their business scenarios.

For the RFC version, the compression benchmark test is supported.
I did some verification on Kunpeng920.

The first test case uses the zlib-deflate software algorithm on a CPU
running at 2.6 GHz. It shows the influence of these parameters.

The configuration is as follows:
run set: algorithm zlib-deflate, algtype CRYPTO_COMPRESS, inputsize 1024,
loop 1, numamask 0x0, optype 0, reqnum 1, threadnum 1, time 1.
The result is:
Crypto benchmark result:
throughput pps time
150 MB/s 150 kPP/s 1000 ms

Then change the block size:
run set: algorithm zlib-deflate, algtype CRYPTO_COMPRESS, inputsize 8192,
loop 1, numamask 0x0, optype 0, reqnum 1, threadnum 1, time 1.
Crypto benchmark result:
throughput pps time
473 MB/s 59 kPP/s 1005 ms

run set: algorithm zlib-deflate, algtype CRYPTO_COMPRESS, inputsize 65536,
loop 1, numamask 0x0, optype 0, reqnum 1, threadnum 1, time 1.
Crypto benchmark result:
throughput pps time
421 MB/s 6 kPP/s 1038 ms

The test shows that on this server both throughput and pps depend on the
block size. Throughput peaks at a certain block size, while pps decreases
as the block size increases.
Because this is a software algorithm, the result scales linearly with the
thread number as long as it is below the number of CPUs, and the other
parameters have little influence on performance.
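
The two result columns are tied together by the block size. As a rough
cross-check, the sketch below mirrors the integer arithmetic that
crypto_bm_show_perf() in patch 2/6 uses, where MB means 2^20 bytes and kPP
means 2^10 requests. The function names and the request count in the usage
note are illustrative assumptions, not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Throughput in MB/s from a request count, a block size in bytes and an
 * elapsed time in nanoseconds. 10^9 / 2^20 is about 953.67; the patch
 * truncates it to 953, and so does this sketch.
 */
uint64_t bm_throughput_mbs(uint64_t reqsum, uint32_t inputsize, uint64_t time_ns)
{
	assert(time_ns != 0);
	return reqsum * inputsize * 953 / time_ns;
}

/*
 * Requests per second in units of 1024 requests (kPP/s).
 * 10^9 / 2^10 is 976562.5, truncated to 976562 as in the patch.
 */
uint64_t bm_kpps(uint64_t reqsum, uint64_t time_ns)
{
	assert(time_ns != 0);
	return reqsum * 976562 / time_ns;
}
```

For example, a hypothetical 60000 requests of 8192 bytes completed in one
second give bm_throughput_mbs(60000, 8192, 1000000000ULL) == 468 and
bm_kpps(60000, 1000000000ULL) == 58, which is close to the 8192-byte run
above.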

The second test case uses the zlib-deflate hardware implementation. The
parameters tested above have the same effect on hardware, so here I test
the parameter 'reqnum'. The software algorithm is registered as a
synchronous implementation, so 'reqnum' has no effect on its performance.

run set: algorithm zlib-deflate, algtype CRYPTO_COMPRESS, inputsize 8192,
loop 1, numamask 0x0, optype 0, reqnum 1, threadnum 1, time 1.
Crypto benchmark result:
throughput pps time
367 MB/s 46 kPP/s 941 ms

run set: algorithm zlib-deflate, algtype CRYPTO_COMPRESS, inputsize 8192,
loop 1, numamask 0x0, optype 0, reqnum 10, threadnum 1, time 1.
Crypto benchmark result:
throughput pps time
3507 MB/s 438 kPP/s 1003 ms

run set: algorithm zlib-deflate, algtype CRYPTO_COMPRESS, inputsize 8192,
loop 1, numamask 0x0, optype 0, reqnum 100, threadnum 1, time 1.
Crypto benchmark result:
throughput pps time
6318 MB/s 790 kPP/s 1093 ms

So for asynchronous algorithms, the number of requests per tfm also
raises the throughput and pps until they reach a peak value.

With this tool, we can quickly verify performance on different platforms
and get a reference for configuring business scenarios.

Yang Shen (6):
moduleparams: Add hexulong type parameter
crypto: benchmark - add a crypto benchmark tool
crypto: benchmark - support compression/decompression
crypto: benchmark - add help information
crypto: benchmark - add API documentation
MAINTAINERS: add crypto benchmark MAINTAINER

Documentation/crypto/benchmark.rst | 104 +++++
MAINTAINERS | 7 +
crypto/Kconfig | 2 +
crypto/Makefile | 5 +
crypto/benchmark/Kconfig | 11 +
crypto/benchmark/Makefile | 3 +
crypto/benchmark/benchmark.c | 599 +++++++++++++++++++++++++++++
crypto/benchmark/benchmark.h | 76 ++++
crypto/benchmark/bm_comp.c | 435 +++++++++++++++++++++
crypto/benchmark/bm_comp.h | 19 +
include/linux/moduleparam.h | 7 +-
kernel/params.c | 1 +
12 files changed, 1268 insertions(+), 1 deletion(-)
create mode 100644 Documentation/crypto/benchmark.rst
create mode 100644 crypto/benchmark/Kconfig
create mode 100644 crypto/benchmark/Makefile
create mode 100644 crypto/benchmark/benchmark.c
create mode 100644 crypto/benchmark/benchmark.h
create mode 100644 crypto/benchmark/bm_comp.c
create mode 100644 crypto/benchmark/bm_comp.h

--
2.24.0


2022-09-19 12:13:51

by Yang Shen

Subject: [RFC PATCH 2/6] crypto: benchmark - add a crypto benchmark tool

Provide a crypto benchmark to help developers quickly measure the
performance of an algorithm registered in crypto.

Because crypto algorithms have a wide variety of parameters, the tool
cannot support every test scenario. To keep the tool simple and easy to
use while supporting as many test scenarios as possible, the benchmark
follows the crypto subsystem's approach and provides a unified struct
'crypto_bm_ops'. Each algorithm class registers its own callbacks to parse
the user's input. In crypto, an algorithm class contains multiple
algorithms, but all of them use the same API. So in the benchmark, an
algorithm class shares one 'ops' and distinguishes specific algorithms by
name.

First, consider the performance calculation model. Given the crypto
subsystem model, reasonable code based on the crypto API should create a
NUMA-node-based 'crypto_tfm' in advance and allocate a certain number of
'crypto_req' according to its own business needs. In the real processing
stage, the thread sends tasks based on 'crypto_req' and waits for
completion.

Therefore, the benchmark creates the 'crypto_tfm' and 'crypto_req' first
and then times all requests to calculate performance, so the result
reflects pure algorithm performance. When an algorithm class implements
its own 'ops', it needs to pay attention to what work is done in each
callback. The request data set should be prepared before 'ops.perf' is
called. To avoid falsely high performance caused by unrealistic cache and
TLB hit rates, the data set should be larger than the number of
'crypto_req'.
The 'crypto_bm_ops' has the following API:
- init & uninit
The initialization-related functions. The algorithm can do some private
setup here.
- create_tfm & release_tfm
The 'crypto_tfm'-related functions. Each algorithm class has a different
tfm type in crypto, but they all have a member named tfm, so tfm is used
to stand for the algorithm handle. The benchmark provides the tfm array.
- create_req & release_req
The 'crypto_req'-related functions. The callbacks should create a group of
'reqnum' 'crypto_req' in struct 'crypto_bm_base'. It is also suggested to
prepare the request data in this function. To avoid falsely high
performance caused by unrealistic cache and TLB hit rates, the data set
should be larger than the number of 'crypto_req'.
- perf
The request-sending function. The registrant should use the parameter
'loop' to send requests repeatedly and update the count in struct
'crypto_bm_thread_data'.
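
The order in which the benchmark core drives these callbacks can be
sketched in userspace. This is a simplified model, not the kernel code:
struct bm_base, struct bm_ops, bm_run() and the demo_* callbacks are
hypothetical stand-ins, 'perf' takes simplified arguments, and the perf
calls run sequentially here whereas the patch runs them in kernel threads.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace stand-in for struct crypto_bm_base. */
struct bm_base {
	unsigned int threadnum;	/* one tfm per thread, as in the patch */
	char trace[64];		/* records the order of callback invocations */
};

/* Same callback set as described above, with simplified arguments. */
struct bm_ops {
	int (*init)(struct bm_base *base);
	void (*uninit)(struct bm_base *base);
	int (*create_tfm)(struct bm_base *base, unsigned int idx);
	void (*release_tfm)(struct bm_base *base, unsigned int idx);
	int (*create_req)(struct bm_base *base, unsigned int idx);
	void (*release_req)(struct bm_base *base, unsigned int idx);
	int (*perf)(struct bm_base *base, unsigned int idx);
};

/* Append one character to the trace, dropping it when the buffer is full. */
static void bm_note(struct bm_base *base, char c)
{
	size_t len = strlen(base->trace);

	if (len + 1 < sizeof(base->trace)) {
		base->trace[len] = c;
		base->trace[len + 1] = '\0';
	}
}

static int demo_init(struct bm_base *b) { bm_note(b, 'I'); return 0; }
static void demo_uninit(struct bm_base *b) { bm_note(b, 'U'); }
static int demo_create_tfm(struct bm_base *b, unsigned int i) { (void)i; bm_note(b, 'T'); return 0; }
static void demo_release_tfm(struct bm_base *b, unsigned int i) { (void)i; bm_note(b, 't'); }
static int demo_create_req(struct bm_base *b, unsigned int i) { (void)i; bm_note(b, 'R'); return 0; }
static void demo_release_req(struct bm_base *b, unsigned int i) { (void)i; bm_note(b, 'r'); }
static int demo_perf(struct bm_base *b, unsigned int i) { (void)i; bm_note(b, 'P'); return 0; }

const struct bm_ops demo_ops = {
	.init = demo_init,
	.uninit = demo_uninit,
	.create_tfm = demo_create_tfm,
	.release_tfm = demo_release_tfm,
	.create_req = demo_create_req,
	.release_req = demo_release_req,
	.perf = demo_perf,
};

/* Set up all resources first, run perf, then tear down in reverse order. */
int bm_run(const struct bm_ops *ops, struct bm_base *base)
{
	unsigned int i, ntfm = 0, nreq = 0;
	int ret;

	assert(ops && base);

	ret = ops->init(base);
	if (ret)
		return ret;

	for (ntfm = 0; ntfm < base->threadnum; ntfm++) {
		ret = ops->create_tfm(base, ntfm);
		if (ret)
			goto out_tfm;
	}

	for (nreq = 0; nreq < base->threadnum; nreq++) {
		ret = ops->create_req(base, nreq);
		if (ret)
			goto out_req;
	}

	for (i = 0; i < base->threadnum; i++) {
		ret = ops->perf(base, i);
		if (ret)
			break;
	}

out_req:
	while (nreq--)
		ops->release_req(base, nreq);
out_tfm:
	while (ntfm--)
		ops->release_tfm(base, ntfm);
	ops->uninit(base);

	return ret;
}
```

With 'threadnum' set to 2, the recorded trace is "ITTRRPPrrttU": every tfm
and request exists before the first perf call, so only the request
processing itself falls inside the timed region.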

Then consider the parameters that users can configure. Generally speaking,
the following parameters affect the performance of an algorithm: tfm
number, request number, block size and NUMA node. Some parameters affect
the stability of the measurement: testing time and the number of requests
sent. To sum up, the benchmark has the following parameters:
- algorithm
The name of the algorithm under test, as shown in /proc/crypto.
- algtype
The class of the algorithm under test. The available classes can be listed
by echoing 'algtype' to /sys/module/crypto_benchmark/parameters/help.
- inputsize
The test length that greatly impacts performance, such as the data size
for compression or the key length for encryption.
- loop
The number of test loops. Helps avoid performance fluctuations caused by
the environment.
- numamask
The NUMA node mask to bind to. Used to allocate memory, create threads and
create 'crypto_tfm'.
- optype
The operation type of the algorithm under test. The available operation
types can be listed with cat /sys/module/crypto_benchmark/parameters/help
after 'algtype' is specified.
- reqnum
The number of requests per tfm. Used to test asynchronous API performance.
- threadnum
The number of test threads. To simplify the model, one 'crypto_tfm' is
created per thread.
- time
The test duration. Used to stop the test threads.
- run
Start or stop the test.
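
How 'threadnum' and 'numamask' interact can be illustrated in userspace.
The sketch below follows the split used by crypto_bm_create_tfm() later in
this patch: threadnum / nodes threads for each selected node, with the
remainder going to the last selected node. bm_assign_nodes() and
BM_NO_NODE are hypothetical names; BM_NO_NODE stands in for the kernel's
NUMA_NO_NODE.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define BM_NO_NODE (-1)	/* stand-in for the kernel's NUMA_NO_NODE */

/* Fill node[0..threadnum-1] with the NUMA node chosen for each thread. */
void bm_assign_nodes(unsigned long numamask, unsigned int threadnum, int *node)
{
	unsigned int nbits = sizeof(numamask) * CHAR_BIT;
	unsigned int nodes = 0, count = 0;
	unsigned int per_node, rest, bit, i;

	assert(node != NULL || threadnum == 0);

	/* Count the selected nodes. */
	for (bit = 0; bit < nbits; bit++)
		if (numamask & (1UL << bit))
			nodes++;

	/* An empty mask means the threads are not bound to any node. */
	if (nodes == 0) {
		for (i = 0; i < threadnum; i++)
			node[i] = BM_NO_NODE;
		return;
	}

	per_node = threadnum / nodes;
	rest = threadnum % nodes;

	for (bit = 0; bit < nbits; bit++) {
		unsigned int start, end;

		if (!(numamask & (1UL << bit)))
			continue;

		start = count * per_node;
		end = start + per_node;
		/* The last selected node also takes the leftover threads. */
		if (++count == nodes)
			end += rest;

		for (i = start; i < end; i++)
			node[i] = (int)bit;
	}
}
```

With numamask 0x3 and five threads, threads 0 and 1 land on node 0 and
threads 2 to 4 land on node 1.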

Users can configure parameters under
/sys/module/crypto_benchmark/parameters/.
Then echo 1 to 'run' to start the test. To stop the test, echo 0 to
'run'.
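
A configuration run can also be scripted from a small userspace program.
The sketch below writes one value to one parameter file; bm_write_param()
is a hypothetical helper, and the directory is passed in by the caller, so
nothing in it is specific to this module.

```c
#include <assert.h>
#include <stdio.h>

/* Write 'value' to the file 'name' under 'dir'. Returns 0 on success, -1 on error. */
int bm_write_param(const char *dir, const char *name, const char *value)
{
	char path[256];
	FILE *f;
	int n, ret = 0;

	assert(dir && name && value);

	n = snprintf(path, sizeof(path), "%s/%s", dir, name);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;	/* path does not fit in the buffer */

	f = fopen(path, "w");
	if (!f)
		return -1;

	if (fputs(value, f) == EOF)
		ret = -1;
	/* The kernel may reject the value only when the write is flushed. */
	if (fclose(f) == EOF)
		ret = -1;

	return ret;
}
```

A test run would then be a series of calls such as
bm_write_param("/sys/module/crypto_benchmark/parameters", "inputsize", "8192"),
ending with a write of "1" to "run". Since the setters in this patch return
-EBUSY while a test is running, checking every return value is worthwhile.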

Signed-off-by: Yang Shen <[email protected]>
---
crypto/Kconfig | 2 +
crypto/Makefile | 5 +
crypto/benchmark/Kconfig | 11 +
crypto/benchmark/Makefile | 3 +
crypto/benchmark/benchmark.c | 509 +++++++++++++++++++++++++++++++++++
crypto/benchmark/benchmark.h | 76 ++++++
6 files changed, 606 insertions(+)
create mode 100644 crypto/benchmark/Kconfig
create mode 100644 crypto/benchmark/Makefile
create mode 100644 crypto/benchmark/benchmark.c
create mode 100644 crypto/benchmark/benchmark.h

diff --git a/crypto/Kconfig b/crypto/Kconfig
index 40423a14f86f..a0f618f349fc 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -1438,4 +1438,6 @@ source "drivers/crypto/Kconfig"
source "crypto/asymmetric_keys/Kconfig"
source "certs/Kconfig"

+source "crypto/benchmark/Kconfig"
+
endif # if CRYPTO
diff --git a/crypto/Makefile b/crypto/Makefile
index a6f94e04e1da..67edf4e1337c 100644
--- a/crypto/Makefile
+++ b/crypto/Makefile
@@ -212,3 +212,8 @@ obj-$(CONFIG_CRYPTO_SIMD) += crypto_simd.o
# Key derivation function
#
obj-$(CONFIG_CRYPTO_KDF800108_CTR) += kdf_sp800108.o
+
+#
+# crypto benchmark
+#
+obj-y += benchmark/
diff --git a/crypto/benchmark/Kconfig b/crypto/benchmark/Kconfig
new file mode 100644
index 000000000000..abee14ba8e40
--- /dev/null
+++ b/crypto/benchmark/Kconfig
@@ -0,0 +1,11 @@
+# SPDX-License-Identifier: GPL-2.0
+
+config CRYPTO_BENCHMARK
+ bool "Testing performance of crypto algorithms"
+ depends on CRYPTO
+ help
+ This option supports testing crypto async API performance.
+ Select this if you want to test crypto algorithms performance
+ conveniently.
+ Before use it, you should check whether the algorithm class is
+ supported.
diff --git a/crypto/benchmark/Makefile b/crypto/benchmark/Makefile
new file mode 100644
index 000000000000..5244178e14c4
--- /dev/null
+++ b/crypto/benchmark/Makefile
@@ -0,0 +1,3 @@
+# SPDX-License-Identifier: GPL-2.0
+obj-$(CONFIG_CRYPTO_BENCHMARK) += crypto_benchmark.o
+crypto_benchmark-objs += benchmark.o
diff --git a/crypto/benchmark/benchmark.c b/crypto/benchmark/benchmark.c
new file mode 100644
index 000000000000..9a833b277d87
--- /dev/null
+++ b/crypto/benchmark/benchmark.c
@@ -0,0 +1,509 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2022 HiSilicon Limited.
+ */
+#include <linux/crypto.h>
+#include <linux/jiffies.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/kthread.h>
+#include <linux/string.h>
+#include <linux/wait.h>
+
+#include "benchmark.h"
+
+enum crypto_bm_status {
+ CRYPTO_BM_STOP,
+ CRYPTO_BM_RUN,
+};
+
+enum crypto_bm_alg {
+ CRYPTO_BM_ALG_MAX,
+};
+
+struct crypto_bm_alg_ops {
+ const char *alg;
+ int (*init)(struct crypto_bm_base *base);
+ void (*uninit)(struct crypto_bm_base *base);
+ int (*create_tfm)(struct crypto_bm_base *base, u32 idx);
+ void (*release_tfm)(struct crypto_bm_base *base, u32 idx);
+ int (*create_req)(struct crypto_bm_base *base, u32 idx);
+ void (*release_req)(struct crypto_bm_base *base, u32 idx);
+ int (*perf)(struct crypto_bm_thread_data *data);
+};
+
+struct {
+ wait_queue_head_t wq;
+ atomic_t count;
+} crypto_bm_wq = { 0 };
+
+#define CRYPTO_BM_THREAD_MAX 1024U
+
+#define algorithm_desc "Testing algorithm name"
+#define algtype_desc "Testing algorithm type, according to enum crypto_bm_alg"
+#define inputsize_desc "Testing input size"
+#define loop_desc "Testing loop times, the unit is kilo, 0/1(default, 1 ktimes), 2(2 ktimes) ..."
+#define numamask_desc "Testing bind numamask, 0(default, not bind), 1(bind to node 0), 3(bind to node0 and node1) ..."
+#define optype_desc "Testing algorithm operation type 0 && 1: 0(default, compress/encipher), 1(decompress/decipher)"
+#define reqnum_desc "Testing request number for per tfm, 0/1 (default 1 request), 2(2 requests) ..."
+#define threadnum_desc "Testing thread number, one 'crypto_tfm' per thread. 0/1 (default 1 thread), 2(2 threads) ..."
+#define time_desc "Testing time, the unit is second, 0/1 (default 1 s), 2(2 s) ..."
+#define run_desc "Start/stop all the tests based on the configuration, 0(default, not run, stop), or run"
+
+static atomic_t benchmark_status;
+
+static struct crypto_bm_attrs benchmark_attrs = { 0 };
+
+static struct crypto_bm_base benchmark_base = {
+ .attrs = &benchmark_attrs,
+};
+
+static struct crypto_bm_thread_data thread_data[CRYPTO_BM_THREAD_MAX] = { 0 };
+
+static struct task_struct *crypto_bm_perf[CRYPTO_BM_THREAD_MAX] = { NULL };
+static struct task_struct *test_thread;
+
+static struct crypto_bm_alg_ops benchmark_ops[] = {
+ {
+ /* sentinel */
+ }
+};
+
+static int crypto_bm_algorithm_param_set(const char *val, const struct kernel_param *kp)
+{
+ char *s = strstrip((char *)val);
+
+ if (atomic_read(&benchmark_status))
+ return -EBUSY;
+
+ if (!crypto_has_alg(s, 0, 0)) {
+ pr_err("failed to find the algorithm %s\n", s);
+ return -EINVAL;
+ }
+
+ return param_set_charp(s, kp);
+}
+
+static const struct kernel_param_ops alg_ops = {
+ .set = crypto_bm_algorithm_param_set,
+ .get = param_get_charp,
+};
+
+module_param_cb(algorithm, &alg_ops, &benchmark_attrs.algorithm, 0644);
+MODULE_PARM_DESC(algorithm, algorithm_desc);
+
+static int crypto_bm_numamask_param_set(const char *val, const struct kernel_param *kp)
+{
+ if (atomic_read(&benchmark_status))
+ return -EBUSY;
+
+ return param_set_hexulong(val, kp);
+}
+
+static const struct kernel_param_ops numamask_ops = {
+ .set = crypto_bm_numamask_param_set,
+ .get = param_get_hexulong,
+};
+
+module_param_cb(numamask, &numamask_ops, &benchmark_attrs.numamask, 0644);
+MODULE_PARM_DESC(numamask, numamask_desc);
+
+#define MODULE_PARAMETER_DEF(xxx) \
+static int xxx##_set(const char *val, const struct kernel_param *kp) \
+{ \
+ u32 n; \
+ int ret; \
+ if (atomic_read(&benchmark_status)) \
+ return -EBUSY; \
+ ret = kstrtou32(val, 10, &n); \
+ if (ret != 0) \
+ return -EINVAL; \
+ return param_set_uint(val, kp); \
+} \
+static const struct kernel_param_ops xxx##_ops = { \
+ .set = xxx##_set, \
+ .get = param_get_uint \
+}; \
+module_param_cb(xxx, &xxx##_ops, &benchmark_attrs.xxx, 0644); \
+MODULE_PARM_DESC(xxx, xxx##_desc)
+
+MODULE_PARAMETER_DEF(algtype);
+MODULE_PARAMETER_DEF(inputsize);
+MODULE_PARAMETER_DEF(loop);
+MODULE_PARAMETER_DEF(optype);
+MODULE_PARAMETER_DEF(reqnum);
+MODULE_PARAMETER_DEF(threadnum);
+MODULE_PARAMETER_DEF(time);
+
+static int crypto_bm_check_params(struct crypto_bm_attrs *attrs)
+{
+ if (attrs->algorithm == NULL) {
+ pr_err("algorithm is NULL\n");
+ return -EINVAL;
+ }
+
+ if (attrs->algtype >= CRYPTO_BM_ALG_MAX) {
+ pr_err("algorithm type %u is invalid\n", attrs->algtype);
+ return -EINVAL;
+ }
+
+ if (attrs->inputsize == 0) {
+ pr_err("input size is 0\n");
+ return -EINVAL;
+ }
+
+ if (attrs->threadnum >= CRYPTO_BM_THREAD_MAX) {
+ pr_err("thread number is bigger than %u\n", CRYPTO_BM_THREAD_MAX);
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+static void crypto_bm_set_default_params(struct crypto_bm_attrs *attrs)
+{
+ attrs->loop = (attrs->loop == 0) ? 1 : attrs->loop;
+ attrs->reqnum = (attrs->reqnum == 0) ? 1 : attrs->reqnum;
+ attrs->threadnum = (attrs->threadnum == 0) ? 1 : attrs->threadnum;
+ attrs->time = (attrs->time == 0) ? 1 : attrs->time;
+}
+
+static int crypto_bm_init_alg(struct crypto_bm_base *base)
+{
+ u32 idx = base->attrs->algtype;
+
+ return benchmark_ops[idx].init(base);
+}
+
+static void crypto_bm_uninit_alg(struct crypto_bm_base *base)
+{
+ u32 idx = base->attrs->algtype;
+
+ benchmark_ops[idx].uninit(base);
+}
+
+static int crypto_bm_create_tfm(struct crypto_bm_base *base)
+{
+ struct crypto_bm_attrs *attrs = base->attrs;
+ int i, ret, nodes, sbit, count = 0;
+ u32 threadnum = attrs->threadnum;
+ u32 threadpernode, threadrest;
+ u32 idx = attrs->algtype;
+
+ base->gthread = kcalloc(threadnum, sizeof(*base->gthread), GFP_KERNEL);
+ if (!base->gthread)
+ return -ENOMEM;
+
+ nodes = bitmap_weight(&attrs->numamask, MAX_NUMNODES);
+
+ if (nodes == 0) {
+ for (i = 0; i < threadnum; i++) {
+ base->gthread[i].id = i;
+ base->gthread[i].node = NUMA_NO_NODE;
+ ret = benchmark_ops[idx].create_tfm(base, i);
+ if (ret)
+ goto out_free_tfm;
+ }
+ } else {
+ threadpernode = threadnum / nodes;
+ threadrest = threadnum % nodes;
+ for_each_set_bit(sbit, (unsigned long *)&attrs->numamask, MAX_NUMNODES) {
+ int start = count * threadpernode;
+ int end = (count + 1) * threadpernode;
+
+ end += (++count == nodes) ? threadrest : 0;
+ for (i = start; i < end; i++) {
+ base->gthread[i].id = i;
+ base->gthread[i].node = sbit;
+ ret = benchmark_ops[idx].create_tfm(base, i);
+ if (ret)
+ goto out_free_tfm;
+ }
+ }
+ }
+
+ return 0;
+
+out_free_tfm:
+ for (i--; i >= 0; i--)
+ benchmark_ops[idx].release_tfm(base, i);
+
+ kfree(base->gthread);
+
+ return ret;
+}
+
+static void crypto_bm_release_tfm(struct crypto_bm_base *base)
+{
+ u32 threadnum = base->attrs->threadnum;
+ u32 idx = base->attrs->algtype;
+ int i;
+
+ for (i = 0; i < threadnum; i++)
+ benchmark_ops[idx].release_tfm(base, i);
+
+ kfree(base->gthread);
+}
+
+static int crypto_bm_create_req(struct crypto_bm_base *base)
+{
+ u32 threadnum = base->attrs->threadnum;
+ u32 idx = base->attrs->algtype;
+ int i, ret;
+
+ for (i = 0; i < threadnum; i++) {
+ ret = benchmark_ops[idx].create_req(base, i);
+ if (ret)
+ goto out_release_req;
+ }
+
+ return 0;
+
+out_release_req:
+ for (i--; i >= 0 ; i--)
+ benchmark_ops[idx].release_req(base, i);
+
+ return ret;
+}
+
+static void crypto_bm_release_req(struct crypto_bm_base *base)
+{
+ u32 threadnum = base->attrs->threadnum;
+ u32 idx = base->attrs->algtype;
+ int i;
+
+ for (i = 0; i < threadnum; i++)
+ benchmark_ops[idx].release_req(base, i);
+}
+
+static int crypto_bm_test_perf(void *data)
+{
+ struct crypto_bm_thread_data *tdata = data;
+ struct crypto_bm_base *base = tdata->base;
+ struct crypto_bm_attrs *attrs = base->attrs;
+ unsigned long endtime = jiffies + attrs->time * HZ;
+ u32 idx = attrs->algtype;
+ int ret = 0;
+
+ do {
+ if (kthread_should_stop())
+ break;
+
+ if (time_after(jiffies, endtime))
+ break;
+
+ ret = benchmark_ops[idx].perf(tdata);
+ if (ret)
+ break;
+ } while (1);
+
+ crypto_bm_perf[tdata->threadid] = NULL;
+ atomic_dec(&crypto_bm_wq.count);
+ wake_up(&crypto_bm_wq.wq);
+
+ return ret;
+}
+
+static void crypto_bm_show_perf(u64 time)
+{
+ u32 threadnum = benchmark_attrs.threadnum;
+ u32 inputsize = benchmark_attrs.inputsize;
+ u64 throughput, pps, reqsum = 0;
+ int i;
+
+ for (i = 0; i < threadnum; i++)
+ reqsum += atomic_read(&thread_data[i].count.recv_req);
+
+ /*
+ * reqsum * inputsize (bytes) / (1024 * 1024)
+ * throughput = -------------------------------------------- (MB/s)
+ * time (ns) / 1000000000
+ */
+ throughput = reqsum * inputsize * 953 / (time);
+
+ /*
+ * reqsum / 1024
+ * pps = -------------------
+ * time / 1000000000
+ */
+ pps = reqsum * 976562 / (time);
+
+ pr_err("Crypto benchmark result:\n"
+ "\t throughput \t pps \t\t time\n"
+ "\t %llu MB/s \t %llu kPP/s \t %llu ms\n",
+ throughput, pps, time / 1000000);
+}
+
+static int crypto_bm_test(void *data)
+{
+ struct crypto_bm_base *base = data;
+ u32 threadnum = base->attrs->threadnum;
+ struct timespec64 begin, end;
+ int i, ret, node;
+
+ init_waitqueue_head(&crypto_bm_wq.wq);
+ atomic_set(&crypto_bm_wq.count, threadnum);
+
+ memset(crypto_bm_perf, 0, sizeof(*crypto_bm_perf) * threadnum);
+
+ ret = crypto_bm_init_alg(base);
+ if (ret)
+ goto out_set_stop;
+
+ ret = crypto_bm_create_tfm(base);
+ if (ret)
+ goto out_uninit;
+
+ ret = crypto_bm_create_req(base);
+ if (ret)
+ goto out_free_tfm;
+
+
+ for (i = 0; i < threadnum; i++) {
+ node = base->gthread[i].node;
+ thread_data[i].threadid = i;
+ thread_data[i].base = base;
+ memset(&thread_data[i].count, 0, sizeof(thread_data[i].count));
+ crypto_bm_perf[i] = kthread_create_on_node(crypto_bm_test_perf, &thread_data[i],
+ node, "crypto_bm_perf-%d", i);
+ if (IS_ERR(crypto_bm_perf[i])) {
+ ret = PTR_ERR(crypto_bm_perf[i]);
+ crypto_bm_perf[i] = NULL;
+ pr_err("failed to create %dth performance thread, ret = %d\n", i, ret);
+ goto out_stop_thread;
+ }
+ kthread_bind_mask(crypto_bm_perf[i], cpumask_of_node(node));
+ }
+ i = 0;
+
+ ktime_get_real_ts64(&begin);
+ for (i = 0; i < threadnum; i++)
+ wake_up_process(crypto_bm_perf[i]);
+ wait_event_interruptible(crypto_bm_wq.wq, atomic_read(&crypto_bm_wq.count) == 0);
+ ktime_get_real_ts64(&end);
+
+ crypto_bm_show_perf(timespec64_to_ns(&end) - timespec64_to_ns(&begin));
+
+out_stop_thread:
+ for (i--; i >= 0; i--) {
+ if (!crypto_bm_perf[i])
+ continue;
+ kthread_stop(crypto_bm_perf[i]);
+ crypto_bm_perf[i] = NULL;
+ }
+
+ crypto_bm_release_req(base);
+
+out_free_tfm:
+ crypto_bm_release_tfm(base);
+
+out_uninit:
+ crypto_bm_uninit_alg(base);
+
+out_set_stop:
+ atomic_set(&benchmark_status, CRYPTO_BM_STOP);
+ test_thread = NULL;
+
+ return ret;
+}
+
+static int crypto_bm_start_test(struct crypto_bm_base *base)
+{
+ int ret = 0;
+
+ if (atomic_cmpxchg(&benchmark_status, CRYPTO_BM_STOP, CRYPTO_BM_RUN)) {
+ pr_err("Crypto benchmark is busy now, please try later!\n");
+ return -EBUSY;
+ }
+
+ test_thread = kthread_run(crypto_bm_test, base, "crypto_bm_test");
+ if (IS_ERR(test_thread))
+ ret = PTR_ERR(test_thread);
+
+ return ret;
+}
+
+static void crypto_bm_stop_test(void)
+{
+ u32 threadnum = benchmark_attrs.threadnum;
+ int i, ret;
+
+ if (!atomic_read(&benchmark_status))
+ return;
+
+ for (i = 0; i < threadnum; i++) {
+ if (!crypto_bm_perf[i])
+ continue;
+ ret = kthread_stop(crypto_bm_perf[i]);
+ if (ret)
+ pr_err("failed to stop %dth performance thread, ret = %d\n", i, ret);
+ crypto_bm_perf[i] = NULL;
+ }
+
+ if (test_thread) {
+ ret = kthread_stop(test_thread);
+ if (ret)
+ pr_err("failed to stop test thread, ret = %d\n", ret);
+ }
+
+ atomic_set(&benchmark_status, CRYPTO_BM_STOP);
+}
+
+static int run_set(const char *val, const struct kernel_param *kp)
+{
+ int ret;
+ u32 n;
+
+ ret = kstrtou32(val, 10, &n);
+ if (ret != 0)
+ return -EINVAL;
+
+ if (n == 0) {
+ crypto_bm_stop_test();
+ } else {
+ ret = crypto_bm_check_params(&benchmark_attrs);
+ if (ret)
+ return ret;
+
+ crypto_bm_set_default_params(&benchmark_attrs);
+
+ ret = crypto_bm_start_test(&benchmark_base);
+ if (ret) {
+ pr_err("failed to start test, ret = %d\n", ret);
+ return ret;
+ }
+ pr_info("run set: algorithm %s, algtype %s, inputsize %d, loop %d, numamask 0x%lx, optype %d, reqnum %d, threadnum %d, time %d.\n",
+ benchmark_attrs.algorithm, benchmark_ops[benchmark_attrs.algtype].alg,
+ benchmark_attrs.inputsize, benchmark_attrs.loop, benchmark_attrs.numamask,
+ benchmark_attrs.optype, benchmark_attrs.reqnum, benchmark_attrs.threadnum,
+ benchmark_attrs.time);
+ }
+
+ return param_set_uint(val, kp);
+}
+
+static const struct kernel_param_ops run_ops = {
+ .set = run_set,
+ .get = param_get_uint,
+};
+
+static u32 run;
+module_param_cb(run, &run_ops, &run, 0644);
+MODULE_PARM_DESC(run, run_desc);
+
+static int __init crypto_bm_init(void)
+{
+ atomic_set(&benchmark_status, CRYPTO_BM_STOP);
+
+ return 0;
+}
+
+static void __exit crypto_bm_exit(void)
+{
+}
+
+module_init(crypto_bm_init);
+module_exit(crypto_bm_exit);
+
+MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("Driver for testing performance of crypto algorithms");
diff --git a/crypto/benchmark/benchmark.h b/crypto/benchmark/benchmark.h
new file mode 100644
index 000000000000..84cb49af81ba
--- /dev/null
+++ b/crypto/benchmark/benchmark.h
@@ -0,0 +1,76 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2022 HiSilicon Limited.
+ */
+#ifndef CRYPTO_BM_H
+#define CRYPTO_BM_H
+
+#include <linux/crypto.h>
+#include <linux/errno.h>
+#include <linux/find.h>
+#include <linux/gfp.h>
+#include <linux/nodemask.h>
+#include <linux/printk.h>
+#include <linux/slab.h>
+
+/**
+ * struct crypto_bm_attrs - crypto benchmark attributes configured by users.
+ *
+ * @algorithm: The algorithm name registered in crypto.
+ * @algtype: The algorithm class list in enum crypto_bm_alg. Used to
+ * choose the crypto_bm_ops.
+ * @inputsize: The testing length that can greatly impact performance.
+ * Such as data size for compress or key length for encryption.
+ * @loop: The request sending loop times. The value is 1000 times
+ * of user's setting.
+ * @numamask: The mask of testing bind numa nodes.
+ * @optype: The algorithm test operation. Defined by the algorithm self.
+ * @reqnum: The crypto request number of a tfm.
+ * @threadnum: The test thread number. And it is equal to tfm number.
+ * @time: The testing time.
+ */
+struct crypto_bm_attrs {
+ char *algorithm;
+ u32 algtype;
+ u32 inputsize;
+ u32 loop;
+ unsigned long numamask;
+ u32 optype;
+ u32 reqnum;
+ u32 threadnum;
+ u32 time;
+};
+
+/**
+ * struct crypto_bm_base - crypto benchmark test objects.
+ *
+ * @attrs: The test configuration.
+ * @gthread: A array storing resources related to the test thread.
+ */
+struct crypto_bm_base {
+ struct crypto_bm_attrs *attrs;
+ struct {
+ u32 id;
+ int node;
+ void *tfm;
+ void **req;
+ } *gthread;
+};
+
+/**
+ * struct crypto_bm_thread_data - crypto benchmark test thread common information.
+ *
+ * @threadid: The test thread number.
+ * @count: Count the thread test request numbers.
+ * @base: crypto benchmark test objects.
+ */
+struct crypto_bm_thread_data {
+ int threadid;
+ struct {
+ atomic_t send_req;
+ atomic_t recv_req;
+ } count;
+ struct crypto_bm_base *base;
+} ____cacheline_aligned;
+
+#endif
--
2.24.0

2022-09-19 12:13:56

by Yang Shen

Subject: [RFC PATCH 5/6] crypto: benchmark - add API documentation

Provide a crypto benchmark to help developers quickly measure the
performance of an algorithm registered in crypto.

To simulate more scenarios, the tool has the following parameters under
'/sys/module/crypto_benchmark/parameters/' to configure: algorithm,
algtype, inputsize, loop, numamask, optype, reqnum, threadnum
and time.

To hide the differences between algorithms, the tool has the following
interface to perform a crypto request: init, uninit, create_tfm,
release_tfm, create_req, release_req, perf and help.

Signed-off-by: Yang Shen <[email protected]>
---
Documentation/crypto/benchmark.rst | 104 +++++++++++++++++++++++++++++
1 file changed, 104 insertions(+)
create mode 100644 Documentation/crypto/benchmark.rst

diff --git a/Documentation/crypto/benchmark.rst b/Documentation/crypto/benchmark.rst
new file mode 100644
index 000000000000..e9b13e81bce3
--- /dev/null
+++ b/Documentation/crypto/benchmark.rst
@@ -0,0 +1,104 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+Crypto Benchmark
+================
+
+Overview
+--------
+The crypto benchmark is a crypto algorithm performance tool.
+
+Designed Scheme
+---------------
+
+1. Parameters
+
+The crypto benchmark is used to test algorithms registered in the crypto
+subsystem. Users can use module parameters to simulate different scenarios.
+Considering both the test scenarios and the complexity of use, the benchmark
+tool has the following module parameters:
+
+- algorithm
+The 'algorithm' is used to create a 'crypto_tfm'. The right algorithm name
+can be found in /proc/crypto.
+
+- algtype
+The 'algtype' is used to find the operations of the algorithm class. The
+available classes can be listed by echoing 'algtype' to
+/sys/module/crypto_benchmark/parameters/help.
+
+- inputsize
+The 'inputsize' is the test input size; the output size is set
+according to the algorithm.
+
+- loop
+The 'loop' is the number of times to try to send a request for one
+'crypto_req'. It helps avoid performance fluctuations caused by the
+environment. In synchronous mode, the loop count equals the send count.
+In asynchronous mode, the send count is often less than the loop count.
+
+- numamask
+The 'numamask' selects the NUMA nodes to bind the test to. The input is
+parsed as a bitmap.
+
+- optype
+The 'optype' is used to choose the algorithm operation. The available
+operation types can be listed with cat
+/sys/module/crypto_benchmark/parameters/help after 'algtype' is specified.
+For example, choose between compress and decompress when testing crypto comp.
+
+- reqnum
+The 'reqnum' is the number of requests per crypto tfm. In asynchronous mode,
+one thread may use multiple 'crypto_req' to improve performance. One request
+per thread is a synchronous model.
+
+- threadnum
+The 'threadnum' is the number of test threads to create. To simplify the
+model, one 'crypto_tfm' is created per thread. Note that the threads are
+divided equally among the specified NUMA nodes, and the threads left over
+are created on the last node.
+
+- time
+The 'time' is the test duration, used to stop the test threads. Until the
+time expires, each thread keeps sending further groups of 'loop' requests.
+
+- run
+The 'run' is used to trigger the test. Echo 0 to stop all test threads,
+and any other value to start the test.
+
+- help
+The 'help' guides users through the test interface. Echoing a module
+parameter name to 'help' shows its detailed information. Reading 'help'
+with cat shows private information according to 'algtype'.
+
+2. Register
+
+There are too many differences between crypto algorithms. Therefore, the
+crypto benchmark only completes the general work. All the different parts
+are put into the callback of the algorithm to complete. The usual crypto
+task can be divided into three parts: alloc tfm, alloc request, and send
+request.
+
+A new algorithm class that wants to register with the crypto benchmark
+should implement the following callbacks:
+
+- init & uninit
+The initialization-related functions. The algorithm can do some private setup.
+
+- create_tfm & release_tfm
+The crypto_tfm-related functions. Each algorithm class has a different tfm
+type, but they all have a member named tfm, so tfm is used to stand for the
+algorithm handle. The benchmark provides the tfm array.
+
+- create_req & release_req
+The crypto_req-related functions. The registrant should create a group of
+'reqnum' 'crypto_req' in struct 'crypto_bm_base'. It is also suggested to
+prepare the request data in this function. To simulate realistic cache and
+TLB hit rates, using a large data set is a good plan.
+
+- perf
+The request-sending function. The registrant should use the parameter
+'loop' to send requests repeatedly and update the count in struct
+'crypto_bm_thread_data'.
+
+- help
+The function that explains the algorithm's private parameters.
--
2.24.0

2022-09-19 12:14:18

by Yang Shen

Subject: [RFC PATCH 4/6] crypto: benchmark - add help information

Add a new module parameter 'help' to help users understand the benchmark
module parameters. Because the algorithms have different notes, add
a new callback 'help' to show the differences.

Signed-off-by: Yang Shen <[email protected]>
---
crypto/benchmark/benchmark.c | 79 ++++++++++++++++++++++++++++++++++++
crypto/benchmark/bm_comp.c | 10 +++++
crypto/benchmark/bm_comp.h | 1 +
3 files changed, 90 insertions(+)

diff --git a/crypto/benchmark/benchmark.c b/crypto/benchmark/benchmark.c
index b5dcf5829b22..a3ccd8955eaa 100644
--- a/crypto/benchmark/benchmark.c
+++ b/crypto/benchmark/benchmark.c
@@ -32,6 +32,12 @@ struct crypto_bm_alg_ops {
int (*create_req)(struct crypto_bm_base *base, u32 idx);
void (*release_req)(struct crypto_bm_base *base, u32 idx);
int (*perf)(struct crypto_bm_thread_data *data);
+ void (*help)(void);
+};
+
+struct crypto_bm_mp_info {
+ const char *mp;
+ const char *help_info;
};

struct {
@@ -51,6 +57,9 @@ struct {
#define threadnum_desc "Testing thread number, one 'crypto_tfm' per thread. 0/1 (default 1 thread), 2(2 threads) ..."
#define time_desc "Testing time, the unit is second, 0/1 (default 1 s), 2(2 s) ..."
#define run_desc "Start/stop all the tests based on the configuration, 0(default, not run, stop), or run"
+#define help_desc "Some help information. Echo a module parameter name to get the " \
+ "info of that parameter. Cat 'help' directly to get the help " \
+ "information provided by 'algtype'."

static atomic_t benchmark_status;

@@ -75,11 +84,47 @@ static struct crypto_bm_alg_ops benchmark_ops[] = {
.create_req = crypto_bm_create_req_comp,
.release_req = crypto_bm_release_req_comp,
.perf = crypto_bm_perf_comp,
+ .help = crypto_bm_help_comp,
}, {
/* sentinel */
}
};

+static struct crypto_bm_mp_info modules_help[] = {
+ {
+ .mp = "algorithm",
+ .help_info = "Please input a crypto supported algorithm name.\n"
+ "The algorithm name can be found on /proc/crypto.",
+ }, {
+ .mp = "algtype",
+ .help_info = "Please input a valid value to choose algorithm class.\n"
+ "0: CRYPTO_BM_COMP",
+ }, {
+ .mp = "inputsize",
+ .help_info = "Please input a valid value as testing input size.",
+ }, {
+ .mp = "loop",
+ .help_info = "Please input the send loop times.",
+ }, {
+ .mp = "numamask",
+ .help_info = "Please input a bitmap as testing numa nodes.",
+ }, {
+ .mp = "optype",
+ .help_info = "Please input a valid value for testing operation.\n"
+ "The optype values supported by the algorithm type can be read with cat 'help'."
+ }, {
+ .mp = "reqnum",
+ .help_info = "Please input a valid value for per thread request number.",
+ }, {
+ .mp = "threadnum",
+ .help_info = "Please input a valid value for creating threads.\n"
+ "One thread will create a crypto_tfm.",
+ }, {
+ .mp = "time",
+ .help_info = "Please input a valid value for testing time.",
+ }
+};
+
static int crypto_bm_algorithm_param_set(const char *val, const struct kernel_param *kp)
{
char *s = strstrip((char *)val);
@@ -103,6 +148,40 @@ static const struct kernel_param_ops alg_ops = {
module_param_cb(algorithm, &alg_ops, &benchmark_attrs.algorithm, 0644);
MODULE_PARM_DESC(algorithm, algorithm_desc);

+static int crypto_bm_help_param_set(const char *val, const struct kernel_param *kp)
+{
+ int size = ARRAY_SIZE(modules_help);
+ char *s = strstrip((char *)val);
+ int i;
+
+ for (i = 0; i < size; i++) {
+ if (!strcmp(s, modules_help[i].mp))
+ pr_err("%s\n", modules_help[i].help_info);
+ }
+
+ return 0;
+}
+
+static int crypto_bm_help_param_get(char *val, const struct kernel_param *kp)
+{
+ u32 idx = benchmark_attrs.algtype;
+
+ if (idx >= CRYPTO_BM_ALG_MAX)
+ return -EINVAL;
+
+ benchmark_ops[idx].help();
+
+ return 0;
+}
+
+static const struct kernel_param_ops help_ops = {
+ .set = crypto_bm_help_param_set,
+ .get = crypto_bm_help_param_get,
+};
+
+module_param_cb(help, &help_ops, NULL, 0644);
+MODULE_PARM_DESC(help, help_desc);
+
static int crypto_bm_numamask_param_set(const char *val, const struct kernel_param *kp)
{
if (atomic_read(&benchmark_status))
diff --git a/crypto/benchmark/bm_comp.c b/crypto/benchmark/bm_comp.c
index 2772a8e86e2e..62192a55b2ab 100644
--- a/crypto/benchmark/bm_comp.c
+++ b/crypto/benchmark/bm_comp.c
@@ -423,3 +423,13 @@ int crypto_bm_perf_comp(struct crypto_bm_thread_data *data)

return ret;
}
+
+void crypto_bm_help_comp(void)
+{
+ pr_err("Welcome to use the crypto benchmark to test compression algorithms!\n"
+ "There are some different module parameter requirements:\n"
+ "optype: 0 for compression, 1 for decompression\n"
+ "inputsize: for compression, the inputsize is src_len,\n"
+ " for decompression, the inputsize is dst_len, and the src_len will depend on the data compression ratio.\n"
+ );
+}
diff --git a/crypto/benchmark/bm_comp.h b/crypto/benchmark/bm_comp.h
index 78b45f8b22a6..aedafde2c3ad 100644
--- a/crypto/benchmark/bm_comp.h
+++ b/crypto/benchmark/bm_comp.h
@@ -14,5 +14,6 @@ void crypto_bm_release_tfm_comp(struct crypto_bm_base *base, u32 idx);
int crypto_bm_create_req_comp(struct crypto_bm_base *base, u32 idx);
void crypto_bm_release_req_comp(struct crypto_bm_base *base, u32 idx);
int crypto_bm_perf_comp(struct crypto_bm_thread_data *data);
+void crypto_bm_help_comp(void);

#endif
--
2.24.0
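
The table lookup done in crypto_bm_help_param_set() can be sketched in user
space as follows. This is a minimal sketch, not the patch's code: 'mp_info'
mirrors the shape of 'crypto_bm_mp_info', only two of the table entries are
copied over, and 'find_help' is a hypothetical helper that returns the text
instead of printing it.

```c
#include <stddef.h>
#include <string.h>

/* Mirrors the shape of 'crypto_bm_mp_info' from the patch. */
struct mp_info {
	const char *mp;
	const char *help_info;
};

/* Two entries copied from the patch; the others are omitted for brevity. */
static const struct mp_info modules_help[] = {
	{ "loop", "Please input the send loop times." },
	{ "time", "Please input a valid value for testing time." },
};

/* Hypothetical helper: return the help text for 'name', or NULL if unknown. */
static const char *find_help(const char *name)
{
	size_t i;

	for (i = 0; i < sizeof(modules_help) / sizeof(modules_help[0]); i++) {
		if (!strcmp(name, modules_help[i].mp))
			return modules_help[i].help_info;
	}

	return NULL;
}
```

Returning NULL lets the caller tell an unknown parameter name apart from a
match. The set handler in the patch returns 0 in both cases, so a mistyped
name is silently accepted.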

2022-09-20 07:33:49

by Greg KH

[permalink] [raw]
Subject: Re: [RFC PATCH 2/6] crypto: benchmark - add a crypto benchmark tool

On Mon, Sep 19, 2022 at 08:05:33PM +0800, Yang Shen wrote:
> Provide a crypto benchmark to help the developer quickly get the
> performance of an algorithm registered in crypto.
>
> Because the crypto algorithms have multifarious parameters, the tool
> cannot support all test scenes. In order to provide users with a simple
> and easy-to-use tool and support as many test scenarios as possible,
> benchmark refers to the crypto method to provide a unified struct
> 'crypto_bm_ops'. And the algorithm registers its own callbacks to parse
> the user's input. In crypto, an algorithm class has multiple algorithms,
> but all of them use the same API. So in the benchmark, an algorithm
> class uses the same 'ops' and distinguishes specific algorithms by name.
>
> First, consider the performance calculation model. Considering the
> crypto subsystem model, a reasonable process code based on crypto api
> should create a numa node based 'crypto_tfm' in advance and apply for
> a certain amount of 'crypto_req' according to their own business.
> In the real business processing stage, the thread sends tasks based on
> 'crypto_req' and waits for completion.
>
> Therefore, the benchmark will create 'crypto_tfm' and 'crypto_req'
> first, and then time all requests to calculate performance.
> So the result is the pure algorithm performance. When each algorithm
> class implements its own 'ops', it needs to pay attention to the content
> completed in the callback. Before the 'ops.perf', the tool had better
> prepare the request data set. And in order to avoid falsely high
> performance caused by an unrealistic cache and TLB hit rate,
> the size of the data set should be larger than the 'crypto_req' number.
> The 'crypto_bm_ops' has the following API:
> - init & uninit
> The initialization related functions. The algorithm can do some private setup.
> - create_tfm & release_tfm
> The 'crypto_tfm' related functions. Algorithms have different tfm names in
> crypto, but they all have a member named tfm, so tfm is used to stand for
> the algorithm handle. The benchmark provides the tfm array.
> - create_req & release_req
> The 'crypto_req' related functions. The callbacks should create a group of
> 'reqnum' 'crypto_req' entries in struct 'crypto_bm_base'. It is also
> suggested to prepare the request data in this function. In order to avoid
> falsely high performance caused by an unrealistic cache and TLB hit
> rate, the size of the data set should be larger than the 'crypto_req' number.
> - perf
> The request sending function. The registrant should use the parameter 'loop'
> to send requests repeatedly, and update the count in struct
> 'crypto_bm_thread_data'.
>
> Then consider the parameters that the user can configure. Generally speaking,
> the following parameters will affect the performance of the algorithm:
> tfm number, request number, block size, numa node. And some parameters
> will affect the stability of performance: testing time and the number of
> requests sent. To sum up, the benchmark has the following parameters:
> - algorithm
> The name of the algorithm under test, as shown in /proc/crypto.
> - algtype
> The class of the algorithm under test. The available classes can be listed
> by echoing 'algtype' to /sys/module/crypto_benchmark/parameters/help.
> - inputsize
> The input length, which can greatly impact performance, such as the data
> size for compression or the key length for encryption.
> - loop
> The number of send loops. Used to avoid performance fluctuations caused by
> the environment.
> - numamask
> The NUMA node mask to bind to. Used to allocate memory, create threads and
> create 'crypto_tfm'.
> - optype
> The operation type under test. The operation types available for the
> specified 'algtype' can be read with
> cat /sys/module/crypto_benchmark/parameters/help.
> - reqnum
> The number of requests per tfm. Used to test asynchronous API
> performance.
> - threadnum
> The number of test threads. To simplify the model, one 'crypto_tfm' is
> created per thread.
> - time
> The testing time. Used to stop the test threads.
> - run
> Start or stop the test.
>
> Users can configure parameters under
> /sys/module/crypto_benchmark/parameters/.

Please don't use module parameters for stuff like this, use configfs
which was designed for this type of interactions.

thanks,

greg k-h

2022-09-20 08:35:53

by Herbert Xu

[permalink] [raw]
Subject: Re: [RFC PATCH 0/6] crypto: benchmark - add the crypto benchmark

On Mon, Sep 19, 2022 at 08:05:31PM +0800, Yang Shen wrote:
> Add crypto benchmark - A tool to help the users quickly get the
> performance of an algorithm registered in crypto.

Please explain how this relates to the existing speed testing
functionality in tcrypt.

Thanks,
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt