2009-02-02 06:42:24

by Huang, Ying

[permalink] [raw]
Subject: [PATCH 2/3] crypto: Per-CPU cryptd thread implementation based on kcrypto_wq

The original cryptd thread implementation has a scalability issue; this
patch solves it with a per-CPU implementation based on kcrypto_wq.

struct cryptd_queue is defined as a per-CPU queue, holding one
struct cryptd_cpu_queue for each CPU. In struct cryptd_cpu_queue, a
struct crypto_queue holds all requests for the CPU, and a struct
work_struct is used to run those requests on that CPU.
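
At a glance, the per-CPU layout and request flow look like this (a
condensed sketch of the code in the patch below; all names are taken
from it):

struct cryptd_cpu_queue {
	struct crypto_queue queue;	/* requests submitted on this CPU */
	struct work_struct work;	/* processes them via kcrypto_wq */
};

struct cryptd_queue {
	struct cryptd_cpu_queue *cpu_queue;	/* alloc_percpu() */
};

/*
 * Submission (cryptd_enqueue_request): get_cpu(), enqueue on the local
 * crypto_queue, queue_work_on() the local work item on kcrypto_wq,
 * put_cpu().
 * Processing (cryptd_queue_worker): dequeue one request, complete it,
 * requeue the work item if more requests are pending.
 */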

Testing based on dm-crypt on an Intel Core 2 E6400 (two cores) machine
shows a 19.2% performance gain. The test script is as follows:

-------------------- script begin ---------------------------
#!/bin/sh

dmc_create()
{
# Create a crypt device using dmsetup
dmsetup create $2 --table "0 `blockdev --getsize $1` crypt cbc(aes-asm)?cryptd?plain:plain babebabebabebabebabebabebabebabe 0 $1 0"
}

dmsetup remove crypt0
dmsetup remove crypt1

dd if=/dev/zero of=/dev/ram0 bs=1M count=4 >& /dev/null
dd if=/dev/zero of=/dev/ram1 bs=1M count=4 >& /dev/null

dmc_create /dev/ram0 crypt0
dmc_create /dev/ram1 crypt1

cat >tr.sh <<EOF
#!/bin/sh

for n in \$(seq 10); do
dd if=/dev/dm-0 of=/dev/null >& /dev/null &
dd if=/dev/dm-1 of=/dev/null >& /dev/null &
done
wait
EOF

for n in $(seq 10); do
/usr/bin/time sh tr.sh
done
rm tr.sh
-------------------- script end ---------------------------

The separator in the dm-crypt cipher parameter is changed from "-" to "?",
because "-" also appears in some cipher driver names, and cryptd needs to
be given the cipher driver name rather than the cipher name.
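
For example, with the script above the cipher field reads
"cbc(aes-asm)?cryptd?plain:plain"; the driver name "aes-asm" itself
contains "-", which is why "-" can no longer serve as the separator.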

The test results on an Intel Core 2 E6400 (two cores) are as follows:

without patch:
-----------------wo begin --------------------------
0.04user 0.38system 0:00.39elapsed 107%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+6566minor)pagefaults 0swaps
0.07user 0.35system 0:00.35elapsed 121%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+6567minor)pagefaults 0swaps
0.06user 0.34system 0:00.30elapsed 135%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+6562minor)pagefaults 0swaps
0.05user 0.37system 0:00.36elapsed 119%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+6607minor)pagefaults 0swaps
0.06user 0.36system 0:00.35elapsed 120%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+6562minor)pagefaults 0swaps
0.05user 0.37system 0:00.31elapsed 136%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+6594minor)pagefaults 0swaps
0.04user 0.34system 0:00.30elapsed 126%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+6597minor)pagefaults 0swaps
0.06user 0.32system 0:00.31elapsed 125%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+6571minor)pagefaults 0swaps
0.06user 0.34system 0:00.31elapsed 134%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+6581minor)pagefaults 0swaps
0.05user 0.38system 0:00.31elapsed 138%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+6600minor)pagefaults 0swaps
-----------------wo end --------------------------


with patch:
------------------w begin --------------------------
0.02user 0.31system 0:00.24elapsed 141%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+6554minor)pagefaults 0swaps
0.05user 0.34system 0:00.31elapsed 127%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+6606minor)pagefaults 0swaps
0.07user 0.33system 0:00.26elapsed 155%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+6559minor)pagefaults 0swaps
0.07user 0.32system 0:00.26elapsed 151%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+6562minor)pagefaults 0swaps
0.05user 0.34system 0:00.26elapsed 150%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+6603minor)pagefaults 0swaps
0.03user 0.36system 0:00.31elapsed 124%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+6562minor)pagefaults 0swaps
0.04user 0.35system 0:00.26elapsed 147%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+6586minor)pagefaults 0swaps
0.03user 0.37system 0:00.27elapsed 146%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+6562minor)pagefaults 0swaps
0.04user 0.36system 0:00.26elapsed 154%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+6594minor)pagefaults 0swaps
0.04user 0.35system 0:00.26elapsed 154%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+6557minor)pagefaults 0swaps
------------------w end --------------------------

The median elapsed time is:
without patch: 0.31
with patch: 0.26

The performance gain is about (0.31-0.26)/0.26 = 0.192.

Signed-off-by: Huang Ying <[email protected]>

---
crypto/Kconfig | 1
crypto/cryptd.c | 215 +++++++++++++++++++++++++-------------------------------
2 files changed, 99 insertions(+), 117 deletions(-)

--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -114,6 +114,7 @@ config CRYPTO_CRYPTD
select CRYPTO_BLKCIPHER
select CRYPTO_HASH
select CRYPTO_MANAGER
+ select CRYPTO_WORKQUEUE
help
This is a generic software asynchronous crypto daemon that
converts an arbitrary synchronous software crypto algorithm
--- a/crypto/cryptd.c
+++ b/crypto/cryptd.c
@@ -13,30 +13,30 @@
#include <crypto/algapi.h>
#include <crypto/internal/hash.h>
#include <crypto/cryptd.h>
+#include <crypto/crypto_wq.h>
#include <linux/err.h>
#include <linux/init.h>
#include <linux/kernel.h>
-#include <linux/kthread.h>
#include <linux/list.h>
#include <linux/module.h>
-#include <linux/mutex.h>
#include <linux/scatterlist.h>
#include <linux/sched.h>
#include <linux/slab.h>
-#include <linux/spinlock.h>

-#define CRYPTD_MAX_QLEN 100
+#define CRYPTD_MAX_CPU_QLEN 100

-struct cryptd_state {
- spinlock_t lock;
- struct mutex mutex;
+struct cryptd_cpu_queue {
struct crypto_queue queue;
- struct task_struct *task;
+ struct work_struct work;
+};
+
+struct cryptd_queue {
+ struct cryptd_cpu_queue *cpu_queue;
};

struct cryptd_instance_ctx {
struct crypto_spawn spawn;
- struct cryptd_state *state;
+ struct cryptd_queue *queue;
};

struct cryptd_blkcipher_ctx {
@@ -55,11 +55,80 @@ struct cryptd_hash_request_ctx {
crypto_completion_t complete;
};

-static inline struct cryptd_state *cryptd_get_state(struct crypto_tfm *tfm)
+static void cryptd_queue_worker(struct work_struct *work);
+
+int cryptd_init_queue(struct cryptd_queue *queue, unsigned int max_cpu_qlen)
+{
+ int cpu;
+ struct cryptd_cpu_queue *cpu_queue;
+
+ queue->cpu_queue = alloc_percpu(struct cryptd_cpu_queue);
+ if (!queue->cpu_queue)
+ return -ENOMEM;
+ for_each_possible_cpu(cpu) {
+ cpu_queue = per_cpu_ptr(queue->cpu_queue, cpu);
+ crypto_init_queue(&cpu_queue->queue, max_cpu_qlen);
+ INIT_WORK(&cpu_queue->work, cryptd_queue_worker);
+ }
+ return 0;
+}
+
+void cryptd_fini_queue(struct cryptd_queue *queue)
+{
+ int cpu;
+ struct cryptd_cpu_queue *cpu_queue;
+
+ for_each_possible_cpu(cpu) {
+ cpu_queue = per_cpu_ptr(queue->cpu_queue, cpu);
+ BUG_ON(cpu_queue->queue.qlen);
+ }
+ free_percpu(queue->cpu_queue);
+}
+
+int cryptd_enqueue_request(struct cryptd_queue *queue,
+ struct crypto_async_request *request)
+{
+ int cpu, err;
+ struct cryptd_cpu_queue *cpu_queue;
+
+ cpu = get_cpu();
+ cpu_queue = per_cpu_ptr(queue->cpu_queue, cpu);
+ err = crypto_enqueue_request(&cpu_queue->queue, request);
+ queue_work_on(cpu, kcrypto_wq, &cpu_queue->work);
+ put_cpu();
+
+ return err;
+}
+
+static void cryptd_queue_worker(struct work_struct *work)
+{
+ struct cryptd_cpu_queue *cpu_queue;
+ struct crypto_async_request *req, *backlog;
+
+ cpu_queue = container_of(work, struct cryptd_cpu_queue, work);
+ /* Only handle one request at a time to avoid hogging crypto
+ * workqueue */
+ preempt_disable();
+ backlog = crypto_get_backlog(&cpu_queue->queue);
+ req = crypto_dequeue_request(&cpu_queue->queue);
+ preempt_enable();
+
+ if (!req)
+ goto out;
+
+ if (backlog)
+ backlog->complete(backlog, -EINPROGRESS);
+ req->complete(req, 0);
+out:
+ if (cpu_queue->queue.qlen)
+ queue_work(kcrypto_wq, &cpu_queue->work);
+}
+
+static inline struct cryptd_queue *cryptd_get_queue(struct crypto_tfm *tfm)
{
struct crypto_instance *inst = crypto_tfm_alg_instance(tfm);
struct cryptd_instance_ctx *ictx = crypto_instance_ctx(inst);
- return ictx->state;
+ return ictx->queue;
}

static int cryptd_blkcipher_setkey(struct crypto_ablkcipher *parent,
@@ -131,19 +200,13 @@ static int cryptd_blkcipher_enqueue(stru
{
struct cryptd_blkcipher_request_ctx *rctx = ablkcipher_request_ctx(req);
struct crypto_ablkcipher *tfm = crypto_ablkcipher_reqtfm(req);
- struct cryptd_state *state =
- cryptd_get_state(crypto_ablkcipher_tfm(tfm));
- int err;
+ struct cryptd_queue *queue;

+ queue = cryptd_get_queue(crypto_ablkcipher_tfm(tfm));
rctx->complete = req->base.complete;
req->base.complete = complete;

- spin_lock_bh(&state->lock);
- err = ablkcipher_enqueue_request(&state->queue, req);
- spin_unlock_bh(&state->lock);
-
- wake_up_process(state->task);
- return err;
+ return cryptd_enqueue_request(queue, &req->base);
}

static int cryptd_blkcipher_encrypt_enqueue(struct ablkcipher_request *req)
@@ -177,21 +240,12 @@ static int cryptd_blkcipher_init_tfm(str
static void cryptd_blkcipher_exit_tfm(struct crypto_tfm *tfm)
{
struct cryptd_blkcipher_ctx *ctx = crypto_tfm_ctx(tfm);
- struct cryptd_state *state = cryptd_get_state(tfm);
- int active;
-
- mutex_lock(&state->mutex);
- active = ablkcipher_tfm_in_queue(&state->queue,
- __crypto_ablkcipher_cast(tfm));
- mutex_unlock(&state->mutex);
-
- BUG_ON(active);

crypto_free_blkcipher(ctx->child);
}

static struct crypto_instance *cryptd_alloc_instance(struct crypto_alg *alg,
- struct cryptd_state *state)
+ struct cryptd_queue *queue)
{
struct crypto_instance *inst;
struct cryptd_instance_ctx *ctx;
@@ -214,7 +268,7 @@ static struct crypto_instance *cryptd_al
if (err)
goto out_free_inst;

- ctx->state = state;
+ ctx->queue = queue;

memcpy(inst->alg.cra_name, alg->cra_name, CRYPTO_MAX_ALG_NAME);

@@ -232,7 +286,7 @@ out_free_inst:
}

static struct crypto_instance *cryptd_alloc_blkcipher(
- struct rtattr **tb, struct cryptd_state *state)
+ struct rtattr **tb, struct cryptd_queue *queue)
{
struct crypto_instance *inst;
struct crypto_alg *alg;
@@ -242,7 +296,7 @@ static struct crypto_instance *cryptd_al
if (IS_ERR(alg))
return ERR_CAST(alg);

- inst = cryptd_alloc_instance(alg, state);
+ inst = cryptd_alloc_instance(alg, queue);
if (IS_ERR(inst))
goto out_put_alg;

@@ -290,15 +344,6 @@ static int cryptd_hash_init_tfm(struct c
static void cryptd_hash_exit_tfm(struct crypto_tfm *tfm)
{
struct cryptd_hash_ctx *ctx = crypto_tfm_ctx(tfm);
- struct cryptd_state *state = cryptd_get_state(tfm);
- int active;
-
- mutex_lock(&state->mutex);
- active = ahash_tfm_in_queue(&state->queue,
- __crypto_ahash_cast(tfm));
- mutex_unlock(&state->mutex);
-
- BUG_ON(active);

crypto_free_hash(ctx->child);
}
@@ -324,19 +369,13 @@ static int cryptd_hash_enqueue(struct ah
{
struct cryptd_hash_request_ctx *rctx = ahash_request_ctx(req);
struct crypto_ahash *tfm = crypto_ahash_reqtfm(req);
- struct cryptd_state *state =
- cryptd_get_state(crypto_ahash_tfm(tfm));
- int err;
+ struct cryptd_queue *queue =
+ cryptd_get_queue(crypto_ahash_tfm(tfm));

rctx->complete = req->base.complete;
req->base.complete = complete;

- spin_lock_bh(&state->lock);
- err = ahash_enqueue_request(&state->queue, req);
- spin_unlock_bh(&state->lock);
-
- wake_up_process(state->task);
- return err;
+ return cryptd_enqueue_request(queue, &req->base);
}

static void cryptd_hash_init(struct crypto_async_request *req_async, int err)
@@ -469,7 +508,7 @@ static int cryptd_hash_digest_enqueue(st
}

static struct crypto_instance *cryptd_alloc_hash(
- struct rtattr **tb, struct cryptd_state *state)
+ struct rtattr **tb, struct cryptd_queue *queue)
{
struct crypto_instance *inst;
struct crypto_alg *alg;
@@ -479,7 +518,7 @@ static struct crypto_instance *cryptd_al
if (IS_ERR(alg))
return ERR_PTR(PTR_ERR(alg));

- inst = cryptd_alloc_instance(alg, state);
+ inst = cryptd_alloc_instance(alg, queue);
if (IS_ERR(inst))
goto out_put_alg;

@@ -503,7 +542,7 @@ out_put_alg:
return inst;
}

-static struct cryptd_state state;
+static struct cryptd_queue queue;

static struct crypto_instance *cryptd_alloc(struct rtattr **tb)
{
@@ -515,9 +554,9 @@ static struct crypto_instance *cryptd_al

switch (algt->type & algt->mask & CRYPTO_ALG_TYPE_MASK) {
case CRYPTO_ALG_TYPE_BLKCIPHER:
- return cryptd_alloc_blkcipher(tb, &state);
+ return cryptd_alloc_blkcipher(tb, &queue);
case CRYPTO_ALG_TYPE_DIGEST:
- return cryptd_alloc_hash(tb, &state);
+ return cryptd_alloc_hash(tb, &queue);
}

return ERR_PTR(-EINVAL);
@@ -572,82 +611,24 @@ void cryptd_free_ablkcipher(struct crypt
}
EXPORT_SYMBOL_GPL(cryptd_free_ablkcipher);

-static inline int cryptd_create_thread(struct cryptd_state *state,
- int (*fn)(void *data), const char *name)
-{
- spin_lock_init(&state->lock);
- mutex_init(&state->mutex);
- crypto_init_queue(&state->queue, CRYPTD_MAX_QLEN);
-
- state->task = kthread_run(fn, state, name);
- if (IS_ERR(state->task))
- return PTR_ERR(state->task);
-
- return 0;
-}
-
-static inline void cryptd_stop_thread(struct cryptd_state *state)
-{
- BUG_ON(state->queue.qlen);
- kthread_stop(state->task);
-}
-
-static int cryptd_thread(void *data)
-{
- struct cryptd_state *state = data;
- int stop;
-
- current->flags |= PF_NOFREEZE;
-
- do {
- struct crypto_async_request *req, *backlog;
-
- mutex_lock(&state->mutex);
- __set_current_state(TASK_INTERRUPTIBLE);
-
- spin_lock_bh(&state->lock);
- backlog = crypto_get_backlog(&state->queue);
- req = crypto_dequeue_request(&state->queue);
- spin_unlock_bh(&state->lock);
-
- stop = kthread_should_stop();
-
- if (stop || req) {
- __set_current_state(TASK_RUNNING);
- if (req) {
- if (backlog)
- backlog->complete(backlog,
- -EINPROGRESS);
- req->complete(req, 0);
- }
- }
-
- mutex_unlock(&state->mutex);
-
- schedule();
- } while (!stop);
-
- return 0;
-}
-
static int __init cryptd_init(void)
{
int err;

- err = cryptd_create_thread(&state, cryptd_thread, "cryptd");
+ err = cryptd_init_queue(&queue, CRYPTD_MAX_CPU_QLEN);
if (err)
return err;

err = crypto_register_template(&cryptd_tmpl);
if (err)
- kthread_stop(state.task);
+ cryptd_fini_queue(&queue);

return err;
}

static void __exit cryptd_exit(void)
{
- cryptd_stop_thread(&state);
+ cryptd_fini_queue(&queue);
crypto_unregister_template(&cryptd_tmpl);
}



2009-02-03 09:11:17

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH 2/3] crypto: Per-CPU cryptd thread implementation based on kcrypto_wq

On Mon, 02 Feb 2009 14:42:20 +0800 Huang Ying <[email protected]> wrote:

> The original cryptd thread implementation has a scalability issue; this
> patch solves it with a per-CPU implementation based on kcrypto_wq.
>
> struct cryptd_queue is defined as a per-CPU queue, holding one
> struct cryptd_cpu_queue for each CPU. In struct cryptd_cpu_queue, a
> struct crypto_queue holds all requests for the CPU, and a struct
> work_struct is used to run those requests on that CPU.
>
> ...
>
> +int cryptd_init_queue(struct cryptd_queue *queue, unsigned int max_cpu_qlen)
> +{
> + int cpu;
> + struct cryptd_cpu_queue *cpu_queue;
> +
> + queue->cpu_queue = alloc_percpu(struct cryptd_cpu_queue);
> + if (!queue->cpu_queue)
> + return -ENOMEM;
> + for_each_possible_cpu(cpu) {
> + cpu_queue = per_cpu_ptr(queue->cpu_queue, cpu);
> + crypto_init_queue(&cpu_queue->queue, max_cpu_qlen);
> + INIT_WORK(&cpu_queue->work, cryptd_queue_worker);
> + }
> + return 0;
> +}

Could be made static, I believe.

> +void cryptd_fini_queue(struct cryptd_queue *queue)
> +{
> + int cpu;
> + struct cryptd_cpu_queue *cpu_queue;
> +
> + for_each_possible_cpu(cpu) {
> + cpu_queue = per_cpu_ptr(queue->cpu_queue, cpu);
> + BUG_ON(cpu_queue->queue.qlen);
> + }
> + free_percpu(queue->cpu_queue);
> +}

Ditto.

> +int cryptd_enqueue_request(struct cryptd_queue *queue,
> + struct crypto_async_request *request)
> +{
> + int cpu, err;
> + struct cryptd_cpu_queue *cpu_queue;
> +
> + cpu = get_cpu();
> + cpu_queue = per_cpu_ptr(queue->cpu_queue, cpu);
> + err = crypto_enqueue_request(&cpu_queue->queue, request);
> + queue_work_on(cpu, kcrypto_wq, &cpu_queue->work);
> + put_cpu();
> +
> + return err;
> +}

Ditto?

> +static void cryptd_queue_worker(struct work_struct *work)

A few comments explaining the code wouldn't kill us...

> +{
> + struct cryptd_cpu_queue *cpu_queue;
> + struct crypto_async_request *req, *backlog;
> +
> + cpu_queue = container_of(work, struct cryptd_cpu_queue, work);
> + /* Only handle one request at a time to avoid hogging crypto
> + * workqueue */

Not sure what that means.

> + preempt_disable();
> + backlog = crypto_get_backlog(&cpu_queue->queue);
> + req = crypto_dequeue_request(&cpu_queue->queue);
> + preempt_enable();

This function is run by a per-cpu thread, so it cannot be migrated to
another CPU by the CPU scheduler.

It is quite unclear what this preempt_disable() is needed for. Please
either remove it or provide adequate comments.

> + if (!req)
> + goto out;
> +
> + if (backlog)
> + backlog->complete(backlog, -EINPROGRESS);
> + req->complete(req, 0);
> +out:
> + if (cpu_queue->queue.qlen)
> + queue_work(kcrypto_wq, &cpu_queue->work);

Again, unclear and needs commentary.

If we come here via the `goto out;' path above, this CPU queue has no
more work to do, I think? If so, why does it requeue itself?

> +}
> +
>
> ...
>


2009-02-03 09:21:56

by Herbert Xu

[permalink] [raw]
Subject: Re: [PATCH 2/3] crypto: Per-CPU cryptd thread implementation based on kcrypto_wq

On Tue, Feb 03, 2009 at 01:10:38AM -0800, Andrew Morton wrote:
>
> > +{
> > + struct cryptd_cpu_queue *cpu_queue;
> > + struct crypto_async_request *req, *backlog;
> > +
> > + cpu_queue = container_of(work, struct cryptd_cpu_queue, work);
> > + /* Only handle one request at a time to avoid hogging crypto
> > + * workqueue */
>
> Not sure what that means.

It looks like this was copied from chainiv which put the work
in eventd, which is obviously no longer the case here.

> > + if (!req)
> > + goto out;
> > +
> > + if (backlog)
> > + backlog->complete(backlog, -EINPROGRESS);
> > + req->complete(req, 0);
> > +out:
> > + if (cpu_queue->queue.qlen)
> > + queue_work(kcrypto_wq, &cpu_queue->work);
>
> Again, unclear and needs commentary.
>
> If we come here via the `goto out;' path above, this CPU queue has no
> more work to do, I think? If so, why does it requeue itself?

This looks like another vestige from chainiv.

Thanks,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2009-02-04 01:08:01

by Huang, Ying

[permalink] [raw]
Subject: Re: [PATCH 2/3] crypto: Per-CPU cryptd thread implementation based on kcrypto_wq

On Tue, 2009-02-03 at 17:10 +0800, Andrew Morton wrote:
> On Mon, 02 Feb 2009 14:42:20 +0800 Huang Ying <[email protected]> wrote:
>
> > The original cryptd thread implementation has a scalability issue; this
> > patch solves it with a per-CPU implementation based on kcrypto_wq.
> >
> > struct cryptd_queue is defined as a per-CPU queue, holding one
> > struct cryptd_cpu_queue for each CPU. In struct cryptd_cpu_queue, a
> > struct crypto_queue holds all requests for the CPU, and a struct
> > work_struct is used to run those requests on that CPU.
> >
> > ...
> >
> > +int cryptd_init_queue(struct cryptd_queue *queue, unsigned int max_cpu_qlen)
> > +{
> > + int cpu;
> > + struct cryptd_cpu_queue *cpu_queue;
> > +
> > + queue->cpu_queue = alloc_percpu(struct cryptd_cpu_queue);
> > + if (!queue->cpu_queue)
> > + return -ENOMEM;
> > + for_each_possible_cpu(cpu) {
> > + cpu_queue = per_cpu_ptr(queue->cpu_queue, cpu);
> > + crypto_init_queue(&cpu_queue->queue, max_cpu_qlen);
> > + INIT_WORK(&cpu_queue->work, cryptd_queue_worker);
> > + }
> > + return 0;
> > +}
>
> Could be made static, I believe.

OK. I will make it static.

> > +void cryptd_fini_queue(struct cryptd_queue *queue)
> > +{
> > + int cpu;
> > + struct cryptd_cpu_queue *cpu_queue;
> > +
> > + for_each_possible_cpu(cpu) {
> > + cpu_queue = per_cpu_ptr(queue->cpu_queue, cpu);
> > + BUG_ON(cpu_queue->queue.qlen);
> > + }
> > + free_percpu(queue->cpu_queue);
> > +}
>
> Ditto.

As above.

> > +int cryptd_enqueue_request(struct cryptd_queue *queue,
> > + struct crypto_async_request *request)
> > +{
> > + int cpu, err;
> > + struct cryptd_cpu_queue *cpu_queue;
> > +
> > + cpu = get_cpu();
> > + cpu_queue = per_cpu_ptr(queue->cpu_queue, cpu);
> > + err = crypto_enqueue_request(&cpu_queue->queue, request);
> > + queue_work_on(cpu, kcrypto_wq, &cpu_queue->work);
> > + put_cpu();
> > +
> > + return err;
> > +}
>
> Ditto?

As above.

> > +static void cryptd_queue_worker(struct work_struct *work)
>
> A few comments explaining the code wouldn't kill us...

OK. I will add some comments. This is the function for the work_struct
in cryptd_cpu_queue; it does the real encryption/decryption.

> > +{
> > + struct cryptd_cpu_queue *cpu_queue;
> > + struct crypto_async_request *req, *backlog;
> > +
> > + cpu_queue = container_of(work, struct cryptd_cpu_queue, work);
> > + /* Only handle one request at a time to avoid hogging crypto
> > + * workqueue */
>
> Not sure what that means.

cryptd is not the only user of kcrypto_wq (chainiv now uses it too), and
there may be many requests in a cryptd_cpu_queue. To reduce the latency
seen by other kcrypto_wq users, only one request is processed per
work_struct invocation; the work_struct is then queued again if more
requests are pending.

> > + preempt_disable();
> > + backlog = crypto_get_backlog(&cpu_queue->queue);
> > + req = crypto_dequeue_request(&cpu_queue->queue);
> > + preempt_enable();
>
> This function is run by a per-cpu thread, so it cannot be migrated to
> another CPU by the CPU scheduler.
>
> It is quite unclear what this preempt_disable() is needed for. Please
> either remove it or provide adequate comments.

preempt_disable() is used to protect cpu_queue->queue, which can also be
accessed by cryptd_enqueue_request(). While cryptd_queue_worker() is
running, it can be preempted by a task calling cryptd_enqueue_request()
on the same CPU.
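
For reference, both touch points on cpu_queue->queue run with preemption
disabled (get_cpu() in the submission path implies preempt_disable()),
which is what keeps a process-context submitter from interleaving with
the dequeue here. A condensed view of the two paths from the patch,
assuming both run in process context on the same CPU:

	/* submission path, cryptd_enqueue_request() */
	cpu = get_cpu();			/* preemption disabled */
	cpu_queue = per_cpu_ptr(queue->cpu_queue, cpu);
	err = crypto_enqueue_request(&cpu_queue->queue, request);
	queue_work_on(cpu, kcrypto_wq, &cpu_queue->work);
	put_cpu();

	/* worker path, cryptd_queue_worker() */
	preempt_disable();
	backlog = crypto_get_backlog(&cpu_queue->queue);
	req = crypto_dequeue_request(&cpu_queue->queue);
	preempt_enable();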

> > + if (!req)
> > + goto out;
> > +
> > + if (backlog)
> > + backlog->complete(backlog, -EINPROGRESS);
> > + req->complete(req, 0);
> > +out:
> > + if (cpu_queue->queue.qlen)
> > + queue_work(kcrypto_wq, &cpu_queue->work);
>
> Again, unclear and needs commentary.
>
> If we come here via the `goto out;' path above, this CPU queue has no
> more work to do, I think? If so, why does it requeue itself?

If there is no more work to do, cpu_queue->queue.qlen will be zero, so
the work will not requeue itself. If we come here via another path, we
may need to do that.

Best Regards,
Huang Ying

