2009-03-16 12:13:57

by Steffen Klassert

[permalink] [raw]
Subject: [RFC] [PATCH 0/4] Parallel IPsec

This patchset adds the 'pcrypt' parallel crypto template. With this template it
is possible to process the crypto requests of a transform in parallel without
request reordering. This is particularly interesting for IPsec.

I posted a first network-based version to the netdev list some time ago. The
discussion can be found here:
http://lwn.net/Articles/309029/

The parallel crypto template is based on a generic parallelization/serialization
method. This method uses the remote softirq invocation infrastructure for
parallelization and serialization. With this method, data objects can be
processed in parallel, starting at some given point. After doing some expensive
operations in parallel, it is possible to serialize again. After serialization,
the parallelized data objects return in the same order as before the
parallelization. In the case of IPsec, this makes it possible to run the
expensive parts in parallel without packet reordering.
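
As an illustration (not part of the patches themselves), a minimal sketch of
how a user of the padata interface from patch 1/4 parallelizes and then
serializes its objects could look like the code below. The structure and
function names are made up for this example; the instance and softirq numbers
are the ones the pcrypt patches use, and error handling is omitted:

#include <linux/padata.h>
#include <linux/string.h>

struct my_obj {
	struct padata_priv padata;
	/* payload, e.g. the crypto request */
};

/* runs in the parallelization softirq on the cpu chosen by padata */
static void my_parallel(struct padata_priv *padata)
{
	/* do the expensive work here, then queue for serialization */
	padata_do_serial(AEAD_ENC_PADATA, padata);
}

/* runs on the callback cpu; objects arrive here in submission order */
static void my_serial(struct padata_priv *padata)
{
	/* complete the object here */
}

static int my_submit(struct my_obj *obj, int cb_cpu)
{
	memset(&obj->padata, 0, sizeof(obj->padata));
	obj->padata.parallel = my_parallel;
	obj->padata.serial = my_serial;

	return padata_do_parallel(AEAD_ENC_SOFTIRQ, AEAD_ENC_PADATA,
				  &obj->padata, cb_cpu);
}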

I did forwarding tests with two quad-core machines (Intel Core 2 Quad Q6600)
and an EXFO FTB-400 packet blazer.
Unfortunately the throughput tests are not that meaningful as long as we don't
use the new reentrant ahash/shash interface, because the lock in authenc_hash
serializes the requests. As soon as this work stabilizes I'll start to convert
authenc to ahash if nobody else has done it in the meantime.

The results of the throughput tests are as follows:

cryptodev-2.6
Packet size: 1420 bytes
Encryption: aes192-sha1
bidirectional throughput: 2 x 132 Mbit/s
unidirectional throughput: 260 Mbit/s

cryptodev-2.6 + pcrypt (authenc) parallelization:
Packet size: 1420 bytes
Encryption: aes192-sha1
bidirectional throughput: 2 x 320 Mbit/s
unidirectional throughput: 493 Mbit/s

To reduce the hold time of the lock in authenc_hash I did the same tests again
with aes192-digest_null:

cryptodev-2.6
Packet size: 1420 bytes
Encryption: aes192-digest_null
bidirectional throughput: 2 x 243 Mbit/s
unidirectional throughput: 480 Mbit/s

cryptodev-2.6 + pcrypt (authenc) parallelization:
Packet size: 1420 bytes
Encryption: aes192-digest_null
bidirectional throughput: 2 x 592 Mbit/s
unidirectional throughput: 936 Mbit/s

Steffen


2009-03-16 11:53:41

by Steffen Klassert

[permalink] [raw]
Subject: [RFC] [PATCH 4/4] esp: add the pcrypt hooks to esp

Add the pcrypt hooks to esp to be able to use pcrypt-ed IPsec.

Signed-off-by: Steffen Klassert <[email protected]>
---
net/ipv4/esp4.c | 5 +++--
net/ipv6/esp6.c | 5 +++--
2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/net/ipv4/esp4.c b/net/ipv4/esp4.c
index 18bb383..9f72d94 100644
--- a/net/ipv4/esp4.c
+++ b/net/ipv4/esp4.c
@@ -1,5 +1,6 @@
#include <crypto/aead.h>
#include <crypto/authenc.h>
+#include <crypto/pcrypt.h>
#include <linux/err.h>
#include <linux/module.h>
#include <net/ip.h>
@@ -447,7 +448,7 @@ static int esp_init_aead(struct xfrm_state *x)
struct crypto_aead *aead;
int err;

- aead = crypto_alloc_aead(x->aead->alg_name, 0, 0);
+ aead = crypto_alloc_aead_tfm(x->aead->alg_name, 0, 0);
err = PTR_ERR(aead);
if (IS_ERR(aead))
goto error;
@@ -489,7 +490,7 @@ static int esp_init_authenc(struct xfrm_state *x)
x->ealg->alg_name) >= CRYPTO_MAX_ALG_NAME)
goto error;

- aead = crypto_alloc_aead(authenc_name, 0, 0);
+ aead = crypto_alloc_aead_tfm(authenc_name, 0, 0);
err = PTR_ERR(aead);
if (IS_ERR(aead))
goto error;
diff --git a/net/ipv6/esp6.c b/net/ipv6/esp6.c
index c2f2501..eede728 100644
--- a/net/ipv6/esp6.c
+++ b/net/ipv6/esp6.c
@@ -26,6 +26,7 @@

#include <crypto/aead.h>
#include <crypto/authenc.h>
+#include <crypto/pcrypt.h>
#include <linux/err.h>
#include <linux/module.h>
#include <net/ip.h>
@@ -390,7 +391,7 @@ static int esp_init_aead(struct xfrm_state *x)
struct crypto_aead *aead;
int err;

- aead = crypto_alloc_aead(x->aead->alg_name, 0, 0);
+ aead = crypto_alloc_aead_tfm(x->aead->alg_name, 0, 0);
err = PTR_ERR(aead);
if (IS_ERR(aead))
goto error;
@@ -432,7 +433,7 @@ static int esp_init_authenc(struct xfrm_state *x)
x->ealg->alg_name) >= CRYPTO_MAX_ALG_NAME)
goto error;

- aead = crypto_alloc_aead(authenc_name, 0, 0);
+ aead = crypto_alloc_aead_tfm(authenc_name, 0, 0);
err = PTR_ERR(aead);
if (IS_ERR(aead))
goto error;
--
1.5.4.2


2009-03-16 12:13:57

by Steffen Klassert

[permalink] [raw]
Subject: [RFC] [PATCH 2/4] cpu_chainiv: add percpu IV chain generator

If the crypto requests of a crypto transformation are processed in
parallel, the usual chain IV generator would serialize the crypto
requests again. The percpu IV chain generator allocates the IV as
percpu data and generates percpu IV chains, so a crypto request
does not need to wait for the completion of the IV generation
of a previous request that runs on a different cpu.

Signed-off-by: Steffen Klassert <[email protected]>
---
crypto/Makefile | 1 +
crypto/cpu_chainiv.c | 327 ++++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 328 insertions(+), 0 deletions(-)
create mode 100644 crypto/cpu_chainiv.c

diff --git a/crypto/Makefile b/crypto/Makefile
index 673d9f7..24f7279 100644
--- a/crypto/Makefile
+++ b/crypto/Makefile
@@ -19,6 +19,7 @@ crypto_blkcipher-objs := ablkcipher.o
crypto_blkcipher-objs += blkcipher.o
obj-$(CONFIG_CRYPTO_BLKCIPHER2) += crypto_blkcipher.o
obj-$(CONFIG_CRYPTO_BLKCIPHER2) += chainiv.o
+obj-$(CONFIG_CRYPTO_BLKCIPHER2) += cpu_chainiv.o
obj-$(CONFIG_CRYPTO_BLKCIPHER2) += eseqiv.o
obj-$(CONFIG_CRYPTO_SEQIV) += seqiv.o

diff --git a/crypto/cpu_chainiv.c b/crypto/cpu_chainiv.c
new file mode 100644
index 0000000..8acfc77
--- /dev/null
+++ b/crypto/cpu_chainiv.c
@@ -0,0 +1,327 @@
+/*
+ * cpu_chainiv - Per CPU Chain IV Generator
+ *
+ * Generate IVs by using the last block of the previous encryption on
+ * the local cpu. This is mainly useful for CBC with a parallel algorithm.
+ *
+ * Based on chainiv.c by Herbert Xu <[email protected]>
+ *
+ * Copyright (C) 2009 secunet Security Networks AG
+ * Copyright (C) 2009 Steffen Klassert <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
+ */
+
+#include <crypto/internal/skcipher.h>
+#include <crypto/internal/aead.h>
+#include <crypto/rng.h>
+#include <linux/err.h>
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/spinlock.h>
+#include <linux/string.h>
+
+struct cpu_civ_ctx {
+ spinlock_t lock;
+ char *iv;
+};
+
+static int cpu_civ_aead_givencrypt(struct aead_givcrypt_request *req)
+{
+ struct crypto_aead *geniv = aead_givcrypt_reqtfm(req);
+ struct cpu_civ_ctx *ctx = crypto_aead_ctx(geniv);
+ struct aead_request *subreq = aead_givcrypt_reqctx(req);
+ unsigned int ivsize;
+ char *iv;
+ int err;
+
+ aead_request_set_tfm(subreq, aead_geniv_base(geniv));
+ aead_request_set_callback(subreq, req->areq.base.flags,
+ req->areq.base.complete,
+ req->areq.base.data);
+ aead_request_set_crypt(subreq, req->areq.src, req->areq.dst,
+ req->areq.cryptlen, req->areq.iv);
+ aead_request_set_assoc(subreq, req->areq.assoc, req->areq.assoclen);
+
+ local_bh_disable();
+
+ ivsize = crypto_aead_ivsize(geniv);
+
+ iv = per_cpu_ptr(ctx->iv, smp_processor_id());
+
+ memcpy(req->giv, iv, ivsize);
+ memcpy(subreq->iv, iv, ivsize);
+
+ err = crypto_aead_encrypt(subreq);
+ if (err)
+ goto unlock;
+
+ memcpy(iv, subreq->iv, ivsize);
+
+unlock:
+ local_bh_enable();
+
+ return err;
+}
+
+static int cpu_civ_aead_givencrypt_first(struct aead_givcrypt_request *req)
+{
+ struct crypto_aead *geniv = aead_givcrypt_reqtfm(req);
+ struct cpu_civ_ctx *ctx = crypto_aead_ctx(geniv);
+ char *iv;
+ int err = 0;
+ int cpu;
+
+ spin_lock_bh(&ctx->lock);
+ if (crypto_aead_crt(geniv)->givencrypt != cpu_civ_aead_givencrypt_first)
+ goto unlock;
+
+ crypto_aead_crt(geniv)->givencrypt = cpu_civ_aead_givencrypt;
+
+ for_each_possible_cpu(cpu) {
+ iv = per_cpu_ptr(ctx->iv, cpu);
+ err = crypto_rng_get_bytes(crypto_default_rng, iv,
+ crypto_aead_ivsize(geniv));
+
+ if (err)
+ break;
+ }
+
+unlock:
+ spin_unlock_bh(&ctx->lock);
+
+ if (err)
+ return err;
+
+ return cpu_civ_aead_givencrypt(req);
+}
+
+static int cpu_civ_ablkcipher_givencrypt(struct skcipher_givcrypt_request *req)
+{
+ struct crypto_ablkcipher *geniv = skcipher_givcrypt_reqtfm(req);
+ struct cpu_civ_ctx *ctx = crypto_ablkcipher_ctx(geniv);
+ struct ablkcipher_request *subreq = skcipher_givcrypt_reqctx(req);
+ unsigned int ivsize;
+ char *iv;
+ int err;
+
+ ablkcipher_request_set_tfm(subreq, skcipher_geniv_cipher(geniv));
+ ablkcipher_request_set_callback(subreq, req->creq.base.flags,
+ req->creq.base.complete,
+ req->creq.base.data);
+ ablkcipher_request_set_crypt(subreq, req->creq.src, req->creq.dst,
+ req->creq.nbytes, req->creq.info);
+
+ local_bh_disable();
+
+ ivsize = crypto_ablkcipher_ivsize(geniv);
+
+ iv = per_cpu_ptr(ctx->iv, smp_processor_id());
+
+ memcpy(req->giv, iv, ivsize);
+ memcpy(subreq->info, iv, ivsize);
+
+ err = crypto_ablkcipher_encrypt(subreq);
+ if (err)
+ goto unlock;
+
+ memcpy(iv, subreq->info, ivsize);
+
+unlock:
+ local_bh_enable();
+
+ return err;
+}
+
+static int cpu_civ_ablkcipher_givencrypt_first(
+ struct skcipher_givcrypt_request *req)
+{
+ struct crypto_ablkcipher *geniv = skcipher_givcrypt_reqtfm(req);
+ struct cpu_civ_ctx *ctx = crypto_ablkcipher_ctx(geniv);
+ char *iv;
+ int err = 0;
+ int cpu;
+
+ spin_lock_bh(&ctx->lock);
+ if (crypto_ablkcipher_crt(geniv)->givencrypt !=
+ cpu_civ_ablkcipher_givencrypt_first)
+ goto unlock;
+
+ crypto_ablkcipher_crt(geniv)->givencrypt = cpu_civ_ablkcipher_givencrypt;
+
+ for_each_possible_cpu(cpu) {
+ iv = per_cpu_ptr(ctx->iv, cpu);
+ err = crypto_rng_get_bytes(crypto_default_rng, iv,
+ crypto_ablkcipher_ivsize(geniv));
+
+ if (err)
+ break;
+ }
+
+unlock:
+ spin_unlock_bh(&ctx->lock);
+
+ if (err)
+ return err;
+
+ return cpu_civ_ablkcipher_givencrypt(req);
+}
+
+static int cpu_civ_aead_init(struct crypto_tfm *tfm)
+{
+ struct cpu_civ_ctx *ctx = crypto_tfm_ctx(tfm);
+
+ spin_lock_init(&ctx->lock);
+ tfm->crt_aead.reqsize = sizeof(struct aead_request);
+
+ ctx->iv = __alloc_percpu(tfm->crt_aead.ivsize);
+
+ if (!ctx->iv)
+ return -ENOMEM;
+
+ return aead_geniv_init(tfm);
+}
+
+void cpu_civ_aead_exit(struct crypto_tfm *tfm)
+{
+ struct cpu_civ_ctx *ctx = crypto_tfm_ctx(tfm);
+
+ free_percpu(ctx->iv);
+ aead_geniv_exit(tfm);
+}
+
+static int cpu_civ_ablkcipher_init(struct crypto_tfm *tfm)
+{
+ struct cpu_civ_ctx *ctx = crypto_tfm_ctx(tfm);
+
+ spin_lock_init(&ctx->lock);
+ tfm->crt_ablkcipher.reqsize = sizeof(struct ablkcipher_request);
+
+ ctx->iv = __alloc_percpu(tfm->crt_ablkcipher.ivsize);
+
+ if (!ctx->iv)
+ return -ENOMEM;
+
+ return skcipher_geniv_init(tfm);
+}
+
+void cpu_civ_ablkcipher_exit(struct crypto_tfm *tfm)
+{
+ struct cpu_civ_ctx *ctx = crypto_tfm_ctx(tfm);
+
+ free_percpu(ctx->iv);
+ skcipher_geniv_exit(tfm);
+}
+
+static struct crypto_template cpu_civ_tmpl;
+
+static struct crypto_instance *cpu_civ_ablkcipher_alloc(struct rtattr **tb)
+{
+ struct crypto_instance *inst;
+
+ inst = skcipher_geniv_alloc(&cpu_civ_tmpl, tb, 0, 0);
+ if (IS_ERR(inst))
+ goto out;
+
+ inst->alg.cra_ablkcipher.givencrypt = cpu_civ_ablkcipher_givencrypt_first;
+
+ inst->alg.cra_init = cpu_civ_ablkcipher_init;
+ inst->alg.cra_exit = cpu_civ_ablkcipher_exit;
+
+out:
+ return inst;
+}
+
+static struct crypto_instance *cpu_civ_aead_alloc(struct rtattr **tb)
+{
+ struct crypto_instance *inst;
+
+ inst = aead_geniv_alloc(&cpu_civ_tmpl, tb, 0, 0);
+
+ if (IS_ERR(inst))
+ goto out;
+
+ inst->alg.cra_aead.givencrypt = cpu_civ_aead_givencrypt_first;
+
+ inst->alg.cra_init = cpu_civ_aead_init;
+ inst->alg.cra_exit = cpu_civ_aead_exit;
+
+out:
+ return inst;
+}
+
+static struct crypto_instance *cpu_civ_alloc(struct rtattr **tb)
+{
+ struct crypto_attr_type *algt;
+ struct crypto_instance *inst;
+ int err;
+
+ algt = crypto_get_attr_type(tb);
+ err = PTR_ERR(algt);
+ if (IS_ERR(algt))
+ return ERR_PTR(err);
+
+ err = crypto_get_default_rng();
+ if (err)
+ return ERR_PTR(err);
+
+ if ((algt->type ^ CRYPTO_ALG_TYPE_AEAD) & CRYPTO_ALG_TYPE_MASK)
+ inst = cpu_civ_ablkcipher_alloc(tb);
+ else
+ inst = cpu_civ_aead_alloc(tb);
+
+ if (IS_ERR(inst)) {
+ crypto_put_default_rng();
+ goto out;
+ }
+
+ inst->alg.cra_ctxsize = sizeof(struct cpu_civ_ctx);
+
+out:
+ return inst;
+}
+
+static void cpu_civ_free(struct crypto_instance *inst)
+{
+ if ((inst->alg.cra_flags ^ CRYPTO_ALG_TYPE_AEAD) & CRYPTO_ALG_TYPE_MASK)
+ skcipher_geniv_free(inst);
+ else
+ aead_geniv_free(inst);
+
+ crypto_put_default_rng();
+}
+
+static struct crypto_template cpu_civ_tmpl = {
+ .name = "cpu_chainiv",
+ .alloc = cpu_civ_alloc,
+ .free = cpu_civ_free,
+ .module = THIS_MODULE,
+};
+
+static int __init cpu_civ_module_init(void)
+{
+ return crypto_register_template(&cpu_civ_tmpl);
+}
+
+static void cpu_civ_module_exit(void)
+{
+ crypto_unregister_template(&cpu_civ_tmpl);
+}
+
+module_init(cpu_civ_module_init);
+module_exit(cpu_civ_module_exit);
+
+MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("Per CPU Chain IV Generator");
--
1.5.4.2


2009-03-16 12:13:58

by Steffen Klassert

[permalink] [raw]
Subject: [RFC] [PATCH 1/4] padata: generic interface for parallel processing

This patch introduces an interface to process data objects
in parallel. On request it is possible to serialize again.
The parallelized objects return after serialization in the
same order as they were before the parallelization.

Signed-off-by: Steffen Klassert <[email protected]>
---
include/linux/interrupt.h | 3 +-
include/linux/padata.h | 116 +++++++++++
kernel/Makefile | 2 +-
kernel/padata.c | 490 +++++++++++++++++++++++++++++++++++++++++++++
4 files changed, 609 insertions(+), 2 deletions(-)
create mode 100644 include/linux/padata.h
create mode 100644 kernel/padata.c

diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
index 9127f6b..e1af5d6 100644
--- a/include/linux/interrupt.h
+++ b/include/linux/interrupt.h
@@ -253,7 +253,8 @@ enum
TASKLET_SOFTIRQ,
SCHED_SOFTIRQ,
HRTIMER_SOFTIRQ,
- RCU_SOFTIRQ, /* Preferable RCU should always be the last softirq */
+ PADATA_SOFTIRQ,
+ RCU_SOFTIRQ, /* Preferable RCU should always be the last softirq */

NR_SOFTIRQS
};
diff --git a/include/linux/padata.h b/include/linux/padata.h
new file mode 100644
index 0000000..469359f
--- /dev/null
+++ b/include/linux/padata.h
@@ -0,0 +1,116 @@
+/*
+ * padata.h - header for the padata parallelization interface
+ *
+ * Copyright (C) 2008, 2009 secunet Security Networks AG
+ * Copyright (C) 2008, 2009 Steffen Klassert <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
+ */
+
+#ifndef PADATA_H
+#define PADATA_H
+
+#include <linux/interrupt.h>
+#include <linux/smp.h>
+#include <linux/list.h>
+
+enum
+{
+ NO_PADATA=0,
+ AEAD_ENC_PADATA,
+ AEAD_DEC_PADATA,
+ NR_PADATA
+};
+
+struct padata_priv {
+ struct list_head list;
+ struct call_single_data csd;
+ int cb_cpu;
+ int seq_nr;
+ unsigned int nr;
+ int info;
+ void (*parallel)(struct padata_priv *padata);
+ void (*serial)(struct padata_priv *padata);
+};
+
+struct padata_queue {
+ struct list_head list;
+ atomic_t num_obj;
+ int cpu_index;
+ spinlock_t lock;
+};
+
+struct parallel_data {
+ struct work_struct work;
+ struct padata_queue *queue;
+ atomic_t seq_nr;
+ atomic_t queued_objects;
+ cpumask_t cpu_map;
+ cpumask_t new_cpu_map;
+ u8 flags;
+#define PADATA_INIT 1
+#define PADATA_FLUSH_HARD 2
+#define PADATA_RESET_IN_PROGRESS 4
+ spinlock_t lock;
+};
+
+#ifdef CONFIG_USE_GENERIC_SMP_HELPERS
+extern void __init padata_init(unsigned int nr, cpumask_t cpu_map);
+extern void padata_dont_wait(unsigned int nr, struct padata_priv *padata);
+extern int padata_do_parallel(unsigned int softirq_nr, unsigned int nr,
+ struct padata_priv *padata, int cb_cpu);
+extern int padata_do_serial(unsigned int nr, struct padata_priv *padata);
+extern cpumask_t padata_get_cpumap(unsigned int nr);
+extern void padata_set_cpumap(unsigned int nr, cpumask_t cpu_map);
+extern void padata_add_cpu(unsigned int nr, int cpu);
+extern void padata_remove_cpu(unsigned int nr, int cpu);
+extern void padata_start(unsigned int nr);
+extern void padata_stop(unsigned int nr);
+#else
+static inline void padata_init(unsigned int nr, cpumask_t cpu_map)
+{
+}
+static inline void padata_dont_wait(unsigned int nr, struct padata_priv *padata)
+{
+}
+static inline int padata_do_parallel(unsigned int softirq_nr, unsigned int nr,
+ struct padata_priv *padata, int cb_cpu)
+{
+ return 0;
+}
+static inline int padata_do_serial(unsigned int nr, struct padata_priv *padata)
+{
+ return 0;
+}
+static inline cpumask_t padata_get_cpumap(unsigned int nr)
+{
+ return cpu_online_map;
+}
+static inline void padata_set_cpumap(unsigned int nr, cpumask_t cpu_map)
+{
+}
+static inline void padata_add_cpu(unsigned int nr, int cpu)
+{
+}
+static inline void padata_remove_cpu(unsigned int nr, int cpu)
+{
+}
+static inline void padata_start(unsigned int nr)
+{
+}
+static inline void padata_stop(unsigned int nr)
+{
+}
+#endif
+#endif
diff --git a/kernel/Makefile b/kernel/Makefile
index 170a921..1c11a14 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -40,7 +40,7 @@ obj-$(CONFIG_RT_MUTEXES) += rtmutex.o
obj-$(CONFIG_DEBUG_RT_MUTEXES) += rtmutex-debug.o
obj-$(CONFIG_RT_MUTEX_TESTER) += rtmutex-tester.o
obj-$(CONFIG_GENERIC_ISA_DMA) += dma.o
-obj-$(CONFIG_USE_GENERIC_SMP_HELPERS) += smp.o
+obj-$(CONFIG_USE_GENERIC_SMP_HELPERS) += smp.o padata.o
ifneq ($(CONFIG_SMP),y)
obj-y += up.o
endif
diff --git a/kernel/padata.c b/kernel/padata.c
new file mode 100644
index 0000000..192c9a6
--- /dev/null
+++ b/kernel/padata.c
@@ -0,0 +1,490 @@
+/*
+ * padata.c - generic interface to process data streams in parallel
+ *
+ * Copyright (C) 2008, 2009 secunet Security Networks AG
+ * Copyright (C) 2008, 2009 Steffen Klassert <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
+ */
+
+#include <linux/module.h>
+#include <linux/cpumask.h>
+#include <linux/err.h>
+#include <linux/padata.h>
+
+#define MAX_SEQ_NR 1000000000
+
+static struct parallel_data padata_vec[NR_PADATA];
+static struct padata_priv *padata_get_next(struct parallel_data *par_data);
+
+static void padata_flush_hard(struct parallel_data *par_data)
+{
+ int cpu;
+ struct padata_priv *padata;
+ struct padata_queue *queue;
+
+ for_each_cpu_mask(cpu, par_data->cpu_map) {
+ queue = per_cpu_ptr(par_data->queue, cpu);
+
+ while(!list_empty(&queue->list)) {
+ padata = list_entry(queue->list.next, struct padata_priv, list);
+
+ spin_lock(&queue->lock);
+ list_del_init(&padata->list);
+ spin_unlock(&queue->lock);
+
+ atomic_dec(&par_data->queued_objects);
+ padata->serial(padata);
+ }
+ }
+}
+
+static void padata_flush_order(struct parallel_data *par_data)
+{
+ struct padata_priv *padata;
+
+ while (1) {
+ padata = padata_get_next(par_data);
+
+ if (padata && !IS_ERR(padata))
+ padata->serial(padata);
+ else
+ break;
+ }
+
+ padata_flush_hard(par_data);
+}
+
+static void padata_reset_work(struct work_struct *work)
+{
+ int cpu, cpu_index;
+ struct padata_queue *queue;
+ struct parallel_data *par_data;
+
+ par_data = container_of(work, struct parallel_data, work);
+
+ if (par_data->flags & (PADATA_INIT|PADATA_RESET_IN_PROGRESS))
+ return;
+
+ spin_lock_bh(&par_data->lock);
+ par_data->flags |= PADATA_RESET_IN_PROGRESS;
+
+ if (!(par_data->flags & PADATA_FLUSH_HARD))
+ padata_flush_order(par_data);
+ else
+ padata_flush_hard(par_data);
+
+ cpu_index = 0;
+
+ par_data->cpu_map = par_data->new_cpu_map;
+
+ for_each_cpu_mask(cpu, par_data->cpu_map) {
+ queue = per_cpu_ptr(par_data->queue, cpu);
+
+ atomic_set(&queue->num_obj, 0);
+ queue->cpu_index = cpu_index;
+ cpu_index++;
+ }
+ spin_unlock_bh(&par_data->lock);
+
+ atomic_set(&par_data->seq_nr, -1);
+ par_data->flags &= ~PADATA_RESET_IN_PROGRESS;
+ par_data->flags |= PADATA_INIT;
+}
+
+static struct padata_priv *padata_get_next(struct parallel_data *par_data)
+{
+ int cpu, num_cpus, empty;
+ int seq_nr, calc_seq_nr, next_nr;
+ struct padata_queue *queue, *next_queue;
+ struct padata_priv *padata;
+
+ empty = 0;
+ next_nr = -1;
+ next_queue = NULL;
+
+ num_cpus = cpus_weight(par_data->cpu_map);
+
+ for_each_cpu_mask(cpu, par_data->cpu_map) {
+ queue = per_cpu_ptr(par_data->queue, cpu);
+
+ /*
+ * Calculate the seq_nr of the object that should be
+ * next in this queue.
+ */
+ calc_seq_nr = (atomic_read(&queue->num_obj) * num_cpus)
+ + queue->cpu_index;
+
+ if (!list_empty(&queue->list)) {
+ padata = list_entry(queue->list.next,
+ struct padata_priv, list);
+
+ seq_nr = padata->seq_nr;
+
+ if (unlikely(calc_seq_nr != seq_nr)) {
+ par_data->flags &= ~PADATA_INIT;
+ par_data->flags |= PADATA_FLUSH_HARD;
+ padata = NULL;
+ goto out;
+ }
+ } else {
+ seq_nr = calc_seq_nr;
+ empty++;
+ }
+
+ if (next_nr < 0 || seq_nr < next_nr) {
+ next_nr = seq_nr;
+ next_queue = queue;
+ }
+ }
+
+ padata = NULL;
+
+ if (empty == num_cpus)
+ goto out;
+
+ if (!list_empty(&next_queue->list)) {
+ padata = list_entry(next_queue->list.next,
+ struct padata_priv, list);
+
+ spin_lock(&next_queue->lock);
+ list_del_init(&padata->list);
+ spin_unlock(&next_queue->lock);
+
+ atomic_dec(&par_data->queued_objects);
+ atomic_inc(&next_queue->num_obj);
+
+ goto out;
+ }
+
+ if (next_nr % num_cpus == next_queue->cpu_index) {
+ padata = ERR_PTR(-ENODATA);
+ goto out;
+ }
+
+ padata = ERR_PTR(-EINPROGRESS);
+out:
+ return padata;
+}
+
+static void padata_action(struct softirq_action *h)
+{
+ struct list_head *cpu_list, local_list;
+
+ cpu_list = &__get_cpu_var(softirq_work_list[PADATA_SOFTIRQ]);
+
+ local_irq_disable();
+ list_replace_init(cpu_list, &local_list);
+ local_irq_enable();
+
+ while (!list_empty(&local_list)) {
+ struct padata_priv *padata;
+
+ padata = list_entry(local_list.next,
+ struct padata_priv, csd.list);
+
+ list_del_init(&padata->csd.list);
+
+ padata->serial(padata);
+ }
+}
+
+static int padata_cpu_hash(unsigned int nr, struct padata_priv *padata)
+{
+ int cpu, target_cpu, this_cpu, cpu_index;
+
+ this_cpu = smp_processor_id();
+
+ if (padata->nr != 0)
+ return this_cpu;
+
+ if (!(padata_vec[nr].flags & PADATA_INIT))
+ return this_cpu;
+
+ padata->seq_nr = atomic_inc_return(&padata_vec[nr].seq_nr);
+
+ if (padata->seq_nr > MAX_SEQ_NR) {
+ padata_vec[nr].flags &= ~PADATA_INIT;
+ padata->seq_nr = 0;
+ schedule_work(&padata_vec[nr].work);
+ return this_cpu;
+ }
+
+ padata->nr = nr;
+
+ /*
+ * Hash the sequence numbers to the cpus by taking
+ * seq_nr mod. number of cpus in use.
+ */
+ cpu_index = padata->seq_nr % cpus_weight(padata_vec[nr].cpu_map);
+
+ target_cpu = first_cpu(padata_vec[nr].cpu_map);
+ for (cpu = 0; cpu < cpu_index; cpu++)
+ target_cpu = next_cpu(target_cpu, padata_vec[nr].cpu_map);
+
+ return target_cpu;
+}
+
+/*
+ * padata_dont_wait - must be called if an object that runs in parallel will
+ * not be serialized with padata_do_serial
+ *
+ * @nr: number of the padata instance
+ * @padata: object that will not be seen by padata_do_serial
+ */
+void padata_dont_wait(unsigned int nr, struct padata_priv *padata)
+{
+ struct padata_queue *queue;
+
+ if (!(padata_vec[nr].flags & PADATA_INIT))
+ return;
+
+ if (padata->nr == 0 || padata->nr != nr)
+ return;
+
+ queue = per_cpu_ptr(padata_vec[nr].queue, smp_processor_id());
+ atomic_inc(&queue->num_obj);
+
+ padata->nr = 0;
+ padata->seq_nr = 0;
+}
+EXPORT_SYMBOL(padata_dont_wait);
+
+/*
+ * padata_do_parallel - padata parallelization function
+ *
+ * @softirq_nr: number of the softirq that will do the parallelization
+ * @nr: number of the padata instance
+ * @padata: object to be parallelized
+ * @cb_cpu: cpu number on which the serialization callback function will run
+ */
+int padata_do_parallel(unsigned int softirq_nr, unsigned int nr,
+ struct padata_priv *padata, int cb_cpu)
+{
+ int target_cpu;
+
+ padata->cb_cpu = cb_cpu;
+
+ local_bh_disable();
+ target_cpu = padata_cpu_hash(nr, padata);
+ local_bh_enable();
+
+ send_remote_softirq(&padata->csd, target_cpu, softirq_nr);
+
+ return 1;
+}
+EXPORT_SYMBOL(padata_do_parallel);
+
+/*
+ * padata_do_serial - padata serialization function
+ *
+ * @nr: number of the padata instance
+ * @padata: object to be serialized
+ *
+ * returns 1 if the serialization callback function will be called
+ * from padata, 0 otherwise
+ */
+int padata_do_serial(unsigned int nr, struct padata_priv *padata)
+{
+ int cpu;
+ struct padata_queue *reorder_queue;
+
+ if (!(padata_vec[nr].flags & PADATA_INIT))
+ return 0;
+
+ if (padata->nr != nr || padata->nr == 0) {
+ padata->serial(padata);
+ return 1;
+ }
+
+ cpu = smp_processor_id();
+
+ reorder_queue = per_cpu_ptr(padata_vec[nr].queue, cpu);
+
+ spin_lock(&reorder_queue->lock);
+ list_add_tail(&padata->list, &reorder_queue->list);
+ spin_unlock(&reorder_queue->lock);
+
+ atomic_inc(&padata_vec[nr].queued_objects);
+
+try_again:
+ if (!spin_trylock(&padata_vec[nr].lock))
+ goto out;
+
+ while(1) {
+ padata = padata_get_next(&padata_vec[nr]);
+
+ if (!padata || PTR_ERR(padata) == -EINPROGRESS)
+ break;
+ if (PTR_ERR(padata) == -ENODATA) {
+ spin_unlock(&padata_vec[nr].lock);
+ goto out;
+ }
+
+ send_remote_softirq(&padata->csd, padata->cb_cpu,
+ PADATA_SOFTIRQ);
+ }
+
+ if (unlikely(!(padata_vec[nr].flags & PADATA_INIT))) {
+ spin_unlock(&padata_vec[nr].lock);
+ goto reset_out;
+ }
+
+ spin_unlock(&padata_vec[nr].lock);
+
+ if (atomic_read(&padata_vec[nr].queued_objects))
+ goto try_again;
+
+out:
+ return 1;
+reset_out:
+ schedule_work(&padata_vec[nr].work);
+ return 1;
+}
+EXPORT_SYMBOL(padata_do_serial);
+
+/*
+ * padata_get_cpumap - get the cpu map that is actually in use
+ *
+ * @nr: number of the padata instance
+ */
+cpumask_t padata_get_cpumap(unsigned int nr)
+{
+ return padata_vec[nr].cpu_map;
+}
+EXPORT_SYMBOL(padata_get_cpumap);
+
+/*
+ * padata_set_cpumap - set the cpu map that padata uses
+ *
+ * @nr: number of the padata instance
+ * @cpu_map: the cpu map to use
+ */
+void padata_set_cpumap(unsigned int nr, cpumask_t cpu_map)
+{
+ padata_vec[nr].new_cpu_map = cpu_map;
+ padata_vec[nr].flags &= ~PADATA_INIT;
+ padata_vec[nr].flags |= PADATA_FLUSH_HARD;
+
+ schedule_work(&padata_vec[nr].work);
+}
+EXPORT_SYMBOL(padata_set_cpumap);
+
+/*
+ * padata_add_cpu - add a cpu to the padata cpu map
+ *
+ * @nr: number of the padata instance
+ * @cpu: cpu to add
+ */
+void padata_add_cpu(unsigned int nr, int cpu)
+{
+ cpumask_t cpu_map = padata_vec[nr].cpu_map;
+
+ cpu_set(cpu, cpu_map);
+ padata_set_cpumap(nr, cpu_map);
+}
+EXPORT_SYMBOL(padata_add_cpu);
+
+/*
+ * padata_remove_cpu - remove a cpu from the padata cpu map
+ *
+ * @nr: number of the padata instance
+ * @cpu: cpu to remove
+ */
+void padata_remove_cpu(unsigned int nr, int cpu)
+{
+ cpumask_t cpu_map = padata_vec[nr].cpu_map;
+
+ cpu_clear(cpu, cpu_map);
+ padata_set_cpumap(nr, cpu_map);
+}
+EXPORT_SYMBOL(padata_remove_cpu);
+
+/*
+ * padata_start - start the parallel processing
+ *
+ * @nr: number of the padata instance
+ */
+void padata_start(unsigned int nr)
+{
+ if (padata_vec[nr].flags & PADATA_INIT)
+ return;
+
+ schedule_work(&padata_vec[nr].work);
+}
+EXPORT_SYMBOL(padata_start);
+
+/*
+ * padata_stop - stop the parallel processing
+ *
+ * @nr: number of the padata instance
+ */
+void padata_stop(unsigned int nr)
+{
+ padata_vec[nr].flags &= ~PADATA_INIT;
+}
+EXPORT_SYMBOL(padata_stop);
+
+/*
+ * padata_init - initialize a padata instance
+ *
+ * @nr: number of the padata instance
+ * @cpu_map: map of the cpu set that padata uses for parallelization
+ */
+void __init padata_init(unsigned int nr, cpumask_t cpu_map)
+{
+ int cpu, cpu_index;
+ struct padata_queue *percpu_queue, *queue;
+
+ percpu_queue = alloc_percpu(struct padata_queue);
+
+ if (!percpu_queue) {
+ printk("padata_init: Failed to alloc the serialization"
+ "queues for padata nr %d, exiting!\n", nr);
+ return;
+ }
+
+ cpu_index = 0;
+
+ for_each_possible_cpu(cpu) {
+ queue = per_cpu_ptr(percpu_queue, cpu);
+
+ if (cpu_isset(cpu, cpu_map)) {
+ queue->cpu_index = cpu_index;
+ cpu_index++;
+ }
+
+ INIT_LIST_HEAD(&queue->list);
+ spin_lock_init(&queue->lock);
+ atomic_set(&queue->num_obj, 0);
+ }
+
+ INIT_WORK(&padata_vec[nr].work, padata_reset_work);
+
+ atomic_set(&padata_vec[nr].seq_nr, -1);
+ atomic_set(&padata_vec[nr].queued_objects, 0);
+ padata_vec[nr].cpu_map = cpu_map;
+ padata_vec[nr].new_cpu_map = cpu_map;
+ padata_vec[nr].queue = percpu_queue;
+ padata_vec[nr].flags = 0;
+ spin_lock_init(&padata_vec[nr].lock);
+}
+
+static int __init padata_initcall(void)
+{
+ open_softirq(PADATA_SOFTIRQ, padata_action);
+
+ return 0;
+}
+subsys_initcall(padata_initcall);
--
1.5.4.2


2009-03-16 12:13:57

by Steffen Klassert

[permalink] [raw]
Subject: [RFC] [PATCH 3/4] pcrypt: Add pcrypt crypto parallelization engine

This patch adds a parallel crypto template that takes a crypto
algorithm and converts it to process the crypto transforms in
parallel. For the moment only aead is supported.
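
As an illustration (not part of this patch), a caller would obtain a parallel
transform through the pcrypt_alloc_aead_tfm() helper added in pcrypt_core.c
roughly as below; the authenc combination is only an example:

#include <crypto/pcrypt.h>
#include <linux/err.h>

static struct crypto_aead *example_alloc_parallel_aead(void)
{
	/* internally requests "pcrypt(authenc(hmac(sha1),cbc(aes)))" */
	return pcrypt_alloc_aead_tfm("authenc(hmac(sha1),cbc(aes))", 0, 0);
}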

Signed-off-by: Steffen Klassert <[email protected]>
---
crypto/Kconfig | 13 ++
crypto/Makefile | 2 +
crypto/pcrypt.c | 417 +++++++++++++++++++++++++++++++++++++++++++++
crypto/pcrypt_core.c | 125 ++++++++++++++
include/crypto/pcrypt.h | 85 +++++++++
include/linux/crypto.h | 2 +
include/linux/interrupt.h | 2 +
7 files changed, 646 insertions(+), 0 deletions(-)
create mode 100644 crypto/pcrypt.c
create mode 100644 crypto/pcrypt_core.c
create mode 100644 include/crypto/pcrypt.h

diff --git a/crypto/Kconfig b/crypto/Kconfig
index 74d0e62..b05fc95 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -112,6 +112,19 @@ config CRYPTO_NULL
help
These are 'Null' algorithms, used by IPsec, which do nothing.

+config CRYPTO_PCRYPT_CORE
+ bool
+ select CRYPTO_AEAD
+
+config CRYPTO_PCRYPT
+ tristate "Parallel crypto engine (EXPERIMENTAL)"
+ depends on USE_GENERIC_SMP_HELPERS && EXPERIMENTAL
+ select CRYPTO_MANAGER
+ select CRYPTO_PCRYPT_CORE
+ help
+ This converts an arbitrary crypto algorithm into a parallel
+ algorithm that is executed in a softirq.
+
config CRYPTO_WORKQUEUE
tristate

diff --git a/crypto/Makefile b/crypto/Makefile
index 24f7279..94043fe 100644
--- a/crypto/Makefile
+++ b/crypto/Makefile
@@ -57,6 +57,8 @@ obj-$(CONFIG_CRYPTO_XTS) += xts.o
obj-$(CONFIG_CRYPTO_CTR) += ctr.o
obj-$(CONFIG_CRYPTO_GCM) += gcm.o
obj-$(CONFIG_CRYPTO_CCM) += ccm.o
+obj-$(CONFIG_CRYPTO_PCRYPT_CORE) += pcrypt_core.o
+obj-$(CONFIG_CRYPTO_PCRYPT) += pcrypt.o
obj-$(CONFIG_CRYPTO_CRYPTD) += cryptd.o
obj-$(CONFIG_CRYPTO_DES) += des_generic.o
obj-$(CONFIG_CRYPTO_FCRYPT) += fcrypt.o
diff --git a/crypto/pcrypt.c b/crypto/pcrypt.c
new file mode 100644
index 0000000..86a8718
--- /dev/null
+++ b/crypto/pcrypt.c
@@ -0,0 +1,417 @@
+/*
+ * pcrypt - Parallel crypto wrapper.
+ *
+ * Copyright (C) 2009 secunet Security Networks AG
+ * Copyright (C) 2009 Steffen Klassert <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
+ */
+
+#include <crypto/algapi.h>
+#include <crypto/internal/aead.h>
+#include <linux/err.h>
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/slab.h>
+#include <crypto/pcrypt.h>
+
+struct pcrypt_instance_ctx {
+ struct crypto_spawn spawn;
+ unsigned int tfm_count;
+};
+
+struct pcrypt_aead_ctx {
+ struct crypto_aead *child;
+ unsigned int tfm_nr;
+};
+
+static int pcrypt_do_parallel(struct padata_priv *padata, unsigned int tfm_nr,
+ unsigned int softirq, unsigned int padata_nr)
+{
+ unsigned int cpu, cpu_index, num_cpus, cb_cpu;
+ cpumask_t cpu_map;
+
+ cpu_map = padata_get_cpumap(padata_nr);
+ num_cpus = cpus_weight(cpu_map);
+
+ cpu_index = tfm_nr % num_cpus;
+
+ cb_cpu = first_cpu(cpu_map);
+ for (cpu = 0; cpu < cpu_index; cpu++)
+ cb_cpu = next_cpu(cb_cpu, cpu_map);
+
+ return padata_do_parallel(softirq, padata_nr, padata, cb_cpu);
+}
+
+static int pcrypt_aead_setkey(struct crypto_aead *parent,
+ const u8 *key, unsigned int keylen)
+{
+ struct pcrypt_aead_ctx *ctx = crypto_aead_ctx(parent);
+
+ return crypto_aead_setkey(ctx->child, key, keylen);
+}
+
+static int pcrypt_aead_setauthsize(struct crypto_aead *parent,
+ unsigned int authsize)
+{
+ struct pcrypt_aead_ctx *ctx = crypto_aead_ctx(parent);
+
+ return crypto_aead_setauthsize(ctx->child, authsize);
+}
+
+static void pcrypt_aead_serial(struct padata_priv *padata)
+{
+ struct pcrypt_request *preq = pcrypt_padata_request(padata);
+ struct aead_request *req = pcrypt_request_ctx(preq);
+
+ aead_request_complete(req->base.data, padata->info);
+}
+
+static void pcrypt_aead_giv_serial(struct padata_priv *padata)
+{
+ struct pcrypt_request *preq = pcrypt_padata_request(padata);
+ struct aead_givcrypt_request *req = pcrypt_request_ctx(preq);
+
+ aead_request_complete(req->areq.base.data, padata->info);
+}
+
+static void pcrypt_aead_done(struct crypto_async_request *areq, int err)
+{
+ struct aead_request *req = areq->data;
+ struct pcrypt_request *preq = aead_request_ctx(req);
+ struct padata_priv *padata = pcrypt_request_padata(preq);
+
+ padata->info = err;
+ req->base.flags &= ~CRYPTO_TFM_REQ_MAY_SLEEP;
+
+ local_bh_disable();
+ if (padata_do_serial(padata->nr, padata))
+ goto out;
+
+ aead_request_complete(req, padata->info);
+
+out:
+ local_bh_enable();
+}
+
+static void pcrypt_aead_enc(struct padata_priv *padata)
+{
+ struct pcrypt_request *preq = pcrypt_padata_request(padata);
+ struct aead_request *req = pcrypt_request_ctx(preq);
+
+ padata->info = crypto_aead_encrypt(req);
+
+ if (padata->info)
+ return;
+
+ if (padata_do_serial(AEAD_ENC_PADATA, padata))
+ return;
+
+ aead_request_complete(req->base.data, padata->info);
+}
+
+static int pcrypt_aead_encrypt(struct aead_request *req)
+{
+ int err;
+ struct pcrypt_request *preq = aead_request_ctx(req);
+ struct aead_request *creq = pcrypt_request_ctx(preq);
+ struct padata_priv *padata = pcrypt_request_padata(preq);
+ struct crypto_aead *aead = crypto_aead_reqtfm(req);
+ struct pcrypt_aead_ctx *ctx = crypto_aead_ctx(aead);
+ u32 flags = aead_request_flags(req);
+
+ memset(padata, 0, sizeof(struct padata_priv));
+
+ padata->parallel = pcrypt_aead_enc;
+ padata->serial = pcrypt_aead_serial;
+
+ aead_request_set_tfm(creq, ctx->child);
+ aead_request_set_callback(creq, flags & ~CRYPTO_TFM_REQ_MAY_SLEEP,
+ pcrypt_aead_done, req);
+ aead_request_set_crypt(creq, req->src, req->dst,
+ req->cryptlen, req->iv);
+ aead_request_set_assoc(creq, req->assoc, req->assoclen);
+
+ if (pcrypt_do_parallel(padata, ctx->tfm_nr, AEAD_ENC_SOFTIRQ,
+ AEAD_ENC_PADATA))
+ err = -EINPROGRESS;
+ else
+ err = crypto_aead_encrypt(creq);
+
+ return err;
+}
+
+static void pcrypt_aead_dec(struct padata_priv *padata)
+{
+ struct pcrypt_request *preq = pcrypt_padata_request(padata);
+ struct aead_request *req = pcrypt_request_ctx(preq);
+
+ padata->info = crypto_aead_decrypt(req);
+
+ if (padata->info)
+ return;
+
+ if (padata_do_serial(AEAD_DEC_PADATA, padata))
+ return;
+
+ aead_request_complete(req->base.data, padata->info);
+}
+
+static int pcrypt_aead_decrypt(struct aead_request *req)
+{
+ int err;
+ struct pcrypt_request *preq = aead_request_ctx(req);
+ struct aead_request *creq = pcrypt_request_ctx(preq);
+ struct padata_priv *padata = pcrypt_request_padata(preq);
+ struct crypto_aead *aead = crypto_aead_reqtfm(req);
+ struct pcrypt_aead_ctx *ctx = crypto_aead_ctx(aead);
+ u32 flags = aead_request_flags(req);
+
+ memset(padata, 0, sizeof(struct padata_priv));
+
+ padata->parallel = pcrypt_aead_dec;
+ padata->serial = pcrypt_aead_serial;
+
+ aead_request_set_tfm(creq, ctx->child);
+ aead_request_set_callback(creq, flags & ~CRYPTO_TFM_REQ_MAY_SLEEP,
+ pcrypt_aead_done, req);
+ aead_request_set_crypt(creq, req->src, req->dst,
+ req->cryptlen, req->iv);
+ aead_request_set_assoc(creq, req->assoc, req->assoclen);
+
+ if (pcrypt_do_parallel(padata, ctx->tfm_nr, AEAD_DEC_SOFTIRQ,
+ AEAD_DEC_PADATA))
+ err = -EINPROGRESS;
+ else
+ err = crypto_aead_decrypt(creq);
+
+ return err;
+}
+
+static void pcrypt_aead_givenc(struct padata_priv *padata)
+{
+ struct pcrypt_request *preq = pcrypt_padata_request(padata);
+ struct aead_givcrypt_request *req = pcrypt_request_ctx(preq);
+
+ padata->info = crypto_aead_givencrypt(req);
+
+ if (padata->info)
+ return;
+
+ if (padata_do_serial(AEAD_ENC_PADATA, padata))
+ return;
+
+ aead_request_complete(req->areq.base.data, padata->info);
+}
+
+static int pcrypt_aead_givencrypt(struct aead_givcrypt_request *req)
+{
+ int err;
+ struct aead_request *areq = &req->areq;
+ struct pcrypt_request *preq = aead_request_ctx(areq);
+ struct aead_givcrypt_request *creq = pcrypt_request_ctx(preq);
+ struct padata_priv *padata = pcrypt_request_padata(preq);
+ struct crypto_aead *aead = aead_givcrypt_reqtfm(req);
+ struct pcrypt_aead_ctx *ctx = crypto_aead_ctx(aead);
+ u32 flags = aead_request_flags(areq);
+
+ memset(padata, 0, sizeof(struct padata_priv));
+
+ padata->parallel = pcrypt_aead_givenc;
+ padata->serial = pcrypt_aead_giv_serial;
+
+ aead_givcrypt_set_tfm(creq, ctx->child);
+ aead_givcrypt_set_callback(creq, flags & ~CRYPTO_TFM_REQ_MAY_SLEEP,
+ pcrypt_aead_done, areq);
+ aead_givcrypt_set_crypt(creq, areq->src, areq->dst,
+ areq->cryptlen, areq->iv);
+ aead_givcrypt_set_assoc(creq, areq->assoc, areq->assoclen);
+ aead_givcrypt_set_giv(creq, req->giv, req->seq);
+
+
+ if (pcrypt_do_parallel(padata, ctx->tfm_nr, AEAD_ENC_SOFTIRQ,
+ AEAD_ENC_PADATA))
+ err = -EINPROGRESS;
+ else
+ err = crypto_aead_givencrypt(creq);
+
+ return err;
+}
+
+static int pcrypt_aead_init_tfm(struct crypto_tfm *tfm)
+{
+ struct crypto_instance *inst = crypto_tfm_alg_instance(tfm);
+ struct pcrypt_instance_ctx *ictx = crypto_instance_ctx(inst);
+ struct pcrypt_aead_ctx *ctx = crypto_tfm_ctx(tfm);
+ struct crypto_aead *cipher;
+
+ ictx->tfm_count++;
+ ctx->tfm_nr = ictx->tfm_count;
+
+ cipher = crypto_spawn_aead(crypto_instance_ctx(inst));
+
+ if (IS_ERR(cipher))
+ return PTR_ERR(cipher);
+
+ ctx->child = cipher;
+ tfm->crt_aead.reqsize = sizeof(struct pcrypt_request)
+ + sizeof(struct aead_givcrypt_request)
+ + crypto_aead_reqsize(cipher);
+
+ return 0;
+}
+
+static void pcrypt_aead_exit_tfm(struct crypto_tfm *tfm)
+{
+ struct pcrypt_aead_ctx *ctx = crypto_tfm_ctx(tfm);
+
+ crypto_free_aead(ctx->child);
+}
+
+static struct crypto_instance *pcrypt_alloc_instance(struct crypto_alg *alg)
+{
+ struct crypto_instance *inst;
+ struct pcrypt_instance_ctx *ctx;
+ int err;
+
+ inst = kzalloc(sizeof(*inst) + sizeof(*ctx), GFP_KERNEL);
+ if (!inst) {
+ inst = ERR_PTR(-ENOMEM);
+ goto out;
+ }
+
+ err = -ENAMETOOLONG;
+ if (snprintf(inst->alg.cra_driver_name, CRYPTO_MAX_ALG_NAME,
+ "pcrypt(%s)", alg->cra_driver_name) >= CRYPTO_MAX_ALG_NAME)
+ goto out_free_inst;
+
+ if (snprintf(inst->alg.cra_name, CRYPTO_MAX_ALG_NAME,
+ "pcrypt(%s)", alg->cra_name) >= CRYPTO_MAX_ALG_NAME)
+ goto out_free_inst;
+
+ ctx = crypto_instance_ctx(inst);
+ err = crypto_init_spawn(&ctx->spawn, alg, inst,
+ CRYPTO_ALG_TYPE_MASK);
+ if (err)
+ goto out_free_inst;
+
+ inst->alg.cra_priority = alg->cra_priority + 50;
+ inst->alg.cra_blocksize = alg->cra_blocksize;
+ inst->alg.cra_alignmask = alg->cra_alignmask;
+
+out:
+ return inst;
+
+out_free_inst:
+ kfree(inst);
+ inst = ERR_PTR(err);
+ goto out;
+}
+
+static struct crypto_instance *pcrypt_alloc_aead(struct rtattr **tb)
+{
+ struct crypto_instance *inst;
+ struct crypto_alg *alg;
+ struct crypto_attr_type *algt;
+
+ algt = crypto_get_attr_type(tb);
+
+ alg = crypto_get_attr_alg(tb, algt->type,
+ (algt->mask & CRYPTO_ALG_TYPE_MASK)
+ | CRYPTO_ALG_PCRYPT);
+ if (IS_ERR(alg))
+ return ERR_CAST(alg);
+
+ inst = pcrypt_alloc_instance(alg);
+ if (IS_ERR(inst))
+ goto out_put_alg;
+
+ inst->alg.cra_flags = CRYPTO_ALG_TYPE_AEAD | CRYPTO_ALG_ASYNC
+ | CRYPTO_ALG_PCRYPT;
+ inst->alg.cra_type = &crypto_nivaead_type;
+
+ inst->alg.cra_aead.ivsize = alg->cra_aead.ivsize;
+ inst->alg.cra_aead.geniv = alg->cra_aead.geniv;
+ inst->alg.cra_aead.maxauthsize = alg->cra_aead.maxauthsize;
+
+ inst->alg.cra_ctxsize = sizeof(struct pcrypt_aead_ctx);
+
+ inst->alg.cra_init = pcrypt_aead_init_tfm;
+ inst->alg.cra_exit = pcrypt_aead_exit_tfm;
+
+ inst->alg.cra_aead.setkey = pcrypt_aead_setkey;
+ inst->alg.cra_aead.setauthsize = pcrypt_aead_setauthsize;
+ inst->alg.cra_aead.encrypt = pcrypt_aead_encrypt;
+ inst->alg.cra_aead.decrypt = pcrypt_aead_decrypt;
+ inst->alg.cra_aead.givencrypt = pcrypt_aead_givencrypt;
+
+ inst->alg.cra_aead.geniv = "cpu_chainiv";
+
+out_put_alg:
+ crypto_mod_put(alg);
+ return inst;
+}
+
+static struct crypto_instance *pcrypt_alloc(struct rtattr **tb)
+{
+ struct crypto_attr_type *algt;
+
+ algt = crypto_get_attr_type(tb);
+ if (IS_ERR(algt))
+ return ERR_CAST(algt);
+
+ switch (algt->type & algt->mask & CRYPTO_ALG_TYPE_MASK) {
+ case CRYPTO_ALG_TYPE_AEAD:
+ return pcrypt_alloc_aead(tb);
+ }
+
+ return ERR_PTR(-EINVAL);
+}
+
+static void pcrypt_free(struct crypto_instance *inst)
+{
+ struct pcrypt_instance_ctx *ctx = crypto_instance_ctx(inst);
+
+ crypto_drop_spawn(&ctx->spawn);
+ kfree(inst);
+}
+
+static struct crypto_template pcrypt_tmpl = {
+ .name = "pcrypt",
+ .alloc = pcrypt_alloc,
+ .free = pcrypt_free,
+ .module = THIS_MODULE,
+};
+
+static int __init pcrypt_init(void)
+{
+ padata_start(AEAD_ENC_PADATA);
+ padata_start(AEAD_DEC_PADATA);
+
+ return crypto_register_template(&pcrypt_tmpl);
+}
+
+static void __exit pcrypt_exit(void)
+{
+ padata_stop(AEAD_ENC_PADATA);
+ padata_stop(AEAD_DEC_PADATA);
+
+ crypto_unregister_template(&pcrypt_tmpl);
+}
+
+module_init(pcrypt_init);
+module_exit(pcrypt_exit);
+
+MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("Parallel crypto engine");
diff --git a/crypto/pcrypt_core.c b/crypto/pcrypt_core.c
new file mode 100644
index 0000000..7c807e0
--- /dev/null
+++ b/crypto/pcrypt_core.c
@@ -0,0 +1,125 @@
+/*
+ * pcrypt_core.c - Core functions for the pcrypt crypto parallelization
+ *
+ * Copyright (C) 2009 secunet Security Networks AG
+ * Copyright (C) 2009 Steffen Klassert <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
+ */
+
+#include <linux/interrupt.h>
+#include <linux/cpu.h>
+#include <linux/err.h>
+#include <linux/module.h>
+#include <crypto/pcrypt.h>
+
+struct crypto_aead *pcrypt_alloc_aead_tfm(const char *alg_name,
+ u32 type, u32 mask)
+{
+ char pcrypt_alg_name[CRYPTO_MAX_ALG_NAME];
+ struct crypto_aead *tfm;
+
+ if (snprintf(pcrypt_alg_name, CRYPTO_MAX_ALG_NAME,
+ "pcrypt(%s)", alg_name) >= CRYPTO_MAX_ALG_NAME)
+ return ERR_PTR(-EINVAL);
+
+ tfm = crypto_alloc_aead(pcrypt_alg_name, type, mask);
+
+ if (IS_ERR(tfm))
+ return ERR_CAST(tfm);
+
+ return tfm;
+}
+EXPORT_SYMBOL_GPL(pcrypt_alloc_aead_tfm);
+
+static void aead_enc_action(struct softirq_action *h)
+{
+ struct list_head *cpu_list, local_list;
+
+ cpu_list = &__get_cpu_var(softirq_work_list[AEAD_ENC_SOFTIRQ]);
+
+ local_irq_disable();
+ list_replace_init(cpu_list, &local_list);
+ local_irq_enable();
+
+ while (!list_empty(&local_list)) {
+ struct padata_priv *padata;
+
+ padata = list_entry(local_list.next, struct padata_priv,
+ csd.list);
+
+ list_del_init(&padata->csd.list);
+
+ padata->parallel(padata);
+ }
+}
+
+static void aead_dec_action(struct softirq_action *h)
+{
+ struct list_head *cpu_list, local_list;
+
+ cpu_list = &__get_cpu_var(softirq_work_list[AEAD_DEC_SOFTIRQ]);
+
+ local_irq_disable();
+ list_replace_init(cpu_list, &local_list);
+ local_irq_enable();
+
+ while (!list_empty(&local_list)) {
+ struct padata_priv *padata;
+
+ padata = list_entry(local_list.next, struct padata_priv,
+ csd.list);
+
+ list_del_init(&padata->csd.list);
+
+ padata->parallel(padata);
+ }
+}
+
+static int __devinit pcrypt_cpu_callback(struct notifier_block *nfb,
+ unsigned long action, void *hcpu)
+{
+ int cpu = (unsigned long)hcpu;
+
+ switch (action) {
+ case CPU_ONLINE:
+ case CPU_ONLINE_FROZEN:
+ padata_add_cpu(AEAD_ENC_PADATA, cpu);
+ padata_add_cpu(AEAD_DEC_PADATA, cpu);
+ break;
+
+ case CPU_DEAD:
+ case CPU_DEAD_FROZEN:
+ padata_remove_cpu(AEAD_ENC_PADATA, cpu);
+ padata_remove_cpu(AEAD_DEC_PADATA, cpu);
+
+ break;
+ }
+
+ return NOTIFY_OK;
+}
+
+static int __init pcrypt_init_padata(void)
+{
+ open_softirq(AEAD_ENC_SOFTIRQ, aead_enc_action);
+ open_softirq(AEAD_DEC_SOFTIRQ, aead_dec_action);
+
+ padata_init(AEAD_ENC_PADATA, cpu_online_map);
+ padata_init(AEAD_DEC_PADATA, cpu_online_map);
+
+ hotcpu_notifier(pcrypt_cpu_callback, 0);
+
+ return 0;
+}
+subsys_initcall(pcrypt_init_padata);
diff --git a/include/crypto/pcrypt.h b/include/crypto/pcrypt.h
new file mode 100644
index 0000000..9cea12b
--- /dev/null
+++ b/include/crypto/pcrypt.h
@@ -0,0 +1,85 @@
+/*
+ * pcrypt - Parallel crypto engine.
+ *
+ * Copyright (C) 2009 secunet Security Networks AG
+ * Copyright (C) 2009 Steffen Klassert <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
+ */
+
+#ifndef _CRYPTO_PCRYPT_H
+#define _CRYPTO_PCRYPT_H
+
+#include <linux/crypto.h>
+#include <linux/kernel.h>
+#include <linux/padata.h>
+
+struct pcrypt_request {
+ struct padata_priv padata;
+ void *data;
+ void *__ctx[] CRYPTO_MINALIGN_ATTR;
+};
+
+static inline void *pcrypt_request_ctx(struct pcrypt_request *req)
+{
+ return req->__ctx;
+}
+
+static inline
+struct padata_priv *pcrypt_request_padata(struct pcrypt_request *req)
+{
+ return &req->padata;
+}
+
+static inline
+struct pcrypt_request *pcrypt_padata_request(struct padata_priv *padata)
+{
+ return container_of(padata, struct pcrypt_request, padata);
+}
+
+struct crypto_aead *pcrypt_alloc_aead_tfm(const char *alg_name, u32 type,
+ u32 mask);
+
+struct crypto_ablkcipher *pcrypt_alloc_ablkcipher_tfm(const char *alg_name,
+ u32 type, u32 mask);
+
+#ifdef CONFIG_CRYPTO_PCRYPT_CORE
+static inline struct crypto_aead *crypto_alloc_aead_tfm(const char *alg_name,
+ u32 type, u32 mask)
+{
+ return pcrypt_alloc_aead_tfm(alg_name, type, mask);
+}
+
+static inline
+struct crypto_ablkcipher *crypto_alloc_ablkcipher_tfm(const char *alg_name,
+ u32 type, u32 mask)
+{
+ return pcrypt_alloc_ablkcipher_tfm(alg_name, type, mask);
+}
+#else
+static inline struct crypto_aead *crypto_alloc_aead_tfm(const char *alg_name,
+ u32 type, u32 mask)
+{
+ return crypto_alloc_aead(alg_name, type, mask);
+}
+
+static inline
+struct crypto_ablkcipher *crypto_alloc_ablkcipher_tfm(const char *alg_name,
+ u32 type, u32 mask)
+{
+ return crypto_alloc_ablkcipher(alg_name, type, mask);
+}
+#endif
+
+#endif
diff --git a/include/linux/crypto.h b/include/linux/crypto.h
index ec29fa2..69c2655 100644
--- a/include/linux/crypto.h
+++ b/include/linux/crypto.h
@@ -71,6 +71,8 @@

#define CRYPTO_ALG_TESTED 0x00000400

+#define CRYPTO_ALG_PCRYPT 0x00000800
+
/*
* Transform masks and values (for crt_flags).
*/
diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
index e1af5d6..8e64f03 100644
--- a/include/linux/interrupt.h
+++ b/include/linux/interrupt.h
@@ -253,6 +253,8 @@ enum
TASKLET_SOFTIRQ,
SCHED_SOFTIRQ,
HRTIMER_SOFTIRQ,
+ AEAD_ENC_SOFTIRQ,
+ AEAD_DEC_SOFTIRQ,
PADATA_SOFTIRQ,
RCU_SOFTIRQ, /* Preferable RCU should always be the last softirq */

--
1.5.4.2


2009-03-27 08:36:23

by Herbert Xu

[permalink] [raw]
Subject: Re: [RFC] [PATCH 2/4] cpu_chainiv: add percpu IV chain generator

On Mon, Mar 16, 2009 at 12:52:51PM +0100, Steffen Klassert wrote:
> If the crypto requests of a crypto transformation are processed in
> parallel, the usual chain IV generator would serialize the crypto
> requests again. The percpu IV chain generator allocates the IV as
> percpu data and generates percpu IV chains, so a crypto request
> does not need to wait for the completion of the IV generation
> of a previous request that runs on a different cpu.
>
> Signed-off-by: Steffen Klassert <[email protected]>

I actually thought about this one when I first wrote chainiv, but
I chose to avoid it because it has some security consequences.
In particular, an attacker would now be able to infer whether two
packets belong to two different flows from the fact that they came
from two different IV streams.

In any case, I don't think this is central to your work, right?

Thanks,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2009-03-27 08:45:04

by Herbert Xu

[permalink] [raw]
Subject: Re: [RFC] [PATCH 4/4] esp: add the pcrypt hooks to esp

On Mon, Mar 16, 2009 at 12:55:26PM +0100, Steffen Klassert wrote:
>
> @@ -447,7 +448,7 @@ static int esp_init_aead(struct xfrm_state *x)
> struct crypto_aead *aead;
> int err;
>
> - aead = crypto_alloc_aead(x->aead->alg_name, 0, 0);
> + aead = crypto_alloc_aead_tfm(x->aead->alg_name, 0, 0);

I'd like this to be configurable. After all, there's not much
point in doing this if your crypto is done through PCI.

The easiest way is to let the user specify pcrypt through the
algorithm name. In fact, as long as pcrypt has a higher priority
than the default software algorithms, the user simply has to
instantiate it in order for it to be the default for all uses
of that algorithm.

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2009-03-30 11:52:24

by Steffen Klassert

[permalink] [raw]
Subject: Re: [RFC] [PATCH 2/4] cpu_chainiv: add percpu IV chain generator

On Fri, Mar 27, 2009 at 04:36:15PM +0800, Herbert Xu wrote:
> On Mon, Mar 16, 2009 at 12:52:51PM +0100, Steffen Klassert wrote:
> > If the crypto requests of a crypto transformation are processed in
> > parallel, the usual chain IV generator would serialize the crypto
> > requests again. The percpu IV chain generator allocates the IV as
> > percpu data and generates percpu IV chains, so a crypto request
> > does not need to wait for the completion of the IV generation
> > of a previous request that runs on a different cpu.
> >
> > Signed-off-by: Steffen Klassert <[email protected]>
>
> I actually thought about this one when I first wrote chainiv, but
> I chose to avoid it because it has some security consequences.
> In particular, an attacker would now be able to infer whether two
> packets belong to two different flows from the fact that they came
> from two different IV streams.
>
> In any case, I don't think this is central to your work, right?
>

Well, to do efficient parallel processing we need a percpu IV chain
generator. pcrypt sends the crypto requests round robin to the cpus
independent of the flow they belong to, so the flows and the IV
streams mix. As long as we use the percpu IV chain generator just
for parallel algorithms we don't have this security issue.


2009-03-30 12:20:42

by Steffen Klassert

[permalink] [raw]
Subject: Re: [RFC] [PATCH 4/4] esp: add the pcrypt hooks to esp

On Fri, Mar 27, 2009 at 04:44:57PM +0800, Herbert Xu wrote:
> On Mon, Mar 16, 2009 at 12:55:26PM +0100, Steffen Klassert wrote:
> >
> > @@ -447,7 +448,7 @@ static int esp_init_aead(struct xfrm_state *x)
> > struct crypto_aead *aead;
> > int err;
> >
> > - aead = crypto_alloc_aead(x->aead->alg_name, 0, 0);
> > + aead = crypto_alloc_aead_tfm(x->aead->alg_name, 0, 0);
>
> I'd like this to be configurable. After all, there's not much
> point in doing this if your crypto is done through PCI.
>

Indeed, it should be configurable. I was not sure of the best way to
make it configurable, so I decided to just make it the default to be
able to use it in this first version.

> The easiest way is to let the user specify pcrypt through the
> algorithm name. In fact, as long as pcrypt has a higher priority
> than the default software algorithms, the user simply has to
> instantiate it in order for it to be the default for all uses
> of that algorithm.
>

For IPsec I thought about something like 'pcrypt(authenc(...,...))' to
be able to process each crypto request with just one
parallelization/serialization call. Perhaps I'm missing something, but
actually I don't see how to select this from userspace through the
name without adding an entry for each possible combination of hmac and
blkcipher algorithm to the xfrm_aead_list. I'd appreciate any hint here.
My idea to make it configurable was to do it similarly to the async
algorithms by

aead = crypto_alloc_aead_tfm(x->aead->alg_name, flags, CRYPTO_ALG_PCRYPT);

If CRYPTO_ALG_PCRYPT is set in flags, the crypto manager tries to choose
a parallel algorithm; if it is not set, parallel algorithms are ignored.
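
Applied to esp_init_aead(), this proposal would look roughly like the sketch
below. This is only meant to illustrate the idea; the CRYPTO_ALG_PCRYPT
handling in the crypto manager does not exist yet:

	/* request a parallel instance; passing 0 as the type while keeping
	 * CRYPTO_ALG_PCRYPT in the mask would ask for a non-parallel one */
	aead = crypto_alloc_aead_tfm(x->aead->alg_name, CRYPTO_ALG_PCRYPT,
				     CRYPTO_ALG_PCRYPT);
	err = PTR_ERR(aead);
	if (IS_ERR(aead))
		goto error;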

2009-03-30 13:20:06

by Herbert Xu

[permalink] [raw]
Subject: Re: [RFC] [PATCH 2/4] cpu_chainiv: add percpu IV chain generator

On Mon, Mar 30, 2009 at 01:54:15PM +0200, Steffen Klassert wrote:
>
> Well, to do efficient parallel processing we need a percpu IV chain
> generator. pcrypt sends the crypto requests round robin to the cpus
> independent of the flow they belong to, so the flows and the IV
> streams mix. As long as we use the percpu IV chain generator just
> for parallel algorithms we don't have this security issue.

How about using eseqiv? It's designed for exactly this situation
where you want parallel async processing. Its overhead is just
one extra encryption block.

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2009-03-30 14:48:12

by Steffen Klassert

[permalink] [raw]
Subject: Re: [RFC] [PATCH 2/4] cpu_chainiv: add percpu IV chain generator

On Mon, Mar 30, 2009 at 09:19:55PM +0800, Herbert Xu wrote:
>
> How about using eseqiv? It's designed for exactly this situation
> where you want parallel async processing. Its overhead is just
> one extra encryption block.
>

Yes, that's an option too. I'll give it a try.

Thanks!

2009-04-08 11:38:19

by Steffen Klassert

[permalink] [raw]
Subject: Re: [RFC] [PATCH 2/4] cpu_chainiv: add percpu IV chain generator

On Mon, Mar 30, 2009 at 09:19:55PM +0800, Herbert Xu wrote:
>
> How about using eseqiv? It's designed for exactly this situation
> where you want parallel async processing. Its overhead is just
> one extra encryption block.
>

Actually I instantiate a crypto_nivaead_type algorithm and choose
cpu_chainiv as the IV generator for that algorithm. If I want to do the
same with eseqiv it would require adding aead support to eseqiv.
Is this what you are thinking about, or is there a way to instantiate a
crypto_aead_type algorithm and notify the ablkcipher to choose
eseqiv as its default IV generator?

Steffen

2009-04-09 03:20:52

by Herbert Xu

[permalink] [raw]
Subject: Re: [RFC] [PATCH 2/4] cpu_chainiv: add percpu IV chain generator

On Wed, Apr 08, 2009 at 01:40:14PM +0200, Steffen Klassert wrote:
>
> Actually I instantiate a crypto_nivaead_type algorithm and choose
> cpu_chainiv as the IV generator for that algorithm. If I want to do the
> same with eseqiv it would require adding aead support to eseqiv.
> Is this what you are thinking about, or is there a way to instantiate a
> crypto_aead_type algorithm and notify the ablkcipher to choose
> eseqiv as its default IV generator?

You can allocate eseqiv(alg), e.g., eseqiv(cbc(aes)).

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2009-04-14 13:03:35

by Steffen Klassert

[permalink] [raw]
Subject: Re: [RFC] [PATCH 2/4] cpu_chainiv: add percpu IV chain generator

On Thu, Apr 09, 2009 at 11:20:45AM +0800, Herbert Xu wrote:
>
> You can allocate eseqiv(alg), e.g., eseqiv(cbc(aes)).
>

Ok, that's possible. I missed this possibility, probably because nobody
is actually doing it this way. Thanks!

Unfortunately eseqiv does not work out of the box if we do synchronous
encryption.

If crypto_ablkcipher_encrypt() returns synchronously, eseqiv_complete2()
is called even if req->giv is already the pointer to the generated IV.
The generated IV is overwritten with some random data in this case.

I fixed this by calling eseqiv_complete2() only in the case where an
asynchronous algorithm would call eseqiv_complete() as the completion
function. I'll send the patch in a separate mail.

Now pcrypt runs fine together with eseqiv and the throughput
results are very similar to the ones with cpu_chainiv.

Steffen