2005-01-11 17:07:30

by Michal Ludvig

Subject: PadLock processing multiple blocks at a time

Hi all,

I have got some improvements for the VIA PadLock crypto driver.

1. A generic extension to crypto/cipher.c that allows offloading the
encryption of a whole buffer in a given mode (CBC, ...) to the
algorithm provider (e.g. PadLock). Basically it extends 'struct
cipher_alg' with some new fields:

@@ -69,6 +73,18 @@ struct cipher_alg {
unsigned int keylen, u32 *flags);
void (*cia_encrypt)(void *ctx, u8 *dst, const u8 *src);
void (*cia_decrypt)(void *ctx, u8 *dst, const u8 *src);
+ size_t cia_max_nbytes;
+ size_t cia_req_align;
+ void (*cia_ecb)(void *ctx, u8 *dst, const u8 *src, u8 *iv,
+ size_t nbytes, int encdec, int inplace);
+ void (*cia_cbc)(void *ctx, u8 *dst, const u8 *src, u8 *iv,
+ size_t nbytes, int encdec, int inplace);
+ void (*cia_cfb)(void *ctx, u8 *dst, const u8 *src, u8 *iv,
+ size_t nbytes, int encdec, int inplace);
+ void (*cia_ofb)(void *ctx, u8 *dst, const u8 *src, u8 *iv,
+ size_t nbytes, int encdec, int inplace);
+ void (*cia_ctr)(void *ctx, u8 *dst, const u8 *src, u8 *iv,
+ size_t nbytes, int encdec, int inplace);
};

If cia_<mode> is non-NULL, that function is used instead of the
software <mode>_process chaining function (e.g. cbc_process()). In the
case of PadLock this can significantly speed up the {en,de}cryption.
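
For illustration, a driver would announce such multiblock support roughly
like this (only a sketch: hw_aes_cbc and hw_aes_alg are made-up names,
while the field values match what the PadLock patch in the follow-up
actually sets):

static void hw_aes_cbc(void *ctx, u8 *dst, const u8 *src, u8 *iv,
                       size_t nbytes, int encdec, int inplace)
{
        /* hand the whole CBC-chained buffer to the engine in one go */
}

static struct crypto_alg hw_aes_alg = {
        /* ... the usual name/blocksize/keysize/cia_setkey fields ... */
        .cra_u = { .cipher = {
                .cia_max_nbytes = (size_t)-1,  /* no limit on buffer length */
                .cia_req_align  = 16,          /* engine wants 16-byte aligned buffers */
                .cia_cbc        = hw_aes_cbc,  /* used instead of cbc_process() */
        } },
};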

2. On top of this I have an extension of the padlock module to support
this scheme.

I will send both patches in separate follow-ups.

The speedup gained by this change is quite significant (measured with
bonnie on ext2 over dm-crypt with aes128):

                         No encryption   2.6.10-bk1    multiblock
Writing with putc()       10454 (100%)    7479 (72%)    9353 (89%)
Rewriting                 16510 (100%)    7628 (46%)   10611 (64%)
Writing intelligently     61128 (100%)   21132 (35%)   48103 (79%)
Reading with getc()        9406 (100%)    6916 (74%)    8801 (94%)
Reading intelligently     35885 (100%)   15271 (43%)   23202 (65%)

Numbers are in kB/s; the percentages show throughput relative to the
plaintext run. As can be seen, multiblock encryption is significantly
faster (e.g. 48103 kB/s vs. 21132 kB/s for intelligent writes, roughly
2.3x) than the already committed single-block-at-a-time processing.

More statistics (e.g. a comparison with aes.ko and aes-i586.ko) are
available at http://www.logix.cz/michal/devel/padlock/bench.xp

Dave, if you're OK with these changes, please merge them.

Michal Ludvig
--
* A mouse is a device used to point at the xterm you want to type in.
* Personal homepage - http://www.logix.cz/michal


2005-01-11 17:29:08

by Michal Ludvig

Subject: [PATCH 1/2] PadLock processing multiple blocks at a time

#
# Extends crypto/cipher.c to allow offloading whole chaining modes
# to e.g. hardware crypto accelerators.
#
# Signed-off-by: Michal Ludvig <[email protected]>
#

Index: linux-2.6.10/crypto/api.c
===================================================================
--- linux-2.6.10.orig/crypto/api.c 2004-12-24 22:35:39.000000000 +0100
+++ linux-2.6.10/crypto/api.c 2005-01-10 16:37:11.943356651 +0100
@@ -217,6 +217,19 @@ int crypto_alg_available(const char *nam
return ret;
}

+void *crypto_aligned_kmalloc(size_t size, int mode, size_t alignment, void **index)
+{
+ char *ptr;
+
+ ptr = kmalloc(size + alignment, mode);
+ *index = ptr;
+ if (alignment > 1 && ((long)ptr & (alignment - 1))) {
+ ptr += alignment - ((long)ptr & (alignment - 1));
+ }
+
+ return ptr;
+}
+
static int __init init_crypto(void)
{
printk(KERN_INFO "Initializing Cryptographic API\n");
@@ -231,3 +244,4 @@ EXPORT_SYMBOL_GPL(crypto_unregister_alg)
EXPORT_SYMBOL_GPL(crypto_alloc_tfm);
EXPORT_SYMBOL_GPL(crypto_free_tfm);
EXPORT_SYMBOL_GPL(crypto_alg_available);
+EXPORT_SYMBOL_GPL(crypto_aligned_kmalloc);
Index: linux-2.6.10/include/linux/crypto.h
===================================================================
--- linux-2.6.10.orig/include/linux/crypto.h 2005-01-07 17:26:42.000000000 +0100
+++ linux-2.6.10/include/linux/crypto.h 2005-01-10 16:37:52.157648454 +0100
@@ -42,6 +42,7 @@
#define CRYPTO_TFM_MODE_CBC 0x00000002
#define CRYPTO_TFM_MODE_CFB 0x00000004
#define CRYPTO_TFM_MODE_CTR 0x00000008
+#define CRYPTO_TFM_MODE_OFB 0x00000010

#define CRYPTO_TFM_REQ_WEAK_KEY 0x00000100
#define CRYPTO_TFM_RES_WEAK_KEY 0x00100000
@@ -72,6 +73,18 @@ struct cipher_alg {
unsigned int keylen, u32 *flags);
void (*cia_encrypt)(void *ctx, u8 *dst, const u8 *src);
void (*cia_decrypt)(void *ctx, u8 *dst, const u8 *src);
+ size_t cia_max_nbytes;
+ size_t cia_req_align;
+ void (*cia_ecb)(void *ctx, u8 *dst, const u8 *src, u8 *iv,
+ size_t nbytes, int encdec, int inplace);
+ void (*cia_cbc)(void *ctx, u8 *dst, const u8 *src, u8 *iv,
+ size_t nbytes, int encdec, int inplace);
+ void (*cia_cfb)(void *ctx, u8 *dst, const u8 *src, u8 *iv,
+ size_t nbytes, int encdec, int inplace);
+ void (*cia_ofb)(void *ctx, u8 *dst, const u8 *src, u8 *iv,
+ size_t nbytes, int encdec, int inplace);
+ void (*cia_ctr)(void *ctx, u8 *dst, const u8 *src, u8 *iv,
+ size_t nbytes, int encdec, int inplace);
};

struct digest_alg {
@@ -124,6 +137,11 @@ int crypto_unregister_alg(struct crypto_
int crypto_alg_available(const char *name, u32 flags);

/*
+ * Helper function.
+ */
+void *crypto_aligned_kmalloc (size_t size, int mode, size_t alignment, void **index);
+
+/*
* Transforms: user-instantiated objects which encapsulate algorithms
* and core processing logic. Managed via crypto_alloc_tfm() and
* crypto_free_tfm(), as well as the various helpers below.
@@ -258,6 +276,18 @@ static inline unsigned int crypto_tfm_al
return tfm->__crt_alg->cra_digest.dia_digestsize;
}

+static inline unsigned int crypto_tfm_alg_max_nbytes(struct crypto_tfm *tfm)
+{
+ BUG_ON(crypto_tfm_alg_type(tfm) != CRYPTO_ALG_TYPE_CIPHER);
+ return tfm->__crt_alg->cra_cipher.cia_max_nbytes;
+}
+
+static inline unsigned int crypto_tfm_alg_req_align(struct crypto_tfm *tfm)
+{
+ BUG_ON(crypto_tfm_alg_type(tfm) != CRYPTO_ALG_TYPE_CIPHER);
+ return tfm->__crt_alg->cra_cipher.cia_req_align;
+}
+
/*
* API wrappers.
*/
Index: linux-2.6.10/crypto/cipher.c
===================================================================
--- linux-2.6.10.orig/crypto/cipher.c 2004-12-24 22:34:57.000000000 +0100
+++ linux-2.6.10/crypto/cipher.c 2005-01-10 16:37:11.974350710 +0100
@@ -20,7 +20,31 @@
#include "internal.h"
#include "scatterwalk.h"

+#define CRA_CIPHER(tfm) (tfm)->__crt_alg->cra_cipher
+
+#define DEF_TFM_FUNCTION(name,mode,encdec,iv) \
+static int name(struct crypto_tfm *tfm, \
+ struct scatterlist *dst, \
+ struct scatterlist *src, \
+ unsigned int nbytes) \
+{ \
+ return crypt(tfm, dst, src, nbytes, \
+ mode, encdec, iv); \
+}
+
+#define DEF_TFM_FUNCTION_IV(name,mode,encdec,iv) \
+static int name(struct crypto_tfm *tfm, \
+ struct scatterlist *dst, \
+ struct scatterlist *src, \
+ unsigned int nbytes, u8 *iv) \
+{ \
+ return crypt(tfm, dst, src, nbytes, \
+ mode, encdec, iv); \
+}
+
typedef void (cryptfn_t)(void *, u8 *, const u8 *);
+typedef void (cryptblkfn_t)(void *, u8 *, const u8 *, u8 *,
+ size_t, int, int);
typedef void (procfn_t)(struct crypto_tfm *, u8 *,
u8*, cryptfn_t, int enc, void *, int);

@@ -38,6 +62,36 @@ static inline void xor_128(u8 *a, const
((u32 *)a)[3] ^= ((u32 *)b)[3];
}

+static void cbc_process(struct crypto_tfm *tfm, u8 *dst, u8 *src,
+ cryptfn_t *fn, int enc, void *info, int in_place)
+{
+ u8 *iv = info;
+
+ /* Null encryption */
+ if (!iv)
+ return;
+
+ if (enc) {
+ tfm->crt_u.cipher.cit_xor_block(iv, src);
+ (*fn)(crypto_tfm_ctx(tfm), dst, iv);
+ memcpy(iv, dst, crypto_tfm_alg_blocksize(tfm));
+ } else {
+ u8 stack[in_place ? crypto_tfm_alg_blocksize(tfm) : 0];
+ u8 *buf = in_place ? stack : dst;
+
+ (*fn)(crypto_tfm_ctx(tfm), buf, src);
+ tfm->crt_u.cipher.cit_xor_block(buf, iv);
+ memcpy(iv, src, crypto_tfm_alg_blocksize(tfm));
+ if (buf != dst)
+ memcpy(dst, buf, crypto_tfm_alg_blocksize(tfm));
+ }
+}
+
+static void ecb_process(struct crypto_tfm *tfm, u8 *dst, u8 *src,
+ cryptfn_t fn, int enc, void *info, int in_place)
+{
+ (*fn)(crypto_tfm_ctx(tfm), dst, src);
+}

/*
* Generic encrypt/decrypt wrapper for ciphers, handles operations across
@@ -47,22 +101,101 @@ static inline void xor_128(u8 *a, const
static int crypt(struct crypto_tfm *tfm,
struct scatterlist *dst,
struct scatterlist *src,
- unsigned int nbytes, cryptfn_t crfn,
- procfn_t prfn, int enc, void *info)
+ unsigned int nbytes,
+ int mode, int enc, void *info)
{
- struct scatter_walk walk_in, walk_out;
- const unsigned int bsize = crypto_tfm_alg_blocksize(tfm);
- u8 tmp_src[bsize];
- u8 tmp_dst[bsize];
+ cryptfn_t *cryptofn = NULL;
+ procfn_t *processfn = NULL;
+ cryptblkfn_t *cryptomultiblockfn = NULL;
+
+ struct scatter_walk walk_in, walk_out;
+ size_t max_nbytes = crypto_tfm_alg_max_nbytes(tfm);
+ size_t bsize = crypto_tfm_alg_blocksize(tfm);
+ int req_align = crypto_tfm_alg_req_align(tfm);
+ int ret = 0;
+ int gfp;
+ void *index_src = NULL, *index_dst = NULL;
+ u8 *iv = info;
+ u8 *tmp_src, *tmp_dst;

if (!nbytes)
- return 0;
+ return ret;

if (nbytes % bsize) {
tfm->crt_flags |= CRYPTO_TFM_RES_BAD_BLOCK_LEN;
- return -EINVAL;
+ ret = -EINVAL;
+ goto out;
}

+
+ switch (mode) {
+ case CRYPTO_TFM_MODE_ECB:
+ if (CRA_CIPHER(tfm).cia_ecb)
+ cryptomultiblockfn = CRA_CIPHER(tfm).cia_ecb;
+ else {
+ cryptofn = (enc == CRYPTO_DIR_ENCRYPT) ?
+ CRA_CIPHER(tfm).cia_encrypt :
+ CRA_CIPHER(tfm).cia_decrypt;
+ processfn = ecb_process;
+ }
+ break;
+
+ case CRYPTO_TFM_MODE_CBC:
+ if (CRA_CIPHER(tfm).cia_cbc)
+ cryptomultiblockfn = CRA_CIPHER(tfm).cia_cbc;
+ else {
+ cryptofn = (enc == CRYPTO_DIR_ENCRYPT) ?
+ CRA_CIPHER(tfm).cia_encrypt :
+ CRA_CIPHER(tfm).cia_decrypt;
+ processfn = cbc_process;
+ }
+ break;
+
+ /* Until we have the appropriate {ofb,cfb,ctr}_process()
+ functions, the following cases will return -ENOSYS if
+ there is no HW support for the mode. */
+ case CRYPTO_TFM_MODE_OFB:
+ if (CRA_CIPHER(tfm).cia_ofb)
+ cryptomultiblockfn = CRA_CIPHER(tfm).cia_ofb;
+ else
+ return -ENOSYS;
+ break;
+
+ case CRYPTO_TFM_MODE_CFB:
+ if (CRA_CIPHER(tfm).cia_cfb)
+ cryptomultiblockfn = CRA_CIPHER(tfm).cia_cfb;
+ else
+ return -ENOSYS;
+ break;
+
+ case CRYPTO_TFM_MODE_CTR:
+ if (CRA_CIPHER(tfm).cia_ctr)
+ cryptomultiblockfn = CRA_CIPHER(tfm).cia_ctr;
+ else
+ return -ENOSYS;
+ break;
+
+ default:
+ BUG();
+ }
+
+ if (cryptomultiblockfn)
+ bsize = (max_nbytes > nbytes) ? nbytes : max_nbytes;
+
+ /* Some hardware crypto engines may require a specific
+ alignment of the buffers. We will align the buffers
+ already here to avoid their reallocating later. */
+ gfp = in_atomic() ? GFP_ATOMIC : GFP_KERNEL;
+ tmp_src = crypto_aligned_kmalloc(bsize, gfp,
+ req_align, &index_src);
+ tmp_dst = crypto_aligned_kmalloc(bsize, gfp,
+ req_align, &index_dst);
+
+ if (!index_src || !index_dst) {
+ ret = -ENOMEM;
+ goto out;
+ }
+
scatterwalk_start(&walk_in, src);
scatterwalk_start(&walk_out, dst);

@@ -81,7 +214,13 @@ static int crypt(struct crypto_tfm *tfm,

scatterwalk_copychunks(src_p, &walk_in, bsize, 0);

- prfn(tfm, dst_p, src_p, crfn, enc, info, in_place);
+ if (cryptomultiblockfn)
+ (*cryptomultiblockfn)(crypto_tfm_ctx(tfm),
+ dst_p, src_p, iv,
+ bsize, enc, in_place);
+ else
+ (*processfn)(tfm, dst_p, src_p, cryptofn,
+ enc, info, in_place);

scatterwalk_done(&walk_in, 0, nbytes);

@@ -89,46 +228,23 @@ static int crypt(struct crypto_tfm *tfm,
scatterwalk_done(&walk_out, 1, nbytes);

if (!nbytes)
- return 0;
+ goto out;

crypto_yield(tfm);
}
-}
-
-static void cbc_process(struct crypto_tfm *tfm, u8 *dst, u8 *src,
- cryptfn_t fn, int enc, void *info, int in_place)
-{
- u8 *iv = info;
-
- /* Null encryption */
- if (!iv)
- return;
-
- if (enc) {
- tfm->crt_u.cipher.cit_xor_block(iv, src);
- fn(crypto_tfm_ctx(tfm), dst, iv);
- memcpy(iv, dst, crypto_tfm_alg_blocksize(tfm));
- } else {
- u8 stack[in_place ? crypto_tfm_alg_blocksize(tfm) : 0];
- u8 *buf = in_place ? stack : dst;

- fn(crypto_tfm_ctx(tfm), buf, src);
- tfm->crt_u.cipher.cit_xor_block(buf, iv);
- memcpy(iv, src, crypto_tfm_alg_blocksize(tfm));
- if (buf != dst)
- memcpy(dst, buf, crypto_tfm_alg_blocksize(tfm));
- }
-}
+out:
+ if (index_src)
+ kfree(index_src);
+ if (index_dst)
+ kfree(index_dst);

-static void ecb_process(struct crypto_tfm *tfm, u8 *dst, u8 *src,
- cryptfn_t fn, int enc, void *info, int in_place)
-{
- fn(crypto_tfm_ctx(tfm), dst, src);
+ return ret;
}

static int setkey(struct crypto_tfm *tfm, const u8 *key, unsigned int keylen)
{
- struct cipher_alg *cia = &tfm->__crt_alg->cra_cipher;
+ struct cipher_alg *cia = &CRA_CIPHER(tfm);

if (keylen < cia->cia_min_keysize || keylen > cia->cia_max_keysize) {
tfm->crt_flags |= CRYPTO_TFM_RES_BAD_KEY_LEN;
@@ -138,80 +254,28 @@ static int setkey(struct crypto_tfm *tfm
&tfm->crt_flags);
}

-static int ecb_encrypt(struct crypto_tfm *tfm,
- struct scatterlist *dst,
- struct scatterlist *src, unsigned int nbytes)
-{
- return crypt(tfm, dst, src, nbytes,
- tfm->__crt_alg->cra_cipher.cia_encrypt,
- ecb_process, 1, NULL);
-}
+DEF_TFM_FUNCTION(ecb_encrypt, CRYPTO_TFM_MODE_ECB, CRYPTO_DIR_ENCRYPT, NULL);
+DEF_TFM_FUNCTION(ecb_decrypt, CRYPTO_TFM_MODE_ECB, CRYPTO_DIR_DECRYPT, NULL);

-static int ecb_decrypt(struct crypto_tfm *tfm,
- struct scatterlist *dst,
- struct scatterlist *src,
- unsigned int nbytes)
-{
- return crypt(tfm, dst, src, nbytes,
- tfm->__crt_alg->cra_cipher.cia_decrypt,
- ecb_process, 1, NULL);
-}
-
-static int cbc_encrypt(struct crypto_tfm *tfm,
- struct scatterlist *dst,
- struct scatterlist *src,
- unsigned int nbytes)
-{
- return crypt(tfm, dst, src, nbytes,
- tfm->__crt_alg->cra_cipher.cia_encrypt,
- cbc_process, 1, tfm->crt_cipher.cit_iv);
-}
-
-static int cbc_encrypt_iv(struct crypto_tfm *tfm,
- struct scatterlist *dst,
- struct scatterlist *src,
- unsigned int nbytes, u8 *iv)
-{
- return crypt(tfm, dst, src, nbytes,
- tfm->__crt_alg->cra_cipher.cia_encrypt,
- cbc_process, 1, iv);
-}
-
-static int cbc_decrypt(struct crypto_tfm *tfm,
- struct scatterlist *dst,
- struct scatterlist *src,
- unsigned int nbytes)
-{
- return crypt(tfm, dst, src, nbytes,
- tfm->__crt_alg->cra_cipher.cia_decrypt,
- cbc_process, 0, tfm->crt_cipher.cit_iv);
-}
-
-static int cbc_decrypt_iv(struct crypto_tfm *tfm,
- struct scatterlist *dst,
- struct scatterlist *src,
- unsigned int nbytes, u8 *iv)
-{
- return crypt(tfm, dst, src, nbytes,
- tfm->__crt_alg->cra_cipher.cia_decrypt,
- cbc_process, 0, iv);
-}
-
-static int nocrypt(struct crypto_tfm *tfm,
- struct scatterlist *dst,
- struct scatterlist *src,
- unsigned int nbytes)
-{
- return -ENOSYS;
-}
-
-static int nocrypt_iv(struct crypto_tfm *tfm,
- struct scatterlist *dst,
- struct scatterlist *src,
- unsigned int nbytes, u8 *iv)
-{
- return -ENOSYS;
-}
+DEF_TFM_FUNCTION(cbc_encrypt, CRYPTO_TFM_MODE_CBC, CRYPTO_DIR_ENCRYPT, tfm->crt_cipher.cit_iv);
+DEF_TFM_FUNCTION_IV(cbc_encrypt_iv, CRYPTO_TFM_MODE_CBC, CRYPTO_DIR_ENCRYPT, iv);
+DEF_TFM_FUNCTION(cbc_decrypt, CRYPTO_TFM_MODE_CBC, CRYPTO_DIR_DECRYPT, tfm->crt_cipher.cit_iv);
+DEF_TFM_FUNCTION_IV(cbc_decrypt_iv, CRYPTO_TFM_MODE_CBC, CRYPTO_DIR_DECRYPT, iv);
+
+DEF_TFM_FUNCTION(cfb_encrypt, CRYPTO_TFM_MODE_CFB, CRYPTO_DIR_ENCRYPT, tfm->crt_cipher.cit_iv);
+DEF_TFM_FUNCTION_IV(cfb_encrypt_iv, CRYPTO_TFM_MODE_CFB, CRYPTO_DIR_ENCRYPT, iv);
+DEF_TFM_FUNCTION(cfb_decrypt, CRYPTO_TFM_MODE_CFB, CRYPTO_DIR_DECRYPT, tfm->crt_cipher.cit_iv);
+DEF_TFM_FUNCTION_IV(cfb_decrypt_iv, CRYPTO_TFM_MODE_CFB, CRYPTO_DIR_DECRYPT, iv);
+
+DEF_TFM_FUNCTION(ofb_encrypt, CRYPTO_TFM_MODE_OFB, CRYPTO_DIR_ENCRYPT, tfm->crt_cipher.cit_iv);
+DEF_TFM_FUNCTION_IV(ofb_encrypt_iv, CRYPTO_TFM_MODE_OFB, CRYPTO_DIR_ENCRYPT, iv);
+DEF_TFM_FUNCTION(ofb_decrypt, CRYPTO_TFM_MODE_OFB, CRYPTO_DIR_DECRYPT, tfm->crt_cipher.cit_iv);
+DEF_TFM_FUNCTION_IV(ofb_decrypt_iv, CRYPTO_TFM_MODE_OFB, CRYPTO_DIR_DECRYPT, iv);
+
+DEF_TFM_FUNCTION(ctr_encrypt, CRYPTO_TFM_MODE_CTR, CRYPTO_DIR_ENCRYPT, tfm->crt_cipher.cit_iv);
+DEF_TFM_FUNCTION_IV(ctr_encrypt_iv, CRYPTO_TFM_MODE_CTR, CRYPTO_DIR_ENCRYPT, iv);
+DEF_TFM_FUNCTION(ctr_decrypt, CRYPTO_TFM_MODE_CTR, CRYPTO_DIR_DECRYPT, tfm->crt_cipher.cit_iv);
+DEF_TFM_FUNCTION_IV(ctr_decrypt_iv, CRYPTO_TFM_MODE_CTR, CRYPTO_DIR_DECRYPT, iv);

int crypto_init_cipher_flags(struct crypto_tfm *tfm, u32 flags)
{
@@ -245,17 +309,24 @@ int crypto_init_cipher_ops(struct crypto
break;

case CRYPTO_TFM_MODE_CFB:
- ops->cit_encrypt = nocrypt;
- ops->cit_decrypt = nocrypt;
- ops->cit_encrypt_iv = nocrypt_iv;
- ops->cit_decrypt_iv = nocrypt_iv;
+ ops->cit_encrypt = cfb_encrypt;
+ ops->cit_decrypt = cfb_decrypt;
+ ops->cit_encrypt_iv = cfb_encrypt_iv;
+ ops->cit_decrypt_iv = cfb_decrypt_iv;
+ break;
+
+ case CRYPTO_TFM_MODE_OFB:
+ ops->cit_encrypt = ofb_encrypt;
+ ops->cit_decrypt = ofb_decrypt;
+ ops->cit_encrypt_iv = ofb_encrypt_iv;
+ ops->cit_decrypt_iv = ofb_decrypt_iv;
break;

case CRYPTO_TFM_MODE_CTR:
- ops->cit_encrypt = nocrypt;
- ops->cit_decrypt = nocrypt;
- ops->cit_encrypt_iv = nocrypt_iv;
- ops->cit_decrypt_iv = nocrypt_iv;
+ ops->cit_encrypt = ctr_encrypt;
+ ops->cit_decrypt = ctr_decrypt;
+ ops->cit_encrypt_iv = ctr_encrypt_iv;
+ ops->cit_decrypt_iv = ctr_decrypt_iv;
break;

default:

2005-01-11 17:15:31

by Michal Ludvig

Subject: [PATCH 2/2] PadLock processing multiple blocks at a time

#
# Update to padlock-aes.c that enables processing of the whole
# buffer of data at once with the given chaining mode (e.g. CBC).
#
# Signed-off-by: Michal Ludvig <[email protected]>
#
Index: linux-2.6.10/drivers/crypto/padlock-aes.c
===================================================================
--- linux-2.6.10.orig/drivers/crypto/padlock-aes.c 2005-01-07 17:26:42.000000000 +0100
+++ linux-2.6.10/drivers/crypto/padlock-aes.c 2005-01-10 17:59:17.000000000 +0100
@@ -369,19 +369,54 @@ aes_set_key(void *ctx_arg, const uint8_t

/* ====== Encryption/decryption routines ====== */

-/* This is the real call to PadLock. */
-static inline void
+/* These are the real calls to PadLock. */
+static inline void *
padlock_xcrypt_ecb(uint8_t *input, uint8_t *output, uint8_t *key,
- void *control_word, uint32_t count)
+ uint8_t *iv, void *control_word, uint32_t count)
{
asm volatile ("pushfl; popfl"); /* enforce key reload. */
asm volatile (".byte 0xf3,0x0f,0xa7,0xc8" /* rep xcryptecb */
: "+S"(input), "+D"(output)
: "d"(control_word), "b"(key), "c"(count));
+ return NULL;
+}
+
+static inline void *
+padlock_xcrypt_cbc(uint8_t *input, uint8_t *output, uint8_t *key,
+ uint8_t *iv, void *control_word, uint32_t count)
+{
+ asm volatile ("pushfl; popfl"); /* enforce key reload. */
+ asm volatile (".byte 0xf3,0x0f,0xa7,0xd0" /* rep xcryptcbc */
+ : "=m"(*output), "+S"(input), "+D"(output), "+a"(iv)
+ : "d"(control_word), "b"(key), "c"(count));
+ return iv;
+}
+
+static inline void *
+padlock_xcrypt_cfb(uint8_t *input, uint8_t *output, uint8_t *key,
+ uint8_t *iv, void *control_word, uint32_t count)
+{
+ asm volatile ("pushfl; popfl"); /* enforce key reload. */
+ asm volatile (".byte 0xf3,0x0f,0xa7,0xe0" /* rep xcryptcfb */
+ : "=m"(*output), "+S"(input), "+D"(output), "+a"(iv)
+ : "d"(control_word), "b"(key), "c"(count));
+ return iv;
+}
+
+static inline void *
+padlock_xcrypt_ofb(uint8_t *input, uint8_t *output, uint8_t *key,
+ uint8_t *iv, void *control_word, uint32_t count)
+{
+ asm volatile ("pushfl; popfl"); /* enforce key reload. */
+ asm volatile (".byte 0xf3,0x0f,0xa7,0xe8" /* rep xcryptofb */
+ : "=m"(*output), "+S"(input), "+D"(output), "+a"(iv)
+ : "d"(control_word), "b"(key), "c"(count));
+ return iv;
}

static void
-aes_padlock(void *ctx_arg, uint8_t *out_arg, const uint8_t *in_arg, int encdec)
+aes_padlock(void *ctx_arg, uint8_t *out_arg, const uint8_t *in_arg,
+ uint8_t *iv_arg, size_t nbytes, int encdec, int mode)
{
/* Don't blindly modify this structure - the items must
fit on 16-Bytes boundaries! */
@@ -419,21 +454,126 @@ aes_padlock(void *ctx_arg, uint8_t *out_
else
key = ctx->D;

- memcpy(data->buf, in_arg, AES_BLOCK_SIZE);
- padlock_xcrypt_ecb(data->buf, data->buf, key, &data->cword, 1);
- memcpy(out_arg, data->buf, AES_BLOCK_SIZE);
+ if (nbytes == AES_BLOCK_SIZE) {
+ /* Processing one block only => ECB is enough */
+ memcpy(data->buf, in_arg, AES_BLOCK_SIZE);
+ padlock_xcrypt_ecb(data->buf, data->buf, key, NULL,
+ &data->cword, 1);
+ memcpy(out_arg, data->buf, AES_BLOCK_SIZE);
+ } else {
+ /* Processing multiple blocks at once */
+ uint8_t *in, *out, *iv;
+ int gfp = in_atomic() ? GFP_ATOMIC : GFP_KERNEL;
+ void *index = NULL;
+
+ if (unlikely(((long)in_arg) & 0x0F)) {
+ in = crypto_aligned_kmalloc(nbytes, gfp, 16, &index);
+ memcpy(in, in_arg, nbytes);
+ }
+ else
+ in = (uint8_t*)in_arg;
+
+ if (unlikely(((long)out_arg) & 0x0F)) {
+ if (index)
+ out = in; /* xcrypt can work "in place" */
+ else
+ out = crypto_aligned_kmalloc(nbytes, gfp, 16,
+ &index);
+ }
+ else
+ out = out_arg;
+
+ /* Always make a local copy of IV - xcrypt may change it! */
+ iv = data->buf;
+ if (iv_arg)
+ memcpy(iv, iv_arg, AES_BLOCK_SIZE);
+
+ switch (mode) {
+ case CRYPTO_TFM_MODE_ECB:
+ iv = padlock_xcrypt_ecb(in, out, key, iv,
+ &data->cword,
+ nbytes/AES_BLOCK_SIZE);
+ break;
+
+ case CRYPTO_TFM_MODE_CBC:
+ iv = padlock_xcrypt_cbc(in, out, key, iv,
+ &data->cword,
+ nbytes/AES_BLOCK_SIZE);
+ break;
+
+ case CRYPTO_TFM_MODE_CFB:
+ iv = padlock_xcrypt_cfb(in, out, key, iv,
+ &data->cword,
+ nbytes/AES_BLOCK_SIZE);
+ break;
+
+ case CRYPTO_TFM_MODE_OFB:
+ iv = padlock_xcrypt_ofb(in, out, key, iv,
+ &data->cword,
+ nbytes/AES_BLOCK_SIZE);
+ break;
+
+ default:
+ BUG();
+ }
+
+ /* Back up IV */
+ if (iv && iv_arg)
+ memcpy(iv_arg, iv, AES_BLOCK_SIZE);
+
+ /* Copy the 16-Byte aligned output to the caller's buffer. */
+ if (out != out_arg)
+ memcpy(out_arg, out, nbytes);
+
+ if (index)
+ kfree(index);
+ }
+}
+
+static void
+aes_padlock_ecb(void *ctx, uint8_t *dst, const uint8_t *src,
+ uint8_t *iv, size_t nbytes, int encdec, int inplace)
+{
+ aes_padlock(ctx, dst, src, NULL, nbytes, encdec,
+ CRYPTO_TFM_MODE_ECB);
+}
+
+static void
+aes_padlock_cbc(void *ctx, uint8_t *dst, const uint8_t *src, uint8_t *iv,
+ size_t nbytes, int encdec, int inplace)
+{
+ aes_padlock(ctx, dst, src, iv, nbytes, encdec,
+ CRYPTO_TFM_MODE_CBC);
+}
+
+static void
+aes_padlock_cfb(void *ctx, uint8_t *dst, const uint8_t *src, uint8_t *iv,
+ size_t nbytes, int encdec, int inplace)
+{
+ aes_padlock(ctx, dst, src, iv, nbytes, encdec,
+ CRYPTO_TFM_MODE_CFB);
+}
+
+static void
+aes_padlock_ofb(void *ctx, uint8_t *dst, const uint8_t *src, uint8_t *iv,
+ size_t nbytes, int encdec, int inplace)
+{
+ aes_padlock(ctx, dst, src, iv, nbytes, encdec,
+ CRYPTO_TFM_MODE_OFB);
}

static void
aes_encrypt(void *ctx_arg, uint8_t *out, const uint8_t *in)
{
- aes_padlock(ctx_arg, out, in, CRYPTO_DIR_ENCRYPT);
+ aes_padlock(ctx_arg, out, in, NULL, AES_BLOCK_SIZE,
+ CRYPTO_DIR_ENCRYPT, CRYPTO_TFM_MODE_ECB);
}

static void
aes_decrypt(void *ctx_arg, uint8_t *out, const uint8_t *in)
{
- aes_padlock(ctx_arg, out, in, CRYPTO_DIR_DECRYPT);
+ aes_padlock(ctx_arg, out, in, NULL, AES_BLOCK_SIZE,
+ CRYPTO_DIR_DECRYPT, CRYPTO_TFM_MODE_ECB);
}

static struct crypto_alg aes_alg = {
@@ -454,9 +594,25 @@ static struct crypto_alg aes_alg = {
}
};

+static int disable_multiblock = 0;
+MODULE_PARM(disable_multiblock, "i");
+MODULE_PARM_DESC(disable_multiblock,
+ "Disable encryption of whole multiblock buffers.");
+
int __init padlock_init_aes(void)
{
- printk(KERN_NOTICE PFX "Using VIA PadLock ACE for AES algorithm.\n");
+ if (!disable_multiblock) {
+ aes_alg.cra_u.cipher.cia_max_nbytes = (size_t)-1;
+ aes_alg.cra_u.cipher.cia_req_align = 16;
+ aes_alg.cra_u.cipher.cia_ecb = aes_padlock_ecb;
+ aes_alg.cra_u.cipher.cia_cbc = aes_padlock_cbc;
+ aes_alg.cra_u.cipher.cia_cfb = aes_padlock_cfb;
+ aes_alg.cra_u.cipher.cia_ofb = aes_padlock_ofb;
+ }
+
+ printk(KERN_NOTICE PFX
+ "Using VIA PadLock ACE for AES algorithm%s.\n",
+ disable_multiblock ? "" : " (multiblock)");

gen_tabs();
return crypto_register_alg(&aes_alg);

2005-01-14 03:17:19

by Andrew Morton

Subject: Re: [PATCH 2/2] PadLock processing multiple blocks at a time

Michal Ludvig <[email protected]> wrote:
>
> #
> # Update to padlock-aes.c that enables processing of the whole
> # buffer of data at once with the given chaining mode (e.g. CBC).
> #

Please don't email different patches under the same Subject:. Choose a
Subject: which is meaningful for each patch.

This one kills gcc-2.95.x:

drivers/crypto/padlock-aes.c: In function `aes_padlock':
drivers/crypto/padlock-aes.c:391: impossible register constraint in `asm'
drivers/crypto/padlock-aes.c:402: impossible register constraint in `asm'
drivers/crypto/padlock-aes.c:413: impossible register constraint in `asm'
drivers/crypto/padlock-aes.c:391: `asm' needs too many reloads

2005-01-14 13:11:26

by Michal Ludvig

Subject: [PATCH 1/2] CryptoAPI: prepare for processing multiple buffers at a time

Hi all,

I'm resending this patch with trailing spaces removed per Andrew's
comment.

This patch extends crypto/cipher.c to allow offloading whole chaining modes
to e.g. hardware crypto accelerators. It is much faster to let the
hardware do all the chaining if it can do so.

Signed-off-by: Michal Ludvig <[email protected]>

---

crypto/api.c | 14 ++
crypto/cipher.c | 313 ++++++++++++++++++++++++++++++-------------------
include/linux/crypto.h | 30 ++++
3 files changed, 236 insertions(+), 121 deletions(-)


Index: linux-2.6.10/crypto/api.c
===================================================================
--- linux-2.6.10.orig/crypto/api.c 2004-12-24 22:35:39.000000000 +0100
+++ linux-2.6.10/crypto/api.c 2005-01-10 16:37:11.943356651 +0100
@@ -217,6 +217,19 @@ int crypto_alg_available(const char *nam
return ret;
}

+void *crypto_aligned_kmalloc(size_t size, int mode, size_t alignment, void **index)
+{
+ char *ptr;
+
+ ptr = kmalloc(size + alignment, mode);
+ *index = ptr;
+ if (alignment > 1 && ((long)ptr & (alignment - 1))) {
+ ptr += alignment - ((long)ptr & (alignment - 1));
+ }
+
+ return ptr;
+}
+
static int __init init_crypto(void)
{
printk(KERN_INFO "Initializing Cryptographic API\n");
@@ -231,3 +244,4 @@ EXPORT_SYMBOL_GPL(crypto_unregister_alg)
EXPORT_SYMBOL_GPL(crypto_alloc_tfm);
EXPORT_SYMBOL_GPL(crypto_free_tfm);
EXPORT_SYMBOL_GPL(crypto_alg_available);
+EXPORT_SYMBOL_GPL(crypto_aligned_kmalloc);
Index: linux-2.6.10/include/linux/crypto.h
===================================================================
--- linux-2.6.10.orig/include/linux/crypto.h 2005-01-07 17:26:42.000000000 +0100
+++ linux-2.6.10/include/linux/crypto.h 2005-01-10 16:37:52.157648454 +0100
@@ -42,6 +42,7 @@
#define CRYPTO_TFM_MODE_CBC 0x00000002
#define CRYPTO_TFM_MODE_CFB 0x00000004
#define CRYPTO_TFM_MODE_CTR 0x00000008
+#define CRYPTO_TFM_MODE_OFB 0x00000010

#define CRYPTO_TFM_REQ_WEAK_KEY 0x00000100
#define CRYPTO_TFM_RES_WEAK_KEY 0x00100000
@@ -72,6 +73,18 @@ struct cipher_alg {
unsigned int keylen, u32 *flags);
void (*cia_encrypt)(void *ctx, u8 *dst, const u8 *src);
void (*cia_decrypt)(void *ctx, u8 *dst, const u8 *src);
+ size_t cia_max_nbytes;
+ size_t cia_req_align;
+ void (*cia_ecb)(void *ctx, u8 *dst, const u8 *src, u8 *iv,
+ size_t nbytes, int encdec, int inplace);
+ void (*cia_cbc)(void *ctx, u8 *dst, const u8 *src, u8 *iv,
+ size_t nbytes, int encdec, int inplace);
+ void (*cia_cfb)(void *ctx, u8 *dst, const u8 *src, u8 *iv,
+ size_t nbytes, int encdec, int inplace);
+ void (*cia_ofb)(void *ctx, u8 *dst, const u8 *src, u8 *iv,
+ size_t nbytes, int encdec, int inplace);
+ void (*cia_ctr)(void *ctx, u8 *dst, const u8 *src, u8 *iv,
+ size_t nbytes, int encdec, int inplace);
};

struct digest_alg {
@@ -124,6 +137,11 @@ int crypto_unregister_alg(struct crypto_
int crypto_alg_available(const char *name, u32 flags);

/*
+ * Helper function.
+ */
+void *crypto_aligned_kmalloc (size_t size, int mode, size_t alignment, void **index);
+
+/*
* Transforms: user-instantiated objects which encapsulate algorithms
* and core processing logic. Managed via crypto_alloc_tfm() and
* crypto_free_tfm(), as well as the various helpers below.
@@ -258,6 +276,18 @@ static inline unsigned int crypto_tfm_al
return tfm->__crt_alg->cra_digest.dia_digestsize;
}

+static inline unsigned int crypto_tfm_alg_max_nbytes(struct crypto_tfm *tfm)
+{
+ BUG_ON(crypto_tfm_alg_type(tfm) != CRYPTO_ALG_TYPE_CIPHER);
+ return tfm->__crt_alg->cra_cipher.cia_max_nbytes;
+}
+
+static inline unsigned int crypto_tfm_alg_req_align(struct crypto_tfm *tfm)
+{
+ BUG_ON(crypto_tfm_alg_type(tfm) != CRYPTO_ALG_TYPE_CIPHER);
+ return tfm->__crt_alg->cra_cipher.cia_req_align;
+}
+
/*
* API wrappers.
*/
Index: linux-2.6.10/crypto/cipher.c
===================================================================
--- linux-2.6.10.orig/crypto/cipher.c 2004-12-24 22:34:57.000000000 +0100
+++ linux-2.6.10/crypto/cipher.c 2005-01-10 16:37:11.974350710 +0100
@@ -20,7 +20,31 @@
#include "internal.h"
#include "scatterwalk.h"

+#define CRA_CIPHER(tfm) (tfm)->__crt_alg->cra_cipher
+
+#define DEF_TFM_FUNCTION(name,mode,encdec,iv) \
+static int name(struct crypto_tfm *tfm, \
+ struct scatterlist *dst, \
+ struct scatterlist *src, \
+ unsigned int nbytes) \
+{ \
+ return crypt(tfm, dst, src, nbytes, \
+ mode, encdec, iv); \
+}
+
+#define DEF_TFM_FUNCTION_IV(name,mode,encdec,iv) \
+static int name(struct crypto_tfm *tfm, \
+ struct scatterlist *dst, \
+ struct scatterlist *src, \
+ unsigned int nbytes, u8 *iv) \
+{ \
+ return crypt(tfm, dst, src, nbytes, \
+ mode, encdec, iv); \
+}
+
typedef void (cryptfn_t)(void *, u8 *, const u8 *);
+typedef void (cryptblkfn_t)(void *, u8 *, const u8 *, u8 *,
+ size_t, int, int);
typedef void (procfn_t)(struct crypto_tfm *, u8 *,
u8*, cryptfn_t, int enc, void *, int);

@@ -38,6 +62,36 @@ static inline void xor_128(u8 *a, const
((u32 *)a)[3] ^= ((u32 *)b)[3];
}

+static void cbc_process(struct crypto_tfm *tfm, u8 *dst, u8 *src,
+ cryptfn_t *fn, int enc, void *info, int in_place)
+{
+ u8 *iv = info;
+
+ /* Null encryption */
+ if (!iv)
+ return;
+
+ if (enc) {
+ tfm->crt_u.cipher.cit_xor_block(iv, src);
+ (*fn)(crypto_tfm_ctx(tfm), dst, iv);
+ memcpy(iv, dst, crypto_tfm_alg_blocksize(tfm));
+ } else {
+ u8 stack[in_place ? crypto_tfm_alg_blocksize(tfm) : 0];
+ u8 *buf = in_place ? stack : dst;
+
+ (*fn)(crypto_tfm_ctx(tfm), buf, src);
+ tfm->crt_u.cipher.cit_xor_block(buf, iv);
+ memcpy(iv, src, crypto_tfm_alg_blocksize(tfm));
+ if (buf != dst)
+ memcpy(dst, buf, crypto_tfm_alg_blocksize(tfm));
+ }
+}
+
+static void ecb_process(struct crypto_tfm *tfm, u8 *dst, u8 *src,
+ cryptfn_t fn, int enc, void *info, int in_place)
+{
+ (*fn)(crypto_tfm_ctx(tfm), dst, src);
+}

/*
* Generic encrypt/decrypt wrapper for ciphers, handles operations across
@@ -47,22 +101,101 @@ static inline void xor_128(u8 *a, const
static int crypt(struct crypto_tfm *tfm,
struct scatterlist *dst,
struct scatterlist *src,
- unsigned int nbytes, cryptfn_t crfn,
- procfn_t prfn, int enc, void *info)
+ unsigned int nbytes,
+ int mode, int enc, void *info)
{
- struct scatter_walk walk_in, walk_out;
- const unsigned int bsize = crypto_tfm_alg_blocksize(tfm);
- u8 tmp_src[bsize];
- u8 tmp_dst[bsize];
+ cryptfn_t *cryptofn = NULL;
+ procfn_t *processfn = NULL;
+ cryptblkfn_t *cryptomultiblockfn = NULL;
+
+ struct scatter_walk walk_in, walk_out;
+ size_t max_nbytes = crypto_tfm_alg_max_nbytes(tfm);
+ size_t bsize = crypto_tfm_alg_blocksize(tfm);
+ int req_align = crypto_tfm_alg_req_align(tfm);
+ int ret = 0;
+ int gfp;
+ void *index_src = NULL, *index_dst = NULL;
+ u8 *iv = info;
+ u8 *tmp_src, *tmp_dst;

if (!nbytes)
- return 0;
+ return ret;

if (nbytes % bsize) {
tfm->crt_flags |= CRYPTO_TFM_RES_BAD_BLOCK_LEN;
- return -EINVAL;
+ ret = -EINVAL;
+ goto out;
}

+
+ switch (mode) {
+ case CRYPTO_TFM_MODE_ECB:
+ if (CRA_CIPHER(tfm).cia_ecb)
+ cryptomultiblockfn = CRA_CIPHER(tfm).cia_ecb;
+ else {
+ cryptofn = (enc == CRYPTO_DIR_ENCRYPT) ?
+ CRA_CIPHER(tfm).cia_encrypt :
+ CRA_CIPHER(tfm).cia_decrypt;
+ processfn = ecb_process;
+ }
+ break;
+
+ case CRYPTO_TFM_MODE_CBC:
+ if (CRA_CIPHER(tfm).cia_cbc)
+ cryptomultiblockfn = CRA_CIPHER(tfm).cia_cbc;
+ else {
+ cryptofn = (enc == CRYPTO_DIR_ENCRYPT) ?
+ CRA_CIPHER(tfm).cia_encrypt :
+ CRA_CIPHER(tfm).cia_decrypt;
+ processfn = cbc_process;
+ }
+ break;
+
+ /* Until we have the appropriate {ofb,cfb,ctr}_process()
+ functions, the following cases will return -ENOSYS if
+ there is no HW support for the mode. */
+ case CRYPTO_TFM_MODE_OFB:
+ if (CRA_CIPHER(tfm).cia_ofb)
+ cryptomultiblockfn = CRA_CIPHER(tfm).cia_ofb;
+ else
+ return -ENOSYS;
+ break;
+
+ case CRYPTO_TFM_MODE_CFB:
+ if (CRA_CIPHER(tfm).cia_cfb)
+ cryptomultiblockfn = CRA_CIPHER(tfm).cia_cfb;
+ else
+ return -ENOSYS;
+ break;
+
+ case CRYPTO_TFM_MODE_CTR:
+ if (CRA_CIPHER(tfm).cia_ctr)
+ cryptomultiblockfn = CRA_CIPHER(tfm).cia_ctr;
+ else
+ return -ENOSYS;
+ break;
+
+ default:
+ BUG();
+ }
+
+ if (cryptomultiblockfn)
+ bsize = (max_nbytes > nbytes) ? nbytes : max_nbytes;
+
+ /* Some hardware crypto engines may require a specific
+ alignment of the buffers. We will align the buffers
+ already here to avoid their reallocating later. */
+ gfp = in_atomic() ? GFP_ATOMIC : GFP_KERNEL;
+ tmp_src = crypto_aligned_kmalloc(bsize, gfp,
+ req_align, &index_src);
+ tmp_dst = crypto_aligned_kmalloc(bsize, gfp,
+ req_align, &index_dst);
+
+ if (!index_src || !index_dst) {
+ ret = -ENOMEM;
+ goto out;
+ }
+
scatterwalk_start(&walk_in, src);
scatterwalk_start(&walk_out, dst);

@@ -81,7 +214,13 @@ static int crypt(struct crypto_tfm *tfm,

scatterwalk_copychunks(src_p, &walk_in, bsize, 0);

- prfn(tfm, dst_p, src_p, crfn, enc, info, in_place);
+ if (cryptomultiblockfn)
+ (*cryptomultiblockfn)(crypto_tfm_ctx(tfm),
+ dst_p, src_p, iv,
+ bsize, enc, in_place);
+ else
+ (*processfn)(tfm, dst_p, src_p, cryptofn,
+ enc, info, in_place);

scatterwalk_done(&walk_in, 0, nbytes);

@@ -89,46 +228,23 @@ static int crypt(struct crypto_tfm *tfm,
scatterwalk_done(&walk_out, 1, nbytes);

if (!nbytes)
- return 0;
+ goto out;

crypto_yield(tfm);
}
-}
-
-static void cbc_process(struct crypto_tfm *tfm, u8 *dst, u8 *src,
- cryptfn_t fn, int enc, void *info, int in_place)
-{
- u8 *iv = info;
-
- /* Null encryption */
- if (!iv)
- return;
-
- if (enc) {
- tfm->crt_u.cipher.cit_xor_block(iv, src);
- fn(crypto_tfm_ctx(tfm), dst, iv);
- memcpy(iv, dst, crypto_tfm_alg_blocksize(tfm));
- } else {
- u8 stack[in_place ? crypto_tfm_alg_blocksize(tfm) : 0];
- u8 *buf = in_place ? stack : dst;

- fn(crypto_tfm_ctx(tfm), buf, src);
- tfm->crt_u.cipher.cit_xor_block(buf, iv);
- memcpy(iv, src, crypto_tfm_alg_blocksize(tfm));
- if (buf != dst)
- memcpy(dst, buf, crypto_tfm_alg_blocksize(tfm));
- }
-}
+out:
+ if (index_src)
+ kfree(index_src);
+ if (index_dst)
+ kfree(index_dst);

-static void ecb_process(struct crypto_tfm *tfm, u8 *dst, u8 *src,
- cryptfn_t fn, int enc, void *info, int in_place)
-{
- fn(crypto_tfm_ctx(tfm), dst, src);
+ return ret;
}

static int setkey(struct crypto_tfm *tfm, const u8 *key, unsigned int keylen)
{
- struct cipher_alg *cia = &tfm->__crt_alg->cra_cipher;
+ struct cipher_alg *cia = &CRA_CIPHER(tfm);

if (keylen < cia->cia_min_keysize || keylen > cia->cia_max_keysize) {
tfm->crt_flags |= CRYPTO_TFM_RES_BAD_KEY_LEN;
@@ -138,80 +254,28 @@ static int setkey(struct crypto_tfm *tfm
&tfm->crt_flags);
}

-static int ecb_encrypt(struct crypto_tfm *tfm,
- struct scatterlist *dst,
- struct scatterlist *src, unsigned int nbytes)
-{
- return crypt(tfm, dst, src, nbytes,
- tfm->__crt_alg->cra_cipher.cia_encrypt,
- ecb_process, 1, NULL);
-}
+DEF_TFM_FUNCTION(ecb_encrypt, CRYPTO_TFM_MODE_ECB, CRYPTO_DIR_ENCRYPT, NULL);
+DEF_TFM_FUNCTION(ecb_decrypt, CRYPTO_TFM_MODE_ECB, CRYPTO_DIR_DECRYPT, NULL);

-static int ecb_decrypt(struct crypto_tfm *tfm,
- struct scatterlist *dst,
- struct scatterlist *src,
- unsigned int nbytes)
-{
- return crypt(tfm, dst, src, nbytes,
- tfm->__crt_alg->cra_cipher.cia_decrypt,
- ecb_process, 1, NULL);
-}
-
-static int cbc_encrypt(struct crypto_tfm *tfm,
- struct scatterlist *dst,
- struct scatterlist *src,
- unsigned int nbytes)
-{
- return crypt(tfm, dst, src, nbytes,
- tfm->__crt_alg->cra_cipher.cia_encrypt,
- cbc_process, 1, tfm->crt_cipher.cit_iv);
-}
-
-static int cbc_encrypt_iv(struct crypto_tfm *tfm,
- struct scatterlist *dst,
- struct scatterlist *src,
- unsigned int nbytes, u8 *iv)
-{
- return crypt(tfm, dst, src, nbytes,
- tfm->__crt_alg->cra_cipher.cia_encrypt,
- cbc_process, 1, iv);
-}
-
-static int cbc_decrypt(struct crypto_tfm *tfm,
- struct scatterlist *dst,
- struct scatterlist *src,
- unsigned int nbytes)
-{
- return crypt(tfm, dst, src, nbytes,
- tfm->__crt_alg->cra_cipher.cia_decrypt,
- cbc_process, 0, tfm->crt_cipher.cit_iv);
-}
-
-static int cbc_decrypt_iv(struct crypto_tfm *tfm,
- struct scatterlist *dst,
- struct scatterlist *src,
- unsigned int nbytes, u8 *iv)
-{
- return crypt(tfm, dst, src, nbytes,
- tfm->__crt_alg->cra_cipher.cia_decrypt,
- cbc_process, 0, iv);
-}
-
-static int nocrypt(struct crypto_tfm *tfm,
- struct scatterlist *dst,
- struct scatterlist *src,
- unsigned int nbytes)
-{
- return -ENOSYS;
-}
-
-static int nocrypt_iv(struct crypto_tfm *tfm,
- struct scatterlist *dst,
- struct scatterlist *src,
- unsigned int nbytes, u8 *iv)
-{
- return -ENOSYS;
-}
+DEF_TFM_FUNCTION(cbc_encrypt, CRYPTO_TFM_MODE_CBC, CRYPTO_DIR_ENCRYPT, tfm->crt_cipher.cit_iv);
+DEF_TFM_FUNCTION_IV(cbc_encrypt_iv, CRYPTO_TFM_MODE_CBC, CRYPTO_DIR_ENCRYPT, iv);
+DEF_TFM_FUNCTION(cbc_decrypt, CRYPTO_TFM_MODE_CBC, CRYPTO_DIR_DECRYPT, tfm->crt_cipher.cit_iv);
+DEF_TFM_FUNCTION_IV(cbc_decrypt_iv, CRYPTO_TFM_MODE_CBC, CRYPTO_DIR_DECRYPT, iv);
+
+DEF_TFM_FUNCTION(cfb_encrypt, CRYPTO_TFM_MODE_CFB, CRYPTO_DIR_ENCRYPT, tfm->crt_cipher.cit_iv);
+DEF_TFM_FUNCTION_IV(cfb_encrypt_iv, CRYPTO_TFM_MODE_CFB, CRYPTO_DIR_ENCRYPT, iv);
+DEF_TFM_FUNCTION(cfb_decrypt, CRYPTO_TFM_MODE_CFB, CRYPTO_DIR_DECRYPT, tfm->crt_cipher.cit_iv);
+DEF_TFM_FUNCTION_IV(cfb_decrypt_iv, CRYPTO_TFM_MODE_CFB, CRYPTO_DIR_DECRYPT, iv);
+
+DEF_TFM_FUNCTION(ofb_encrypt, CRYPTO_TFM_MODE_OFB, CRYPTO_DIR_ENCRYPT, tfm->crt_cipher.cit_iv);
+DEF_TFM_FUNCTION_IV(ofb_encrypt_iv, CRYPTO_TFM_MODE_OFB, CRYPTO_DIR_ENCRYPT, iv);
+DEF_TFM_FUNCTION(ofb_decrypt, CRYPTO_TFM_MODE_OFB, CRYPTO_DIR_DECRYPT, tfm->crt_cipher.cit_iv);
+DEF_TFM_FUNCTION_IV(ofb_decrypt_iv, CRYPTO_TFM_MODE_OFB, CRYPTO_DIR_DECRYPT, iv);
+
+DEF_TFM_FUNCTION(ctr_encrypt, CRYPTO_TFM_MODE_CTR, CRYPTO_DIR_ENCRYPT, tfm->crt_cipher.cit_iv);
+DEF_TFM_FUNCTION_IV(ctr_encrypt_iv, CRYPTO_TFM_MODE_CTR, CRYPTO_DIR_ENCRYPT, iv);
+DEF_TFM_FUNCTION(ctr_decrypt, CRYPTO_TFM_MODE_CTR, CRYPTO_DIR_DECRYPT, tfm->crt_cipher.cit_iv);
+DEF_TFM_FUNCTION_IV(ctr_decrypt_iv, CRYPTO_TFM_MODE_CTR, CRYPTO_DIR_DECRYPT, iv);

int crypto_init_cipher_flags(struct crypto_tfm *tfm, u32 flags)
{
@@ -245,17 +309,24 @@ int crypto_init_cipher_ops(struct crypto
break;

case CRYPTO_TFM_MODE_CFB:
- ops->cit_encrypt = nocrypt;
- ops->cit_decrypt = nocrypt;
- ops->cit_encrypt_iv = nocrypt_iv;
- ops->cit_decrypt_iv = nocrypt_iv;
+ ops->cit_encrypt = cfb_encrypt;
+ ops->cit_decrypt = cfb_decrypt;
+ ops->cit_encrypt_iv = cfb_encrypt_iv;
+ ops->cit_decrypt_iv = cfb_decrypt_iv;
+ break;
+
+ case CRYPTO_TFM_MODE_OFB:
+ ops->cit_encrypt = ofb_encrypt;
+ ops->cit_decrypt = ofb_decrypt;
+ ops->cit_encrypt_iv = ofb_encrypt_iv;
+ ops->cit_decrypt_iv = ofb_decrypt_iv;
break;

case CRYPTO_TFM_MODE_CTR:
- ops->cit_encrypt = nocrypt;
- ops->cit_decrypt = nocrypt;
- ops->cit_encrypt_iv = nocrypt_iv;
- ops->cit_decrypt_iv = nocrypt_iv;
+ ops->cit_encrypt = ctr_encrypt;
+ ops->cit_decrypt = ctr_decrypt;
+ ops->cit_encrypt_iv = ctr_encrypt_iv;
+ ops->cit_decrypt_iv = ctr_decrypt_iv;
break;

default:

2005-01-14 13:16:08

by Michal Ludvig

Subject: [PATCH 2/2] CryptoAPI: Update PadLock to process multiple blocks at once

Hi all,

Update to padlock-aes.c that enables processing of a whole buffer of
data at once with the given chaining mode (e.g. CBC). It is much faster
than having the chaining done in software by CryptoAPI.

This is an updated revision of the patch. Now it compiles even with GCC
2.95.3.

Signed-off-by: Michal Ludvig <[email protected]>

---

padlock-aes.c | 176 ++++++++++++++++++++++++++++++++++++++++++++++++++++++----
1 files changed, 166 insertions(+), 10 deletions(-)

Index: linux-2.6.10/drivers/crypto/padlock-aes.c
===================================================================
--- linux-2.6.10.orig/drivers/crypto/padlock-aes.c 2005-01-11 14:01:05.000000000 +0100
+++ linux-2.6.10/drivers/crypto/padlock-aes.c 2005-01-11 23:40:26.000000000 +0100
@@ -369,19 +369,54 @@ aes_set_key(void *ctx_arg, const uint8_t

/* ====== Encryption/decryption routines ====== */

-/* This is the real call to PadLock. */
-static inline void
+/* These are the real calls to PadLock. */
+static inline void *
padlock_xcrypt_ecb(uint8_t *input, uint8_t *output, uint8_t *key,
- void *control_word, uint32_t count)
+ uint8_t *iv, void *control_word, uint32_t count)
{
asm volatile ("pushfl; popfl"); /* enforce key reload. */
asm volatile (".byte 0xf3,0x0f,0xa7,0xc8" /* rep xcryptecb */
: "+S"(input), "+D"(output)
: "d"(control_word), "b"(key), "c"(count));
+ return NULL;
+}
+
+static inline void *
+padlock_xcrypt_cbc(uint8_t *input, uint8_t *output, uint8_t *key,
+ uint8_t *iv, void *control_word, uint32_t count)
+{
+ asm volatile ("pushfl; popfl"); /* enforce key reload. */
+ asm volatile (".byte 0xf3,0x0f,0xa7,0xd0" /* rep xcryptcbc */
+ : "+S"(input), "+D"(output), "+a"(iv)
+ : "d"(control_word), "b"(key), "c"(count));
+ return iv;
+}
+
+static inline void *
+padlock_xcrypt_cfb(uint8_t *input, uint8_t *output, uint8_t *key,
+ uint8_t *iv, void *control_word, uint32_t count)
+{
+ asm volatile ("pushfl; popfl"); /* enforce key reload. */
+ asm volatile (".byte 0xf3,0x0f,0xa7,0xe0" /* rep xcryptcfb */
+ : "+S"(input), "+D"(output), "+a"(iv)
+ : "d"(control_word), "b"(key), "c"(count));
+ return iv;
+}
+
+static inline void *
+padlock_xcrypt_ofb(uint8_t *input, uint8_t *output, uint8_t *key,
+ uint8_t *iv, void *control_word, uint32_t count)
+{
+ asm volatile ("pushfl; popfl"); /* enforce key reload. */
+ asm volatile (".byte 0xf3,0x0f,0xa7,0xe8" /* rep xcryptofb */
+ : "+S"(input), "+D"(output), "+a"(iv)
+ : "d"(control_word), "b"(key), "c"(count));
+ return iv;
}

static void
-aes_padlock(void *ctx_arg, uint8_t *out_arg, const uint8_t *in_arg, int encdec)
+aes_padlock(void *ctx_arg, uint8_t *out_arg, const uint8_t *in_arg,
+ uint8_t *iv_arg, size_t nbytes, int encdec, int mode)
{
/* Don't blindly modify this structure - the items must
fit on 16-Bytes boundaries! */
@@ -419,21 +454,126 @@ aes_padlock(void *ctx_arg, uint8_t *out_
else
key = ctx->D;

- memcpy(data->buf, in_arg, AES_BLOCK_SIZE);
- padlock_xcrypt_ecb(data->buf, data->buf, key, &data->cword, 1);
- memcpy(out_arg, data->buf, AES_BLOCK_SIZE);
+ if (nbytes == AES_BLOCK_SIZE) {
+ /* Processing one block only => ECB is enough */
+ memcpy(data->buf, in_arg, AES_BLOCK_SIZE);
+ padlock_xcrypt_ecb(data->buf, data->buf, key, NULL,
+ &data->cword, 1);
+ memcpy(out_arg, data->buf, AES_BLOCK_SIZE);
+ } else {
+ /* Processing multiple blocks at once */
+ uint8_t *in, *out, *iv;
+ int gfp = in_atomic() ? GFP_ATOMIC : GFP_KERNEL;
+ void *index = NULL;
+
+ if (unlikely(((long)in_arg) & 0x0F)) {
+ in = crypto_aligned_kmalloc(nbytes, gfp, 16, &index);
+ memcpy(in, in_arg, nbytes);
+ }
+ else
+ in = (uint8_t*)in_arg;
+
+ if (unlikely(((long)out_arg) & 0x0F)) {
+ if (index)
+ out = in; /* xcrypt can work "in place" */
+ else
+ out = crypto_aligned_kmalloc(nbytes, gfp, 16,
+ &index);
+ }
+ else
+ out = out_arg;
+
+ /* Always make a local copy of IV - xcrypt may change it! */
+ iv = data->buf;
+ if (iv_arg)
+ memcpy(iv, iv_arg, AES_BLOCK_SIZE);
+
+ switch (mode) {
+ case CRYPTO_TFM_MODE_ECB:
+ iv = padlock_xcrypt_ecb(in, out, key, iv,
+ &data->cword,
+ nbytes/AES_BLOCK_SIZE);
+ break;
+
+ case CRYPTO_TFM_MODE_CBC:
+ iv = padlock_xcrypt_cbc(in, out, key, iv,
+ &data->cword,
+ nbytes/AES_BLOCK_SIZE);
+ break;
+
+ case CRYPTO_TFM_MODE_CFB:
+ iv = padlock_xcrypt_cfb(in, out, key, iv,
+ &data->cword,
+ nbytes/AES_BLOCK_SIZE);
+ break;
+
+ case CRYPTO_TFM_MODE_OFB:
+ iv = padlock_xcrypt_ofb(in, out, key, iv,
+ &data->cword,
+ nbytes/AES_BLOCK_SIZE);
+ break;
+
+ default:
+ BUG();
+ }
+
+ /* Back up IV */
+ if (iv && iv_arg)
+ memcpy(iv_arg, iv, AES_BLOCK_SIZE);
+
+ /* Copy the 16-Byte aligned output to the caller's buffer. */
+ if (out != out_arg)
+ memcpy(out_arg, out, nbytes);
+
+ if (index)
+ kfree(index);
+ }
+}
+
+static void
+aes_padlock_ecb(void *ctx, uint8_t *dst, const uint8_t *src,
+ uint8_t *iv, size_t nbytes, int encdec, int inplace)
+{
+ aes_padlock(ctx, dst, src, NULL, nbytes, encdec,
+ CRYPTO_TFM_MODE_ECB);
+}
+
+static void
+aes_padlock_cbc(void *ctx, uint8_t *dst, const uint8_t *src, uint8_t *iv,
+ size_t nbytes, int encdec, int inplace)
+{
+ aes_padlock(ctx, dst, src, iv, nbytes, encdec,
+ CRYPTO_TFM_MODE_CBC);
+}
+
+static void
+aes_padlock_cfb(void *ctx, uint8_t *dst, const uint8_t *src, uint8_t *iv,
+ size_t nbytes, int encdec, int inplace)
+{
+ aes_padlock(ctx, dst, src, iv, nbytes, encdec,
+ CRYPTO_TFM_MODE_CFB);
+}
+
+static void
+aes_padlock_ofb(void *ctx, uint8_t *dst, const uint8_t *src, uint8_t *iv,
+ size_t nbytes, int encdec, int inplace)
+{
+ aes_padlock(ctx, dst, src, iv, nbytes, encdec,
+ CRYPTO_TFM_MODE_OFB);
}

static void
aes_encrypt(void *ctx_arg, uint8_t *out, const uint8_t *in)
{
- aes_padlock(ctx_arg, out, in, CRYPTO_DIR_ENCRYPT);
+ aes_padlock(ctx_arg, out, in, NULL, AES_BLOCK_SIZE,
+ CRYPTO_DIR_ENCRYPT, CRYPTO_TFM_MODE_ECB);
}

static void
aes_decrypt(void *ctx_arg, uint8_t *out, const uint8_t *in)
{
- aes_padlock(ctx_arg, out, in, CRYPTO_DIR_DECRYPT);
+ aes_padlock(ctx_arg, out, in, NULL, AES_BLOCK_SIZE,
+ CRYPTO_DIR_DECRYPT, CRYPTO_TFM_MODE_ECB);
}

static struct crypto_alg aes_alg = {
@@ -454,9 +594,25 @@ static struct crypto_alg aes_alg = {
}
};

+static int disable_multiblock = 0;
+MODULE_PARM(disable_multiblock, "i");
+MODULE_PARM_DESC(disable_multiblock,
+ "Disable encryption of whole multiblock buffers.");
+
int __init padlock_init_aes(void)
{
- printk(KERN_NOTICE PFX "Using VIA PadLock ACE for AES algorithm.\n");
+ if (!disable_multiblock) {
+ aes_alg.cra_u.cipher.cia_max_nbytes = (size_t)-1;
+ aes_alg.cra_u.cipher.cia_req_align = 16;
+ aes_alg.cra_u.cipher.cia_ecb = aes_padlock_ecb;
+ aes_alg.cra_u.cipher.cia_cbc = aes_padlock_cbc;
+ aes_alg.cra_u.cipher.cia_cfb = aes_padlock_cfb;
+ aes_alg.cra_u.cipher.cia_ofb = aes_padlock_ofb;
+ }
+
+ printk(KERN_NOTICE PFX
+ "Using VIA PadLock ACE for AES algorithm%s.\n",
+ disable_multiblock ? "" : " (multiblock)");

gen_tabs();
return crypto_register_alg(&aes_alg);

2005-01-14 14:21:16

by Clemens Fruhwirth

Subject: Re: [PATCH 1/2] CryptoAPI: prepare for processing multiple buffers at a time

On Fri, 2005-01-14 at 14:10 +0100, Michal Ludvig wrote:

> This patch extends crypto/cipher.c for offloading the whole chaining modes
> to e.g. hardware crypto accelerators. It is much faster to let the
> hardware do all the chaining if it can do so.

Is there any connection to Evgeniy Polyakov's acrypto work? It appears
that there are two projects with one objective. It would be nice to see
both parties pulling in the same direction.

> + void (*cia_ecb)(void *ctx, u8 *dst, const u8 *src, u8 *iv,
> + size_t nbytes, int encdec, int inplace);
> + void (*cia_cbc)(void *ctx, u8 *dst, const u8 *src, u8 *iv,
> + size_t nbytes, int encdec, int inplace);
> + void (*cia_cfb)(void *ctx, u8 *dst, const u8 *src, u8 *iv,
> + size_t nbytes, int encdec, int inplace);
> + void (*cia_ofb)(void *ctx, u8 *dst, const u8 *src, u8 *iv,
> + size_t nbytes, int encdec, int inplace);
> + void (*cia_ctr)(void *ctx, u8 *dst, const u8 *src, u8 *iv,
> + size_t nbytes, int encdec, int inplace);

What's the use of adding mode specific functions to the tfm struct? And
why do they all have the same function type? For instance, the "iv" or
"inplace" argument is meaningless for ECB.

Have a look at
http://clemens.endorphin.org/patches/lrw/2-tweakable-cipher-interface.diff

This patch takes the following approach to handle the
cipher mode/interface issue:

Every mode is associated with one or more interfaces. This interface is
either cit_encrypt, cit_encrypt_iv, or cit_encrypt_tweaks. How these
interfaces are associated with cipher modes is handled in
crypto_init_cipher_flags.

Except for CBC, every mode associates with just one interface. In CBC,
the CryptoAPI caller can use the IV interface to supply an IV, or use
the current tfm's IV by using cit_encrypt instead of cit_encrypt_iv.

I don't see a gain in throwing dozens of pointers into the tfm, as a tfm
is always assigned a single mode.

> /*
> * Generic encrypt/decrypt wrapper for ciphers, handles operations across
> @@ -47,22 +101,101 @@ static inline void xor_128(u8 *a, const
> static int crypt(struct crypto_tfm *tfm,
> struct scatterlist *dst,
> struct scatterlist *src,
> - unsigned int nbytes, cryptfn_t crfn,
> - procfn_t prfn, int enc, void *info)

Your patch heavily interferes with my cleanup patch for crypt(..). To
put it briefly, I consider crypt(..) a mess. The function definition of
crypt(..) and the procfn_t function is just a patchwork of stuff, added
when needed.

I've rewritten a generic scatterwalker: a generic replacement for
crypt(..) that can apply any processing function with an arbitrary
argument list to the data associated with a set of scatterlists. I think
this function shouldn't be in crypto/ but in some more generic location,
as I think it could be useful for many more things.

http://clemens.endorphin.org/patches/lrw/1-generic-scatterwalker.diff
is the generic scatterwalk patch.

int scatterwalk_walker_generic(void (function)(void *priv, int length,
void **buflist), void *priv, int steps, int nsl, ...)

"function" is applied to the scatterlist data.
"priv" is a private data structure for bookkeeping. It's supplied to the
function as the first parameter.
"steps" is the number of times function is called.
"nsl" is the number of scatterlists following.

After "nsl", the scatterlists follow in a tuple of data:
<struct scatterlist *, int steplength, int ioflag>

ECB, for example:
...
struct ecb_process_priv priv = {
        .tfm  = tfm,
        .crfn = tfm->__crt_alg->cra_cipher.cia_decrypt,
};
int bsize = crypto_tfm_alg_blocksize(tfm);
scatterwalk_walker_generic(ecb_process_gw,  // processing function
                           &priv,           // private data
                           nbytes/bsize,    // number of steps
                           2,               // number of scatterlists
                           dst, bsize, 1,   // first, ioflag set to output
                           src, bsize, 0);  // second, ioflag set to input

..
static void ecb_process_gw(void *_priv, int nsg, void **buf)
{
        struct ecb_process_priv *priv = (struct ecb_process_priv *)_priv;
        char *dst = buf[0];  // pointer to correctly kmapped and copied dst
        char *src = buf[1];  // pointer to correctly kmapped and copied src
        priv->crfn(crypto_tfm_ctx(priv->tfm), dst, src);
}

Well, I recognize that I'm somewhat off-topic now. But it demonstrates
clearly why we should get rid of crypt(..) and replace it with
something more generic.

--
Fruhwirth Clemens <[email protected]> http://clemens.endorphin.org



2005-01-14 16:41:31

by Michal Ludvig

Subject: Re: [PATCH 1/2] CryptoAPI: prepare for processing multiple buffers at a time

On Fri, 14 Jan 2005, Fruhwirth Clemens wrote:

> On Fri, 2005-01-14 at 14:10 +0100, Michal Ludvig wrote:
>
> > This patch extends crypto/cipher.c for offloading the whole chaining modes
> > to e.g. hardware crypto accelerators. It is much faster to let the
> > hardware do all the chaining if it can do so.
>
> Is there any connection to Evgeniy Polyakov's acrypto work? It appears,
> that there are two project for one objective. Would be nice to see both
> parties pulling on one string.

These projects do not compete at all. Evgeniy's work is a complete
replacement for the current CryptoAPI and, first of all, brings
asynchronous operations. My patches are simple and straightforward
extensions to the current CryptoAPI that enable offloading the chaining to
hardware where possible.

> > + void (*cia_ecb)(void *ctx, u8 *dst, const u8 *src, u8 *iv,
> > + size_t nbytes, int encdec, int inplace);
> > + void (*cia_cbc)(void *ctx, u8 *dst, const u8 *src, u8 *iv,
> > + size_t nbytes, int encdec, int inplace);
> > + void (*cia_cfb)(void *ctx, u8 *dst, const u8 *src, u8 *iv,
> > + size_t nbytes, int encdec, int inplace);
> > + void (*cia_ofb)(void *ctx, u8 *dst, const u8 *src, u8 *iv,
> > + size_t nbytes, int encdec, int inplace);
> > + void (*cia_ctr)(void *ctx, u8 *dst, const u8 *src, u8 *iv,
> > + size_t nbytes, int encdec, int inplace);
>
> What's the use of adding mode specific functions to the tfm struct? And
> why do they all have the same function type? For instance, the "iv" or
> "inplace" argument is meaningless for ECB.

The prototypes must be the same in my implementation, because in crypt()
only a pointer to the appropriate mode function is taken and it is then
called as "(*func)(arg, arg, ...)".

BTW these functions are not added to "struct crypto_tfm", but to "struct
crypto_alg", which describes what a particular module supports (along
with the block size, algorithm name, etc.). In this case it can say that
e.g. padlock.ko supports encryption in CBC mode in addition to the common
single-block processing.
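
For reference, this is the shared prototype and the single dispatch point
in crypt() (both from patch 1/2) that force all cia_<mode> hooks to have
the same shape:

typedef void (cryptblkfn_t)(void *, u8 *, const u8 *, u8 *,
                            size_t, int, int);
...
        if (cryptomultiblockfn)
                (*cryptomultiblockfn)(crypto_tfm_ctx(tfm),
                                      dst_p, src_p, iv,
                                      bsize, enc, in_place);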

BTW I'll look at the pointers to the tweakable API over the weekend...

Michal Ludvig
--
* A mouse is a device used to point at the xterm you want to type in.
* Personal homepage - http://www.logix.cz/michal

2005-01-15 12:46:01

by Clemens Fruhwirth

Subject: Re: [PATCH 1/2] CryptoAPI: prepare for processing multiple buffers at a time

On Fri, 2005-01-14 at 17:40 +0100, Michal Ludvig wrote:
> > Is there any connection to Evgeniy Polyakov's acrypto work? It appears,
> > that there are two project for one objective. Would be nice to see both
> > parties pulling on one string.
>
> These projects do not compete at all. Evgeniy's work is a complete
> replacement for current cryptoapi and brings the asynchronous
> operations at the first place. My patches are simple and straightforward
> extensions to current cryptoapi that enable offloading the chaining to
> hardware where possible.

Fine, I just saw in Evgeniy's reply that he took your padlock
implementation. I thought both of you had been working on different
implementations.

But actually both aim for the same goal: hardware crypto offloading.
With PadLock the need for an async interface isn't that big, because it
is not really "off-loading"; the work is done on the same chip and in the
same thread.

However, developing two different APIs isn't particularly efficient. I
know that at the moment there isn't much choice, as J. Morris hasn't
committed to acrypto in any way. But I think it would be good to replace
the synchronous CryptoAPI implementation altogether, put the missing
internals of CryptoAPI into acrypto, and back the interfaces of
CryptoAPI with small stubs that do something like:

somereturnvalue synchronized_interface(..) {
        acrypto_kick_some_operation(acrypto_struct);
        wait_for_completion(acrypto_struct);
        return fetch_result(acrypto_struct);
}
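
Spelled out a little more (again only a sketch: the acrypto_submit() name
and the callback signature are made up here, while struct completion and
wait_for_completion() are the real kernel primitives):

struct sync_stub {
        struct completion done;
        int err;
};

/* hypothetical acrypto completion callback */
static void sync_stub_done(void *data, int err)
{
        struct sync_stub *stub = data;

        stub->err = err;
        complete(&stub->done);
}

static int synchronized_interface(/* ... */)
{
        struct sync_stub stub;

        init_completion(&stub.done);
        /* acrypto_submit() is a made-up name: queue the operation and have
           acrypto call sync_stub_done() once the hardware has finished */
        acrypto_submit(/* ..., */ sync_stub_done, &stub);
        wait_for_completion(&stub.done);
        return stub.err;
}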

The other way round, an asynchronous interface built on top of a
synchronous interface doesn't seem natural to me.
(That doesn't mean I oppose your patches, merely that we should start to
think in different directions.)

> > > + void (*cia_ecb)(void *ctx, u8 *dst, const u8 *src, u8 *iv,
> > > + size_t nbytes, int encdec, int inplace);
> > > + void (*cia_cbc)(void *ctx, u8 *dst, const u8 *src, u8 *iv,
> > > + size_t nbytes, int encdec, int inplace);
> > > + void (*cia_cfb)(void *ctx, u8 *dst, const u8 *src, u8 *iv,
> > > + size_t nbytes, int encdec, int inplace);
> > > + void (*cia_ofb)(void *ctx, u8 *dst, const u8 *src, u8 *iv,
> > > + size_t nbytes, int encdec, int inplace);
> > > + void (*cia_ctr)(void *ctx, u8 *dst, const u8 *src, u8 *iv,
> > > + size_t nbytes, int encdec, int inplace);
> >
> > What's the use of adding mode specific functions to the tfm struct? And
> > why do they all have the same function type? For instance, the "iv" or
> > "inplace" argument is meaningless for ECB.
>
> The prototypes must be the same in my implementation, because in crypt()
> only a pointer to the appropriate mode function is taken and further used
> as "(func*)(arg, arg, ...)".
>
> BTW these functions are not added to "struct crypto_tfm", but to "struct
> crypto_alg" which describes what a particular module supports (i.e. along
> with the block size, algorithm name, etc). In this case it can say that
> e.g. padlock.ko supports encryption in CBC mode in addition to a common
> single-block processing.

Err, right. I overlooked that it's cia and not cit. However, I don't
like the idea of extending structs when there is a new cipher mode. I
think the API should not have to be extended for every addition, but
should be designed for such extension right from the start.

What about a "selector" function, which returns the appropriate
encryption function for a mode?

typedef void (procfn_t)(struct crypto_tfm *, u8 *,
u8*, cryptfn_t, int enc, void *, int);

put
procfn_t (*cia_modesel)(u32 function, int iface, int encdec);
into struct crypto_alg;

then in crypto_init_cipher_ops, instead of

switch (tfm->crt_cipher.cit_mode) {
..
case CRYPTO_TFM_MODE_CFB:
        ops->cit_encrypt = cfb_encrypt;
        ops->cit_decrypt = cfb_decrypt;
..
}
we do,
struct cipher_alg *cia = &tfm->__crt_alg->cra_cipher;

switch (tfm->crt_cipher.cit_mode) {
..
case CRYPTO_TFM_MODE_CFB:
        ops->cit_encrypt    = cia->cia_modesel(cit_mode, 0, IFACE_ECB);
        ops->cit_decrypt    = cia->cia_modesel(cit_mode, 1, IFACE_ECB);
        ops->cit_encrypt_iv = cia->cia_modesel(cit_mode, 0, IFACE_IV);
        ops->cit_decrypt_iv = cia->cia_modesel(cit_mode, 1, IFACE_IV);
..

Alternatively, we could also add a lookup table. But I like this better,
since it is much easier for people to read, and tfms aren't allocated
that often.

Probably, we can add a wrapper for cia_modesel, that when cia_modesel is
NULL, it falls back to the old behaviour. This way, we don't have to
patch all algorithm implementations to include cia_modesel.
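
Roughly like this (a sketch: crypto_modesel() and default_modesel() are
made-up names for the wrapper and for today's hard-wired selection, and
cia_modesel is assumed to return a procfn_t pointer):

static procfn_t *crypto_modesel(struct crypto_tfm *tfm,
                                u32 mode, int iface, int encdec)
{
        struct cipher_alg *cia = &tfm->__crt_alg->cra_cipher;

        if (cia->cia_modesel)
                return cia->cia_modesel(mode, iface, encdec);

        /* old behaviour: fall back to the existing per-mode functions */
        return default_modesel(mode, iface, encdec);
}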

How do you like that idea?

--
Fruhwirth Clemens <[email protected]> http://clemens.endorphin.org



2005-01-18 16:50:15

by James Morris

Subject: Re: [PATCH 1/2] CryptoAPI: prepare for processing multiple buffers at a time

On Sat, 15 Jan 2005, Fruhwirth Clemens wrote:

> However, developing two different APIs isn't particular efficient. I
> know, at the moment there isn't much choice, as J.Morris hasn't commited
> to acrypto in anyway.

There is also the OCF port (OpenBSD crypto framework) to consider, if
permission to dual license from the original authors can be obtained.


- James
--
James Morris
<[email protected]>


2005-01-20 03:38:37

by David McCullough

[permalink] [raw]
Subject: Re: [PATCH 1/2] CryptoAPI: prepare for processing multiple buffers at a time


Jivin James Morris lays it down ...
> On Sat, 15 Jan 2005, Fruhwirth Clemens wrote:
>
> > However, developing two different APIs isn't particular efficient. I
> > know, at the moment there isn't much choice, as J.Morris hasn't commited
> > to acrypto in anyway.
>
> There is also the OCF port (OpenBSD crypto framework) to consider, if
> permission to dual license from the original authors can be obtained.

For anyone looking for the OCF port for linux, you can find the latest
release here:

http://lists.logix.cz/pipermail/cryptoapi/2004/000261.html

One of the drivers uses the existing kernel crypto API to implement
a SW crypto engine for OCF.

As for permission to use a dual license, I will gladly approach the
authors if others feel it is important to know whether it is possible at
this point.

Cheers,
Davidm

--
David McCullough, [email protected] Ph:+61 7 34352815 http://www.SnapGear.com
Custom Embedded Solutions + Security Fx:+61 7 38913630 http://www.uCdot.org

2005-01-20 13:47:26

by James Morris

Subject: Re: [PATCH 1/2] CryptoAPI: prepare for processing multiple buffers at a time

On Thu, 20 Jan 2005, David McCullough wrote:

> As for permission to use a dual license, I will gladly approach the
> authors if others feel it is important to know the possibility of it at this
> point,

Please do so. It would be useful to have the option of using an already
developed, debugged and analyzed framework.


- James
--
James Morris
<[email protected]>


2005-03-03 12:01:01

by David McCullough

Subject: Re: [PATCH 1/2] CryptoAPI: prepare for processing multiple buffers at a time


Jivin James Morris lays it down ...
> On Thu, 20 Jan 2005, David McCullough wrote:
>
> > As for permission to use a dual license, I will gladly approach the
> > authors if others feel it is important to know the possibility of it at this
> > point,
>
> Please do so. It would be useful to have the option of using an already
> developed, debugged and analyzed framework.

Ok, I finally managed to get responses from all the individual
contributors, though none of the corporations contacted have responded.

While a good number of those contacted were happy to dual-license, most
are concerned that changes made under the GPL will not be available for
use in BSD. A couple were a definite no.

I have had offers to rewrite any portions that cannot be dual-licensed,
but I think that is overkill for now unless there is significant
interest in taking that path.

Fortunately we have been able to obtain some funding to complete a large
amount of work on the project, so it should see some nice progress in the
next couple of weeks as that ramps up :-)

Cheers,
Davidm

--
David McCullough, [email protected] Ph:+61 7 34352815 http://www.SnapGear.com
Custom Embedded Solutions + Security Fx:+61 7 38913630 http://www.uCdot.org