2007-11-24 20:07:24

by Tan Swee Heng

[permalink] [raw]
Subject: Possible bug in CTR (and Salsa20) for large test vectors

I seem to have encountered a bug when encrypting large test vectors.

TEST SETTINGS
I modified tcrypt.c to "ctr(aes,4,8,4)"-encrypt a single large test
vector defined as:
plaintext = 00...00 (4096 bytes)
key = 00...00 (32 bytes)
nonce = 00...00 (4 bytes)
iv = 00...00 (8 bytes)
counter = 4 bytes starting from 00...01.

OUTPUT
The output I obtained (using a modified hexdump() in tcrypt.c) is this:
3992 : e2 0e 5f 84 e9 05 ea 8f
4000 : 75 f7 2a ac 4f e9 70 12
4008 : 80 88 db 7a 50 0b 22 cd
4016 : 2d 23 c8 d1 90 8b 43 14
4024 : ae 69 5c 8a 49 ba e9 74
4032 : 71 2a 62 a2 14 14 df 86
4040 : a6 4c 51 5d 9a cc c3 9e
4048 : cd e5 24 14 68 8e 5b f6
4056 : 4d 4f 06 c6 f1 7c 29 03
4064 : ab c7 50 b2 8e da 2e 34
4072 : c7 e9 d2 50 99 86 32 d4
4080 : 44 35 62 42 ef 04 05 8d
4088 : 4c af 3c 8e be b9 f2 48

Compare with what I obtained using Crypto++:
3992 : e2 0e 5f 84 e9 05 ea 8f
4000 : 75 f7 2a ac 4f e9 70 12
4008 : 67 3e 05 ba c1 56 20 99
4016 : 80 88 db 7a 50 0b 22 cd
4024 : 2d 23 c8 d1 90 8b 43 14
4032 : ae 69 5c 8a 49 ba e9 74
4040 : 71 2a 62 a2 14 14 df 86
4048 : a6 4c 51 5d 9a cc c3 9e
4056 : cd e5 24 14 68 8e 5b f6
4064 : 4d 4f 06 c6 f1 7c 29 03
4072 : ab c7 50 b2 8e da 2e 34
4080 : c7 e9 d2 50 99 86 32 d4
4088 : 44 35 62 42 ef 04 05 8d

As we can see, half a block (8 bytes) went "missing" at the start of byte 4008.

DIAGNOSIS
I added a prink() after "while (walk.nbytes)" in crypto_ctr_crypt().
On my system, it shows that the 4096 bytes buffer is broken into two
parts such that the first walk.nbytes = 4008 and the second
walk.nbytes = 88. Since 4008 % 16 = 8, therefore bytes 4000 to 4007
are XORed with the first 8 bytes of AES(counter N). The remaining 8
bytes of AES(counter N) are still available and are meant for bytes
4008 to 4015. However they are discarded when N is incremented. Thus
when the 2nd part is encrypted, byte 4008 to 4015 are XORed with the
first 8 bytes of AES(counter N+1) instead. The rest of the stream
slips as a result of this.

RESOLUTION
One idea is to keep state on how many bytes in the AES(counter N)
buffer has been consumed so that N is not incremented unnecessarily.
But there may be inefficiencies associated with these additional
checks. If anyone has better ideas, please feel free to post it. (Note
that salsa20_generic.c has the same problem. So any great ideas for
CTR would be useful to me too. :-)

Swee Heng


2007-11-25 15:58:16

by Tan Swee Heng

[permalink] [raw]
Subject: Re: Possible bug in CTR (and Salsa20) for large test vectors

Hi,

It seems the CTR bug is due to crypto_ctr_crypt_inplace() and
crypto_ctr_crypt_segment() always returning 0. For the 4096 bytes test
case I previously described, they should return 8 when walk.nbytes =
4008 and refrain from processing the last 8 bytes. (This is based on
my current understanding of blkcipher. I may be wrong.)

To resolve it, I added a "remain" parameter to both
crypto_ctr_crypt_inplace() and crypto_ctr_crypt_segment(). This
"remain" parameter represents a remainder modulo bsize. When "remain"
is zero, both functions encrypt the full walk.nbytes data. When
"remain" is positive, both functions encrypt up to the block boundary
(i.e. walk.nbytes - walk.nbytes % bsize) and leave the "remain"-ing
bytes for the next loop.

The nbytes parameter in crypto_ctr_crypt() is used to keep track of
how many bytes have been processed so far. If walk.nbytes >= nbytes,
we know it is the last few blocks and hence we can set "remain" to 0.
Else we set "remain" to walk.nbytes % bsize.

Attached is the patch for ctr.c. (Note that it is against commit
19f4711b7a8f4bf5457a52873f82985a3762cd48 as I am having problems with
the latest givcipher patches on UML. Sorry about that.)

Signed-off-by: Tan Swee Heng <[email protected]>
---
By the way, such an approach is useful for Salsa20 too as eSTREAM
provided encrypt_blocks() and encrypt_bytes() APIsl, which is
essentially what I am trying to simulate with the "remain" parameter.



On Nov 25, 2007 4:07 AM, Tan Swee Heng <[email protected]> wrote:
> I seem to have encountered a bug when encrypting large test vectors.
>
> TEST SETTINGS
> I modified tcrypt.c to "ctr(aes,4,8,4)"-encrypt a single large test
> vector defined as:
> plaintext = 00...00 (4096 bytes)
> key = 00...00 (32 bytes)
> nonce = 00...00 (4 bytes)
> iv = 00...00 (8 bytes)
> counter = 4 bytes starting from 00...01.
>
> OUTPUT
> The output I obtained (using a modified hexdump() in tcrypt.c) is this:
> 3992 : e2 0e 5f 84 e9 05 ea 8f
> 4000 : 75 f7 2a ac 4f e9 70 12
> 4008 : 80 88 db 7a 50 0b 22 cd
> 4016 : 2d 23 c8 d1 90 8b 43 14
> 4024 : ae 69 5c 8a 49 ba e9 74
> 4032 : 71 2a 62 a2 14 14 df 86
> 4040 : a6 4c 51 5d 9a cc c3 9e
> 4048 : cd e5 24 14 68 8e 5b f6
> 4056 : 4d 4f 06 c6 f1 7c 29 03
> 4064 : ab c7 50 b2 8e da 2e 34
> 4072 : c7 e9 d2 50 99 86 32 d4
> 4080 : 44 35 62 42 ef 04 05 8d
> 4088 : 4c af 3c 8e be b9 f2 48
>
> Compare with what I obtained using Crypto++:
> 3992 : e2 0e 5f 84 e9 05 ea 8f
> 4000 : 75 f7 2a ac 4f e9 70 12
> 4008 : 67 3e 05 ba c1 56 20 99
> 4016 : 80 88 db 7a 50 0b 22 cd
> 4024 : 2d 23 c8 d1 90 8b 43 14
> 4032 : ae 69 5c 8a 49 ba e9 74
> 4040 : 71 2a 62 a2 14 14 df 86
> 4048 : a6 4c 51 5d 9a cc c3 9e
> 4056 : cd e5 24 14 68 8e 5b f6
> 4064 : 4d 4f 06 c6 f1 7c 29 03
> 4072 : ab c7 50 b2 8e da 2e 34
> 4080 : c7 e9 d2 50 99 86 32 d4
> 4088 : 44 35 62 42 ef 04 05 8d
>
> As we can see, half a block (8 bytes) went "missing" at the start of byte 4008.
>
> DIAGNOSIS
> I added a prink() after "while (walk.nbytes)" in crypto_ctr_crypt().
> On my system, it shows that the 4096 bytes buffer is broken into two
> parts such that the first walk.nbytes = 4008 and the second
> walk.nbytes = 88. Since 4008 % 16 = 8, therefore bytes 4000 to 4007
> are XORed with the first 8 bytes of AES(counter N). The remaining 8
> bytes of AES(counter N) are still available and are meant for bytes
> 4008 to 4015. However they are discarded when N is incremented. Thus
> when the 2nd part is encrypted, byte 4008 to 4015 are XORed with the
> first 8 bytes of AES(counter N+1) instead. The rest of the stream
> slips as a result of this.
>
> RESOLUTION
> One idea is to keep state on how many bytes in the AES(counter N)
> buffer has been consumed so that N is not incremented unnecessarily.
> But there may be inefficiencies associated with these additional
> checks. If anyone has better ideas, please feel free to post it. (Note
> that salsa20_generic.c has the same problem. So any great ideas for
> CTR would be useful to me too. :-)
>
> Swee Heng
>


Attachments:
(No filename) (3.99 kB)
patch1-ctr.txt (3.76 kB)
Download all attachments

2007-11-26 13:21:59

by Herbert Xu

[permalink] [raw]
Subject: Re: Possible bug in CTR (and Salsa20) for large test vectors

On Sun, Nov 25, 2007 at 11:58:15PM +0800, Tan Swee Heng wrote:
>
> It seems the CTR bug is due to crypto_ctr_crypt_inplace() and
> crypto_ctr_crypt_segment() always returning 0. For the 4096 bytes test
> case I previously described, they should return 8 when walk.nbytes =
> 4008 and refrain from processing the last 8 bytes. (This is based on
> my current understanding of blkcipher. I may be wrong.)

Thanks for the report and patch!

Here is my attempt at fixing it and optimising it a bit more.
The idea is that we should only process partial blocks when
we're at the end. Otherwise we should do whole blocks.

Please let me know if this works or not.

Oh any chance you could post a tcrypt patch for the large test?

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
diff --git a/crypto/ctr.c b/crypto/ctr.c
index b816e95..96a81ab 100644
--- a/crypto/ctr.c
+++ b/crypto/ctr.c
@@ -59,6 +59,21 @@ static int crypto_ctr_setkey(struct crypto_tfm *parent, const u8 *key,
return err;
}

+static void crypto_ctr_crypt_final(struct blkcipher_walk *walk,
+ struct crypto_cipher *tfm, u8 *ctrblk,
+ unsigned int countersize)
+{
+ unsigned int bsize = crypto_cipher_blocksize(tfm);
+ u8 *keystream = ctrblk + bsize;
+ u8 *src = walk->src.virt.addr;
+ u8 *dst = walk->dst.virt.addr;
+ unsigned int nbytes = walk->nbytes;
+
+ crypto_cipher_encrypt_one(tfm, keystream, ctrblk);
+ crypto_xor(keystream, src, nbytes);
+ memcpy(dst, keystream, nbytes);
+}
+
static int crypto_ctr_crypt_segment(struct blkcipher_walk *walk,
struct crypto_cipher *tfm, u8 *ctrblk,
unsigned int countersize)
@@ -66,33 +81,21 @@ static int crypto_ctr_crypt_segment(struct blkcipher_walk *walk,
void (*fn)(struct crypto_tfm *, u8 *, const u8 *) =
crypto_cipher_alg(tfm)->cia_encrypt;
unsigned int bsize = crypto_cipher_blocksize(tfm);
- unsigned long alignmask = crypto_cipher_alignmask(tfm) |
- (__alignof__(u32) - 1);
- u8 ks[bsize + alignmask];
- u8 *keystream = (u8 *)ALIGN((unsigned long)ks, alignmask + 1);
u8 *src = walk->src.virt.addr;
u8 *dst = walk->dst.virt.addr;
unsigned int nbytes = walk->nbytes;

do {
/* create keystream */
- fn(crypto_cipher_tfm(tfm), keystream, ctrblk);
- crypto_xor(keystream, src, min(nbytes, bsize));
-
- /* copy result into dst */
- memcpy(dst, keystream, min(nbytes, bsize));
+ fn(crypto_cipher_tfm(tfm), dst, ctrblk);
+ crypto_xor(dst, src, bsize);

/* increment counter in counterblock */
crypto_inc(ctrblk + bsize - countersize, countersize);

- if (nbytes < bsize)
- break;
-
src += bsize;
dst += bsize;
- nbytes -= bsize;
-
- } while (nbytes);
+ } while ((nbytes -= bsize) >= bsize);

return 0;
}
@@ -104,28 +107,20 @@ static int crypto_ctr_crypt_inplace(struct blkcipher_walk *walk,
void (*fn)(struct crypto_tfm *, u8 *, const u8 *) =
crypto_cipher_alg(tfm)->cia_encrypt;
unsigned int bsize = crypto_cipher_blocksize(tfm);
- unsigned long alignmask = crypto_cipher_alignmask(tfm) |
- (__alignof__(u32) - 1);
unsigned int nbytes = walk->nbytes;
u8 *src = walk->src.virt.addr;
- u8 ks[bsize + alignmask];
- u8 *keystream = (u8 *)ALIGN((unsigned long)ks, alignmask + 1);
+ u8 *keystream = ctrblk + bsize;

do {
/* create keystream */
fn(crypto_cipher_tfm(tfm), keystream, ctrblk);
- crypto_xor(src, keystream, min(nbytes, bsize));
+ crypto_xor(src, keystream, bsize);

/* increment counter in counterblock */
crypto_inc(ctrblk + bsize - countersize, countersize);

- if (nbytes < bsize)
- break;
-
src += bsize;
- nbytes -= bsize;
-
- } while (nbytes);
+ } while ((nbytes -= bsize) >= bsize);

return 0;
}
@@ -143,7 +138,7 @@ static int crypto_ctr_crypt(struct blkcipher_desc *desc,
crypto_instance_ctx(crypto_tfm_alg_instance(&tfm->base));
unsigned long alignmask = crypto_cipher_alignmask(child) |
(__alignof__(u32) - 1);
- u8 cblk[bsize + alignmask];
+ u8 cblk[bsize * 2 + alignmask];
u8 *counterblk = (u8 *)ALIGN((unsigned long)cblk, alignmask + 1);
int err;

@@ -158,7 +153,7 @@ static int crypto_ctr_crypt(struct blkcipher_desc *desc,
/* initialize counter portion of counter block */
crypto_inc(counterblk + bsize - ictx->countersize, ictx->countersize);

- while (walk.nbytes) {
+ while (walk.nbytes >= bsize) {
if (walk.src.virt.addr == walk.dst.virt.addr)
nbytes = crypto_ctr_crypt_inplace(&walk, child,
counterblk,
@@ -170,6 +165,13 @@ static int crypto_ctr_crypt(struct blkcipher_desc *desc,

err = blkcipher_walk_done(desc, &walk, nbytes);
}
+
+ if (walk.nbytes) {
+ crypto_ctr_crypt_final(&walk, child, counterblk,
+ ictx->countersize);
+ err = blkcipher_walk_done(desc, &walk, nbytes);
+ }
+
return err;
}

@@ -277,7 +279,7 @@ static struct crypto_instance *crypto_ctr_alloc(struct rtattr **tb)
inst->alg.cra_flags = CRYPTO_ALG_TYPE_BLKCIPHER;
inst->alg.cra_priority = alg->cra_priority;
inst->alg.cra_blocksize = 1;
- inst->alg.cra_alignmask = __alignof__(u32) - 1;
+ inst->alg.cra_alignmask = alg->cra_alignmask | (__alignof__(u32) - 1);
inst->alg.cra_type = &crypto_blkcipher_type;

inst->alg.cra_blkcipher.ivsize = ivsize;

2007-11-27 23:09:10

by Tan Swee Heng

[permalink] [raw]
Subject: Re: Possible bug in CTR (and Salsa20) for large test vectors

Hi Herbert,

On Nov 26, 2007 9:21 PM, Herbert Xu <[email protected]> wrote:
> Here is my attempt at fixing it and optimising it a bit more.
> The idea is that we should only process partial blocks when
> we're at the end. Otherwise we should do whole blocks.
Great patch! I like it more than mine as it captures the essence of
the problem better.

> Please let me know if this works or not.
It works if we change the following:
(1.) both crypto_ctr_crypt_segment() and crypto_ctr_crypt_inplace()
returns "nbytes" instead of 0;
(2.) blkcipher_walk_done() is called with 0 and not "nbytes" after
crypto_ctr_crypt_final().
The final patch (combining yours and the two fixes) is attached to this email.

> Oh any chance you could post a tcrypt patch for the large test?
Sure. I will send it separately. But it will cause considerable bloat
in tcrypt.ko.

One question: Can we use crypto_cipher_encrypt_one() in both
_segment() and _inplace()? It reads better than the function pointer
fn().

Regards,
Swee Heng


Attachments:
(No filename) (0.99 kB)
patch1-ctr_bug.txt (4.44 kB)
Download all attachments

2007-11-29 13:26:22

by Herbert Xu

[permalink] [raw]
Subject: Re: Possible bug in CTR (and Salsa20) for large test vectors

On Wed, Nov 28, 2007 at 07:09:08AM +0800, Tan Swee Heng wrote:
>
> The final patch (combining yours and the two fixes) is attached to this email.

Thanks! I've added it to cryptodev-2.6.

> One question: Can we use crypto_cipher_encrypt_one() in both
> _segment() and _inplace()? It reads better than the function pointer
> fn().

Normally that's what we'd use. However, these two functions are
performance-critical and reading the pointer out potentially
allows better optimisations since the compiler now knows that
we're calling the same function rather than potentially different
functions each time.

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt