2016-06-28 12:37:43

by George Spelvin

Subject: Re: Doing crypto in small stack buffers (bluetooth vs vmalloc-stack crash, etc)

Just a note, crypto/cts.c also does a lot of sg_set_buf() in stack buffers.

I have a local patch (appended, if anyone wants) to reduce the wasteful
amount of buffer space it uses (from 7 to 3 blocks on encrypt, from
6 to 3 blocks on decrypt), but it would take some rework to convert to
crypto_cipher_encrypt_one() or avoid stack buffers entirely.

commit c0aa0ae38dc6115b378939c5483ba6c7eb65d92a
Author: George Spelvin <[email protected]>
Date: Sat Oct 10 17:26:08 2015 -0400

crypto: cts - Reduce internal buffer usage

It only takes a 3-block temporary buffer to handle all the tricky
CTS cases. Encryption could in theory be done with two, but at a cost
in complexity.

But it's still a saving from the previous six blocks on the stack.

One issue I'm uncertain of and I'd like clarification on: to simplify
the cts_cbc_{en,de}crypt calls, I pass in the lcldesc structure which
contains the ctx->child transform rather than the parent one. I'm
assuming the block sizes are guaranteed to be the same (they're set up
in crypto_cts_alloc by copying), but I haven't been able to prove it to
my satisfaction.

Signed-off-by: George Spelvin <[email protected]>

diff --git a/crypto/cts.c b/crypto/cts.c
index e467ec0ac..e24d2e15 100644
--- a/crypto/cts.c
+++ b/crypto/cts.c
@@ -70,54 +70,44 @@ static int crypto_cts_setkey(struct crypto_tfm *parent, const u8 *key,
return err;
}

-static int cts_cbc_encrypt(struct crypto_cts_ctx *ctx,
- struct blkcipher_desc *desc,
+/*
+ * The final CTS encryption is just like CBC encryption except that:
+ * - the last plaintext block is zero-padded,
+ * - the second-last ciphertext block is trimmed, and
+ * - the last (complete) block of ciphertext is output before the
+ * (truncated) second-last one.
+ */
+static int cts_cbc_encrypt(struct blkcipher_desc *lcldesc,
struct scatterlist *dst,
struct scatterlist *src,
unsigned int offset,
unsigned int nbytes)
{
- int bsize = crypto_blkcipher_blocksize(desc->tfm);
- u8 tmp[bsize], tmp2[bsize];
- struct blkcipher_desc lcldesc;
- struct scatterlist sgsrc[1], sgdst[1];
+ int bsize = crypto_blkcipher_blocksize(lcldesc->tfm);
+ u8 tmp[3*bsize] __aligned(8);
+ struct scatterlist sgsrc[1], sgdst[2];
int lastn = nbytes - bsize;
- u8 iv[bsize];
- u8 s[bsize * 2], d[bsize * 2];
int err;

- if (lastn < 0)
+ if (lastn <= 0)
return -EINVAL;

- sg_init_table(sgsrc, 1);
- sg_init_table(sgdst, 1);
-
- memset(s, 0, sizeof(s));
- scatterwalk_map_and_copy(s, src, offset, nbytes, 0);
-
- memcpy(iv, desc->info, bsize);
-
- lcldesc.tfm = ctx->child;
- lcldesc.info = iv;
- lcldesc.flags = desc->flags;
-
- sg_set_buf(&sgsrc[0], s, bsize);
- sg_set_buf(&sgdst[0], tmp, bsize);
- err = crypto_blkcipher_encrypt_iv(&lcldesc, sgdst, sgsrc, bsize);
-
- memcpy(d + bsize, tmp, lastn);
-
- lcldesc.info = tmp;
-
- sg_set_buf(&sgsrc[0], s + bsize, bsize);
- sg_set_buf(&sgdst[0], tmp2, bsize);
- err = crypto_blkcipher_encrypt_iv(&lcldesc, sgdst, sgsrc, bsize);
-
- memcpy(d, tmp2, bsize);
-
- scatterwalk_map_and_copy(d, dst, offset, nbytes, 1);
-
- memcpy(desc->info, tmp2, bsize);
+ /* Copy the input to a temporary buffer; tmp = xxx, P[n-1], P[n] */
+ memset(tmp+2*bsize, 0, bsize);
+ scatterwalk_map_and_copy(tmp+bsize, src, offset, nbytes, 0);
+
+ sg_init_one(sgsrc, tmp+bsize, 2*bsize);
+ /* Initialize dst specially to do the rearrangement for us */
+ sg_init_table(sgdst, 2);
+ sg_set_buf(sgdst+0, tmp+bsize, bsize);
+ sg_set_buf(sgdst+1, tmp, bsize);
+
+ /* CBC-encrypt in place the two blocks; tmp = C[n], C[n-1], P[n] */
+ err = crypto_blkcipher_encrypt_iv(lcldesc, sgdst, sgsrc, 2*bsize);
+
+ /* Copy beginning of tmp to the output */
+ scatterwalk_map_and_copy(tmp, dst, offset, nbytes, 1);
+ memzero_explicit(tmp, sizeof(tmp));

return err;
}
@@ -126,8 +116,8 @@ static int crypto_cts_encrypt(struct blkcipher_desc *desc,
struct scatterlist *dst, struct scatterlist *src,
unsigned int nbytes)
{
- struct crypto_cts_ctx *ctx = crypto_blkcipher_ctx(desc->tfm);
int bsize = crypto_blkcipher_blocksize(desc->tfm);
+ struct crypto_cts_ctx *ctx = crypto_blkcipher_ctx(desc->tfm);
int tot_blocks = (nbytes + bsize - 1) / bsize;
int cbc_blocks = tot_blocks > 2 ? tot_blocks - 2 : 0;
struct blkcipher_desc lcldesc;
@@ -140,14 +130,14 @@ static int crypto_cts_encrypt(struct blkcipher_desc *desc,
if (tot_blocks == 1) {
err = crypto_blkcipher_encrypt_iv(&lcldesc, dst, src, bsize);
} else if (nbytes <= bsize * 2) {
- err = cts_cbc_encrypt(ctx, desc, dst, src, 0, nbytes);
+ err = cts_cbc_encrypt(&lcldesc, dst, src, 0, nbytes);
} else {
/* do normal function for tot_blocks - 2 */
err = crypto_blkcipher_encrypt_iv(&lcldesc, dst, src,
cbc_blocks * bsize);
if (err == 0) {
/* do cts for final two blocks */
- err = cts_cbc_encrypt(ctx, desc, dst, src,
+ err = cts_cbc_encrypt(&lcldesc, dst, src,
cbc_blocks * bsize,
nbytes - (cbc_blocks * bsize));
}
@@ -156,64 +146,68 @@ static int crypto_cts_encrypt(struct blkcipher_desc *desc,
return err;
}

-static int cts_cbc_decrypt(struct crypto_cts_ctx *ctx,
- struct blkcipher_desc *desc,
+/*
+ * Decrypting the final two blocks in CTS is a bit trickier;
+ * it has to be done in two separate steps.
+ *
+ * The last two blocks of the CTS ciphertext are (first) the
+ * last block C[n] of the equivalent zero-padded CBC encryption,
+ * followed by a truncated version of the second-last block C[n-1].
+ *
+ * Expressed in terms of CBC decryption (P[i] = decrypt(C[i]) ^ IV),
+ * CTS decryption can be expressed as:
+ * - Pad C[n-1] with zeros to get an IV for C[n].
+ * - CBC-decrypt C[n] to get an intermediate plaintext buffer P.
+ * - P[n] is the prefix of P (1..bsize bytes).
+ * - The suffix of P (0..bsize-1 bytes) is the missing part of C[n-1].
+ * - CBC-decrypt that C[n-1], with the incoming IV, to recover P[n-1].
+ */
+static int cts_cbc_decrypt(struct blkcipher_desc *lcldesc,
struct scatterlist *dst,
struct scatterlist *src,
unsigned int offset,
unsigned int nbytes)
{
- int bsize = crypto_blkcipher_blocksize(desc->tfm);
- u8 tmp[bsize];
- struct blkcipher_desc lcldesc;
+ int bsize = crypto_blkcipher_blocksize(lcldesc->tfm);
+ u8 tmp[3*bsize] __aligned(8);
struct scatterlist sgsrc[1], sgdst[1];
int lastn = nbytes - bsize;
- u8 iv[bsize];
- u8 s[bsize * 2], d[bsize * 2];
+ u8 *orig_iv;
int err;

- if (lastn < 0)
+ if (lastn <= 0)
return -EINVAL;

- sg_init_table(sgsrc, 1);
- sg_init_table(sgdst, 1);
+ /* 1. Copy source into tmp, zero-padded; tmp = C[n], C[n-1]+0, xxx */
+ memset(tmp+bsize, 0, bsize);
+ scatterwalk_map_and_copy(tmp, src, offset, nbytes, 0);

- scatterwalk_map_and_copy(s, src, offset, nbytes, 0);
-
- lcldesc.tfm = ctx->child;
- lcldesc.info = iv;
- lcldesc.flags = desc->flags;
-
- /* 1. Decrypt Cn-1 (s) to create Dn (tmp)*/
- memset(iv, 0, sizeof(iv));
- sg_set_buf(&sgsrc[0], s, bsize);
- sg_set_buf(&sgdst[0], tmp, bsize);
- err = crypto_blkcipher_decrypt_iv(&lcldesc, sgdst, sgsrc, bsize);
+ /* 2. Decrypt C[n] into P; tmp = C[n], C[n-1]+0, P */
+ sg_init_one(sgsrc, tmp, bsize);
+ sg_init_one(sgdst, tmp+2*bsize, bsize);
+ orig_iv = lcldesc->info;
+ lcldesc->info = tmp+bsize; /* IV for decryption: padded C[n-1] */
+ err = crypto_blkcipher_decrypt_iv(lcldesc, sgdst, sgsrc, bsize);
if (err)
- return err;
- /* 2. Pad Cn with zeros at the end to create C of length BB */
- memset(iv, 0, sizeof(iv));
- memcpy(iv, s + bsize, lastn);
- /* 3. Exclusive-or Dn (tmp) with C (iv) to create Xn (tmp) */
- crypto_xor(tmp, iv, bsize);
- /* 4. Select the first Ln bytes of Xn (tmp) to create Pn */
- memcpy(d + bsize, tmp, lastn);
+ goto cleanup;

- /* 5. Append the tail (BB - Ln) bytes of Xn (tmp) to Cn to create En */
- memcpy(s + bsize + lastn, tmp + lastn, bsize - lastn);
- /* 6. Decrypt En to create Pn-1 */
- memzero_explicit(iv, sizeof(iv));
+ /* 3. Copy tail of P to C[n-1]; tmp = C[n], C[n-1], P */
+ memcpy(tmp+bsize + lastn, tmp+2*bsize + lastn, bsize - lastn);

- sg_set_buf(&sgsrc[0], s + bsize, bsize);
- sg_set_buf(&sgdst[0], d, bsize);
- err = crypto_blkcipher_decrypt_iv(&lcldesc, sgdst, sgsrc, bsize);
+ /* 4. Decrypt C[n-1] in place; tmp = C[n], P[n-1], P */
+ sg_set_buf(sgsrc, tmp + bsize, bsize);
+ sg_set_buf(sgdst, tmp + bsize, bsize);
+ lcldesc->info = orig_iv;
+ err = crypto_blkcipher_decrypt_iv(lcldesc, sgdst, sgsrc, bsize);

- /* XOR with previous block */
- crypto_xor(d, desc->info, bsize);
+ /* 5. Copy P[n-1] and head of P to output */
+ scatterwalk_map_and_copy(tmp+bsize, dst, offset, nbytes, 1);

- scatterwalk_map_and_copy(d, dst, offset, nbytes, 1);
+ /* C[n] is the continuing IV (if anyone cares) */
+ memcpy(lcldesc->info, tmp, bsize);

- memcpy(desc->info, s, bsize);
+cleanup:
+ memzero_explicit(tmp, sizeof(tmp));
return err;
}

@@ -235,14 +229,14 @@ static int crypto_cts_decrypt(struct blkcipher_desc *desc,
if (tot_blocks == 1) {
err = crypto_blkcipher_decrypt_iv(&lcldesc, dst, src, bsize);
} else if (nbytes <= bsize * 2) {
- err = cts_cbc_decrypt(ctx, desc, dst, src, 0, nbytes);
+ err = cts_cbc_decrypt(&lcldesc, dst, src, 0, nbytes);
} else {
/* do normal function for tot_blocks - 2 */
err = crypto_blkcipher_decrypt_iv(&lcldesc, dst, src,
cbc_blocks * bsize);
if (err == 0) {
/* do cts for final two blocks */
- err = cts_cbc_decrypt(ctx, desc, dst, src,
+ err = cts_cbc_decrypt(&lcldesc, dst, src,
cbc_blocks * bsize,
nbytes - (cbc_blocks * bsize));
}
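
For readers following along without a kernel tree, the steps in the two comment blocks of the patch can be sketched in plain userspace C. This is only an illustration under stated assumptions: the 16-byte block size, the toy_encrypt()/toy_decrypt() permutation (a stand-in for a real block cipher, not secure) and every helper name below are hypothetical and do not come from crypto/cts.c. Scatterlists are not modelled, so the encrypt side rearranges blocks with memcpy() and gets by with two blocks, while the decrypt side keeps the three-block layout C[n], C[n-1]+0, P from the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BS 16 /* toy block size in bytes (assumption, not from the patch) */

static uint8_t rotl8(uint8_t v, int r)
{
	return (uint8_t)((v << r) | (v >> (8 - r)));
}
static void xor_block(uint8_t *dst, const uint8_t *src)
{
	for (int i = 0; i < BS; i++)
		dst[i] ^= src[i];
}
/* Toy keyed permutation standing in for a real block cipher. NOT secure. */
static void toy_encrypt(uint8_t *out, const uint8_t *in)
{
	uint8_t t[BS];
	for (int i = 0; i < BS; i++)
		t[(i + 5) % BS] = rotl8(in[i] ^ (uint8_t)(0xA5 + 11 * i), 3);
	memcpy(out, t, BS);
}
static void toy_decrypt(uint8_t *out, const uint8_t *in)
{
	uint8_t t[BS];
	for (int i = 0; i < BS; i++)
		t[i] = rotl8(in[(i + 5) % BS], 5) ^ (uint8_t)(0xA5 + 11 * i);
	memcpy(out, t, BS);
}
/* In-place CBC over whole blocks; iv ends up as the last ciphertext block. */
static void cbc_encrypt(uint8_t *buf, size_t nblocks, uint8_t *iv)
{
	for (; nblocks--; buf += BS) {
		xor_block(buf, iv);
		toy_encrypt(buf, buf);
		memcpy(iv, buf, BS);
	}
}
static void cbc_decrypt(uint8_t *buf, size_t nblocks, uint8_t *iv)
{
	uint8_t c[BS];
	for (; nblocks--; buf += BS) {
		memcpy(c, buf, BS);
		toy_decrypt(buf, buf);
		xor_block(buf, iv);
		memcpy(iv, c, BS);
	}
}
/* Final CTS step, in place, for BS < nbytes <= 2*BS. */
static void cts_final_encrypt(uint8_t *data, size_t nbytes, uint8_t *iv)
{
	uint8_t tmp[2 * BS] = { 0 };

	memcpy(tmp, data, nbytes);          /* P[n-1], zero-padded P[n] */
	cbc_encrypt(tmp, 2, iv);            /* C[n-1], C[n]; iv = C[n] */
	memcpy(data, tmp + BS, BS);         /* full C[n] goes out first */
	memcpy(data + BS, tmp, nbytes - BS);/* then the truncated C[n-1] */
}
static void cts_final_decrypt(uint8_t *data, size_t nbytes, uint8_t *iv)
{
	uint8_t tmp[3 * BS] = { 0 };        /* C[n], C[n-1]+0, P */
	size_t lastn = nbytes - BS;

	memcpy(tmp, data, nbytes);          /* 1. copy in, zero-padded */
	toy_decrypt(tmp + 2 * BS, tmp);     /* 2. P = D(C[n]) ^ padded C[n-1] */
	xor_block(tmp + 2 * BS, tmp + BS);
	memcpy(tmp + BS + lastn, tmp + 2 * BS + lastn, BS - lastn); /* 3. */
	toy_decrypt(tmp + BS, tmp + BS);    /* 4. P[n-1] = D(C[n-1]) ^ iv */
	xor_block(tmp + BS, iv);
	memcpy(data, tmp + BS, nbytes);     /* 5. P[n-1], then head of P */
	memcpy(iv, tmp, BS);                /* C[n] is the continuing IV */
}
/* Encrypt then decrypt nbytes, splitting as crypto_cts_encrypt() does. */
static int cts_round_trip_ok(size_t nbytes)
{
	uint8_t msg[8 * BS], orig[8 * BS], iv[BS];
	size_t tot_blocks = (nbytes + BS - 1) / BS;
	size_t cbc_blocks = tot_blocks > 2 ? tot_blocks - 2 : 0;
	size_t off = cbc_blocks * BS;

	assert(nbytes > BS && nbytes <= sizeof(msg));
	for (size_t i = 0; i < nbytes; i++)
		orig[i] = msg[i] = (uint8_t)(3 * i + 1);
	memset(iv, 0x42, BS);
	cbc_encrypt(msg, cbc_blocks, iv);
	cts_final_encrypt(msg + off, nbytes - off, iv);
	memset(iv, 0x42, BS);
	cbc_decrypt(msg, cbc_blocks, iv);
	cts_final_decrypt(msg + off, nbytes - off, iv);
	return memcmp(msg, orig, nbytes) == 0;
}
```

Calling cts_round_trip_ok(27) exercises the two-block-only path with a partial last block; cts_round_trip_ok(50) also runs two ordinary CBC blocks before the CTS step. The ciphertext is always exactly nbytes long, which is the point of CTS.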


2016-06-29 12:10:24

by George Spelvin

Subject: Re: Doing crypto in small stack buffers (bluetooth vs vmalloc-stack crash, etc)

>> Also not mentioned in the documentation is that some algorithms *do*
>> have different implementations depending on key size. SHA-2 is the
>> classic example.

> What do you mean by that? SHA has no keying at all.

In this case, the analogous property is hash size. Sorry, I thought
that was so obvious I didn't need to say it.

Specifically, SHA2-256 (and -224) and SHA2-512 (and -384) are separate
algorithms with similar structures but separate implementations.
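
To make the split concrete, here is a small C table of the four SHA-2 variants named above. The parameters (word size, block size, round count) are the standard ones from FIPS 180-4; the struct and function names are hypothetical and exist only for this sketch, not in the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical descriptor; field names are illustrative only. */
struct sha2_variant {
	const char *name;
	unsigned word_bits;   /* width of the working variables */
	unsigned block_bytes; /* input block size */
	unsigned rounds;      /* compression function rounds */
	unsigned digest_bits; /* output size */
};

static const struct sha2_variant sha2_variants[] = {
	{ "SHA-224", 32,  64, 64, 224 },
	{ "SHA-256", 32,  64, 64, 256 },
	{ "SHA-384", 64, 128, 80, 384 },
	{ "SHA-512", 64, 128, 80, 512 },
};

static const struct sha2_variant *sha2_find(const char *name)
{
	size_t n = sizeof(sha2_variants) / sizeof(sha2_variants[0]);

	for (size_t i = 0; i < n; i++)
		if (strcmp(sha2_variants[i].name, name) == 0)
			return &sha2_variants[i];
	return NULL;
}

/* Two variants can share a compression function only if the word size matches. */
static int sha2_same_core(const char *a, const char *b)
{
	const struct sha2_variant *x = sha2_find(a), *y = sha2_find(b);

	return x && y && x->word_bits == y->word_bits;
}
```

So sha2_same_core("SHA-224", "SHA-256") is true (same 32-bit core, different initial values and a truncated output), while sha2_same_core("SHA-256", "SHA-512") is false.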

2016-06-29 02:20:49

by Herbert Xu

Subject: Re: Doing crypto in small stack buffers (bluetooth vs vmalloc-stack crash, etc)

On Tue, Jun 28, 2016 at 10:32:12AM -0400, George Spelvin wrote:
>
> - struct crypto_instance
> - struct crypto_spawn
> - struct crypto_blkcipher
> - struct blkcipher_desc
> - More on the context structures returned by crypto_tfm_ctx

blkcipher is obsolete and will be removed soon. So if you are
going to write this then please document skcipher instead.

> Also not mentioned in the documentation is that some algorithms *do*
> have different implementations depending on key size. SHA-2 is the
> classic example.

What do you mean by that? SHA has no keying at all.

Cheers,
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2016-06-28 14:32:12

by George Spelvin

Subject: Re: Doing crypto in small stack buffers (bluetooth vs vmalloc-stack crash, etc)

> We have actually gained quite a bit of documentation recently.
> Have you looked at Documentation/DocBook/crypto-API.tmpl?
>
> More is always welcome of course.

It's improved since I last looked at it, but there are still many structures
that aren't described:

- struct crypto_instance
- struct crypto_spawn
- struct crypto_blkcipher
- struct blkcipher_desc
- More on the context structures returned by crypto_tfm_ctx

Also not mentioned in the documentation is that some algorithms *do*
have different implementations depending on key size. SHA-2 is the
classic example.

2016-06-28 13:30:50

by Herbert Xu

Subject: Re: Doing crypto in small stack buffers (bluetooth vs vmalloc-stack crash, etc)

On Tue, Jun 28, 2016 at 09:23:01AM -0400, George Spelvin wrote:
>
> Wow, I should see how you do that. I couldn't get it below 3
> blocks of temporary, and the dst SG list only gives you
> one and a half.

I don't mean that I'm using no temporary buffers at all, just
that the actual crypto only operates on the SG lists. I'm still
doing the xoring and stitching in temp buffers. I just counted
and I'm using three blocks like you.

> Is net/sunrpc/auth_gss/gss_krb5_mech.c doing something odd?

Yes, gss_krb5_crypto.c is the one.

> I have a request of you: like Andy, I find the crypto layer an
> impenetrable thicket of wrapper structures. I'm not suggesting there
> aren't reasons for it, but it's extremely hard to infer those reasons by
> looking at the code. If I were to draft a (hilariously wrong) overview
> document, would you be willing to edit it into correctness?

We have actually gained quite a bit of documentation recently.
Have you looked at Documentation/DocBook/crypto-API.tmpl?

More is always welcome of course.

Cheers,
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2016-06-28 13:23:01

by George Spelvin

Subject: Re: Doing crypto in small stack buffers (bluetooth vs vmalloc-stack crash, etc)

Herbert Xu wrote:
> I'm currently working on cts and I'm removing the stack usage
> altogether by having it operate on the src/dst SG lists only.

Wow, I should see how you do that. I couldn't get it below 3
blocks of temporary, and the dst SG list only gives you
one and a half.

> BTW, the only cts user in our tree appears to be implementing
> CTS all over again and is only calling the crypto API cts for
> the last two blocks. Someone should fix that.

Hint taken. Although I'm having a hard time finding that only user
amidst all the drivers thinking it means Clear To Send or (for HDMI)
Cycle Time Stamp.

Um...the uses in fs/crypto/keyinfo.c and fs/ext4/crypto_key.c
don't seem to do anything untoward.

Is net/sunrpc/auth_gss/gss_krb5_mech.c doing something odd?


I have a request of you: like Andy, I find the crypto layer an
impenetrable thicket of wrapper structures. I'm not suggesting there
aren't reasons for it, but it's extremely hard to infer those reasons by
looking at the code. If I were to draft a (hilariously wrong) overview
document, would you be willing to edit it into correctness?

2016-06-28 12:42:41

by Herbert Xu

Subject: Re: Doing crypto in small stack buffers (bluetooth vs vmalloc-stack crash, etc)

On Tue, Jun 28, 2016 at 08:37:43AM -0400, George Spelvin wrote:
> Just a note, crypto/cts.c also does a lot of sg_set_buf() in stack buffers.
>
> I have a local patch (appended, if anyone wants) to reduce the wasteful
> amount of buffer space it uses (from 7 to 3 blocks on encrypt, from
> 6 to 3 blocks on decrypt), but it would take some rework to convert to
> crypto_cipher_encrypt_one() or avoid stack buffers entirely.

I'm currently working on cts and I'm removing the stack usage
altogether by having it operate on the src/dst SG lists only.

It's part of the skcipher conversion though so it'll have to go
through the crypto tree.

BTW, the only cts user in our tree appears to be implementing
CTS all over again and is only calling the crypto API cts for
the last two blocks. Someone should fix that.

Thanks,
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt