2016-06-21 17:45:02

by Andy Lutomirski

Subject: Doing crypto in small stack buffers (bluetooth vs vmalloc-stack crash, etc)

net/bluetooth/smp.c (in smp_e) wants to do AES-ECB on a 16-byte stack
buffer, which seems eminently reasonable to me. It does it like this:

sg_init_one(&sg, data, 16);

skcipher_request_set_tfm(req, tfm);
skcipher_request_set_callback(req, 0, NULL, NULL);
skcipher_request_set_crypt(req, &sg, &sg, 16, NULL);

err = crypto_skcipher_encrypt(req);
skcipher_request_zero(req);
if (err)
BT_ERR("Encrypt data error %d", err);

I tried to figure out what exactly that does, and I got lost in so
many layers of indirection that I mostly gave up. But it appears to
map the stack address to a physical address, stick it in a
scatterlist, follow several function pointers, go through a bunch of
"scatterwalk" indirection, and call blkcipher_next_fast, which calls
blkcipher_map_src, which calls scatterwalk_map, which calls
kmap_atomic (!). This is anything but fast.

I think this code has several serious problems:

- It uses kmap_atomic to access a 16-byte stack buffer. This is absurd.

- It blows up if the stack is in vmalloc space, because you can't
virt_to_phys on the stack buffer in the first place. (This is why I
care.) And I really, really don't want to write sg_init_stack to
create a scatterlist that points to the stack, although such a thing
could be done if absolutely necessary.

- It's very, very complicated and it does something very, very
simple (call aes_encrypt on a plain old u8 *).

Oh yeah, it sets CRYPTO_ALG_ASYNC, too. I can't even figure out what
that does because the actual handling of that flag is so many layers
deep.

Is there a straightforward way that bluetooth and, potentially, other
drivers can just do synchronous crypto in a small buffer specified by
its virtual address? The actual cryptography part of the crypto code
already works this way, but I can't find an API for it.

--Andy


2016-06-22 00:58:00

by Andy Lutomirski

Subject: Re: Doing crypto in small stack buffers (bluetooth vs vmalloc-stack crash, etc)

On Tue, Jun 21, 2016 at 5:42 PM, Herbert Xu <[email protected]> wrote:
> On Tue, Jun 21, 2016 at 10:43:40AM -0700, Andy Lutomirski wrote:
>>
>> Is there a straightforward way that bluetooth and, potentially, other
>> drivers can just do synchronous crypto in a small buffer specified by
>> its virtual address? The actual cryptography part of the crypto code
>> already works this way, but I can't find an API for it.
>
> Yes, single block users should use crypto_cipher_encrypt_one, an
> example would be drivers/md/dm-crypt.c.
>

Aha! I expected something like that to exist, but I couldn't find it.
I'll change the two offenders I've found so far to use it.

--Andy

2016-06-22 00:42:14

by Herbert Xu

Subject: Re: Doing crypto in small stack buffers (bluetooth vs vmalloc-stack crash, etc)

On Tue, Jun 21, 2016 at 10:43:40AM -0700, Andy Lutomirski wrote:
>
> Is there a straightforward way that bluetooth and, potentially, other
> drivers can just do synchronous crypto in a small buffer specified by
> its virtual address? The actual cryptography part of the crypto code
> already works this way, but I can't find an API for it.

Yes, single block users should use crypto_cipher_encrypt_one, an
example would be drivers/md/dm-crypt.c.

Cheers,
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
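
The single-block pattern suggested above can be modeled in userspace. This is only a sketch under stated assumptions, not kernel code: XTEA stands in for AES (so the block is 8 bytes rather than 16), and toy_encrypt_one, toy_decrypt_one and toy_ecb_encrypt are hypothetical names made up for the illustration. What it tries to show is the shape of the call: the caller passes plain pointers, so a small stack buffer can be encrypted in place, and ECB over one block is the same thing as one call to the block primitive.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_BLOCK 8    /* XTEA block size in bytes; AES would use 16 */
#define TOY_ROUNDS 32
#define TOY_DELTA 0x9E3779B9u

/* Encrypt exactly one block. dst and src are ordinary pointers and may alias. */
void toy_encrypt_one(const uint32_t key[4], uint8_t *dst, const uint8_t *src)
{
    uint32_t v[2], sum = 0;

    memcpy(v, src, TOY_BLOCK);
    for (int i = 0; i < TOY_ROUNDS; i++) {
        v[0] += (((v[1] << 4) ^ (v[1] >> 5)) + v[1]) ^ (sum + key[sum & 3]);
        sum += TOY_DELTA;
        v[1] += (((v[0] << 4) ^ (v[0] >> 5)) + v[0]) ^ (sum + key[(sum >> 11) & 3]);
    }
    memcpy(dst, v, TOY_BLOCK);
}

/* Decrypt exactly one block by running the rounds in reverse. */
void toy_decrypt_one(const uint32_t key[4], uint8_t *dst, const uint8_t *src)
{
    uint32_t v[2], sum = TOY_DELTA * TOY_ROUNDS;

    memcpy(v, src, TOY_BLOCK);
    for (int i = 0; i < TOY_ROUNDS; i++) {
        v[1] -= (((v[0] << 4) ^ (v[0] >> 5)) + v[0]) ^ (sum + key[(sum >> 11) & 3]);
        sum -= TOY_DELTA;
        v[0] -= (((v[1] << 4) ^ (v[1] >> 5)) + v[1]) ^ (sum + key[sum & 3]);
    }
    memcpy(dst, v, TOY_BLOCK);
}

/* ECB treats every block independently, so it is just a loop over the
 * single-block primitive. */
void toy_ecb_encrypt(const uint32_t key[4], uint8_t *dst, const uint8_t *src,
                     size_t nblocks)
{
    for (size_t i = 0; i < nblocks; i++)
        toy_encrypt_one(key, dst + i * TOY_BLOCK, src + i * TOY_BLOCK);
}
```

A caller would declare a uint8_t buf[TOY_BLOCK] on its stack and call toy_encrypt_one(key, buf, buf); no scatterlist or physical address is involved at any point.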

2016-06-22 21:48:44

by Andy Lutomirski

Subject: Re: Doing crypto in small stack buffers (bluetooth vs vmalloc-stack crash, etc)

On Tue, Jun 21, 2016 at 5:52 PM, Andy Lutomirski <[email protected]> wrote:
> On Tue, Jun 21, 2016 at 5:42 PM, Herbert Xu <[email protected]> wrote:
>> On Tue, Jun 21, 2016 at 10:43:40AM -0700, Andy Lutomirski wrote:
>>>
>>> Is there a straightforward way that bluetooth and, potentially, other
>>> drivers can just do synchronous crypto in a small buffer specified by
>>> its virtual address? The actual cryptography part of the crypto code
>>> already works this way, but I can't find an API for it.
>>
>> Yes, single block users should use crypto_cipher_encrypt_one, an
>> example would be drivers/md/dm-crypt.c.
>>
>
> Aha! I expected something like that to exist, but I couldn't find it.
> I'll change the two offenders I've found so far to use it.
>

Before I do this, can you explain what the difference is between
crypto_cipher and crypto_skcipher? net/bluetooth/smp.c currently uses
crypto_alloc_skcipher, which you added in:

commit 71af2f6bb22a4bf42663e10f1d8913d4967ed07f
Author: Herbert Xu <[email protected]>
Date: Sun Jan 24 21:18:30 2016 +0800

Bluetooth: Use skcipher and hash

Am I just supposed to replace "skcipher" with "cipher" everywhere?

--Andy

2016-06-22 23:45:46

by Andy Lutomirski

Subject: Re: Doing crypto in small stack buffers (bluetooth vs vmalloc-stack crash, etc)

On Wed, Jun 22, 2016 at 2:48 PM, Andy Lutomirski <[email protected]> wrote:
> On Tue, Jun 21, 2016 at 5:52 PM, Andy Lutomirski <[email protected]> wrote:
>> On Tue, Jun 21, 2016 at 5:42 PM, Herbert Xu <[email protected]> wrote:
>>> On Tue, Jun 21, 2016 at 10:43:40AM -0700, Andy Lutomirski wrote:
>>>>
>>>> Is there a straightforward way that bluetooth and, potentially, other
>>>> drivers can just do synchronous crypto in a small buffer specified by
>>>> its virtual address? The actual cryptography part of the crypto code
>>>> already works this way, but I can't find an API for it.
>>>
>>> Yes, single block users should use crypto_cipher_encrypt_one, an
>>> example would be drivers/md/dm-crypt.c.
>>>
>>
>> Aha! I expected something like that to exist, but I couldn't find it.
>> I'll change the two offenders I've found so far to use it.
>>
>
> Before I do this, can you explain what the difference is between
> crypto_cipher and crypto_skcipher? net/bluetooth/smp.c currently uses
> crypto_alloc_skcipher, which you added in:
>
> commit 71af2f6bb22a4bf42663e10f1d8913d4967ed07f
> Author: Herbert Xu <[email protected]>
> Date: Sun Jan 24 21:18:30 2016 +0800
>
> Bluetooth: Use skcipher and hash
>
> Am I just supposed to replace "skcipher" with "cipher" everywhere?

It looks like I'm supposed to do that and to use "aes" instead of "ecb(aes)".

*However*, the other offender I've found (net/rxrpc/rxkad.c) uses
"pcbc(fcrypt)", which doesn't appear to be usable with this API. Is
there no way to say "I want synchronous crypto on this VA range" using
the skcipher API?

--Andy


--
Andy Lutomirski
AMA Capital Management, LLC

2016-06-23 03:37:59

by Herbert Xu

Subject: Re: Doing crypto in small stack buffers (bluetooth vs vmalloc-stack crash, etc)

On Wed, Jun 22, 2016 at 02:48:24PM -0700, Andy Lutomirski wrote:
>
> Before I do this, can you explain what the difference is between
> crypto_cipher and crypto_skcipher? net/bluetooth/smp.c currently uses
> crypto_alloc_skcipher, which you added in:

crypto_cipher operates on a single block. crypto_skcipher uses
modes of operations and works on multiple blocks.
>
> commit 71af2f6bb22a4bf42663e10f1d8913d4967ed07f
> Author: Herbert Xu <[email protected]>
> Date: Sun Jan 24 21:18:30 2016 +0800
>
> Bluetooth: Use skcipher and hash
>
> Am I just supposed to replace "skcipher" with "cipher" everywhere?

If you're operating on one block only then you should use cipher,
otherwise skcipher would be the appropriate choice.

Cheers,
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
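
The cipher/skcipher distinction described above can be sketched in userspace as a single-block primitive with a chaining mode layered on top of it. The assumptions are loud ones: XTEA stands in for the real block cipher (fcrypt in the rxkad case), the toy_* names are hypothetical, and nothing here is the kernel API. PCBC is used as the mode because it is the one rxkad asks for; each ciphertext block is C[i] = E(P[i] ^ P[i-1] ^ C[i-1]), with the IV playing the role of P[0] ^ C[0].

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_BLOCK 8
#define TOY_ROUNDS 32
#define TOY_DELTA 0x9E3779B9u

/* Single-block primitive (XTEA as a stand-in). dst and src may alias. */
void toy_encrypt_one(const uint32_t key[4], uint8_t *dst, const uint8_t *src)
{
    uint32_t v[2], sum = 0;

    memcpy(v, src, TOY_BLOCK);
    for (int i = 0; i < TOY_ROUNDS; i++) {
        v[0] += (((v[1] << 4) ^ (v[1] >> 5)) + v[1]) ^ (sum + key[sum & 3]);
        sum += TOY_DELTA;
        v[1] += (((v[0] << 4) ^ (v[0] >> 5)) + v[0]) ^ (sum + key[(sum >> 11) & 3]);
    }
    memcpy(dst, v, TOY_BLOCK);
}

void toy_decrypt_one(const uint32_t key[4], uint8_t *dst, const uint8_t *src)
{
    uint32_t v[2], sum = TOY_DELTA * TOY_ROUNDS;

    memcpy(v, src, TOY_BLOCK);
    for (int i = 0; i < TOY_ROUNDS; i++) {
        v[1] -= (((v[0] << 4) ^ (v[0] >> 5)) + v[0]) ^ (sum + key[(sum >> 11) & 3]);
        sum -= TOY_DELTA;
        v[0] -= (((v[1] << 4) ^ (v[1] >> 5)) + v[1]) ^ (sum + key[sum & 3]);
    }
    memcpy(dst, v, TOY_BLOCK);
}

/* Mode of operation: PCBC over nblocks, built only from toy_encrypt_one. */
void toy_pcbc_encrypt(const uint32_t key[4], const uint8_t iv[TOY_BLOCK],
                      uint8_t *dst, const uint8_t *src, size_t nblocks)
{
    uint8_t chain[TOY_BLOCK], plain[TOY_BLOCK];

    memcpy(chain, iv, TOY_BLOCK);
    for (size_t i = 0; i < nblocks; i++, src += TOY_BLOCK, dst += TOY_BLOCK) {
        memcpy(plain, src, TOY_BLOCK);       /* keep P[i]; dst may alias src */
        for (int j = 0; j < TOY_BLOCK; j++)
            chain[j] ^= plain[j];            /* P[i] ^ P[i-1] ^ C[i-1] */
        toy_encrypt_one(key, dst, chain);    /* C[i] */
        for (int j = 0; j < TOY_BLOCK; j++)
            chain[j] = plain[j] ^ dst[j];    /* next chain: P[i] ^ C[i] */
    }
}

void toy_pcbc_decrypt(const uint32_t key[4], const uint8_t iv[TOY_BLOCK],
                      uint8_t *dst, const uint8_t *src, size_t nblocks)
{
    uint8_t chain[TOY_BLOCK], ct[TOY_BLOCK];

    memcpy(chain, iv, TOY_BLOCK);
    for (size_t i = 0; i < nblocks; i++, src += TOY_BLOCK, dst += TOY_BLOCK) {
        memcpy(ct, src, TOY_BLOCK);          /* keep C[i]; dst may alias src */
        toy_decrypt_one(key, dst, ct);
        for (int j = 0; j < TOY_BLOCK; j++) {
            dst[j] ^= chain[j];              /* P[i] */
            chain[j] = dst[j] ^ ct[j];       /* next chain: P[i] ^ C[i] */
        }
    }
}
```

With a single block and an all-zero IV the mode collapses to one call of the primitive, which is the case where the single-block interface is enough. Anything longer needs the chaining state, which is why a request for "pcbc(...)" cannot be served by a one-block call.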

2016-06-23 03:48:42

by Herbert Xu

Subject: Re: Doing crypto in small stack buffers (bluetooth vs vmalloc-stack crash, etc)

On Wed, Jun 22, 2016 at 04:45:46PM -0700, Andy Lutomirski wrote:
>
> *However*, the other offender I've found (net/rxrpc/rxkad.c) uses
> "pcbc(fcrypt)", which doesn't appear to be usable with this API. Is
> there no way to say "I want synchronous crypto on this VA range" using
> the skcipher API?

No we never had such an API in the kernel. However, I see that
rxkad does some pretty silly things and we should be able to avoid
using the stack in pretty much all cases. Let me try to come up with
something.

Cheers,
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2016-06-23 06:41:37

by Herbert Xu

Subject: Re: Doing crypto in small stack buffers (bluetooth vs vmalloc-stack crash, etc)

On Thu, Jun 23, 2016 at 11:48:25AM +0800, Herbert Xu wrote:
>
> No we never had such an API in the kernel. However, I see that
> rxkad does some pretty silly things and we should be able to avoid
> using the stack in pretty much all cases. Let me try to come up with
> something.

Here it is:

---8<---
Subject: rxrpc: Avoid using stack memory in SG lists in rxkad

rxkad uses stack memory in SG lists which would not work if stacks
were allocated from vmalloc memory. In fact, in most cases this
isn't even necessary as the stack memory ends up getting copied
over to kmalloc memory.

This patch eliminates all the unnecessary stack memory uses by
supplying the final destination directly to the crypto API. In
two instances where a temporary buffer is actually needed we also
switch to using the skb->cb area instead of the stack.

Finally there is no need to split a split-page buffer into two SG
entries so code dealing with that has been removed.

Signed-off-by: Herbert Xu <[email protected]>

diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h
index f0b807a..8ee5933 100644
--- a/net/rxrpc/ar-internal.h
+++ b/net/rxrpc/ar-internal.h
@@ -277,6 +277,7 @@ struct rxrpc_connection {
struct key *key; /* security for this connection (client) */
struct key *server_key; /* security for this service */
struct crypto_skcipher *cipher; /* encryption handle */
+ struct rxrpc_crypt csum_iv_head; /* leading block for csum_iv */
struct rxrpc_crypt csum_iv; /* packet checksum base */
unsigned long events;
#define RXRPC_CONN_CHALLENGE 0 /* send challenge packet */
diff --git a/net/rxrpc/rxkad.c b/net/rxrpc/rxkad.c
index 6b726a0..ee142de 100644
--- a/net/rxrpc/rxkad.c
+++ b/net/rxrpc/rxkad.c
@@ -105,11 +105,9 @@ static void rxkad_prime_packet_security(struct rxrpc_connection *conn)
{
struct rxrpc_key_token *token;
SKCIPHER_REQUEST_ON_STACK(req, conn->cipher);
- struct scatterlist sg[2];
+ struct rxrpc_crypt *csum_iv;
+ struct scatterlist sg;
struct rxrpc_crypt iv;
- struct {
- __be32 x[4];
- } tmpbuf __attribute__((aligned(16))); /* must all be in same page */

_enter("");

@@ -119,24 +117,21 @@ static void rxkad_prime_packet_security(struct rxrpc_connection *conn)
token = conn->key->payload.data[0];
memcpy(&iv, token->kad->session_key, sizeof(iv));

- tmpbuf.x[0] = htonl(conn->epoch);
- tmpbuf.x[1] = htonl(conn->cid);
- tmpbuf.x[2] = 0;
- tmpbuf.x[3] = htonl(conn->security_ix);
+ csum_iv = &conn->csum_iv_head;
+ csum_iv[0].x[0] = htonl(conn->epoch);
+ csum_iv[0].x[1] = htonl(conn->cid);
+ csum_iv[1].x[0] = 0;
+ csum_iv[1].x[1] = htonl(conn->security_ix);

- sg_init_one(&sg[0], &tmpbuf, sizeof(tmpbuf));
- sg_init_one(&sg[1], &tmpbuf, sizeof(tmpbuf));
+ sg_init_one(&sg, csum_iv, 16);

skcipher_request_set_tfm(req, conn->cipher);
skcipher_request_set_callback(req, 0, NULL, NULL);
- skcipher_request_set_crypt(req, &sg[1], &sg[0], sizeof(tmpbuf), iv.x);
+ skcipher_request_set_crypt(req, &sg, &sg, 16, iv.x);

crypto_skcipher_encrypt(req);
skcipher_request_zero(req);

- memcpy(&conn->csum_iv, &tmpbuf.x[2], sizeof(conn->csum_iv));
- ASSERTCMP((u32 __force)conn->csum_iv.n[0], ==, (u32 __force)tmpbuf.x[2]);
-
_leave("");
}

@@ -150,12 +145,9 @@ static int rxkad_secure_packet_auth(const struct rxrpc_call *call,
{
struct rxrpc_skb_priv *sp;
SKCIPHER_REQUEST_ON_STACK(req, call->conn->cipher);
+ struct rxkad_level1_hdr hdr;
struct rxrpc_crypt iv;
- struct scatterlist sg[2];
- struct {
- struct rxkad_level1_hdr hdr;
- __be32 first; /* first four bytes of data and padding */
- } tmpbuf __attribute__((aligned(8))); /* must all be in same page */
+ struct scatterlist sg;
u16 check;

sp = rxrpc_skb(skb);
@@ -165,24 +157,21 @@ static int rxkad_secure_packet_auth(const struct rxrpc_call *call,
check = sp->hdr.seq ^ sp->hdr.callNumber;
data_size |= (u32)check << 16;

- tmpbuf.hdr.data_size = htonl(data_size);
- memcpy(&tmpbuf.first, sechdr + 4, sizeof(tmpbuf.first));
+ hdr.data_size = htonl(data_size);
+ memcpy(sechdr, &hdr, sizeof(hdr));

/* start the encryption afresh */
memset(&iv, 0, sizeof(iv));

- sg_init_one(&sg[0], &tmpbuf, sizeof(tmpbuf));
- sg_init_one(&sg[1], &tmpbuf, sizeof(tmpbuf));
+ sg_init_one(&sg, sechdr, 8);

skcipher_request_set_tfm(req, call->conn->cipher);
skcipher_request_set_callback(req, 0, NULL, NULL);
- skcipher_request_set_crypt(req, &sg[1], &sg[0], sizeof(tmpbuf), iv.x);
+ skcipher_request_set_crypt(req, &sg, &sg, 8, iv.x);

crypto_skcipher_encrypt(req);
skcipher_request_zero(req);

- memcpy(sechdr, &tmpbuf, sizeof(tmpbuf));
-
_leave(" = 0");
return 0;
}
@@ -196,8 +185,7 @@ static int rxkad_secure_packet_encrypt(const struct rxrpc_call *call,
void *sechdr)
{
const struct rxrpc_key_token *token;
- struct rxkad_level2_hdr rxkhdr
- __attribute__((aligned(8))); /* must be all on one page */
+ struct rxkad_level2_hdr rxkhdr;
struct rxrpc_skb_priv *sp;
SKCIPHER_REQUEST_ON_STACK(req, call->conn->cipher);
struct rxrpc_crypt iv;
@@ -216,17 +204,17 @@ static int rxkad_secure_packet_encrypt(const struct rxrpc_call *call,

rxkhdr.data_size = htonl(data_size | (u32)check << 16);
rxkhdr.checksum = 0;
+ memcpy(sechdr, &rxkhdr, sizeof(rxkhdr));

/* encrypt from the session key */
token = call->conn->key->payload.data[0];
memcpy(&iv, token->kad->session_key, sizeof(iv));

sg_init_one(&sg[0], sechdr, sizeof(rxkhdr));
- sg_init_one(&sg[1], &rxkhdr, sizeof(rxkhdr));

skcipher_request_set_tfm(req, call->conn->cipher);
skcipher_request_set_callback(req, 0, NULL, NULL);
- skcipher_request_set_crypt(req, &sg[1], &sg[0], sizeof(rxkhdr), iv.x);
+ skcipher_request_set_crypt(req, &sg[0], &sg[0], sizeof(rxkhdr), iv.x);

crypto_skcipher_encrypt(req);

@@ -265,10 +253,11 @@ static int rxkad_secure_packet(const struct rxrpc_call *call,
struct rxrpc_skb_priv *sp;
SKCIPHER_REQUEST_ON_STACK(req, call->conn->cipher);
struct rxrpc_crypt iv;
- struct scatterlist sg[2];
- struct {
+ struct scatterlist sg;
+ union {
__be32 x[2];
- } tmpbuf __attribute__((aligned(8))); /* must all be in same page */
+ __be64 xl;
+ } tmpbuf;
u32 x, y;
int ret;

@@ -294,16 +283,19 @@ static int rxkad_secure_packet(const struct rxrpc_call *call,
tmpbuf.x[0] = htonl(sp->hdr.callNumber);
tmpbuf.x[1] = htonl(x);

- sg_init_one(&sg[0], &tmpbuf, sizeof(tmpbuf));
- sg_init_one(&sg[1], &tmpbuf, sizeof(tmpbuf));
+ swap(tmpbuf.xl, *(__be64 *)sp);
+
+ sg_init_one(&sg, sp, sizeof(tmpbuf));

skcipher_request_set_tfm(req, call->conn->cipher);
skcipher_request_set_callback(req, 0, NULL, NULL);
- skcipher_request_set_crypt(req, &sg[1], &sg[0], sizeof(tmpbuf), iv.x);
+ skcipher_request_set_crypt(req, &sg, &sg, sizeof(tmpbuf), iv.x);

crypto_skcipher_encrypt(req);
skcipher_request_zero(req);

+ swap(tmpbuf.xl, *(__be64 *)sp);
+
y = ntohl(tmpbuf.x[1]);
y = (y >> 16) & 0xffff;
if (y == 0)
@@ -503,10 +495,11 @@ static int rxkad_verify_packet(const struct rxrpc_call *call,
SKCIPHER_REQUEST_ON_STACK(req, call->conn->cipher);
struct rxrpc_skb_priv *sp;
struct rxrpc_crypt iv;
- struct scatterlist sg[2];
- struct {
+ struct scatterlist sg;
+ union {
__be32 x[2];
- } tmpbuf __attribute__((aligned(8))); /* must all be in same page */
+ __be64 xl;
+ } tmpbuf;
u16 cksum;
u32 x, y;
int ret;
@@ -534,16 +527,19 @@ static int rxkad_verify_packet(const struct rxrpc_call *call,
tmpbuf.x[0] = htonl(call->call_id);
tmpbuf.x[1] = htonl(x);

- sg_init_one(&sg[0], &tmpbuf, sizeof(tmpbuf));
- sg_init_one(&sg[1], &tmpbuf, sizeof(tmpbuf));
+ swap(tmpbuf.xl, *(__be64 *)sp);
+
+ sg_init_one(&sg, sp, sizeof(tmpbuf));

skcipher_request_set_tfm(req, call->conn->cipher);
skcipher_request_set_callback(req, 0, NULL, NULL);
- skcipher_request_set_crypt(req, &sg[1], &sg[0], sizeof(tmpbuf), iv.x);
+ skcipher_request_set_crypt(req, &sg, &sg, sizeof(tmpbuf), iv.x);

crypto_skcipher_encrypt(req);
skcipher_request_zero(req);

+ swap(tmpbuf.xl, *(__be64 *)sp);
+
y = ntohl(tmpbuf.x[1]);
cksum = (y >> 16) & 0xffff;
if (cksum == 0)
@@ -708,26 +704,13 @@ static void rxkad_calc_response_checksum(struct rxkad_response *response)
}

/*
- * load a scatterlist with a potentially split-page buffer
+ * load a scatterlist
*/
-static void rxkad_sg_set_buf2(struct scatterlist sg[2],
+static void rxkad_sg_set_buf2(struct scatterlist sg[1],
void *buf, size_t buflen)
{
- int nsg = 1;
-
- sg_init_table(sg, 2);
-
+ sg_init_table(sg, 1);
sg_set_buf(&sg[0], buf, buflen);
- if (sg[0].offset + buflen > PAGE_SIZE) {
- /* the buffer was split over two pages */
- sg[0].length = PAGE_SIZE - sg[0].offset;
- sg_set_buf(&sg[1], buf + sg[0].length, buflen - sg[0].length);
- nsg++;
- }
-
- sg_mark_end(&sg[nsg - 1]);
-
- ASSERTCMP(sg[0].length + sg[1].length, ==, buflen);
}

/*
@@ -739,7 +722,7 @@ static void rxkad_encrypt_response(struct rxrpc_connection *conn,
{
SKCIPHER_REQUEST_ON_STACK(req, conn->cipher);
struct rxrpc_crypt iv;
- struct scatterlist sg[2];
+ struct scatterlist sg[1];

/* continue encrypting from where we left off */
memcpy(&iv, s2->session_key, sizeof(iv));
@@ -999,7 +982,7 @@ static void rxkad_decrypt_response(struct rxrpc_connection *conn,
const struct rxrpc_crypt *session_key)
{
SKCIPHER_REQUEST_ON_STACK(req, rxkad_ci);
- struct scatterlist sg[2];
+ struct scatterlist sg[1];
struct rxrpc_crypt iv;

_enter(",,%08x%08x",
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
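
The swap() calls in rxkad_secure_packet and rxkad_verify_packet in the patch above follow a borrow-and-restore pattern that can be modeled in userspace. This is a rough sketch with assumptions of its own: a malloc'd buffer stands in for the skb->cb area, toy_transform stands in for the in-place cipher call, and process_via_borrowed is a hypothetical name. The model also assumes nothing else reads or writes the borrowed bytes while the call is in flight.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for an in-place operation that must not be handed stack memory. */
typedef void (*inplace_fn)(uint8_t *buf, size_t len);

/* Exchange len bytes between two non-overlapping buffers. */
static void swap_bytes(uint8_t *a, uint8_t *b, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        uint8_t t = a[i];
        a[i] = b[i];
        b[i] = t;
    }
}

/*
 * Run fn over the value held in tmp (which lives on the caller's stack)
 * without ever passing the stack address to fn. The first len bytes of
 * `borrowed` hold live data that must survive the call.
 */
void process_via_borrowed(uint8_t *tmp, uint8_t *borrowed, size_t len,
                          inplace_fn fn)
{
    swap_bytes(tmp, borrowed, len);   /* borrowed = input, tmp = saved data */
    fn(borrowed, len);                /* operate on non-stack memory only */
    swap_bytes(tmp, borrowed, len);   /* tmp = result, borrowed restored */
}

/* Toy in-place transform used in place of a real cipher. */
void toy_transform(uint8_t *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        buf[i] ^= 0x5a;
}
```

After the second swap the caller reads the result out of its own temporary, and the borrowed area holds exactly what it held before.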

2016-06-23 22:11:21

by Andy Lutomirski

Subject: Re: Doing crypto in small stack buffers (bluetooth vs vmalloc-stack crash, etc)

On Wed, Jun 22, 2016 at 11:41 PM, Herbert Xu
<[email protected]> wrote:
> On Thu, Jun 23, 2016 at 11:48:25AM +0800, Herbert Xu wrote:
>>
>> No we never had such an API in the kernel. However, I see that
>> rxkad does some pretty silly things and we should be able to avoid
>> using the stack in pretty much all cases. Let me try to come up with
>> something.
>
> Here it is:
>
> ---8<---
> Subject: rxrpc: Avoid using stack memory in SG lists in rxkad

Looks reasonable to me. Unless anyone tells me otherwise, my plan is
to queue it in my virtually-mapped stack series and to ask Ingo to
apply it via -tip.

If it went in via the networking tree, that would work as well, but it
would introduce a bisectability problem.

Thanks!

--Andy

2016-06-28 12:37:43

by George Spelvin

Subject: Re: Doing crypto in small stack buffers (bluetooth vs vmalloc-stack crash, etc)

Just a note, crypto/cts.c also does a lot of sg_set_buf() in stack buffers.

I have a local patch (appended, if anyone wants) to reduce the wasteful
amount of buffer space it uses (from 7 to 3 blocks on encrypt, from
6 to 3 blocks on decrypt), but it would take some rework to convert to
crypto_cipher_encrypt_one() or avoid stack buffers entirely.

commit c0aa0ae38dc6115b378939c5483ba6c7eb65d92a
Author: George Spelvin <[email protected]>
Date: Sat Oct 10 17:26:08 2015 -0400

crypto: cts - Reduce internal buffer usage

It only takes a 3-block temporary buffer to handle all the tricky
CTS cases. Encryption could in theory be done with two, but at a cost
in complexity.

But it's still a saving from the previous six blocks on the stack.

One issue I'm uncertain of and I'd like clarification on: to simplify
the cts_cbc_{en,de}crypt calls, I pass in the lcldesc structure which
contains the ctx->child transform rather than the parent one. I'm
assuming the block sizes are guaranteed to be the same (they're set up
in crypto_cts_alloc by copying), but I haven't been able to prove it to
my satisfaction.

Signed-off-by: George Spelvin <[email protected]>

diff --git a/crypto/cts.c b/crypto/cts.c
index e467ec0ac..e24d2e15 100644
--- a/crypto/cts.c
+++ b/crypto/cts.c
@@ -70,54 +70,44 @@ static int crypto_cts_setkey(struct crypto_tfm *parent, const u8 *key,
return err;
}

-static int cts_cbc_encrypt(struct crypto_cts_ctx *ctx,
- struct blkcipher_desc *desc,
+/*
+ * The final CTS encryption is just like CBC encryption except that:
+ * - the last plaintext block is zero-padded,
+ * - the second-last ciphertext block is trimmed, and
+ * - the last (complete) block of ciphertext is output before the
+ * (truncated) second-last one.
+ */
+static int cts_cbc_encrypt(struct blkcipher_desc *lcldesc,
struct scatterlist *dst,
struct scatterlist *src,
unsigned int offset,
unsigned int nbytes)
{
- int bsize = crypto_blkcipher_blocksize(desc->tfm);
- u8 tmp[bsize], tmp2[bsize];
- struct blkcipher_desc lcldesc;
- struct scatterlist sgsrc[1], sgdst[1];
+ int bsize = crypto_blkcipher_blocksize(lcldesc->tfm);
+ u8 tmp[3*bsize] __aligned(8);
+ struct scatterlist sgsrc[1], sgdst[2];
int lastn = nbytes - bsize;
- u8 iv[bsize];
- u8 s[bsize * 2], d[bsize * 2];
int err;

- if (lastn < 0)
+ if (lastn <= 0)
return -EINVAL;

- sg_init_table(sgsrc, 1);
- sg_init_table(sgdst, 1);
-
- memset(s, 0, sizeof(s));
- scatterwalk_map_and_copy(s, src, offset, nbytes, 0);
-
- memcpy(iv, desc->info, bsize);
-
- lcldesc.tfm = ctx->child;
- lcldesc.info = iv;
- lcldesc.flags = desc->flags;
-
- sg_set_buf(&sgsrc[0], s, bsize);
- sg_set_buf(&sgdst[0], tmp, bsize);
- err = crypto_blkcipher_encrypt_iv(&lcldesc, sgdst, sgsrc, bsize);
-
- memcpy(d + bsize, tmp, lastn);
-
- lcldesc.info = tmp;
-
- sg_set_buf(&sgsrc[0], s + bsize, bsize);
- sg_set_buf(&sgdst[0], tmp2, bsize);
- err = crypto_blkcipher_encrypt_iv(&lcldesc, sgdst, sgsrc, bsize);
-
- memcpy(d, tmp2, bsize);
-
- scatterwalk_map_and_copy(d, dst, offset, nbytes, 1);
-
- memcpy(desc->info, tmp2, bsize);
+ /* Copy the input to a temporary buffer; tmp = xxx, P[n-1], P[n] */
+ memset(tmp+2*bsize, 0, bsize);
+ scatterwalk_map_and_copy(tmp+bsize, src, offset, nbytes, 0);
+
+ sg_init_one(sgsrc, tmp+bsize, 2*bsize);
+ /* Initialize dst specially to do the rearrangement for us */
+ sg_init_table(sgdst, 2);
+ sg_set_buf(sgdst+0, tmp+bsize, bsize);
+ sg_set_buf(sgdst+1, tmp, bsize);
+
+ /* CBC-encrypt in place the two blocks; tmp = C[n], C[n-1], P[n] */
+ err = crypto_blkcipher_encrypt_iv(lcldesc, sgdst, sgsrc, 2*bsize);
+
+ /* Copy beginning of tmp to the output */
+ scatterwalk_map_and_copy(tmp, dst, offset, nbytes, 1);
+ memzero_explicit(tmp, sizeof(tmp));

return err;
}
@@ -126,8 +116,8 @@ static int crypto_cts_encrypt(struct blkcipher_desc *desc,
struct scatterlist *dst, struct scatterlist *src,
unsigned int nbytes)
{
- struct crypto_cts_ctx *ctx = crypto_blkcipher_ctx(desc->tfm);
int bsize = crypto_blkcipher_blocksize(desc->tfm);
+ struct crypto_cts_ctx *ctx = crypto_blkcipher_ctx(desc->tfm);
int tot_blocks = (nbytes + bsize - 1) / bsize;
int cbc_blocks = tot_blocks > 2 ? tot_blocks - 2 : 0;
struct blkcipher_desc lcldesc;
@@ -140,14 +130,14 @@ static int crypto_cts_encrypt(struct blkcipher_desc *desc,
if (tot_blocks == 1) {
err = crypto_blkcipher_encrypt_iv(&lcldesc, dst, src, bsize);
} else if (nbytes <= bsize * 2) {
- err = cts_cbc_encrypt(ctx, desc, dst, src, 0, nbytes);
+ err = cts_cbc_encrypt(&lcldesc, dst, src, 0, nbytes);
} else {
/* do normal function for tot_blocks - 2 */
err = crypto_blkcipher_encrypt_iv(&lcldesc, dst, src,
cbc_blocks * bsize);
if (err == 0) {
/* do cts for final two blocks */
- err = cts_cbc_encrypt(ctx, desc, dst, src,
+ err = cts_cbc_encrypt(&lcldesc, dst, src,
cbc_blocks * bsize,
nbytes - (cbc_blocks * bsize));
}
@@ -156,64 +146,68 @@ static int crypto_cts_encrypt(struct blkcipher_desc *desc,
return err;
}

-static int cts_cbc_decrypt(struct crypto_cts_ctx *ctx,
- struct blkcipher_desc *desc,
+/*
+ * Decrypting the final two blocks in CTS is a bit trickier;
+ * it has to be done in two separate steps.
+ *
+ * The last two blocks of the CTS ciphertext are (first) the
+ * last block C[n] of the equivalent zero-padded CBC encryption,
+ * followed by a truncated version of the second-last block C[n-1].
+ *
+ * Expressed in terms of CBC decryption (P[i] = decrypt(C[i]) ^ IV),
+ * CTS decryption can be expressed as:
+ * - Pad C[n-1] with zeros to get an IV for C[n].
+ * - CBC-decrypt C[n] to get an intermediate plaintext buffer P.
+ * - P[n] is the prefix of P (1..bsize bytes).
+ * - The suffix of P (0..bsize-1 bytes) is the missing part of C[n-1].
+ * - CBC-decrypt that C[n-1], with the incoming IV, to recover P[n-1].
+ */
+static int cts_cbc_decrypt(struct blkcipher_desc *lcldesc,
struct scatterlist *dst,
struct scatterlist *src,
unsigned int offset,
unsigned int nbytes)
{
- int bsize = crypto_blkcipher_blocksize(desc->tfm);
- u8 tmp[bsize];
- struct blkcipher_desc lcldesc;
+ int bsize = crypto_blkcipher_blocksize(lcldesc->tfm);
+ u8 tmp[3*bsize] __aligned(8);
struct scatterlist sgsrc[1], sgdst[1];
int lastn = nbytes - bsize;
- u8 iv[bsize];
- u8 s[bsize * 2], d[bsize * 2];
+ u8 *orig_iv;
int err;

- if (lastn < 0)
+ if (lastn <= 0)
return -EINVAL;

- sg_init_table(sgsrc, 1);
- sg_init_table(sgdst, 1);
+ /* 1. Copy source into tmp, zero-padded; tmp = C[n], C[n-1]+0, xxx */
+ memset(tmp+bsize, 0, bsize);
+ scatterwalk_map_and_copy(tmp, src, offset, nbytes, 0);

- scatterwalk_map_and_copy(s, src, offset, nbytes, 0);
-
- lcldesc.tfm = ctx->child;
- lcldesc.info = iv;
- lcldesc.flags = desc->flags;
-
- /* 1. Decrypt Cn-1 (s) to create Dn (tmp)*/
- memset(iv, 0, sizeof(iv));
- sg_set_buf(&sgsrc[0], s, bsize);
- sg_set_buf(&sgdst[0], tmp, bsize);
- err = crypto_blkcipher_decrypt_iv(&lcldesc, sgdst, sgsrc, bsize);
+ /* 2. Decrypt C[n] into P; tmp = C[n], C[n-1]+0, P */
+ sg_init_one(sgsrc, tmp, bsize);
+ sg_init_one(sgdst, tmp+2*bsize, bsize);
+ orig_iv = lcldesc->info;
+ lcldesc->info = tmp+bsize; /* IV for decryption: padded C[n-1] */
+ err = crypto_blkcipher_decrypt_iv(lcldesc, sgdst, sgsrc, bsize);
if (err)
- return err;
- /* 2. Pad Cn with zeros at the end to create C of length BB */
- memset(iv, 0, sizeof(iv));
- memcpy(iv, s + bsize, lastn);
- /* 3. Exclusive-or Dn (tmp) with C (iv) to create Xn (tmp) */
- crypto_xor(tmp, iv, bsize);
- /* 4. Select the first Ln bytes of Xn (tmp) to create Pn */
- memcpy(d + bsize, tmp, lastn);
+ goto cleanup;

- /* 5. Append the tail (BB - Ln) bytes of Xn (tmp) to Cn to create En */
- memcpy(s + bsize + lastn, tmp + lastn, bsize - lastn);
- /* 6. Decrypt En to create Pn-1 */
- memzero_explicit(iv, sizeof(iv));
+ /* 3. Copy tail of P to C[n-1]; tmp = C[n], C[n-1], P */
+ memcpy(tmp+bsize + lastn, tmp+2*bsize + lastn, bsize - lastn);

- sg_set_buf(&sgsrc[0], s + bsize, bsize);
- sg_set_buf(&sgdst[0], d, bsize);
- err = crypto_blkcipher_decrypt_iv(&lcldesc, sgdst, sgsrc, bsize);
+ /* 4. Decrypt C[n-1] in place; tmp = C[n], P[n-1], P */
+ sg_set_buf(sgsrc, tmp + bsize, bsize);
+ sg_set_buf(sgdst, tmp + bsize, bsize);
+ lcldesc->info = orig_iv;
+ err = crypto_blkcipher_decrypt_iv(lcldesc, sgdst, sgsrc, bsize);

- /* XOR with previous block */
- crypto_xor(d, desc->info, bsize);
+ /* 5. Copy P[n-1] and head of P to output */
+ scatterwalk_map_and_copy(tmp+bsize, dst, offset, nbytes, 1);

- scatterwalk_map_and_copy(d, dst, offset, nbytes, 1);
+ /* C[n] is the continuing IV (if anyone cares) */
+ memcpy(lcldesc->info, tmp, bsize);

- memcpy(desc->info, s, bsize);
+cleanup:
+ memzero_explicit(tmp, sizeof(tmp));
return err;
}

@@ -235,14 +229,14 @@ static int crypto_cts_decrypt(struct blkcipher_desc *desc,
if (tot_blocks == 1) {
err = crypto_blkcipher_decrypt_iv(&lcldesc, dst, src, bsize);
} else if (nbytes <= bsize * 2) {
- err = cts_cbc_decrypt(ctx, desc, dst, src, 0, nbytes);
+ err = cts_cbc_decrypt(&lcldesc, dst, src, 0, nbytes);
} else {
/* do normal function for tot_blocks - 2 */
err = crypto_blkcipher_decrypt_iv(&lcldesc, dst, src,
cbc_blocks * bsize);
if (err == 0) {
/* do cts for final two blocks */
- err = cts_cbc_decrypt(ctx, desc, dst, src,
+ err = cts_cbc_decrypt(&lcldesc, dst, src,
cbc_blocks * bsize,
nbytes - (cbc_blocks * bsize));
}
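
The final-two-blocks handling described in the patch comments above can be checked with a small userspace model. It is a sketch under assumptions, not the kernel code: XTEA stands in for the underlying block cipher, the CBC step is written out by hand instead of going through scatterlists, and the toy_* names are hypothetical. Encryption CBC-encrypts P[n-1] and the zero-padded P[n], then emits C[n] followed by the truncated C[n-1]; decryption follows the numbered steps from the comment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_BLOCK 8
#define TOY_ROUNDS 32
#define TOY_DELTA 0x9E3779B9u

void toy_encrypt_one(const uint32_t key[4], uint8_t *dst, const uint8_t *src)
{
    uint32_t v[2], sum = 0;

    memcpy(v, src, TOY_BLOCK);
    for (int i = 0; i < TOY_ROUNDS; i++) {
        v[0] += (((v[1] << 4) ^ (v[1] >> 5)) + v[1]) ^ (sum + key[sum & 3]);
        sum += TOY_DELTA;
        v[1] += (((v[0] << 4) ^ (v[0] >> 5)) + v[0]) ^ (sum + key[(sum >> 11) & 3]);
    }
    memcpy(dst, v, TOY_BLOCK);
}

void toy_decrypt_one(const uint32_t key[4], uint8_t *dst, const uint8_t *src)
{
    uint32_t v[2], sum = TOY_DELTA * TOY_ROUNDS;

    memcpy(v, src, TOY_BLOCK);
    for (int i = 0; i < TOY_ROUNDS; i++) {
        v[1] -= (((v[0] << 4) ^ (v[0] >> 5)) + v[0]) ^ (sum + key[(sum >> 11) & 3]);
        sum -= TOY_DELTA;
        v[0] -= (((v[1] << 4) ^ (v[1] >> 5)) + v[1]) ^ (sum + key[sum & 3]);
    }
    memcpy(dst, v, TOY_BLOCK);
}

static void xor_block(uint8_t *a, const uint8_t *b)
{
    for (int j = 0; j < TOY_BLOCK; j++)
        a[j] ^= b[j];
}

/* Encrypt the last nbytes (TOY_BLOCK < nbytes <= 2 * TOY_BLOCK), CTS style. */
int toy_cts_encrypt_tail(const uint32_t key[4], const uint8_t iv[TOY_BLOCK],
                         uint8_t *dst, const uint8_t *src, size_t nbytes)
{
    uint8_t p[2 * TOY_BLOCK] = {0}, c1[TOY_BLOCK], c2[TOY_BLOCK];

    if (nbytes <= TOY_BLOCK || nbytes > 2 * TOY_BLOCK)
        return -1;
    memcpy(p, src, nbytes);               /* P[n-1], zero-padded P[n] */
    xor_block(p, iv);
    toy_encrypt_one(key, c1, p);          /* C[n-1] = E(P[n-1] ^ IV) */
    xor_block(p + TOY_BLOCK, c1);
    toy_encrypt_one(key, c2, p + TOY_BLOCK);  /* C[n] = E(P[n] ^ C[n-1]) */
    memcpy(dst, c2, TOY_BLOCK);           /* full C[n] goes first */
    memcpy(dst + TOY_BLOCK, c1, nbytes - TOY_BLOCK);  /* truncated C[n-1] */
    return 0;
}

int toy_cts_decrypt_tail(const uint32_t key[4], const uint8_t iv[TOY_BLOCK],
                         uint8_t *dst, const uint8_t *src, size_t nbytes)
{
    uint8_t c1[TOY_BLOCK] = {0}, p[TOY_BLOCK], p1[TOY_BLOCK];
    size_t lastn;

    if (nbytes <= TOY_BLOCK || nbytes > 2 * TOY_BLOCK)
        return -1;
    lastn = nbytes - TOY_BLOCK;
    memcpy(c1, src + TOY_BLOCK, lastn);   /* zero-padded C[n-1] acts as IV */
    toy_decrypt_one(key, p, src);
    xor_block(p, c1);                     /* P = D(C[n]) ^ padded C[n-1] */
    memcpy(c1 + lastn, p + lastn, TOY_BLOCK - lastn);  /* rebuild C[n-1] */
    toy_decrypt_one(key, p1, c1);
    xor_block(p1, iv);                    /* P[n-1] = D(C[n-1]) ^ IV */
    memcpy(dst, p1, TOY_BLOCK);
    memcpy(dst + TOY_BLOCK, p, lastn);    /* P[n] is the prefix of P */
    return 0;
}
```

Like the patch, the model rejects an input of exactly one block, since there is nothing to steal from in that case.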

2016-06-28 12:42:41

by Herbert Xu

Subject: Re: Doing crypto in small stack buffers (bluetooth vs vmalloc-stack crash, etc)

On Tue, Jun 28, 2016 at 08:37:43AM -0400, George Spelvin wrote:
> Just a note, crypto/cts.c also does a lot of sg_set_buf() in stack buffers.
>
> I have a local patch (appended, if anyone wants) to reduce the wasteful
> amount of buffer space it uses (from 7 to 3 blocks on encrypt, from
> 6 to 3 blocks on decrypt), but it would take some rework to convert to
> crypto_cipher_encrypt_one() or avoid stack buffers entirely.

I'm currently working on cts and I'm removing the stack usage
altogether by having it operate on the src/dst SG lists only.

It's part of the skcipher conversion though so it'll have to go
through the crypto tree.

BTW, the only cts user in our tree appears to be implementing
CTS all over again and is only calling the crypto API cts for
the last two blocks. Someone should fix that.

Thanks,
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2016-06-28 13:23:01

by George Spelvin

Subject: Re: Doing crypto in small stack buffers (bluetooth vs vmalloc-stack crash, etc)

Herbert Xu wrote:
> I'm currently working on cts and I'm removing the stack usage
> altogether by having it operate on the src/dst SG lists only.

Wow, I should see how you do that. I couldn't get it below 3
blocks of temporary, and the dst SG list only gives you
one and a half.

> BTW, the only cts user in our tree appears to be implementing
> CTS all over again and is only calling the crypto API cts for
> the last two blocks. Someone should fix that.

Hint taken. Although I'm having a hard time finding that only user
amidst all the drivers thinking it means Clear To Send or (for HDMI)
Cycle Time Stamp.

Um...the uses in fs/crypto/keyinfo.c and fs/ext4/crypto_key.c
don't seem to do anything untoward.

Is net/sunrpc/auth_gss/gss_krb5_mech.c doing something odd?


I have a request of you: like Andy, I find the crypto layer an
impenetrable thicket of wrapper structures. I'm not suggesting there
aren't reasons for it, but it's extremely hard to infer those reasons by
looking at the code. If I were to draft a (hilariously wrong) overview
document, would you be willing to edit it into correctness?

2016-06-28 13:30:50

by Herbert Xu

Subject: Re: Doing crypto in small stack buffers (bluetooth vs vmalloc-stack crash, etc)

On Tue, Jun 28, 2016 at 09:23:01AM -0400, George Spelvin wrote:
>
> Wow, I should see how you do that. I couldn't get it below 3
> blocks of temporary, and the dst SG list only gives you
> one and a half.

I don't mean that I'm using no temporary buffers at all, just
that the actual crypto only operates on the SG lists. I'm still
doing the xoring and stitching in temp buffers. I just counted
and I'm using three blocks like you.

> Is net/sunrpc/auth_gss/gss_krb5_mech.c doing something odd?

Yes gss_krb5_crypto.c is the one.

> I have a request of you: like Andy, I find the crypto layer an
> impenetrable thicket of wrapper structures. I'm not suggesting there
> aren't reasons for it, but it's extremely hard to infer those reasons by
> looking at the code. If I were to draft a (hilariously wrong) overview
> document, would you be willing to edit it into correctness?

We have actually gained quite a bit of documentation recently.
Have you looked at Documentation/DocBook/crypto-API.tmpl?

More is always welcome of course.

Cheers,
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2016-06-28 14:32:14

by George Spelvin

Subject: Re: Doing crypto in small stack buffers (bluetooth vs vmalloc-stack crash, etc)

> We have actually gained quite a bit of documentation recently.
> Have you looked at Documentation/DocBook/crypto-API.tmpl?
>
> More is always welcome of course.

It's improved since I last looked at it, but there are still many structures
that aren't described:

- struct crypto_instance
- struct crypto_spawn
- struct crypto_blkcipher
- struct blkcipher_desc
- More on the context structures returned by crypto_tfm_ctx

Also not mentioned in the documentation is that some algorithms *do*
have different implementations depending on key size. SHA-2 is the
classic example.

2016-06-29 02:21:19

by Herbert Xu

Subject: Re: Doing crypto in small stack buffers (bluetooth vs vmalloc-stack crash, etc)

On Tue, Jun 28, 2016 at 10:32:12AM -0400, George Spelvin wrote:
>
> - struct crypto_instance
> - struct crypto_spawn
> - struct crypto_blkcipher
> - struct blkcipher_desc
> - More on the context structures returned by crypto_tfm_ctx

blkcipher is obsolete and will be removed soon. So if you are
going to write this then please document skcipher instead.

> Also not mentioned in the documentation is that some algorithms *do*
> have different implementations depending on key size. SHA-2 is the
> classic example.

What do you mean by that? SHA has no keying at all.

Cheers,
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2016-06-29 12:10:24

by George Spelvin

Subject: Re: Doing crypto in small stack buffers (bluetooth vs vmalloc-stack crash, etc)

>> Also not mentioned in the documentation is that some algorithms *do*
>> have different implementations depending on key size. SHA-2 is the
>> classic example.

> What do you mean by that? SHA has no keying at all.

In this case, the analogous property is hash size. Sorry, I thought
that was so obvious I didn't need to say it.

Specifically, SHA2-256 (and -224) and SHA2-512 (and -384) are separate
algorithms with similar structures but separate implementations.
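
The split between the two SHA-2 families can be written down as a small parameter table. This is only an illustrative sketch: the struct, the table and the helper names are hypothetical, and the name strings are just labels chosen for the example. The figures themselves are the standard SHA-2 parameters: the 224/256 variants work on 32-bit words with 64 rounds over 64-byte blocks, while the 384/512 variants work on 64-bit words with 80 rounds over 128-byte blocks.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct sha2_params {
    const char *name;
    unsigned digest_bytes;  /* size of the final hash */
    unsigned block_bytes;   /* input consumed per compression call */
    unsigned word_bits;     /* width of the internal state words */
    unsigned rounds;        /* rounds per compression call */
};

const struct sha2_params sha2_table[] = {
    { "sha224", 28,  64, 32, 64 },
    { "sha256", 32,  64, 32, 64 },
    { "sha384", 48, 128, 64, 80 },
    { "sha512", 64, 128, 64, 80 },
};

/* Return the table entry for name, or NULL if it is not a SHA-2 variant. */
const struct sha2_params *sha2_lookup(const char *name)
{
    for (size_t i = 0; i < sizeof(sha2_table) / sizeof(sha2_table[0]); i++)
        if (strcmp(sha2_table[i].name, name) == 0)
            return &sha2_table[i];
    return NULL;
}

/* 1 if both variants can share one compression function, 0 otherwise. */
int sha2_same_core(const char *a, const char *b)
{
    const struct sha2_params *pa = sha2_lookup(a), *pb = sha2_lookup(b);

    if (!pa || !pb)
        return 0;
    return pa->word_bits == pb->word_bits &&
           pa->rounds == pb->rounds &&
           pa->block_bytes == pb->block_bytes;
}
```

Within a family the variants differ only in initial values and output truncation, which is why one implementation can serve both; across families the word width, round count and block size all change.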