2023-07-03 15:31:55

by Ondrej Mosnáček

[permalink] [raw]
Subject: Regression bisected to "crypto: af_alg: Convert af_alg_sendpage() to use MSG_SPLICE_PAGES"

Hello,

It seems that the commit:

commit fb800fa4c1f5aee1238267252e88a7837e645c02
Author: David Howells <[email protected]>
Date: Tue Jun 6 14:08:55 2023 +0100

crypto: af_alg: Convert af_alg_sendpage() to use MSG_SPLICE_PAGES

...causes a segfault in libkcapi's test suite. A simplified reproducer:

git clone https://github.com/smuellerDD/libkcapi/
cd libkcapi
autoreconf -i
./configure --enable-kcapi-test
make
./bin/kcapi -x 2 -s -c "gcm(aes)" -i aac774c39e399e7d6371ec1d \
-k 38bd9eb2cb29cc42ac38d6cdbe116760 \
-a 5dbb2884f3fe93664613e863c3bd2572855cf808765880ef1fa5787ceef8c793 \
-t 34a21cc3562f0ba141d73242e5a3b666 \
-q d127b39d365d16246d2866b2ebabd201

The last command prints the result and then segfaults in a cleanup
free(3) call. Before the mentioned commit it printed the result and
completed successfully. I haven't dug any deeper to figure out the
root cause.

Cheers,
Ondrej


2023-07-04 09:04:24

by David Howells

[permalink] [raw]
Subject: Re: Regression bisected to "crypto: af_alg: Convert af_alg_sendpage() to use MSG_SPLICE_PAGES"

One problem with libkcapi is that it's abusing vmsplice(). It must not use
vmsplice(SPLICE_F_GIFT) on a buffer that's in the heap. To quote the manual
page:

The user pages are a gift to the kernel. The application may
not modify this memory ever, otherwise the page cache and on-
disk data may differ. Gifting pages to the kernel means that a
subsequent splice(2) SPLICE_F_MOVE can successfully move the
pages; if this flag is not specified, then a subsequent
splice(2) SPLICE_F_MOVE must copy the pages. Data must also be
properly page aligned, both in memory and length.

Basically, this can destroy the integrity of the process's heap as the
allocator may have metadata there that then gets excised.

If I remove the flag, it still crashes, so that's not the only problem.

David


2023-07-04 09:11:26

by David Howells

[permalink] [raw]
Subject: Re: Regression bisected to "crypto: af_alg: Convert af_alg_sendpage() to use MSG_SPLICE_PAGES"

The attached makes the SEGV go away. Just removing SPLICE_F_GIFT isn't
sufficient.

David
---
diff --git a/lib/kcapi-kernel-if.c b/lib/kcapi-kernel-if.c
index b4d7f74..33e0337 100644
--- a/lib/kcapi-kernel-if.c
+++ b/lib/kcapi-kernel-if.c
@@ -321,7 +321,7 @@ ssize_t _kcapi_common_vmsplice_iov(struct kcapi_handle *handle,
if (ret)
return ret;

- ret = vmsplice(handle->pipes[1], iov, iovlen, SPLICE_F_GIFT|flags);
+ ret = writev(handle->pipes[1], iov, (int)iovlen);
if (ret < 0) {
ret = -errno;
kcapi_dolog(KCAPI_LOG_DEBUG,


2023-07-04 09:15:08

by Herbert Xu

[permalink] [raw]
Subject: Re: Regression bisected to "crypto: af_alg: Convert af_alg_sendpage() to use MSG_SPLICE_PAGES"

On Tue, Jul 04, 2023 at 09:50:37AM +0100, David Howells wrote:
> One problem with libkcapi is that it's abusing vmsplice(). It must not use
> vmsplice(SPLICE_F_GIFT) on a buffer that's in the heap. To quote the manual
> page:
>
> The user pages are a gift to the kernel. The application may
> not modify this memory ever, otherwise the page cache and on-
> disk data may differ. Gifting pages to the kernel means that a
> subsequent splice(2) SPLICE_F_MOVE can successfully move the
> pages; if this flag is not specified, then a subsequent
> splice(2) SPLICE_F_MOVE must copy the pages. Data must also be
> properly page aligned, both in memory and length.
>
> Basically, this can destroy the integrity of the process's heap as the
> allocator may have metadata there that then gets excised.

All it's saying is that if you modify the data after sending it off
via splice then the data that will be on the wire is undefined.

There is no reason why this should crash.

> If I remove the flag, it still crashes, so that's not the only problem.

If we can't fix this the patches should be reverted.

Thanks,
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2023-07-04 10:26:26

by David Howells

[permalink] [raw]
Subject: Re: Regression bisected to "crypto: af_alg: Convert af_alg_sendpage() to use MSG_SPLICE_PAGES"

Herbert Xu <[email protected]> wrote:

> All it's saying is that if you modify the data after sending it off
> via splice then the data that will be on the wire is undefined.

Er, no. It can literally remove the page from the process's VM and paste it
somewhere else - though in this case, that shouldn't happen. However, the
buffer passed to SPLICE_F_GIFT should also be page-aligned, which it might not
be because they used calloc().

There's no reason to use SPLICE_F_GIFT here. vmsplice() still attaches the

> There is no reason why this should crash.

Agreed. I'm still looking at it. Interestingly, the output comes out the
same, no matter whether vmsplice(), vmsplice() + SPLICE_F_GIFT or writev(), so
it looks like the buffers get to

> If we can't fix this the patches should be reverted.

I didn't change vmsplice() or the way pages are stored in the pipe.

And, note, there are also a bunch of GUP changes that could have an effect.

David


2023-07-04 14:47:08

by David Howells

[permalink] [raw]
Subject: Re: Regression bisected to "crypto: af_alg: Convert af_alg_sendpage() to use MSG_SPLICE_PAGES"

The problem is caused by this bit in af_alg_sendmsg():

/* use the existing memory in an allocated page */
if (ctx->merge) {
sgl = list_entry(ctx->tsgl_list.prev,
struct af_alg_tsgl, list);
sg = sgl->sg + sgl->cur - 1;
len = min_t(size_t, len,
PAGE_SIZE - sg->offset - sg->length);

err = memcpy_from_msg(page_address(sg_page(sg)) +
sg->offset + sg->length,
msg, len);
if (err)
goto unlock;

sg->length += len;
ctx->merge = (sg->offset + sg->length) &
(PAGE_SIZE - 1);

ctx->used += len;
copied += len;
size -= len;
continue;
}

it doesn't exist in the old af_alg_sendpage() code. It merges data supplied
by sendmsg() into the last page in the list if there's space. So we need a
flag to keep track of whether the last page is appendable-to or not.

David


2023-07-04 16:02:06

by David Howells

[permalink] [raw]
Subject: Re: Regression bisected to "crypto: af_alg: Convert af_alg_sendpage() to use MSG_SPLICE_PAGES"

Here's a smaller test program. If it works correctly, it will print:

000: 5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a
020: 5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a
...

With the merging problem, it will print:

000: 5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a
020: 5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a
040: 6162636465666768696a6b6c6d6e6f707172737475767778797a313233343536
060: 4142434445464748494a4b4c4d4e4f505152535455565758595a313233343536
...

You can see the data from the send() ("abcd...") got appended to the page
passed by vmsplice() ("5a" x 64).

Arguably, if vmsplice() is given SPLICE_F_GIFT, AF_ALG would be within its
rights to remove the page from the caller's address space (as fuse will do
with SPLICE_F_MOVE), attach it to its scatterlist and append data to it. In
such a case, however, the calling process should no longer be able to access
the page - and in the case of libkcapi, the heap would be corrupted.

David
---
// SPDX-License-Identifier: GPL-2.0-or-later
/* AF_ALG vmsplice test
*
* Copyright (C) 2023 Red Hat, Inc. All Rights Reserved.
* Written by David Howells ([email protected])
*/

#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>
#include <string.h>
#include <limits.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/mman.h>
#include <linux/if_alg.h>

#define OSERROR(X, Y) \
do { if ((long)(X) == -1) { perror(Y); exit(1); } } while (0)
#define min(x, y) ((x) < (y) ? (x) : (y))

static unsigned char buffer[4096 * 32] __attribute__((aligned(4096)));
static unsigned char iv[16];
static unsigned char key[16];

static const struct sockaddr_alg sa = {
.salg_family = AF_ALG,
.salg_type = "skcipher",
.salg_name = "cbc(aes)",
};

static void algif_add_set_op(struct msghdr *msg, unsigned int op)
{
struct cmsghdr *__cmsg;

__cmsg = msg->msg_control + msg->msg_controllen;
__cmsg->cmsg_len = CMSG_LEN(sizeof(unsigned int));
__cmsg->cmsg_level = SOL_ALG;
__cmsg->cmsg_type = ALG_SET_OP;
*(unsigned int *)CMSG_DATA(__cmsg) = op;
msg->msg_controllen += CMSG_ALIGN(__cmsg->cmsg_len);
}

static void algif_add_set_iv(struct msghdr *msg, const void *iv, size_t ivlen)
{
struct af_alg_iv *ivbuf;
struct cmsghdr *__cmsg;

__cmsg = msg->msg_control + msg->msg_controllen;
__cmsg->cmsg_len = CMSG_LEN(sizeof(*ivbuf) + ivlen);
__cmsg->cmsg_level = SOL_ALG;
__cmsg->cmsg_type = ALG_SET_IV;
ivbuf = (struct af_alg_iv *)CMSG_DATA(__cmsg);
ivbuf->ivlen = ivlen;
memcpy(ivbuf->iv, iv, ivlen);
msg->msg_controllen += CMSG_ALIGN(__cmsg->cmsg_len);
}

void check(const unsigned char *p)
{
unsigned int i, j;
int blank = 0;

for (i = 0; i < 4096; i += 32) {
for (j = 0; j < 32; j++)
if (p[i + j])
break;
if (j == 32) {
if (!blank) {
printf("...\n");
blank = 1;
}
continue;
}

printf("%03x: ", i);
for (j = 0; j < 32; j++)
printf("%02x", p[i + j]);
printf("\n");
blank = 0;
}
}

int main(int argc, char *argv[])
{
struct iovec iov;
struct msghdr msg;
unsigned char ctrl[4096];
ssize_t ret;
size_t total = 0, i, out = 160;
unsigned char *buf;
int alg, sock, pfd[2];

alg = socket(AF_ALG, SOCK_SEQPACKET, 0);
OSERROR(alg, "AF_ALG");
OSERROR(bind(alg, (struct sockaddr *)&sa, sizeof(sa)), "bind");
OSERROR(setsockopt(alg, SOL_ALG, ALG_SET_KEY, key, sizeof(key)),
"ALG_SET_KEY");
sock = accept(alg, NULL, 0);
OSERROR(sock, "accept");

memset(&msg, 0, sizeof(msg));
msg.msg_control = ctrl;
algif_add_set_op(&msg, ALG_OP_ENCRYPT);
algif_add_set_iv(&msg, iv, sizeof(iv));

OSERROR(sendmsg(sock, &msg, MSG_MORE), "sock/sendmsg");

OSERROR(pipe(pfd), "pipe");

buf = mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_ANON, -1, 0);
OSERROR(buf, "mmap");

memset(buf, 0x5a, 64);
iov.iov_base = buf;
iov.iov_len = 64;
OSERROR(vmsplice(pfd[1], &iov, 1, SPLICE_F_MORE), "vmsplice");
OSERROR(splice(pfd[0], NULL, sock, NULL, 64, SPLICE_F_MORE), "splice");

OSERROR(send(sock, "abcdefghijklmnopqrstuvwxyz123456ABCDEFGHIJKLMNOPQRSTUVWXYZ123456",
64, 0), "sock/send");

while (total > 0) {
ret = read(sock, buffer, min(sizeof(buffer), total));
OSERROR(ret, "sock/read");
if (ret == 0)
break;
total -= ret;

if (out > 0) {
ret = min(out, ret);
out -= ret;
for (i = 0; i < ret; i++)
printf("%02x", (unsigned char)buffer[i]);
}
printf("...\n");
}

check(buf);

OSERROR(close(sock), "sock/close");
OSERROR(close(alg), "alg/close");
OSERROR(close(pfd[0]), "close/p0");
OSERROR(close(pfd[1]), "close/p1");
return 0;
}