2023-10-07 17:29:26

by Arseniy Krasnov

Subject: [PATCH net-next v3 00/12] vsock/virtio: continue MSG_ZEROCOPY support

Hello,

this patchset contains the second and third parts of a larger patchset
that adds MSG_ZEROCOPY flag support:
https://lore.kernel.org/netdev/[email protected]/

During review of that series, Stefano Garzarella <[email protected]>
suggested splitting it into three parts to simplify review and merging:

1) virtio and vhost updates (for fragged skbs) (merged to net-next, see
link below)
2) AF_VSOCK updates (allow enabling MSG_ZEROCOPY mode and reading
tx completions) and an update for Documentation/. <-- this patchset
3) Updates for tests and utils. <-- this patchset

Part 1) was merged:
https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=71b263e79370348349553ecdf46f4a69eb436dc7

Head for this patchset is:
https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=91e43ca0090b5fd59302c3d150835299785f30ea

Link to v1:
https://lore.kernel.org/netdev/[email protected]/
Link to v2:
https://lore.kernel.org/netdev/[email protected]/

Changelog:
v1 -> v2:
* Patchset rebased and tested on new HEAD of net-next (see hash above).
* See per-patch changelog after ---.
v2 -> v3:
* Patchset rebased and tested on new HEAD of net-next (see hash above).
* See per-patch changelog after ---.

Arseniy Krasnov (12):
vsock: set EPOLLERR on non-empty error queue
vsock: read from socket's error queue
vsock: check for MSG_ZEROCOPY support on send
vsock: enable SOCK_SUPPORT_ZC bit
vhost/vsock: support MSG_ZEROCOPY for transport
vsock/virtio: support MSG_ZEROCOPY for transport
vsock/loopback: support MSG_ZEROCOPY for transport
vsock: enable setting SO_ZEROCOPY
docs: net: description of MSG_ZEROCOPY for AF_VSOCK
test/vsock: MSG_ZEROCOPY flag tests
test/vsock: MSG_ZEROCOPY support for vsock_perf
test/vsock: io_uring rx/tx tests

Documentation/networking/msg_zerocopy.rst | 13 +-
drivers/vhost/vsock.c | 7 +
include/linux/socket.h | 1 +
include/net/af_vsock.h | 7 +
include/uapi/linux/vm_sockets.h | 12 +
net/vmw_vsock/af_vsock.c | 63 +++-
net/vmw_vsock/virtio_transport.c | 7 +
net/vmw_vsock/vsock_loopback.c | 6 +
tools/testing/vsock/.gitignore | 1 +
tools/testing/vsock/Makefile | 9 +-
tools/testing/vsock/msg_zerocopy_common.h | 92 ++++++
tools/testing/vsock/util.c | 110 +++++++
tools/testing/vsock/util.h | 5 +
tools/testing/vsock/vsock_perf.c | 80 ++++-
tools/testing/vsock/vsock_test.c | 16 +
tools/testing/vsock/vsock_test_zerocopy.c | 367 ++++++++++++++++++++++
tools/testing/vsock/vsock_test_zerocopy.h | 15 +
tools/testing/vsock/vsock_uring_test.c | 350 +++++++++++++++++++++
18 files changed, 1145 insertions(+), 16 deletions(-)
create mode 100644 tools/testing/vsock/msg_zerocopy_common.h
create mode 100644 tools/testing/vsock/vsock_test_zerocopy.c
create mode 100644 tools/testing/vsock/vsock_test_zerocopy.h
create mode 100644 tools/testing/vsock/vsock_uring_test.c

--
2.25.1


2023-10-07 17:29:29

by Arseniy Krasnov

Subject: [PATCH net-next v3 07/12] vsock/loopback: support MSG_ZEROCOPY for transport

Add 'msgzerocopy_allow()' callback for loopback transport.

Signed-off-by: Arseniy Krasnov <[email protected]>
Reviewed-by: Stefano Garzarella <[email protected]>
---
net/vmw_vsock/vsock_loopback.c | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/net/vmw_vsock/vsock_loopback.c b/net/vmw_vsock/vsock_loopback.c
index 5c6360df1f31..048640167411 100644
--- a/net/vmw_vsock/vsock_loopback.c
+++ b/net/vmw_vsock/vsock_loopback.c
@@ -47,6 +47,10 @@ static int vsock_loopback_cancel_pkt(struct vsock_sock *vsk)
}

static bool vsock_loopback_seqpacket_allow(u32 remote_cid);
+static bool vsock_loopback_msgzerocopy_allow(void)
+{
+ return true;
+}

static struct virtio_transport loopback_transport = {
.transport = {
@@ -79,6 +83,8 @@ static struct virtio_transport loopback_transport = {
.seqpacket_allow = vsock_loopback_seqpacket_allow,
.seqpacket_has_data = virtio_transport_seqpacket_has_data,

+ .msgzerocopy_allow = vsock_loopback_msgzerocopy_allow,
+
.notify_poll_in = virtio_transport_notify_poll_in,
.notify_poll_out = virtio_transport_notify_poll_out,
.notify_recv_init = virtio_transport_notify_recv_init,
--
2.25.1

2023-10-07 17:29:33

by Arseniy Krasnov

Subject: [PATCH net-next v3 05/12] vhost/vsock: support MSG_ZEROCOPY for transport

Add 'msgzerocopy_allow()' callback for vhost transport.

Signed-off-by: Arseniy Krasnov <[email protected]>
Reviewed-by: Stefano Garzarella <[email protected]>
---
drivers/vhost/vsock.c | 7 +++++++
1 file changed, 7 insertions(+)

diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index 83711aad855c..f75731396b7e 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -398,6 +398,11 @@ static bool vhost_vsock_more_replies(struct vhost_vsock *vsock)
return val < vq->num;
}

+static bool vhost_transport_msgzerocopy_allow(void)
+{
+ return true;
+}
+
static bool vhost_transport_seqpacket_allow(u32 remote_cid);

static struct virtio_transport vhost_transport = {
@@ -431,6 +436,8 @@ static struct virtio_transport vhost_transport = {
.seqpacket_allow = vhost_transport_seqpacket_allow,
.seqpacket_has_data = virtio_transport_seqpacket_has_data,

+ .msgzerocopy_allow = vhost_transport_msgzerocopy_allow,
+
.notify_poll_in = virtio_transport_notify_poll_in,
.notify_poll_out = virtio_transport_notify_poll_out,
.notify_recv_init = virtio_transport_notify_recv_init,
--
2.25.1
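
Note on patches 05 and 07: both callbacks unconditionally return true. How
the vsock core consults them is outside these two patches, so the sketch
below is only a userspace model. It assumes that a transport which leaves
the callback unset is treated as not supporting MSG_ZEROCOPY. The names
'struct toy_transport', 'toy_always_allow()' and 'toy_msgzerocopy_allowed()'
are hypothetical and exist only for illustration; they are not kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a transport's callback table. */
struct toy_transport {
	bool (*msgzerocopy_allow)(void);
};

/* Same shape as the callbacks added for the loopback and vhost transports. */
static bool toy_always_allow(void)
{
	return true;
}

/* Assumption: a missing callback means "no MSG_ZEROCOPY support". */
static bool toy_msgzerocopy_allowed(const struct toy_transport *t)
{
	assert(t != NULL);

	return t->msgzerocopy_allow && t->msgzerocopy_allow();
}
```

With an optional callback like this, a transport opts in by filling the
field, and transports that do nothing keep rejecting the flag.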

2023-10-07 17:29:42

by Arseniy Krasnov

Subject: [PATCH net-next v3 01/12] vsock: set EPOLLERR on non-empty error queue

If the socket's error queue is not empty, EPOLLERR must be set. Otherwise,
a reader of the error queue cannot use the EPOLLERR bit to detect that it
holds data. Currently, for AF_VSOCK this is relevant only with
MSG_ZEROCOPY, as this feature is the only user of the socket's error queue.

Signed-off-by: Arseniy Krasnov <[email protected]>
Reviewed-by: Stefano Garzarella <[email protected]>
---
net/vmw_vsock/af_vsock.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index 013b65241b65..d841f4de33b0 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -1030,7 +1030,7 @@ static __poll_t vsock_poll(struct file *file, struct socket *sock,
poll_wait(file, sk_sleep(sk), wait);
mask = 0;

- if (sk->sk_err)
+ if (sk->sk_err || !skb_queue_empty_lockless(&sk->sk_error_queue))
/* Signify that there has been an error on this socket. */
mask |= EPOLLERR;

--
2.25.1

2023-10-07 17:30:00

by Arseniy Krasnov

Subject: [PATCH net-next v3 11/12] test/vsock: MSG_ZEROCOPY support for vsock_perf

To use this option, pass the '--zerocopy' parameter:

./vsock_perf --zerocopy --sender <cid> ...

With this option, the MSG_ZEROCOPY flag will be passed to the 'send()' call.

Signed-off-by: Arseniy Krasnov <[email protected]>
---
Changelog:
v1 -> v2:
* Move 'SOL_VSOCK' and 'VSOCK_RECVERR' from 'util.c' to 'util.h'.
v2 -> v3:
* Use 'msg_zerocopy_common.h' for MSG_ZEROCOPY related things.
* Rename '--zc' option to '--zerocopy'.
* Add detail in help that zerocopy mode is for sender mode only.

tools/testing/vsock/vsock_perf.c | 80 ++++++++++++++++++++++++++++----
1 file changed, 71 insertions(+), 9 deletions(-)

diff --git a/tools/testing/vsock/vsock_perf.c b/tools/testing/vsock/vsock_perf.c
index a72520338f84..4e8578f815e0 100644
--- a/tools/testing/vsock/vsock_perf.c
+++ b/tools/testing/vsock/vsock_perf.c
@@ -18,6 +18,9 @@
#include <poll.h>
#include <sys/socket.h>
#include <linux/vm_sockets.h>
+#include <sys/mman.h>
+
+#include "msg_zerocopy_common.h"

#define DEFAULT_BUF_SIZE_BYTES (128 * 1024)
#define DEFAULT_TO_SEND_BYTES (64 * 1024)
@@ -31,6 +34,7 @@
static unsigned int port = DEFAULT_PORT;
static unsigned long buf_size_bytes = DEFAULT_BUF_SIZE_BYTES;
static unsigned long vsock_buf_bytes = DEFAULT_VSOCK_BUF_BYTES;
+static bool zerocopy;

static void error(const char *s)
{
@@ -252,10 +256,15 @@ static void run_sender(int peer_cid, unsigned long to_send_bytes)
time_t tx_begin_ns;
time_t tx_total_ns;
size_t total_send;
+ time_t time_in_send;
void *data;
int fd;

- printf("Run as sender\n");
+ if (zerocopy)
+ printf("Run as sender MSG_ZEROCOPY\n");
+ else
+ printf("Run as sender\n");
+
printf("Connect to %i:%u\n", peer_cid, port);
printf("Send %lu bytes\n", to_send_bytes);
printf("TX buffer %lu bytes\n", buf_size_bytes);
@@ -265,38 +274,82 @@ static void run_sender(int peer_cid, unsigned long to_send_bytes)
if (fd < 0)
exit(EXIT_FAILURE);

- data = malloc(buf_size_bytes);
+ if (zerocopy) {
+ enable_so_zerocopy(fd);

- if (!data) {
- fprintf(stderr, "'malloc()' failed\n");
- exit(EXIT_FAILURE);
+ data = mmap(NULL, buf_size_bytes, PROT_READ | PROT_WRITE,
+ MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+ if (data == MAP_FAILED) {
+ perror("mmap");
+ exit(EXIT_FAILURE);
+ }
+ } else {
+ data = malloc(buf_size_bytes);
+
+ if (!data) {
+ fprintf(stderr, "'malloc()' failed\n");
+ exit(EXIT_FAILURE);
+ }
}

memset(data, 0, buf_size_bytes);
total_send = 0;
+ time_in_send = 0;
tx_begin_ns = current_nsec();

while (total_send < to_send_bytes) {
ssize_t sent;
+ size_t rest_bytes;
+ time_t before;

- sent = write(fd, data, buf_size_bytes);
+ rest_bytes = to_send_bytes - total_send;
+
+ before = current_nsec();
+ sent = send(fd, data, (rest_bytes > buf_size_bytes) ?
+ buf_size_bytes : rest_bytes,
+ zerocopy ? MSG_ZEROCOPY : 0);
+ time_in_send += (current_nsec() - before);

if (sent <= 0)
error("write");

total_send += sent;
+
+ if (zerocopy) {
+ struct pollfd fds = { 0 };
+
+ fds.fd = fd;
+
+ if (poll(&fds, 1, -1) < 0) {
+ perror("poll");
+ exit(EXIT_FAILURE);
+ }
+
+ if (!(fds.revents & POLLERR)) {
+ fprintf(stderr, "POLLERR expected\n");
+ exit(EXIT_FAILURE);
+ }
+
+ vsock_recv_completion(fd, NULL);
+ }
}

tx_total_ns = current_nsec() - tx_begin_ns;

printf("total bytes sent: %zu\n", total_send);
printf("tx performance: %f Gbits/s\n",
- get_gbps(total_send * 8, tx_total_ns));
- printf("total time in 'write()': %f sec\n",
+ get_gbps(total_send * 8, time_in_send));
+ printf("total time in tx loop: %f sec\n",
(float)tx_total_ns / NSEC_PER_SEC);
+ printf("time in 'send()': %f sec\n",
+ (float)time_in_send / NSEC_PER_SEC);

close(fd);
- free(data);
+
+ if (zerocopy)
+ munmap(data, buf_size_bytes);
+ else
+ free(data);
}

static const char optstring[] = "";
@@ -336,6 +389,11 @@ static const struct option longopts[] = {
.has_arg = required_argument,
.val = 'R',
},
+ {
+ .name = "zerocopy",
+ .has_arg = no_argument,
+ .val = 'Z',
+ },
{},
};

@@ -351,6 +409,7 @@ static void usage(void)
" --help This message\n"
" --sender <cid> Sender mode (receiver default)\n"
" <cid> of the receiver to connect to\n"
+ " --zerocopy Enable zerocopy (for sender mode only)\n"
" --port <port> Port (default %d)\n"
" --bytes <bytes>KMG Bytes to send (default %d)\n"
" --buf-size <bytes>KMG Data buffer size (default %d). In sender mode\n"
@@ -413,6 +472,9 @@ int main(int argc, char **argv)
case 'H': /* Help. */
usage();
break;
+ case 'Z': /* Zerocopy. */
+ zerocopy = true;
+ break;
default:
usage();
}
--
2.25.1
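
For reference, the two calculations in the reworked tx loop can be isolated
as below. The first mirrors the ternary used to size each 'send()' call. The
second shows how a Gbit/s figure follows from a bit count and a time in
nanoseconds, which is why the patch reports time spent in 'send()'
separately from total loop time (the loop also waits for completions).
'next_chunk_len()' and 'gbits_per_sec()' are hypothetical helper names, and
decimal gigabits (10^9 bits) are assumed.

```c
#include <assert.h>
#include <stddef.h>

/* Length of the next send(): a full buffer, or whatever is left. */
static size_t next_chunk_len(size_t to_send, size_t already_sent,
			     size_t buf_size)
{
	size_t rest;

	assert(already_sent <= to_send);

	rest = to_send - already_sent;
	return rest > buf_size ? buf_size : rest;
}

/* Throughput in Gbit/s for 'bits' transferred in 'ns' nanoseconds. */
static double gbits_per_sec(unsigned long long bits, unsigned long long ns)
{
	if (ns == 0)
		return 0.0;

	/* (bits / 1e9) / (ns / 1e9) reduces to bits / ns. */
	return (double)bits / (double)ns;
}
```

Capping the last chunk means the sender transmits exactly the requested
number of bytes instead of rounding up to a whole buffer.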

2023-10-07 17:30:01

by Arseniy Krasnov

Subject: [PATCH net-next v3 12/12] test/vsock: io_uring rx/tx tests

This adds a set of tests which use io_uring for rx/tx. The test suite is
implemented as a separate utility, like 'vsock_test', and takes the same
input arguments as 'vsock_test'. The tests cover only data transmission
(no connect/bind/accept etc.).

Signed-off-by: Arseniy Krasnov <[email protected]>
---
Changelog:
v1 -> v2:
* Add 'LDLIBS = -luring' to the target 'vsock_uring_test'.
* Add 'vsock_uring_test' to the target 'test'.
v2 -> v3:
* Make 'struct vsock_test_data' private by moving it into the .c file.
Rename it and add comments to this struct to clarify the meaning of its
fields.
* Add 'vsock_uring_test' to the '.gitignore'.
* Add a receive loop to the server side; it is needed to read all of the
data sent by the client.

tools/testing/vsock/.gitignore | 1 +
tools/testing/vsock/Makefile | 7 +-
tools/testing/vsock/vsock_uring_test.c | 350 +++++++++++++++++++++++++
3 files changed, 356 insertions(+), 2 deletions(-)
create mode 100644 tools/testing/vsock/vsock_uring_test.c

diff --git a/tools/testing/vsock/.gitignore b/tools/testing/vsock/.gitignore
index a8adcfdc292b..d9f798713cd7 100644
--- a/tools/testing/vsock/.gitignore
+++ b/tools/testing/vsock/.gitignore
@@ -3,3 +3,4 @@
vsock_test
vsock_diag_test
vsock_perf
+vsock_uring_test
diff --git a/tools/testing/vsock/Makefile b/tools/testing/vsock/Makefile
index 1a26f60a596c..b80e7c7def1e 100644
--- a/tools/testing/vsock/Makefile
+++ b/tools/testing/vsock/Makefile
@@ -1,12 +1,15 @@
# SPDX-License-Identifier: GPL-2.0-only
all: test vsock_perf
-test: vsock_test vsock_diag_test
+test: vsock_test vsock_diag_test vsock_uring_test
vsock_test: vsock_test.o vsock_test_zerocopy.o timeout.o control.o util.o
vsock_diag_test: vsock_diag_test.o timeout.o control.o util.o
vsock_perf: vsock_perf.o

+vsock_uring_test: LDLIBS = -luring
+vsock_uring_test: control.o util.o vsock_uring_test.o timeout.o
+
CFLAGS += -g -O2 -Werror -Wall -I. -I../../include -I../../../usr/include -Wno-pointer-sign -fno-strict-overflow -fno-strict-aliasing -fno-common -MMD -U_FORTIFY_SOURCE -D_GNU_SOURCE
.PHONY: all test clean
clean:
- ${RM} *.o *.d vsock_test vsock_diag_test vsock_perf
+ ${RM} *.o *.d vsock_test vsock_diag_test vsock_perf vsock_uring_test
-include *.d
diff --git a/tools/testing/vsock/vsock_uring_test.c b/tools/testing/vsock/vsock_uring_test.c
new file mode 100644
index 000000000000..889887cf3989
--- /dev/null
+++ b/tools/testing/vsock/vsock_uring_test.c
@@ -0,0 +1,350 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* io_uring tests for vsock
+ *
+ * Copyright (C) 2023 SberDevices.
+ *
+ * Author: Arseniy Krasnov <[email protected]>
+ */
+
+#include <getopt.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <liburing.h>
+#include <unistd.h>
+#include <sys/mman.h>
+#include <linux/kernel.h>
+#include <error.h>
+
+#include "util.h"
+#include "control.h"
+#include "msg_zerocopy_common.h"
+
+#define PAGE_SIZE 4096
+#define RING_ENTRIES_NUM 4
+
+#define VSOCK_TEST_DATA_MAX_IOV 3
+
+struct vsock_io_uring_test {
+ /* Number of valid elements in 'vecs'. */
+ int vecs_cnt;
+ /* Array describing how to allocate buffers for the test.
+ * 'iov_base' == NULL -> valid buf: mmap('iov_len').
+ *
+ * 'iov_base' == MAP_FAILED -> invalid buf:
+ * mmap('iov_len'), then munmap('iov_len').
+ * 'iov_base' still contains result of
+ * mmap().
+ *
+ * 'iov_base' == number -> unaligned valid buf:
+ * mmap('iov_len') + number.
+ */
+ struct iovec vecs[VSOCK_TEST_DATA_MAX_IOV];
+};
+
+static struct vsock_io_uring_test test_data_array[] = {
+ /* All elements have page aligned base and size. */
+ {
+ .vecs_cnt = 3,
+ {
+ { NULL, PAGE_SIZE },
+ { NULL, 2 * PAGE_SIZE },
+ { NULL, 3 * PAGE_SIZE },
+ }
+ },
+ /* Middle element has both non-page aligned base and size. */
+ {
+ .vecs_cnt = 3,
+ {
+ { NULL, PAGE_SIZE },
+ { (void *)1, 200 },
+ { NULL, 3 * PAGE_SIZE },
+ }
+ }
+};
+
+static void vsock_io_uring_client(const struct test_opts *opts,
+ const struct vsock_io_uring_test *test_data,
+ bool msg_zerocopy)
+{
+ struct io_uring_sqe *sqe;
+ struct io_uring_cqe *cqe;
+ struct io_uring ring;
+ struct iovec *iovec;
+ struct msghdr msg;
+ int fd;
+
+ fd = vsock_stream_connect(opts->peer_cid, 1234);
+ if (fd < 0) {
+ perror("connect");
+ exit(EXIT_FAILURE);
+ }
+
+ if (msg_zerocopy)
+ enable_so_zerocopy(fd);
+
+ iovec = iovec_from_test_data(test_data->vecs, test_data->vecs_cnt);
+
+ if (io_uring_queue_init(RING_ENTRIES_NUM, &ring, 0))
+ error(1, errno, "io_uring_queue_init");
+
+ if (io_uring_register_buffers(&ring, iovec, test_data->vecs_cnt))
+ error(1, errno, "io_uring_register_buffers");
+
+ memset(&msg, 0, sizeof(msg));
+ msg.msg_iov = iovec;
+ msg.msg_iovlen = test_data->vecs_cnt;
+ sqe = io_uring_get_sqe(&ring);
+
+ if (msg_zerocopy)
+ io_uring_prep_sendmsg_zc(sqe, fd, &msg, 0);
+ else
+ io_uring_prep_sendmsg(sqe, fd, &msg, 0);
+
+ if (io_uring_submit(&ring) != 1)
+ error(1, errno, "io_uring_submit");
+
+ if (io_uring_wait_cqe(&ring, &cqe))
+ error(1, errno, "io_uring_wait_cqe");
+
+ io_uring_cqe_seen(&ring, cqe);
+
+ control_writeulong(iovec_hash_djb2(iovec, test_data->vecs_cnt));
+
+ control_writeln("DONE");
+ io_uring_queue_exit(&ring);
+ free_iovec_test_data(test_data->vecs, iovec, test_data->vecs_cnt);
+ close(fd);
+}
+
+static void vsock_io_uring_server(const struct test_opts *opts,
+ const struct vsock_io_uring_test *test_data)
+{
+ unsigned long remote_hash;
+ unsigned long local_hash;
+ struct io_uring ring;
+ size_t data_len;
+ size_t recv_len;
+ void *data;
+ int fd;
+
+ fd = vsock_stream_accept(VMADDR_CID_ANY, 1234, NULL);
+ if (fd < 0) {
+ perror("accept");
+ exit(EXIT_FAILURE);
+ }
+
+ data_len = iovec_bytes(test_data->vecs, test_data->vecs_cnt);
+
+ data = malloc(data_len);
+ if (!data) {
+ perror("malloc");
+ exit(EXIT_FAILURE);
+ }
+
+ if (io_uring_queue_init(RING_ENTRIES_NUM, &ring, 0))
+ error(1, errno, "io_uring_queue_init");
+
+ recv_len = 0;
+
+ while (recv_len < data_len) {
+ struct io_uring_sqe *sqe;
+ struct io_uring_cqe *cqe;
+ struct iovec iovec;
+
+ sqe = io_uring_get_sqe(&ring);
+ iovec.iov_base = data + recv_len;
+ iovec.iov_len = data_len - recv_len;
+
+ io_uring_prep_readv(sqe, fd, &iovec, 1, 0);
+
+ if (io_uring_submit(&ring) != 1)
+ error(1, errno, "io_uring_submit");
+
+ if (io_uring_wait_cqe(&ring, &cqe))
+ error(1, errno, "io_uring_wait_cqe");
+
+ recv_len += cqe->res;
+ io_uring_cqe_seen(&ring, cqe);
+ }
+
+ if (recv_len != data_len) {
+ fprintf(stderr, "expected %zu, got %zu\n", data_len,
+ recv_len);
+ exit(EXIT_FAILURE);
+ }
+
+ local_hash = hash_djb2(data, data_len);
+
+ remote_hash = control_readulong();
+ if (remote_hash != local_hash) {
+ fprintf(stderr, "hash mismatch\n");
+ exit(EXIT_FAILURE);
+ }
+
+ control_expectln("DONE");
+ io_uring_queue_exit(&ring);
+ free(data);
+}
+
+void test_stream_uring_server(const struct test_opts *opts)
+{
+ int i;
+
+ for (i = 0; i < ARRAY_SIZE(test_data_array); i++)
+ vsock_io_uring_server(opts, &test_data_array[i]);
+}
+
+void test_stream_uring_client(const struct test_opts *opts)
+{
+ int i;
+
+ for (i = 0; i < ARRAY_SIZE(test_data_array); i++)
+ vsock_io_uring_client(opts, &test_data_array[i], false);
+}
+
+void test_stream_uring_msg_zc_server(const struct test_opts *opts)
+{
+ int i;
+
+ for (i = 0; i < ARRAY_SIZE(test_data_array); i++)
+ vsock_io_uring_server(opts, &test_data_array[i]);
+}
+
+void test_stream_uring_msg_zc_client(const struct test_opts *opts)
+{
+ int i;
+
+ for (i = 0; i < ARRAY_SIZE(test_data_array); i++)
+ vsock_io_uring_client(opts, &test_data_array[i], true);
+}
+
+static struct test_case test_cases[] = {
+ {
+ .name = "SOCK_STREAM io_uring test",
+ .run_server = test_stream_uring_server,
+ .run_client = test_stream_uring_client,
+ },
+ {
+ .name = "SOCK_STREAM io_uring MSG_ZEROCOPY test",
+ .run_server = test_stream_uring_msg_zc_server,
+ .run_client = test_stream_uring_msg_zc_client,
+ },
+ {},
+};
+
+static const char optstring[] = "";
+static const struct option longopts[] = {
+ {
+ .name = "control-host",
+ .has_arg = required_argument,
+ .val = 'H',
+ },
+ {
+ .name = "control-port",
+ .has_arg = required_argument,
+ .val = 'P',
+ },
+ {
+ .name = "mode",
+ .has_arg = required_argument,
+ .val = 'm',
+ },
+ {
+ .name = "peer-cid",
+ .has_arg = required_argument,
+ .val = 'p',
+ },
+ {
+ .name = "help",
+ .has_arg = no_argument,
+ .val = '?',
+ },
+ {},
+};
+
+static void usage(void)
+{
+ fprintf(stderr, "Usage: vsock_uring_test [--help] [--control-host=<host>] --control-port=<port> --mode=client|server --peer-cid=<cid>\n"
+ "\n"
+ " Server: vsock_uring_test --control-port=1234 --mode=server --peer-cid=3\n"
+ " Client: vsock_uring_test --control-host=192.168.0.1 --control-port=1234 --mode=client --peer-cid=2\n"
+ "\n"
+ "Run transmission tests using io_uring. Usage is the same as\n"
+ "in ./vsock_test\n"
+ "\n"
+ "Options:\n"
+ " --help This help message\n"
+ " --control-host <host> Server IP address to connect to\n"
+ " --control-port <port> Server port to listen on/connect to\n"
+ " --mode client|server Server or client mode\n"
+ " --peer-cid <cid> CID of the other side\n"
+ );
+ exit(EXIT_FAILURE);
+}
+
+int main(int argc, char **argv)
+{
+ const char *control_host = NULL;
+ const char *control_port = NULL;
+ struct test_opts opts = {
+ .mode = TEST_MODE_UNSET,
+ .peer_cid = VMADDR_CID_ANY,
+ };
+
+ init_signals();
+
+ for (;;) {
+ int opt = getopt_long(argc, argv, optstring, longopts, NULL);
+
+ if (opt == -1)
+ break;
+
+ switch (opt) {
+ case 'H':
+ control_host = optarg;
+ break;
+ case 'm':
+ if (strcmp(optarg, "client") == 0) {
+ opts.mode = TEST_MODE_CLIENT;
+ } else if (strcmp(optarg, "server") == 0) {
+ opts.mode = TEST_MODE_SERVER;
+ } else {
+ fprintf(stderr, "--mode must be \"client\" or \"server\"\n");
+ return EXIT_FAILURE;
+ }
+ break;
+ case 'p':
+ opts.peer_cid = parse_cid(optarg);
+ break;
+ case 'P':
+ control_port = optarg;
+ break;
+ case '?':
+ default:
+ usage();
+ }
+ }
+
+ if (!control_port)
+ usage();
+ if (opts.mode == TEST_MODE_UNSET)
+ usage();
+ if (opts.peer_cid == VMADDR_CID_ANY)
+ usage();
+
+ if (!control_host) {
+ if (opts.mode != TEST_MODE_SERVER)
+ usage();
+ control_host = "0.0.0.0";
+ }
+
+ control_init(control_host, control_port,
+ opts.mode == TEST_MODE_SERVER);
+
+ run_tests(test_cases, &opts);
+
+ control_cleanup();
+
+ return 0;
+}
--
2.25.1
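
The server-side receive loop added in v3 keeps submitting reads until all
expected bytes have arrived. The same pattern, written with plain 'read()'
instead of io_uring so that it has no liburing dependency, is sketched
below. 'read_full()' is a hypothetical helper and not part of the test
suite. Each iteration asks only for the bytes that are still missing.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Read exactly 'len' bytes from 'fd' into 'buf'. Returns 0 on success,
 * -1 on error or if the peer closes before 'len' bytes arrive.
 */
static int read_full(int fd, void *buf, size_t len)
{
	size_t done = 0;

	assert(buf != NULL);

	while (done < len) {
		/* Request only the remaining part of the buffer. */
		ssize_t n = read(fd, (char *)buf + done, len - done);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}

		if (n == 0)
			return -1;

		done += (size_t)n;
	}

	return 0;
}
```

A stream socket may deliver the payload in several pieces, so a single read
is not enough to guarantee that the whole message has been received.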

2023-10-07 17:30:02

by Arseniy Krasnov

Subject: [PATCH net-next v3 10/12] test/vsock: MSG_ZEROCOPY flag tests

This adds three tests for the MSG_ZEROCOPY feature:
1) SOCK_STREAM tx with different buffers.
2) SOCK_SEQPACKET tx with different buffers.
3) SOCK_STREAM test to read an empty error queue of the socket.

This patch also prepares for the next tool patches in this patchset,
vsock_perf and vsock_uring_test:
1) Adds several new functions to util.c - they will also be used by
vsock_uring_test.
2) Adds two new functions for MSG_ZEROCOPY handling to a new header
file - this header will be shared between vsock_test, vsock_perf and
vsock_uring_test, thus avoiding code copy-pasting.

Signed-off-by: Arseniy Krasnov <[email protected]>
---
Changelog:
v1 -> v2:
* Move 'SOL_VSOCK' and 'VSOCK_RECVERR' from 'util.c' to 'util.h'.
v2 -> v3:
* Patch was reworked. Now it is also a preparation patch (see commit
message). Shared stuff for 'vsock_perf' and tests is placed in a
new header file, while code shared between the current test tool and
the future uring test is placed in 'util.c'. I think that making
this patch a preparation patch reduces the number of changes in the
next patches in this patchset.
* Make 'struct vsock_test_data' private by placing it to the .c file.
Also add comments to this struct to clarify sense of its fields.
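
Side note on the shared iovec helpers: 'iovec_hash_djb2()' in this patch
copies all elements into one temporary buffer before hashing. An
alternative that needs no allocation is to feed each element into a running
hash, as sketched below. This assumes 'hash_djb2()' in util.c is the classic
djb2 (seed 5381, hash * 33 + byte); its body is not part of this patch.
'iov_total_bytes()', 'djb2_update()' and 'iov_hash_djb2()' are hypothetical
names.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

/* Sum of all 'iov_len' fields. */
static size_t iov_total_bytes(const struct iovec *iov, size_t iovnum)
{
	size_t bytes = 0;
	size_t i;

	for (i = 0; i < iovnum; i++)
		bytes += iov[i].iov_len;

	return bytes;
}

/* Feed 'len' bytes into a running djb2 hash and return the new state. */
static unsigned long djb2_update(unsigned long hash, const void *data,
				 size_t len)
{
	const unsigned char *p = data;
	size_t i;

	for (i = 0; i < len; i++)
		hash = ((hash << 5) + hash) + p[i];

	return hash;
}

/* djb2 over the logical concatenation of all iovec elements. */
static unsigned long iov_hash_djb2(const struct iovec *iov, size_t iovnum)
{
	unsigned long hash = 5381;
	size_t i;

	assert(iov != NULL || iovnum == 0);

	for (i = 0; i < iovnum; i++)
		hash = djb2_update(hash, iov[i].iov_base, iov[i].iov_len);

	return hash;
}
```

Because djb2 consumes bytes strictly in order, hashing the elements one
after another gives the same value as hashing a contiguous copy.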

tools/testing/vsock/Makefile | 2 +-
tools/testing/vsock/msg_zerocopy_common.h | 92 ++++++
tools/testing/vsock/util.c | 110 +++++++
tools/testing/vsock/util.h | 5 +
tools/testing/vsock/vsock_test.c | 16 +
tools/testing/vsock/vsock_test_zerocopy.c | 367 ++++++++++++++++++++++
tools/testing/vsock/vsock_test_zerocopy.h | 15 +
7 files changed, 606 insertions(+), 1 deletion(-)
create mode 100644 tools/testing/vsock/msg_zerocopy_common.h
create mode 100644 tools/testing/vsock/vsock_test_zerocopy.c
create mode 100644 tools/testing/vsock/vsock_test_zerocopy.h

diff --git a/tools/testing/vsock/Makefile b/tools/testing/vsock/Makefile
index 21a98ba565ab..1a26f60a596c 100644
--- a/tools/testing/vsock/Makefile
+++ b/tools/testing/vsock/Makefile
@@ -1,7 +1,7 @@
# SPDX-License-Identifier: GPL-2.0-only
all: test vsock_perf
test: vsock_test vsock_diag_test
-vsock_test: vsock_test.o timeout.o control.o util.o
+vsock_test: vsock_test.o vsock_test_zerocopy.o timeout.o control.o util.o
vsock_diag_test: vsock_diag_test.o timeout.o control.o util.o
vsock_perf: vsock_perf.o

diff --git a/tools/testing/vsock/msg_zerocopy_common.h b/tools/testing/vsock/msg_zerocopy_common.h
new file mode 100644
index 000000000000..ce89f1281584
--- /dev/null
+++ b/tools/testing/vsock/msg_zerocopy_common.h
@@ -0,0 +1,92 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef MSG_ZEROCOPY_COMMON_H
+#define MSG_ZEROCOPY_COMMON_H
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <sys/types.h>
+#include <sys/socket.h>
+#include <linux/errqueue.h>
+
+#ifndef SOL_VSOCK
+#define SOL_VSOCK 287
+#endif
+
+#ifndef VSOCK_RECVERR
+#define VSOCK_RECVERR 1
+#endif
+
+static void enable_so_zerocopy(int fd)
+{
+ int val = 1;
+
+ if (setsockopt(fd, SOL_SOCKET, SO_ZEROCOPY, &val, sizeof(val))) {
+ perror("setsockopt");
+ exit(EXIT_FAILURE);
+ }
+}
+
+static void vsock_recv_completion(int fd, const bool *zerocopied) __maybe_unused;
+static void vsock_recv_completion(int fd, const bool *zerocopied)
+{
+ struct sock_extended_err *serr;
+ struct msghdr msg = { 0 };
+ char cmsg_data[128];
+ struct cmsghdr *cm;
+ ssize_t res;
+
+ msg.msg_control = cmsg_data;
+ msg.msg_controllen = sizeof(cmsg_data);
+
+ res = recvmsg(fd, &msg, MSG_ERRQUEUE);
+ if (res) {
+ fprintf(stderr, "failed to read error queue: %zi\n", res);
+ exit(EXIT_FAILURE);
+ }
+
+ cm = CMSG_FIRSTHDR(&msg);
+ if (!cm) {
+ fprintf(stderr, "cmsg: no cmsg\n");
+ exit(EXIT_FAILURE);
+ }
+
+ if (cm->cmsg_level != SOL_VSOCK) {
+ fprintf(stderr, "cmsg: unexpected 'cmsg_level'\n");
+ exit(EXIT_FAILURE);
+ }
+
+ if (cm->cmsg_type != VSOCK_RECVERR) {
+ fprintf(stderr, "cmsg: unexpected 'cmsg_type'\n");
+ exit(EXIT_FAILURE);
+ }
+
+ serr = (void *)CMSG_DATA(cm);
+ if (serr->ee_origin != SO_EE_ORIGIN_ZEROCOPY) {
+ fprintf(stderr, "serr: wrong origin: %u\n", serr->ee_origin);
+ exit(EXIT_FAILURE);
+ }
+
+ if (serr->ee_errno) {
+ fprintf(stderr, "serr: wrong error code: %u\n", serr->ee_errno);
+ exit(EXIT_FAILURE);
+ }
+
+ /* This flag is used for tests, to check that transmission was
+ * performed as expected: zerocopy or fallback to copy. If NULL
+ * - don't care.
+ */
+ if (!zerocopied)
+ return;
+
+ if (*zerocopied && (serr->ee_code & SO_EE_CODE_ZEROCOPY_COPIED)) {
+ fprintf(stderr, "serr: was copy instead of zerocopy\n");
+ exit(EXIT_FAILURE);
+ }
+
+ if (!*zerocopied && !(serr->ee_code & SO_EE_CODE_ZEROCOPY_COPIED)) {
+ fprintf(stderr, "serr: was zerocopy instead of copy\n");
+ exit(EXIT_FAILURE);
+ }
+}
+
+#endif /* MSG_ZEROCOPY_COMMON_H */
diff --git a/tools/testing/vsock/util.c b/tools/testing/vsock/util.c
index 6779d5008b27..b1770edd8cc1 100644
--- a/tools/testing/vsock/util.c
+++ b/tools/testing/vsock/util.c
@@ -11,10 +11,12 @@
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
+#include <string.h>
#include <signal.h>
#include <unistd.h>
#include <assert.h>
#include <sys/epoll.h>
+#include <sys/mman.h>

#include "timeout.h"
#include "control.h"
@@ -444,3 +446,111 @@ unsigned long hash_djb2(const void *data, size_t len)

return hash;
}
+
+size_t iovec_bytes(const struct iovec *iov, size_t iovnum)
+{
+ size_t bytes;
+ int i;
+
+ for (bytes = 0, i = 0; i < iovnum; i++)
+ bytes += iov[i].iov_len;
+
+ return bytes;
+}
+
+unsigned long iovec_hash_djb2(const struct iovec *iov, size_t iovnum)
+{
+ unsigned long hash;
+ size_t iov_bytes;
+ size_t offs;
+ void *tmp;
+ int i;
+
+ iov_bytes = iovec_bytes(iov, iovnum);
+
+ tmp = malloc(iov_bytes);
+ if (!tmp) {
+ perror("malloc");
+ exit(EXIT_FAILURE);
+ }
+
+ for (offs = 0, i = 0; i < iovnum; i++) {
+ memcpy(tmp + offs, iov[i].iov_base, iov[i].iov_len);
+ offs += iov[i].iov_len;
+ }
+
+ hash = hash_djb2(tmp, iov_bytes);
+ free(tmp);
+
+ return hash;
+}
+
+struct iovec *iovec_from_test_data(const struct iovec *test_iovec, int iovnum)
+{
+ struct iovec *iovec;
+ int i;
+
+ iovec = malloc(sizeof(*iovec) * iovnum);
+ if (!iovec) {
+ perror("malloc");
+ exit(EXIT_FAILURE);
+ }
+
+ for (i = 0; i < iovnum; i++) {
+ iovec[i].iov_len = test_iovec[i].iov_len;
+
+ iovec[i].iov_base = mmap(NULL, iovec[i].iov_len,
+ PROT_READ | PROT_WRITE,
+ MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE,
+ -1, 0);
+ if (iovec[i].iov_base == MAP_FAILED) {
+ perror("mmap");
+ exit(EXIT_FAILURE);
+ }
+
+ if (test_iovec[i].iov_base != MAP_FAILED)
+ iovec[i].iov_base += (uintptr_t)test_iovec[i].iov_base;
+ }
+
+ /* Unmap "invalid" elements. */
+ for (i = 0; i < iovnum; i++) {
+ if (test_iovec[i].iov_base == MAP_FAILED) {
+ if (munmap(iovec[i].iov_base, iovec[i].iov_len)) {
+ perror("munmap");
+ exit(EXIT_FAILURE);
+ }
+ }
+ }
+
+ for (i = 0; i < iovnum; i++) {
+ int j;
+
+ if (test_iovec[i].iov_base == MAP_FAILED)
+ continue;
+
+ for (j = 0; j < iovec[i].iov_len; j++)
+ ((uint8_t *)iovec[i].iov_base)[j] = rand() & 0xff;
+ }
+
+ return iovec;
+}
+
+void free_iovec_test_data(const struct iovec *test_iovec,
+ struct iovec *iovec, int iovnum)
+{
+ int i;
+
+ for (i = 0; i < iovnum; i++) {
+ if (test_iovec[i].iov_base != MAP_FAILED) {
+ if (test_iovec[i].iov_base)
+ iovec[i].iov_base -= (uintptr_t)test_iovec[i].iov_base;
+
+ if (munmap(iovec[i].iov_base, iovec[i].iov_len)) {
+ perror("munmap");
+ exit(EXIT_FAILURE);
+ }
+ }
+ }
+
+ free(iovec);
+}
diff --git a/tools/testing/vsock/util.h b/tools/testing/vsock/util.h
index e5407677ce05..4cacb8d804c1 100644
--- a/tools/testing/vsock/util.h
+++ b/tools/testing/vsock/util.h
@@ -53,4 +53,9 @@ void list_tests(const struct test_case *test_cases);
void skip_test(struct test_case *test_cases, size_t test_cases_len,
const char *test_id_str);
unsigned long hash_djb2(const void *data, size_t len);
+size_t iovec_bytes(const struct iovec *iov, size_t iovnum);
+unsigned long iovec_hash_djb2(const struct iovec *iov, size_t iovnum);
+struct iovec *iovec_from_test_data(const struct iovec *test_iovec, int iovnum);
+void free_iovec_test_data(const struct iovec *test_iovec,
+ struct iovec *iovec, int iovnum);
#endif /* UTIL_H */
diff --git a/tools/testing/vsock/vsock_test.c b/tools/testing/vsock/vsock_test.c
index da4cb819a183..c1f7bc9abd22 100644
--- a/tools/testing/vsock/vsock_test.c
+++ b/tools/testing/vsock/vsock_test.c
@@ -21,6 +21,7 @@
#include <poll.h>
#include <signal.h>

+#include "vsock_test_zerocopy.h"
#include "timeout.h"
#include "control.h"
#include "util.h"
@@ -1269,6 +1270,21 @@ static struct test_case test_cases[] = {
.run_client = test_stream_shutrd_client,
.run_server = test_stream_shutrd_server,
},
+ {
+ .name = "SOCK_STREAM MSG_ZEROCOPY",
+ .run_client = test_stream_msgzcopy_client,
+ .run_server = test_stream_msgzcopy_server,
+ },
+ {
+ .name = "SOCK_SEQPACKET MSG_ZEROCOPY",
+ .run_client = test_seqpacket_msgzcopy_client,
+ .run_server = test_seqpacket_msgzcopy_server,
+ },
+ {
+ .name = "SOCK_STREAM MSG_ZEROCOPY empty MSG_ERRQUEUE",
+ .run_client = test_stream_msgzcopy_empty_errq_client,
+ .run_server = test_stream_msgzcopy_empty_errq_server,
+ },
{},
};

diff --git a/tools/testing/vsock/vsock_test_zerocopy.c b/tools/testing/vsock/vsock_test_zerocopy.c
new file mode 100644
index 000000000000..af14efdf334b
--- /dev/null
+++ b/tools/testing/vsock/vsock_test_zerocopy.c
@@ -0,0 +1,367 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* MSG_ZEROCOPY feature tests for vsock
+ *
+ * Copyright (C) 2023 SberDevices.
+ *
+ * Author: Arseniy Krasnov <[email protected]>
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/mman.h>
+#include <unistd.h>
+#include <poll.h>
+#include <linux/errqueue.h>
+#include <linux/kernel.h>
+#include <errno.h>
+
+#include "control.h"
+#include "vsock_test_zerocopy.h"
+#include "msg_zerocopy_common.h"
+
+#define PAGE_SIZE 4096
+
+#define VSOCK_TEST_DATA_MAX_IOV 3
+
+struct vsock_test_data {
+ /* This test case is for SOCK_STREAM only. */
+ bool stream_only;
+ /* Data must be zerocopied. This field is checked against
+ * field 'ee_code' of the 'struct sock_extended_err', which
+ * contains a bit indicating that zerocopy transmission
+ * fell back to copy mode.
+ */
+ bool zerocopied;
+ /* Enable SO_ZEROCOPY option on the socket. Without
+ * SO_ZEROCOPY enabled, every MSG_ZEROCOPY transmission behaves
+ * as if the MSG_ZEROCOPY flag were not set.
+ */
+ bool so_zerocopy;
+ /* 'errno' after 'sendmsg()' call. */
+ int sendmsg_errno;
+ /* Number of valid elements in 'vecs'. */
+ int vecs_cnt;
+ /* Array how to allocate buffers for test.
+ * 'iov_base' == NULL -> valid buf: mmap('iov_len').
+ *
+ * 'iov_base' == MAP_FAILED -> invalid buf:
+ * mmap('iov_len'), then munmap('iov_len').
+ * 'iov_base' still contains result of
+ * mmap().
+ *
+ * 'iov_base' == number -> unaligned valid buf:
+ * mmap('iov_len') + number.
+ */
+ struct iovec vecs[VSOCK_TEST_DATA_MAX_IOV];
+};
+
+static struct vsock_test_data test_data_array[] = {
+ /* Last element has non-page aligned size. */
+ {
+ .zerocopied = true,
+ .so_zerocopy = true,
+ .sendmsg_errno = 0,
+ .vecs_cnt = 3,
+ {
+ { NULL, PAGE_SIZE },
+ { NULL, PAGE_SIZE },
+ { NULL, 200 }
+ }
+ },
+ /* All elements have page aligned base and size. */
+ {
+ .zerocopied = true,
+ .so_zerocopy = true,
+ .sendmsg_errno = 0,
+ .vecs_cnt = 3,
+ {
+ { NULL, PAGE_SIZE },
+ { NULL, PAGE_SIZE * 2 },
+ { NULL, PAGE_SIZE * 3 }
+ }
+ },
+ /* All elements have page aligned base and size. But
+ * data length is bigger than 64Kb.
+ */
+ {
+ .zerocopied = true,
+ .so_zerocopy = true,
+ .sendmsg_errno = 0,
+ .vecs_cnt = 3,
+ {
+ { NULL, PAGE_SIZE * 16 },
+ { NULL, PAGE_SIZE * 16 },
+ { NULL, PAGE_SIZE * 16 }
+ }
+ },
+ /* Middle element has both non-page aligned base and size. */
+ {
+ .zerocopied = true,
+ .so_zerocopy = true,
+ .sendmsg_errno = 0,
+ .vecs_cnt = 3,
+ {
+ { NULL, PAGE_SIZE },
+ { (void *)1, 100 },
+ { NULL, PAGE_SIZE }
+ }
+ },
+ /* Middle element is unmapped. */
+ {
+ .zerocopied = false,
+ .so_zerocopy = true,
+ .sendmsg_errno = ENOMEM,
+ .vecs_cnt = 3,
+ {
+ { NULL, PAGE_SIZE },
+ { MAP_FAILED, PAGE_SIZE },
+ { NULL, PAGE_SIZE }
+ }
+ },
+ /* Valid data, but SO_ZEROCOPY is off. This
+ * will trigger fallback to copy.
+ */
+ {
+ .zerocopied = false,
+ .so_zerocopy = false,
+ .sendmsg_errno = 0,
+ .vecs_cnt = 1,
+ {
+ { NULL, PAGE_SIZE }
+ }
+ },
+ /* Valid data, but message is bigger than peer's
+ * buffer, so this will trigger fallback to copy.
+ * This test is for SOCK_STREAM only, because
+ * for SOCK_SEQPACKET, 'sendmsg()' returns EMSGSIZE.
+ */
+ {
+ .stream_only = true,
+ .zerocopied = false,
+ .so_zerocopy = true,
+ .sendmsg_errno = 0,
+ .vecs_cnt = 1,
+ {
+ { NULL, 100 * PAGE_SIZE }
+ }
+ },
+};
+
+#define POLL_TIMEOUT_MS 100
+
+static void test_client(const struct test_opts *opts,
+ const struct vsock_test_data *test_data,
+ bool sock_seqpacket)
+{
+ struct pollfd fds = { 0 };
+ struct msghdr msg = { 0 };
+ ssize_t sendmsg_res;
+ struct iovec *iovec;
+ int fd;
+
+ if (sock_seqpacket)
+ fd = vsock_seqpacket_connect(opts->peer_cid, 1234);
+ else
+ fd = vsock_stream_connect(opts->peer_cid, 1234);
+
+ if (fd < 0) {
+ perror("connect");
+ exit(EXIT_FAILURE);
+ }
+
+ if (test_data->so_zerocopy)
+ enable_so_zerocopy(fd);
+
+ iovec = iovec_from_test_data(test_data->vecs, test_data->vecs_cnt);
+
+ msg.msg_iov = iovec;
+ msg.msg_iovlen = test_data->vecs_cnt;
+
+ errno = 0;
+
+ sendmsg_res = sendmsg(fd, &msg, MSG_ZEROCOPY);
+ if (errno != test_data->sendmsg_errno) {
+ fprintf(stderr, "expected 'errno' == %i, got %i\n",
+ test_data->sendmsg_errno, errno);
+ exit(EXIT_FAILURE);
+ }
+
+ if (!errno) {
+ if (sendmsg_res != iovec_bytes(iovec, test_data->vecs_cnt)) {
+ fprintf(stderr, "expected 'sendmsg()' == %li, got %li\n",
+ iovec_bytes(iovec, test_data->vecs_cnt),
+ sendmsg_res);
+ exit(EXIT_FAILURE);
+ }
+ }
+
+ fds.fd = fd;
+ fds.events = 0;
+
+ if (poll(&fds, 1, POLL_TIMEOUT_MS) < 0) {
+ perror("poll");
+ exit(EXIT_FAILURE);
+ }
+
+ if (fds.revents & POLLERR) {
+ vsock_recv_completion(fd, &test_data->zerocopied);
+ } else if (test_data->so_zerocopy && !test_data->sendmsg_errno) {
+ /* If we don't have data in the error queue, but
+ * SO_ZEROCOPY was enabled and 'sendmsg()' was
+ * successful - this is an error.
+ */
+ fprintf(stderr, "POLLERR expected\n");
+ exit(EXIT_FAILURE);
+ }
+
+ if (!test_data->sendmsg_errno)
+ control_writeulong(iovec_hash_djb2(iovec, test_data->vecs_cnt));
+ else
+ control_writeulong(0);
+
+ control_writeln("DONE");
+ free_iovec_test_data(test_data->vecs, iovec, test_data->vecs_cnt);
+ close(fd);
+}
+
+void test_stream_msgzcopy_client(const struct test_opts *opts)
+{
+ int i;
+
+ for (i = 0; i < ARRAY_SIZE(test_data_array); i++)
+ test_client(opts, &test_data_array[i], false);
+}
+
+void test_seqpacket_msgzcopy_client(const struct test_opts *opts)
+{
+ int i;
+
+ for (i = 0; i < ARRAY_SIZE(test_data_array); i++) {
+ if (test_data_array[i].stream_only)
+ continue;
+
+ test_client(opts, &test_data_array[i], true);
+ }
+}
+
+static void test_server(const struct test_opts *opts,
+ const struct vsock_test_data *test_data,
+ bool sock_seqpacket)
+{
+ unsigned long remote_hash;
+ unsigned long local_hash;
+ ssize_t total_bytes_rec;
+ unsigned char *data;
+ size_t data_len;
+ int fd;
+
+ if (sock_seqpacket)
+ fd = vsock_seqpacket_accept(VMADDR_CID_ANY, 1234, NULL);
+ else
+ fd = vsock_stream_accept(VMADDR_CID_ANY, 1234, NULL);
+
+ if (fd < 0) {
+ perror("accept");
+ exit(EXIT_FAILURE);
+ }
+
+ data_len = iovec_bytes(test_data->vecs, test_data->vecs_cnt);
+
+ data = malloc(data_len);
+ if (!data) {
+ perror("malloc");
+ exit(EXIT_FAILURE);
+ }
+
+ total_bytes_rec = 0;
+
+ while (total_bytes_rec != data_len) {
+ ssize_t bytes_rec;
+
+ bytes_rec = read(fd, data + total_bytes_rec,
+ data_len - total_bytes_rec);
+ if (bytes_rec <= 0)
+ break;
+
+ total_bytes_rec += bytes_rec;
+ }
+
+ if (test_data->sendmsg_errno == 0)
+ local_hash = hash_djb2(data, data_len);
+ else
+ local_hash = 0;
+
+ free(data);
+
+ /* Waiting for some result. */
+ remote_hash = control_readulong();
+ if (remote_hash != local_hash) {
+ fprintf(stderr, "hash mismatch\n");
+ exit(EXIT_FAILURE);
+ }
+
+ control_expectln("DONE");
+ close(fd);
+}
+
+void test_stream_msgzcopy_server(const struct test_opts *opts)
+{
+ int i;
+
+ for (i = 0; i < ARRAY_SIZE(test_data_array); i++)
+ test_server(opts, &test_data_array[i], false);
+}
+
+void test_seqpacket_msgzcopy_server(const struct test_opts *opts)
+{
+ int i;
+
+ for (i = 0; i < ARRAY_SIZE(test_data_array); i++) {
+ if (test_data_array[i].stream_only)
+ continue;
+
+ test_server(opts, &test_data_array[i], true);
+ }
+}
+
+void test_stream_msgzcopy_empty_errq_client(const struct test_opts *opts)
+{
+ struct msghdr msg = { 0 };
+ char cmsg_data[128];
+ ssize_t res;
+ int fd;
+
+ fd = vsock_stream_connect(opts->peer_cid, 1234);
+ if (fd < 0) {
+ perror("connect");
+ exit(EXIT_FAILURE);
+ }
+
+ msg.msg_control = cmsg_data;
+ msg.msg_controllen = sizeof(cmsg_data);
+
+ res = recvmsg(fd, &msg, MSG_ERRQUEUE);
+ if (res != -1) {
+ fprintf(stderr, "expected 'recvmsg(2)' failure, got %zi\n",
+ res);
+ exit(EXIT_FAILURE);
+ }
+
+ control_writeln("DONE");
+ close(fd);
+}
+
+void test_stream_msgzcopy_empty_errq_server(const struct test_opts *opts)
+{
+ int fd;
+
+ fd = vsock_stream_accept(VMADDR_CID_ANY, 1234, NULL);
+ if (fd < 0) {
+ perror("accept");
+ exit(EXIT_FAILURE);
+ }
+
+ control_expectln("DONE");
+ close(fd);
+}
diff --git a/tools/testing/vsock/vsock_test_zerocopy.h b/tools/testing/vsock/vsock_test_zerocopy.h
new file mode 100644
index 000000000000..3ef2579e024d
--- /dev/null
+++ b/tools/testing/vsock/vsock_test_zerocopy.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef VSOCK_TEST_ZEROCOPY_H
+#define VSOCK_TEST_ZEROCOPY_H
+#include "util.h"
+
+void test_stream_msgzcopy_client(const struct test_opts *opts);
+void test_stream_msgzcopy_server(const struct test_opts *opts);
+
+void test_seqpacket_msgzcopy_client(const struct test_opts *opts);
+void test_seqpacket_msgzcopy_server(const struct test_opts *opts);
+
+void test_stream_msgzcopy_empty_errq_client(const struct test_opts *opts);
+void test_stream_msgzcopy_empty_errq_server(const struct test_opts *opts);
+
+#endif /* VSOCK_TEST_ZEROCOPY_H */
--
2.25.1

2023-10-07 17:30:57

by Arseniy Krasnov

Subject: [PATCH net-next v3 09/12] docs: net: description of MSG_ZEROCOPY for AF_VSOCK

This adds a description of MSG_ZEROCOPY flag support for AF_VSOCK
sockets.

Signed-off-by: Arseniy Krasnov <[email protected]>
Reviewed-by: Stefano Garzarella <[email protected]>
---
Documentation/networking/msg_zerocopy.rst | 13 +++++++++++--
1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/Documentation/networking/msg_zerocopy.rst b/Documentation/networking/msg_zerocopy.rst
index b3ea96af9b49..78fb70e748b7 100644
--- a/Documentation/networking/msg_zerocopy.rst
+++ b/Documentation/networking/msg_zerocopy.rst
@@ -7,7 +7,8 @@ Intro
=====

The MSG_ZEROCOPY flag enables copy avoidance for socket send calls.
-The feature is currently implemented for TCP and UDP sockets.
+The feature is currently implemented for TCP, UDP and VSOCK (with
+virtio transport) sockets.


Opportunity and Caveats
@@ -174,7 +175,9 @@ read_notification() call in the previous snippet. A notification
is encoded in the standard error format, sock_extended_err.

The level and type fields in the control data are protocol family
-specific, IP_RECVERR or IPV6_RECVERR.
+specific, IP_RECVERR or IPV6_RECVERR (for TCP or UDP sockets).
+For VSOCK sockets, cmsg_level will be SOL_VSOCK and cmsg_type will be
+VSOCK_RECVERR.

Error origin is the new type SO_EE_ORIGIN_ZEROCOPY. ee_errno is zero,
as explained before, to avoid blocking read and write system calls on
@@ -235,12 +238,15 @@ Implementation
Loopback
--------

+For TCP and UDP:
Data sent to local sockets can be queued indefinitely if the receive
process does not read its socket. Unbound notification latency is not
acceptable. For this reason all packets generated with MSG_ZEROCOPY
that are looped to a local socket will incur a deferred copy. This
includes looping onto packet sockets (e.g., tcpdump) and tun devices.

+For VSOCK:
+The data path for local sockets is the same as for non-local sockets.

Testing
=======
@@ -254,3 +260,6 @@ instance when run with msg_zerocopy.sh between a veth pair across
namespaces, the test will not show any improvement. For testing, the
loopback restriction can be temporarily relaxed by making
skb_orphan_frags_rx identical to skb_orphan_frags.
+
+For VSOCK sockets, an example can be found in
+tools/testing/vsock/vsock_test_zerocopy.c.
--
2.25.1

2023-10-09 15:18:12

by Stefano Garzarella

Subject: Re: [PATCH net-next v3 12/12] test/vsock: io_uring rx/tx tests

On Sat, Oct 07, 2023 at 08:21:39PM +0300, Arseniy Krasnov wrote:
>This adds a set of tests which use io_uring for rx/tx. The test suite is
>implemented as a separate utility, like 'vsock_test', and takes the same
>input arguments as 'vsock_test'. These tests only cover data
>transmission (no connect/bind/accept etc.).
>
>Signed-off-by: Arseniy Krasnov <[email protected]>
>---
> Changelog:
> v1 -> v2:
> * Add 'LDLIBS = -luring' to the target 'vsock_uring_test'.
> * Add 'vsock_uring_test' to the target 'test'.
> v2 -> v3:
> * Make 'struct vsock_test_data' private by placing it in the .c file.
> Rename it and add comments to this struct to clarify the meaning of
> its fields.
> * Add 'vsock_uring_test' to the '.gitignore'.
> * Add receive loop to the server side - this is needed to read entire
> data sent by client.
>
> tools/testing/vsock/.gitignore | 1 +
> tools/testing/vsock/Makefile | 7 +-
> tools/testing/vsock/vsock_uring_test.c | 350 +++++++++++++++++++++++++
> 3 files changed, 356 insertions(+), 2 deletions(-)
> create mode 100644 tools/testing/vsock/vsock_uring_test.c
>
>diff --git a/tools/testing/vsock/.gitignore b/tools/testing/vsock/.gitignore
>index a8adcfdc292b..d9f798713cd7 100644
>--- a/tools/testing/vsock/.gitignore
>+++ b/tools/testing/vsock/.gitignore
>@@ -3,3 +3,4 @@
> vsock_test
> vsock_diag_test
> vsock_perf
>+vsock_uring_test
>diff --git a/tools/testing/vsock/Makefile b/tools/testing/vsock/Makefile
>index 1a26f60a596c..b80e7c7def1e 100644
>--- a/tools/testing/vsock/Makefile
>+++ b/tools/testing/vsock/Makefile
>@@ -1,12 +1,15 @@
> # SPDX-License-Identifier: GPL-2.0-only
> all: test vsock_perf
>-test: vsock_test vsock_diag_test
>+test: vsock_test vsock_diag_test vsock_uring_test
> vsock_test: vsock_test.o vsock_test_zerocopy.o timeout.o control.o util.o
> vsock_diag_test: vsock_diag_test.o timeout.o control.o util.o
> vsock_perf: vsock_perf.o
>
>+vsock_uring_test: LDLIBS = -luring
>+vsock_uring_test: control.o util.o vsock_uring_test.o timeout.o
>+
> CFLAGS += -g -O2 -Werror -Wall -I. -I../../include -I../../../usr/include -Wno-pointer-sign -fno-strict-overflow -fno-strict-aliasing -fno-common -MMD -U_FORTIFY_SOURCE -D_GNU_SOURCE
> .PHONY: all test clean
> clean:
>- ${RM} *.o *.d vsock_test vsock_diag_test vsock_perf
>+ ${RM} *.o *.d vsock_test vsock_diag_test vsock_perf vsock_uring_test
> -include *.d
>diff --git a/tools/testing/vsock/vsock_uring_test.c b/tools/testing/vsock/vsock_uring_test.c
>new file mode 100644
>index 000000000000..889887cf3989
>--- /dev/null
>+++ b/tools/testing/vsock/vsock_uring_test.c
>@@ -0,0 +1,350 @@
>+// SPDX-License-Identifier: GPL-2.0-only
>+/* io_uring tests for vsock
>+ *
>+ * Copyright (C) 2023 SberDevices.
>+ *
>+ * Author: Arseniy Krasnov <[email protected]>
>+ */
>+
>+#include <getopt.h>
>+#include <stdio.h>
>+#include <stdlib.h>
>+#include <string.h>
>+#include <liburing.h>
>+#include <unistd.h>
>+#include <sys/mman.h>
>+#include <linux/kernel.h>
>+#include <error.h>
>+
>+#include "util.h"
>+#include "control.h"
>+#include "msg_zerocopy_common.h"
>+
>+#define PAGE_SIZE 4096

Ditto.

>+#define RING_ENTRIES_NUM 4
>+
>+#define VSOCK_TEST_DATA_MAX_IOV 3
>+
>+struct vsock_io_uring_test {
>+ /* Number of valid elements in 'vecs'. */
>+ int vecs_cnt;
>+ /* Array describing how to allocate buffers for the test.
>+ * 'iov_base' == NULL -> valid buf: mmap('iov_len').
>+ *
>+ * 'iov_base' == MAP_FAILED -> invalid buf:
>+ * mmap('iov_len'), then munmap('iov_len').
>+ * 'iov_base' still contains result of
>+ * mmap().
>+ *
>+ * 'iov_base' == number -> unaligned valid buf:
>+ * mmap('iov_len') + number.
>+ */
>+ struct iovec vecs[VSOCK_TEST_DATA_MAX_IOV];
>+};
>+
>+static struct vsock_io_uring_test test_data_array[] = {
>+ /* All elements have page aligned base and size. */
>+ {
>+ .vecs_cnt = 3,
>+ {
>+ { NULL, PAGE_SIZE },
>+ { NULL, 2 * PAGE_SIZE },
>+ { NULL, 3 * PAGE_SIZE },
>+ }
>+ },
>+ /* Middle element has both non-page aligned base and size. */
>+ {
>+ .vecs_cnt = 3,
>+ {
>+ { NULL, PAGE_SIZE },
>+ { (void *)1, 200 },
>+ { NULL, 3 * PAGE_SIZE },
>+ }
>+ }
>+};
>+
>+static void vsock_io_uring_client(const struct test_opts *opts,
>+ const struct vsock_io_uring_test *test_data,
>+ bool msg_zerocopy)
>+{
>+ struct io_uring_sqe *sqe;
>+ struct io_uring_cqe *cqe;
>+ struct io_uring ring;
>+ struct iovec *iovec;
>+ struct msghdr msg;
>+ int fd;
>+
>+ fd = vsock_stream_connect(opts->peer_cid, 1234);
>+ if (fd < 0) {
>+ perror("connect");
>+ exit(EXIT_FAILURE);
>+ }
>+
>+ if (msg_zerocopy)
>+ enable_so_zerocopy(fd);
>+
>+ iovec = iovec_from_test_data(test_data->vecs, test_data->vecs_cnt);

Ah, I see this is also used here, so now I get why it is in util.h.

Okay, it is fine, but please change the name to something like
`alloc_test_iovec`/`free_test_iovec` and add a bit of documentation
in util.c about the input and output of that function.
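
As a rough illustration of the request above, documented helpers might look like the sketch below. It is only a sketch under assumptions: the names follow the suggestion, the behaviour mirrors what the comment in 'struct vsock_io_uring_test' describes, and, unlike the patch, it maps the extra offset bytes so that a shifted buffer stays inside its mapping and it does not fill the buffers with random data.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE	/* For MAP_ANONYMOUS. */
#endif
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/uio.h>

/* Allocate buffers described by 'test_iovec' (illustrative sketch).
 *
 * Input: for each element, 'iov_len' is the buffer length and
 * 'iov_base' selects the buffer kind:
 *   NULL       -> valid, page-aligned buffer;
 *   MAP_FAILED -> buffer is mapped and then unmapped again, so the
 *                 returned address is invalid;
 *   other      -> valid buffer shifted by that many bytes.
 *
 * Output: a malloc()ed array of 'iovnum' elements which must be
 * released with free_test_iovec(). Exits on allocation failure.
 */
struct iovec *alloc_test_iovec(const struct iovec *test_iovec, int iovnum)
{
	struct iovec *iovec;
	int i;

	iovec = calloc(iovnum, sizeof(*iovec));
	if (!iovec) {
		perror("calloc");
		exit(EXIT_FAILURE);
	}

	for (i = 0; i < iovnum; i++) {
		int invalid = test_iovec[i].iov_base == MAP_FAILED;
		size_t offs = invalid ? 0 : (uintptr_t)test_iovec[i].iov_base;
		void *p;

		/* Map 'offs' extra bytes so the shifted buffer stays in bounds. */
		p = mmap(NULL, test_iovec[i].iov_len + offs,
			 PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		if (p == MAP_FAILED) {
			perror("mmap");
			exit(EXIT_FAILURE);
		}

		if (invalid && munmap(p, test_iovec[i].iov_len)) {
			perror("munmap");
			exit(EXIT_FAILURE);
		}

		iovec[i].iov_base = (char *)p + offs;
		iovec[i].iov_len = test_iovec[i].iov_len;
	}

	return iovec;
}

/* Release an array returned by alloc_test_iovec(). 'test_iovec' must be
 * the same description that was used for the allocation.
 */
void free_test_iovec(const struct iovec *test_iovec,
		     struct iovec *iovec, int iovnum)
{
	int i;

	for (i = 0; i < iovnum; i++) {
		size_t offs;

		if (test_iovec[i].iov_base == MAP_FAILED)
			continue;	/* Already unmapped. */

		offs = (uintptr_t)test_iovec[i].iov_base;
		if (munmap((char *)iovec[i].iov_base - offs,
			   iovec[i].iov_len + offs)) {
			perror("munmap");
			exit(EXIT_FAILURE);
		}
	}

	free(iovec);
}
```

Keeping the description array and the allocated array separate, as the patch does, lets the free side recover the original mapping address from the stored offset.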

The rest LGTM.

Stefano

>+
>+ if (io_uring_queue_init(RING_ENTRIES_NUM, &ring, 0))
>+ error(1, errno, "io_uring_queue_init");
>+
>+ if (io_uring_register_buffers(&ring, iovec, test_data->vecs_cnt))
>+ error(1, errno, "io_uring_register_buffers");
>+
>+ memset(&msg, 0, sizeof(msg));
>+ msg.msg_iov = iovec;
>+ msg.msg_iovlen = test_data->vecs_cnt;
>+ sqe = io_uring_get_sqe(&ring);
>+
>+ if (msg_zerocopy)
>+ io_uring_prep_sendmsg_zc(sqe, fd, &msg, 0);
>+ else
>+ io_uring_prep_sendmsg(sqe, fd, &msg, 0);
>+
>+ if (io_uring_submit(&ring) != 1)
>+ error(1, errno, "io_uring_submit");
>+
>+ if (io_uring_wait_cqe(&ring, &cqe))
>+ error(1, errno, "io_uring_wait_cqe");
>+
>+ io_uring_cqe_seen(&ring, cqe);
>+
>+ control_writeulong(iovec_hash_djb2(iovec, test_data->vecs_cnt));
>+
>+ control_writeln("DONE");
>+ io_uring_queue_exit(&ring);
>+ free_iovec_test_data(test_data->vecs, iovec, test_data->vecs_cnt);
>+ close(fd);
>+}
>+
>+static void vsock_io_uring_server(const struct test_opts *opts,
>+ const struct vsock_io_uring_test *test_data)
>+{
>+ unsigned long remote_hash;
>+ unsigned long local_hash;
>+ struct io_uring ring;
>+ size_t data_len;
>+ size_t recv_len;
>+ void *data;
>+ int fd;
>+
>+ fd = vsock_stream_accept(VMADDR_CID_ANY, 1234, NULL);
>+ if (fd < 0) {
>+ perror("accept");
>+ exit(EXIT_FAILURE);
>+ }
>+
>+ data_len = iovec_bytes(test_data->vecs, test_data->vecs_cnt);
>+
>+ data = malloc(data_len);
>+ if (!data) {
>+ perror("malloc");
>+ exit(EXIT_FAILURE);
>+ }
>+
>+ if (io_uring_queue_init(RING_ENTRIES_NUM, &ring, 0))
>+ error(1, errno, "io_uring_queue_init");
>+
>+ recv_len = 0;
>+
>+ while (recv_len < data_len) {
>+ struct io_uring_sqe *sqe;
>+ struct io_uring_cqe *cqe;
>+ struct iovec iovec;
>+
>+ sqe = io_uring_get_sqe(&ring);
>+ iovec.iov_base = data + recv_len;
>+ iovec.iov_len = data_len - recv_len;
>+
>+ io_uring_prep_readv(sqe, fd, &iovec, 1, 0);
>+
>+ if (io_uring_submit(&ring) != 1)
>+ error(1, errno, "io_uring_submit");
>+
>+ if (io_uring_wait_cqe(&ring, &cqe))
>+ error(1, errno, "io_uring_wait_cqe");
>+
>+ recv_len += cqe->res;
>+ io_uring_cqe_seen(&ring, cqe);
>+ }
>+
>+ if (recv_len != data_len) {
>+ fprintf(stderr, "expected %zu, got %zu\n", data_len,
>+ recv_len);
>+ exit(EXIT_FAILURE);
>+ }
>+
>+ local_hash = hash_djb2(data, data_len);
>+
>+ remote_hash = control_readulong();
>+ if (remote_hash != local_hash) {
>+ fprintf(stderr, "hash mismatch\n");
>+ exit(EXIT_FAILURE);
>+ }
>+
>+ control_expectln("DONE");
>+ io_uring_queue_exit(&ring);
>+ free(data);
>+}
>+
>+void test_stream_uring_server(const struct test_opts *opts)
>+{
>+ int i;
>+
>+ for (i = 0; i < ARRAY_SIZE(test_data_array); i++)
>+ vsock_io_uring_server(opts, &test_data_array[i]);
>+}
>+
>+void test_stream_uring_client(const struct test_opts *opts)
>+{
>+ int i;
>+
>+ for (i = 0; i < ARRAY_SIZE(test_data_array); i++)
>+ vsock_io_uring_client(opts, &test_data_array[i], false);
>+}
>+
>+void test_stream_uring_msg_zc_server(const struct test_opts *opts)
>+{
>+ int i;
>+
>+ for (i = 0; i < ARRAY_SIZE(test_data_array); i++)
>+ vsock_io_uring_server(opts, &test_data_array[i]);
>+}
>+
>+void test_stream_uring_msg_zc_client(const struct test_opts *opts)
>+{
>+ int i;
>+
>+ for (i = 0; i < ARRAY_SIZE(test_data_array); i++)
>+ vsock_io_uring_client(opts, &test_data_array[i], true);
>+}
>+
>+static struct test_case test_cases[] = {
>+ {
>+ .name = "SOCK_STREAM io_uring test",
>+ .run_server = test_stream_uring_server,
>+ .run_client = test_stream_uring_client,
>+ },
>+ {
>+ .name = "SOCK_STREAM io_uring MSG_ZEROCOPY test",
>+ .run_server = test_stream_uring_msg_zc_server,
>+ .run_client = test_stream_uring_msg_zc_client,
>+ },
>+ {},
>+};
>+
>+static const char optstring[] = "";
>+static const struct option longopts[] = {
>+ {
>+ .name = "control-host",
>+ .has_arg = required_argument,
>+ .val = 'H',
>+ },
>+ {
>+ .name = "control-port",
>+ .has_arg = required_argument,
>+ .val = 'P',
>+ },
>+ {
>+ .name = "mode",
>+ .has_arg = required_argument,
>+ .val = 'm',
>+ },
>+ {
>+ .name = "peer-cid",
>+ .has_arg = required_argument,
>+ .val = 'p',
>+ },
>+ {
>+ .name = "help",
>+ .has_arg = no_argument,
>+ .val = '?',
>+ },
>+ {},
>+};
>+
>+static void usage(void)
>+{
>+ fprintf(stderr, "Usage: vsock_uring_test [--help] [--control-host=<host>] --control-port=<port> --mode=client|server --peer-cid=<cid>\n"
>+ "\n"
>+ " Server: vsock_uring_test --control-port=1234 --mode=server --peer-cid=3\n"
>+ " Client: vsock_uring_test --control-host=192.168.0.1 --control-port=1234 --mode=client --peer-cid=2\n"
>+ "\n"
>+ "Run transmission tests using io_uring. Usage is the same as\n"
>+ "in ./vsock_test\n"
>+ "\n"
>+ "Options:\n"
>+ " --help This help message\n"
>+ " --control-host <host> Server IP address to connect to\n"
>+ " --control-port <port> Server port to listen on/connect to\n"
>+ " --mode client|server Server or client mode\n"
>+ " --peer-cid <cid> CID of the other side\n"
>+ );
>+ exit(EXIT_FAILURE);
>+}
>+
>+int main(int argc, char **argv)
>+{
>+ const char *control_host = NULL;
>+ const char *control_port = NULL;
>+ struct test_opts opts = {
>+ .mode = TEST_MODE_UNSET,
>+ .peer_cid = VMADDR_CID_ANY,
>+ };
>+
>+ init_signals();
>+
>+ for (;;) {
>+ int opt = getopt_long(argc, argv, optstring, longopts, NULL);
>+
>+ if (opt == -1)
>+ break;
>+
>+ switch (opt) {
>+ case 'H':
>+ control_host = optarg;
>+ break;
>+ case 'm':
>+ if (strcmp(optarg, "client") == 0) {
>+ opts.mode = TEST_MODE_CLIENT;
>+ } else if (strcmp(optarg, "server") == 0) {
>+ opts.mode = TEST_MODE_SERVER;
>+ } else {
>+ fprintf(stderr, "--mode must be \"client\" or \"server\"\n");
>+ return EXIT_FAILURE;
>+ }
>+ break;
>+ case 'p':
>+ opts.peer_cid = parse_cid(optarg);
>+ break;
>+ case 'P':
>+ control_port = optarg;
>+ break;
>+ case '?':
>+ default:
>+ usage();
>+ }
>+ }
>+
>+ if (!control_port)
>+ usage();
>+ if (opts.mode == TEST_MODE_UNSET)
>+ usage();
>+ if (opts.peer_cid == VMADDR_CID_ANY)
>+ usage();
>+
>+ if (!control_host) {
>+ if (opts.mode != TEST_MODE_SERVER)
>+ usage();
>+ control_host = "0.0.0.0";
>+ }
>+
>+ control_init(control_host, control_port,
>+ opts.mode == TEST_MODE_SERVER);
>+
>+ run_tests(test_cases, &opts);
>+
>+ control_cleanup();
>+
>+ return 0;
>+}
>--
>2.25.1
>

2023-10-09 15:18:38

by Stefano Garzarella

Subject: Re: [PATCH net-next v3 10/12] test/vsock: MSG_ZEROCOPY flag tests

On Sat, Oct 07, 2023 at 08:21:37PM +0300, Arseniy Krasnov wrote:
>This adds three tests for MSG_ZEROCOPY feature:
>1) SOCK_STREAM tx with different buffers.
>2) SOCK_SEQPACKET tx with different buffers.
>3) SOCK_STREAM test to read empty error queue of the socket.
>
>Patch also works as preparation for the next patches for tools in this
>patchset: vsock_perf and vsock_uring_test:
>1) Adds several new functions to util.c - they will be also used by
> vsock_uring_test.
>2) Adds two new functions for MSG_ZEROCOPY handling to a new header
> file - such header will be shared between vsock_test, vsock_perf and
> vsock_uring_test, thus avoiding code copy-pasting.
>
>Signed-off-by: Arseniy Krasnov <[email protected]>
>---
> Changelog:
> v1 -> v2:
> * Move 'SOL_VSOCK' and 'VSOCK_RECVERR' from 'util.c' to 'util.h'.
> v2 -> v3:
> * Patch was reworked. Now it is also preparation patch (see commit
> message). Shared stuff for 'vsock_perf' and tests is placed to a
> new header file, while shared code between current test tool and
> future uring test is placed to the 'util.c'. I think, that making
> this patch as preparation allows to reduce number of changes in the
> next patches in this patchset.
> * Make 'struct vsock_test_data' private by placing it in the .c file.
> Also add comments to this struct to clarify the meaning of its fields.
>
> tools/testing/vsock/Makefile | 2 +-
> tools/testing/vsock/msg_zerocopy_common.h | 92 ++++++
> tools/testing/vsock/util.c | 110 +++++++
> tools/testing/vsock/util.h | 5 +
> tools/testing/vsock/vsock_test.c | 16 +
> tools/testing/vsock/vsock_test_zerocopy.c | 367 ++++++++++++++++++++++
> tools/testing/vsock/vsock_test_zerocopy.h | 15 +
> 7 files changed, 606 insertions(+), 1 deletion(-)
> create mode 100644 tools/testing/vsock/msg_zerocopy_common.h
> create mode 100644 tools/testing/vsock/vsock_test_zerocopy.c
> create mode 100644 tools/testing/vsock/vsock_test_zerocopy.h
>
>diff --git a/tools/testing/vsock/Makefile b/tools/testing/vsock/Makefile
>index 21a98ba565ab..1a26f60a596c 100644
>--- a/tools/testing/vsock/Makefile
>+++ b/tools/testing/vsock/Makefile
>@@ -1,7 +1,7 @@
> # SPDX-License-Identifier: GPL-2.0-only
> all: test vsock_perf
> test: vsock_test vsock_diag_test
>-vsock_test: vsock_test.o timeout.o control.o util.o
>+vsock_test: vsock_test.o vsock_test_zerocopy.o timeout.o control.o util.o
> vsock_diag_test: vsock_diag_test.o timeout.o control.o util.o
> vsock_perf: vsock_perf.o
>
>diff --git a/tools/testing/vsock/msg_zerocopy_common.h b/tools/testing/vsock/msg_zerocopy_common.h
>new file mode 100644
>index 000000000000..ce89f1281584
>--- /dev/null
>+++ b/tools/testing/vsock/msg_zerocopy_common.h
>@@ -0,0 +1,92 @@
>+/* SPDX-License-Identifier: GPL-2.0-only */
>+#ifndef MSG_ZEROCOPY_COMMON_H
>+#define MSG_ZEROCOPY_COMMON_H
>+
>+#include <stdio.h>
>+#include <stdlib.h>
>+#include <sys/types.h>
>+#include <sys/socket.h>
>+#include <linux/errqueue.h>
>+
>+#ifndef SOL_VSOCK
>+#define SOL_VSOCK 287
>+#endif
>+
>+#ifndef VSOCK_RECVERR
>+#define VSOCK_RECVERR 1
>+#endif
>+
>+static void enable_so_zerocopy(int fd)
>+{
>+ int val = 1;
>+
>+ if (setsockopt(fd, SOL_SOCKET, SO_ZEROCOPY, &val, sizeof(val))) {
>+ perror("setsockopt");
>+ exit(EXIT_FAILURE);
>+ }
>+}
>+
>+static void vsock_recv_completion(int fd, const bool *zerocopied) __maybe_unused;

To avoid this, maybe we can implement those functions in a .c file and
link the object.

WDYT?

Ah, here (cc (GCC) 13.2.1 20230728 (Red Hat 13.2.1-1)) the build is
failing:

In file included from vsock_perf.c:23:
msg_zerocopy_common.h: In function ‘vsock_recv_completion’:
msg_zerocopy_common.h:29:67: error: expected declaration specifiers before ‘__maybe_unused’
29 | static void vsock_recv_completion(int fd, const bool *zerocopied) __maybe_unused;
| ^~~~~~~~~~~~~~
msg_zerocopy_common.h:31:1: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘{’ token
31 | {
| ^
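
One plausible reading of the error above, not confirmed in this thread, is that `__maybe_unused` is not defined in every translation unit that includes the header, so the compiler parses it as an old-style parameter declaration. A minimal sketch of a guarded fallback follows, with the attribute placed before the return type on the definition itself. check_copy_mode() is a hypothetical stand-in for a helper living in a shared header, not a function from the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumption: the macro may be missing in some translation units. */
#ifndef __maybe_unused
#define __maybe_unused __attribute__((__unused__))
#endif

/* Hypothetical stand-in for a static helper defined in a shared header.
 *
 * Returns 0 if the observed mode matches the expectation (or if the
 * caller passes NULL and does not care), -1 on a mismatch.
 */
static __maybe_unused int check_copy_mode(const bool *zerocopied,
					  bool was_copied)
{
	if (!zerocopied)
		return 0;

	/* Expected zerocopy but data was copied, or the other way round. */
	return (*zerocopied == was_copied) ? -1 : 0;
}
```

With the attribute on the definition, a separate forward declaration carrying the attribute is no longer needed. Moving the helpers to a .c file, as suggested above, avoids the question entirely.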

>+static void vsock_recv_completion(int fd, const bool *zerocopied)
>+{
>+ struct sock_extended_err *serr;
>+ struct msghdr msg = { 0 };
>+ char cmsg_data[128];
>+ struct cmsghdr *cm;
>+ ssize_t res;
>+
>+ msg.msg_control = cmsg_data;
>+ msg.msg_controllen = sizeof(cmsg_data);
>+
>+ res = recvmsg(fd, &msg, MSG_ERRQUEUE);
>+ if (res) {
>+ fprintf(stderr, "failed to read error queue: %zi\n", res);
>+ exit(EXIT_FAILURE);
>+ }
>+
>+ cm = CMSG_FIRSTHDR(&msg);
>+ if (!cm) {
>+ fprintf(stderr, "cmsg: no cmsg\n");
>+ exit(EXIT_FAILURE);
>+ }
>+
>+ if (cm->cmsg_level != SOL_VSOCK) {
>+ fprintf(stderr, "cmsg: unexpected 'cmsg_level'\n");
>+ exit(EXIT_FAILURE);
>+ }
>+
>+ if (cm->cmsg_type != VSOCK_RECVERR) {
>+ fprintf(stderr, "cmsg: unexpected 'cmsg_type'\n");
>+ exit(EXIT_FAILURE);
>+ }
>+
>+ serr = (void *)CMSG_DATA(cm);
>+ if (serr->ee_origin != SO_EE_ORIGIN_ZEROCOPY) {
>+ fprintf(stderr, "serr: wrong origin: %u\n", serr->ee_origin);
>+ exit(EXIT_FAILURE);
>+ }
>+
>+ if (serr->ee_errno) {
>+ fprintf(stderr, "serr: wrong error code: %u\n", serr->ee_errno);
>+ exit(EXIT_FAILURE);
>+ }
>+
>+ /* This flag is used for tests, to check that transmission was
>+ * performed as expected: zerocopy or fallback to copy. If NULL
>+ * - don't care.
>+ */
>+ if (!zerocopied)
>+ return;
>+
>+ if (*zerocopied && (serr->ee_code & SO_EE_CODE_ZEROCOPY_COPIED)) {
>+ fprintf(stderr, "serr: was copy instead of zerocopy\n");
>+ exit(EXIT_FAILURE);
>+ }
>+
>+ if (!*zerocopied && !(serr->ee_code & SO_EE_CODE_ZEROCOPY_COPIED)) {
>+ fprintf(stderr, "serr: was zerocopy instead of copy\n");
>+ exit(EXIT_FAILURE);
>+ }
>+}
>+
>+#endif /* MSG_ZEROCOPY_COMMON_H */
>diff --git a/tools/testing/vsock/util.c b/tools/testing/vsock/util.c
>index 6779d5008b27..b1770edd8cc1 100644
>--- a/tools/testing/vsock/util.c
>+++ b/tools/testing/vsock/util.c
>@@ -11,10 +11,12 @@
> #include <stdio.h>
> #include <stdint.h>
> #include <stdlib.h>
>+#include <string.h>
> #include <signal.h>
> #include <unistd.h>
> #include <assert.h>
> #include <sys/epoll.h>
>+#include <sys/mman.h>
>
> #include "timeout.h"
> #include "control.h"
>@@ -444,3 +446,111 @@ unsigned long hash_djb2(const void *data, size_t len)
>
> return hash;
> }
>+
>+size_t iovec_bytes(const struct iovec *iov, size_t iovnum)
>+{
>+ size_t bytes;
>+ int i;
>+
>+ for (bytes = 0, i = 0; i < iovnum; i++)
>+ bytes += iov[i].iov_len;
>+
>+ return bytes;
>+}
>+
>+unsigned long iovec_hash_djb2(const struct iovec *iov, size_t iovnum)
>+{
>+ unsigned long hash;
>+ size_t iov_bytes;
>+ size_t offs;
>+ void *tmp;
>+ int i;
>+
>+ iov_bytes = iovec_bytes(iov, iovnum);
>+
>+ tmp = malloc(iov_bytes);
>+ if (!tmp) {
>+ perror("malloc");
>+ exit(EXIT_FAILURE);
>+ }
>+
>+ for (offs = 0, i = 0; i < iovnum; i++) {
>+ memcpy(tmp + offs, iov[i].iov_base, iov[i].iov_len);
>+ offs += iov[i].iov_len;
>+ }
>+
>+ hash = hash_djb2(tmp, iov_bytes);
>+ free(tmp);
>+
>+ return hash;
>+}
>+
>+struct iovec *iovec_from_test_data(const struct iovec *test_iovec, int iovnum)

From the name this function seems related to vsock_test_data, so I'd
suggest moving this and free_iovec_test_data() to vsock_test_zerocopy.c.

>+{
>+ struct iovec *iovec;
>+ int i;
>+
>+ iovec = malloc(sizeof(*iovec) * iovnum);
>+ if (!iovec) {
>+ perror("malloc");
>+ exit(EXIT_FAILURE);
>+ }
>+
>+ for (i = 0; i < iovnum; i++) {
>+ iovec[i].iov_len = test_iovec[i].iov_len;
>+
>+ iovec[i].iov_base = mmap(NULL, iovec[i].iov_len,
>+ PROT_READ | PROT_WRITE,
>+ MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE,
>+ -1, 0);
>+ if (iovec[i].iov_base == MAP_FAILED) {
>+ perror("mmap");
>+ exit(EXIT_FAILURE);
>+ }
>+
>+ if (test_iovec[i].iov_base != MAP_FAILED)
>+ iovec[i].iov_base += (uintptr_t)test_iovec[i].iov_base;
>+ }
>+
>+ /* Unmap "invalid" elements. */
>+ for (i = 0; i < iovnum; i++) {
>+ if (test_iovec[i].iov_base == MAP_FAILED) {
>+ if (munmap(iovec[i].iov_base, iovec[i].iov_len)) {
>+ perror("munmap");
>+ exit(EXIT_FAILURE);
>+ }
>+ }
>+ }
>+
>+ for (i = 0; i < iovnum; i++) {
>+ int j;
>+
>+ if (test_iovec[i].iov_base == MAP_FAILED)
>+ continue;
>+
>+ for (j = 0; j < iovec[i].iov_len; j++)
>+ ((uint8_t *)iovec[i].iov_base)[j] = rand() & 0xff;
>+ }
>+
>+ return iovec;
>+}
>+
>+void free_iovec_test_data(const struct iovec *test_iovec,
>+ struct iovec *iovec, int iovnum)
>+{
>+ int i;
>+
>+ for (i = 0; i < iovnum; i++) {
>+ if (test_iovec[i].iov_base != MAP_FAILED) {
>+ if (test_iovec[i].iov_base)
>+ iovec[i].iov_base -= (uintptr_t)test_iovec[i].iov_base;
>+
>+ if (munmap(iovec[i].iov_base, iovec[i].iov_len)) {
>+ perror("munmap");
>+ exit(EXIT_FAILURE);
>+ }
>+ }
>+ }
>+
>+ free(iovec);
>+}
>diff --git a/tools/testing/vsock/util.h b/tools/testing/vsock/util.h
>index e5407677ce05..4cacb8d804c1 100644
>--- a/tools/testing/vsock/util.h
>+++ b/tools/testing/vsock/util.h
>@@ -53,4 +53,9 @@ void list_tests(const struct test_case *test_cases);
> void skip_test(struct test_case *test_cases, size_t test_cases_len,
> const char *test_id_str);
> unsigned long hash_djb2(const void *data, size_t len);
>+size_t iovec_bytes(const struct iovec *iov, size_t iovnum);
>+unsigned long iovec_hash_djb2(const struct iovec *iov, size_t iovnum);
>+struct iovec *iovec_from_test_data(const struct iovec *test_iovec, int iovnum);
>+void free_iovec_test_data(const struct iovec *test_iovec,
>+ struct iovec *iovec, int iovnum);
> #endif /* UTIL_H */
>diff --git a/tools/testing/vsock/vsock_test.c b/tools/testing/vsock/vsock_test.c
>index da4cb819a183..c1f7bc9abd22 100644
>--- a/tools/testing/vsock/vsock_test.c
>+++ b/tools/testing/vsock/vsock_test.c
>@@ -21,6 +21,7 @@
> #include <poll.h>
> #include <signal.h>
>
>+#include "vsock_test_zerocopy.h"
> #include "timeout.h"
> #include "control.h"
> #include "util.h"
>@@ -1269,6 +1270,21 @@ static struct test_case test_cases[] = {
> .run_client = test_stream_shutrd_client,
> .run_server = test_stream_shutrd_server,
> },
>+ {
>+ .name = "SOCK_STREAM MSG_ZEROCOPY",
>+ .run_client = test_stream_msgzcopy_client,
>+ .run_server = test_stream_msgzcopy_server,
>+ },
>+ {
>+ .name = "SOCK_SEQPACKET MSG_ZEROCOPY",
>+ .run_client = test_seqpacket_msgzcopy_client,
>+ .run_server = test_seqpacket_msgzcopy_server,
>+ },
>+ {
>+ .name = "SOCK_STREAM MSG_ZEROCOPY empty MSG_ERRQUEUE",
>+ .run_client = test_stream_msgzcopy_empty_errq_client,
>+ .run_server = test_stream_msgzcopy_empty_errq_server,
>+ },
> {},
> };
>
>diff --git a/tools/testing/vsock/vsock_test_zerocopy.c b/tools/testing/vsock/vsock_test_zerocopy.c
>new file mode 100644
>index 000000000000..af14efdf334b
>--- /dev/null
>+++ b/tools/testing/vsock/vsock_test_zerocopy.c
>@@ -0,0 +1,367 @@
>+// SPDX-License-Identifier: GPL-2.0-only
>+/* MSG_ZEROCOPY feature tests for vsock
>+ *
>+ * Copyright (C) 2023 SberDevices.
>+ *
>+ * Author: Arseniy Krasnov <[email protected]>
>+ */
>+
>+#include <stdio.h>
>+#include <stdlib.h>
>+#include <string.h>
>+#include <sys/mman.h>
>+#include <unistd.h>
>+#include <poll.h>
>+#include <linux/errqueue.h>
>+#include <linux/kernel.h>
>+#include <errno.h>
>+
>+#include "control.h"
>+#include "vsock_test_zerocopy.h"
>+#include "msg_zerocopy_common.h"
>+
>+#define PAGE_SIZE 4096

Some tests use `sysconf(_SC_PAGESIZE)` instead of a hard-coded value,
e.g. selftests/ptrace/peeksiginfo.c:

#ifndef PAGE_SIZE
#define PAGE_SIZE sysconf(_SC_PAGESIZE)
#endif

WDYT?
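
One caveat, as far as I can tell: 'test_data_array' uses PAGE_SIZE inside
static initializers, and a sysconf() call is not a constant expression
there. Below is a minimal sketch of one way around that. The names
test_page_size() and 'struct test_len' are hypothetical and not part of
the patch; the idea is to keep lengths as page counts and resolve them at
runtime.

```c
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper, not part of the patch: query the page size at
 * runtime instead of hard-coding 4096.
 */
static size_t test_page_size(void)
{
	long sz = sysconf(_SC_PAGESIZE);

	/* sysconf() returns -1 on error; fall back to the common 4 KiB. */
	return sz > 0 ? (size_t)sz : 4096;
}

/* Hypothetical: describe a buffer length as whole pages plus a byte
 * remainder, so a static table needs no constant PAGE_SIZE.
 */
struct test_len {
	size_t pages;
	size_t extra_bytes;
};

/* Resolve a description to a byte count on the running system. */
static size_t test_len_bytes(const struct test_len *len)
{
	return len->pages * test_page_size() + len->extra_bytes;
}
```

With this, an entry such as '{ 2, 200 }' would mean two pages plus 200
bytes, whatever the page size of the machine running the test is.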

>+
>+#define VSOCK_TEST_DATA_MAX_IOV 3
>+
>+struct vsock_test_data {
>+ /* This test case is for SOCK_STREAM only. */
>+ bool stream_only;
>+ /* Data must be zerocopied. This field is checked against
>+ * field 'ee_code' of the 'struct sock_extended_err', which
>+ * contains bit to detect that zerocopy transmission was
>+ * fallbacked to copy mode.
>+ */
>+ bool zerocopied;
>+ /* Enable SO_ZEROCOPY option on the socket. Without enabled
>+ * SO_ZEROCOPY, every MSG_ZEROCOPY transmission will behave
>+ * like without MSG_ZEROCOPY flag.
>+ */
>+ bool so_zerocopy;
>+ /* 'errno' after 'sendmsg()' call. */
>+ int sendmsg_errno;
>+ /* Number of valid elements in 'vecs'. */
>+ int vecs_cnt;
>+ /* Array how to allocate buffers for test.
>+ * 'iov_base' == NULL -> valid buf: mmap('iov_len').
>+ *
>+ * 'iov_base' == MAP_FAILED -> invalid buf:
>+ * mmap('iov_len'), then munmap('iov_len').
>+ * 'iov_base' still contains result of
>+ * mmap().
>+ *
>+ * 'iov_base' == number -> unaligned valid buf:
>+ * mmap('iov_len') + number.
>+ */
>+ struct iovec vecs[VSOCK_TEST_DATA_MAX_IOV];
>+};
>+
>+static struct vsock_test_data test_data_array[] = {
>+ /* Last element has non-page aligned size. */
>+ {
>+ .zerocopied = true,
>+ .so_zerocopy = true,
>+ .sendmsg_errno = 0,
>+ .vecs_cnt = 3,
>+ {
>+ { NULL, PAGE_SIZE },
>+ { NULL, PAGE_SIZE },
>+ { NULL, 200 }
>+ }
>+ },
>+ /* All elements have page aligned base and size. */
>+ {
>+ .zerocopied = true,
>+ .so_zerocopy = true,
>+ .sendmsg_errno = 0,
>+ .vecs_cnt = 3,
>+ {
>+ { NULL, PAGE_SIZE },
>+ { NULL, PAGE_SIZE * 2 },
>+ { NULL, PAGE_SIZE * 3 }
>+ }
>+ },
>+ /* All elements have page aligned base and size. But
>+ * data length is bigger than 64Kb.
>+ */
>+ {
>+ .zerocopied = true,
>+ .so_zerocopy = true,
>+ .sendmsg_errno = 0,
>+ .vecs_cnt = 3,
>+ {
>+ { NULL, PAGE_SIZE * 16 },
>+ { NULL, PAGE_SIZE * 16 },
>+ { NULL, PAGE_SIZE * 16 }
>+ }
>+ },
>+ /* Middle element has both non-page aligned base and size. */
>+ {
>+ .zerocopied = true,
>+ .so_zerocopy = true,
>+ .sendmsg_errno = 0,
>+ .vecs_cnt = 3,
>+ {
>+ { NULL, PAGE_SIZE },
>+ { (void *)1, 100 },
>+ { NULL, PAGE_SIZE }
>+ }
>+ },
>+ /* Middle element is unmapped. */
>+ {
>+ .zerocopied = false,
>+ .so_zerocopy = true,
>+ .sendmsg_errno = ENOMEM,
>+ .vecs_cnt = 3,
>+ {
>+ { NULL, PAGE_SIZE },
>+ { MAP_FAILED, PAGE_SIZE },
>+ { NULL, PAGE_SIZE }
>+ }
>+ },
>+ /* Valid data, but SO_ZEROCOPY is off. This
>+ * will trigger fallback to copy.
>+ */
>+ {
>+ .zerocopied = false,
>+ .so_zerocopy = false,
>+ .sendmsg_errno = 0,
>+ .vecs_cnt = 1,
>+ {
>+ { NULL, PAGE_SIZE }
>+ }
>+ },
>+ /* Valid data, but message is bigger than peer's
>+ * buffer, so this will trigger fallback to copy.
>+ * This test is for SOCK_STREAM only, because
>+ * for SOCK_SEQPACKET, 'sendmsg()' returns EMSGSIZE.
>+ */
>+ {
>+ .stream_only = true,
>+ .zerocopied = false,
>+ .so_zerocopy = true,
>+ .sendmsg_errno = 0,
>+ .vecs_cnt = 1,
>+ {
>+ { NULL, 100 * PAGE_SIZE }
>+ }
>+ },
>+};
>+
>+#define POLL_TIMEOUT_MS 100
>+
>+static void test_client(const struct test_opts *opts,
>+ const struct vsock_test_data *test_data,
>+ bool sock_seqpacket)
>+{
>+ struct pollfd fds = { 0 };
>+ struct msghdr msg = { 0 };
>+ ssize_t sendmsg_res;
>+ struct iovec *iovec;
>+ int fd;
>+
>+ if (sock_seqpacket)
>+ fd = vsock_seqpacket_connect(opts->peer_cid, 1234);
>+ else
>+ fd = vsock_stream_connect(opts->peer_cid, 1234);
>+
>+ if (fd < 0) {
>+ perror("connect");
>+ exit(EXIT_FAILURE);
>+ }
>+
>+ if (test_data->so_zerocopy)
>+ enable_so_zerocopy(fd);
>+
>+ iovec = iovec_from_test_data(test_data->vecs, test_data->vecs_cnt);
>+
>+ msg.msg_iov = iovec;
>+ msg.msg_iovlen = test_data->vecs_cnt;
>+
>+ errno = 0;
>+
>+ sendmsg_res = sendmsg(fd, &msg, MSG_ZEROCOPY);
>+ if (errno != test_data->sendmsg_errno) {
>+ fprintf(stderr, "expected 'errno' == %i, got %i\n",
>+ test_data->sendmsg_errno, errno);
>+ exit(EXIT_FAILURE);
>+ }
>+
>+ if (!errno) {
>+ if (sendmsg_res != iovec_bytes(iovec, test_data->vecs_cnt)) {
>+ fprintf(stderr, "expected 'sendmsg()' == %li, got %li\n",
>+ iovec_bytes(iovec, test_data->vecs_cnt),
>+ sendmsg_res);
>+ exit(EXIT_FAILURE);
>+ }
>+ }
>+
>+ fds.fd = fd;
>+ fds.events = 0;
>+
>+ if (poll(&fds, 1, POLL_TIMEOUT_MS) < 0) {
>+ perror("poll");
>+ exit(EXIT_FAILURE);
>+ }
>+
>+ if (fds.revents & POLLERR) {
>+ vsock_recv_completion(fd, &test_data->zerocopied);
>+ } else if (test_data->so_zerocopy && !test_data->sendmsg_errno) {
>+ /* If we don't have data in the error queue, but
>+ * SO_ZEROCOPY was enabled and 'sendmsg()' was
>+ * successful - this is an error.
>+ */
>+ fprintf(stderr, "POLLERR expected\n");
>+ exit(EXIT_FAILURE);
>+ }
>+
>+ if (!test_data->sendmsg_errno)
>+ control_writeulong(iovec_hash_djb2(iovec, test_data->vecs_cnt));
>+ else
>+ control_writeulong(0);
>+
>+ control_writeln("DONE");
>+ free_iovec_test_data(test_data->vecs, iovec, test_data->vecs_cnt);
>+ close(fd);
>+}
>+
>+void test_stream_msgzcopy_client(const struct test_opts *opts)
>+{
>+ int i;
>+
>+ for (i = 0; i < ARRAY_SIZE(test_data_array); i++)
>+ test_client(opts, &test_data_array[i], false);
>+}
>+
>+void test_seqpacket_msgzcopy_client(const struct test_opts *opts)
>+{
>+ int i;
>+
>+ for (i = 0; i < ARRAY_SIZE(test_data_array); i++) {
>+ if (test_data_array[i].stream_only)
>+ continue;
>+
>+ test_client(opts, &test_data_array[i], true);
>+ }
>+}
>+
>+static void test_server(const struct test_opts *opts,
>+ const struct vsock_test_data *test_data,
>+ bool sock_seqpacket)
>+{
>+ unsigned long remote_hash;
>+ unsigned long local_hash;
>+ ssize_t total_bytes_rec;
>+ unsigned char *data;
>+ size_t data_len;
>+ int fd;
>+
>+ if (sock_seqpacket)
>+ fd = vsock_seqpacket_accept(VMADDR_CID_ANY, 1234, NULL);
>+ else
>+ fd = vsock_stream_accept(VMADDR_CID_ANY, 1234, NULL);
>+
>+ if (fd < 0) {
>+ perror("accept");
>+ exit(EXIT_FAILURE);
>+ }
>+
>+ data_len = iovec_bytes(test_data->vecs, test_data->vecs_cnt);
>+
>+ data = malloc(data_len);
>+ if (!data) {
>+ perror("malloc");
>+ exit(EXIT_FAILURE);
>+ }
>+
>+ total_bytes_rec = 0;
>+
>+ while (total_bytes_rec != data_len) {
>+ ssize_t bytes_rec;
>+
>+ bytes_rec = read(fd, data + total_bytes_rec,
>+ data_len - total_bytes_rec);
>+ if (bytes_rec <= 0)
>+ break;
>+
>+ total_bytes_rec += bytes_rec;
>+ }
>+
>+ if (test_data->sendmsg_errno == 0)
>+ local_hash = hash_djb2(data, data_len);
>+ else
>+ local_hash = 0;
>+
>+ free(data);
>+
>+ /* Waiting for some result. */
>+ remote_hash = control_readulong();
>+ if (remote_hash != local_hash) {
>+ fprintf(stderr, "hash mismatch\n");
>+ exit(EXIT_FAILURE);
>+ }
>+
>+ control_expectln("DONE");
>+ close(fd);
>+}
>+
>+void test_stream_msgzcopy_server(const struct test_opts *opts)
>+{
>+ int i;
>+
>+ for (i = 0; i < ARRAY_SIZE(test_data_array); i++)
>+ test_server(opts, &test_data_array[i], false);
>+}
>+
>+void test_seqpacket_msgzcopy_server(const struct test_opts *opts)
>+{
>+ int i;
>+
>+ for (i = 0; i < ARRAY_SIZE(test_data_array); i++) {
>+ if (test_data_array[i].stream_only)
>+ continue;
>+
>+ test_server(opts, &test_data_array[i], true);
>+ }
>+}
>+
>+void test_stream_msgzcopy_empty_errq_client(const struct test_opts *opts)
>+{
>+ struct msghdr msg = { 0 };
>+ char cmsg_data[128];
>+ ssize_t res;
>+ int fd;
>+
>+ fd = vsock_stream_connect(opts->peer_cid, 1234);
>+ if (fd < 0) {
>+ perror("connect");
>+ exit(EXIT_FAILURE);
>+ }
>+
>+ msg.msg_control = cmsg_data;
>+ msg.msg_controllen = sizeof(cmsg_data);
>+
>+ res = recvmsg(fd, &msg, MSG_ERRQUEUE);
>+ if (res != -1) {
>+ fprintf(stderr, "expected 'recvmsg(2)' failure, got %zi\n",
>+ res);
>+ exit(EXIT_FAILURE);
>+ }
>+
>+ control_writeln("DONE");
>+ close(fd);
>+}
>+
>+void test_stream_msgzcopy_empty_errq_server(const struct test_opts *opts)
>+{
>+ int fd;
>+
>+ fd = vsock_stream_accept(VMADDR_CID_ANY, 1234, NULL);
>+ if (fd < 0) {
>+ perror("accept");
>+ exit(EXIT_FAILURE);
>+ }
>+
>+ control_expectln("DONE");
>+ close(fd);
>+}
>diff --git a/tools/testing/vsock/vsock_test_zerocopy.h b/tools/testing/vsock/vsock_test_zerocopy.h
>new file mode 100644
>index 000000000000..3ef2579e024d
>--- /dev/null
>+++ b/tools/testing/vsock/vsock_test_zerocopy.h
>@@ -0,0 +1,15 @@
>+/* SPDX-License-Identifier: GPL-2.0-only */
>+#ifndef VSOCK_TEST_ZEROCOPY_H
>+#define VSOCK_TEST_ZEROCOPY_H
>+#include "util.h"
>+
>+void test_stream_msgzcopy_client(const struct test_opts *opts);
>+void test_stream_msgzcopy_server(const struct test_opts *opts);
>+
>+void test_seqpacket_msgzcopy_client(const struct test_opts *opts);
>+void test_seqpacket_msgzcopy_server(const struct test_opts *opts);
>+
>+void test_stream_msgzcopy_empty_errq_client(const struct test_opts *opts);
>+void test_stream_msgzcopy_empty_errq_server(const struct test_opts *opts);
>+
>+#endif /* VSOCK_TEST_ZEROCOPY_H */
>--
>2.25.1
>

2023-10-09 15:18:55

by Stefano Garzarella

Subject: Re: [PATCH net-next v3 11/12] test/vsock: MSG_ZEROCOPY support for vsock_perf

On Sat, Oct 07, 2023 at 08:21:38PM +0300, Arseniy Krasnov wrote:
>To use this option pass '--zerocopy' parameter:
>
>./vsock_perf --zerocopy --sender <cid> ...
>
>With this option MSG_ZEROCOPY flag will be passed to the 'send()' call.
>
>Signed-off-by: Arseniy Krasnov <[email protected]>
>---
> Changelog:
> v1 -> v2:
> * Move 'SOL_VSOCK' and 'VSOCK_RECVERR' from 'util.c' to 'util.h'.
> v2 -> v3:
> * Use 'msg_zerocopy_common.h' for MSG_ZEROCOPY related things.
> * Rename '--zc' option to '--zerocopy'.
> * Add detail in help that zerocopy mode is for sender mode only.
>
> tools/testing/vsock/vsock_perf.c | 80 ++++++++++++++++++++++++++++----
> 1 file changed, 71 insertions(+), 9 deletions(-)

Reviewed-by: Stefano Garzarella <[email protected]>

2023-10-09 19:29:35

by Arseniy Krasnov

Subject: Re: [PATCH net-next v3 12/12] test/vsock: io_uring rx/tx tests



On 09.10.2023 18:16, Stefano Garzarella wrote:
> On Sat, Oct 07, 2023 at 08:21:39PM +0300, Arseniy Krasnov wrote:
>> This adds set of tests which use io_uring for rx/tx. This test suite is
>> implemented as separated util like 'vsock_test' and has the same set of
>> input arguments as 'vsock_test'. These tests only cover cases of data
>> transmission (no connect/bind/accept etc).
>>
>> Signed-off-by: Arseniy Krasnov <[email protected]>
>> ---
>> Changelog:
>> v1 -> v2:
>>  * Add 'LDLIBS = -luring' to the target 'vsock_uring_test'.
>>  * Add 'vsock_uring_test' to the target 'test'.
>> v2 -> v3:
>>  * Make 'struct vsock_test_data' private by placing it to the .c file.
>>    Rename it and add comments to this struct to clarify sense of its
>>    fields.
>>  * Add 'vsock_uring_test' to the '.gitignore'.
>>  * Add receive loop to the server side - this is needed to read entire
>>    data sent by client.
>>
>> tools/testing/vsock/.gitignore         |   1 +
>> tools/testing/vsock/Makefile           |   7 +-
>> tools/testing/vsock/vsock_uring_test.c | 350 +++++++++++++++++++++++++
>> 3 files changed, 356 insertions(+), 2 deletions(-)
>> create mode 100644 tools/testing/vsock/vsock_uring_test.c
>>
>> diff --git a/tools/testing/vsock/.gitignore b/tools/testing/vsock/.gitignore
>> index a8adcfdc292b..d9f798713cd7 100644
>> --- a/tools/testing/vsock/.gitignore
>> +++ b/tools/testing/vsock/.gitignore
>> @@ -3,3 +3,4 @@
>> vsock_test
>> vsock_diag_test
>> vsock_perf
>> +vsock_uring_test
>> diff --git a/tools/testing/vsock/Makefile b/tools/testing/vsock/Makefile
>> index 1a26f60a596c..b80e7c7def1e 100644
>> --- a/tools/testing/vsock/Makefile
>> +++ b/tools/testing/vsock/Makefile
>> @@ -1,12 +1,15 @@
>> # SPDX-License-Identifier: GPL-2.0-only
>> all: test vsock_perf
>> -test: vsock_test vsock_diag_test
>> +test: vsock_test vsock_diag_test vsock_uring_test
>> vsock_test: vsock_test.o vsock_test_zerocopy.o timeout.o control.o util.o
>> vsock_diag_test: vsock_diag_test.o timeout.o control.o util.o
>> vsock_perf: vsock_perf.o
>>
>> +vsock_uring_test: LDLIBS = -luring
>> +vsock_uring_test: control.o util.o vsock_uring_test.o timeout.o
>> +
>> CFLAGS += -g -O2 -Werror -Wall -I. -I../../include -I../../../usr/include -Wno-pointer-sign -fno-strict-overflow -fno-strict-aliasing -fno-common -MMD -U_FORTIFY_SOURCE -D_GNU_SOURCE
>> .PHONY: all test clean
>> clean:
>> -    ${RM} *.o *.d vsock_test vsock_diag_test vsock_perf
>> +    ${RM} *.o *.d vsock_test vsock_diag_test vsock_perf vsock_uring_test
>> -include *.d
>> diff --git a/tools/testing/vsock/vsock_uring_test.c b/tools/testing/vsock/vsock_uring_test.c
>> new file mode 100644
>> index 000000000000..889887cf3989
>> --- /dev/null
>> +++ b/tools/testing/vsock/vsock_uring_test.c
>> @@ -0,0 +1,350 @@
>> +// SPDX-License-Identifier: GPL-2.0-only
>> +/* io_uring tests for vsock
>> + *
>> + * Copyright (C) 2023 SberDevices.
>> + *
>> + * Author: Arseniy Krasnov <[email protected]>
>> + */
>> +
>> +#include <getopt.h>
>> +#include <stdio.h>
>> +#include <stdlib.h>
>> +#include <string.h>
>> +#include <liburing.h>
>> +#include <unistd.h>
>> +#include <sys/mman.h>
>> +#include <linux/kernel.h>
>> +#include <error.h>
>> +
>> +#include "util.h"
>> +#include "control.h"
>> +#include "msg_zerocopy_common.h"
>> +
>> +#define PAGE_SIZE        4096
>
> Ditto.
>
>> +#define RING_ENTRIES_NUM    4
>> +
>> +#define VSOCK_TEST_DATA_MAX_IOV 3
>> +
>> +struct vsock_io_uring_test {
>> +    /* Number of valid elements in 'vecs'. */
>> +    int vecs_cnt;
>> +    /* Array how to allocate buffers for test.
>> +     * 'iov_base' == NULL -> valid buf: mmap('iov_len').
>> +     *
>> +     * 'iov_base' == MAP_FAILED -> invalid buf:
>> +     *               mmap('iov_len'), then munmap('iov_len').
>> +     *               'iov_base' still contains result of
>> +     *               mmap().
>> +     *
>> +     * 'iov_base' == number -> unaligned valid buf:
>> +     *               mmap('iov_len') + number.
>> +     */
>> +    struct iovec vecs[VSOCK_TEST_DATA_MAX_IOV];
>> +};
>> +
>> +static struct vsock_io_uring_test test_data_array[] = {
>> +    /* All elements have page aligned base and size. */
>> +    {
>> +        .vecs_cnt = 3,
>> +        {
>> +            { NULL, PAGE_SIZE },
>> +            { NULL, 2 * PAGE_SIZE },
>> +            { NULL, 3 * PAGE_SIZE },
>> +        }
>> +    },
>> +    /* Middle element has both non-page aligned base and size. */
>> +    {
>> +        .vecs_cnt = 3,
>> +        {
>> +            { NULL, PAGE_SIZE },
>> +            { (void *)1, 200  },
>> +            { NULL, 3 * PAGE_SIZE },
>> +        }
>> +    }
>> +};
>> +
>> +static void vsock_io_uring_client(const struct test_opts *opts,
>> +                  const struct vsock_io_uring_test *test_data,
>> +                  bool msg_zerocopy)
>> +{
>> +    struct io_uring_sqe *sqe;
>> +    struct io_uring_cqe *cqe;
>> +    struct io_uring ring;
>> +    struct iovec *iovec;
>> +    struct msghdr msg;
>> +    int fd;
>> +
>> +    fd = vsock_stream_connect(opts->peer_cid, 1234);
>> +    if (fd < 0) {
>> +        perror("connect");
>> +        exit(EXIT_FAILURE);
>> +    }
>> +
>> +    if (msg_zerocopy)
>> +        enable_so_zerocopy(fd);
>> +
>> +    iovec = iovec_from_test_data(test_data->vecs, test_data->vecs_cnt);
>
> Ah, I see this is also used here, so now I get why it is in util.h.
>
> Okay, it is fine, but please change the names to something like
> `alloc_test_iovec`/`free_test_iovec` and add a bit of documentation
> in util.c about the input and output of these functions.
>
> The rest LGTM.

Hello!

Thanks for the review! The comments seem clear and easy to fix.

Thanks, Arseniy

>
> Stefano
>
>> +
>> +    if (io_uring_queue_init(RING_ENTRIES_NUM, &ring, 0))
>> +        error(1, errno, "io_uring_queue_init");
>> +
>> +    if (io_uring_register_buffers(&ring, iovec, test_data->vecs_cnt))
>> +        error(1, errno, "io_uring_register_buffers");
>> +
>> +    memset(&msg, 0, sizeof(msg));
>> +    msg.msg_iov = iovec;
>> +    msg.msg_iovlen = test_data->vecs_cnt;
>> +    sqe = io_uring_get_sqe(&ring);
>> +
>> +    if (msg_zerocopy)
>> +        io_uring_prep_sendmsg_zc(sqe, fd, &msg, 0);
>> +    else
>> +        io_uring_prep_sendmsg(sqe, fd, &msg, 0);
>> +
>> +    if (io_uring_submit(&ring) != 1)
>> +        error(1, errno, "io_uring_submit");
>> +
>> +    if (io_uring_wait_cqe(&ring, &cqe))
>> +        error(1, errno, "io_uring_wait_cqe");
>> +
>> +    io_uring_cqe_seen(&ring, cqe);
>> +
>> +    control_writeulong(iovec_hash_djb2(iovec, test_data->vecs_cnt));
>> +
>> +    control_writeln("DONE");
>> +    io_uring_queue_exit(&ring);
>> +    free_iovec_test_data(test_data->vecs, iovec, test_data->vecs_cnt);
>> +    close(fd);
>> +}
>> +
>> +static void vsock_io_uring_server(const struct test_opts *opts,
>> +                  const struct vsock_io_uring_test *test_data)
>> +{
>> +    unsigned long remote_hash;
>> +    unsigned long local_hash;
>> +    struct io_uring ring;
>> +    size_t data_len;
>> +    size_t recv_len;
>> +    void *data;
>> +    int fd;
>> +
>> +    fd = vsock_stream_accept(VMADDR_CID_ANY, 1234, NULL);
>> +    if (fd < 0) {
>> +        perror("accept");
>> +        exit(EXIT_FAILURE);
>> +    }
>> +
>> +    data_len = iovec_bytes(test_data->vecs, test_data->vecs_cnt);
>> +
>> +    data = malloc(data_len);
>> +    if (!data) {
>> +        perror("malloc");
>> +        exit(EXIT_FAILURE);
>> +    }
>> +
>> +    if (io_uring_queue_init(RING_ENTRIES_NUM, &ring, 0))
>> +        error(1, errno, "io_uring_queue_init");
>> +
>> +    recv_len = 0;
>> +
>> +    while (recv_len < data_len) {
>> +        struct io_uring_sqe *sqe;
>> +        struct io_uring_cqe *cqe;
>> +        struct iovec iovec;
>> +
>> +        sqe = io_uring_get_sqe(&ring);
>> +        iovec.iov_base = data + recv_len;
>> +        iovec.iov_len = data_len;
>> +
>> +        io_uring_prep_readv(sqe, fd, &iovec, 1, 0);
>> +
>> +        if (io_uring_submit(&ring) != 1)
>> +            error(1, errno, "io_uring_submit");
>> +
>> +        if (io_uring_wait_cqe(&ring, &cqe))
>> +            error(1, errno, "io_uring_wait_cqe");
>> +
>> +        recv_len += cqe->res;
>> +        io_uring_cqe_seen(&ring, cqe);
>> +    }
>> +
>> +    if (recv_len != data_len) {
>> +        fprintf(stderr, "expected %zu, got %zu\n", data_len,
>> +            recv_len);
>> +        exit(EXIT_FAILURE);
>> +    }
>> +
>> +    local_hash = hash_djb2(data, data_len);
>> +
>> +    remote_hash = control_readulong();
>> +    if (remote_hash != local_hash) {
>> +        fprintf(stderr, "hash mismatch\n");
>> +        exit(EXIT_FAILURE);
>> +    }
>> +
>> +    control_expectln("DONE");
>> +    io_uring_queue_exit(&ring);
>> +    free(data);
>> +}
>> +
>> +void test_stream_uring_server(const struct test_opts *opts)
>> +{
>> +    int i;
>> +
>> +    for (i = 0; i < ARRAY_SIZE(test_data_array); i++)
>> +        vsock_io_uring_server(opts, &test_data_array[i]);
>> +}
>> +
>> +void test_stream_uring_client(const struct test_opts *opts)
>> +{
>> +    int i;
>> +
>> +    for (i = 0; i < ARRAY_SIZE(test_data_array); i++)
>> +        vsock_io_uring_client(opts, &test_data_array[i], false);
>> +}
>> +
>> +void test_stream_uring_msg_zc_server(const struct test_opts *opts)
>> +{
>> +    int i;
>> +
>> +    for (i = 0; i < ARRAY_SIZE(test_data_array); i++)
>> +        vsock_io_uring_server(opts, &test_data_array[i]);
>> +}
>> +
>> +void test_stream_uring_msg_zc_client(const struct test_opts *opts)
>> +{
>> +    int i;
>> +
>> +    for (i = 0; i < ARRAY_SIZE(test_data_array); i++)
>> +        vsock_io_uring_client(opts, &test_data_array[i], true);
>> +}
>> +
>> +static struct test_case test_cases[] = {
>> +    {
>> +        .name = "SOCK_STREAM io_uring test",
>> +        .run_server = test_stream_uring_server,
>> +        .run_client = test_stream_uring_client,
>> +    },
>> +    {
>> +        .name = "SOCK_STREAM io_uring MSG_ZEROCOPY test",
>> +        .run_server = test_stream_uring_msg_zc_server,
>> +        .run_client = test_stream_uring_msg_zc_client,
>> +    },
>> +    {},
>> +};
>> +
>> +static const char optstring[] = "";
>> +static const struct option longopts[] = {
>> +    {
>> +        .name = "control-host",
>> +        .has_arg = required_argument,
>> +        .val = 'H',
>> +    },
>> +    {
>> +        .name = "control-port",
>> +        .has_arg = required_argument,
>> +        .val = 'P',
>> +    },
>> +    {
>> +        .name = "mode",
>> +        .has_arg = required_argument,
>> +        .val = 'm',
>> +    },
>> +    {
>> +        .name = "peer-cid",
>> +        .has_arg = required_argument,
>> +        .val = 'p',
>> +    },
>> +    {
>> +        .name = "help",
>> +        .has_arg = no_argument,
>> +        .val = '?',
>> +    },
>> +    {},
>> +};
>> +
>> +static void usage(void)
>> +{
>> +    fprintf(stderr, "Usage: vsock_uring_test [--help] [--control-host=<host>] --control-port=<port> --mode=client|server --peer-cid=<cid>\n"
>> +        "\n"
>> +        "  Server: vsock_uring_test --control-port=1234 --mode=server --peer-cid=3\n"
>> +        "  Client: vsock_uring_test --control-host=192.168.0.1 --control-port=1234 --mode=client --peer-cid=2\n"
>> +        "\n"
>> +        "Run transmission tests using io_uring. Usage is the same as\n"
>> +        "in ./vsock_test\n"
>> +        "\n"
>> +        "Options:\n"
>> +        "  --help                 This help message\n"
>> +        "  --control-host <host>  Server IP address to connect to\n"
>> +        "  --control-port <port>  Server port to listen on/connect to\n"
>> +        "  --mode client|server   Server or client mode\n"
>> +        "  --peer-cid <cid>       CID of the other side\n"
>> +        );
>> +    exit(EXIT_FAILURE);
>> +}
>> +
>> +int main(int argc, char **argv)
>> +{
>> +    const char *control_host = NULL;
>> +    const char *control_port = NULL;
>> +    struct test_opts opts = {
>> +        .mode = TEST_MODE_UNSET,
>> +        .peer_cid = VMADDR_CID_ANY,
>> +    };
>> +
>> +    init_signals();
>> +
>> +    for (;;) {
>> +        int opt = getopt_long(argc, argv, optstring, longopts, NULL);
>> +
>> +        if (opt == -1)
>> +            break;
>> +
>> +        switch (opt) {
>> +        case 'H':
>> +            control_host = optarg;
>> +            break;
>> +        case 'm':
>> +            if (strcmp(optarg, "client") == 0) {
>> +                opts.mode = TEST_MODE_CLIENT;
>> +            } else if (strcmp(optarg, "server") == 0) {
>> +                opts.mode = TEST_MODE_SERVER;
>> +            } else {
>> +                fprintf(stderr, "--mode must be \"client\" or \"server\"\n");
>> +                return EXIT_FAILURE;
>> +            }
>> +            break;
>> +        case 'p':
>> +            opts.peer_cid = parse_cid(optarg);
>> +            break;
>> +        case 'P':
>> +            control_port = optarg;
>> +            break;
>> +        case '?':
>> +        default:
>> +            usage();
>> +        }
>> +    }
>> +
>> +    if (!control_port)
>> +        usage();
>> +    if (opts.mode == TEST_MODE_UNSET)
>> +        usage();
>> +    if (opts.peer_cid == VMADDR_CID_ANY)
>> +        usage();
>> +
>> +    if (!control_host) {
>> +        if (opts.mode != TEST_MODE_SERVER)
>> +            usage();
>> +        control_host = "0.0.0.0";
>> +    }
>> +
>> +    control_init(control_host, control_port,
>> +             opts.mode == TEST_MODE_SERVER);
>> +
>> +    run_tests(test_cases, &opts);
>> +
>> +    control_cleanup();
>> +
>> +    return 0;
>> +}
>> -- 
>> 2.25.1
>>
>

2023-10-09 20:32:15

by Arseniy Krasnov

Subject: Re: [PATCH net-next v3 10/12] test/vsock: MSG_ZEROCOPY flag tests



On 09.10.2023 18:17, Stefano Garzarella wrote:
> On Sat, Oct 07, 2023 at 08:21:37PM +0300, Arseniy Krasnov wrote:
>> This adds three tests for MSG_ZEROCOPY feature:
>> 1) SOCK_STREAM tx with different buffers.
>> 2) SOCK_SEQPACKET tx with different buffers.
>> 3) SOCK_STREAM test to read empty error queue of the socket.
>>
>> Patch also works as preparation for the next patches for tools in this
>> patchset: vsock_perf and vsock_uring_test:
>> 1) Adds several new functions to util.c - they will be also used by
>>   vsock_uring_test.
>> 2) Adds two new functions for MSG_ZEROCOPY handling to a new header
>>   file - such header will be shared between vsock_test, vsock_perf and
>>   vsock_uring_test, thus avoiding code copy-pasting.
>>
>> Signed-off-by: Arseniy Krasnov <[email protected]>
>> ---
>> Changelog:
>> v1 -> v2:
>>  * Move 'SOL_VSOCK' and 'VSOCK_RECVERR' from 'util.c' to 'util.h'.
>> v2 -> v3:
>>  * Patch was reworked. Now it is also preparation patch (see commit
>>    message). Shared stuff for 'vsock_perf' and tests is placed to a
>>    new header file, while shared code between current test tool and
>>    future uring test is placed to the 'util.c'. I think, that making
>>    this patch as preparation allows to reduce number of changes in the
>>    next patches in this patchset.
>>  * Make 'struct vsock_test_data' private by placing it to the .c file.
>>    Also add comments to this struct to clarify sense of its fields.
>>
>> tools/testing/vsock/Makefile              |   2 +-
>> tools/testing/vsock/msg_zerocopy_common.h |  92 ++++++
>> tools/testing/vsock/util.c                | 110 +++++++
>> tools/testing/vsock/util.h                |   5 +
>> tools/testing/vsock/vsock_test.c          |  16 +
>> tools/testing/vsock/vsock_test_zerocopy.c | 367 ++++++++++++++++++++++
>> tools/testing/vsock/vsock_test_zerocopy.h |  15 +
>> 7 files changed, 606 insertions(+), 1 deletion(-)
>> create mode 100644 tools/testing/vsock/msg_zerocopy_common.h
>> create mode 100644 tools/testing/vsock/vsock_test_zerocopy.c
>> create mode 100644 tools/testing/vsock/vsock_test_zerocopy.h
>>
>> diff --git a/tools/testing/vsock/Makefile b/tools/testing/vsock/Makefile
>> index 21a98ba565ab..1a26f60a596c 100644
>> --- a/tools/testing/vsock/Makefile
>> +++ b/tools/testing/vsock/Makefile
>> @@ -1,7 +1,7 @@
>> # SPDX-License-Identifier: GPL-2.0-only
>> all: test vsock_perf
>> test: vsock_test vsock_diag_test
>> -vsock_test: vsock_test.o timeout.o control.o util.o
>> +vsock_test: vsock_test.o vsock_test_zerocopy.o timeout.o control.o util.o
>> vsock_diag_test: vsock_diag_test.o timeout.o control.o util.o
>> vsock_perf: vsock_perf.o
>>
>> diff --git a/tools/testing/vsock/msg_zerocopy_common.h b/tools/testing/vsock/msg_zerocopy_common.h
>> new file mode 100644
>> index 000000000000..ce89f1281584
>> --- /dev/null
>> +++ b/tools/testing/vsock/msg_zerocopy_common.h
>> @@ -0,0 +1,92 @@
>> +/* SPDX-License-Identifier: GPL-2.0-only */
>> +#ifndef MSG_ZEROCOPY_COMMON_H
>> +#define MSG_ZEROCOPY_COMMON_H
>> +
>> +#include <stdio.h>
>> +#include <stdlib.h>
>> +#include <sys/types.h>
>> +#include <sys/socket.h>
>> +#include <linux/errqueue.h>
>> +
>> +#ifndef SOL_VSOCK
>> +#define SOL_VSOCK    287
>> +#endif
>> +
>> +#ifndef VSOCK_RECVERR
>> +#define VSOCK_RECVERR    1
>> +#endif
>> +
>> +static void enable_so_zerocopy(int fd)
>> +{
>> +    int val = 1;
>> +
>> +    if (setsockopt(fd, SOL_SOCKET, SO_ZEROCOPY, &val, sizeof(val))) {
>> +        perror("setsockopt");
>> +        exit(EXIT_FAILURE);
>> +    }
>> +}
>> +
>> +static void vsock_recv_completion(int fd, const bool *zerocopied) __maybe_unused;
>
> To avoid this, maybe we can implement those functions in a .c file and
> link the object.
>
> WDYT?
>
> Ah, here (cc (GCC) 13.2.1 20230728 (Red Hat 13.2.1-1)) the build is
> failing:
>
> In file included from vsock_perf.c:23:
> msg_zerocopy_common.h: In function ‘vsock_recv_completion’:
> msg_zerocopy_common.h:29:67: error: expected declaration specifiers before ‘__maybe_unused’
>    29 | static void vsock_recv_completion(int fd, const bool *zerocopied) __maybe_unused;
>       |                                                                   ^~~~~~~~~~~~~~
> msg_zerocopy_common.h:31:1: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘{’ token
>    31 | {
>       | ^
>
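
For context, the error above looks like what the compiler reports when
'__maybe_unused' is not defined as a macro in that translation unit
(presumably vsock_perf.c does not include a header that provides it). A
minimal sketch of a header-only workaround follows. The fallback define
and the stand-in function completion_expected() are assumptions for
illustration, not the code from the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Fallback, assuming no included header already provides the macro.
 * GCC and Clang both accept the 'unused' attribute.
 */
#ifndef __maybe_unused
#define __maybe_unused __attribute__((unused))
#endif

/* Hypothetical stand-in for a static helper defined in a shared
 * header: the attribute silences -Wunused-function in files that
 * include the header but never call the helper.
 */
static int __maybe_unused completion_expected(const bool *zerocopied)
{
	/* NULL means the caller does not care how data was sent. */
	return zerocopied != NULL;
}
```

Moving the helpers to a .c file and linking the object, as suggested
above, avoids the attribute altogether.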
>> +static void vsock_recv_completion(int fd, const bool *zerocopied)
>> +{
>> +    struct sock_extended_err *serr;
>> +    struct msghdr msg = { 0 };
>> +    char cmsg_data[128];
>> +    struct cmsghdr *cm;
>> +    ssize_t res;
>> +
>> +    msg.msg_control = cmsg_data;
>> +    msg.msg_controllen = sizeof(cmsg_data);
>> +
>> +    res = recvmsg(fd, &msg, MSG_ERRQUEUE);
>> +    if (res) {
>> +        fprintf(stderr, "failed to read error queue: %zi\n", res);
>> +        exit(EXIT_FAILURE);
>> +    }
>> +
>> +    cm = CMSG_FIRSTHDR(&msg);
>> +    if (!cm) {
>> +        fprintf(stderr, "cmsg: no cmsg\n");
>> +        exit(EXIT_FAILURE);
>> +    }
>> +
>> +    if (cm->cmsg_level != SOL_VSOCK) {
>> +        fprintf(stderr, "cmsg: unexpected 'cmsg_level'\n");
>> +        exit(EXIT_FAILURE);
>> +    }
>> +
>> +    if (cm->cmsg_type != VSOCK_RECVERR) {
>> +        fprintf(stderr, "cmsg: unexpected 'cmsg_type'\n");
>> +        exit(EXIT_FAILURE);
>> +    }
>> +
>> +    serr = (void *)CMSG_DATA(cm);
>> +    if (serr->ee_origin != SO_EE_ORIGIN_ZEROCOPY) {
>> +        fprintf(stderr, "serr: wrong origin: %u\n", serr->ee_origin);
>> +        exit(EXIT_FAILURE);
>> +    }
>> +
>> +    if (serr->ee_errno) {
>> +        fprintf(stderr, "serr: wrong error code: %u\n", serr->ee_errno);
>> +        exit(EXIT_FAILURE);
>> +    }
>> +
>> +    /* This flag is used for tests, to check that transmission was
>> +     * performed as expected: zerocopy or fallback to copy. If NULL
>> +     * - don't care.
>> +     */
>> +    if (!zerocopied)
>> +        return;
>> +
>> +    if (*zerocopied && (serr->ee_code & SO_EE_CODE_ZEROCOPY_COPIED)) {
>> +        fprintf(stderr, "serr: was copy instead of zerocopy\n");
>> +        exit(EXIT_FAILURE);
>> +    }
>> +
>> +    if (!*zerocopied && !(serr->ee_code & SO_EE_CODE_ZEROCOPY_COPIED)) {
>> +        fprintf(stderr, "serr: was zerocopy instead of copy\n");
>> +        exit(EXIT_FAILURE);
>> +    }
>> +}
>> +
>> +#endif /* MSG_ZEROCOPY_COMMON_H */
>> diff --git a/tools/testing/vsock/util.c b/tools/testing/vsock/util.c
>> index 6779d5008b27..b1770edd8cc1 100644
>> --- a/tools/testing/vsock/util.c
>> +++ b/tools/testing/vsock/util.c
>> @@ -11,10 +11,12 @@
>> #include <stdio.h>
>> #include <stdint.h>
>> #include <stdlib.h>
>> +#include <string.h>
>> #include <signal.h>
>> #include <unistd.h>
>> #include <assert.h>
>> #include <sys/epoll.h>
>> +#include <sys/mman.h>
>>
>> #include "timeout.h"
>> #include "control.h"
>> @@ -444,3 +446,111 @@ unsigned long hash_djb2(const void *data, size_t len)
>>
>>     return hash;
>> }
>> +
>> +size_t iovec_bytes(const struct iovec *iov, size_t iovnum)
>> +{
>> +    size_t bytes;
>> +    int i;
>> +
>> +    for (bytes = 0, i = 0; i < iovnum; i++)
>> +        bytes += iov[i].iov_len;
>> +
>> +    return bytes;
>> +}
>> +
>> +unsigned long iovec_hash_djb2(const struct iovec *iov, size_t iovnum)
>> +{
>> +    unsigned long hash;
>> +    size_t iov_bytes;
>> +    size_t offs;
>> +    void *tmp;
>> +    int i;
>> +
>> +    iov_bytes = iovec_bytes(iov, iovnum);
>> +
>> +    tmp = malloc(iov_bytes);
>> +    if (!tmp) {
>> +        perror("malloc");
>> +        exit(EXIT_FAILURE);
>> +    }
>> +
>> +    for (offs = 0, i = 0; i < iovnum; i++) {
>> +        memcpy(tmp + offs, iov[i].iov_base, iov[i].iov_len);
>> +        offs += iov[i].iov_len;
>> +    }
>> +
>> +    hash = hash_djb2(tmp, iov_bytes);
>> +    free(tmp);
>> +
>> +    return hash;
>> +}
>> +
>> +struct iovec *iovec_from_test_data(const struct iovec *test_iovec, int iovnum)
>
> From the name this function seems related to vsock_test_data, so I'd
> suggest to move this and free_iovec_test_data() in vsock_test_zerocopy.c
>
>> +{
>> +    struct iovec *iovec;
>> +    int i;
>> +
>> +    iovec = malloc(sizeof(*iovec) * iovnum);
>> +    if (!iovec) {
>> +        perror("malloc");
>> +        exit(EXIT_FAILURE);
>> +    }
>> +
>> +    for (i = 0; i < iovnum; i++) {
>> +        iovec[i].iov_len = test_iovec[i].iov_len;
>> +
>> +        iovec[i].iov_base = mmap(NULL, iovec[i].iov_len,
>> +                     PROT_READ | PROT_WRITE,
>> +                     MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE,
>> +                     -1, 0);
>> +        if (iovec[i].iov_base == MAP_FAILED) {
>> +            perror("mmap");
>> +            exit(EXIT_FAILURE);
>> +        }
>> +
>> +        if (test_iovec[i].iov_base != MAP_FAILED)
>> +            iovec[i].iov_base += (uintptr_t)test_iovec[i].iov_base;
>> +    }
>> +
>> +    /* Unmap "invalid" elements. */
>> +    for (i = 0; i < iovnum; i++) {
>> +        if (test_iovec[i].iov_base == MAP_FAILED) {
>> +            if (munmap(iovec[i].iov_base, iovec[i].iov_len)) {
>> +                perror("munmap");
>> +                exit(EXIT_FAILURE);
>> +            }
>> +        }
>> +    }
>> +
>> +    for (i = 0; i < iovnum; i++) {
>> +        int j;
>> +
>> +        if (test_iovec[i].iov_base == MAP_FAILED)
>> +            continue;
>> +
>> +        for (j = 0; j < iovec[i].iov_len; j++)
>> +            ((uint8_t *)iovec[i].iov_base)[j] = rand() & 0xff;
>> +    }
>> +
>> +    return iovec;
>> +}
>> +
>> +void free_iovec_test_data(const struct iovec *test_iovec,
>> +              struct iovec *iovec, int iovnum)
>> +{
>> +    int i;
>> +
>> +    for (i = 0; i < iovnum; i++) {
>> +        if (test_iovec[i].iov_base != MAP_FAILED) {
>> +            if (test_iovec[i].iov_base)
>> +                iovec[i].iov_base -= (uintptr_t)test_iovec[i].iov_base;
>> +
>> +            if (munmap(iovec[i].iov_base, iovec[i].iov_len)) {
>> +                perror("munmap");
>> +                exit(EXIT_FAILURE);
>> +            }
>> +        }
>> +    }
>> +
>> +    free(iovec);
>> +}
>> diff --git a/tools/testing/vsock/util.h b/tools/testing/vsock/util.h
>> index e5407677ce05..4cacb8d804c1 100644
>> --- a/tools/testing/vsock/util.h
>> +++ b/tools/testing/vsock/util.h
>> @@ -53,4 +53,9 @@ void list_tests(const struct test_case *test_cases);
>> void skip_test(struct test_case *test_cases, size_t test_cases_len,
>>            const char *test_id_str);
>> unsigned long hash_djb2(const void *data, size_t len);
>> +size_t iovec_bytes(const struct iovec *iov, size_t iovnum);
>> +unsigned long iovec_hash_djb2(const struct iovec *iov, size_t iovnum);
>> +struct iovec *iovec_from_test_data(const struct iovec *test_iovec, int iovnum);
>> +void free_iovec_test_data(const struct iovec *test_iovec,
>> +              struct iovec *iovec, int iovnum);
>> #endif /* UTIL_H */
>> diff --git a/tools/testing/vsock/vsock_test.c b/tools/testing/vsock/vsock_test.c
>> index da4cb819a183..c1f7bc9abd22 100644
>> --- a/tools/testing/vsock/vsock_test.c
>> +++ b/tools/testing/vsock/vsock_test.c
>> @@ -21,6 +21,7 @@
>> #include <poll.h>
>> #include <signal.h>
>>
>> +#include "vsock_test_zerocopy.h"
>> #include "timeout.h"
>> #include "control.h"
>> #include "util.h"
>> @@ -1269,6 +1270,21 @@ static struct test_case test_cases[] = {
>>         .run_client = test_stream_shutrd_client,
>>         .run_server = test_stream_shutrd_server,
>>     },
>> +    {
>> +        .name = "SOCK_STREAM MSG_ZEROCOPY",
>> +        .run_client = test_stream_msgzcopy_client,
>> +        .run_server = test_stream_msgzcopy_server,
>> +    },
>> +    {
>> +        .name = "SOCK_SEQPACKET MSG_ZEROCOPY",
>> +        .run_client = test_seqpacket_msgzcopy_client,
>> +        .run_server = test_seqpacket_msgzcopy_server,
>> +    },
>> +    {
>> +        .name = "SOCK_STREAM MSG_ZEROCOPY empty MSG_ERRQUEUE",
>> +        .run_client = test_stream_msgzcopy_empty_errq_client,
>> +        .run_server = test_stream_msgzcopy_empty_errq_server,
>> +    },
>>     {},
>> };
>>
>> diff --git a/tools/testing/vsock/vsock_test_zerocopy.c b/tools/testing/vsock/vsock_test_zerocopy.c
>> new file mode 100644
>> index 000000000000..af14efdf334b
>> --- /dev/null
>> +++ b/tools/testing/vsock/vsock_test_zerocopy.c
>> @@ -0,0 +1,367 @@
>> +// SPDX-License-Identifier: GPL-2.0-only
>> +/* MSG_ZEROCOPY feature tests for vsock
>> + *
>> + * Copyright (C) 2023 SberDevices.
>> + *
>> + * Author: Arseniy Krasnov <[email protected]>
>> + */
>> +
>> +#include <stdio.h>
>> +#include <stdlib.h>
>> +#include <string.h>
>> +#include <sys/mman.h>
>> +#include <unistd.h>
>> +#include <poll.h>
>> +#include <linux/errqueue.h>
>> +#include <linux/kernel.h>
>> +#include <errno.h>
>> +
>> +#include "control.h"
>> +#include "vsock_test_zerocopy.h"
>> +#include "msg_zerocopy_common.h"
>> +
>> +#define PAGE_SIZE        4096
>
> In some tests I saw `sysconf(_SC_PAGESIZE)` is used,
> e.g. in selftests/ptrace/peeksiginfo.c:
>
> #ifndef PAGE_SIZE
> #define PAGE_SIZE sysconf(_SC_PAGESIZE)
> #endif
>
> WDYT?

There is one small problem with that: in this case I can't use PAGE_SIZE
in an array initializer. I'm thinking of adding a reserved constant value
to designate that an iov element must be the size of a page, then using
this value in the initializer and handling it when the test iov is created...
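
A minimal sketch of the reserved-constant idea, not taken from the patch.
The names IOV_LEN_IN_PAGES, IOV_PAGES(), example_vecs and resolve_iov_len()
are hypothetical and exist only for illustration. The assumption is that the
top bit of 'iov_len' can be reserved to mean "this length is a page count",
so the table stays a compile-time constant and the real page size is
resolved with sysconf() when the test buffers are created:

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical encoding, not part of the patch: the top bit of 'iov_len'
 * marks "length is given in pages", the remaining bits hold the page count.
 */
#define IOV_LEN_IN_PAGES	((size_t)1 << (sizeof(size_t) * 8 - 1))
#define IOV_PAGES(n)		(IOV_LEN_IN_PAGES | (size_t)(n))

/* Constant expressions only, so this is a valid static initializer. */
static const struct iovec example_vecs[] = {
	{ NULL, IOV_PAGES(1) },
	{ NULL, IOV_PAGES(2) },
	{ NULL, 200 },
};

/* Turn a possibly page-encoded length into bytes at run time. */
static size_t resolve_iov_len(size_t len)
{
	long page_size = sysconf(_SC_PAGESIZE);

	assert(page_size > 0);

	if (len & IOV_LEN_IN_PAGES)
		return (len & ~IOV_LEN_IN_PAGES) * (size_t)page_size;

	return len;
}
```

With something like this, the code that builds the test iov would call
resolve_iov_len() on each element before mmap(), and plain byte counts such
as 200 would pass through unchanged.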

Thanks, Arseniy

>
>> +
>> +#define VSOCK_TEST_DATA_MAX_IOV 3
>> +
>> +struct vsock_test_data {
>> +    /* This test case is for SOCK_STREAM only. */
>> +    bool stream_only;
>> +    /* Data must be zerocopied. This field is checked against
>> +     * field 'ee_code' of the 'struct sock_extended_err', which
>> +     * contains a bit indicating that zerocopy transmission
>> +     * fell back to copy mode.
>> +     */
>> +    bool zerocopied;
>> +    /* Enable SO_ZEROCOPY option on the socket. Without enabled
>> +     * SO_ZEROCOPY, every MSG_ZEROCOPY transmission will behave
>> +     * like without MSG_ZEROCOPY flag.
>> +     */
>> +    bool so_zerocopy;
>> +    /* 'errno' after 'sendmsg()' call. */
>> +    int sendmsg_errno;
>> +    /* Number of valid elements in 'vecs'. */
>> +    int vecs_cnt;
>> +    /* Array how to allocate buffers for test.
>> +     * 'iov_base' == NULL -> valid buf: mmap('iov_len').
>> +     *
>> +     * 'iov_base' == MAP_FAILED -> invalid buf:
>> +     *               mmap('iov_len'), then munmap('iov_len').
>> +     *               'iov_base' still contains result of
>> +     *               mmap().
>> +     *
>> +     * 'iov_base' == number -> unaligned valid buf:
>> +     *               mmap('iov_len') + number.
>> +     */
>> +    struct iovec vecs[VSOCK_TEST_DATA_MAX_IOV];
>> +};
>> +
>> +static struct vsock_test_data test_data_array[] = {
>> +    /* Last element has non-page aligned size. */
>> +    {
>> +        .zerocopied = true,
>> +        .so_zerocopy = true,
>> +        .sendmsg_errno = 0,
>> +        .vecs_cnt = 3,
>> +        {
>> +            { NULL, PAGE_SIZE },
>> +            { NULL, PAGE_SIZE },
>> +            { NULL, 200 }
>> +        }
>> +    },
>> +    /* All elements have page aligned base and size. */
>> +    {
>> +        .zerocopied = true,
>> +        .so_zerocopy = true,
>> +        .sendmsg_errno = 0,
>> +        .vecs_cnt = 3,
>> +        {
>> +            { NULL, PAGE_SIZE },
>> +            { NULL, PAGE_SIZE * 2 },
>> +            { NULL, PAGE_SIZE * 3 }
>> +        }
>> +    },
>> +    /* All elements have page aligned base and size. But
>> +     * data length is bigger than 64Kb.
>> +     */
>> +    {
>> +        .zerocopied = true,
>> +        .so_zerocopy = true,
>> +        .sendmsg_errno = 0,
>> +        .vecs_cnt = 3,
>> +        {
>> +            { NULL, PAGE_SIZE * 16 },
>> +            { NULL, PAGE_SIZE * 16 },
>> +            { NULL, PAGE_SIZE * 16 }
>> +        }
>> +    },
>> +    /* Middle element has both non-page aligned base and size. */
>> +    {
>> +        .zerocopied = true,
>> +        .so_zerocopy = true,
>> +        .sendmsg_errno = 0,
>> +        .vecs_cnt = 3,
>> +        {
>> +            { NULL, PAGE_SIZE },
>> +            { (void *)1, 100 },
>> +            { NULL, PAGE_SIZE }
>> +        }
>> +    },
>> +    /* Middle element is unmapped. */
>> +    {
>> +        .zerocopied = false,
>> +        .so_zerocopy = true,
>> +        .sendmsg_errno = ENOMEM,
>> +        .vecs_cnt = 3,
>> +        {
>> +            { NULL, PAGE_SIZE },
>> +            { MAP_FAILED, PAGE_SIZE },
>> +            { NULL, PAGE_SIZE }
>> +        }
>> +    },
>> +    /* Valid data, but SO_ZEROCOPY is off. This
>> +     * will trigger fallback to copy.
>> +     */
>> +    {
>> +        .zerocopied = false,
>> +        .so_zerocopy = false,
>> +        .sendmsg_errno = 0,
>> +        .vecs_cnt = 1,
>> +        {
>> +            { NULL, PAGE_SIZE }
>> +        }
>> +    },
>> +    /* Valid data, but message is bigger than peer's
>> +     * buffer, so this will trigger fallback to copy.
>> +     * This test is for SOCK_STREAM only, because
>> +     * for SOCK_SEQPACKET, 'sendmsg()' returns EMSGSIZE.
>> +     */
>> +    {
>> +        .stream_only = true,
>> +        .zerocopied = false,
>> +        .so_zerocopy = true,
>> +        .sendmsg_errno = 0,
>> +        .vecs_cnt = 1,
>> +        {
>> +            { NULL, 100 * PAGE_SIZE }
>> +        }
>> +    },
>> +};
>> +
>> +#define POLL_TIMEOUT_MS        100
>> +
>> +static void test_client(const struct test_opts *opts,
>> +            const struct vsock_test_data *test_data,
>> +            bool sock_seqpacket)
>> +{
>> +    struct pollfd fds = { 0 };
>> +    struct msghdr msg = { 0 };
>> +    ssize_t sendmsg_res;
>> +    struct iovec *iovec;
>> +    int fd;
>> +
>> +    if (sock_seqpacket)
>> +        fd = vsock_seqpacket_connect(opts->peer_cid, 1234);
>> +    else
>> +        fd = vsock_stream_connect(opts->peer_cid, 1234);
>> +
>> +    if (fd < 0) {
>> +        perror("connect");
>> +        exit(EXIT_FAILURE);
>> +    }
>> +
>> +    if (test_data->so_zerocopy)
>> +        enable_so_zerocopy(fd);
>> +
>> +    iovec = iovec_from_test_data(test_data->vecs, test_data->vecs_cnt);
>> +
>> +    msg.msg_iov = iovec;
>> +    msg.msg_iovlen = test_data->vecs_cnt;
>> +
>> +    errno = 0;
>> +
>> +    sendmsg_res = sendmsg(fd, &msg, MSG_ZEROCOPY);
>> +    if (errno != test_data->sendmsg_errno) {
>> +        fprintf(stderr, "expected 'errno' == %i, got %i\n",
>> +            test_data->sendmsg_errno, errno);
>> +        exit(EXIT_FAILURE);
>> +    }
>> +
>> +    if (!errno) {
>> +        if (sendmsg_res != iovec_bytes(iovec, test_data->vecs_cnt)) {
>> +            fprintf(stderr, "expected 'sendmsg()' == %li, got %li\n",
>> +                iovec_bytes(iovec, test_data->vecs_cnt),
>> +                sendmsg_res);
>> +            exit(EXIT_FAILURE);
>> +        }
>> +    }
>> +
>> +    fds.fd = fd;
>> +    fds.events = 0;
>> +
>> +    if (poll(&fds, 1, POLL_TIMEOUT_MS) < 0) {
>> +        perror("poll");
>> +        exit(EXIT_FAILURE);
>> +    }
>> +
>> +    if (fds.revents & POLLERR) {
>> +        vsock_recv_completion(fd, &test_data->zerocopied);
>> +    } else if (test_data->so_zerocopy && !test_data->sendmsg_errno) {
>> +        /* If we don't have data in the error queue, but
>> +         * SO_ZEROCOPY was enabled and 'sendmsg()' was
>> +         * successful - this is an error.
>> +         */
>> +        fprintf(stderr, "POLLERR expected\n");
>> +        exit(EXIT_FAILURE);
>> +    }
>> +
>> +    if (!test_data->sendmsg_errno)
>> +        control_writeulong(iovec_hash_djb2(iovec, test_data->vecs_cnt));
>> +    else
>> +        control_writeulong(0);
>> +
>> +    control_writeln("DONE");
>> +    free_iovec_test_data(test_data->vecs, iovec, test_data->vecs_cnt);
>> +    close(fd);
>> +}
>> +
>> +void test_stream_msgzcopy_client(const struct test_opts *opts)
>> +{
>> +    int i;
>> +
>> +    for (i = 0; i < ARRAY_SIZE(test_data_array); i++)
>> +        test_client(opts, &test_data_array[i], false);
>> +}
>> +
>> +void test_seqpacket_msgzcopy_client(const struct test_opts *opts)
>> +{
>> +    int i;
>> +
>> +    for (i = 0; i < ARRAY_SIZE(test_data_array); i++) {
>> +        if (test_data_array[i].stream_only)
>> +            continue;
>> +
>> +        test_client(opts, &test_data_array[i], true);
>> +    }
>> +}
>> +
>> +static void test_server(const struct test_opts *opts,
>> +            const struct vsock_test_data *test_data,
>> +            bool sock_seqpacket)
>> +{
>> +    unsigned long remote_hash;
>> +    unsigned long local_hash;
>> +    ssize_t total_bytes_rec;
>> +    unsigned char *data;
>> +    size_t data_len;
>> +    int fd;
>> +
>> +    if (sock_seqpacket)
>> +        fd = vsock_seqpacket_accept(VMADDR_CID_ANY, 1234, NULL);
>> +    else
>> +        fd = vsock_stream_accept(VMADDR_CID_ANY, 1234, NULL);
>> +
>> +    if (fd < 0) {
>> +        perror("accept");
>> +        exit(EXIT_FAILURE);
>> +    }
>> +
>> +    data_len = iovec_bytes(test_data->vecs, test_data->vecs_cnt);
>> +
>> +    data = malloc(data_len);
>> +    if (!data) {
>> +        perror("malloc");
>> +        exit(EXIT_FAILURE);
>> +    }
>> +
>> +    total_bytes_rec = 0;
>> +
>> +    while (total_bytes_rec != data_len) {
>> +        ssize_t bytes_rec;
>> +
>> +        bytes_rec = read(fd, data + total_bytes_rec,
>> +                 data_len - total_bytes_rec);
>> +        if (bytes_rec <= 0)
>> +            break;
>> +
>> +        total_bytes_rec += bytes_rec;
>> +    }
>> +
>> +    if (test_data->sendmsg_errno == 0)
>> +        local_hash = hash_djb2(data, data_len);
>> +    else
>> +        local_hash = 0;
>> +
>> +    free(data);
>> +
>> +    /* Waiting for some result. */
>> +    remote_hash = control_readulong();
>> +    if (remote_hash != local_hash) {
>> +        fprintf(stderr, "hash mismatch\n");
>> +        exit(EXIT_FAILURE);
>> +    }
>> +
>> +    control_expectln("DONE");
>> +    close(fd);
>> +}
>> +
>> +void test_stream_msgzcopy_server(const struct test_opts *opts)
>> +{
>> +    int i;
>> +
>> +    for (i = 0; i < ARRAY_SIZE(test_data_array); i++)
>> +        test_server(opts, &test_data_array[i], false);
>> +}
>> +
>> +void test_seqpacket_msgzcopy_server(const struct test_opts *opts)
>> +{
>> +    int i;
>> +
>> +    for (i = 0; i < ARRAY_SIZE(test_data_array); i++) {
>> +        if (test_data_array[i].stream_only)
>> +            continue;
>> +
>> +        test_server(opts, &test_data_array[i], true);
>> +    }
>> +}
>> +
>> +void test_stream_msgzcopy_empty_errq_client(const struct test_opts *opts)
>> +{
>> +    struct msghdr msg = { 0 };
>> +    char cmsg_data[128];
>> +    ssize_t res;
>> +    int fd;
>> +
>> +    fd = vsock_stream_connect(opts->peer_cid, 1234);
>> +    if (fd < 0) {
>> +        perror("connect");
>> +        exit(EXIT_FAILURE);
>> +    }
>> +
>> +    msg.msg_control = cmsg_data;
>> +    msg.msg_controllen = sizeof(cmsg_data);
>> +
>> +    res = recvmsg(fd, &msg, MSG_ERRQUEUE);
>> +    if (res != -1) {
>> +        fprintf(stderr, "expected 'recvmsg(2)' failure, got %zi\n",
>> +            res);
>> +        exit(EXIT_FAILURE);
>> +    }
>> +
>> +    control_writeln("DONE");
>> +    close(fd);
>> +}
>> +
>> +void test_stream_msgzcopy_empty_errq_server(const struct test_opts *opts)
>> +{
>> +    int fd;
>> +
>> +    fd = vsock_stream_accept(VMADDR_CID_ANY, 1234, NULL);
>> +    if (fd < 0) {
>> +        perror("accept");
>> +        exit(EXIT_FAILURE);
>> +    }
>> +
>> +    control_expectln("DONE");
>> +    close(fd);
>> +}
>> diff --git a/tools/testing/vsock/vsock_test_zerocopy.h b/tools/testing/vsock/vsock_test_zerocopy.h
>> new file mode 100644
>> index 000000000000..3ef2579e024d
>> --- /dev/null
>> +++ b/tools/testing/vsock/vsock_test_zerocopy.h
>> @@ -0,0 +1,15 @@
>> +/* SPDX-License-Identifier: GPL-2.0-only */
>> +#ifndef VSOCK_TEST_ZEROCOPY_H
>> +#define VSOCK_TEST_ZEROCOPY_H
>> +#include "util.h"
>> +
>> +void test_stream_msgzcopy_client(const struct test_opts *opts);
>> +void test_stream_msgzcopy_server(const struct test_opts *opts);
>> +
>> +void test_seqpacket_msgzcopy_client(const struct test_opts *opts);
>> +void test_seqpacket_msgzcopy_server(const struct test_opts *opts);
>> +
>> +void test_stream_msgzcopy_empty_errq_client(const struct test_opts *opts);
>> +void test_stream_msgzcopy_empty_errq_server(const struct test_opts *opts);
>> +
>> +#endif /* VSOCK_TEST_ZEROCOPY_H */
>> -- 
>> 2.25.1
>>
>

2023-10-10 07:21:01

by Stefano Garzarella

Subject: Re: [PATCH net-next v3 10/12] test/vsock: MSG_ZEROCOPY flag tests

On Mon, Oct 09, 2023 at 11:24:18PM +0300, Arseniy Krasnov wrote:
>
>
>On 09.10.2023 18:17, Stefano Garzarella wrote:
>> On Sat, Oct 07, 2023 at 08:21:37PM +0300, Arseniy Krasnov wrote:
>>> This adds three tests for MSG_ZEROCOPY feature:
>>> 1) SOCK_STREAM tx with different buffers.
>>> 2) SOCK_SEQPACKET tx with different buffers.
>>> 3) SOCK_STREAM test to read empty error queue of the socket.
>>>
>>> Patch also works as preparation for the next patches for tools in this
>>> patchset: vsock_perf and vsock_uring_test:
>>> 1) Adds several new functions to util.c - they will be also used by
>>>   vsock_uring_test.
>>> 2) Adds two new functions for MSG_ZEROCOPY handling to a new header
>>>   file - such header will be shared between vsock_test, vsock_perf and
>>>   vsock_uring_test, thus avoiding code copy-pasting.
>>>
>>> Signed-off-by: Arseniy Krasnov <[email protected]>
>>> ---
>>> Changelog:
>>> v1 -> v2:
>>>  * Move 'SOL_VSOCK' and 'VSOCK_RECVERR' from 'util.c' to 'util.h'.
>>> v2 -> v3:
>>>  * Patch was reworked. Now it is also preparation patch (see commit
>>>    message). Shared stuff for 'vsock_perf' and tests is placed to a
>>>    new header file, while shared code between current test tool and
>>>    future uring test is placed to the 'util.c'. I think, that making
>>>    this patch as preparation allows to reduce number of changes in the
>>>    next patches in this patchset.
>>>  * Make 'struct vsock_test_data' private by placing it to the .c file.
>>>    Also add comments to this struct to clarify sense of its fields.
>>>
>>> tools/testing/vsock/Makefile              |   2 +-
>>> tools/testing/vsock/msg_zerocopy_common.h |  92 ++++++
>>> tools/testing/vsock/util.c                | 110 +++++++
>>> tools/testing/vsock/util.h                |   5 +
>>> tools/testing/vsock/vsock_test.c          |  16 +
>>> tools/testing/vsock/vsock_test_zerocopy.c | 367 ++++++++++++++++++++++
>>> tools/testing/vsock/vsock_test_zerocopy.h |  15 +
>>> 7 files changed, 606 insertions(+), 1 deletion(-)
>>> create mode 100644 tools/testing/vsock/msg_zerocopy_common.h
>>> create mode 100644 tools/testing/vsock/vsock_test_zerocopy.c
>>> create mode 100644 tools/testing/vsock/vsock_test_zerocopy.h
>>>
>>> diff --git a/tools/testing/vsock/Makefile b/tools/testing/vsock/Makefile
>>> index 21a98ba565ab..1a26f60a596c 100644
>>> --- a/tools/testing/vsock/Makefile
>>> +++ b/tools/testing/vsock/Makefile
>>> @@ -1,7 +1,7 @@
>>> # SPDX-License-Identifier: GPL-2.0-only
>>> all: test vsock_perf
>>> test: vsock_test vsock_diag_test
>>> -vsock_test: vsock_test.o timeout.o control.o util.o
>>> +vsock_test: vsock_test.o vsock_test_zerocopy.o timeout.o control.o util.o
>>> vsock_diag_test: vsock_diag_test.o timeout.o control.o util.o
>>> vsock_perf: vsock_perf.o
>>>
>>> diff --git a/tools/testing/vsock/msg_zerocopy_common.h b/tools/testing/vsock/msg_zerocopy_common.h
>>> new file mode 100644
>>> index 000000000000..ce89f1281584
>>> --- /dev/null
>>> +++ b/tools/testing/vsock/msg_zerocopy_common.h
>>> @@ -0,0 +1,92 @@
>>> +/* SPDX-License-Identifier: GPL-2.0-only */
>>> +#ifndef MSG_ZEROCOPY_COMMON_H
>>> +#define MSG_ZEROCOPY_COMMON_H
>>> +
>>> +#include <stdio.h>
>>> +#include <stdlib.h>
>>> +#include <sys/types.h>
>>> +#include <sys/socket.h>
>>> +#include <linux/errqueue.h>
>>> +
>>> +#ifndef SOL_VSOCK
>>> +#define SOL_VSOCK    287
>>> +#endif
>>> +
>>> +#ifndef VSOCK_RECVERR
>>> +#define VSOCK_RECVERR    1
>>> +#endif
>>> +
>>> +static void enable_so_zerocopy(int fd)
>>> +{
>>> +    int val = 1;
>>> +
>>> +    if (setsockopt(fd, SOL_SOCKET, SO_ZEROCOPY, &val, sizeof(val))) {
>>> +        perror("setsockopt");
>>> +        exit(EXIT_FAILURE);
>>> +    }
>>> +}
>>> +
>>> +static void vsock_recv_completion(int fd, const bool *zerocopied) __maybe_unused;
>>
>> To avoid this, maybe we can implement those functions in .c file and
>> link the object.
>>
>> WDYT?
>>
>> Ah, here (cc (GCC) 13.2.1 20230728 (Red Hat 13.2.1-1)) the build is
>> failing:
>>
>> In file included from vsock_perf.c:23:
>> msg_zerocopy_common.h: In function ‘vsock_recv_completion’:
>> msg_zerocopy_common.h:29:67: error: expected declaration specifiers before ‘__maybe_unused’
>>    29 | static void vsock_recv_completion(int fd, const bool *zerocopied) __maybe_unused;
>>       |                                                                   ^~~~~~~~~~~~~~
>> msg_zerocopy_common.h:31:1: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘{’ token
>>    31 | {
>>       | ^
>>
>>> +static void vsock_recv_completion(int fd, const bool *zerocopied)
>>> +{
>>> +    struct sock_extended_err *serr;
>>> +    struct msghdr msg = { 0 };
>>> +    char cmsg_data[128];
>>> +    struct cmsghdr *cm;
>>> +    ssize_t res;
>>> +
>>> +    msg.msg_control = cmsg_data;
>>> +    msg.msg_controllen = sizeof(cmsg_data);
>>> +
>>> +    res = recvmsg(fd, &msg, MSG_ERRQUEUE);
>>> +    if (res) {
>>> +        fprintf(stderr, "failed to read error queue: %zi\n", res);
>>> +        exit(EXIT_FAILURE);
>>> +    }
>>> +
>>> +    cm = CMSG_FIRSTHDR(&msg);
>>> +    if (!cm) {
>>> +        fprintf(stderr, "cmsg: no cmsg\n");
>>> +        exit(EXIT_FAILURE);
>>> +    }
>>> +
>>> +    if (cm->cmsg_level != SOL_VSOCK) {
>>> +        fprintf(stderr, "cmsg: unexpected 'cmsg_level'\n");
>>> +        exit(EXIT_FAILURE);
>>> +    }
>>> +
>>> +    if (cm->cmsg_type != VSOCK_RECVERR) {
>>> +        fprintf(stderr, "cmsg: unexpected 'cmsg_type'\n");
>>> +        exit(EXIT_FAILURE);
>>> +    }
>>> +
>>> +    serr = (void *)CMSG_DATA(cm);
>>> +    if (serr->ee_origin != SO_EE_ORIGIN_ZEROCOPY) {
>>> +        fprintf(stderr, "serr: wrong origin: %u\n", serr->ee_origin);
>>> +        exit(EXIT_FAILURE);
>>> +    }
>>> +
>>> +    if (serr->ee_errno) {
>>> +        fprintf(stderr, "serr: wrong error code: %u\n", serr->ee_errno);
>>> +        exit(EXIT_FAILURE);
>>> +    }
>>> +
>>> +    /* This flag is used for tests, to check that transmission was
>>> +     * performed as expected: zerocopy or fallback to copy. If NULL
>>> +     * - don't care.
>>> +     */
>>> +    if (!zerocopied)
>>> +        return;
>>> +
>>> +    if (*zerocopied && (serr->ee_code & SO_EE_CODE_ZEROCOPY_COPIED)) {
>>> +        fprintf(stderr, "serr: was copy instead of zerocopy\n");
>>> +        exit(EXIT_FAILURE);
>>> +    }
>>> +
>>> +    if (!*zerocopied && !(serr->ee_code & SO_EE_CODE_ZEROCOPY_COPIED)) {
>>> +        fprintf(stderr, "serr: was zerocopy instead of copy\n");
>>> +        exit(EXIT_FAILURE);
>>> +    }
>>> +}
>>> +
>>> +#endif /* MSG_ZEROCOPY_COMMON_H */
>>> diff --git a/tools/testing/vsock/util.c b/tools/testing/vsock/util.c
>>> index 6779d5008b27..b1770edd8cc1 100644
>>> --- a/tools/testing/vsock/util.c
>>> +++ b/tools/testing/vsock/util.c
>>> @@ -11,10 +11,12 @@
>>> #include <stdio.h>
>>> #include <stdint.h>
>>> #include <stdlib.h>
>>> +#include <string.h>
>>> #include <signal.h>
>>> #include <unistd.h>
>>> #include <assert.h>
>>> #include <sys/epoll.h>
>>> +#include <sys/mman.h>
>>>
>>> #include "timeout.h"
>>> #include "control.h"
>>> @@ -444,3 +446,111 @@ unsigned long hash_djb2(const void *data, size_t len)
>>>
>>>     return hash;
>>> }
>>> +
>>> +size_t iovec_bytes(const struct iovec *iov, size_t iovnum)
>>> +{
>>> +    size_t bytes;
>>> +    int i;
>>> +
>>> +    for (bytes = 0, i = 0; i < iovnum; i++)
>>> +        bytes += iov[i].iov_len;
>>> +
>>> +    return bytes;
>>> +}
>>> +
>>> +unsigned long iovec_hash_djb2(const struct iovec *iov, size_t iovnum)
>>> +{
>>> +    unsigned long hash;
>>> +    size_t iov_bytes;
>>> +    size_t offs;
>>> +    void *tmp;
>>> +    int i;
>>> +
>>> +    iov_bytes = iovec_bytes(iov, iovnum);
>>> +
>>> +    tmp = malloc(iov_bytes);
>>> +    if (!tmp) {
>>> +        perror("malloc");
>>> +        exit(EXIT_FAILURE);
>>> +    }
>>> +
>>> +    for (offs = 0, i = 0; i < iovnum; i++) {
>>> +        memcpy(tmp + offs, iov[i].iov_base, iov[i].iov_len);
>>> +        offs += iov[i].iov_len;
>>> +    }
>>> +
>>> +    hash = hash_djb2(tmp, iov_bytes);
>>> +    free(tmp);
>>> +
>>> +    return hash;
>>> +}
>>> +
>>> +struct iovec *iovec_from_test_data(const struct iovec *test_iovec, int iovnum)
>>
>> From the name this function seems related to vsock_test_data, so I'd
>> suggest to move this and free_iovec_test_data() in vsock_test_zerocopy.c
>>
>>> +{
>>> +    struct iovec *iovec;
>>> +    int i;
>>> +
>>> +    iovec = malloc(sizeof(*iovec) * iovnum);
>>> +    if (!iovec) {
>>> +        perror("malloc");
>>> +        exit(EXIT_FAILURE);
>>> +    }
>>> +
>>> +    for (i = 0; i < iovnum; i++) {
>>> +        iovec[i].iov_len = test_iovec[i].iov_len;
>>> +
>>> +        iovec[i].iov_base = mmap(NULL, iovec[i].iov_len,
>>> +                     PROT_READ | PROT_WRITE,
>>> +                     MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE,
>>> +                     -1, 0);
>>> +        if (iovec[i].iov_base == MAP_FAILED) {
>>> +            perror("mmap");
>>> +            exit(EXIT_FAILURE);
>>> +        }
>>> +
>>> +        if (test_iovec[i].iov_base != MAP_FAILED)
>>> +            iovec[i].iov_base += (uintptr_t)test_iovec[i].iov_base;
>>> +    }
>>> +
>>> +    /* Unmap "invalid" elements. */
>>> +    for (i = 0; i < iovnum; i++) {
>>> +        if (test_iovec[i].iov_base == MAP_FAILED) {
>>> +            if (munmap(iovec[i].iov_base, iovec[i].iov_len)) {
>>> +                perror("munmap");
>>> +                exit(EXIT_FAILURE);
>>> +            }
>>> +        }
>>> +    }
>>> +
>>> +    for (i = 0; i < iovnum; i++) {
>>> +        int j;
>>> +
>>> +        if (test_iovec[i].iov_base == MAP_FAILED)
>>> +            continue;
>>> +
>>> +        for (j = 0; j < iovec[i].iov_len; j++)
>>> +            ((uint8_t *)iovec[i].iov_base)[j] = rand() & 0xff;
>>> +    }
>>> +
>>> +    return iovec;
>>> +}
>>> +
>>> +void free_iovec_test_data(const struct iovec *test_iovec,
>>> +              struct iovec *iovec, int iovnum)
>>> +{
>>> +    int i;
>>> +
>>> +    for (i = 0; i < iovnum; i++) {
>>> +        if (test_iovec[i].iov_base != MAP_FAILED) {
>>> +            if (test_iovec[i].iov_base)
>>> +                iovec[i].iov_base -= (uintptr_t)test_iovec[i].iov_base;
>>> +
>>> +            if (munmap(iovec[i].iov_base, iovec[i].iov_len)) {
>>> +                perror("munmap");
>>> +                exit(EXIT_FAILURE);
>>> +            }
>>> +        }
>>> +    }
>>> +
>>> +    free(iovec);
>>> +}
>>> diff --git a/tools/testing/vsock/util.h b/tools/testing/vsock/util.h
>>> index e5407677ce05..4cacb8d804c1 100644
>>> --- a/tools/testing/vsock/util.h
>>> +++ b/tools/testing/vsock/util.h
>>> @@ -53,4 +53,9 @@ void list_tests(const struct test_case *test_cases);
>>> void skip_test(struct test_case *test_cases, size_t test_cases_len,
>>>            const char *test_id_str);
>>> unsigned long hash_djb2(const void *data, size_t len);
>>> +size_t iovec_bytes(const struct iovec *iov, size_t iovnum);
>>> +unsigned long iovec_hash_djb2(const struct iovec *iov, size_t iovnum);
>>> +struct iovec *iovec_from_test_data(const struct iovec *test_iovec, int iovnum);
>>> +void free_iovec_test_data(const struct iovec *test_iovec,
>>> +              struct iovec *iovec, int iovnum);
>>> #endif /* UTIL_H */
>>> diff --git a/tools/testing/vsock/vsock_test.c b/tools/testing/vsock/vsock_test.c
>>> index da4cb819a183..c1f7bc9abd22 100644
>>> --- a/tools/testing/vsock/vsock_test.c
>>> +++ b/tools/testing/vsock/vsock_test.c
>>> @@ -21,6 +21,7 @@
>>> #include <poll.h>
>>> #include <signal.h>
>>>
>>> +#include "vsock_test_zerocopy.h"
>>> #include "timeout.h"
>>> #include "control.h"
>>> #include "util.h"
>>> @@ -1269,6 +1270,21 @@ static struct test_case test_cases[] = {
>>>         .run_client = test_stream_shutrd_client,
>>>         .run_server = test_stream_shutrd_server,
>>>     },
>>> +    {
>>> +        .name = "SOCK_STREAM MSG_ZEROCOPY",
>>> +        .run_client = test_stream_msgzcopy_client,
>>> +        .run_server = test_stream_msgzcopy_server,
>>> +    },
>>> +    {
>>> +        .name = "SOCK_SEQPACKET MSG_ZEROCOPY",
>>> +        .run_client = test_seqpacket_msgzcopy_client,
>>> +        .run_server = test_seqpacket_msgzcopy_server,
>>> +    },
>>> +    {
>>> +        .name = "SOCK_STREAM MSG_ZEROCOPY empty MSG_ERRQUEUE",
>>> +        .run_client = test_stream_msgzcopy_empty_errq_client,
>>> +        .run_server = test_stream_msgzcopy_empty_errq_server,
>>> +    },
>>>     {},
>>> };
>>>
>>> diff --git a/tools/testing/vsock/vsock_test_zerocopy.c b/tools/testing/vsock/vsock_test_zerocopy.c
>>> new file mode 100644
>>> index 000000000000..af14efdf334b
>>> --- /dev/null
>>> +++ b/tools/testing/vsock/vsock_test_zerocopy.c
>>> @@ -0,0 +1,367 @@
>>> +// SPDX-License-Identifier: GPL-2.0-only
>>> +/* MSG_ZEROCOPY feature tests for vsock
>>> + *
>>> + * Copyright (C) 2023 SberDevices.
>>> + *
>>> + * Author: Arseniy Krasnov <[email protected]>
>>> + */
>>> +
>>> +#include <stdio.h>
>>> +#include <stdlib.h>
>>> +#include <string.h>
>>> +#include <sys/mman.h>
>>> +#include <unistd.h>
>>> +#include <poll.h>
>>> +#include <linux/errqueue.h>
>>> +#include <linux/kernel.h>
>>> +#include <errno.h>
>>> +
>>> +#include "control.h"
>>> +#include "vsock_test_zerocopy.h"
>>> +#include "msg_zerocopy_common.h"
>>> +
>>> +#define PAGE_SIZE        4096
>>
>> In some tests I saw `sysconf(_SC_PAGESIZE)` is used,
>> e.g. in selftests/ptrace/peeksiginfo.c:
>>
>> #ifndef PAGE_SIZE
>> #define PAGE_SIZE sysconf(_SC_PAGESIZE)
>> #endif
>>
>> WDYT?
>
>There is one small problem with that: in this case I can't use PAGE_SIZE
>in an array initializer. I'm thinking of adding a reserved constant value
>to designate that an iov element must be the size of a page, then using
>this value in the initializer and handling it when the test iov is created...

Okay, I see. Maybe I'm overthinking!
It is just a test, so let's not complicate it.

Feel free to use the previous version; I'd just add the #ifndef guards.
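
For reference, a minimal sketch of what "previous version plus guards" could
look like, assuming the fixed 4096 value from the patch is kept. The names
example_vecs and example_bytes() are hypothetical and only illustrate that
the guarded constant still works as a static initializer; the static_assert
is an extra check, not something from the patch:

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

/* Keep the compile-time constant, but do not redefine PAGE_SIZE if some
 * included header already provides it.
 */
#ifndef PAGE_SIZE
#define PAGE_SIZE	4096
#endif

/* Fails at compile time if PAGE_SIZE is not a constant power of two. */
static_assert((PAGE_SIZE & (PAGE_SIZE - 1)) == 0,
	      "PAGE_SIZE must be a constant power of two");

/* Still a valid static initializer, unlike a sysconf()-based PAGE_SIZE. */
static const struct iovec example_vecs[] = {
	{ NULL, PAGE_SIZE },
	{ NULL, PAGE_SIZE * 2 },
	{ NULL, 200 },
};

/* Sum of all element lengths in the example table. */
static size_t example_bytes(void)
{
	size_t bytes = 0;
	size_t i;

	for (i = 0; i < sizeof(example_vecs) / sizeof(example_vecs[0]); i++)
		bytes += example_vecs[i].iov_len;

	return bytes;
}
```

The trade-off is that the tests keep assuming 4 KiB pages unless a header
supplies a different constant, which seems acceptable for a test tool.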

Thanks,
Stefano

2023-10-10 07:21:34

by Arseniy Krasnov

Subject: Re: [PATCH net-next v3 10/12] test/vsock: MSG_ZEROCOPY flag tests



On 10.10.2023 10:19, Stefano Garzarella wrote:
> On Mon, Oct 09, 2023 at 11:24:18PM +0300, Arseniy Krasnov wrote:
>>
>>
>> On 09.10.2023 18:17, Stefano Garzarella wrote:
>>> On Sat, Oct 07, 2023 at 08:21:37PM +0300, Arseniy Krasnov wrote:
>>>> This adds three tests for MSG_ZEROCOPY feature:
>>>> 1) SOCK_STREAM tx with different buffers.
>>>> 2) SOCK_SEQPACKET tx with different buffers.
>>>> 3) SOCK_STREAM test to read empty error queue of the socket.
>>>>
>>>> Patch also works as preparation for the next patches for tools in this
>>>> patchset: vsock_perf and vsock_uring_test:
>>>> 1) Adds several new functions to util.c - they will be also used by
>>>>   vsock_uring_test.
>>>> 2) Adds two new functions for MSG_ZEROCOPY handling to a new header
>>>>   file - such header will be shared between vsock_test, vsock_perf and
>>>>   vsock_uring_test, thus avoiding code copy-pasting.
>>>>
>>>> Signed-off-by: Arseniy Krasnov <[email protected]>
>>>> ---
>>>> Changelog:
>>>> v1 -> v2:
>>>>  * Move 'SOL_VSOCK' and 'VSOCK_RECVERR' from 'util.c' to 'util.h'.
>>>> v2 -> v3:
>>>>  * Patch was reworked. Now it is also preparation patch (see commit
>>>>    message). Shared stuff for 'vsock_perf' and tests is placed to a
>>>>    new header file, while shared code between current test tool and
>>>>    future uring test is placed to the 'util.c'. I think, that making
>>>>    this patch as preparation allows to reduce number of changes in the
>>>>    next patches in this patchset.
>>>>  * Make 'struct vsock_test_data' private by placing it to the .c file.
>>>>    Also add comments to this struct to clarify sense of its fields.
>>>>
>>>> tools/testing/vsock/Makefile              |   2 +-
>>>> tools/testing/vsock/msg_zerocopy_common.h |  92 ++++++
>>>> tools/testing/vsock/util.c                | 110 +++++++
>>>> tools/testing/vsock/util.h                |   5 +
>>>> tools/testing/vsock/vsock_test.c          |  16 +
>>>> tools/testing/vsock/vsock_test_zerocopy.c | 367 ++++++++++++++++++++++
>>>> tools/testing/vsock/vsock_test_zerocopy.h |  15 +
>>>> 7 files changed, 606 insertions(+), 1 deletion(-)
>>>> create mode 100644 tools/testing/vsock/msg_zerocopy_common.h
>>>> create mode 100644 tools/testing/vsock/vsock_test_zerocopy.c
>>>> create mode 100644 tools/testing/vsock/vsock_test_zerocopy.h
>>>>
>>>> diff --git a/tools/testing/vsock/Makefile b/tools/testing/vsock/Makefile
>>>> index 21a98ba565ab..1a26f60a596c 100644
>>>> --- a/tools/testing/vsock/Makefile
>>>> +++ b/tools/testing/vsock/Makefile
>>>> @@ -1,7 +1,7 @@
>>>> # SPDX-License-Identifier: GPL-2.0-only
>>>> all: test vsock_perf
>>>> test: vsock_test vsock_diag_test
>>>> -vsock_test: vsock_test.o timeout.o control.o util.o
>>>> +vsock_test: vsock_test.o vsock_test_zerocopy.o timeout.o control.o util.o
>>>> vsock_diag_test: vsock_diag_test.o timeout.o control.o util.o
>>>> vsock_perf: vsock_perf.o
>>>>
>>>> diff --git a/tools/testing/vsock/msg_zerocopy_common.h b/tools/testing/vsock/msg_zerocopy_common.h
>>>> new file mode 100644
>>>> index 000000000000..ce89f1281584
>>>> --- /dev/null
>>>> +++ b/tools/testing/vsock/msg_zerocopy_common.h
>>>> @@ -0,0 +1,92 @@
>>>> +/* SPDX-License-Identifier: GPL-2.0-only */
>>>> +#ifndef MSG_ZEROCOPY_COMMON_H
>>>> +#define MSG_ZEROCOPY_COMMON_H
>>>> +
>>>> +#include <stdio.h>
>>>> +#include <stdlib.h>
>>>> +#include <sys/types.h>
>>>> +#include <sys/socket.h>
>>>> +#include <linux/errqueue.h>
>>>> +
>>>> +#ifndef SOL_VSOCK
>>>> +#define SOL_VSOCK    287
>>>> +#endif
>>>> +
>>>> +#ifndef VSOCK_RECVERR
>>>> +#define VSOCK_RECVERR    1
>>>> +#endif
>>>> +
>>>> +static void enable_so_zerocopy(int fd)
>>>> +{
>>>> +    int val = 1;
>>>> +
>>>> +    if (setsockopt(fd, SOL_SOCKET, SO_ZEROCOPY, &val, sizeof(val))) {
>>>> +        perror("setsockopt");
>>>> +        exit(EXIT_FAILURE);
>>>> +    }
>>>> +}
>>>> +
>>>> +static void vsock_recv_completion(int fd, const bool *zerocopied) __maybe_unused;
>>>
>>> To avoid this, maybe we can implement those functions in a .c file and
>>> link the object.
>>>
>>> WDYT?
>>>
>>> Ah, here (cc (GCC) 13.2.1 20230728 (Red Hat 13.2.1-1)) the build is
>>> failing:
>>>
>>> In file included from vsock_perf.c:23:
>>> msg_zerocopy_common.h: In function ‘vsock_recv_completion’:
>>> msg_zerocopy_common.h:29:67: error: expected declaration specifiers before ‘__maybe_unused’
>>>    29 | static void vsock_recv_completion(int fd, const bool *zerocopied) __maybe_unused;
>>>       |                                                                   ^~~~~~~~~~~~~~
>>> msg_zerocopy_common.h:31:1: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘{’ token
>>>    31 | {
>>>       | ^
>>>
>>>> +static void vsock_recv_completion(int fd, const bool *zerocopied)
>>>> +{
>>>> +    struct sock_extended_err *serr;
>>>> +    struct msghdr msg = { 0 };
>>>> +    char cmsg_data[128];
>>>> +    struct cmsghdr *cm;
>>>> +    ssize_t res;
>>>> +
>>>> +    msg.msg_control = cmsg_data;
>>>> +    msg.msg_controllen = sizeof(cmsg_data);
>>>> +
>>>> +    res = recvmsg(fd, &msg, MSG_ERRQUEUE);
>>>> +    if (res) {
>>>> +        fprintf(stderr, "failed to read error queue: %zi\n", res);
>>>> +        exit(EXIT_FAILURE);
>>>> +    }
>>>> +
>>>> +    cm = CMSG_FIRSTHDR(&msg);
>>>> +    if (!cm) {
>>>> +        fprintf(stderr, "cmsg: no cmsg\n");
>>>> +        exit(EXIT_FAILURE);
>>>> +    }
>>>> +
>>>> +    if (cm->cmsg_level != SOL_VSOCK) {
>>>> +        fprintf(stderr, "cmsg: unexpected 'cmsg_level'\n");
>>>> +        exit(EXIT_FAILURE);
>>>> +    }
>>>> +
>>>> +    if (cm->cmsg_type != VSOCK_RECVERR) {
>>>> +        fprintf(stderr, "cmsg: unexpected 'cmsg_type'\n");
>>>> +        exit(EXIT_FAILURE);
>>>> +    }
>>>> +
>>>> +    serr = (void *)CMSG_DATA(cm);
>>>> +    if (serr->ee_origin != SO_EE_ORIGIN_ZEROCOPY) {
>>>> +        fprintf(stderr, "serr: wrong origin: %u\n", serr->ee_origin);
>>>> +        exit(EXIT_FAILURE);
>>>> +    }
>>>> +
>>>> +    if (serr->ee_errno) {
>>>> +        fprintf(stderr, "serr: wrong error code: %u\n", serr->ee_errno);
>>>> +        exit(EXIT_FAILURE);
>>>> +    }
>>>> +
>>>> +    /* This flag is used for tests, to check that transmission was
>>>> +     * performed as expected: zerocopy or fallback to copy. If NULL
>>>> +     * - don't care.
>>>> +     */
>>>> +    if (!zerocopied)
>>>> +        return;
>>>> +
>>>> +    if (*zerocopied && (serr->ee_code & SO_EE_CODE_ZEROCOPY_COPIED)) {
>>>> +        fprintf(stderr, "serr: was copy instead of zerocopy\n");
>>>> +        exit(EXIT_FAILURE);
>>>> +    }
>>>> +
>>>> +    if (!*zerocopied && !(serr->ee_code & SO_EE_CODE_ZEROCOPY_COPIED)) {
>>>> +        fprintf(stderr, "serr: was zerocopy instead of copy\n");
>>>> +        exit(EXIT_FAILURE);
>>>> +    }
>>>> +}
>>>> +
>>>> +#endif /* MSG_ZEROCOPY_COMMON_H */
>>>> diff --git a/tools/testing/vsock/util.c b/tools/testing/vsock/util.c
>>>> index 6779d5008b27..b1770edd8cc1 100644
>>>> --- a/tools/testing/vsock/util.c
>>>> +++ b/tools/testing/vsock/util.c
>>>> @@ -11,10 +11,12 @@
>>>> #include <stdio.h>
>>>> #include <stdint.h>
>>>> #include <stdlib.h>
>>>> +#include <string.h>
>>>> #include <signal.h>
>>>> #include <unistd.h>
>>>> #include <assert.h>
>>>> #include <sys/epoll.h>
>>>> +#include <sys/mman.h>
>>>>
>>>> #include "timeout.h"
>>>> #include "control.h"
>>>> @@ -444,3 +446,111 @@ unsigned long hash_djb2(const void *data, size_t len)
>>>>
>>>>     return hash;
>>>> }
>>>> +
>>>> +size_t iovec_bytes(const struct iovec *iov, size_t iovnum)
>>>> +{
>>>> +    size_t bytes;
>>>> +    int i;
>>>> +
>>>> +    for (bytes = 0, i = 0; i < iovnum; i++)
>>>> +        bytes += iov[i].iov_len;
>>>> +
>>>> +    return bytes;
>>>> +}
>>>> +
>>>> +unsigned long iovec_hash_djb2(const struct iovec *iov, size_t iovnum)
>>>> +{
>>>> +    unsigned long hash;
>>>> +    size_t iov_bytes;
>>>> +    size_t offs;
>>>> +    void *tmp;
>>>> +    int i;
>>>> +
>>>> +    iov_bytes = iovec_bytes(iov, iovnum);
>>>> +
>>>> +    tmp = malloc(iov_bytes);
>>>> +    if (!tmp) {
>>>> +        perror("malloc");
>>>> +        exit(EXIT_FAILURE);
>>>> +    }
>>>> +
>>>> +    for (offs = 0, i = 0; i < iovnum; i++) {
>>>> +        memcpy(tmp + offs, iov[i].iov_base, iov[i].iov_len);
>>>> +        offs += iov[i].iov_len;
>>>> +    }
>>>> +
>>>> +    hash = hash_djb2(tmp, iov_bytes);
>>>> +    free(tmp);
>>>> +
>>>> +    return hash;
>>>> +}
>>>> +
>>>> +struct iovec *iovec_from_test_data(const struct iovec *test_iovec, int iovnum)
>>>
>>> From the name, this function seems related to vsock_test_data, so I'd
>>> suggest moving this and free_iovec_test_data() to vsock_test_zerocopy.c
>>>
>>>> +{
>>>> +    struct iovec *iovec;
>>>> +    int i;
>>>> +
>>>> +    iovec = malloc(sizeof(*iovec) * iovnum);
>>>> +    if (!iovec) {
>>>> +        perror("malloc");
>>>> +        exit(EXIT_FAILURE);
>>>> +    }
>>>> +
>>>> +    for (i = 0; i < iovnum; i++) {
>>>> +        iovec[i].iov_len = test_iovec[i].iov_len;
>>>> +
>>>> +        iovec[i].iov_base = mmap(NULL, iovec[i].iov_len,
>>>> +                     PROT_READ | PROT_WRITE,
>>>> +                     MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE,
>>>> +                     -1, 0);
>>>> +        if (iovec[i].iov_base == MAP_FAILED) {
>>>> +            perror("mmap");
>>>> +            exit(EXIT_FAILURE);
>>>> +        }
>>>> +
>>>> +        if (test_iovec[i].iov_base != MAP_FAILED)
>>>> +            iovec[i].iov_base += (uintptr_t)test_iovec[i].iov_base;
>>>> +    }
>>>> +
>>>> +    /* Unmap "invalid" elements. */
>>>> +    for (i = 0; i < iovnum; i++) {
>>>> +        if (test_iovec[i].iov_base == MAP_FAILED) {
>>>> +            if (munmap(iovec[i].iov_base, iovec[i].iov_len)) {
>>>> +                perror("munmap");
>>>> +                exit(EXIT_FAILURE);
>>>> +            }
>>>> +        }
>>>> +    }
>>>> +
>>>> +    for (i = 0; i < iovnum; i++) {
>>>> +        int j;
>>>> +
>>>> +        if (test_iovec[i].iov_base == MAP_FAILED)
>>>> +            continue;
>>>> +
>>>> +        for (j = 0; j < iovec[i].iov_len; j++)
>>>> +            ((uint8_t *)iovec[i].iov_base)[j] = rand() & 0xff;
>>>> +    }
>>>> +
>>>> +    return iovec;
>>>> +}
>>>> +
>>>> +void free_iovec_test_data(const struct iovec *test_iovec,
>>>> +              struct iovec *iovec, int iovnum)
>>>> +{
>>>> +    int i;
>>>> +
>>>> +    for (i = 0; i < iovnum; i++) {
>>>> +        if (test_iovec[i].iov_base != MAP_FAILED) {
>>>> +            if (test_iovec[i].iov_base)
>>>> +                iovec[i].iov_base -= (uintptr_t)test_iovec[i].iov_base;
>>>> +
>>>> +            if (munmap(iovec[i].iov_base, iovec[i].iov_len)) {
>>>> +                perror("munmap");
>>>> +                exit(EXIT_FAILURE);
>>>> +            }
>>>> +        }
>>>> +    }
>>>> +
>>>> +    free(iovec);
>>>> +}
>>>> diff --git a/tools/testing/vsock/util.h b/tools/testing/vsock/util.h
>>>> index e5407677ce05..4cacb8d804c1 100644
>>>> --- a/tools/testing/vsock/util.h
>>>> +++ b/tools/testing/vsock/util.h
>>>> @@ -53,4 +53,9 @@ void list_tests(const struct test_case *test_cases);
>>>> void skip_test(struct test_case *test_cases, size_t test_cases_len,
>>>>            const char *test_id_str);
>>>> unsigned long hash_djb2(const void *data, size_t len);
>>>> +size_t iovec_bytes(const struct iovec *iov, size_t iovnum);
>>>> +unsigned long iovec_hash_djb2(const struct iovec *iov, size_t iovnum);
>>>> +struct iovec *iovec_from_test_data(const struct iovec *test_iovec, int iovnum);
>>>> +void free_iovec_test_data(const struct iovec *test_iovec,
>>>> +              struct iovec *iovec, int iovnum);
>>>> #endif /* UTIL_H */
>>>> diff --git a/tools/testing/vsock/vsock_test.c b/tools/testing/vsock/vsock_test.c
>>>> index da4cb819a183..c1f7bc9abd22 100644
>>>> --- a/tools/testing/vsock/vsock_test.c
>>>> +++ b/tools/testing/vsock/vsock_test.c
>>>> @@ -21,6 +21,7 @@
>>>> #include <poll.h>
>>>> #include <signal.h>
>>>>
>>>> +#include "vsock_test_zerocopy.h"
>>>> #include "timeout.h"
>>>> #include "control.h"
>>>> #include "util.h"
>>>> @@ -1269,6 +1270,21 @@ static struct test_case test_cases[] = {
>>>>         .run_client = test_stream_shutrd_client,
>>>>         .run_server = test_stream_shutrd_server,
>>>>     },
>>>> +    {
>>>> +        .name = "SOCK_STREAM MSG_ZEROCOPY",
>>>> +        .run_client = test_stream_msgzcopy_client,
>>>> +        .run_server = test_stream_msgzcopy_server,
>>>> +    },
>>>> +    {
>>>> +        .name = "SOCK_SEQPACKET MSG_ZEROCOPY",
>>>> +        .run_client = test_seqpacket_msgzcopy_client,
>>>> +        .run_server = test_seqpacket_msgzcopy_server,
>>>> +    },
>>>> +    {
>>>> +        .name = "SOCK_STREAM MSG_ZEROCOPY empty MSG_ERRQUEUE",
>>>> +        .run_client = test_stream_msgzcopy_empty_errq_client,
>>>> +        .run_server = test_stream_msgzcopy_empty_errq_server,
>>>> +    },
>>>>     {},
>>>> };
>>>>
>>>> diff --git a/tools/testing/vsock/vsock_test_zerocopy.c b/tools/testing/vsock/vsock_test_zerocopy.c
>>>> new file mode 100644
>>>> index 000000000000..af14efdf334b
>>>> --- /dev/null
>>>> +++ b/tools/testing/vsock/vsock_test_zerocopy.c
>>>> @@ -0,0 +1,367 @@
>>>> +// SPDX-License-Identifier: GPL-2.0-only
>>>> +/* MSG_ZEROCOPY feature tests for vsock
>>>> + *
>>>> + * Copyright (C) 2023 SberDevices.
>>>> + *
>>>> + * Author: Arseniy Krasnov <[email protected]>
>>>> + */
>>>> +
>>>> +#include <stdio.h>
>>>> +#include <stdlib.h>
>>>> +#include <string.h>
>>>> +#include <sys/mman.h>
>>>> +#include <unistd.h>
>>>> +#include <poll.h>
>>>> +#include <linux/errqueue.h>
>>>> +#include <linux/kernel.h>
>>>> +#include <errno.h>
>>>> +
>>>> +#include "control.h"
>>>> +#include "vsock_test_zerocopy.h"
>>>> +#include "msg_zerocopy_common.h"
>>>> +
>>>> +#define PAGE_SIZE        4096
>>>
>>> In some tests I saw `sysconf(_SC_PAGESIZE)` is used,
>>> e.g. in selftests/ptrace/peeksiginfo.c:
>>>
>>> #ifndef PAGE_SIZE
>>> #define PAGE_SIZE sysconf(_SC_PAGESIZE)
>>> #endif
>>>
>>> WDYT?
>>
>> There is one small problem with that: in this case I can't use PAGE_SIZE
>> as an array initializer. I could add a reserved constant value
>> to designate that an iov element must be one page in size, then use this
>> value as the initializer and handle it when creating the test iov...
>
> Okay, I see. Maybe I'm overthinking!
> It is just a test, let's not complicate it.
>
> Feel free to use the previous version, I'd just add the guards.

Ok, got it!

Thanks, Arseniy

>
> Thanks,
> Stefano
>
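
Regarding the `__maybe_unused` build failure quoted earlier in this message: one plausible reading (an assumption, not confirmed in the thread) is that `vsock_perf.c` includes the shared header without any header that defines the `__maybe_unused` macro, so the compiler sees an unknown identifier after the prototype. A minimal sketch of a header-side workaround is a guarded fallback definition, with the attribute placed among the declaration specifiers so no separate forward declaration is needed. `completion_matches()` is a hypothetical stand-in for the real helper; it only mirrors the zerocopy/copy check from `vsock_recv_completion()`.

```c
#include <stdbool.h>

/* Fallback for translation units that do not already define this macro
 * (assumed cause of the reported error; not confirmed in the thread).
 */
#ifndef __maybe_unused
#define __maybe_unused __attribute__((__unused__))
#endif

/* Hypothetical stand-in for a header helper that not every includer calls.
 * Returns true when the completion agrees with the expectation: zerocopy
 * was expected and no copy happened, or a copy was expected and happened.
 */
static __maybe_unused bool completion_matches(bool expect_zerocopy,
					      bool was_copied)
{
	return expect_zerocopy != was_copied;
}
```

The alternative suggested in the review, moving the functions into a .c file and linking the object, avoids the attribute altogether.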