2023-09-22 12:51:31

by Arseniy Krasnov

[permalink] [raw]
Subject: [PATCH net-next v1 00/12] vsock/virtio: continue MSG_ZEROCOPY support

Hello,

this patchset contains second and third parts of another big patchset
for MSG_ZEROCOPY flag support:
https://lore.kernel.org/netdev/[email protected]/

During review of this series, Stefano Garzarella <[email protected]>
suggested to split it for three parts to simplify review and merging:

1) virtio and vhost updates (for fragged skbs) (merged to net-next, see
link below)
2) AF_VSOCK updates (allows to enable MSG_ZEROCOPY mode and read
tx completions) and update for Documentation/. <-- this patchset
3) Updates for tests and utils. <-- this patchset

Part 1) was merged:
https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=71b263e79370348349553ecdf46f4a69eb436dc7

Head for this patchset is:
https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=71b263e79370348349553ecdf46f4a69eb436dc7

Arseniy Krasnov (12):
vsock: fix EPOLLERR set on non-empty error queue
vsock: read from socket's error queue
vsock: check for MSG_ZEROCOPY support on send
vsock: enable SOCK_SUPPORT_ZC bit
vhost/vsock: support MSG_ZEROCOPY for transport
vsock/virtio: support MSG_ZEROCOPY for transport
vsock/loopback: support MSG_ZEROCOPY for transport
vsock: enable setting SO_ZEROCOPY
docs: net: description of MSG_ZEROCOPY for AF_VSOCK
test/vsock: MSG_ZEROCOPY flag tests
test/vsock: MSG_ZEROCOPY support for vsock_perf
test/vsock: io_uring rx/tx tests

Documentation/networking/msg_zerocopy.rst | 13 +-
drivers/vhost/vsock.c | 7 +
include/linux/socket.h | 1 +
include/net/af_vsock.h | 7 +
include/uapi/linux/vsock.h | 9 +
net/vmw_vsock/af_vsock.c | 64 ++++-
net/vmw_vsock/virtio_transport.c | 7 +
net/vmw_vsock/vsock_loopback.c | 6 +
tools/testing/vsock/Makefile | 9 +-
tools/testing/vsock/util.c | 222 +++++++++++++++
tools/testing/vsock/util.h | 19 ++
tools/testing/vsock/vsock_perf.c | 143 +++++++++-
tools/testing/vsock/vsock_test.c | 16 ++
tools/testing/vsock/vsock_test_zerocopy.c | 314 +++++++++++++++++++++
tools/testing/vsock/vsock_test_zerocopy.h | 15 +
tools/testing/vsock/vsock_uring_test.c | 321 ++++++++++++++++++++++
16 files changed, 1158 insertions(+), 15 deletions(-)
create mode 100644 include/uapi/linux/vsock.h
create mode 100644 tools/testing/vsock/vsock_test_zerocopy.c
create mode 100644 tools/testing/vsock/vsock_test_zerocopy.h
create mode 100644 tools/testing/vsock/vsock_uring_test.c

--
2.25.1


2023-09-22 13:53:43

by Arseniy Krasnov

[permalink] [raw]
Subject: [PATCH net-next v1 01/12] vsock: fix EPOLLERR set on non-empty error queue

If socket's error queue is not empty, EPOLLERR must be set. Otherwise,
reader of error queue won't detect data in it using EPOLLERR bit.
Currently for AF_VSOCK this is reproducible only with MSG_ZEROCOPY, as
this feature is the only user of an error queue of the socket.

Signed-off-by: Arseniy Krasnov <[email protected]>
---
net/vmw_vsock/af_vsock.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index 013b65241b65..d841f4de33b0 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -1030,7 +1030,7 @@ static __poll_t vsock_poll(struct file *file, struct socket *sock,
poll_wait(file, sk_sleep(sk), wait);
mask = 0;

- if (sk->sk_err)
+ if (sk->sk_err || !skb_queue_empty_lockless(&sk->sk_error_queue))
/* Signify that there has been an error on this socket. */
mask |= EPOLLERR;

--
2.25.1

2023-09-22 14:44:51

by Arseniy Krasnov

[permalink] [raw]
Subject: [PATCH net-next v1 04/12] vsock: enable SOCK_SUPPORT_ZC bit

This bit is used by io_uring in case of zerocopy tx mode. io_uring code
checks, that socket has this feature. This patch sets it in two places:
1) For socket in 'connect()' call.
2) For new socket which is returned by 'accept()' call.

Signed-off-by: Arseniy Krasnov <[email protected]>
Reviewed-by: Stefano Garzarella <[email protected]>
---
net/vmw_vsock/af_vsock.c | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index 134494a78c14..482300eb88e0 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -1406,6 +1406,9 @@ static int vsock_connect(struct socket *sock, struct sockaddr *addr,
goto out;
}

+ if (vsock_msgzerocopy_allow(transport))
+ set_bit(SOCK_SUPPORT_ZC, &sk->sk_socket->flags);
+
err = vsock_auto_bind(vsk);
if (err)
goto out;
@@ -1560,6 +1563,9 @@ static int vsock_accept(struct socket *sock, struct socket *newsock, int flags,
} else {
newsock->state = SS_CONNECTED;
sock_graft(connected, newsock);
+ if (vsock_msgzerocopy_allow(vconnected->transport))
+ set_bit(SOCK_SUPPORT_ZC,
+ &connected->sk_socket->flags);
}

release_sock(connected);
--
2.25.1

2023-09-22 14:50:36

by Arseniy Krasnov

[permalink] [raw]
Subject: [PATCH net-next v1 02/12] vsock: read from socket's error queue

This adds handling of MSG_ERRQUEUE input flag in receive call. This flag
is used to read socket's error queue instead of data queue. Possible
scenario of error queue usage is receiving completions for transmission
with MSG_ZEROCOPY flag. This patch also adds new defines: 'SOL_VSOCK'
and 'VSOCK_RECVERR'.

Signed-off-by: Arseniy Krasnov <[email protected]>
---
Changelog:
v5(big patchset) -> v1:
* R-b tag removed, due to added defines to 'include/uapi/linux/vsock.h'.
Both 'SOL_VSOCK' and 'VSOCK_RECVERR' are needed by userspace, so
they were placed to 'include/uapi/linux/vsock.h'. At the same time,
the same define for 'SOL_VSOCK' was placed to 'include/linux/socket.h'.
This is needed because this file contains SOL_XXX defines for different
types of socket, so it prevents situation when another new SOL_XXX
will use constant 287.

include/linux/socket.h | 1 +
include/uapi/linux/vsock.h | 9 +++++++++
net/vmw_vsock/af_vsock.c | 6 ++++++
3 files changed, 16 insertions(+)
create mode 100644 include/uapi/linux/vsock.h

diff --git a/include/linux/socket.h b/include/linux/socket.h
index 39b74d83c7c4..cfcb7e2c3813 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -383,6 +383,7 @@ struct ucred {
#define SOL_MPTCP 284
#define SOL_MCTP 285
#define SOL_SMC 286
+#define SOL_VSOCK 287

/* IPX options */
#define IPX_TYPE 1
diff --git a/include/uapi/linux/vsock.h b/include/uapi/linux/vsock.h
new file mode 100644
index 000000000000..b25c1347a3b8
--- /dev/null
+++ b/include/uapi/linux/vsock.h
@@ -0,0 +1,9 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+#ifndef _UAPI_LINUX_VSOCK_H
+#define _UAPI_LINUX_VSOCK_H
+
+#define SOL_VSOCK 287
+
+#define VSOCK_RECVERR 1
+
+#endif /* _UAPI_LINUX_VSOCK_H */
diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index d841f4de33b0..4fd11bf34bc7 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -110,6 +110,8 @@
#include <linux/workqueue.h>
#include <net/sock.h>
#include <net/af_vsock.h>
+#include <linux/errqueue.h>
+#include <uapi/linux/vsock.h>

static int __vsock_bind(struct sock *sk, struct sockaddr_vm *addr);
static void vsock_sk_destruct(struct sock *sk);
@@ -2137,6 +2139,10 @@ vsock_connectible_recvmsg(struct socket *sock, struct msghdr *msg, size_t len,
int err;

sk = sock->sk;
+
+ if (unlikely(flags & MSG_ERRQUEUE))
+ return sock_recv_errqueue(sk, msg, len, SOL_VSOCK, VSOCK_RECVERR);
+
vsk = vsock_sk(sk);
err = 0;

--
2.25.1

2023-09-22 15:35:25

by Arseniy Krasnov

[permalink] [raw]
Subject: [PATCH net-next v1 10/12] test/vsock: MSG_ZEROCOPY flag tests

This adds three tests for MSG_ZEROCOPY feature:
1) SOCK_STREAM tx with different buffers.
2) SOCK_SEQPACKET tx with different buffers.
3) SOCK_STREAM test to read empty error queue of the socket.

Signed-off-by: Arseniy Krasnov <[email protected]>
---
tools/testing/vsock/Makefile | 2 +-
tools/testing/vsock/util.c | 222 +++++++++++++++
tools/testing/vsock/util.h | 19 ++
tools/testing/vsock/vsock_test.c | 16 ++
tools/testing/vsock/vsock_test_zerocopy.c | 314 ++++++++++++++++++++++
tools/testing/vsock/vsock_test_zerocopy.h | 15 ++
6 files changed, 587 insertions(+), 1 deletion(-)
create mode 100644 tools/testing/vsock/vsock_test_zerocopy.c
create mode 100644 tools/testing/vsock/vsock_test_zerocopy.h

diff --git a/tools/testing/vsock/Makefile b/tools/testing/vsock/Makefile
index 21a98ba565ab..1a26f60a596c 100644
--- a/tools/testing/vsock/Makefile
+++ b/tools/testing/vsock/Makefile
@@ -1,7 +1,7 @@
# SPDX-License-Identifier: GPL-2.0-only
all: test vsock_perf
test: vsock_test vsock_diag_test
-vsock_test: vsock_test.o timeout.o control.o util.o
+vsock_test: vsock_test.o vsock_test_zerocopy.o timeout.o control.o util.o
vsock_diag_test: vsock_diag_test.o timeout.o control.o util.o
vsock_perf: vsock_perf.o

diff --git a/tools/testing/vsock/util.c b/tools/testing/vsock/util.c
index 6779d5008b27..d531dbbfa8ff 100644
--- a/tools/testing/vsock/util.c
+++ b/tools/testing/vsock/util.c
@@ -11,15 +11,27 @@
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
+#include <string.h>
#include <signal.h>
#include <unistd.h>
#include <assert.h>
#include <sys/epoll.h>
+#include <sys/mman.h>
+#include <linux/errqueue.h>
+#include <poll.h>

#include "timeout.h"
#include "control.h"
#include "util.h"

+#ifndef SOL_VSOCK
+#define SOL_VSOCK 287
+#endif
+
+#ifndef VSOCK_RECVERR
+#define VSOCK_RECVERR 1
+#endif
+
/* Install signal handlers */
void init_signals(void)
{
@@ -444,3 +456,213 @@ unsigned long hash_djb2(const void *data, size_t len)

return hash;
}
+
+void enable_so_zerocopy(int fd)
+{
+ int val = 1;
+
+ if (setsockopt(fd, SOL_SOCKET, SO_ZEROCOPY, &val, sizeof(val))) {
+ perror("setsockopt");
+ exit(EXIT_FAILURE);
+ }
+}
+
+static void *mmap_no_fail(size_t bytes)
+{
+ void *res;
+
+ res = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
+ MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE, -1, 0);
+ if (res == MAP_FAILED) {
+ perror("mmap");
+ exit(EXIT_FAILURE);
+ }
+
+ return res;
+}
+
+size_t iovec_bytes(const struct iovec *iov, size_t iovnum)
+{
+ size_t bytes;
+ int i;
+
+ for (bytes = 0, i = 0; i < iovnum; i++)
+ bytes += iov[i].iov_len;
+
+ return bytes;
+}
+
+static void iovec_random_init(struct iovec *iov,
+ const struct vsock_test_data *test_data)
+{
+ int i;
+
+ for (i = 0; i < test_data->vecs_cnt; i++) {
+ int j;
+
+ if (test_data->vecs[i].iov_base == MAP_FAILED)
+ continue;
+
+ for (j = 0; j < iov[i].iov_len; j++)
+ ((uint8_t *)iov[i].iov_base)[j] = rand() & 0xff;
+ }
+}
+
+unsigned long iovec_hash_djb2(struct iovec *iov, size_t iovnum)
+{
+ unsigned long hash;
+ size_t iov_bytes;
+ size_t offs;
+ void *tmp;
+ int i;
+
+ iov_bytes = iovec_bytes(iov, iovnum);
+
+ tmp = malloc(iov_bytes);
+ if (!tmp) {
+ perror("malloc");
+ exit(EXIT_FAILURE);
+ }
+
+ for (offs = 0, i = 0; i < iovnum; i++) {
+ memcpy(tmp + offs, iov[i].iov_base, iov[i].iov_len);
+ offs += iov[i].iov_len;
+ }
+
+ hash = hash_djb2(tmp, iov_bytes);
+ free(tmp);
+
+ return hash;
+}
+
+struct iovec *iovec_from_test_data(const struct vsock_test_data *test_data)
+{
+ const struct iovec *test_iovec;
+ struct iovec *iovec;
+ int i;
+
+ iovec = malloc(sizeof(*iovec) * test_data->vecs_cnt);
+ if (!iovec) {
+ perror("malloc");
+ exit(EXIT_FAILURE);
+ }
+
+ test_iovec = test_data->vecs;
+
+ for (i = 0; i < test_data->vecs_cnt; i++) {
+ iovec[i].iov_len = test_iovec[i].iov_len;
+ iovec[i].iov_base = mmap_no_fail(test_iovec[i].iov_len);
+
+ if (test_iovec[i].iov_base != MAP_FAILED &&
+ test_iovec[i].iov_base)
+ iovec[i].iov_base += (uintptr_t)test_iovec[i].iov_base;
+ }
+
+ /* Unmap "invalid" elements. */
+ for (i = 0; i < test_data->vecs_cnt; i++) {
+ if (test_iovec[i].iov_base == MAP_FAILED) {
+ if (munmap(iovec[i].iov_base, iovec[i].iov_len)) {
+ perror("munmap");
+ exit(EXIT_FAILURE);
+ }
+ }
+ }
+
+ iovec_random_init(iovec, test_data);
+
+ return iovec;
+}
+
+void free_iovec_test_data(const struct vsock_test_data *test_data,
+ struct iovec *iovec)
+{
+ int i;
+
+ for (i = 0; i < test_data->vecs_cnt; i++) {
+ if (test_data->vecs[i].iov_base != MAP_FAILED) {
+ if (test_data->vecs[i].iov_base)
+ iovec[i].iov_base -= (uintptr_t)test_data->vecs[i].iov_base;
+
+ if (munmap(iovec[i].iov_base, iovec[i].iov_len)) {
+ perror("munmap");
+ exit(EXIT_FAILURE);
+ }
+ }
+ }
+
+ free(iovec);
+}
+
+#define POLL_TIMEOUT_MS 100
+void vsock_recv_completion(int fd, bool zerocopied, bool completion)
+{
+ struct sock_extended_err *serr;
+ struct msghdr msg = { 0 };
+ struct pollfd fds = { 0 };
+ char cmsg_data[128];
+ struct cmsghdr *cm;
+ ssize_t res;
+
+ fds.fd = fd;
+ fds.events = 0;
+
+ if (poll(&fds, 1, POLL_TIMEOUT_MS) < 0) {
+ perror("poll");
+ exit(EXIT_FAILURE);
+ }
+
+ if (!(fds.revents & POLLERR)) {
+ if (completion) {
+ fprintf(stderr, "POLLERR expected\n");
+ exit(EXIT_FAILURE);
+ } else {
+ return;
+ }
+ }
+
+ msg.msg_control = cmsg_data;
+ msg.msg_controllen = sizeof(cmsg_data);
+
+ res = recvmsg(fd, &msg, MSG_ERRQUEUE);
+ if (res) {
+ fprintf(stderr, "failed to read error queue: %zi\n", res);
+ exit(EXIT_FAILURE);
+ }
+
+ cm = CMSG_FIRSTHDR(&msg);
+ if (!cm) {
+ fprintf(stderr, "cmsg: no cmsg\n");
+ exit(EXIT_FAILURE);
+ }
+
+ if (cm->cmsg_level != SOL_VSOCK) {
+ fprintf(stderr, "cmsg: unexpected 'cmsg_level'\n");
+ exit(EXIT_FAILURE);
+ }
+
+ if (cm->cmsg_type != VSOCK_RECVERR) {
+ fprintf(stderr, "cmsg: unexpected 'cmsg_type'\n");
+ exit(EXIT_FAILURE);
+ }
+
+ serr = (void *)CMSG_DATA(cm);
+ if (serr->ee_origin != SO_EE_ORIGIN_ZEROCOPY) {
+ fprintf(stderr, "serr: wrong origin: %u\n", serr->ee_origin);
+ exit(EXIT_FAILURE);
+ }
+
+ if (serr->ee_errno) {
+ fprintf(stderr, "serr: wrong error code: %u\n", serr->ee_errno);
+ exit(EXIT_FAILURE);
+ }
+
+ if (zerocopied && (serr->ee_code & SO_EE_CODE_ZEROCOPY_COPIED)) {
+ fprintf(stderr, "serr: was copy instead of zerocopy\n");
+ exit(EXIT_FAILURE);
+ }
+
+ if (!zerocopied && !(serr->ee_code & SO_EE_CODE_ZEROCOPY_COPIED)) {
+ fprintf(stderr, "serr: was zerocopy instead of copy\n");
+ exit(EXIT_FAILURE);
+ }
+}
diff --git a/tools/testing/vsock/util.h b/tools/testing/vsock/util.h
index e5407677ce05..8656a5ae63aa 100644
--- a/tools/testing/vsock/util.h
+++ b/tools/testing/vsock/util.h
@@ -2,6 +2,7 @@
#ifndef UTIL_H
#define UTIL_H

+#include <stdbool.h>
#include <sys/socket.h>
#include <linux/vm_sockets.h>

@@ -18,6 +19,17 @@ struct test_opts {
unsigned int peer_cid;
};

+#define VSOCK_TEST_DATA_MAX_IOV 4
+
+struct vsock_test_data {
+ bool stream_only; /* Only for SOCK_STREAM. */
+ bool zerocopied; /* Data must be zerocopied. */
+ bool so_zerocopy; /* Enable zerocopy mode. */
+ int sendmsg_errno; /* 'errno' after 'sendmsg()'. */
+ int vecs_cnt; /* Number of elements in 'vecs'. */
+ struct iovec vecs[VSOCK_TEST_DATA_MAX_IOV];
+};
+
/* A test case definition. Test functions must print failures to stderr and
* terminate with exit(EXIT_FAILURE).
*/
@@ -53,4 +65,11 @@ void list_tests(const struct test_case *test_cases);
void skip_test(struct test_case *test_cases, size_t test_cases_len,
const char *test_id_str);
unsigned long hash_djb2(const void *data, size_t len);
+void enable_so_zerocopy(int fd);
+size_t iovec_bytes(const struct iovec *iov, size_t iovnum);
+unsigned long iovec_hash_djb2(struct iovec *iov, size_t iovnum);
+struct iovec *iovec_from_test_data(const struct vsock_test_data *test_data);
+void free_iovec_test_data(const struct vsock_test_data *test_data,
+ struct iovec *iovec);
+void vsock_recv_completion(int fd, bool zerocopied, bool completion);
#endif /* UTIL_H */
diff --git a/tools/testing/vsock/vsock_test.c b/tools/testing/vsock/vsock_test.c
index da4cb819a183..c1f7bc9abd22 100644
--- a/tools/testing/vsock/vsock_test.c
+++ b/tools/testing/vsock/vsock_test.c
@@ -21,6 +21,7 @@
#include <poll.h>
#include <signal.h>

+#include "vsock_test_zerocopy.h"
#include "timeout.h"
#include "control.h"
#include "util.h"
@@ -1269,6 +1270,21 @@ static struct test_case test_cases[] = {
.run_client = test_stream_shutrd_client,
.run_server = test_stream_shutrd_server,
},
+ {
+ .name = "SOCK_STREAM MSG_ZEROCOPY",
+ .run_client = test_stream_msgzcopy_client,
+ .run_server = test_stream_msgzcopy_server,
+ },
+ {
+ .name = "SOCK_SEQPACKET MSG_ZEROCOPY",
+ .run_client = test_seqpacket_msgzcopy_client,
+ .run_server = test_seqpacket_msgzcopy_server,
+ },
+ {
+ .name = "SOCK_STREAM MSG_ZEROCOPY empty MSG_ERRQUEUE",
+ .run_client = test_stream_msgzcopy_empty_errq_client,
+ .run_server = test_stream_msgzcopy_empty_errq_server,
+ },
{},
};

diff --git a/tools/testing/vsock/vsock_test_zerocopy.c b/tools/testing/vsock/vsock_test_zerocopy.c
new file mode 100644
index 000000000000..6968555d3611
--- /dev/null
+++ b/tools/testing/vsock/vsock_test_zerocopy.c
@@ -0,0 +1,314 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* MSG_ZEROCOPY feature tests for vsock
+ *
+ * Copyright (C) 2023 SaluteDevices.
+ *
+ * Author: Arseniy Krasnov <[email protected]>
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/mman.h>
+#include <unistd.h>
+#include <poll.h>
+#include <linux/errqueue.h>
+#include <linux/kernel.h>
+#include <errno.h>
+
+#include "control.h"
+#include "vsock_test_zerocopy.h"
+
+#define PAGE_SIZE 4096
+
+static struct vsock_test_data test_data_array[] = {
+ /* Last element has non-page aligned size. */
+ {
+ .zerocopied = true,
+ .so_zerocopy = true,
+ .sendmsg_errno = 0,
+ .vecs_cnt = 3,
+ {
+ { NULL, PAGE_SIZE },
+ { NULL, PAGE_SIZE },
+ { NULL, 200 }
+ }
+ },
+ /* All elements have page aligned base and size. */
+ {
+ .zerocopied = true,
+ .so_zerocopy = true,
+ .sendmsg_errno = 0,
+ .vecs_cnt = 3,
+ {
+ { NULL, PAGE_SIZE },
+ { NULL, PAGE_SIZE * 2 },
+ { NULL, PAGE_SIZE * 3 }
+ }
+ },
+ /* All elements have page aligned base and size. But
+ * data length is bigger than 64Kb.
+ */
+ {
+ .zerocopied = true,
+ .so_zerocopy = true,
+ .sendmsg_errno = 0,
+ .vecs_cnt = 3,
+ {
+ { NULL, PAGE_SIZE * 16 },
+ { NULL, PAGE_SIZE * 16 },
+ { NULL, PAGE_SIZE * 16 }
+ }
+ },
+ /* Middle element has both non-page aligned base and size. */
+ {
+ .zerocopied = true,
+ .so_zerocopy = true,
+ .sendmsg_errno = 0,
+ .vecs_cnt = 3,
+ {
+ { NULL, PAGE_SIZE },
+ { (void *)1, 100 },
+ { NULL, PAGE_SIZE }
+ }
+ },
+ /* Middle element is unmapped. */
+ {
+ .zerocopied = false,
+ .so_zerocopy = true,
+ .sendmsg_errno = ENOMEM,
+ .vecs_cnt = 3,
+ {
+ { NULL, PAGE_SIZE },
+ { MAP_FAILED, PAGE_SIZE },
+ { NULL, PAGE_SIZE }
+ }
+ },
+ /* Valid data, but SO_ZEROCOPY is off. This
+ * will trigger fallback to copy.
+ */
+ {
+ .zerocopied = false,
+ .so_zerocopy = false,
+ .sendmsg_errno = 0,
+ .vecs_cnt = 1,
+ {
+ { NULL, PAGE_SIZE }
+ }
+ },
+ /* Valid data, but message is bigger than peer's
+ * buffer, so this will trigger fallback to copy.
+ * This test is for SOCK_STREAM only, because
+ * for SOCK_SEQPACKET, 'sendmsg()' returns EMSGSIZE.
+ */
+ {
+ .stream_only = true,
+ .zerocopied = false,
+ .so_zerocopy = true,
+ .sendmsg_errno = 0,
+ .vecs_cnt = 1,
+ {
+ { NULL, 100 * PAGE_SIZE }
+ }
+ },
+};
+
+static void test_client(const struct test_opts *opts,
+ const struct vsock_test_data *test_data,
+ bool sock_seqpacket)
+{
+ struct msghdr msg = { 0 };
+ ssize_t sendmsg_res;
+ struct iovec *iovec;
+ int fd;
+
+ if (sock_seqpacket)
+ fd = vsock_seqpacket_connect(opts->peer_cid, 1234);
+ else
+ fd = vsock_stream_connect(opts->peer_cid, 1234);
+
+ if (fd < 0) {
+ perror("connect");
+ exit(EXIT_FAILURE);
+ }
+
+ if (test_data->so_zerocopy)
+ enable_so_zerocopy(fd);
+
+ iovec = iovec_from_test_data(test_data);
+
+ msg.msg_iov = iovec;
+ msg.msg_iovlen = test_data->vecs_cnt;
+
+ errno = 0;
+
+ sendmsg_res = sendmsg(fd, &msg, MSG_ZEROCOPY);
+ if (errno != test_data->sendmsg_errno) {
+ fprintf(stderr, "expected 'errno' == %i, got %i\n",
+ test_data->sendmsg_errno, errno);
+ exit(EXIT_FAILURE);
+ }
+
+ if (!errno) {
+ if (sendmsg_res != iovec_bytes(iovec, test_data->vecs_cnt)) {
+ fprintf(stderr, "expected 'sendmsg()' == %li, got %li\n",
+ iovec_bytes(iovec, test_data->vecs_cnt),
+ sendmsg_res);
+ exit(EXIT_FAILURE);
+ }
+ }
+
+ /* Receive completion only in case of successful 'sendmsg()'. */
+ vsock_recv_completion(fd, test_data->zerocopied,
+ test_data->so_zerocopy && !test_data->sendmsg_errno);
+
+ if (!test_data->sendmsg_errno)
+ control_writeulong(iovec_hash_djb2(iovec, test_data->vecs_cnt));
+ else
+ control_writeulong(0);
+
+ control_writeln("DONE");
+ free_iovec_test_data(test_data, iovec);
+ close(fd);
+}
+
+void test_stream_msgzcopy_client(const struct test_opts *opts)
+{
+ int i;
+
+ for (i = 0; i < ARRAY_SIZE(test_data_array); i++)
+ test_client(opts, &test_data_array[i], false);
+}
+
+void test_seqpacket_msgzcopy_client(const struct test_opts *opts)
+{
+ int i;
+
+ for (i = 0; i < ARRAY_SIZE(test_data_array); i++) {
+ if (test_data_array[i].stream_only)
+ continue;
+
+ test_client(opts, &test_data_array[i], true);
+ }
+}
+
+static void test_server(const struct test_opts *opts,
+ const struct vsock_test_data *test_data,
+ bool sock_seqpacket)
+{
+ unsigned long remote_hash;
+ unsigned long local_hash;
+ ssize_t total_bytes_rec;
+ unsigned char *data;
+ size_t data_len;
+ int fd;
+
+ if (sock_seqpacket)
+ fd = vsock_seqpacket_accept(VMADDR_CID_ANY, 1234, NULL);
+ else
+ fd = vsock_stream_accept(VMADDR_CID_ANY, 1234, NULL);
+
+ if (fd < 0) {
+ perror("accept");
+ exit(EXIT_FAILURE);
+ }
+
+ data_len = iovec_bytes(test_data->vecs, test_data->vecs_cnt);
+
+ data = malloc(data_len);
+ if (!data) {
+ perror("malloc");
+ exit(EXIT_FAILURE);
+ }
+
+ total_bytes_rec = 0;
+
+ while (total_bytes_rec != data_len) {
+ ssize_t bytes_rec;
+
+ bytes_rec = read(fd, data + total_bytes_rec,
+ data_len - total_bytes_rec);
+ if (bytes_rec <= 0)
+ break;
+
+ total_bytes_rec += bytes_rec;
+ }
+
+ if (test_data->sendmsg_errno == 0)
+ local_hash = hash_djb2(data, data_len);
+ else
+ local_hash = 0;
+
+ free(data);
+
+ /* Waiting for some result. */
+ remote_hash = control_readulong();
+ if (remote_hash != local_hash) {
+ fprintf(stderr, "hash mismatch\n");
+ exit(EXIT_FAILURE);
+ }
+
+ control_expectln("DONE");
+ close(fd);
+}
+
+void test_stream_msgzcopy_server(const struct test_opts *opts)
+{
+ int i;
+
+ for (i = 0; i < ARRAY_SIZE(test_data_array); i++)
+ test_server(opts, &test_data_array[i], false);
+}
+
+void test_seqpacket_msgzcopy_server(const struct test_opts *opts)
+{
+ int i;
+
+ for (i = 0; i < ARRAY_SIZE(test_data_array); i++) {
+ if (test_data_array[i].stream_only)
+ continue;
+
+ test_server(opts, &test_data_array[i], true);
+ }
+}
+
+void test_stream_msgzcopy_empty_errq_client(const struct test_opts *opts)
+{
+ struct msghdr msg = { 0 };
+ char cmsg_data[128];
+ ssize_t res;
+ int fd;
+
+ fd = vsock_stream_connect(opts->peer_cid, 1234);
+ if (fd < 0) {
+ perror("connect");
+ exit(EXIT_FAILURE);
+ }
+
+ msg.msg_control = cmsg_data;
+ msg.msg_controllen = sizeof(cmsg_data);
+
+ res = recvmsg(fd, &msg, MSG_ERRQUEUE);
+ if (res != -1) {
+ fprintf(stderr, "expected 'recvmsg(2)' failure, got %zi\n",
+ res);
+ exit(EXIT_FAILURE);
+ }
+
+ control_writeln("DONE");
+ close(fd);
+}
+
+void test_stream_msgzcopy_empty_errq_server(const struct test_opts *opts)
+{
+ int fd;
+
+ fd = vsock_stream_accept(VMADDR_CID_ANY, 1234, NULL);
+ if (fd < 0) {
+ perror("accept");
+ exit(EXIT_FAILURE);
+ }
+
+ control_expectln("DONE");
+ close(fd);
+}
diff --git a/tools/testing/vsock/vsock_test_zerocopy.h b/tools/testing/vsock/vsock_test_zerocopy.h
new file mode 100644
index 000000000000..3ef2579e024d
--- /dev/null
+++ b/tools/testing/vsock/vsock_test_zerocopy.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef VSOCK_TEST_ZEROCOPY_H
+#define VSOCK_TEST_ZEROCOPY_H
+#include "util.h"
+
+void test_stream_msgzcopy_client(const struct test_opts *opts);
+void test_stream_msgzcopy_server(const struct test_opts *opts);
+
+void test_seqpacket_msgzcopy_client(const struct test_opts *opts);
+void test_seqpacket_msgzcopy_server(const struct test_opts *opts);
+
+void test_stream_msgzcopy_empty_errq_client(const struct test_opts *opts);
+void test_stream_msgzcopy_empty_errq_server(const struct test_opts *opts);
+
+#endif /* VSOCK_TEST_ZEROCOPY_H */
--
2.25.1

2023-09-22 16:06:46

by Arseniy Krasnov

[permalink] [raw]
Subject: [PATCH net-next v1 08/12] vsock: enable setting SO_ZEROCOPY

For AF_VSOCK, zerocopy tx mode depends on transport, so this option must
be set in AF_VSOCK implementation where transport is accessible (if
transport is not set during setting SO_ZEROCOPY: for example socket is
not connected, then SO_ZEROCOPY will be enabled, but once transport will
be assigned, support of this type of transmission will be checked).

To handle SO_ZEROCOPY, AF_VSOCK implementation uses SOCK_CUSTOM_SOCKOPT
bit, thus handling SOL_SOCKET option operations, but all of them except
SO_ZEROCOPY will be forwarded to the generic handler by calling
'sock_setsockopt()'.

Signed-off-by: Arseniy Krasnov <[email protected]>
---
Changelog:
v5(big patchset) -> v1:
* Compact 'if' conditions.
* Rename 'zc_val' to 'zerocopy'.
* Use 'zerocopy' value directly in 'sock_valbool_flag()', without
?: operator.
* Set 'SOCK_CUSTOM_SOCKOPT' bit for connectible sockets only, as
suggested by Bobby Eshleman <[email protected]>.

net/vmw_vsock/af_vsock.c | 46 ++++++++++++++++++++++++++++++++++++++--
1 file changed, 44 insertions(+), 2 deletions(-)

diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index 482300eb88e0..c05a42e02a17 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -1406,8 +1406,16 @@ static int vsock_connect(struct socket *sock, struct sockaddr *addr,
goto out;
}

- if (vsock_msgzerocopy_allow(transport))
+ if (vsock_msgzerocopy_allow(transport)) {
set_bit(SOCK_SUPPORT_ZC, &sk->sk_socket->flags);
+ } else if (sock_flag(sk, SOCK_ZEROCOPY)) {
+ /* If this option was set before 'connect()',
+ * when transport was unknown, check that this
+ * feature is supported here.
+ */
+ err = -EOPNOTSUPP;
+ goto out;
+ }

err = vsock_auto_bind(vsk);
if (err)
@@ -1643,7 +1651,7 @@ static int vsock_connectible_setsockopt(struct socket *sock,
const struct vsock_transport *transport;
u64 val;

- if (level != AF_VSOCK)
+ if (level != AF_VSOCK && level != SOL_SOCKET)
return -ENOPROTOOPT;

#define COPY_IN(_v) \
@@ -1666,6 +1674,34 @@ static int vsock_connectible_setsockopt(struct socket *sock,

transport = vsk->transport;

+ if (level == SOL_SOCKET) {
+ int zerocopy;
+
+ if (optname != SO_ZEROCOPY) {
+ release_sock(sk);
+ return sock_setsockopt(sock, level, optname, optval, optlen);
+ }
+
+ /* Use 'int' type here, because variable to
+ * set this option usually has this type.
+ */
+ COPY_IN(zerocopy);
+
+ if (zerocopy < 0 || zerocopy > 1) {
+ err = -EINVAL;
+ goto exit;
+ }
+
+ if (transport && !vsock_msgzerocopy_allow(transport)) {
+ err = -EOPNOTSUPP;
+ goto exit;
+ }
+
+ sock_valbool_flag(sk, SOCK_ZEROCOPY,
+ zerocopy);
+ goto exit;
+ }
+
switch (optname) {
case SO_VM_SOCKETS_BUFFER_SIZE:
COPY_IN(val);
@@ -2322,6 +2358,12 @@ static int vsock_create(struct net *net, struct socket *sock,
}
}

+ /* SOCK_DGRAM doesn't have 'setsockopt' callback set in its
+ * proto_ops, so there is no handler for custom logic.
+ */
+ if (sock_type_connectible(sock->type))
+ set_bit(SOCK_CUSTOM_SOCKOPT, &sk->sk_socket->flags);
+
vsock_insert_unbound(vsk);

return 0;
--
2.25.1

2023-09-22 18:15:48

by Arseniy Krasnov

[permalink] [raw]
Subject: [PATCH net-next v1 03/12] vsock: check for MSG_ZEROCOPY support on send

This feature totally depends on transport, so if transport doesn't
support it, return error.

Signed-off-by: Arseniy Krasnov <[email protected]>
Reviewed-by: Stefano Garzarella <[email protected]>
---
include/net/af_vsock.h | 7 +++++++
net/vmw_vsock/af_vsock.c | 6 ++++++
2 files changed, 13 insertions(+)

diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
index b01cf9ac2437..e302c0e804d0 100644
--- a/include/net/af_vsock.h
+++ b/include/net/af_vsock.h
@@ -177,6 +177,9 @@ struct vsock_transport {

/* Read a single skb */
int (*read_skb)(struct vsock_sock *, skb_read_actor_t);
+
+ /* Zero-copy. */
+ bool (*msgzerocopy_allow)(void);
};

/**** CORE ****/
@@ -241,4 +244,8 @@ static inline void __init vsock_bpf_build_proto(void)
{}
#endif

+static inline bool vsock_msgzerocopy_allow(const struct vsock_transport *t)
+{
+ return t->msgzerocopy_allow && t->msgzerocopy_allow();
+}
#endif /* __AF_VSOCK_H__ */
diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index 4fd11bf34bc7..134494a78c14 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -1824,6 +1824,12 @@ static int vsock_connectible_sendmsg(struct socket *sock, struct msghdr *msg,
goto out;
}

+ if (msg->msg_flags & MSG_ZEROCOPY &&
+ !vsock_msgzerocopy_allow(transport)) {
+ err = -EOPNOTSUPP;
+ goto out;
+ }
+
/* Wait for room in the produce queue to enqueue our user's data. */
timeout = sock_sndtimeo(sk, msg->msg_flags & MSG_DONTWAIT);

--
2.25.1

2023-09-22 18:48:44

by Arseniy Krasnov

[permalink] [raw]
Subject: [PATCH net-next v1 09/12] docs: net: description of MSG_ZEROCOPY for AF_VSOCK

This adds description of MSG_ZEROCOPY flag support for AF_VSOCK type of
socket.

Signed-off-by: Arseniy Krasnov <[email protected]>
---
Changelog:
v5(big patchset) -> v1:
* Add information about 'VSOCK_RECVERR'.
* Do not break line for test path.

Documentation/networking/msg_zerocopy.rst | 13 +++++++++++--
1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/Documentation/networking/msg_zerocopy.rst b/Documentation/networking/msg_zerocopy.rst
index b3ea96af9b49..78fb70e748b7 100644
--- a/Documentation/networking/msg_zerocopy.rst
+++ b/Documentation/networking/msg_zerocopy.rst
@@ -7,7 +7,8 @@ Intro
=====

The MSG_ZEROCOPY flag enables copy avoidance for socket send calls.
-The feature is currently implemented for TCP and UDP sockets.
+The feature is currently implemented for TCP, UDP and VSOCK (with
+virtio transport) sockets.


Opportunity and Caveats
@@ -174,7 +175,9 @@ read_notification() call in the previous snippet. A notification
is encoded in the standard error format, sock_extended_err.

The level and type fields in the control data are protocol family
-specific, IP_RECVERR or IPV6_RECVERR.
+specific, IP_RECVERR or IPV6_RECVERR (for TCP or UDP socket).
+For VSOCK socket, cmsg_level will be SOL_VSOCK and cmsg_type will be
+VSOCK_RECVERR.

Error origin is the new type SO_EE_ORIGIN_ZEROCOPY. ee_errno is zero,
as explained before, to avoid blocking read and write system calls on
@@ -235,12 +238,15 @@ Implementation
Loopback
--------

+For TCP and UDP:
Data sent to local sockets can be queued indefinitely if the receive
process does not read its socket. Unbound notification latency is not
acceptable. For this reason all packets generated with MSG_ZEROCOPY
that are looped to a local socket will incur a deferred copy. This
includes looping onto packet sockets (e.g., tcpdump) and tun devices.

+For VSOCK:
+Data path sent to local sockets is the same as for non-local sockets.

Testing
=======
@@ -254,3 +260,6 @@ instance when run with msg_zerocopy.sh between a veth pair across
namespaces, the test will not show any improvement. For testing, the
loopback restriction can be temporarily relaxed by making
skb_orphan_frags_rx identical to skb_orphan_frags.
+
+For VSOCK type of socket example can be found in
+tools/testing/vsock/vsock_test_zerocopy.c.
--
2.25.1

2023-09-22 22:17:56

by Arseniy Krasnov

[permalink] [raw]
Subject: [PATCH net-next v1 12/12] test/vsock: io_uring rx/tx tests

This adds set of tests which use io_uring for rx/tx. This test suite is
implemented as separated util like 'vsock_test' and has the same set of
input arguments as 'vsock_test'. These tests only cover cases of data
transmission (no connect/bind/accept etc).

Signed-off-by: Arseniy Krasnov <[email protected]>
---
Changelog:
v5(big patchset) -> v1:
* Use LDLIBS instead of LDFLAGS.

tools/testing/vsock/Makefile | 7 +-
tools/testing/vsock/vsock_uring_test.c | 321 +++++++++++++++++++++++++
2 files changed, 327 insertions(+), 1 deletion(-)
create mode 100644 tools/testing/vsock/vsock_uring_test.c

diff --git a/tools/testing/vsock/Makefile b/tools/testing/vsock/Makefile
index 1a26f60a596c..c84380bfc18d 100644
--- a/tools/testing/vsock/Makefile
+++ b/tools/testing/vsock/Makefile
@@ -1,12 +1,17 @@
# SPDX-License-Identifier: GPL-2.0-only
+ifeq ($(MAKECMDGOALS),vsock_uring_test)
+LDLIBS = -luring
+endif
+
all: test vsock_perf
test: vsock_test vsock_diag_test
vsock_test: vsock_test.o vsock_test_zerocopy.o timeout.o control.o util.o
vsock_diag_test: vsock_diag_test.o timeout.o control.o util.o
vsock_perf: vsock_perf.o
+vsock_uring_test: control.o util.o vsock_uring_test.o timeout.o

CFLAGS += -g -O2 -Werror -Wall -I. -I../../include -I../../../usr/include -Wno-pointer-sign -fno-strict-overflow -fno-strict-aliasing -fno-common -MMD -U_FORTIFY_SOURCE -D_GNU_SOURCE
.PHONY: all test clean
clean:
- ${RM} *.o *.d vsock_test vsock_diag_test vsock_perf
+ ${RM} *.o *.d vsock_test vsock_diag_test vsock_perf vsock_uring_test
-include *.d
diff --git a/tools/testing/vsock/vsock_uring_test.c b/tools/testing/vsock/vsock_uring_test.c
new file mode 100644
index 000000000000..5ecc4c0d62ab
--- /dev/null
+++ b/tools/testing/vsock/vsock_uring_test.c
@@ -0,0 +1,321 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* io_uring tests for vsock
+ *
+ * Copyright (C) 2023 SaluteDevices.
+ *
+ * Author: Arseniy Krasnov <[email protected]>
+ */
+
+#include <getopt.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <liburing.h>
+#include <unistd.h>
+#include <sys/mman.h>
+#include <linux/kernel.h>
+#include <error.h>
+
+#include "util.h"
+#include "control.h"
+
+#define PAGE_SIZE 4096
+#define RING_ENTRIES_NUM 4
+
+static struct vsock_test_data test_data_array[] = {
+ /* All elements have page aligned base and size. */
+ {
+ .vecs_cnt = 3,
+ {
+ { NULL, PAGE_SIZE },
+ { NULL, 2 * PAGE_SIZE },
+ { NULL, 3 * PAGE_SIZE },
+ }
+ },
+ /* Middle element has both non-page aligned base and size. */
+ {
+ .vecs_cnt = 3,
+ {
+ { NULL, PAGE_SIZE },
+ { (void *)1, 200 },
+ { NULL, 3 * PAGE_SIZE },
+ }
+ }
+};
+
+static void vsock_io_uring_client(const struct test_opts *opts,
+ const struct vsock_test_data *test_data,
+ bool msg_zerocopy)
+{
+ struct io_uring_sqe *sqe;
+ struct io_uring_cqe *cqe;
+ struct io_uring ring;
+ struct iovec *iovec;
+ struct msghdr msg;
+ int fd;
+
+ fd = vsock_stream_connect(opts->peer_cid, 1234);
+ if (fd < 0) {
+ perror("connect");
+ exit(EXIT_FAILURE);
+ }
+
+ if (msg_zerocopy)
+ enable_so_zerocopy(fd);
+
+ iovec = iovec_from_test_data(test_data);
+
+ if (io_uring_queue_init(RING_ENTRIES_NUM, &ring, 0))
+ error(1, errno, "io_uring_queue_init");
+
+ if (io_uring_register_buffers(&ring, iovec, test_data->vecs_cnt))
+ error(1, errno, "io_uring_register_buffers");
+
+ memset(&msg, 0, sizeof(msg));
+ msg.msg_iov = iovec;
+ msg.msg_iovlen = test_data->vecs_cnt;
+ sqe = io_uring_get_sqe(&ring);
+
+ if (msg_zerocopy)
+ io_uring_prep_sendmsg_zc(sqe, fd, &msg, 0);
+ else
+ io_uring_prep_sendmsg(sqe, fd, &msg, 0);
+
+ if (io_uring_submit(&ring) != 1)
+ error(1, errno, "io_uring_submit");
+
+ if (io_uring_wait_cqe(&ring, &cqe))
+ error(1, errno, "io_uring_wait_cqe");
+
+ io_uring_cqe_seen(&ring, cqe);
+
+ control_writeulong(iovec_hash_djb2(iovec, test_data->vecs_cnt));
+
+ control_writeln("DONE");
+ io_uring_queue_exit(&ring);
+ free_iovec_test_data(test_data, iovec);
+ close(fd);
+}
+
+static void vsock_io_uring_server(const struct test_opts *opts,
+ const struct vsock_test_data *test_data)
+{
+ unsigned long remote_hash;
+ unsigned long local_hash;
+ struct io_uring_sqe *sqe;
+ struct io_uring_cqe *cqe;
+ struct io_uring ring;
+ struct iovec iovec;
+ size_t data_len;
+ void *data;
+ int fd;
+
+ fd = vsock_stream_accept(VMADDR_CID_ANY, 1234, NULL);
+ if (fd < 0) {
+ perror("accept");
+ exit(EXIT_FAILURE);
+ }
+
+ data_len = iovec_bytes(test_data->vecs, test_data->vecs_cnt);
+
+ data = malloc(data_len);
+ if (!data) {
+ perror("malloc");
+ exit(EXIT_FAILURE);
+ }
+
+ if (io_uring_queue_init(RING_ENTRIES_NUM, &ring, 0))
+ error(1, errno, "io_uring_queue_init");
+
+ sqe = io_uring_get_sqe(&ring);
+ iovec.iov_base = data;
+ iovec.iov_len = data_len;
+
+ io_uring_prep_readv(sqe, fd, &iovec, 1, 0);
+
+ if (io_uring_submit(&ring) != 1)
+ error(1, errno, "io_uring_submit");
+
+ if (io_uring_wait_cqe(&ring, &cqe))
+ error(1, errno, "io_uring_wait_cqe");
+
+ if (cqe->res != data_len) {
+ fprintf(stderr, "expected %zu, got %u\n", data_len,
+ cqe->res);
+ exit(EXIT_FAILURE);
+ }
+
+ local_hash = hash_djb2(data, data_len);
+
+ remote_hash = control_readulong();
+ if (remote_hash != local_hash) {
+ fprintf(stderr, "hash mismatch\n");
+ exit(EXIT_FAILURE);
+ }
+
+ control_expectln("DONE");
+ io_uring_queue_exit(&ring);
+ free(data);
+}
+
+void test_stream_uring_server(const struct test_opts *opts)
+{
+ int i;
+
+ for (i = 0; i < ARRAY_SIZE(test_data_array); i++)
+ vsock_io_uring_server(opts, &test_data_array[i]);
+}
+
+void test_stream_uring_client(const struct test_opts *opts)
+{
+ int i;
+
+ for (i = 0; i < ARRAY_SIZE(test_data_array); i++)
+ vsock_io_uring_client(opts, &test_data_array[i], false);
+}
+
+void test_stream_uring_msg_zc_server(const struct test_opts *opts)
+{
+ int i;
+
+ for (i = 0; i < ARRAY_SIZE(test_data_array); i++)
+ vsock_io_uring_server(opts, &test_data_array[i]);
+}
+
+void test_stream_uring_msg_zc_client(const struct test_opts *opts)
+{
+ int i;
+
+ for (i = 0; i < ARRAY_SIZE(test_data_array); i++)
+ vsock_io_uring_client(opts, &test_data_array[i], true);
+}
+
+static struct test_case test_cases[] = {
+ {
+ .name = "SOCK_STREAM io_uring test",
+ .run_server = test_stream_uring_server,
+ .run_client = test_stream_uring_client,
+ },
+ {
+ .name = "SOCK_STREAM io_uring MSG_ZEROCOPY test",
+ .run_server = test_stream_uring_msg_zc_server,
+ .run_client = test_stream_uring_msg_zc_client,
+ },
+ {},
+};
+
+static const char optstring[] = "";
+static const struct option longopts[] = {
+ {
+ .name = "control-host",
+ .has_arg = required_argument,
+ .val = 'H',
+ },
+ {
+ .name = "control-port",
+ .has_arg = required_argument,
+ .val = 'P',
+ },
+ {
+ .name = "mode",
+ .has_arg = required_argument,
+ .val = 'm',
+ },
+ {
+ .name = "peer-cid",
+ .has_arg = required_argument,
+ .val = 'p',
+ },
+ {
+ .name = "help",
+ .has_arg = no_argument,
+ .val = '?',
+ },
+ {},
+};
+
+static void usage(void)
+{
+ fprintf(stderr, "Usage: vsock_uring_test [--help] [--control-host=<host>] --control-port=<port> --mode=client|server --peer-cid=<cid>\n"
+ "\n"
+ " Server: vsock_uring_test --control-port=1234 --mode=server --peer-cid=3\n"
+ " Client: vsock_uring_test --control-host=192.168.0.1 --control-port=1234 --mode=client --peer-cid=2\n"
+ "\n"
+ "Run transmission tests using io_uring. Usage is the same as\n"
+ "in ./vsock_test\n"
+ "\n"
+ "Options:\n"
+ " --help This help message\n"
+ " --control-host <host> Server IP address to connect to\n"
+ " --control-port <port> Server port to listen on/connect to\n"
+ " --mode client|server Server or client mode\n"
+ " --peer-cid <cid> CID of the other side\n"
+ );
+ exit(EXIT_FAILURE);
+}
+
+int main(int argc, char **argv)
+{
+ const char *control_host = NULL;
+ const char *control_port = NULL;
+ struct test_opts opts = {
+ .mode = TEST_MODE_UNSET,
+ .peer_cid = VMADDR_CID_ANY,
+ };
+
+ init_signals();
+
+ for (;;) {
+ int opt = getopt_long(argc, argv, optstring, longopts, NULL);
+
+ if (opt == -1)
+ break;
+
+ switch (opt) {
+ case 'H':
+ control_host = optarg;
+ break;
+ case 'm':
+ if (strcmp(optarg, "client") == 0) {
+ opts.mode = TEST_MODE_CLIENT;
+ } else if (strcmp(optarg, "server") == 0) {
+ opts.mode = TEST_MODE_SERVER;
+ } else {
+ fprintf(stderr, "--mode must be \"client\" or \"server\"\n");
+ return EXIT_FAILURE;
+ }
+ break;
+ case 'p':
+ opts.peer_cid = parse_cid(optarg);
+ break;
+ case 'P':
+ control_port = optarg;
+ break;
+ case '?':
+ default:
+ usage();
+ }
+ }
+
+ if (!control_port)
+ usage();
+ if (opts.mode == TEST_MODE_UNSET)
+ usage();
+ if (opts.peer_cid == VMADDR_CID_ANY)
+ usage();
+
+ if (!control_host) {
+ if (opts.mode != TEST_MODE_SERVER)
+ usage();
+ control_host = "0.0.0.0";
+ }
+
+ control_init(control_host, control_port,
+ opts.mode == TEST_MODE_SERVER);
+
+ run_tests(test_cases, &opts);
+
+ control_cleanup();
+
+ return 0;
+}
--
2.25.1

2023-09-26 14:14:04

by Stefano Garzarella

[permalink] [raw]
Subject: Re: [PATCH net-next v1 10/12] test/vsock: MSG_ZEROCOPY flag tests

On Fri, Sep 22, 2023 at 08:24:26AM +0300, Arseniy Krasnov wrote:
>This adds three tests for MSG_ZEROCOPY feature:
>1) SOCK_STREAM tx with different buffers.
>2) SOCK_SEQPACKET tx with different buffers.
>3) SOCK_STREAM test to read empty error queue of the socket.
>
>Signed-off-by: Arseniy Krasnov <[email protected]>
>---
> tools/testing/vsock/Makefile | 2 +-
> tools/testing/vsock/util.c | 222 +++++++++++++++
> tools/testing/vsock/util.h | 19 ++
> tools/testing/vsock/vsock_test.c | 16 ++
> tools/testing/vsock/vsock_test_zerocopy.c | 314 ++++++++++++++++++++++
> tools/testing/vsock/vsock_test_zerocopy.h | 15 ++
> 6 files changed, 587 insertions(+), 1 deletion(-)
> create mode 100644 tools/testing/vsock/vsock_test_zerocopy.c
> create mode 100644 tools/testing/vsock/vsock_test_zerocopy.h
>
>diff --git a/tools/testing/vsock/Makefile b/tools/testing/vsock/Makefile
>index 21a98ba565ab..1a26f60a596c 100644
>--- a/tools/testing/vsock/Makefile
>+++ b/tools/testing/vsock/Makefile
>@@ -1,7 +1,7 @@
> # SPDX-License-Identifier: GPL-2.0-only
> all: test vsock_perf
> test: vsock_test vsock_diag_test
>-vsock_test: vsock_test.o timeout.o control.o util.o
>+vsock_test: vsock_test.o vsock_test_zerocopy.o timeout.o control.o util.o
> vsock_diag_test: vsock_diag_test.o timeout.o control.o util.o
> vsock_perf: vsock_perf.o
>
>diff --git a/tools/testing/vsock/util.c b/tools/testing/vsock/util.c
>index 6779d5008b27..d531dbbfa8ff 100644
>--- a/tools/testing/vsock/util.c
>+++ b/tools/testing/vsock/util.c
>@@ -11,15 +11,27 @@
> #include <stdio.h>
> #include <stdint.h>
> #include <stdlib.h>
>+#include <string.h>
> #include <signal.h>
> #include <unistd.h>
> #include <assert.h>
> #include <sys/epoll.h>
>+#include <sys/mman.h>
>+#include <linux/errqueue.h>
>+#include <poll.h>
>
> #include "timeout.h"
> #include "control.h"
> #include "util.h"
>
>+#ifndef SOL_VSOCK
>+#define SOL_VSOCK 287
>+#endif
>+
>+#ifndef VSOCK_RECVERR
>+#define VSOCK_RECVERR 1
>+#endif

Maybe better to re-define them in util.h where we include vm_socktes.h

>+
> /* Install signal handlers */
> void init_signals(void)
> {
>@@ -444,3 +456,213 @@ unsigned long hash_djb2(const void *data, size_t len)
>
> return hash;
> }
>+
>+void enable_so_zerocopy(int fd)
>+{
>+ int val = 1;
>+
>+ if (setsockopt(fd, SOL_SOCKET, SO_ZEROCOPY, &val, sizeof(val))) {
>+ perror("setsockopt");
>+ exit(EXIT_FAILURE);
>+ }
>+}
>+
>+static void *mmap_no_fail(size_t bytes)
>+{
>+ void *res;
>+
>+ res = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
>+ MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE, -1, 0);
>+ if (res == MAP_FAILED) {
>+ perror("mmap");
>+ exit(EXIT_FAILURE);
>+ }
>+
>+ return res;
>+}
>+
>+size_t iovec_bytes(const struct iovec *iov, size_t iovnum)
>+{
>+ size_t bytes;
>+ int i;
>+
>+ for (bytes = 0, i = 0; i < iovnum; i++)
>+ bytes += iov[i].iov_len;
>+
>+ return bytes;
>+}
>+
>+static void iovec_random_init(struct iovec *iov,
>+ const struct vsock_test_data *test_data)
>+{
>+ int i;
>+
>+ for (i = 0; i < test_data->vecs_cnt; i++) {
>+ int j;
>+
>+ if (test_data->vecs[i].iov_base == MAP_FAILED)
>+ continue;
>+
>+ for (j = 0; j < iov[i].iov_len; j++)
>+ ((uint8_t *)iov[i].iov_base)[j] = rand() & 0xff;
>+ }
>+}
>+
>+unsigned long iovec_hash_djb2(struct iovec *iov, size_t iovnum)
>+{
>+ unsigned long hash;
>+ size_t iov_bytes;
>+ size_t offs;
>+ void *tmp;
>+ int i;
>+
>+ iov_bytes = iovec_bytes(iov, iovnum);
>+
>+ tmp = malloc(iov_bytes);
>+ if (!tmp) {
>+ perror("malloc");
>+ exit(EXIT_FAILURE);
>+ }
>+
>+ for (offs = 0, i = 0; i < iovnum; i++) {
>+ memcpy(tmp + offs, iov[i].iov_base, iov[i].iov_len);
>+ offs += iov[i].iov_len;
>+ }
>+
>+ hash = hash_djb2(tmp, iov_bytes);
>+ free(tmp);
>+
>+ return hash;
>+}
>+
>+struct iovec *iovec_from_test_data(const struct vsock_test_data *test_data)
>+{
>+ const struct iovec *test_iovec;
>+ struct iovec *iovec;
>+ int i;
>+
>+ iovec = malloc(sizeof(*iovec) * test_data->vecs_cnt);
>+ if (!iovec) {
>+ perror("malloc");
>+ exit(EXIT_FAILURE);
>+ }
>+
>+ test_iovec = test_data->vecs;
>+
>+ for (i = 0; i < test_data->vecs_cnt; i++) {
>+ iovec[i].iov_len = test_iovec[i].iov_len;
>+ iovec[i].iov_base = mmap_no_fail(test_iovec[i].iov_len);
>+
>+ if (test_iovec[i].iov_base != MAP_FAILED &&
>+ test_iovec[i].iov_base)
>+ iovec[i].iov_base += (uintptr_t)test_iovec[i].iov_base;
>+ }
>+
>+ /* Unmap "invalid" elements. */
>+ for (i = 0; i < test_data->vecs_cnt; i++) {
>+ if (test_iovec[i].iov_base == MAP_FAILED) {
>+ if (munmap(iovec[i].iov_base, iovec[i].iov_len)) {
>+ perror("munmap");
>+ exit(EXIT_FAILURE);
>+ }
>+ }
>+ }
>+
>+ iovec_random_init(iovec, test_data);
>+
>+ return iovec;
>+}
>+
>+void free_iovec_test_data(const struct vsock_test_data *test_data,
>+ struct iovec *iovec)
>+{
>+ int i;
>+
>+ for (i = 0; i < test_data->vecs_cnt; i++) {
>+ if (test_data->vecs[i].iov_base != MAP_FAILED) {
>+ if (test_data->vecs[i].iov_base)
>+ iovec[i].iov_base -= (uintptr_t)test_data->vecs[i].iov_base;
>+
>+ if (munmap(iovec[i].iov_base, iovec[i].iov_len)) {
>+ perror("munmap");
>+ exit(EXIT_FAILURE);
>+ }
>+ }
>+ }
>+
>+ free(iovec);
>+}
>+
>+#define POLL_TIMEOUT_MS 100
>+void vsock_recv_completion(int fd, bool zerocopied, bool completion)
>+{
>+ struct sock_extended_err *serr;
>+ struct msghdr msg = { 0 };
>+ struct pollfd fds = { 0 };
>+ char cmsg_data[128];
>+ struct cmsghdr *cm;
>+ ssize_t res;
>+
>+ fds.fd = fd;
>+ fds.events = 0;
>+
>+ if (poll(&fds, 1, POLL_TIMEOUT_MS) < 0) {
>+ perror("poll");
>+ exit(EXIT_FAILURE);
>+ }
>+
>+ if (!(fds.revents & POLLERR)) {
>+ if (completion) {
>+ fprintf(stderr, "POLLERR expected\n");
>+ exit(EXIT_FAILURE);
>+ } else {
>+ return;
>+ }
>+ }
>+
>+ msg.msg_control = cmsg_data;
>+ msg.msg_controllen = sizeof(cmsg_data);
>+
>+ res = recvmsg(fd, &msg, MSG_ERRQUEUE);
>+ if (res) {
>+ fprintf(stderr, "failed to read error queue: %zi\n", res);
>+ exit(EXIT_FAILURE);
>+ }
>+
>+ cm = CMSG_FIRSTHDR(&msg);
>+ if (!cm) {
>+ fprintf(stderr, "cmsg: no cmsg\n");
>+ exit(EXIT_FAILURE);
>+ }
>+
>+ if (cm->cmsg_level != SOL_VSOCK) {
>+ fprintf(stderr, "cmsg: unexpected 'cmsg_level'\n");
>+ exit(EXIT_FAILURE);
>+ }
>+
>+ if (cm->cmsg_type != VSOCK_RECVERR) {
>+ fprintf(stderr, "cmsg: unexpected 'cmsg_type'\n");
>+ exit(EXIT_FAILURE);
>+ }
>+
>+ serr = (void *)CMSG_DATA(cm);
>+ if (serr->ee_origin != SO_EE_ORIGIN_ZEROCOPY) {
>+ fprintf(stderr, "serr: wrong origin: %u\n", serr->ee_origin);
>+ exit(EXIT_FAILURE);
>+ }
>+
>+ if (serr->ee_errno) {
>+ fprintf(stderr, "serr: wrong error code: %u\n", serr->ee_errno);
>+ exit(EXIT_FAILURE);
>+ }
>+
>+ if (zerocopied && (serr->ee_code & SO_EE_CODE_ZEROCOPY_COPIED)) {
>+ fprintf(stderr, "serr: was copy instead of zerocopy\n");
>+ exit(EXIT_FAILURE);
>+ }
>+
>+ if (!zerocopied && !(serr->ee_code & SO_EE_CODE_ZEROCOPY_COPIED)) {
>+ fprintf(stderr, "serr: was zerocopy instead of copy\n");
>+ exit(EXIT_FAILURE);
>+ }
>+}
>diff --git a/tools/testing/vsock/util.h b/tools/testing/vsock/util.h
>index e5407677ce05..8656a5ae63aa 100644
>--- a/tools/testing/vsock/util.h
>+++ b/tools/testing/vsock/util.h
>@@ -2,6 +2,7 @@
> #ifndef UTIL_H
> #define UTIL_H
>
>+#include <stdbool.h>
> #include <sys/socket.h>
> #include <linux/vm_sockets.h>
>
>@@ -18,6 +19,17 @@ struct test_opts {
> unsigned int peer_cid;
> };
>
>+#define VSOCK_TEST_DATA_MAX_IOV 4
>+
>+struct vsock_test_data {
>+ bool stream_only; /* Only for SOCK_STREAM. */
>+ bool zerocopied; /* Data must be zerocopied. */
>+ bool so_zerocopy; /* Enable zerocopy mode. */
>+ int sendmsg_errno; /* 'errno' after 'sendmsg()'. */
>+ int vecs_cnt; /* Number of elements in 'vecs'. */
>+ struct iovec vecs[VSOCK_TEST_DATA_MAX_IOV];
>+};
>+
> /* A test case definition. Test functions must print failures to stderr and
> * terminate with exit(EXIT_FAILURE).
> */
>@@ -53,4 +65,11 @@ void list_tests(const struct test_case *test_cases);
> void skip_test(struct test_case *test_cases, size_t test_cases_len,
> const char *test_id_str);
> unsigned long hash_djb2(const void *data, size_t len);
>+void enable_so_zerocopy(int fd);
>+size_t iovec_bytes(const struct iovec *iov, size_t iovnum);
>+unsigned long iovec_hash_djb2(struct iovec *iov, size_t iovnum);
>+struct iovec *iovec_from_test_data(const struct vsock_test_data *test_data);
>+void free_iovec_test_data(const struct vsock_test_data *test_data,
>+ struct iovec *iovec);
>+void vsock_recv_completion(int fd, bool zerocopied, bool completion);
> #endif /* UTIL_H */
>diff --git a/tools/testing/vsock/vsock_test.c b/tools/testing/vsock/vsock_test.c
>index da4cb819a183..c1f7bc9abd22 100644
>--- a/tools/testing/vsock/vsock_test.c
>+++ b/tools/testing/vsock/vsock_test.c
>@@ -21,6 +21,7 @@
> #include <poll.h>
> #include <signal.h>
>
>+#include "vsock_test_zerocopy.h"
> #include "timeout.h"
> #include "control.h"
> #include "util.h"
>@@ -1269,6 +1270,21 @@ static struct test_case test_cases[] = {
> .run_client = test_stream_shutrd_client,
> .run_server = test_stream_shutrd_server,
> },
>+ {
>+ .name = "SOCK_STREAM MSG_ZEROCOPY",
>+ .run_client = test_stream_msgzcopy_client,
>+ .run_server = test_stream_msgzcopy_server,
>+ },
>+ {
>+ .name = "SOCK_SEQPACKET MSG_ZEROCOPY",
>+ .run_client = test_seqpacket_msgzcopy_client,
>+ .run_server = test_seqpacket_msgzcopy_server,
>+ },
>+ {
>+ .name = "SOCK_STREAM MSG_ZEROCOPY empty MSG_ERRQUEUE",
>+ .run_client = test_stream_msgzcopy_empty_errq_client,
>+ .run_server = test_stream_msgzcopy_empty_errq_server,
>+ },
> {},
> };
>
>diff --git a/tools/testing/vsock/vsock_test_zerocopy.c b/tools/testing/vsock/vsock_test_zerocopy.c
>new file mode 100644
>index 000000000000..6968555d3611
>--- /dev/null
>+++ b/tools/testing/vsock/vsock_test_zerocopy.c
>@@ -0,0 +1,314 @@
>+// SPDX-License-Identifier: GPL-2.0-only
>+/* MSG_ZEROCOPY feature tests for vsock
>+ *
>+ * Copyright (C) 2023 SaluteDevices.
>+ *
>+ * Author: Arseniy Krasnov <[email protected]>
>+ */
>+
>+#include <stdio.h>
>+#include <stdlib.h>
>+#include <string.h>
>+#include <sys/mman.h>
>+#include <unistd.h>
>+#include <poll.h>
>+#include <linux/errqueue.h>
>+#include <linux/kernel.h>
>+#include <errno.h>
>+
>+#include "control.h"
>+#include "vsock_test_zerocopy.h"
>+
>+#define PAGE_SIZE 4096
>+
>+static struct vsock_test_data test_data_array[] = {
>+ /* Last element has non-page aligned size. */
>+ {
>+ .zerocopied = true,
>+ .so_zerocopy = true,
>+ .sendmsg_errno = 0,
>+ .vecs_cnt = 3,
>+ {
>+ { NULL, PAGE_SIZE },
>+ { NULL, PAGE_SIZE },
>+ { NULL, 200 }
>+ }
>+ },
>+ /* All elements have page aligned base and size. */
>+ {
>+ .zerocopied = true,
>+ .so_zerocopy = true,
>+ .sendmsg_errno = 0,
>+ .vecs_cnt = 3,
>+ {
>+ { NULL, PAGE_SIZE },
>+ { NULL, PAGE_SIZE * 2 },
>+ { NULL, PAGE_SIZE * 3 }
>+ }
>+ },
>+ /* All elements have page aligned base and size. But
>+ * data length is bigger than 64Kb.
>+ */
>+ {
>+ .zerocopied = true,
>+ .so_zerocopy = true,
>+ .sendmsg_errno = 0,
>+ .vecs_cnt = 3,
>+ {
>+ { NULL, PAGE_SIZE * 16 },
>+ { NULL, PAGE_SIZE * 16 },
>+ { NULL, PAGE_SIZE * 16 }
>+ }
>+ },
>+ /* Middle element has both non-page aligned base and size. */
>+ {
>+ .zerocopied = true,
>+ .so_zerocopy = true,
>+ .sendmsg_errno = 0,
>+ .vecs_cnt = 3,
>+ {
>+ { NULL, PAGE_SIZE },
>+ { (void *)1, 100 },
>+ { NULL, PAGE_SIZE }
>+ }
>+ },
>+ /* Middle element is unmapped. */
>+ {
>+ .zerocopied = false,
>+ .so_zerocopy = true,
>+ .sendmsg_errno = ENOMEM,
>+ .vecs_cnt = 3,
>+ {
>+ { NULL, PAGE_SIZE },
>+ { MAP_FAILED, PAGE_SIZE },
>+ { NULL, PAGE_SIZE }
>+ }
>+ },
>+ /* Valid data, but SO_ZEROCOPY is off. This
>+ * will trigger fallback to copy.
>+ */
>+ {
>+ .zerocopied = false,
>+ .so_zerocopy = false,
>+ .sendmsg_errno = 0,
>+ .vecs_cnt = 1,
>+ {
>+ { NULL, PAGE_SIZE }
>+ }
>+ },
>+ /* Valid data, but message is bigger than peer's
>+ * buffer, so this will trigger fallback to copy.
>+ * This test is for SOCK_STREAM only, because
>+ * for SOCK_SEQPACKET, 'sendmsg()' returns EMSGSIZE.
>+ */
>+ {
>+ .stream_only = true,
>+ .zerocopied = false,
>+ .so_zerocopy = true,
>+ .sendmsg_errno = 0,
>+ .vecs_cnt = 1,
>+ {
>+ { NULL, 100 * PAGE_SIZE }
>+ }
>+ },
>+};
>+
>+static void test_client(const struct test_opts *opts,
>+ const struct vsock_test_data *test_data,
>+ bool sock_seqpacket)
>+{
>+ struct msghdr msg = { 0 };
>+ ssize_t sendmsg_res;
>+ struct iovec *iovec;
>+ int fd;
>+
>+ if (sock_seqpacket)
>+ fd = vsock_seqpacket_connect(opts->peer_cid, 1234);
>+ else
>+ fd = vsock_stream_connect(opts->peer_cid, 1234);
>+
>+ if (fd < 0) {
>+ perror("connect");
>+ exit(EXIT_FAILURE);
>+ }
>+
>+ if (test_data->so_zerocopy)
>+ enable_so_zerocopy(fd);
>+
>+ iovec = iovec_from_test_data(test_data);
>+
>+ msg.msg_iov = iovec;
>+ msg.msg_iovlen = test_data->vecs_cnt;
>+
>+ errno = 0;
>+
>+ sendmsg_res = sendmsg(fd, &msg, MSG_ZEROCOPY);
>+ if (errno != test_data->sendmsg_errno) {
>+ fprintf(stderr, "expected 'errno' == %i, got %i\n",
>+ test_data->sendmsg_errno, errno);
>+ exit(EXIT_FAILURE);
>+ }
>+
>+ if (!errno) {
>+ if (sendmsg_res != iovec_bytes(iovec, test_data->vecs_cnt)) {
>+ fprintf(stderr, "expected 'sendmsg()' == %li, got %li\n",
>+ iovec_bytes(iovec, test_data->vecs_cnt),
>+ sendmsg_res);
>+ exit(EXIT_FAILURE);
>+ }
>+ }
>+
>+ /* Receive completion only in case of successful 'sendmsg()'. */
>+ vsock_recv_completion(fd, test_data->zerocopied,
>+ test_data->so_zerocopy && !test_data->sendmsg_errno);
>+
>+ if (!test_data->sendmsg_errno)
>+ control_writeulong(iovec_hash_djb2(iovec, test_data->vecs_cnt));
>+ else
>+ control_writeulong(0);
>+
>+ control_writeln("DONE");
>+ free_iovec_test_data(test_data, iovec);
>+ close(fd);
>+}
>+
>+void test_stream_msgzcopy_client(const struct test_opts *opts)
>+{
>+ int i;
>+
>+ for (i = 0; i < ARRAY_SIZE(test_data_array); i++)
>+ test_client(opts, &test_data_array[i], false);
>+}
>+
>+void test_seqpacket_msgzcopy_client(const struct test_opts *opts)
>+{
>+ int i;
>+
>+ for (i = 0; i < ARRAY_SIZE(test_data_array); i++) {
>+ if (test_data_array[i].stream_only)
>+ continue;
>+
>+ test_client(opts, &test_data_array[i], true);
>+ }
>+}
>+
>+static void test_server(const struct test_opts *opts,
>+ const struct vsock_test_data *test_data,
>+ bool sock_seqpacket)
>+{
>+ unsigned long remote_hash;
>+ unsigned long local_hash;
>+ ssize_t total_bytes_rec;
>+ unsigned char *data;
>+ size_t data_len;
>+ int fd;
>+
>+ if (sock_seqpacket)
>+ fd = vsock_seqpacket_accept(VMADDR_CID_ANY, 1234, NULL);
>+ else
>+ fd = vsock_stream_accept(VMADDR_CID_ANY, 1234, NULL);
>+
>+ if (fd < 0) {
>+ perror("accept");
>+ exit(EXIT_FAILURE);
>+ }
>+
>+ data_len = iovec_bytes(test_data->vecs, test_data->vecs_cnt);
>+
>+ data = malloc(data_len);
>+ if (!data) {
>+ perror("malloc");
>+ exit(EXIT_FAILURE);
>+ }
>+
>+ total_bytes_rec = 0;
>+
>+ while (total_bytes_rec != data_len) {
>+ ssize_t bytes_rec;
>+
>+ bytes_rec = read(fd, data + total_bytes_rec,
>+ data_len - total_bytes_rec);
>+ if (bytes_rec <= 0)
>+ break;
>+
>+ total_bytes_rec += bytes_rec;
>+ }
>+
>+ if (test_data->sendmsg_errno == 0)
>+ local_hash = hash_djb2(data, data_len);
>+ else
>+ local_hash = 0;
>+
>+ free(data);
>+
>+ /* Waiting for some result. */
>+ remote_hash = control_readulong();
>+ if (remote_hash != local_hash) {
>+ fprintf(stderr, "hash mismatch\n");
>+ exit(EXIT_FAILURE);
>+ }
>+
>+ control_expectln("DONE");
>+ close(fd);
>+}
>+
>+void test_stream_msgzcopy_server(const struct test_opts *opts)
>+{
>+ int i;
>+
>+ for (i = 0; i < ARRAY_SIZE(test_data_array); i++)
>+ test_server(opts, &test_data_array[i], false);
>+}
>+
>+void test_seqpacket_msgzcopy_server(const struct test_opts *opts)
>+{
>+ int i;
>+
>+ for (i = 0; i < ARRAY_SIZE(test_data_array); i++) {
>+ if (test_data_array[i].stream_only)
>+ continue;
>+
>+ test_server(opts, &test_data_array[i], true);
>+ }
>+}
>+
>+void test_stream_msgzcopy_empty_errq_client(const struct test_opts *opts)
>+{
>+ struct msghdr msg = { 0 };
>+ char cmsg_data[128];
>+ ssize_t res;
>+ int fd;
>+
>+ fd = vsock_stream_connect(opts->peer_cid, 1234);
>+ if (fd < 0) {
>+ perror("connect");
>+ exit(EXIT_FAILURE);
>+ }
>+
>+ msg.msg_control = cmsg_data;
>+ msg.msg_controllen = sizeof(cmsg_data);
>+
>+ res = recvmsg(fd, &msg, MSG_ERRQUEUE);
>+ if (res != -1) {
>+ fprintf(stderr, "expected 'recvmsg(2)' failure, got %zi\n",
>+ res);
>+ exit(EXIT_FAILURE);
>+ }
>+
>+ control_writeln("DONE");
>+ close(fd);
>+}
>+
>+void test_stream_msgzcopy_empty_errq_server(const struct test_opts *opts)
>+{
>+ int fd;
>+
>+ fd = vsock_stream_accept(VMADDR_CID_ANY, 1234, NULL);
>+ if (fd < 0) {
>+ perror("accept");
>+ exit(EXIT_FAILURE);
>+ }
>+
>+ control_expectln("DONE");
>+ close(fd);
>+}
>diff --git a/tools/testing/vsock/vsock_test_zerocopy.h b/tools/testing/vsock/vsock_test_zerocopy.h
>new file mode 100644
>index 000000000000..3ef2579e024d
>--- /dev/null
>+++ b/tools/testing/vsock/vsock_test_zerocopy.h
>@@ -0,0 +1,15 @@
>+/* SPDX-License-Identifier: GPL-2.0-only */
>+#ifndef VSOCK_TEST_ZEROCOPY_H
>+#define VSOCK_TEST_ZEROCOPY_H
>+#include "util.h"
>+
>+void test_stream_msgzcopy_client(const struct test_opts *opts);
>+void test_stream_msgzcopy_server(const struct test_opts *opts);
>+
>+void test_seqpacket_msgzcopy_client(const struct test_opts *opts);
>+void test_seqpacket_msgzcopy_server(const struct test_opts *opts);
>+
>+void test_stream_msgzcopy_empty_errq_client(const struct test_opts *opts);
>+void test_stream_msgzcopy_empty_errq_server(const struct test_opts *opts);
>+
>+#endif /* VSOCK_TEST_ZEROCOPY_H */
>--
>2.25.1
>

2023-09-26 17:25:55

by Stefano Garzarella

[permalink] [raw]
Subject: Re: [PATCH net-next v1 08/12] vsock: enable setting SO_ZEROCOPY

On Fri, Sep 22, 2023 at 08:24:24AM +0300, Arseniy Krasnov wrote:
>For AF_VSOCK, zerocopy tx mode depends on transport, so this option must
>be set in AF_VSOCK implementation where transport is accessible (if
>transport is not set during setting SO_ZEROCOPY: for example socket is
>not connected, then SO_ZEROCOPY will be enabled, but once transport will
>be assigned, support of this type of transmission will be checked).
>
>To handle SO_ZEROCOPY, AF_VSOCK implementation uses SOCK_CUSTOM_SOCKOPT
>bit, thus handling SOL_SOCKET option operations, but all of them except
>SO_ZEROCOPY will be forwarded to the generic handler by calling
>'sock_setsockopt()'.
>
>Signed-off-by: Arseniy Krasnov <[email protected]>
>---
> Changelog:
> v5(big patchset) -> v1:
> * Compact 'if' conditions.
> * Rename 'zc_val' to 'zerocopy'.
> * Use 'zerocopy' value directly in 'sock_valbool_flag()', without
> ?: operator.
> * Set 'SOCK_CUSTOM_SOCKOPT' bit for connectible sockets only, as
> suggested by Bobby Eshleman <[email protected]>.
>
> net/vmw_vsock/af_vsock.c | 46 ++++++++++++++++++++++++++++++++++++++--
> 1 file changed, 44 insertions(+), 2 deletions(-)
>
>diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
>index 482300eb88e0..c05a42e02a17 100644
>--- a/net/vmw_vsock/af_vsock.c
>+++ b/net/vmw_vsock/af_vsock.c
>@@ -1406,8 +1406,16 @@ static int vsock_connect(struct socket *sock, struct sockaddr *addr,
> goto out;
> }
>
>- if (vsock_msgzerocopy_allow(transport))
>+ if (vsock_msgzerocopy_allow(transport)) {
> set_bit(SOCK_SUPPORT_ZC, &sk->sk_socket->flags);
>+ } else if (sock_flag(sk, SOCK_ZEROCOPY)) {
>+ /* If this option was set before 'connect()',
>+ * when transport was unknown, check that this
>+ * feature is supported here.
>+ */
>+ err = -EOPNOTSUPP;
>+ goto out;
>+ }
>
> err = vsock_auto_bind(vsk);
> if (err)
>@@ -1643,7 +1651,7 @@ static int vsock_connectible_setsockopt(struct socket *sock,
> const struct vsock_transport *transport;
> u64 val;
>
>- if (level != AF_VSOCK)
>+ if (level != AF_VSOCK && level != SOL_SOCKET)
> return -ENOPROTOOPT;
>
> #define COPY_IN(_v) \
>@@ -1666,6 +1674,34 @@ static int vsock_connectible_setsockopt(struct socket *sock,
>
> transport = vsk->transport;
>
>+ if (level == SOL_SOCKET) {
>+ int zerocopy;
>+
>+ if (optname != SO_ZEROCOPY) {
>+ release_sock(sk);
>+ return sock_setsockopt(sock, level, optname, optval, optlen);
>+ }
>+
>+ /* Use 'int' type here, because variable to
>+ * set this option usually has this type.
>+ */
>+ COPY_IN(zerocopy);
>+
>+ if (zerocopy < 0 || zerocopy > 1) {
>+ err = -EINVAL;
>+ goto exit;
>+ }
>+
>+ if (transport && !vsock_msgzerocopy_allow(transport)) {
>+ err = -EOPNOTSUPP;
>+ goto exit;
>+ }
>+
>+ sock_valbool_flag(sk, SOCK_ZEROCOPY,
>+ zerocopy);

it's not necessary to wrap this call.

>+ goto exit;
>+ }
>+
> switch (optname) {
> case SO_VM_SOCKETS_BUFFER_SIZE:
> COPY_IN(val);
>@@ -2322,6 +2358,12 @@ static int vsock_create(struct net *net, struct socket *sock,
> }
> }
>
>+ /* SOCK_DGRAM doesn't have 'setsockopt' callback set in its
>+ * proto_ops, so there is no handler for custom logic.
>+ */
>+ if (sock_type_connectible(sock->type))
>+ set_bit(SOCK_CUSTOM_SOCKOPT, &sk->sk_socket->flags);
>+
> vsock_insert_unbound(vsk);
>
> return 0;
>--
>2.25.1
>

2023-09-26 17:57:21

by Stefano Garzarella

[permalink] [raw]
Subject: Re: [PATCH net-next v1 00/12] vsock/virtio: continue MSG_ZEROCOPY support

Hi Arseniy,

On Fri, Sep 22, 2023 at 08:24:16AM +0300, Arseniy Krasnov wrote:
>Hello,
>
>this patchset contains second and third parts of another big patchset
>for MSG_ZEROCOPY flag support:
>https://lore.kernel.org/netdev/[email protected]/
>
>During review of this series, Stefano Garzarella <[email protected]>
>suggested to split it for three parts to simplify review and merging:
>
>1) virtio and vhost updates (for fragged skbs) (merged to net-next, see
> link below)
>2) AF_VSOCK updates (allows to enable MSG_ZEROCOPY mode and read
> tx completions) and update for Documentation/. <-- this patchset
>3) Updates for tests and utils. <-- this patchset
>
>Part 1) was merged:
>https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=71b263e79370348349553ecdf46f4a69eb436dc7
>
>Head for this patchset is:
>https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=71b263e79370348349553ecdf46f4a69eb436dc7

Thanks for the series.
I did a quick review highlighting some things that need to be changed.

Overall, the series seems to be in good shape. The tests went well.

In the next few days I'll see if I can get a better look at the larger
patches like the tests, or I'll check in the next version.

Thanks,
Stefano

2023-09-26 19:24:36

by Stefano Garzarella

[permalink] [raw]
Subject: Re: [PATCH net-next v1 02/12] vsock: read from socket's error queue

On Fri, Sep 22, 2023 at 08:24:18AM +0300, Arseniy Krasnov wrote:
>This adds handling of MSG_ERRQUEUE input flag in receive call. This flag
>is used to read socket's error queue instead of data queue. Possible
>scenario of error queue usage is receiving completions for transmission
>with MSG_ZEROCOPY flag. This patch also adds new defines: 'SOL_VSOCK'
>and 'VSOCK_RECVERR'.
>
>Signed-off-by: Arseniy Krasnov <[email protected]>
>---
> Changelog:
> v5(big patchset) -> v1:
> * R-b tag removed, due to added defines to 'include/uapi/linux/vsock.h'.
> Both 'SOL_VSOCK' and 'VSOCK_RECVERR' are needed by userspace, so
> they were placed to 'include/uapi/linux/vsock.h'. At the same time,
> the same define for 'SOL_VSOCK' was placed to 'include/linux/socket.h'.
> This is needed because this file contains SOL_XXX defines for different
> types of socket, so it prevents situation when another new SOL_XXX
> will use constant 287.
>
> include/linux/socket.h | 1 +
> include/uapi/linux/vsock.h | 9 +++++++++
> net/vmw_vsock/af_vsock.c | 6 ++++++
> 3 files changed, 16 insertions(+)
> create mode 100644 include/uapi/linux/vsock.h
>
>diff --git a/include/linux/socket.h b/include/linux/socket.h
>index 39b74d83c7c4..cfcb7e2c3813 100644
>--- a/include/linux/socket.h
>+++ b/include/linux/socket.h
>@@ -383,6 +383,7 @@ struct ucred {
> #define SOL_MPTCP 284
> #define SOL_MCTP 285
> #define SOL_SMC 286
>+#define SOL_VSOCK 287
>
> /* IPX options */
> #define IPX_TYPE 1
>diff --git a/include/uapi/linux/vsock.h b/include/uapi/linux/vsock.h
>new file mode 100644
>index 000000000000..b25c1347a3b8
>--- /dev/null
>+++ b/include/uapi/linux/vsock.h

We already have include/uapi/linux/vm_sockets.h

Should we include these changes there instead of creating a new header?

>@@ -0,0 +1,9 @@
>+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
>+#ifndef _UAPI_LINUX_VSOCK_H
>+#define _UAPI_LINUX_VSOCK_H
>+
>+#define SOL_VSOCK 287

Why we need to re-define this also here?

In that case, should we protect with some guards to avoid double
defines?

>+
>+#define VSOCK_RECVERR 1
>+
>+#endif /* _UAPI_LINUX_VSOCK_H */
>diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
>index d841f4de33b0..4fd11bf34bc7 100644
>--- a/net/vmw_vsock/af_vsock.c
>+++ b/net/vmw_vsock/af_vsock.c
>@@ -110,6 +110,8 @@
> #include <linux/workqueue.h>
> #include <net/sock.h>
> #include <net/af_vsock.h>
>+#include <linux/errqueue.h>
>+#include <uapi/linux/vsock.h>
>
> static int __vsock_bind(struct sock *sk, struct sockaddr_vm *addr);
> static void vsock_sk_destruct(struct sock *sk);
>@@ -2137,6 +2139,10 @@ vsock_connectible_recvmsg(struct socket *sock, struct msghdr *msg, size_t len,
> int err;
>
> sk = sock->sk;
>+
>+ if (unlikely(flags & MSG_ERRQUEUE))
>+ return sock_recv_errqueue(sk, msg, len, SOL_VSOCK, VSOCK_RECVERR);
>+
> vsk = vsock_sk(sk);
> err = 0;
>
>--
>2.25.1
>

2023-09-26 21:29:56

by Stefano Garzarella

[permalink] [raw]
Subject: Re: [PATCH net-next v1 01/12] vsock: fix EPOLLERR set on non-empty error queue

On Fri, Sep 22, 2023 at 08:24:17AM +0300, Arseniy Krasnov wrote:
>If socket's error queue is not empty, EPOLLERR must be set. Otherwise,
>reader of error queue won't detect data in it using EPOLLERR bit.
>Currently for AF_VSOCK this is reproducible only with MSG_ZEROCOPY, as
>this feature is the only user of an error queue of the socket.

So this is not really a fix. I'd use a different title to avoid
confusion on backporting this on stable branches or not.

Maybe just "vsock: set EPOLLERR on non-empty error queue"

The change LGTM.

Stefano

>
>Signed-off-by: Arseniy Krasnov <[email protected]>
>---
> net/vmw_vsock/af_vsock.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
>diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
>index 013b65241b65..d841f4de33b0 100644
>--- a/net/vmw_vsock/af_vsock.c
>+++ b/net/vmw_vsock/af_vsock.c
>@@ -1030,7 +1030,7 @@ static __poll_t vsock_poll(struct file *file, struct socket *sock,
> poll_wait(file, sk_sleep(sk), wait);
> mask = 0;
>
>- if (sk->sk_err)
>+ if (sk->sk_err || !skb_queue_empty_lockless(&sk->sk_error_queue))
> /* Signify that there has been an error on this socket. */
> mask |= EPOLLERR;
>
>--
>2.25.1
>

2023-09-26 23:14:11

by Arseniy Krasnov

[permalink] [raw]
Subject: Re: [PATCH net-next v1 12/12] test/vsock: io_uring rx/tx tests



On 26.09.2023 16:04, Stefano Garzarella wrote:
> On Fri, Sep 22, 2023 at 08:24:28AM +0300, Arseniy Krasnov wrote:
>> This adds set of tests which use io_uring for rx/tx. This test suite is
>> implemented as separated util like 'vsock_test' and has the same set of
>> input arguments as 'vsock_test'. These tests only cover cases of data
>> transmission (no connect/bind/accept etc).
>>
>> Signed-off-by: Arseniy Krasnov <[email protected]>
>> ---
>> Changelog:
>> v5(big patchset) -> v1:
>>  * Use LDLIBS instead of LDFLAGS.
>>
>> tools/testing/vsock/Makefile           |   7 +-
>> tools/testing/vsock/vsock_uring_test.c | 321 +++++++++++++++++++++++++
>> 2 files changed, 327 insertions(+), 1 deletion(-)
>> create mode 100644 tools/testing/vsock/vsock_uring_test.c
>>
>> diff --git a/tools/testing/vsock/Makefile b/tools/testing/vsock/Makefile
>> index 1a26f60a596c..c84380bfc18d 100644
>> --- a/tools/testing/vsock/Makefile
>> +++ b/tools/testing/vsock/Makefile
>> @@ -1,12 +1,17 @@
>> # SPDX-License-Identifier: GPL-2.0-only
>> +ifeq ($(MAKECMDGOALS),vsock_uring_test)
>> +LDLIBS = -luring
>> +endif
>> +
>
> This will fails if for example we call make with more targets,
> e.g. `make vsock_test vsock_uring_test`.
>
> I'd suggest to use something like this:
>
> --- a/tools/testing/vsock/Makefile
> +++ b/tools/testing/vsock/Makefile
> @@ -1,13 +1,11 @@
>  # SPDX-License-Identifier: GPL-2.0-only
> -ifeq ($(MAKECMDGOALS),vsock_uring_test)
> -LDLIBS = -luring
> -endif
> -
>  all: test vsock_perf
>  test: vsock_test vsock_diag_test
>  vsock_test: vsock_test.o vsock_test_zerocopy.o timeout.o control.o util.o
>  vsock_diag_test: vsock_diag_test.o timeout.o control.o util.o
>  vsock_perf: vsock_perf.o
> +
> +vsock_uring_test: LDLIBS = -luring
>  vsock_uring_test: control.o util.o vsock_uring_test.o timeout.o
>
>  CFLAGS += -g -O2 -Werror -Wall -I. -I../../include -I../../../usr/include -Wno-pointer-sign -fno-strict-overflow -fno-strict-aliasing -fno-common -MMD -U_FORTIFY_SOURCE -D_GNU_SOURCE
>
>> all: test vsock_perf
>> test: vsock_test vsock_diag_test
>> vsock_test: vsock_test.o vsock_test_zerocopy.o timeout.o control.o util.o
>> vsock_diag_test: vsock_diag_test.o timeout.o control.o util.o
>> vsock_perf: vsock_perf.o
>> +vsock_uring_test: control.o util.o vsock_uring_test.o timeout.o
>
> Shoud we add this new test to the "test" target as well?

Ok, but in this case, this target will always depend on liburing.

Thanks, Arseniy

>
> Stefano
>

2023-09-27 00:44:57

by Stefano Garzarella

[permalink] [raw]
Subject: Re: [PATCH net-next v1 12/12] test/vsock: io_uring rx/tx tests

On Fri, Sep 22, 2023 at 08:24:28AM +0300, Arseniy Krasnov wrote:
>This adds set of tests which use io_uring for rx/tx. This test suite is
>implemented as separated util like 'vsock_test' and has the same set of
>input arguments as 'vsock_test'. These tests only cover cases of data
>transmission (no connect/bind/accept etc).
>
>Signed-off-by: Arseniy Krasnov <[email protected]>
>---
> Changelog:
> v5(big patchset) -> v1:
> * Use LDLIBS instead of LDFLAGS.
>
> tools/testing/vsock/Makefile | 7 +-
> tools/testing/vsock/vsock_uring_test.c | 321 +++++++++++++++++++++++++
> 2 files changed, 327 insertions(+), 1 deletion(-)
> create mode 100644 tools/testing/vsock/vsock_uring_test.c
>
>diff --git a/tools/testing/vsock/Makefile b/tools/testing/vsock/Makefile
>index 1a26f60a596c..c84380bfc18d 100644
>--- a/tools/testing/vsock/Makefile
>+++ b/tools/testing/vsock/Makefile
>@@ -1,12 +1,17 @@
> # SPDX-License-Identifier: GPL-2.0-only
>+ifeq ($(MAKECMDGOALS),vsock_uring_test)
>+LDLIBS = -luring
>+endif
>+

This will fails if for example we call make with more targets,
e.g. `make vsock_test vsock_uring_test`.

I'd suggest to use something like this:

--- a/tools/testing/vsock/Makefile
+++ b/tools/testing/vsock/Makefile
@@ -1,13 +1,11 @@
# SPDX-License-Identifier: GPL-2.0-only
-ifeq ($(MAKECMDGOALS),vsock_uring_test)
-LDLIBS = -luring
-endif
-
all: test vsock_perf
test: vsock_test vsock_diag_test
vsock_test: vsock_test.o vsock_test_zerocopy.o timeout.o control.o util.o
vsock_diag_test: vsock_diag_test.o timeout.o control.o util.o
vsock_perf: vsock_perf.o
+
+vsock_uring_test: LDLIBS = -luring
vsock_uring_test: control.o util.o vsock_uring_test.o timeout.o

CFLAGS += -g -O2 -Werror -Wall -I. -I../../include -I../../../usr/include -Wno-pointer-sign -fno-strict-overflow -fno-strict-aliasing -fno-common -MMD -U_FORTIFY_SOURCE -D_GNU_SOURCE

> all: test vsock_perf
> test: vsock_test vsock_diag_test
> vsock_test: vsock_test.o vsock_test_zerocopy.o timeout.o control.o util.o
> vsock_diag_test: vsock_diag_test.o timeout.o control.o util.o
> vsock_perf: vsock_perf.o
>+vsock_uring_test: control.o util.o vsock_uring_test.o timeout.o

Shoud we add this new test to the "test" target as well?

Stefano

2023-09-27 07:10:34

by Arseniy Krasnov

[permalink] [raw]
Subject: Re: [PATCH net-next v1 08/12] vsock: enable setting SO_ZEROCOPY



On 26.09.2023 15:56, Stefano Garzarella wrote:
> On Fri, Sep 22, 2023 at 08:24:24AM +0300, Arseniy Krasnov wrote:
>> For AF_VSOCK, zerocopy tx mode depends on transport, so this option must
>> be set in AF_VSOCK implementation where transport is accessible (if
>> transport is not set during setting SO_ZEROCOPY: for example socket is
>> not connected, then SO_ZEROCOPY will be enabled, but once transport will
>> be assigned, support of this type of transmission will be checked).
>>
>> To handle SO_ZEROCOPY, AF_VSOCK implementation uses SOCK_CUSTOM_SOCKOPT
>> bit, thus handling SOL_SOCKET option operations, but all of them except
>> SO_ZEROCOPY will be forwarded to the generic handler by calling
>> 'sock_setsockopt()'.
>>
>> Signed-off-by: Arseniy Krasnov <[email protected]>
>> ---
>> Changelog:
>> v5(big patchset) -> v1:
>>  * Compact 'if' conditions.
>>  * Rename 'zc_val' to 'zerocopy'.
>>  * Use 'zerocopy' value directly in 'sock_valbool_flag()', without
>>    ?: operator.
>>  * Set 'SOCK_CUSTOM_SOCKOPT' bit for connectible sockets only, as
>>    suggested by Bobby Eshleman <[email protected]>.
>>
>> net/vmw_vsock/af_vsock.c | 46 ++++++++++++++++++++++++++++++++++++++--
>> 1 file changed, 44 insertions(+), 2 deletions(-)
>>
>> diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
>> index 482300eb88e0..c05a42e02a17 100644
>> --- a/net/vmw_vsock/af_vsock.c
>> +++ b/net/vmw_vsock/af_vsock.c
>> @@ -1406,8 +1406,16 @@ static int vsock_connect(struct socket *sock, struct sockaddr *addr,
>>             goto out;
>>         }
>>
>> -        if (vsock_msgzerocopy_allow(transport))
>> +        if (vsock_msgzerocopy_allow(transport)) {
>>             set_bit(SOCK_SUPPORT_ZC, &sk->sk_socket->flags);
>> +        } else if (sock_flag(sk, SOCK_ZEROCOPY)) {
>> +            /* If this option was set before 'connect()',
>> +             * when transport was unknown, check that this
>> +             * feature is supported here.
>> +             */
>> +            err = -EOPNOTSUPP;
>> +            goto out;
>> +        }
>>
>>         err = vsock_auto_bind(vsk);
>>         if (err)
>> @@ -1643,7 +1651,7 @@ static int vsock_connectible_setsockopt(struct socket *sock,
>>     const struct vsock_transport *transport;
>>     u64 val;
>>
>> -    if (level != AF_VSOCK)
>> +    if (level != AF_VSOCK && level != SOL_SOCKET)
>>         return -ENOPROTOOPT;
>>
>> #define COPY_IN(_v)                                       \
>> @@ -1666,6 +1674,34 @@ static int vsock_connectible_setsockopt(struct socket *sock,
>>
>>     transport = vsk->transport;
>>
>> +    if (level == SOL_SOCKET) {
>> +        int zerocopy;
>> +
>> +        if (optname != SO_ZEROCOPY) {
>> +            release_sock(sk);
>> +            return sock_setsockopt(sock, level, optname, optval, optlen);
>> +        }
>> +
>> +        /* Use 'int' type here, because variable to
>> +         * set this option usually has this type.
>> +         */
>> +        COPY_IN(zerocopy);
>> +
>> +        if (zerocopy < 0 || zerocopy > 1) {
>> +            err = -EINVAL;
>> +            goto exit;
>> +        }
>> +
>> +        if (transport && !vsock_msgzerocopy_allow(transport)) {
>> +            err = -EOPNOTSUPP;
>> +            goto exit;
>> +        }
>> +
>> +        sock_valbool_flag(sk, SOCK_ZEROCOPY,
>> +                  zerocopy);
>
> it's not necessary to wrap this call.

Sorry, what do you mean ?

Thanks, Arseniy

>
>> +        goto exit;
>> +    }
>> +
>>     switch (optname) {
>>     case SO_VM_SOCKETS_BUFFER_SIZE:
>>         COPY_IN(val);
>> @@ -2322,6 +2358,12 @@ static int vsock_create(struct net *net, struct socket *sock,
>>         }
>>     }
>>
>> +    /* SOCK_DGRAM doesn't have 'setsockopt' callback set in its
>> +     * proto_ops, so there is no handler for custom logic.
>> +     */
>> +    if (sock_type_connectible(sock->type))
>> +        set_bit(SOCK_CUSTOM_SOCKOPT, &sk->sk_socket->flags);
>> +
>>     vsock_insert_unbound(vsk);
>>
>>     return 0;
>> -- 
>> 2.25.1
>>
>

2023-09-27 07:26:41

by Arseniy Krasnov

[permalink] [raw]
Subject: Re: [PATCH net-next v1 02/12] vsock: read from socket's error queue



On 26.09.2023 15:55, Stefano Garzarella wrote:
> On Fri, Sep 22, 2023 at 08:24:18AM +0300, Arseniy Krasnov wrote:
>> This adds handling of MSG_ERRQUEUE input flag in receive call. This flag
>> is used to read socket's error queue instead of data queue. Possible
>> scenario of error queue usage is receiving completions for transmission
>> with MSG_ZEROCOPY flag. This patch also adds new defines: 'SOL_VSOCK'
>> and 'VSOCK_RECVERR'.
>>
>> Signed-off-by: Arseniy Krasnov <[email protected]>
>> ---
>> Changelog:
>> v5(big patchset) -> v1:
>>  * R-b tag removed, due to added defines to 'include/uapi/linux/vsock.h'.
>>    Both 'SOL_VSOCK' and 'VSOCK_RECVERR' are needed by userspace, so
>>    they were placed to 'include/uapi/linux/vsock.h'. At the same time,
>>    the same define for 'SOL_VSOCK' was placed to 'include/linux/socket.h'.
>>    This is needed because this file contains SOL_XXX defines for different
>>    types of socket, so it prevents situation when another new SOL_XXX
>>    will use constant 287.
>>
>> include/linux/socket.h     | 1 +
>> include/uapi/linux/vsock.h | 9 +++++++++
>> net/vmw_vsock/af_vsock.c   | 6 ++++++
>> 3 files changed, 16 insertions(+)
>> create mode 100644 include/uapi/linux/vsock.h
>>
>> diff --git a/include/linux/socket.h b/include/linux/socket.h
>> index 39b74d83c7c4..cfcb7e2c3813 100644
>> --- a/include/linux/socket.h
>> +++ b/include/linux/socket.h
>> @@ -383,6 +383,7 @@ struct ucred {
>> #define SOL_MPTCP    284
>> #define SOL_MCTP    285
>> #define SOL_SMC        286
>> +#define SOL_VSOCK    287
>>
>> /* IPX options */
>> #define IPX_TYPE    1
>> diff --git a/include/uapi/linux/vsock.h b/include/uapi/linux/vsock.h
>> new file mode 100644
>> index 000000000000..b25c1347a3b8
>> --- /dev/null
>> +++ b/include/uapi/linux/vsock.h
>
> We already have include/uapi/linux/vm_sockets.h
>
> Should we include these changes there instead of creating a new header?
>
>> @@ -0,0 +1,9 @@
>> +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
>> +#ifndef _UAPI_LINUX_VSOCK_H
>> +#define _UAPI_LINUX_VSOCK_H
>> +
>> +#define SOL_VSOCK    287
>
> Why we need to re-define this also here?

Reason of this re-define is that SOL_VSOCK must be exported to userspace, so
i place it to include/uapi/XXX. At the same time include/linux/socket.h contains
constants for SOL_XXX and they goes sequentially in this file (e.g. one by one,
each new value is +1 to the previous). So if I add SOL_VSOCK to include/uapi/XXX
only, it is possible that someone will add new SOL_VERY_NEW_SOCKET == 287 to
include/linux/socket.h in future. I think it is not good that two SOL_XXX will
have same value.

For example SOL_RDS and SOL_TIPS uses the same approach - there are two same defines:
one in include/uapi/ and another is in include/linux/socket.h

>
> In that case, should we protect with some guards to avoid double
> defines?

May be:

in include/linux/socket.h

#ifndef SOL_VSOCK
#define SOL_VSOCK 287
#endif

But not sure...

>
>> +
>> +#define VSOCK_RECVERR    1
>> +
>> +#endif /* _UAPI_LINUX_VSOCK_H */
>> diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
>> index d841f4de33b0..4fd11bf34bc7 100644
>> --- a/net/vmw_vsock/af_vsock.c
>> +++ b/net/vmw_vsock/af_vsock.c
>> @@ -110,6 +110,8 @@
>> #include <linux/workqueue.h>
>> #include <net/sock.h>
>> #include <net/af_vsock.h>
>> +#include <linux/errqueue.h>
>> +#include <uapi/linux/vsock.h>
>>
>> static int __vsock_bind(struct sock *sk, struct sockaddr_vm *addr);
>> static void vsock_sk_destruct(struct sock *sk);
>> @@ -2137,6 +2139,10 @@ vsock_connectible_recvmsg(struct socket *sock, struct msghdr *msg, size_t len,
>>     int err;
>>
>>     sk = sock->sk;
>> +
>> +    if (unlikely(flags & MSG_ERRQUEUE))
>> +        return sock_recv_errqueue(sk, msg, len, SOL_VSOCK, VSOCK_RECVERR);
>> +
>>     vsk = vsock_sk(sk);
>>     err = 0;
>>
>> -- 
>> 2.25.1
>>
>

2023-09-27 09:04:14

by Stefano Garzarella

[permalink] [raw]
Subject: Re: [PATCH net-next v1 12/12] test/vsock: io_uring rx/tx tests

On Tue, Sep 26, 2023 at 11:00:19PM +0300, Arseniy Krasnov wrote:
>
>
>On 26.09.2023 16:04, Stefano Garzarella wrote:
>> On Fri, Sep 22, 2023 at 08:24:28AM +0300, Arseniy Krasnov wrote:
>>> This adds set of tests which use io_uring for rx/tx. This test suite is
>>> implemented as separated util like 'vsock_test' and has the same set of
>>> input arguments as 'vsock_test'. These tests only cover cases of data
>>> transmission (no connect/bind/accept etc).
>>>
>>> Signed-off-by: Arseniy Krasnov <[email protected]>
>>> ---
>>> Changelog:
>>> v5(big patchset) -> v1:
>>> ?* Use LDLIBS instead of LDFLAGS.
>>>
>>> tools/testing/vsock/Makefile?????????? |?? 7 +-
>>> tools/testing/vsock/vsock_uring_test.c | 321 +++++++++++++++++++++++++
>>> 2 files changed, 327 insertions(+), 1 deletion(-)
>>> create mode 100644 tools/testing/vsock/vsock_uring_test.c
>>>
>>> diff --git a/tools/testing/vsock/Makefile b/tools/testing/vsock/Makefile
>>> index 1a26f60a596c..c84380bfc18d 100644
>>> --- a/tools/testing/vsock/Makefile
>>> +++ b/tools/testing/vsock/Makefile
>>> @@ -1,12 +1,17 @@
>>> # SPDX-License-Identifier: GPL-2.0-only
>>> +ifeq ($(MAKECMDGOALS),vsock_uring_test)
>>> +LDLIBS = -luring
>>> +endif
>>> +
>>
>> This will fails if for example we call make with more targets,
>> e.g. `make vsock_test vsock_uring_test`.
>>
>> I'd suggest to use something like this:
>>
>> --- a/tools/testing/vsock/Makefile
>> +++ b/tools/testing/vsock/Makefile
>> @@ -1,13 +1,11 @@
>> ?# SPDX-License-Identifier: GPL-2.0-only
>> -ifeq ($(MAKECMDGOALS),vsock_uring_test)
>> -LDLIBS = -luring
>> -endif
>> -
>> ?all: test vsock_perf
>> ?test: vsock_test vsock_diag_test
>> ?vsock_test: vsock_test.o vsock_test_zerocopy.o timeout.o control.o util.o
>> ?vsock_diag_test: vsock_diag_test.o timeout.o control.o util.o
>> ?vsock_perf: vsock_perf.o
>> +
>> +vsock_uring_test: LDLIBS = -luring
>> ?vsock_uring_test: control.o util.o vsock_uring_test.o timeout.o
>>
>> ?CFLAGS += -g -O2 -Werror -Wall -I. -I../../include -I../../../usr/include -Wno-pointer-sign -fno-strict-overflow -fno-strict-aliasing -fno-common -MMD -U_FORTIFY_SOURCE -D_GNU_SOURCE
>>
>>> all: test vsock_perf
>>> test: vsock_test vsock_diag_test
>>> vsock_test: vsock_test.o vsock_test_zerocopy.o timeout.o control.o util.o
>>> vsock_diag_test: vsock_diag_test.o timeout.o control.o util.o
>>> vsock_perf: vsock_perf.o
>>> +vsock_uring_test: control.o util.o vsock_uring_test.o timeout.o
>>
>> Shoud we add this new test to the "test" target as well?
>
>Ok, but in this case, this target will always depend on liburing.

I think it's fine.

If they want to run all the tests, they need liburing. If they don't
want to build io_uring tests, they can just do `make vsock_test`.

Stefano

2023-09-27 10:48:50

by Stefano Garzarella

[permalink] [raw]
Subject: Re: [PATCH net-next v1 08/12] vsock: enable setting SO_ZEROCOPY

On Tue, Sep 26, 2023 at 10:38:06PM +0300, Arseniy Krasnov wrote:
>
>
>On 26.09.2023 15:56, Stefano Garzarella wrote:
>> On Fri, Sep 22, 2023 at 08:24:24AM +0300, Arseniy Krasnov wrote:
>>> For AF_VSOCK, zerocopy tx mode depends on transport, so this option must
>>> be set in AF_VSOCK implementation where transport is accessible (if
>>> transport is not set during setting SO_ZEROCOPY: for example socket is
>>> not connected, then SO_ZEROCOPY will be enabled, but once transport will
>>> be assigned, support of this type of transmission will be checked).
>>>
>>> To handle SO_ZEROCOPY, AF_VSOCK implementation uses SOCK_CUSTOM_SOCKOPT
>>> bit, thus handling SOL_SOCKET option operations, but all of them except
>>> SO_ZEROCOPY will be forwarded to the generic handler by calling
>>> 'sock_setsockopt()'.
>>>
>>> Signed-off-by: Arseniy Krasnov <[email protected]>
>>> ---
>>> Changelog:
>>> v5(big patchset) -> v1:
>>> ?* Compact 'if' conditions.
>>> ?* Rename 'zc_val' to 'zerocopy'.
>>> ?* Use 'zerocopy' value directly in 'sock_valbool_flag()', without
>>> ?? ?: operator.
>>> ?* Set 'SOCK_CUSTOM_SOCKOPT' bit for connectible sockets only, as
>>> ?? suggested by Bobby Eshleman <[email protected]>.
>>>
>>> net/vmw_vsock/af_vsock.c | 46 ++++++++++++++++++++++++++++++++++++++--
>>> 1 file changed, 44 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
>>> index 482300eb88e0..c05a42e02a17 100644
>>> --- a/net/vmw_vsock/af_vsock.c
>>> +++ b/net/vmw_vsock/af_vsock.c
>>> @@ -1406,8 +1406,16 @@ static int vsock_connect(struct socket *sock, struct sockaddr *addr,
>>> ??????????? goto out;
>>> ??????? }
>>>
>>> -??????? if (vsock_msgzerocopy_allow(transport))
>>> +??????? if (vsock_msgzerocopy_allow(transport)) {
>>> ??????????? set_bit(SOCK_SUPPORT_ZC, &sk->sk_socket->flags);
>>> +??????? } else if (sock_flag(sk, SOCK_ZEROCOPY)) {
>>> +??????????? /* If this option was set before 'connect()',
>>> +???????????? * when transport was unknown, check that this
>>> +???????????? * feature is supported here.
>>> +???????????? */
>>> +??????????? err = -EOPNOTSUPP;
>>> +??????????? goto out;
>>> +??????? }
>>>
>>> ??????? err = vsock_auto_bind(vsk);
>>> ??????? if (err)
>>> @@ -1643,7 +1651,7 @@ static int vsock_connectible_setsockopt(struct socket *sock,
>>> ????const struct vsock_transport *transport;
>>> ????u64 val;
>>>
>>> -??? if (level != AF_VSOCK)
>>> +??? if (level != AF_VSOCK && level != SOL_SOCKET)
>>> ??????? return -ENOPROTOOPT;
>>>
>>> #define COPY_IN(_v)?????????????????????????????????????? \
>>> @@ -1666,6 +1674,34 @@ static int vsock_connectible_setsockopt(struct socket *sock,
>>>
>>> ????transport = vsk->transport;
>>>
>>> +??? if (level == SOL_SOCKET) {
>>> +??????? int zerocopy;
>>> +
>>> +??????? if (optname != SO_ZEROCOPY) {
>>> +??????????? release_sock(sk);
>>> +??????????? return sock_setsockopt(sock, level, optname, optval, optlen);
>>> +??????? }
>>> +
>>> +??????? /* Use 'int' type here, because variable to
>>> +???????? * set this option usually has this type.
>>> +???????? */
>>> +??????? COPY_IN(zerocopy);
>>> +
>>> +??????? if (zerocopy < 0 || zerocopy > 1) {
>>> +??????????? err = -EINVAL;
>>> +??????????? goto exit;
>>> +??????? }
>>> +
>>> +??????? if (transport && !vsock_msgzerocopy_allow(transport)) {
>>> +??????????? err = -EOPNOTSUPP;
>>> +??????????? goto exit;
>>> +??????? }
>>> +
>>> +??????? sock_valbool_flag(sk, SOCK_ZEROCOPY,
>>> +????????????????? zerocopy);
>>
>> it's not necessary to wrap this call.
>
>Sorry, what do you mean ?

I mean that can be on the same line:

sock_valbool_flag(sk, SOCK_ZEROCOPY, zerocopy);

Stefano

2023-09-27 12:33:21

by Arseniy Krasnov

[permalink] [raw]
Subject: Re: [PATCH net-next v1 00/12] vsock/virtio: continue MSG_ZEROCOPY support



On 26.09.2023 16:10, Stefano Garzarella wrote:
> Hi Arseniy,
>
> On Fri, Sep 22, 2023 at 08:24:16AM +0300, Arseniy Krasnov wrote:
>> Hello,
>>
>> this patchset contains second and third parts of another big patchset
>> for MSG_ZEROCOPY flag support:
>> https://lore.kernel.org/netdev/[email protected]/
>>
>> During review of this series, Stefano Garzarella <[email protected]>
>> suggested to split it for three parts to simplify review and merging:
>>
>> 1) virtio and vhost updates (for fragged skbs) (merged to net-next, see
>>   link below)
>> 2) AF_VSOCK updates (allows to enable MSG_ZEROCOPY mode and read
>>   tx completions) and update for Documentation/. <-- this patchset
>> 3) Updates for tests and utils. <-- this patchset
>>
>> Part 1) was merged:
>> https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=71b263e79370348349553ecdf46f4a69eb436dc7
>>
>> Head for this patchset is:
>> https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=71b263e79370348349553ecdf46f4a69eb436dc7
>
> Thanks for the series.
> I did a quick review highlighting some things that need to be changed.
>
> Overall, the series seems to be in good shape. The tests went well.
>
> In the next few days I'll see if I can get a better look at the larger patches like the tests, or I'll check in the next version.

Hello Stefano,

Thanks for review, almost all comments are clear to me, I'll fix them.

Thanks, Arseniy

>
> Thanks,
> Stefano
>

2023-09-27 17:27:21

by Stefano Garzarella

[permalink] [raw]
Subject: Re: [PATCH net-next v1 02/12] vsock: read from socket's error queue

On Tue, Sep 26, 2023 at 10:36:58PM +0300, Arseniy Krasnov wrote:
>
>
>On 26.09.2023 15:55, Stefano Garzarella wrote:
>> On Fri, Sep 22, 2023 at 08:24:18AM +0300, Arseniy Krasnov wrote:
>>> This adds handling of MSG_ERRQUEUE input flag in receive call. This flag
>>> is used to read socket's error queue instead of data queue. Possible
>>> scenario of error queue usage is receiving completions for transmission
>>> with MSG_ZEROCOPY flag. This patch also adds new defines: 'SOL_VSOCK'
>>> and 'VSOCK_RECVERR'.
>>>
>>> Signed-off-by: Arseniy Krasnov <[email protected]>
>>> ---
>>> Changelog:
>>> v5(big patchset) -> v1:
>>> ?* R-b tag removed, due to added defines to 'include/uapi/linux/vsock.h'.
>>> ?? Both 'SOL_VSOCK' and 'VSOCK_RECVERR' are needed by userspace, so
>>> ?? they were placed to 'include/uapi/linux/vsock.h'. At the same time,
>>> ?? the same define for 'SOL_VSOCK' was placed to 'include/linux/socket.h'.
>>> ?? This is needed because this file contains SOL_XXX defines for different
>>> ?? types of socket, so it prevents situation when another new SOL_XXX
>>> ?? will use constant 287.
>>>
>>> include/linux/socket.h???? | 1 +
>>> include/uapi/linux/vsock.h | 9 +++++++++
>>> net/vmw_vsock/af_vsock.c?? | 6 ++++++
>>> 3 files changed, 16 insertions(+)
>>> create mode 100644 include/uapi/linux/vsock.h
>>>
>>> diff --git a/include/linux/socket.h b/include/linux/socket.h
>>> index 39b74d83c7c4..cfcb7e2c3813 100644
>>> --- a/include/linux/socket.h
>>> +++ b/include/linux/socket.h
>>> @@ -383,6 +383,7 @@ struct ucred {
>>> #define SOL_MPTCP??? 284
>>> #define SOL_MCTP??? 285
>>> #define SOL_SMC??????? 286
>>> +#define SOL_VSOCK??? 287
>>>
>>> /* IPX options */
>>> #define IPX_TYPE??? 1
>>> diff --git a/include/uapi/linux/vsock.h b/include/uapi/linux/vsock.h
>>> new file mode 100644
>>> index 000000000000..b25c1347a3b8
>>> --- /dev/null
>>> +++ b/include/uapi/linux/vsock.h
>>
>> We already have include/uapi/linux/vm_sockets.h
>>
>> Should we include these changes there instead of creating a new header?
>>
>>> @@ -0,0 +1,9 @@
>>> +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
>>> +#ifndef _UAPI_LINUX_VSOCK_H
>>> +#define _UAPI_LINUX_VSOCK_H
>>> +
>>> +#define SOL_VSOCK??? 287
>>
>> Why we need to re-define this also here?
>
>Reason of this re-define is that SOL_VSOCK must be exported to userspace, so
>i place it to include/uapi/XXX. At the same time include/linux/socket.h contains
>constants for SOL_XXX and they goes sequentially in this file (e.g. one by one,
>each new value is +1 to the previous). So if I add SOL_VSOCK to include/uapi/XXX
>only, it is possible that someone will add new SOL_VERY_NEW_SOCKET == 287 to
>include/linux/socket.h in future. I think it is not good that two SOL_XXX will
>have same value.
>
>For example SOL_RDS and SOL_TIPS uses the same approach - there are two same defines:
>one in include/uapi/ and another is in include/linux/socket.h

Okay, I was confused, I though socket.h was the uapi one.
If others do the same, it's fine.

But why adding a new vsock.h instead of reusing vm_sockets.h?

>
>>
>> In that case, should we protect with some guards to avoid double
>> defines?
>
>May be:
>
>in include/linux/socket.h
>
>#ifndef SOL_VSOCK
>#define SOL_VSOCK 287
>#endif
>
>But not sure...

Nope, let's follow others definition.

Sorry for the confusion ;-)

>
>>
>>> +
>>> +#define VSOCK_RECVERR??? 1
>>> +
>>> +#endif /* _UAPI_LINUX_VSOCK_H */
>>> diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
>>> index d841f4de33b0..4fd11bf34bc7 100644
>>> --- a/net/vmw_vsock/af_vsock.c
>>> +++ b/net/vmw_vsock/af_vsock.c
>>> @@ -110,6 +110,8 @@
>>> #include <linux/workqueue.h>
>>> #include <net/sock.h>
>>> #include <net/af_vsock.h>
>>> +#include <linux/errqueue.h>
>>> +#include <uapi/linux/vsock.h>
>>>
>>> static int __vsock_bind(struct sock *sk, struct sockaddr_vm *addr);
>>> static void vsock_sk_destruct(struct sock *sk);
>>> @@ -2137,6 +2139,10 @@ vsock_connectible_recvmsg(struct socket *sock, struct msghdr *msg, size_t len,
>>> ????int err;
>>>
>>> ????sk = sock->sk;
>>> +
>>> +??? if (unlikely(flags & MSG_ERRQUEUE))
>>> +??????? return sock_recv_errqueue(sk, msg, len, SOL_VSOCK,
>>> VSOCK_RECVERR);
>>> +
>>> ????vsk = vsock_sk(sk);
>>> ????err = 0;
>>>
>>> --?
>>> 2.25.1
>>>
>>
>

2023-09-28 11:25:25

by Arseniy Krasnov

[permalink] [raw]
Subject: Re: [PATCH net-next v1 02/12] vsock: read from socket's error queue



On 27.09.2023 10:34, Stefano Garzarella wrote:
> On Tue, Sep 26, 2023 at 10:36:58PM +0300, Arseniy Krasnov wrote:
>>
>>
>> On 26.09.2023 15:55, Stefano Garzarella wrote:
>>> On Fri, Sep 22, 2023 at 08:24:18AM +0300, Arseniy Krasnov wrote:
>>>> This adds handling of MSG_ERRQUEUE input flag in receive call. This flag
>>>> is used to read socket's error queue instead of data queue. Possible
>>>> scenario of error queue usage is receiving completions for transmission
>>>> with MSG_ZEROCOPY flag. This patch also adds new defines: 'SOL_VSOCK'
>>>> and 'VSOCK_RECVERR'.
>>>>
>>>> Signed-off-by: Arseniy Krasnov <[email protected]>
>>>> ---
>>>> Changelog:
>>>> v5(big patchset) -> v1:
>>>>  * R-b tag removed, due to added defines to 'include/uapi/linux/vsock.h'.
>>>>    Both 'SOL_VSOCK' and 'VSOCK_RECVERR' are needed by userspace, so
>>>>    they were placed to 'include/uapi/linux/vsock.h'. At the same time,
>>>>    the same define for 'SOL_VSOCK' was placed to 'include/linux/socket.h'.
>>>>    This is needed because this file contains SOL_XXX defines for different
>>>>    types of socket, so it prevents situation when another new SOL_XXX
>>>>    will use constant 287.
>>>>
>>>> include/linux/socket.h     | 1 +
>>>> include/uapi/linux/vsock.h | 9 +++++++++
>>>> net/vmw_vsock/af_vsock.c   | 6 ++++++
>>>> 3 files changed, 16 insertions(+)
>>>> create mode 100644 include/uapi/linux/vsock.h
>>>>
>>>> diff --git a/include/linux/socket.h b/include/linux/socket.h
>>>> index 39b74d83c7c4..cfcb7e2c3813 100644
>>>> --- a/include/linux/socket.h
>>>> +++ b/include/linux/socket.h
>>>> @@ -383,6 +383,7 @@ struct ucred {
>>>> #define SOL_MPTCP    284
>>>> #define SOL_MCTP    285
>>>> #define SOL_SMC        286
>>>> +#define SOL_VSOCK    287
>>>>
>>>> /* IPX options */
>>>> #define IPX_TYPE    1
>>>> diff --git a/include/uapi/linux/vsock.h b/include/uapi/linux/vsock.h
>>>> new file mode 100644
>>>> index 000000000000..b25c1347a3b8
>>>> --- /dev/null
>>>> +++ b/include/uapi/linux/vsock.h
>>>
>>> We already have include/uapi/linux/vm_sockets.h
>>>
>>> Should we include these changes there instead of creating a new header?
>>>
>>>> @@ -0,0 +1,9 @@
>>>> +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
>>>> +#ifndef _UAPI_LINUX_VSOCK_H
>>>> +#define _UAPI_LINUX_VSOCK_H
>>>> +
>>>> +#define SOL_VSOCK    287
>>>
>>> Why we need to re-define this also here?
>>
>> Reason of this re-define is that SOL_VSOCK must be exported to userspace, so
>> i place it to include/uapi/XXX. At the same time include/linux/socket.h contains
>> constants for SOL_XXX and they goes sequentially in this file (e.g. one by one,
>> each new value is +1 to the previous). So if I add SOL_VSOCK to include/uapi/XXX
>> only, it is possible that someone will add new SOL_VERY_NEW_SOCKET == 287 to
>> include/linux/socket.h in future. I think it is not good that two SOL_XXX will
>> have same value.
>>
>> For example SOL_RDS and SOL_TIPS uses the same approach - there are two same defines:
>> one in include/uapi/ and another is in include/linux/socket.h
>
> Okay, I was confused, I though socket.h was the uapi one.
> If others do the same, it's fine.
>
> But why adding a new vsock.h instead of reusing vm_sockets.h?

Yes, that's my mistake, I'll use vm_sockets.h. Seems I just forget about
vm_sockets.h...

>
>>
>>>
>>> In that case, should we protect with some guards to avoid double
>>> defines?
>>
>> May be:
>>
>> in include/linux/socket.h
>>
>> #ifndef SOL_VSOCK
>> #define SOL_VSOCK 287
>> #endif
>>
>> But not sure...
>
> Nope, let's follow others definition.
>
> Sorry for the confusion ;-)

No problem!

Thanks, Arseniy

>
>>
>>>
>>>> +
>>>> +#define VSOCK_RECVERR    1
>>>> +
>>>> +#endif /* _UAPI_LINUX_VSOCK_H */
>>>> diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
>>>> index d841f4de33b0..4fd11bf34bc7 100644
>>>> --- a/net/vmw_vsock/af_vsock.c
>>>> +++ b/net/vmw_vsock/af_vsock.c
>>>> @@ -110,6 +110,8 @@
>>>> #include <linux/workqueue.h>
>>>> #include <net/sock.h>
>>>> #include <net/af_vsock.h>
>>>> +#include <linux/errqueue.h>
>>>> +#include <uapi/linux/vsock.h>
>>>>
>>>> static int __vsock_bind(struct sock *sk, struct sockaddr_vm *addr);
>>>> static void vsock_sk_destruct(struct sock *sk);
>>>> @@ -2137,6 +2139,10 @@ vsock_connectible_recvmsg(struct socket *sock, struct msghdr *msg, size_t len,
>>>>     int err;
>>>>
>>>>     sk = sock->sk;
>>>> +
>>>> +    if (unlikely(flags & MSG_ERRQUEUE))
>>>> +        return sock_recv_errqueue(sk, msg, len, SOL_VSOCK, VSOCK_RECVERR);
>>>> +
>>>>     vsk = vsock_sk(sk);
>>>>     err = 0;
>>>>
>>>> -- 
>>>> 2.25.1
>>>>
>>>
>>
>