2023-09-14 01:26:51

by Breno Leitao

Subject: [PATCH v6 0/8] io_uring: Initial support for {s,g}etsockopt commands

This patchset adds support for getsockopt (SOCKET_URING_OP_GETSOCKOPT)
and setsockopt (SOCKET_URING_OP_SETSOCKOPT) in io_uring commands.
SOCKET_URING_OP_SETSOCKOPT and SOCKET_URING_OP_GETSOCKOPT implement the
generic case, covering all levels and optnames.

In order to keep the implementation (and tests) simple, some
refactoring was done prior to the changes, as follows:

Patches 1-2: Move the core {s,g}etsockopt() logic out of
__sys_{g,s}etsockopt(), so the code can be reused by other callers,
such as io_uring.

Patch 3: Pass compat mode to the file/socket callbacks

Patch 4: Move the io_uring helpers from io_uring_zerocopy_tx to a
generic io_uring header. This simplifies the test case (last patch).

Patch 5: Protect io_uring_cmd_sock() from being called if CONFIG_NET
is disabled.

PS: The userspace pointers need to remain valid until the operation
completes.
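
As an illustration, here is a minimal userspace sketch of issuing the
new setsockopt command (not part of the series; it assumes the
mini_liburing helpers from patch 4, the SOCKET_URING_OP_SETSOCKOPT
opcode from patch 7, an already initialized ring, and a connected
sockfd):

	int val = 1; /* must stay valid until the CQE is reaped */
	struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
	struct io_uring_cqe *cqe;

	io_uring_prep_cmd(sqe, SOCKET_URING_OP_SETSOCKOPT, sockfd,
			  SOL_SOCKET, SO_REUSEADDR, &val, sizeof(val));
	io_uring_submit(&ring);
	io_uring_wait_cqe(&ring, &cqe);
	/* cqe->res is 0 on success, or a negative errno */
	io_uring_cqe_seen(&ring);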

These changes were tested with a new test[1] in liburing, with the LTP
sockopt* tests, and with the bpf/prog_tests/sockopt test case, which is
now adapted to run using both system calls and io_uring commands.

[1] Link: https://github.com/leitao/liburing/blob/getsock/test/socket-getsetsock-cmd.c

RFC -> V1:
* Copy user memory in the io_uring subsystem, and call the proto_ops
callbacks using kernel memory
* Implement all the cases for SOCKET_URING_OP_SETSOCKOPT

V1 -> V2:
* Implemented the BPF part
* Use user pointers from optval to avoid kmalloc in the io_uring part.

V2 -> V3:
* Break down __sys_setsockopt and reuse the core code, avoiding
duplicated code. This removed the requirement to expose
sock_use_custom_sol_socket().
* Added io_uring test to selftests/bpf/sockopt.
* Fixed compat argument, by passing it to the issue_flags.

V3 -> V4:
* Rebased on top of commit 1ded5e5a5931b ("net: annotate data-races around sock->ops")
* Also broke down __sys_getsockopt() so the core function can be
reused from io_uring.
* Added a new patch to return -EOPNOTSUPP if CONFIG_NET is
disabled.
* Added two SOL_SOCKET tests in bpf/prog_tests/sockopt.

V4 -> V5:
* Do not use sockptr anymore: the getsockopt optlen argument is now a
user pointer (instead of a kernel pointer). This change also drops
the limitation on getsockopt from previous versions, so all levels
are now supported.
* Simplified the BPF sockopt test, since there is no longer a
limitation on the io_uring commands.
* No more changes in the BPF subsystem.
* Moved the optlen field in the SQE struct; it is now a pointer
instead of a u32.

V5 -> V6:
* Removed the need for #ifdef CONFIG_NET, as suggested by Gabriel
Krisman.
* Changed the variable declaration order to respect reverse xmas tree
order, as suggested by Paolo Abeni.

Breno Leitao (8):
net/socket: Break down __sys_setsockopt
net/socket: Break down __sys_getsockopt
io_uring/cmd: Pass compat mode in issue_flags
selftests/net: Extract uring helpers to be reusable
io_uring/cmd: return -EOPNOTSUPP if net is disabled
io_uring/cmd: Introduce SOCKET_URING_OP_GETSOCKOPT
io_uring/cmd: Introduce SOCKET_URING_OP_SETSOCKOPT
selftests/bpf/sockopt: Add io_uring support

include/linux/io_uring.h | 1 +
include/net/sock.h | 5 +
include/uapi/linux/io_uring.h | 10 +
io_uring/uring_cmd.c | 35 +++
net/socket.c | 86 ++++--
tools/include/io_uring/mini_liburing.h | 292 ++++++++++++++++++
.../selftests/bpf/prog_tests/sockopt.c | 95 +++++-
tools/testing/selftests/net/Makefile | 1 +
.../selftests/net/io_uring_zerocopy_tx.c | 268 +---------------
9 files changed, 490 insertions(+), 303 deletions(-)
create mode 100644 tools/include/io_uring/mini_liburing.h

--
2.34.1


2023-09-14 01:26:57

by Breno Leitao

Subject: [PATCH v6 2/8] net/socket: Break down __sys_getsockopt

Split __sys_getsockopt() into two functions by moving the core logic
into a sub-function (do_sock_getsockopt()). This avoids code
duplication when other callers, such as io_uring, need to execute the
same operation.

do_sock_getsockopt() will be called by the io_uring getsockopt command
operation in a following patch.
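
For illustration (not part of this patch), a caller that already holds
a struct socket can invoke the core helper directly:

	/* Hypothetical caller in syscall context; optval and optlen are
	 * user pointers that must stay valid for the duration of the call.
	 */
	err = do_sock_getsockopt(sock, in_compat_syscall(), level,
				 optname, optval, optlen);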

Suggested-by: Martin KaFai Lau <[email protected]>
Signed-off-by: Breno Leitao <[email protected]>
---
include/net/sock.h | 3 +++
net/socket.c | 48 +++++++++++++++++++++++++++++-----------------
2 files changed, 33 insertions(+), 18 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index aa8fb54ad0af..fbd568a43d28 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1863,6 +1863,9 @@ int sock_setsockopt(struct socket *sock, int level, int op,
sockptr_t optval, unsigned int optlen);
int do_sock_setsockopt(struct socket *sock, bool compat, int level,
int optname, char __user *user_optval, int optlen);
+int do_sock_getsockopt(struct socket *sock, bool compat, int level,
+ int optname, char __user *user_optval,
+ int __user *user_optlen);

int sk_getsockopt(struct sock *sk, int level, int optname,
sockptr_t optval, sockptr_t optlen);
diff --git a/net/socket.c b/net/socket.c
index 360332e098d4..fb943602186e 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -2333,28 +2333,17 @@ SYSCALL_DEFINE5(setsockopt, int, fd, int, level, int, optname,
INDIRECT_CALLABLE_DECLARE(bool tcp_bpf_bypass_getsockopt(int level,
int optname));

-/*
- * Get a socket option. Because we don't know the option lengths we have
- * to pass a user mode parameter for the protocols to sort out.
- */
-int __sys_getsockopt(int fd, int level, int optname, char __user *optval,
- int __user *optlen)
+int do_sock_getsockopt(struct socket *sock, bool compat, int level,
+ int optname, char __user *optval,
+ int __user *optlen)
{
int max_optlen __maybe_unused;
const struct proto_ops *ops;
- int err, fput_needed;
- struct socket *sock;
-
- sock = sockfd_lookup_light(fd, &err, &fput_needed);
- if (!sock)
- return err;
+ int err;

err = security_socket_getsockopt(sock, level, optname);
if (err)
- goto out_put;
-
- if (!in_compat_syscall())
- max_optlen = BPF_CGROUP_GETSOCKOPT_MAX_OPTLEN(optlen);
+ return err;

ops = READ_ONCE(sock->ops);
if (level == SOL_SOCKET)
@@ -2365,11 +2354,34 @@ int __sys_getsockopt(int fd, int level, int optname, char __user *optval,
err = ops->getsockopt(sock, level, optname, optval,
optlen);

- if (!in_compat_syscall())
+ if (!compat) {
+ max_optlen = BPF_CGROUP_GETSOCKOPT_MAX_OPTLEN(optlen);
err = BPF_CGROUP_RUN_PROG_GETSOCKOPT(sock->sk, level, optname,
optval, optlen, max_optlen,
err);
-out_put:
+ }
+
+ return err;
+}
+EXPORT_SYMBOL(do_sock_getsockopt);
+
+/* Get a socket option. Because we don't know the option lengths we have
+ * to pass a user mode parameter for the protocols to sort out.
+ */
+int __sys_getsockopt(int fd, int level, int optname, char __user *optval,
+ int __user *optlen)
+{
+ bool compat = in_compat_syscall();
+ int err, fput_needed;
+ struct socket *sock;
+
+ sock = sockfd_lookup_light(fd, &err, &fput_needed);
+ if (!sock)
+ return err;
+
+ err = do_sock_getsockopt(sock, compat, level, optname, optval,
+ optlen);
+
fput_light(sock->file, fput_needed);
return err;
}
--
2.34.1

2023-09-14 01:27:33

by Breno Leitao

Subject: [PATCH v6 4/8] selftests/net: Extract uring helpers to be reusable

Instead of defining basic io_uring functions in the test case, move
them to a common directory, so other tests can use them.

This simplifies the test code and reuses the common liburing
infrastructure. It is basically a copy of what we have in
io_uring_zerocopy_tx, with some minor improvements to make checkpatch
happy.

A follow-up test will use the same helpers in a BPF sockopt test.
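
For reference, a test built on top of these helpers boils down to a few
calls. A sketch (sockfd, buf and len are assumed to be set up by the
test):

	struct io_uring ring;
	struct io_uring_sqe *sqe;
	struct io_uring_cqe *cqe;

	if (io_uring_queue_init(32, &ring, 0) < 0)
		return -1;

	sqe = io_uring_get_sqe(&ring);
	io_uring_prep_send(sqe, sockfd, buf, len, 0);
	io_uring_submit(&ring);
	io_uring_wait_cqe(&ring, &cqe);
	/* cqe->res carries the send(2)-style return value */
	io_uring_cqe_seen(&ring);
	io_uring_queue_exit(&ring);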

Signed-off-by: Breno Leitao <[email protected]>
---
tools/include/io_uring/mini_liburing.h | 292 ++++++++++++++++++
tools/testing/selftests/net/Makefile | 1 +
.../selftests/net/io_uring_zerocopy_tx.c | 268 +---------------
3 files changed, 295 insertions(+), 266 deletions(-)
create mode 100644 tools/include/io_uring/mini_liburing.h

diff --git a/tools/include/io_uring/mini_liburing.h b/tools/include/io_uring/mini_liburing.h
new file mode 100644
index 000000000000..e0e1e76def25
--- /dev/null
+++ b/tools/include/io_uring/mini_liburing.h
@@ -0,0 +1,292 @@
+/* SPDX-License-Identifier: MIT */
+
+#include <linux/io_uring.h>
+#include <sys/mman.h>
+#include <sys/syscall.h>
+#include <stdio.h>
+#include <string.h>
+#include <unistd.h>
+
+struct io_sq_ring {
+ unsigned int *head;
+ unsigned int *tail;
+ unsigned int *ring_mask;
+ unsigned int *ring_entries;
+ unsigned int *flags;
+ unsigned int *array;
+};
+
+struct io_cq_ring {
+ unsigned int *head;
+ unsigned int *tail;
+ unsigned int *ring_mask;
+ unsigned int *ring_entries;
+ struct io_uring_cqe *cqes;
+};
+
+struct io_uring_sq {
+ unsigned int *khead;
+ unsigned int *ktail;
+ unsigned int *kring_mask;
+ unsigned int *kring_entries;
+ unsigned int *kflags;
+ unsigned int *kdropped;
+ unsigned int *array;
+ struct io_uring_sqe *sqes;
+
+ unsigned int sqe_head;
+ unsigned int sqe_tail;
+
+ size_t ring_sz;
+};
+
+struct io_uring_cq {
+ unsigned int *khead;
+ unsigned int *ktail;
+ unsigned int *kring_mask;
+ unsigned int *kring_entries;
+ unsigned int *koverflow;
+ struct io_uring_cqe *cqes;
+
+ size_t ring_sz;
+};
+
+struct io_uring {
+ struct io_uring_sq sq;
+ struct io_uring_cq cq;
+ int ring_fd;
+};
+
+#if defined(__x86_64) || defined(__i386__)
+#define read_barrier() __asm__ __volatile__("":::"memory")
+#define write_barrier() __asm__ __volatile__("":::"memory")
+#else
+#define read_barrier() __sync_synchronize()
+#define write_barrier() __sync_synchronize()
+#endif
+
+static inline int io_uring_mmap(int fd, struct io_uring_params *p,
+ struct io_uring_sq *sq, struct io_uring_cq *cq)
+{
+ size_t size;
+ void *ptr;
+ int ret;
+
+ sq->ring_sz = p->sq_off.array + p->sq_entries * sizeof(unsigned int);
+ ptr = mmap(0, sq->ring_sz, PROT_READ | PROT_WRITE,
+ MAP_SHARED | MAP_POPULATE, fd, IORING_OFF_SQ_RING);
+ if (ptr == MAP_FAILED)
+ return -errno;
+ sq->khead = ptr + p->sq_off.head;
+ sq->ktail = ptr + p->sq_off.tail;
+ sq->kring_mask = ptr + p->sq_off.ring_mask;
+ sq->kring_entries = ptr + p->sq_off.ring_entries;
+ sq->kflags = ptr + p->sq_off.flags;
+ sq->kdropped = ptr + p->sq_off.dropped;
+ sq->array = ptr + p->sq_off.array;
+
+ size = p->sq_entries * sizeof(struct io_uring_sqe);
+ sq->sqes = mmap(0, size, PROT_READ | PROT_WRITE,
+ MAP_SHARED | MAP_POPULATE, fd, IORING_OFF_SQES);
+ if (sq->sqes == MAP_FAILED) {
+ ret = -errno;
+err:
+ munmap(sq->khead, sq->ring_sz);
+ return ret;
+ }
+
+ cq->ring_sz = p->cq_off.cqes + p->cq_entries * sizeof(struct io_uring_cqe);
+ ptr = mmap(0, cq->ring_sz, PROT_READ | PROT_WRITE,
+ MAP_SHARED | MAP_POPULATE, fd, IORING_OFF_CQ_RING);
+ if (ptr == MAP_FAILED) {
+ ret = -errno;
+ munmap(sq->sqes, p->sq_entries * sizeof(struct io_uring_sqe));
+ goto err;
+ }
+ cq->khead = ptr + p->cq_off.head;
+ cq->ktail = ptr + p->cq_off.tail;
+ cq->kring_mask = ptr + p->cq_off.ring_mask;
+ cq->kring_entries = ptr + p->cq_off.ring_entries;
+ cq->koverflow = ptr + p->cq_off.overflow;
+ cq->cqes = ptr + p->cq_off.cqes;
+ return 0;
+}
+
+static inline int io_uring_setup(unsigned int entries,
+ struct io_uring_params *p)
+{
+ return syscall(__NR_io_uring_setup, entries, p);
+}
+
+static inline int io_uring_enter(int fd, unsigned int to_submit,
+ unsigned int min_complete,
+ unsigned int flags, sigset_t *sig)
+{
+ return syscall(__NR_io_uring_enter, fd, to_submit, min_complete,
+ flags, sig, _NSIG / 8);
+}
+
+static inline int io_uring_queue_init(unsigned int entries,
+ struct io_uring *ring,
+ unsigned int flags)
+{
+ struct io_uring_params p;
+ int fd, ret;
+
+ memset(ring, 0, sizeof(*ring));
+ memset(&p, 0, sizeof(p));
+ p.flags = flags;
+
+ fd = io_uring_setup(entries, &p);
+ if (fd < 0)
+ return fd;
+ ret = io_uring_mmap(fd, &p, &ring->sq, &ring->cq);
+ if (!ret)
+ ring->ring_fd = fd;
+ else
+ close(fd);
+ return ret;
+}
+
+/* Get a sqe */
+static inline struct io_uring_sqe *io_uring_get_sqe(struct io_uring *ring)
+{
+ struct io_uring_sq *sq = &ring->sq;
+
+ if (sq->sqe_tail + 1 - sq->sqe_head > *sq->kring_entries)
+ return NULL;
+ return &sq->sqes[sq->sqe_tail++ & *sq->kring_mask];
+}
+
+static inline int io_uring_wait_cqe(struct io_uring *ring,
+ struct io_uring_cqe **cqe_ptr)
+{
+ struct io_uring_cq *cq = &ring->cq;
+ const unsigned int mask = *cq->kring_mask;
+ unsigned int head = *cq->khead;
+ int ret;
+
+ *cqe_ptr = NULL;
+ do {
+ read_barrier();
+ if (head != *cq->ktail) {
+ *cqe_ptr = &cq->cqes[head & mask];
+ break;
+ }
+ ret = io_uring_enter(ring->ring_fd, 0, 1,
+ IORING_ENTER_GETEVENTS, NULL);
+ if (ret < 0)
+ return -errno;
+ } while (1);
+
+ return 0;
+}
+
+static inline int io_uring_submit(struct io_uring *ring)
+{
+ struct io_uring_sq *sq = &ring->sq;
+ const unsigned int mask = *sq->kring_mask;
+ unsigned int ktail, submitted, to_submit;
+ int ret;
+
+ read_barrier();
+ if (*sq->khead != *sq->ktail) {
+ submitted = *sq->kring_entries;
+ goto submit;
+ }
+ if (sq->sqe_head == sq->sqe_tail)
+ return 0;
+
+ ktail = *sq->ktail;
+ to_submit = sq->sqe_tail - sq->sqe_head;
+ for (submitted = 0; submitted < to_submit; submitted++) {
+ read_barrier();
+ sq->array[ktail++ & mask] = sq->sqe_head++ & mask;
+ }
+ if (!submitted)
+ return 0;
+
+ if (*sq->ktail != ktail) {
+ write_barrier();
+ *sq->ktail = ktail;
+ write_barrier();
+ }
+submit:
+ ret = io_uring_enter(ring->ring_fd, submitted, 0,
+ IORING_ENTER_GETEVENTS, NULL);
+ return ret < 0 ? -errno : ret;
+}
+
+static inline void io_uring_queue_exit(struct io_uring *ring)
+{
+ struct io_uring_sq *sq = &ring->sq;
+
+ munmap(sq->sqes, *sq->kring_entries * sizeof(struct io_uring_sqe));
+ munmap(sq->khead, sq->ring_sz);
+ close(ring->ring_fd);
+}
+
+/* Prepare and send the SQE */
+static inline void io_uring_prep_cmd(struct io_uring_sqe *sqe, int op,
+ int sockfd,
+ int level, int optname,
+ const void *optval,
+ const socklen_t optlen)
+{
+ memset(sqe, 0, sizeof(*sqe));
+ sqe->opcode = (__u8)IORING_OP_URING_CMD;
+ sqe->fd = sockfd;
+ sqe->cmd_op = op;
+
+ sqe->level = level;
+ sqe->optname = optname;
+ sqe->optval = (unsigned long long)optval;
+ sqe->optlen = (unsigned long long)optlen;
+}
+
+static inline void io_uring_prep_cmd_get(struct io_uring_sqe *sqe, int op,
+ int sockfd,
+ int level, int optname,
+ const void *optval,
+ const socklen_t *optlen)
+{
+ io_uring_prep_cmd(sqe, op, sockfd, level, optname, optval, 0);
+ sqe->optlen = (unsigned long long)optlen;
+}
+
+static inline int io_uring_register_buffers(struct io_uring *ring,
+ const struct iovec *iovecs,
+ unsigned int nr_iovecs)
+{
+ int ret;
+
+ ret = syscall(__NR_io_uring_register, ring->ring_fd,
+ IORING_REGISTER_BUFFERS, iovecs, nr_iovecs);
+ return (ret < 0) ? -errno : ret;
+}
+
+static inline void io_uring_prep_send(struct io_uring_sqe *sqe, int sockfd,
+ const void *buf, size_t len, int flags)
+{
+ memset(sqe, 0, sizeof(*sqe));
+ sqe->opcode = (__u8)IORING_OP_SEND;
+ sqe->fd = sockfd;
+ sqe->addr = (unsigned long)buf;
+ sqe->len = len;
+ sqe->msg_flags = (__u32)flags;
+}
+
+static inline void io_uring_prep_sendzc(struct io_uring_sqe *sqe, int sockfd,
+ const void *buf, size_t len, int flags,
+ unsigned int zc_flags)
+{
+ io_uring_prep_send(sqe, sockfd, buf, len, flags);
+ sqe->opcode = (__u8)IORING_OP_SEND_ZC;
+ sqe->ioprio = zc_flags;
+}
+
+static inline void io_uring_cqe_seen(struct io_uring *ring)
+{
+ *(&ring->cq)->khead += 1;
+ write_barrier();
+}
diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile
index 8b017070960d..f8d99837b9dc 100644
--- a/tools/testing/selftests/net/Makefile
+++ b/tools/testing/selftests/net/Makefile
@@ -98,6 +98,7 @@ $(OUTPUT)/reuseport_bpf_numa: LDLIBS += -lnuma
$(OUTPUT)/tcp_mmap: LDLIBS += -lpthread -lcrypto
$(OUTPUT)/tcp_inq: LDLIBS += -lpthread
$(OUTPUT)/bind_bhash: LDLIBS += -lpthread
+$(OUTPUT)/io_uring_zerocopy_tx: CFLAGS += -I../../../include/

# Rules to generate bpf obj nat6to4.o
CLANG ?= clang
diff --git a/tools/testing/selftests/net/io_uring_zerocopy_tx.c b/tools/testing/selftests/net/io_uring_zerocopy_tx.c
index 154287740172..76e604e4810e 100644
--- a/tools/testing/selftests/net/io_uring_zerocopy_tx.c
+++ b/tools/testing/selftests/net/io_uring_zerocopy_tx.c
@@ -36,6 +36,8 @@
#include <sys/un.h>
#include <sys/wait.h>

+#include <io_uring/mini_liburing.h>
+
#define NOTIF_TAG 0xfffffffULL
#define NONZC_TAG 0
#define ZC_TAG 1
@@ -60,272 +62,6 @@ static struct sockaddr_storage cfg_dst_addr;

static char payload[IP_MAXPACKET] __attribute__((aligned(4096)));

-struct io_sq_ring {
- unsigned *head;
- unsigned *tail;
- unsigned *ring_mask;
- unsigned *ring_entries;
- unsigned *flags;
- unsigned *array;
-};
-
-struct io_cq_ring {
- unsigned *head;
- unsigned *tail;
- unsigned *ring_mask;
- unsigned *ring_entries;
- struct io_uring_cqe *cqes;
-};
-
-struct io_uring_sq {
- unsigned *khead;
- unsigned *ktail;
- unsigned *kring_mask;
- unsigned *kring_entries;
- unsigned *kflags;
- unsigned *kdropped;
- unsigned *array;
- struct io_uring_sqe *sqes;
-
- unsigned sqe_head;
- unsigned sqe_tail;
-
- size_t ring_sz;
-};
-
-struct io_uring_cq {
- unsigned *khead;
- unsigned *ktail;
- unsigned *kring_mask;
- unsigned *kring_entries;
- unsigned *koverflow;
- struct io_uring_cqe *cqes;
-
- size_t ring_sz;
-};
-
-struct io_uring {
- struct io_uring_sq sq;
- struct io_uring_cq cq;
- int ring_fd;
-};
-
-#ifdef __alpha__
-# ifndef __NR_io_uring_setup
-# define __NR_io_uring_setup 535
-# endif
-# ifndef __NR_io_uring_enter
-# define __NR_io_uring_enter 536
-# endif
-# ifndef __NR_io_uring_register
-# define __NR_io_uring_register 537
-# endif
-#else /* !__alpha__ */
-# ifndef __NR_io_uring_setup
-# define __NR_io_uring_setup 425
-# endif
-# ifndef __NR_io_uring_enter
-# define __NR_io_uring_enter 426
-# endif
-# ifndef __NR_io_uring_register
-# define __NR_io_uring_register 427
-# endif
-#endif
-
-#if defined(__x86_64) || defined(__i386__)
-#define read_barrier() __asm__ __volatile__("":::"memory")
-#define write_barrier() __asm__ __volatile__("":::"memory")
-#else
-
-#define read_barrier() __sync_synchronize()
-#define write_barrier() __sync_synchronize()
-#endif
-
-static int io_uring_setup(unsigned int entries, struct io_uring_params *p)
-{
- return syscall(__NR_io_uring_setup, entries, p);
-}
-
-static int io_uring_enter(int fd, unsigned int to_submit,
- unsigned int min_complete,
- unsigned int flags, sigset_t *sig)
-{
- return syscall(__NR_io_uring_enter, fd, to_submit, min_complete,
- flags, sig, _NSIG / 8);
-}
-
-static int io_uring_register_buffers(struct io_uring *ring,
- const struct iovec *iovecs,
- unsigned nr_iovecs)
-{
- int ret;
-
- ret = syscall(__NR_io_uring_register, ring->ring_fd,
- IORING_REGISTER_BUFFERS, iovecs, nr_iovecs);
- return (ret < 0) ? -errno : ret;
-}
-
-static int io_uring_mmap(int fd, struct io_uring_params *p,
- struct io_uring_sq *sq, struct io_uring_cq *cq)
-{
- size_t size;
- void *ptr;
- int ret;
-
- sq->ring_sz = p->sq_off.array + p->sq_entries * sizeof(unsigned);
- ptr = mmap(0, sq->ring_sz, PROT_READ | PROT_WRITE,
- MAP_SHARED | MAP_POPULATE, fd, IORING_OFF_SQ_RING);
- if (ptr == MAP_FAILED)
- return -errno;
- sq->khead = ptr + p->sq_off.head;
- sq->ktail = ptr + p->sq_off.tail;
- sq->kring_mask = ptr + p->sq_off.ring_mask;
- sq->kring_entries = ptr + p->sq_off.ring_entries;
- sq->kflags = ptr + p->sq_off.flags;
- sq->kdropped = ptr + p->sq_off.dropped;
- sq->array = ptr + p->sq_off.array;
-
- size = p->sq_entries * sizeof(struct io_uring_sqe);
- sq->sqes = mmap(0, size, PROT_READ | PROT_WRITE,
- MAP_SHARED | MAP_POPULATE, fd, IORING_OFF_SQES);
- if (sq->sqes == MAP_FAILED) {
- ret = -errno;
-err:
- munmap(sq->khead, sq->ring_sz);
- return ret;
- }
-
- cq->ring_sz = p->cq_off.cqes + p->cq_entries * sizeof(struct io_uring_cqe);
- ptr = mmap(0, cq->ring_sz, PROT_READ | PROT_WRITE,
- MAP_SHARED | MAP_POPULATE, fd, IORING_OFF_CQ_RING);
- if (ptr == MAP_FAILED) {
- ret = -errno;
- munmap(sq->sqes, p->sq_entries * sizeof(struct io_uring_sqe));
- goto err;
- }
- cq->khead = ptr + p->cq_off.head;
- cq->ktail = ptr + p->cq_off.tail;
- cq->kring_mask = ptr + p->cq_off.ring_mask;
- cq->kring_entries = ptr + p->cq_off.ring_entries;
- cq->koverflow = ptr + p->cq_off.overflow;
- cq->cqes = ptr + p->cq_off.cqes;
- return 0;
-}
-
-static int io_uring_queue_init(unsigned entries, struct io_uring *ring,
- unsigned flags)
-{
- struct io_uring_params p;
- int fd, ret;
-
- memset(ring, 0, sizeof(*ring));
- memset(&p, 0, sizeof(p));
- p.flags = flags;
-
- fd = io_uring_setup(entries, &p);
- if (fd < 0)
- return fd;
- ret = io_uring_mmap(fd, &p, &ring->sq, &ring->cq);
- if (!ret)
- ring->ring_fd = fd;
- else
- close(fd);
- return ret;
-}
-
-static int io_uring_submit(struct io_uring *ring)
-{
- struct io_uring_sq *sq = &ring->sq;
- const unsigned mask = *sq->kring_mask;
- unsigned ktail, submitted, to_submit;
- int ret;
-
- read_barrier();
- if (*sq->khead != *sq->ktail) {
- submitted = *sq->kring_entries;
- goto submit;
- }
- if (sq->sqe_head == sq->sqe_tail)
- return 0;
-
- ktail = *sq->ktail;
- to_submit = sq->sqe_tail - sq->sqe_head;
- for (submitted = 0; submitted < to_submit; submitted++) {
- read_barrier();
- sq->array[ktail++ & mask] = sq->sqe_head++ & mask;
- }
- if (!submitted)
- return 0;
-
- if (*sq->ktail != ktail) {
- write_barrier();
- *sq->ktail = ktail;
- write_barrier();
- }
-submit:
- ret = io_uring_enter(ring->ring_fd, submitted, 0,
- IORING_ENTER_GETEVENTS, NULL);
- return ret < 0 ? -errno : ret;
-}
-
-static inline void io_uring_prep_send(struct io_uring_sqe *sqe, int sockfd,
- const void *buf, size_t len, int flags)
-{
- memset(sqe, 0, sizeof(*sqe));
- sqe->opcode = (__u8) IORING_OP_SEND;
- sqe->fd = sockfd;
- sqe->addr = (unsigned long) buf;
- sqe->len = len;
- sqe->msg_flags = (__u32) flags;
-}
-
-static inline void io_uring_prep_sendzc(struct io_uring_sqe *sqe, int sockfd,
- const void *buf, size_t len, int flags,
- unsigned zc_flags)
-{
- io_uring_prep_send(sqe, sockfd, buf, len, flags);
- sqe->opcode = (__u8) IORING_OP_SEND_ZC;
- sqe->ioprio = zc_flags;
-}
-
-static struct io_uring_sqe *io_uring_get_sqe(struct io_uring *ring)
-{
- struct io_uring_sq *sq = &ring->sq;
-
- if (sq->sqe_tail + 1 - sq->sqe_head > *sq->kring_entries)
- return NULL;
- return &sq->sqes[sq->sqe_tail++ & *sq->kring_mask];
-}
-
-static int io_uring_wait_cqe(struct io_uring *ring, struct io_uring_cqe **cqe_ptr)
-{
- struct io_uring_cq *cq = &ring->cq;
- const unsigned mask = *cq->kring_mask;
- unsigned head = *cq->khead;
- int ret;
-
- *cqe_ptr = NULL;
- do {
- read_barrier();
- if (head != *cq->ktail) {
- *cqe_ptr = &cq->cqes[head & mask];
- break;
- }
- ret = io_uring_enter(ring->ring_fd, 0, 1,
- IORING_ENTER_GETEVENTS, NULL);
- if (ret < 0)
- return -errno;
- } while (1);
-
- return 0;
-}
-
-static inline void io_uring_cqe_seen(struct io_uring *ring)
-{
- *(&ring->cq)->khead += 1;
- write_barrier();
-}
-
static unsigned long gettimeofday_ms(void)
{
struct timeval tv;
--
2.34.1

2023-09-14 01:27:39

by Breno Leitao

Subject: [PATCH v6 6/8] io_uring/cmd: Introduce SOCKET_URING_OP_GETSOCKOPT

Add support for the getsockopt command (SOCKET_URING_OP_GETSOCKOPT).
This is similar to the getsockopt(2) system call; both the optval and
optlen parameters are pointers to userspace.

Note that userspace needs to keep these pointers alive until the CQE
completes.
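
For illustration, a minimal userspace sketch using the mini_liburing
helpers from patch 4 (the ring and sockfd are assumed to be set up; val
and optlen must not go out of scope before the CQE is reaped):

	int val;
	socklen_t optlen = sizeof(val);
	struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
	struct io_uring_cqe *cqe;

	io_uring_prep_cmd_get(sqe, SOCKET_URING_OP_GETSOCKOPT, sockfd,
			      SOL_SOCKET, SO_RCVBUF, &val, &optlen);
	io_uring_submit(&ring);
	io_uring_wait_cqe(&ring, &cqe);
	/* cqe->res is 0 on success, or a negative errno */
	io_uring_cqe_seen(&ring);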

Reviewed-by: Gabriel Krisman Bertazi <[email protected]>
Signed-off-by: Breno Leitao <[email protected]>
---
include/uapi/linux/io_uring.h | 9 +++++++++
io_uring/uring_cmd.c | 15 +++++++++++++++
2 files changed, 24 insertions(+)

diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 8e61f8b7c2ce..1c789ee6462d 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -43,6 +43,10 @@ struct io_uring_sqe {
union {
__u64 addr; /* pointer to buffer or iovecs */
__u64 splice_off_in;
+ struct {
+ __u32 level;
+ __u32 optname;
+ };
};
__u32 len; /* buffer size or number of iovecs */
union {
@@ -89,6 +93,10 @@ struct io_uring_sqe {
__u64 addr3;
__u64 __pad2[1];
};
+ struct {
+ __u64 optval;
+ __u64 optlen;
+ };
/*
* If the ring is initialized with IORING_SETUP_SQE128, then
* this field is used for 80 bytes of arbitrary command data
@@ -734,6 +742,7 @@ struct io_uring_recvmsg_out {
enum {
SOCKET_URING_OP_SIOCINQ = 0,
SOCKET_URING_OP_SIOCOUTQ,
+ SOCKET_URING_OP_GETSOCKOPT,
};

#ifdef __cplusplus
diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
index 5753c3611b74..a2a6ac0c503b 100644
--- a/io_uring/uring_cmd.c
+++ b/io_uring/uring_cmd.c
@@ -167,6 +167,19 @@ int io_uring_cmd_import_fixed(u64 ubuf, unsigned long len, int rw,
}
EXPORT_SYMBOL_GPL(io_uring_cmd_import_fixed);

+static inline int io_uring_cmd_getsockopt(struct socket *sock,
+ struct io_uring_cmd *cmd,
+ unsigned int issue_flags)
+{
+ void __user *optval = u64_to_user_ptr(READ_ONCE(cmd->sqe->optval));
+ int __user *optlen = u64_to_user_ptr(READ_ONCE(cmd->sqe->optlen));
+ bool compat = !!(issue_flags & IO_URING_F_COMPAT);
+ int optname = READ_ONCE(cmd->sqe->optname);
+ int level = READ_ONCE(cmd->sqe->level);
+
+ return do_sock_getsockopt(sock, compat, level, optname, optval, optlen);
+}
+
#if defined(CONFIG_NET)
int io_uring_cmd_sock(struct io_uring_cmd *cmd, unsigned int issue_flags)
{
@@ -189,6 +202,8 @@ int io_uring_cmd_sock(struct io_uring_cmd *cmd, unsigned int issue_flags)
if (ret)
return ret;
return arg;
+ case SOCKET_URING_OP_GETSOCKOPT:
+ return io_uring_cmd_getsockopt(sock, cmd, issue_flags);
default:
return -EOPNOTSUPP;
}
--
2.34.1