This patchset adds support for getsockopt (SOCKET_URING_OP_GETSOCKOPT)
and setsockopt (SOCKET_URING_OP_SETSOCKOPT) in io_uring commands.
SOCKET_URING_OP_SETSOCKOPT implements the generic case, covering all levels
and optnames. SOCKET_URING_OP_GETSOCKOPT is limited, for now, to the
SOL_SOCKET level, which seems to be the most common level parameter for
get/setsockopt(2).
In order to keep the implementation (and tests) simple, some refactors
were done prior to the changes, as follows:
Patches 1-2: Modify the BPF hooks to support sockptr_t, so, these functions
become flexible enough to accept user or kernel pointers for optval/optlen.
Patches 3-4: Extract the core {s,g}etsockopt() logic from
__sys_{g,s}etsockopt(), so that the code can be reused by other callers,
such as io_uring.
Patch 5: Pass compat mode to the file/socket callbacks.
Patch 6: Move io_uring helpers from io_uring_zerocopy_tx to a generic
io_uring header. This simplifies the test case (last patch).
Patch 7: Prevent io_uring_cmd_sock() from being called if CONFIG_NET is
disabled.
PS1: For the getsockopt command, the optlen field is not a userspace
pointer, but an absolute value, so this is slightly different from
getsockopt(2) behaviour. The new optlen value is returned in cqe->res.
PS2: The memory behind the userspace pointers needs to stay alive until
the operation is completed.
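The optlen difference in PS1 can be illustrated from userspace. The sketch
below is not io_uring code and not part of this series:
getsockopt_by_value() and compare_conventions() are hypothetical helpers
built on plain getsockopt(2). The first takes optlen by value and hands the
resulting length back as its return value (or a negative errno), mirroring
the way the command is described as reporting it in cqe->res.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: emulate the calling convention described for
 * SOCKET_URING_OP_GETSOCKOPT on top of getsockopt(2). optlen is passed
 * by value; the updated length is the return value. Errors are -errno.
 */
static int getsockopt_by_value(int fd, int level, int optname,
			       void *optval, unsigned int optlen)
{
	socklen_t len = optlen;

	if (getsockopt(fd, level, optname, optval, &len) < 0)
		return -errno;

	return (int)len;
}

/* Query SO_TYPE with both conventions; returns 0 when they agree. */
static int compare_conventions(void)
{
	int fd = socket(AF_UNIX, SOCK_STREAM, 0);
	int type_ptr = 0, type_val = 0, res;
	socklen_t len = sizeof(type_ptr);

	if (fd < 0)
		return -errno;

	/* getsockopt(2): optlen is a pointer that is updated in place */
	if (getsockopt(fd, SOL_SOCKET, SO_TYPE, &type_ptr, &len) < 0) {
		res = -errno;
		close(fd);
		return res;
	}

	/* by-value convention: the new length is the return value */
	res = getsockopt_by_value(fd, SOL_SOCKET, SO_TYPE, &type_val,
				  sizeof(type_val));
	close(fd);

	assert(res == (int)len);
	assert(type_ptr == type_val);
	return 0;
}
```

With this convention a caller has a single integer to inspect: negative
means error, anything else is the length that was written to optval.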
These changes were tested with a new test[1] in liburing, the LTP sockopt*
tests, and the bpf/progs/sockopt test case, which is now adapted to
run using both system calls and io_uring commands.
[1] Link: https://github.com/leitao/liburing/blob/getsockopt/test/socket-getsetsock-cmd.c
RFC -> V1:
* Copy user memory at io_uring subsystem, and call proto_ops
callbacks using kernel memory
* Implement all the cases for SOCKET_URING_OP_SETSOCKOPT
V1 -> V2
* Implemented the BPF part
* Using user pointers from optval to avoid kmalloc in io_uring part.
V2 -> V3:
* Break down __sys_setsockopt and reuse the core code, avoiding
duplicated code. This removed the requirement to expose
sock_use_custom_sol_socket().
* Added io_uring test to selftests/bpf/sockopt.
* Fixed compat argument, by passing it to the issue_flags.
V3 -> V4:
* Rebase on top of commit 1ded5e5a5931b ("net: annotate data-races around sock->ops")
* Also broke down __sys_setsockopt() to reuse the core function
from io_uring.
* Create a new patch to return -EOPNOTSUPP if CONFIG_NET is
disabled
* Added two SOL_SOCKET tests in bpf/prog_tests/sockopt.c
Breno Leitao (10):
bpf: Leverage sockptr_t in BPF getsockopt hook
bpf: Leverage sockptr_t in BPF setsockopt hook
net/socket: Break down __sys_setsockopt
net/socket: Break down __sys_getsockopt
io_uring/cmd: Pass compat mode in issue_flags
selftests/net: Extract uring helpers to be reusable
io_uring/cmd: return -EOPNOTSUPP if net is disabled
io_uring/cmd: Introduce SOCKET_URING_OP_GETSOCKOPT
io_uring/cmd: Introduce SOCKET_URING_OP_SETSOCKOPT
selftests/bpf/sockopt: Add io_uring support
include/linux/bpf-cgroup.h | 9 +-
include/linux/io_uring.h | 1 +
include/net/sock.h | 4 +
include/uapi/linux/io_uring.h | 8 +
io_uring/uring_cmd.c | 55 ++++
kernel/bpf/cgroup.c | 25 +-
net/core/sock.c | 8 -
net/socket.c | 102 ++++---
tools/include/io_uring/mini_liburing.h | 282 ++++++++++++++++++
.../selftests/bpf/prog_tests/sockopt.c | 113 ++++++-
tools/testing/selftests/net/Makefile | 1 +
.../selftests/net/io_uring_zerocopy_tx.c | 268 +----------------
12 files changed, 544 insertions(+), 332 deletions(-)
create mode 100644 tools/include/io_uring/mini_liburing.h
--
2.34.1
Split __sys_getsockopt() into two functions by moving the core
logic into a sub-function (do_sock_getsockopt()). This avoids code
duplication when other callers need to perform the same operation.
do_sock_getsockopt() will be called by the io_uring getsockopt command
in a following patch.
The same was done for the setsockopt pair.
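The shape of the split can be shown with a small userspace model. Every
name below (model_sock, model_lookup(), model_do_getsockopt() and so on) is
a hypothetical stand-in, and the option handling is reduced to one integer;
the point is only that the core takes an already resolved socket, so a
second caller can skip the fd lookup.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_SOL_SOCKET 1	/* stand-in level, value chosen for the model */

/* Hypothetical stand-in for struct socket: one readable option. */
struct model_sock {
	int so_type;
};

static struct model_sock model_fd_table[2] = { { 1 }, { 2 } };

/* Stand-in for the fd lookup done only on the system call path. */
static struct model_sock *model_lookup(int fd, int *err)
{
	if (fd < 0 || fd >= 2) {
		*err = -EBADF;
		return NULL;
	}
	return &model_fd_table[fd];
}

/* Core logic: takes a resolved socket and knows nothing about fds. */
static int model_do_getsockopt(struct model_sock *sock, bool compat,
			       int level, int *optval)
{
	(void)compat;	/* the real core uses this to skip the BPF hook */

	assert(sock != NULL && optval != NULL);
	if (level != MODEL_SOL_SOCKET)
		return -EOPNOTSUPP;	/* the model has no protocol callback */

	*optval = sock->so_type;
	return 0;
}

/* System-call-style wrapper: resolve the fd, then delegate to the core. */
static int model_sys_getsockopt(int fd, int level, int *optval)
{
	int err = 0;
	struct model_sock *sock = model_lookup(fd, &err);

	if (!sock)
		return err;
	return model_do_getsockopt(sock, false, level, optval);
}

/* Second caller (io_uring-style): already holds the socket. */
static int model_cmd_getsockopt(struct model_sock *sock, bool compat,
				int level, int *optval)
{
	return model_do_getsockopt(sock, compat, level, optval);
}
```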
Suggested-by: Martin KaFai Lau <[email protected]>
Signed-off-by: Breno Leitao <[email protected]>
---
include/linux/bpf-cgroup.h | 2 +-
include/net/sock.h | 2 ++
net/core/sock.c | 8 -----
net/socket.c | 62 ++++++++++++++++++++++++--------------
4 files changed, 42 insertions(+), 32 deletions(-)
diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h
index cecfe8c99f28..ffaca1ab5e8d 100644
--- a/include/linux/bpf-cgroup.h
+++ b/include/linux/bpf-cgroup.h
@@ -378,7 +378,7 @@ static inline bool cgroup_bpf_sock_enabled(struct sock *sk,
({ \
int __ret = 0; \
if (cgroup_bpf_enabled(CGROUP_GETSOCKOPT)) \
- get_user(__ret, optlen); \
+ copy_from_sockptr(&__ret, optlen, sizeof(int)); \
__ret; \
})
diff --git a/include/net/sock.h b/include/net/sock.h
index b059f9272303..c0185121efe4 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1863,6 +1863,8 @@ int sock_setsockopt(struct socket *sock, int level, int op,
sockptr_t optval, unsigned int optlen);
int do_sock_setsockopt(struct socket *sock, bool compat, int level,
int optname, sockptr_t optval, int optlen);
+int do_sock_getsockopt(struct socket *sock, bool compat, int level,
+ int optname, sockptr_t optval, sockptr_t optlen);
int sk_getsockopt(struct sock *sk, int level, int optname,
sockptr_t optval, sockptr_t optlen);
diff --git a/net/core/sock.c b/net/core/sock.c
index 666a17cab4f5..cf15394ed664 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2009,14 +2009,6 @@ int sk_getsockopt(struct sock *sk, int level, int optname,
return 0;
}
-int sock_getsockopt(struct socket *sock, int level, int optname,
- char __user *optval, int __user *optlen)
-{
- return sk_getsockopt(sock->sk, level, optname,
- USER_SOCKPTR(optval),
- USER_SOCKPTR(optlen));
-}
-
/*
* Initialize an sk_lock.
*
diff --git a/net/socket.c b/net/socket.c
index 3bf29a27653f..c79d2b2b902e 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -2332,6 +2332,42 @@ SYSCALL_DEFINE5(setsockopt, int, fd, int, level, int, optname,
INDIRECT_CALLABLE_DECLARE(bool tcp_bpf_bypass_getsockopt(int level,
int optname));
+int do_sock_getsockopt(struct socket *sock, bool compat, int level,
+ int optname, sockptr_t optval, sockptr_t optlen)
+{
+ int max_optlen __maybe_unused;
+ const struct proto_ops *ops;
+ int err;
+
+ err = security_socket_getsockopt(sock, level, optname);
+ if (err)
+ return err;
+
+ ops = READ_ONCE(sock->ops);
+ if (level == SOL_SOCKET) {
+ err = sk_getsockopt(sock->sk, level, optname, optval, optlen);
+ } else if (unlikely(!ops->getsockopt)) {
+ err = -EOPNOTSUPP;
+ } else {
+ if (WARN_ONCE(optval.is_kernel || optlen.is_kernel,
+ "Invalid argument type"))
+ return -EOPNOTSUPP;
+
+ err = ops->getsockopt(sock, level, optname, optval.user,
+ optlen.user);
+ }
+
+ if (!compat) {
+ max_optlen = BPF_CGROUP_GETSOCKOPT_MAX_OPTLEN(optlen);
+ err = BPF_CGROUP_RUN_PROG_GETSOCKOPT(sock->sk, level, optname,
+ optval, optlen, max_optlen,
+ err);
+ }
+
+ return err;
+}
+EXPORT_SYMBOL(do_sock_getsockopt);
+
/*
* Get a socket option. Because we don't know the option lengths we have
* to pass a user mode parameter for the protocols to sort out.
@@ -2339,37 +2375,17 @@ INDIRECT_CALLABLE_DECLARE(bool tcp_bpf_bypass_getsockopt(int level,
int __sys_getsockopt(int fd, int level, int optname, char __user *optval,
int __user *optlen)
{
- int max_optlen __maybe_unused;
- const struct proto_ops *ops;
int err, fput_needed;
+ bool compat = in_compat_syscall();
struct socket *sock;
sock = sockfd_lookup_light(fd, &err, &fput_needed);
if (!sock)
return err;
- err = security_socket_getsockopt(sock, level, optname);
- if (err)
- goto out_put;
-
- if (!in_compat_syscall())
- max_optlen = BPF_CGROUP_GETSOCKOPT_MAX_OPTLEN(optlen);
+ err = do_sock_getsockopt(sock, compat, level, optname,
+ USER_SOCKPTR(optval), USER_SOCKPTR(optlen));
- ops = READ_ONCE(sock->ops);
- if (level == SOL_SOCKET)
- err = sock_getsockopt(sock, level, optname, optval, optlen);
- else if (unlikely(!ops->getsockopt))
- err = -EOPNOTSUPP;
- else
- err = ops->getsockopt(sock, level, optname, optval,
- optlen);
-
- if (!in_compat_syscall())
- err = BPF_CGROUP_RUN_PROG_GETSOCKOPT(sock->sk, level, optname,
- USER_SOCKPTR(optval),
- USER_SOCKPTR(optlen),
- max_optlen, err);
-out_put:
fput_light(sock->file, fput_needed);
return err;
}
--
2.34.1
Leverage the sockptr_t structure to pass a universal pointer as an
argument; it holds either a userspace pointer or a kernel pointer.
This makes the function flexible, so we can mix and match userspace and
kernel pointers. The main motivation for this change is to use it
in the io_uring {g,s}etsockopt() commands, which will use a userspace
pointer for *optval, but a kernel value for optlen.
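For readers unfamiliar with the idea, here is a userspace model of a tagged
"either user or kernel" pointer and of the mixed use described above. It is
a sketch only: model_sockptr and the model_*() helpers are hypothetical
names, not the kernel definitions, and both branches end in memcpy() because
there is no real user/kernel boundary in a userspace program.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* One value that carries either kind of pointer plus a tag. */
struct model_sockptr {
	union {
		void *kernel;
		void *user;	/* would be void __user * in the kernel */
	};
	bool is_kernel;
};

static struct model_sockptr model_kernel_ptr(void *p)
{
	struct model_sockptr sp;

	sp.kernel = p;
	sp.is_kernel = true;
	return sp;
}

static struct model_sockptr model_user_ptr(void *p)
{
	struct model_sockptr sp;

	sp.user = p;
	sp.is_kernel = false;
	return sp;
}

/*
 * Copy from either kind of pointer. In the kernel the user branch would
 * go through a user-copy helper and could fault; here both are memcpy().
 * Returns 0 on success, -1 for a NULL source.
 */
static int model_copy_from(void *dst, struct model_sockptr src, size_t size)
{
	const void *from = src.is_kernel ? src.kernel : src.user;

	assert(dst != NULL);
	if (!from)
		return -1;
	memcpy(dst, from, size);
	return 0;
}

/* Mixed use, as in the io_uring path: "user" optval, "kernel" optlen. */
static int model_read_opt(struct model_sockptr optval,
			  struct model_sockptr optlen, int *out)
{
	int len;

	if (model_copy_from(&len, optlen, sizeof(len)))
		return -1;
	if (len != (int)sizeof(*out))
		return -1;
	return model_copy_from(out, optval, sizeof(*out));
}
```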
Signed-off-by: Breno Leitao <[email protected]>
---
include/linux/bpf-cgroup.h | 5 +++--
kernel/bpf/cgroup.c | 20 +++++++++++---------
net/socket.c | 5 +++--
3 files changed, 17 insertions(+), 13 deletions(-)
diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h
index 8506690dbb9c..f5b4fb6ed8c6 100644
--- a/include/linux/bpf-cgroup.h
+++ b/include/linux/bpf-cgroup.h
@@ -139,9 +139,10 @@ int __cgroup_bpf_run_filter_sysctl(struct ctl_table_header *head,
int __cgroup_bpf_run_filter_setsockopt(struct sock *sock, int *level,
int *optname, char __user *optval,
int *optlen, char **kernel_optval);
+
int __cgroup_bpf_run_filter_getsockopt(struct sock *sk, int level,
- int optname, char __user *optval,
- int __user *optlen, int max_optlen,
+ int optname, sockptr_t optval,
+ sockptr_t optlen, int max_optlen,
int retval);
int __cgroup_bpf_run_filter_getsockopt_kern(struct sock *sk, int level,
diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index 5b2741aa0d9b..ebc8c58f7e46 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -1875,8 +1875,8 @@ int __cgroup_bpf_run_filter_setsockopt(struct sock *sk, int *level,
}
int __cgroup_bpf_run_filter_getsockopt(struct sock *sk, int level,
- int optname, char __user *optval,
- int __user *optlen, int max_optlen,
+ int optname, sockptr_t optval,
+ sockptr_t optlen, int max_optlen,
int retval)
{
struct cgroup *cgrp = sock_cgroup_ptr(&sk->sk_cgrp_data);
@@ -1903,8 +1903,8 @@ int __cgroup_bpf_run_filter_getsockopt(struct sock *sk, int level,
* one that kernel returned as well to let
* BPF programs inspect the value.
*/
-
- if (get_user(ctx.optlen, optlen)) {
+ if (copy_from_sockptr(&ctx.optlen, optlen,
+ sizeof(ctx.optlen))) {
ret = -EFAULT;
goto out;
}
@@ -1915,8 +1915,8 @@ int __cgroup_bpf_run_filter_getsockopt(struct sock *sk, int level,
}
orig_optlen = ctx.optlen;
- if (copy_from_user(ctx.optval, optval,
- min(ctx.optlen, max_optlen)) != 0) {
+ if (copy_from_sockptr(ctx.optval, optval,
+ min(ctx.optlen, max_optlen))) {
ret = -EFAULT;
goto out;
}
@@ -1930,7 +1930,8 @@ int __cgroup_bpf_run_filter_getsockopt(struct sock *sk, int level,
if (ret < 0)
goto out;
- if (optval && (ctx.optlen > max_optlen || ctx.optlen < 0)) {
+ if (!sockptr_is_null(optval) &&
+ (ctx.optlen > max_optlen || ctx.optlen < 0)) {
if (orig_optlen > PAGE_SIZE && ctx.optlen >= 0) {
pr_info_once("bpf getsockopt: ignoring program buffer with optlen=%d (max_optlen=%d)\n",
ctx.optlen, max_optlen);
@@ -1942,11 +1943,12 @@ int __cgroup_bpf_run_filter_getsockopt(struct sock *sk, int level,
}
if (ctx.optlen != 0) {
- if (optval && copy_to_user(optval, ctx.optval, ctx.optlen)) {
+ if (!sockptr_is_null(optval) &&
+ copy_to_sockptr(optval, ctx.optval, ctx.optlen)) {
ret = -EFAULT;
goto out;
}
- if (put_user(ctx.optlen, optlen)) {
+ if (copy_to_sockptr(optlen, &ctx.optlen, sizeof(ctx.optlen))) {
ret = -EFAULT;
goto out;
}
diff --git a/net/socket.c b/net/socket.c
index 77f28328e387..6fda5d011521 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -2355,8 +2355,9 @@ int __sys_getsockopt(int fd, int level, int optname, char __user *optval,
if (!in_compat_syscall())
err = BPF_CGROUP_RUN_PROG_GETSOCKOPT(sock->sk, level, optname,
- optval, optlen, max_optlen,
- err);
+ USER_SOCKPTR(optval),
+ USER_SOCKPTR(optlen),
+ max_optlen, err);
out_put:
fput_light(sock->file, fput_needed);
return err;
--
2.34.1
Create a new flag to track whether the operation is running in compat mode.
This checks ctx->compat and passes it in issue_flags, so that it can be
queried later in the callbacks.
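A minimal model of that propagation, assuming nothing beyond the bit value
shown in the diff below: the MODEL_* names, model_ctx and both helpers are
hypothetical. The issuing side folds the per-ring state into the
per-request flags, and a callback recovers the boolean without needing
access to the ring context.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Model of the issue flags; bit positions follow the diff. */
enum model_cmd_flags {
	MODEL_F_SQE128 = (1 << 8),
	MODEL_F_CQE32  = (1 << 9),
	MODEL_F_IOPOLL = (1 << 10),
	MODEL_F_COMPAT = (1 << 11),
};

/* Hypothetical stand-in for the ring context: only the field we need. */
struct model_ctx {
	bool compat;
};

/* Issue side: fold the per-ring compat state into the request flags. */
static unsigned int model_issue_flags(const struct model_ctx *ctx,
				      unsigned int issue_flags)
{
	assert(ctx != NULL);
	if (ctx->compat)
		issue_flags |= MODEL_F_COMPAT;
	return issue_flags;
}

/* Callback side: recover the boolean from the flags alone. */
static bool model_is_compat(unsigned int issue_flags)
{
	return !!(issue_flags & MODEL_F_COMPAT);
}
```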
Signed-off-by: Breno Leitao <[email protected]>
---
include/linux/io_uring.h | 1 +
io_uring/uring_cmd.c | 2 ++
2 files changed, 3 insertions(+)
diff --git a/include/linux/io_uring.h b/include/linux/io_uring.h
index 106cdc55ff3b..bc53b35966ed 100644
--- a/include/linux/io_uring.h
+++ b/include/linux/io_uring.h
@@ -20,6 +20,7 @@ enum io_uring_cmd_flags {
IO_URING_F_SQE128 = (1 << 8),
IO_URING_F_CQE32 = (1 << 9),
IO_URING_F_IOPOLL = (1 << 10),
+ IO_URING_F_COMPAT = (1 << 11),
};
struct io_uring_cmd {
diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
index 537795fddc87..60f843a357e0 100644
--- a/io_uring/uring_cmd.c
+++ b/io_uring/uring_cmd.c
@@ -128,6 +128,8 @@ int io_uring_cmd(struct io_kiocb *req, unsigned int issue_flags)
issue_flags |= IO_URING_F_SQE128;
if (ctx->flags & IORING_SETUP_CQE32)
issue_flags |= IO_URING_F_CQE32;
+ if (ctx->compat)
+ issue_flags |= IO_URING_F_COMPAT;
if (ctx->flags & IORING_SETUP_IOPOLL) {
if (!file->f_op->uring_cmd_iopoll)
return -EOPNOTSUPP;
--
2.34.1
Prevent io_uring_cmd_sock() from being called if CONFIG_NET is not set. If
networking is not enabled, but io_uring is, then we want to return
-EOPNOTSUPP for any possible socket operation.
This is helpful because io_uring_cmd_sock() can now call functions that
only exist if CONFIG_NET is enabled, without having #ifdef CONFIG_NET
inside the function itself.
Signed-off-by: Breno Leitao <[email protected]>
---
io_uring/uring_cmd.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
index 60f843a357e0..6a91e1af7d05 100644
--- a/io_uring/uring_cmd.c
+++ b/io_uring/uring_cmd.c
@@ -167,6 +167,7 @@ int io_uring_cmd_import_fixed(u64 ubuf, unsigned long len, int rw,
}
EXPORT_SYMBOL_GPL(io_uring_cmd_import_fixed);
+#if defined(CONFIG_NET)
int io_uring_cmd_sock(struct io_uring_cmd *cmd, unsigned int issue_flags)
{
struct socket *sock = cmd->file->private_data;
@@ -192,4 +193,11 @@ int io_uring_cmd_sock(struct io_uring_cmd *cmd, unsigned int issue_flags)
return -EOPNOTSUPP;
}
}
+#else
+int io_uring_cmd_sock(struct io_uring_cmd *cmd, unsigned int issue_flags)
+{
+ return -EOPNOTSUPP;
+}
+#endif
+
EXPORT_SYMBOL_GPL(io_uring_cmd_sock);
--
2.34.1
Split __sys_setsockopt() into two functions by moving the core
logic into a sub-function (do_sock_setsockopt()). This avoids code
duplication when other callers need to perform the same operation.
do_sock_setsockopt() will be called by the io_uring setsockopt command
in a following patch.
Signed-off-by: Breno Leitao <[email protected]>
Reviewed-by: Willem de Bruijn <[email protected]>
---
include/net/sock.h | 2 ++
net/socket.c | 39 +++++++++++++++++++++++++--------------
2 files changed, 27 insertions(+), 14 deletions(-)
diff --git a/include/net/sock.h b/include/net/sock.h
index 11d503417591..b059f9272303 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1861,6 +1861,8 @@ int sk_setsockopt(struct sock *sk, int level, int optname,
sockptr_t optval, unsigned int optlen);
int sock_setsockopt(struct socket *sock, int level, int op,
sockptr_t optval, unsigned int optlen);
+int do_sock_setsockopt(struct socket *sock, bool compat, int level,
+ int optname, sockptr_t optval, int optlen);
int sk_getsockopt(struct sock *sk, int level, int optname,
sockptr_t optval, sockptr_t optlen);
diff --git a/net/socket.c b/net/socket.c
index 9ec9a8a07c0e..3bf29a27653f 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -2261,31 +2261,21 @@ static bool sock_use_custom_sol_socket(const struct socket *sock)
return test_bit(SOCK_CUSTOM_SOCKOPT, &sock->flags);
}
-/*
- * Set a socket option. Because we don't know the option lengths we have
- * to pass the user mode parameter for the protocols to sort out.
- */
-int __sys_setsockopt(int fd, int level, int optname, char __user *user_optval,
- int optlen)
+int do_sock_setsockopt(struct socket *sock, bool compat, int level,
+ int optname, sockptr_t optval, int optlen)
{
- sockptr_t optval = USER_SOCKPTR(user_optval);
const struct proto_ops *ops;
char *kernel_optval = NULL;
- int err, fput_needed;
- struct socket *sock;
+ int err;
if (optlen < 0)
return -EINVAL;
- sock = sockfd_lookup_light(fd, &err, &fput_needed);
- if (!sock)
- return err;
-
err = security_socket_setsockopt(sock, level, optname);
if (err)
goto out_put;
- if (!in_compat_syscall())
+ if (!compat)
err = BPF_CGROUP_RUN_PROG_SETSOCKOPT(sock->sk, &level, &optname,
optval, &optlen,
&kernel_optval);
@@ -2308,6 +2298,27 @@ int __sys_setsockopt(int fd, int level, int optname, char __user *user_optval,
optlen);
kfree(kernel_optval);
out_put:
+ return err;
+}
+EXPORT_SYMBOL(do_sock_setsockopt);
+
+/* Set a socket option. Because we don't know the option lengths we have
+ * to pass the user mode parameter for the protocols to sort out.
+ */
+int __sys_setsockopt(int fd, int level, int optname, char __user *user_optval,
+ int optlen)
+{
+ sockptr_t optval = USER_SOCKPTR(user_optval);
+ bool compat = in_compat_syscall();
+ int err, fput_needed;
+ struct socket *sock;
+
+ sock = sockfd_lookup_light(fd, &err, &fput_needed);
+ if (!sock)
+ return err;
+
+ err = do_sock_setsockopt(sock, compat, level, optname, optval, optlen);
+
fput_light(sock->file, fput_needed);
return err;
}
--
2.34.1
Add support for the getsockopt command (SOCKET_URING_OP_GETSOCKOPT), where
level is SOL_SOCKET. This leverages the sockptr_t infrastructure,
where a sockptr_t is either a userspace or a kernel pointer, and is
handled as such.
Unlike getsockopt(2), the optlen field is not a userspace
pointer. In getsockopt(2), userspace provides an optlen pointer, which is
overwritten by the kernel. In this implementation, userspace passes a
u32, and the new value is returned in cqe->res. I.e., optlen is not a
pointer.
Note that userspace needs to keep the optval memory alive until
the CQE is completed.
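The new fields do not grow the SQE; they share storage with existing
members. The reduced model below keeps only the members named in the diff
to show the overlay. It is not the full struct io_uring_sqe, and model_sqe
and model_prep_getsockopt() are hypothetical names used for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reduced model: only the unions touched by the patch are kept. */
struct model_sqe {
	union {
		uint64_t addr;
		uint64_t splice_off_in;
		struct {
			uint32_t level;		/* low half of the 64-bit slot */
			uint32_t optname;	/* high half of the 64-bit slot */
		};
	};
	uint32_t len;
	union {
		int32_t  splice_fd_in;
		uint32_t file_index;
		uint32_t optlen;	/* a value, not a pointer */
	};
	union {
		struct {
			uint64_t addr3;
			uint64_t pad2[1];
		};
		uint64_t optval;	/* userspace pointer stored as u64 */
	};
};

/* Fill the model the way a getsockopt command would be described. */
static void model_prep_getsockopt(struct model_sqe *sqe, int level,
				  int optname, void *optval,
				  uint32_t optlen)
{
	assert(sqe != NULL);
	memset(sqe, 0, sizeof(*sqe));
	sqe->level = (uint32_t)level;
	sqe->optname = (uint32_t)optname;
	sqe->optval = (uint64_t)(uintptr_t)optval;
	sqe->optlen = optlen;
}
```

Because level and optname together occupy exactly the eight bytes of addr,
a command using them cannot also use addr for something else.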
Signed-off-by: Breno Leitao <[email protected]>
---
include/uapi/linux/io_uring.h | 7 +++++++
io_uring/uring_cmd.c | 28 ++++++++++++++++++++++++++++
2 files changed, 35 insertions(+)
diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 8e61f8b7c2ce..29efa02a4dcb 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -43,6 +43,10 @@ struct io_uring_sqe {
union {
__u64 addr; /* pointer to buffer or iovecs */
__u64 splice_off_in;
+ struct {
+ __u32 level;
+ __u32 optname;
+ };
};
__u32 len; /* buffer size or number of iovecs */
union {
@@ -79,6 +83,7 @@ struct io_uring_sqe {
union {
__s32 splice_fd_in;
__u32 file_index;
+ __u32 optlen;
struct {
__u16 addr_len;
__u16 __pad3[1];
@@ -89,6 +94,7 @@ struct io_uring_sqe {
__u64 addr3;
__u64 __pad2[1];
};
+ __u64 optval;
/*
* If the ring is initialized with IORING_SETUP_SQE128, then
* this field is used for 80 bytes of arbitrary command data
@@ -734,6 +740,7 @@ struct io_uring_recvmsg_out {
enum {
SOCKET_URING_OP_SIOCINQ = 0,
SOCKET_URING_OP_SIOCOUTQ,
+ SOCKET_URING_OP_GETSOCKOPT,
};
#ifdef __cplusplus
diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
index 6a91e1af7d05..c373e05ba9ce 100644
--- a/io_uring/uring_cmd.c
+++ b/io_uring/uring_cmd.c
@@ -167,6 +167,32 @@ int io_uring_cmd_import_fixed(u64 ubuf, unsigned long len, int rw,
}
EXPORT_SYMBOL_GPL(io_uring_cmd_import_fixed);
+INDIRECT_CALLABLE_DECLARE(bool tcp_bpf_bypass_getsockopt(int level,
+ int optname));
+static inline int io_uring_cmd_getsockopt(struct socket *sock,
+ struct io_uring_cmd *cmd,
+ unsigned int issue_flags)
+{
+ void __user *optval = u64_to_user_ptr(READ_ONCE(cmd->sqe->optval));
+ bool compat = !!(issue_flags & IO_URING_F_COMPAT);
+ int optlen = READ_ONCE(cmd->sqe->optlen);
+ int optname = READ_ONCE(cmd->sqe->optname);
+ int level = READ_ONCE(cmd->sqe->level);
+ int err;
+
+ if (level != SOL_SOCKET)
+ return -EOPNOTSUPP;
+
+ err = do_sock_getsockopt(sock, compat, level, optname,
+ USER_SOCKPTR(optval),
+ KERNEL_SOCKPTR(&optlen));
+ if (err)
+ return err;
+
+ /* On success, return optlen */
+ return optlen;
+}
+
#if defined(CONFIG_NET)
int io_uring_cmd_sock(struct io_uring_cmd *cmd, unsigned int issue_flags)
{
@@ -189,6 +215,8 @@ int io_uring_cmd_sock(struct io_uring_cmd *cmd, unsigned int issue_flags)
if (ret)
return ret;
return arg;
+ case SOCKET_URING_OP_GETSOCKOPT:
+ return io_uring_cmd_getsockopt(sock, cmd, issue_flags);
default:
return -EOPNOTSUPP;
}
--
2.34.1
Change the BPF setsockopt hook (__cgroup_bpf_run_filter_setsockopt()) to use
sockptr_t instead of user pointers. This brings flexibility to the
function, since it can then be called with userspace or kernel pointers.
This change will allow the creation of a core sock_setsockopt, called
do_sock_setsockopt(), which will be called from both the system call path
and the io_uring command.
This also aligns with the getsockopt() counterpart, which is now using
the sockptr_t universal pointer.
Signed-off-by: Breno Leitao <[email protected]>
---
include/linux/bpf-cgroup.h | 2 +-
kernel/bpf/cgroup.c | 5 +++--
net/socket.c | 2 +-
3 files changed, 5 insertions(+), 4 deletions(-)
diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h
index f5b4fb6ed8c6..cecfe8c99f28 100644
--- a/include/linux/bpf-cgroup.h
+++ b/include/linux/bpf-cgroup.h
@@ -137,7 +137,7 @@ int __cgroup_bpf_run_filter_sysctl(struct ctl_table_header *head,
enum cgroup_bpf_attach_type atype);
int __cgroup_bpf_run_filter_setsockopt(struct sock *sock, int *level,
- int *optname, char __user *optval,
+ int *optname, sockptr_t optval,
int *optlen, char **kernel_optval);
int __cgroup_bpf_run_filter_getsockopt(struct sock *sk, int level,
diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index ebc8c58f7e46..f0dedd4f7f2e 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -1785,7 +1785,7 @@ static bool sockopt_buf_allocated(struct bpf_sockopt_kern *ctx,
}
int __cgroup_bpf_run_filter_setsockopt(struct sock *sk, int *level,
- int *optname, char __user *optval,
+ int *optname, sockptr_t optval,
int *optlen, char **kernel_optval)
{
struct cgroup *cgrp = sock_cgroup_ptr(&sk->sk_cgrp_data);
@@ -1808,7 +1808,8 @@ int __cgroup_bpf_run_filter_setsockopt(struct sock *sk, int *level,
ctx.optlen = *optlen;
- if (copy_from_user(ctx.optval, optval, min(*optlen, max_optlen)) != 0) {
+ if (copy_from_sockptr(ctx.optval, optval,
+ min(*optlen, max_optlen))) {
ret = -EFAULT;
goto out;
}
diff --git a/net/socket.c b/net/socket.c
index 6fda5d011521..9ec9a8a07c0e 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -2287,7 +2287,7 @@ int __sys_setsockopt(int fd, int level, int optname, char __user *user_optval,
if (!in_compat_syscall())
err = BPF_CGROUP_RUN_PROG_SETSOCKOPT(sock->sk, &level, &optname,
- user_optval, &optlen,
+ optval, &optlen,
&kernel_optval);
if (err < 0)
goto out_put;
--
2.34.1
Add initial support for SOCKET_URING_OP_SETSOCKOPT. This new command is
similar to setsockopt(2). This implementation leverages the function
do_sock_setsockopt(), which is shared with the setsockopt() system call
path.
Note that userspace needs to keep the memory behind the pointer alive
until the operation is completed, i.e., the memory must not be
deallocated before the CQE is returned to userspace.
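The lifetime rule can be pictured with a small userspace model of a
deferred request. This is a sketch under the assumption that the request
stores the pointer and reads the buffer only when it runs; model_pending,
model_submit(), model_complete() and model_set_value() are hypothetical
names and do not correspond to io_uring internals.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* A queued request keeps the pointer, not a copy of the data. */
struct model_pending {
	const void *optval;
	unsigned int optlen;
};

static void model_submit(struct model_pending *req, const void *optval,
			 unsigned int optlen)
{
	req->optval = optval;
	req->optlen = optlen;
}

/* The buffer is dereferenced only here, possibly long after submit. */
static int model_complete(const struct model_pending *req, int *applied)
{
	assert(req != NULL && applied != NULL);
	if (req->optlen != sizeof(*applied))
		return -1;
	memcpy(applied, req->optval, sizeof(*applied));
	return 0;
}

/* Correct ordering: allocate, submit, complete, and only then free. */
static int model_set_value(int value)
{
	struct model_pending req;
	int applied = 0;
	int *buf = malloc(sizeof(*buf));

	if (!buf)
		return -1;
	*buf = value;

	model_submit(&req, buf, sizeof(*buf));
	/* freeing buf at this point would leave req.optval dangling */
	if (model_complete(&req, &applied)) {
		free(buf);
		return -1;
	}
	free(buf);	/* safe: the request has completed */
	return applied;
}
```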
Signed-off-by: Breno Leitao <[email protected]>
---
include/uapi/linux/io_uring.h | 1 +
io_uring/uring_cmd.c | 17 +++++++++++++++++
2 files changed, 18 insertions(+)
diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 29efa02a4dcb..3b443da353ba 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -741,6 +741,7 @@ enum {
SOCKET_URING_OP_SIOCINQ = 0,
SOCKET_URING_OP_SIOCOUTQ,
SOCKET_URING_OP_GETSOCKOPT,
+ SOCKET_URING_OP_SETSOCKOPT,
};
#ifdef __cplusplus
diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
index c373e05ba9ce..bec4730fb208 100644
--- a/io_uring/uring_cmd.c
+++ b/io_uring/uring_cmd.c
@@ -193,6 +193,21 @@ static inline int io_uring_cmd_getsockopt(struct socket *sock,
return optlen;
}
+static inline int io_uring_cmd_setsockopt(struct socket *sock,
+ struct io_uring_cmd *cmd,
+ unsigned int issue_flags)
+{
+ void __user *optval = u64_to_user_ptr(READ_ONCE(cmd->sqe->optval));
+ bool compat = !!(issue_flags & IO_URING_F_COMPAT);
+ int optname = READ_ONCE(cmd->sqe->optname);
+ sockptr_t optval_s = USER_SOCKPTR(optval);
+ int optlen = READ_ONCE(cmd->sqe->optlen);
+ int level = READ_ONCE(cmd->sqe->level);
+
+ return do_sock_setsockopt(sock, compat, level, optname, optval_s,
+ optlen);
+}
+
#if defined(CONFIG_NET)
int io_uring_cmd_sock(struct io_uring_cmd *cmd, unsigned int issue_flags)
{
@@ -217,6 +232,8 @@ int io_uring_cmd_sock(struct io_uring_cmd *cmd, unsigned int issue_flags)
return arg;
case SOCKET_URING_OP_GETSOCKOPT:
return io_uring_cmd_getsockopt(sock, cmd, issue_flags);
+ case SOCKET_URING_OP_SETSOCKOPT:
+ return io_uring_cmd_setsockopt(sock, cmd, issue_flags);
default:
return -EOPNOTSUPP;
}
--
2.34.1
On Mon, 4 Sep 2023 09:24:53 -0700 Breno Leitao wrote:
> Patches 1-2: Modify the BPF hooks to support sockptr_t, so, these functions
> become flexible enough to accept user or kernel pointers for optval/optlen.
Have you seen:
https://lore.kernel.org/all/CAHk-=wgGV61xrG=gO0=dXH64o2TDWWrXn1mx-CX885JZ7h84Og@mail.gmail.com/
? I wasn't aware that Linus felt this way, now I wonder if having
sockptr_t spread will raise any red flags as this code flows back
to him.
On Tue, Sep 05, 2023 at 03:49:51PM -0700, Jakub Kicinski wrote:
> On Mon, 4 Sep 2023 09:24:53 -0700 Breno Leitao wrote:
> > Patches 1-2: Modify the BPF hooks to support sockptr_t, so, these functions
> > become flexible enough to accept user or kernel pointers for optval/optlen.
>
> Have you seen:
>
> https://lore.kernel.org/all/CAHk-=wgGV61xrG=gO0=dXH64o2TDWWrXn1mx-CX885JZ7h84Og@mail.gmail.com/
I haven't, but I think it will not affect this patchset *much*.
> ? I wasn't aware that Linus felt this way, now I wonder if having
> sockptr_t spread will raise any red flags as this code flows back
> to him.
I can change the io_uring API in a way that we can avoid these
sockptr_t changes completely.
My plan is to mimic what getsockopt(2) is doing in the io_uring cmd path,
with regard to optlen being a user pointer, instead of a value - which is
then translated to a KERNEL_SOCKPTR.
This way, this change doesn't need to touch any sockptr field.
Thanks for the heads-up
On Tue, Sep 05, 2023 at 08:32:15AM -0400, Gabriel Krisman Bertazi wrote:
> Breno Leitao <[email protected]> writes:
>
> > Protect io_uring_cmd_sock() to be called if CONFIG_NET is not set. If
> > network is not enabled, but io_uring is, then we want to return
> > -EOPNOTSUPP for any possible socket operation.
> >
> > This is helpful because io_uring_cmd_sock() can now call functions that
> > only exits if CONFIG_NET is enabled without having #ifdef CONFIG_NET
> > inside the function itself.
> >
> > Signed-off-by: Breno Leitao <[email protected]>
> > ---
> > io_uring/uring_cmd.c | 8 ++++++++
> > 1 file changed, 8 insertions(+)
> >
> > diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
> > index 60f843a357e0..6a91e1af7d05 100644
> > --- a/io_uring/uring_cmd.c
> > +++ b/io_uring/uring_cmd.c
> > @@ -167,6 +167,7 @@ int io_uring_cmd_import_fixed(u64 ubuf, unsigned long len, int rw,
> > }
> > EXPORT_SYMBOL_GPL(io_uring_cmd_import_fixed);
> >
> > +#if defined(CONFIG_NET)
> > int io_uring_cmd_sock(struct io_uring_cmd *cmd, unsigned int issue_flags)
> > {
> > struct socket *sock = cmd->file->private_data;
> > @@ -192,4 +193,11 @@ int io_uring_cmd_sock(struct io_uring_cmd *cmd, unsigned int issue_flags)
> > return -EOPNOTSUPP;
> > }
> > }
> > +#else
> > +int io_uring_cmd_sock(struct io_uring_cmd *cmd, unsigned int issue_flags)
> > +{
> > + return -EOPNOTSUPP;
> > +}
> > +#endif
> > +
> > EXPORT_SYMBOL_GPL(io_uring_cmd_sock);
>
> It doesn't make much sense to export the symbol on the !CONFIG_NET case.
> Usually, you'd make it a 'static inline' in the header file (even though
> it won't be ever inlined in this case):
>
> in include/linux/io_uring.h:
>
> #if defined(CONFIG_NET)
> int io_uring_cmd_sock(struct io_uring_cmd *cmd, unsigned int issue_flags);
> #else
> static inline int io_uring_cmd_sock(struct io_uring_cmd *cmd, unsigned int issue_flags)
> {
> return -EOPNOTSUPP;
> }
> #endif
>
> But this is a minor detail. I'd say to consider doing it if you end up doing
> another spin of the patchset. Other than that, looks good to me.
This makes sense, and I will add the symbol export inside the
"if defined(CONFIG_NET)" block, since I need to respin this patchset to
address the sockptr_t concern.
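For illustration only, this is roughly the shape being discussed, written
as a userspace sketch and not as the actual respin. MODEL_CONFIG_NET,
MODEL_EXPORT_SYMBOL_GPL() and model_cmd_sock() are hypothetical stand-ins;
the export macro expands to a harmless static assertion so the file
compiles outside the kernel.

```c
#include <assert.h>
#include <errno.h>

/* Stand-in for the export macro: expands to a no-op declaration. */
#define MODEL_EXPORT_SYMBOL_GPL(sym) _Static_assert(1, "export " #sym)

/* Build with -DMODEL_CONFIG_NET to select the real implementation. */
#if defined(MODEL_CONFIG_NET)
int model_cmd_sock(int cmd_op)
{
	/* the real socket command dispatch would live here */
	return cmd_op == 0 ? 0 : -EOPNOTSUPP;
}
/* the export sits inside the guarded block, next to the definition */
MODEL_EXPORT_SYMBOL_GPL(model_cmd_sock);
#else
/* header-style stub: nothing to export when networking is compiled out */
static inline int model_cmd_sock(int cmd_op)
{
	(void)cmd_op;
	return -EOPNOTSUPP;
}
#endif
```

Built without -DMODEL_CONFIG_NET, every call resolves to the stub and
returns -EOPNOTSUPP, and no symbol is exported.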
Thanks for the good reviews!
Hello Jakub,
On Tue, Sep 05, 2023 at 03:49:51PM -0700, Jakub Kicinski wrote:
> On Mon, 4 Sep 2023 09:24:53 -0700 Breno Leitao wrote:
> > Patches 1-2: Modify the BPF hooks to support sockptr_t, so, these functions
> > become flexible enough to accept user or kernel pointers for optval/optlen.
>
> Have you seen:
>
> https://lore.kernel.org/all/CAHk-=wgGV61xrG=gO0=dXH64o2TDWWrXn1mx-CX885JZ7h84Og@mail.gmail.com/
>
> ? I wasn't aware that Linus felt this way, now I wonder if having
> sockptr_t spread will raise any red flags as this code flows back
> to him.
Thanks for the heads-up. I've been thinking about it for a while and I'd
like to hear what are the next steps here.
Let me first back up and state where we are, and what is the current
situation:
1) __sys_getsockopt() uses __user pointers for both optval and optlen
2) For io_uring command, Jens[1] suggested we get optlen from the io_uring
sqe, which is a kernel pointer/value.
Thus, we need to make the common code (callbacks) able to handle __user
and kernel pointers (for optlen, at least).
From a proto_ops callback perspective, ->setsockopt() uses sockptr.
int (*setsockopt)(struct socket *sock, int level,
int optname, sockptr_t optval,
unsigned int optlen);
getsockopt() uses sockptr_t for level=SOL_SOCKET:
int sk_getsockopt(struct sock *sk, int level, int optname,
sockptr_t optval, sockptr_t optlen)
But not for the other levels:
int (*getsockopt)(struct socket *sock, int level,
int optname, char __user *optval, int __user *optlen);
That said, if this patchset shouldn't use sockptr anymore, what is the
recommendation?
If we move this patchset to use iov_iter instead of sockptr, then I
understand we want to move *all* these callbacks to use iov_vec. Is this
the right direction?
Thanks for the guidance!
[1] https://lore.kernel.org/all/[email protected]/
On Fri, Oct 6, 2023 at 10:45 AM Breno Leitao <[email protected]> wrote:
>
> Hello Jakub,
>
> On Tue, Sep 05, 2023 at 03:49:51PM -0700, Jakub Kicinski wrote:
> > On Mon, 4 Sep 2023 09:24:53 -0700 Breno Leitao wrote:
> > > Patches 1-2: Modify the BPF hooks to support sockptr_t, so, these functions
> > > become flexible enough to accept user or kernel pointers for optval/optlen.
> >
> > Have you seen:
> >
> > https://lore.kernel.org/all/CAHk-=wgGV61xrG=gO0=dXH64o2TDWWrXn1mx-CX885JZ7h84Og@mail.gmail.com/
> >
> > ? I wasn't aware that Linus felt this way, now I wonder if having
> > sockptr_t spread will raise any red flags as this code flows back
> > to him.
>
> Thanks for the heads-up. I've been thinking about it for a while and I'd
> like to hear what are the next steps here.
>
> Let me first back up and state where we are, and what is the current
> situation:
>
> 1) __sys_getsockopt() uses __user pointers for both optval and optlen
> 2) For io_uring command, Jens[1] suggested we get optlen from the io_uring
> sqe, which is a kernel pointer/value.
>
> Thus, we need to make the common code (callbacks) able to handle __user
> and kernel pointers (for optlen, at least).
>
> From a proto_ops callback perspective, ->setsockopt() uses sockptr.
>
> int (*setsockopt)(struct socket *sock, int level,
> int optname, sockptr_t optval,
> unsigned int optlen);
>
> Getsockopt() uses sockptr() for level=SOL_SOCKET:
>
> int sk_getsockopt(struct sock *sk, int level, int optname,
> sockptr_t optval, sockptr_t optlen)
>
> But not for the other levels:
>
> int (*getsockopt)(struct socket *sock, int level,
> int optname, char __user *optval, int __user *optlen);
>
>
> That said, if this patchset shouldn't use sockptr anymore, what is the
> recommendation?
>
> If we move this patchset to use iov_iter instead of sockptr, then I
> understand we want to move *all* these callbacks to use iov_vec. Is this
> the right direction?
>
> Thanks for the guidance!
>
> [1] https://lore.kernel.org/all/[email protected]/
Since sockptr_t is already used by __sys_setsockopt and
__sys_setsockopt, patches 1 and 2 don't introduce any new sockptr code
paths.
setsockopt callbacks also already use sockptr as of commit
a7b75c5a8c41 ("net: pass a sockptr_t into ->setsockopt").
getsockopt callbacks do take user pointers, just not sockptr.
Is the only issue right now the optlen kernel pointer?
On Mon, Oct 09, 2023 at 03:11:05AM -0700, Willem de Bruijn wrote:
> On Fri, Oct 6, 2023 at 10:45 AM Breno Leitao <[email protected]> wrote:
> > Let me first back up and state where we are, and what is the current
> > situation:
> >
> > 1) __sys_getsockopt() uses __user pointers for both optval and optlen
> > 2) For io_uring command, Jens[1] suggested we get optlen from the io_uring
> > sqe, which is a kernel pointer/value.
> >
> > Thus, we need to make the common code (callbacks) able to handle __user
> > and kernel pointers (for optlen, at least).
> >
> > From a proto_ops callback perspective, ->setsockopt() uses sockptr.
> >
> > int (*setsockopt)(struct socket *sock, int level,
> > int optname, sockptr_t optval,
> > unsigned int optlen);
> >
> > Getsockopt() uses sockptr() for level=SOL_SOCKET:
> >
> > int sk_getsockopt(struct sock *sk, int level, int optname,
> > sockptr_t optval, sockptr_t optlen)
> >
> > But not for the other levels:
> >
> > int (*getsockopt)(struct socket *sock, int level,
> > int optname, char __user *optval, int __user *optlen);
> >
> >
> > That said, if this patchset shouldn't use sockptr anymore, what is the
> > recommendation?
> >
> > If we move this patchset to use iov_iter instead of sockptr, then I
> > understand we want to move *all* these callbacks to use iov_vec. Is this
> > the right direction?
> >
> > Thanks for the guidance!
> >
> > [1] https://lore.kernel.org/all/[email protected]/
>
> Since sockptr_t is already used by __sys_setsockopt and
> __sys_setsockopt, patches 1 and 2 don't introduce any new sockptr code
> paths.
>
> setsockopt callbacks also already use sockptr as of commit
> a7b75c5a8c41 ("net: pass a sockptr_t into ->setsockopt").
>
> getsockopt callbacks do take user pointers, just not sockptr.
>
> Is the only issue right now the optlen kernel pointer?
Correct. The current discussion is only related to optlen in the
getsockopt() callbacks (invoked when level != SOL_SOCKET). Everything
else (getsockopt(level=SOL_SOCKET..) and setsockopt) is using sockptr.
Is it bad if we review/merge this code as is (using sockptr), and start
the iov_iter/getsockopt() refactor in a follow-up thread?
Thanks!
On Mon, 9 Oct 2023 06:28:00 -0700 Breno Leitao wrote:
> Correct. The current discussion is only related to optlen in the
> getsockopt() callbacks (invoked when level != SOL_SOCKET). Everything
> else (getsockopt(level=SOL_SOCKET..) and setsockopt) is using sockptr.
>
> Is it bad if we review/merge this code as is (using sockptr), and start
> the iov_iter/getsockopt() refactor in a follow-up thread?
Sorry for the delay, I only looked at the code now :S
Agreed that there's no need to worry about the sockptr spread
in this series. It looks good to go in.