This adds the capability for seccomp notifier listeners to add file
descriptors in response to a seccomp notification. This is useful for
syscalls in which the previous capabilities were not sufficient. The
current mechanism works well for syscalls that either have side effects
that are system / namespace wide (mount), or that operate on a specific
set of registers (reboot, mknod), and don't require dereferencing pointers.
The problem with derefencing pointers in a supervisor is that it leaves
us vulnerable to TOC-TOU [1] style attacks. For syscalls that had a direct
effect on file descriptors pidfd_getfd was added, allowing for those file
descriptors to be directly operated upon by the supervisor [2].
Unfortunately, this leaves system calls which return file descriptors
out of the picture. These are fairly common syscalls, such as openat,
socket, and perf_event_open that return file descriptors, and have
arguments that are pointers. These require that the supervisor is able to
verify the arguments, make the call on behalf of the process on hand,
and pass back the resulting file descriptor. This is where addfd comes
into play.
There is an additional flag that allows you to "set" an FD, rather than
add it with an arbitrary number. This has dup2 style semantics, and
installs the new file at that file descriptor, and atomically closes
the old one if it existed. This is useful for a particular use case
that we have, in which we want to swap out AF_INET sockets for AF_UNIX,
AF_INET6, and sockets in another namespace when doing "upconversion".
My specific usecase at Netflix is to enable our IPv4-IPv6 transition
mechanism, in which we our namespaces have no real IPv4 reachability,
and when it comes time to do a connect(2), we get a socket from a
namespace with global IPv4 reachability.
In addition, we intend to use it for our servicemesh, and where our
service mesh needs to intercept traffic ingress traffic, the addfd
capability will act as a mechanism to do socket activation.
Addfd is not implemented as a separate syscall, a la pidfd_getfd, as
VFS makes some optimizations in regards to the fdtable, and assumes
that they are not modified by external processes. Although a mechanism
that scheduled something in the context of the task could work, it is
somewhat simpler to do it in the context of the ioctl as we control
the task while in kernel. In addition there are not obvious needs
for this beyond seccomp notifier.
This mechanism leaves a potential issue that if the manager is
interrupted while injecting FDs, the child process will be left with
leaked / dangling FDs. This may lead to undefined behaviour. A
mechanism to work around this is to extend the structure and add a
"rollback" mechanism for FDs to be closed if things fail.
This introduces a new helper -- file_receive, which is responsible
for moving fds across processes. The helper replaces code in
SCM_RIGHTS. In SCM_RIGHTS compat codepath there was a bug that
resulted in this not being set all. This fixes that bug, and should
be cherry-picked into long-term. The file_receive change should
probably go into stable. The file_receive code also replaced the
receive fd logic in pidfd_getfd. This is somewhat contrary to my
original view[5], but I think it is best for the principal of
least surprise to adopt it. This should be cherry-picked into stable.
I tested this on amd64 with the x86-64 and x32 ABIs.
Given there is no testing infrastructure for cgroup v1, I opted to
forgo adding new tests there as it is considered deprecated.
Changes since v2:
* Introducion of the file_receive helper which hoists out logic to
manipulate file descriptors outside of seccomp.c to file.c
* Small fix that manipulated the socket's cgroup even when the
receive failed
* seccomp struct layout
Changes since v1:
* find_notification has been cleaned up slightly, and it replaces a use
case in send as well.
* Fixes ref counting rules to get / release references in the ioctl side,
rather than the seccomp notifier side [3].
* Removes the optional move flag, and opts into SCM_RIGHTS
* Rearranges the seccomp_notif_addfd datastructure for greater user
clarity [4]. In order to avoid unnamed padding it makes size u64,
which is a little bit of a waste of space.
* Changes error codes to return ESRCH upon the process going away on
notification, and EINPROGRESS is the notification is in an unexpected
state (and added tests for this behaviour)
[1]: https://lore.kernel.org/lkml/[email protected]/
[2]: https://lore.kernel.org/lkml/[email protected]/
[3]: https://lore.kernel.org/lkml/[email protected]/
[4]: https://lore.kernel.org/lkml/20200525135036.vp2nmmx42y7dfznf@wittgenstein/
[5]: https://lore.kernel.org/lkml/[email protected]/
Sargun Dhillon (4):
fs, net: Standardize on file_receive helper to move fds across
processes
pid: Use file_receive helper to copy FDs
seccomp: Introduce addfd ioctl to seccomp user notifier
selftests/seccomp: Test SECCOMP_IOCTL_NOTIF_ADDFD
fs/file.c | 56 ++++++
include/linux/file.h | 2 +
include/uapi/linux/seccomp.h | 25 +++
kernel/pid.c | 20 +-
kernel/seccomp.c | 184 +++++++++++++++++-
net/compat.c | 10 +-
net/core/scm.c | 14 +-
tools/testing/selftests/seccomp/seccomp_bpf.c | 183 +++++++++++++++++
8 files changed, 467 insertions(+), 27 deletions(-)
--
2.25.1
Test whether we can add file descriptors in response to notifications.
This injects the file descriptors via notifications, and then uses
kcmp to determine whether or not it has been successful.
It also includes some basic sanity checking for arguments.
Signed-off-by: Sargun Dhillon <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Chris Palmer <[email protected]>
Cc: Christian Brauner <[email protected]>
Cc: Jann Horn <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Robert Sesek <[email protected]>
Cc: Tycho Andersen <[email protected]>
Cc: Matt Denton <[email protected]>
Cc: [email protected]
Cc: [email protected]
---
tools/testing/selftests/seccomp/seccomp_bpf.c | 183 ++++++++++++++++++
1 file changed, 183 insertions(+)
diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c b/tools/testing/selftests/seccomp/seccomp_bpf.c
index 402ccb3a4e52..a786b1734ddd 100644
--- a/tools/testing/selftests/seccomp/seccomp_bpf.c
+++ b/tools/testing/selftests/seccomp/seccomp_bpf.c
@@ -45,6 +45,7 @@
#include <sys/socket.h>
#include <sys/ioctl.h>
#include <linux/kcmp.h>
+#include <sys/resource.h>
#include <unistd.h>
#include <sys/syscall.h>
@@ -182,6 +183,12 @@ struct seccomp_metadata {
#define SECCOMP_IOCTL_NOTIF_SEND SECCOMP_IOWR(1, \
struct seccomp_notif_resp)
#define SECCOMP_IOCTL_NOTIF_ID_VALID SECCOMP_IOR(2, __u64)
+/* On success, the return value is the remote process's added fd number */
+#define SECCOMP_IOCTL_NOTIF_ADDFD SECCOMP_IOR(3, \
+ struct seccomp_notif_addfd)
+
+/* valid flags for seccomp_notif_addfd */
+#define SECCOMP_ADDFD_FLAG_SETFD (1UL << 0) /* Specify remote fd */
struct seccomp_notif {
__u64 id;
@@ -202,6 +209,15 @@ struct seccomp_notif_sizes {
__u16 seccomp_notif_resp;
__u16 seccomp_data;
};
+
+struct seccomp_notif_addfd {
+ __u64 size;
+ __u64 id;
+ __u32 flags;
+ __u32 srcfd;
+ __u32 newfd;
+ __u32 newfd_flags;
+};
#endif
#ifndef PTRACE_EVENTMSG_SYSCALL_ENTRY
@@ -3822,6 +3838,173 @@ TEST(user_notification_filter_empty_threaded)
EXPECT_GT((pollfd.revents & POLLHUP) ?: 0, 0);
}
+TEST(user_notification_sendfd)
+{
+ pid_t pid;
+ long ret;
+ int status, listener, memfd;
+ struct seccomp_notif_addfd addfd = {};
+ struct seccomp_notif req = {};
+ struct seccomp_notif_resp resp = {};
+ /* 100 ms */
+ struct timespec delay = { .tv_nsec = 100000000 };
+
+ memfd = memfd_create("test", 0);
+ ASSERT_GE(memfd, 0);
+
+ ret = prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
+ ASSERT_EQ(0, ret) {
+ TH_LOG("Kernel does not support PR_SET_NO_NEW_PRIVS!");
+ }
+
+ /* Check that the basic notification machinery works */
+ listener = user_trap_syscall(__NR_getppid,
+ SECCOMP_FILTER_FLAG_NEW_LISTENER);
+ ASSERT_GE(listener, 0);
+
+ pid = fork();
+ ASSERT_GE(pid, 0);
+
+ if (pid == 0) {
+ if (syscall(__NR_getppid) != USER_NOTIF_MAGIC)
+ exit(1);
+ exit(syscall(__NR_getppid) != USER_NOTIF_MAGIC);
+ }
+
+ ASSERT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_RECV, &req), 0);
+
+ addfd.size = sizeof(addfd);
+ addfd.srcfd = memfd;
+ addfd.newfd_flags = O_CLOEXEC;
+ addfd.newfd = 0;
+ addfd.id = req.id;
+ addfd.flags = 0xff;
+
+ /* Verify bad flags cannot be set */
+ EXPECT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_ADDFD, &addfd), -1);
+ EXPECT_EQ(errno, EINVAL);
+
+ /* Verify that remote_fd cannot be set without setting flags */
+ addfd.flags = 0;
+ addfd.newfd = 1;
+ EXPECT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_ADDFD, &addfd), -1);
+ EXPECT_EQ(errno, EINVAL);
+
+ /* Verify we can set an arbitrary remote fd */
+ addfd.newfd = 0;
+
+ ret = ioctl(listener, SECCOMP_IOCTL_NOTIF_ADDFD, &addfd);
+ EXPECT_GE(ret, 0);
+ EXPECT_EQ(filecmp(getpid(), pid, memfd, ret), 0);
+
+ /* Verify we can set a specific remote fd */
+ addfd.newfd = 42;
+ addfd.flags = SECCOMP_ADDFD_FLAG_SETFD;
+
+ EXPECT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_ADDFD, &addfd), 42);
+ EXPECT_EQ(filecmp(getpid(), pid, memfd, 42), 0);
+
+ resp.id = req.id;
+ resp.error = 0;
+ resp.val = USER_NOTIF_MAGIC;
+
+ EXPECT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_SEND, &resp), 0);
+
+ /*
+ * This sets the ID of the ADD FD to the last request plus 1. The
+ * notification ID increments 1 per notification.
+ */
+ addfd.id = req.id + 1;
+
+ /* This spins until the underlying notification is generated */
+ while (ioctl(listener, SECCOMP_IOCTL_NOTIF_ADDFD, &addfd) != -1 &&
+ errno != -EINPROGRESS)
+ nanosleep(&delay, NULL);
+
+ memset(&req, 0, sizeof(req));
+ ASSERT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_RECV, &req), 0);
+ ASSERT_EQ(addfd.id, req.id);
+
+ resp.id = req.id;
+ resp.error = 0;
+ resp.val = USER_NOTIF_MAGIC;
+ EXPECT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_SEND, &resp), 0);
+
+
+ EXPECT_EQ(waitpid(pid, &status, 0), pid);
+ EXPECT_EQ(true, WIFEXITED(status));
+ EXPECT_EQ(0, WEXITSTATUS(status));
+
+ close(memfd);
+}
+
+TEST(user_notification_sendfd_rlimit)
+{
+ pid_t pid;
+ long ret;
+ int status, listener, memfd;
+ struct seccomp_notif_addfd addfd = {};
+ struct seccomp_notif req = {};
+ struct seccomp_notif_resp resp = {};
+ const struct rlimit lim = {
+ .rlim_cur = 0,
+ .rlim_max = 0,
+ };
+
+ memfd = memfd_create("test", 0);
+ ASSERT_GE(memfd, 0);
+
+ ret = prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
+ ASSERT_EQ(0, ret) {
+ TH_LOG("Kernel does not support PR_SET_NO_NEW_PRIVS!");
+ }
+
+ /* Check that the basic notification machinery works */
+ listener = user_trap_syscall(__NR_getppid,
+ SECCOMP_FILTER_FLAG_NEW_LISTENER);
+ ASSERT_GE(listener, 0);
+
+ pid = fork();
+ ASSERT_GE(pid, 0);
+
+ if (pid == 0)
+ exit(syscall(__NR_getppid) != USER_NOTIF_MAGIC);
+
+
+ ASSERT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_RECV, &req), 0);
+
+ ASSERT_EQ(prlimit(pid, RLIMIT_NOFILE, &lim, NULL), 0);
+
+ addfd.size = sizeof(addfd);
+ addfd.srcfd = memfd;
+ addfd.newfd_flags = O_CLOEXEC;
+ addfd.newfd = 0;
+ addfd.id = req.id;
+ addfd.flags = 0;
+
+ /* Should probably spot check /proc/sys/fs/file-nr */
+ EXPECT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_ADDFD, &addfd), -1);
+ EXPECT_EQ(errno, EMFILE);
+
+ addfd.newfd = 100;
+ addfd.flags = SECCOMP_ADDFD_FLAG_SETFD;
+ EXPECT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_ADDFD, &addfd), -1);
+ EXPECT_EQ(errno, EBADF);
+
+ resp.id = req.id;
+ resp.error = 0;
+ resp.val = USER_NOTIF_MAGIC;
+
+ EXPECT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_SEND, &resp), 0);
+
+
+ EXPECT_EQ(waitpid(pid, &status, 0), pid);
+ EXPECT_EQ(true, WIFEXITED(status));
+ EXPECT_EQ(0, WEXITSTATUS(status));
+
+ close(memfd);
+}
+
/*
* TODO:
* - expand NNP testing
--
2.25.1
The code to copy file descriptors was duplicated in pidfd_getfd.
Rather than continue to duplicate it, this hoists the code out of
kernel/pid.c and uses the newly added file_receive helper.
Earlier, when this was implemented there was some back-and-forth
about how the semantics should work around copying around file
descriptors [1], and it was decided that the default behaviour
should be to not modify cgroup data. As a matter of least surprise,
this approach follows the default semantics as presented by SCM_RIGHTS.
In the future, a flag can be added to avoid manipulating the cgroup
data on copy.
[1]: https://lore.kernel.org/lkml/[email protected]/
Signed-off-by: Sargun Dhillon <[email protected]>
Suggested-by: Kees Cook <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Christian Brauner <[email protected]>
Cc: Daniel Wagner <[email protected]>
Cc: David S. Miller <[email protected]>
Cc: Jann Horn <[email protected]>
Cc: John Fastabend <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Tycho Andersen <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
---
kernel/pid.c | 20 +++++++++-----------
1 file changed, 9 insertions(+), 11 deletions(-)
diff --git a/kernel/pid.c b/kernel/pid.c
index c835b844aca7..1642cf940aa1 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -606,7 +606,7 @@ static int pidfd_getfd(struct pid *pid, int fd)
{
struct task_struct *task;
struct file *file;
- int ret;
+ int ret, err;
task = get_pid_task(pid, PIDTYPE_PID);
if (!task)
@@ -617,18 +617,16 @@ static int pidfd_getfd(struct pid *pid, int fd)
if (IS_ERR(file))
return PTR_ERR(file);
- ret = security_file_receive(file);
- if (ret) {
- fput(file);
- return ret;
- }
-
ret = get_unused_fd_flags(O_CLOEXEC);
- if (ret < 0)
- fput(file);
- else
- fd_install(ret, file);
+ if (ret >= 0) {
+ err = file_receive(ret, file);
+ if (err) {
+ put_unused_fd(ret);
+ ret = err;
+ }
+ }
+ fput(file);
return ret;
}
--
2.25.1
Previously there were two chunks of code where the logic to receive file
descriptors was duplicated in net. The compat version of copying
file descriptors via SCM_RIGHTS did not have logic to update cgroups.
Logic to change the cgroup data was added in:
commit 48a87cc26c13 ("net: netprio: fd passed in SCM_RIGHTS datagram not set correctly")
commit d84295067fc7 ("net: net_cls: fd passed in SCM_RIGHTS datagram not set correctly")
This was not copied to the compat path. This commit fixes that, and thus
should be cherry-picked into stable.
This introduces a helper (file_receive) which encapsulates the logic for
handling calling security hooks as well as manipulating cgroup information.
This helper can then be used other places in the kernel where file
descriptors are copied between processes
I tested cgroup classid setting on both the compat (x32) path, and the
native path to ensure that when moving the file descriptor the classid
is set.
Signed-off-by: Sargun Dhillon <[email protected]>
Suggested-by: Kees Cook <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Christian Brauner <[email protected]>
Cc: Daniel Wagner <[email protected]>
Cc: David S. Miller <[email protected]>
Cc: Jann Horn <[email protected]>,
Cc: John Fastabend <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Tycho Andersen <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
---
fs/file.c | 35 +++++++++++++++++++++++++++++++++++
include/linux/file.h | 1 +
net/compat.c | 10 +++++-----
net/core/scm.c | 14 ++++----------
4 files changed, 45 insertions(+), 15 deletions(-)
diff --git a/fs/file.c b/fs/file.c
index abb8b7081d7a..5afd76fca8c2 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -18,6 +18,9 @@
#include <linux/bitops.h>
#include <linux/spinlock.h>
#include <linux/rcupdate.h>
+#include <net/sock.h>
+#include <net/netprio_cgroup.h>
+#include <net/cls_cgroup.h>
unsigned int sysctl_nr_open __read_mostly = 1024*1024;
unsigned int sysctl_nr_open_min = BITS_PER_LONG;
@@ -931,6 +934,38 @@ int replace_fd(unsigned fd, struct file *file, unsigned flags)
return err;
}
+/*
+ * File Receive - Receive a file from another process
+ *
+ * This function is designed to receive files from other tasks. It encapsulates
+ * logic around security and cgroups. The file descriptor provided must be a
+ * freshly allocated (unused) file descriptor.
+ *
+ * This helper does not consume a reference to the file, so the caller must put
+ * their reference.
+ *
+ * Returns 0 upon success.
+ */
+int file_receive(int fd, struct file *file)
+{
+ struct socket *sock;
+ int err;
+
+ err = security_file_receive(file);
+ if (err)
+ return err;
+
+ fd_install(fd, get_file(file));
+
+ sock = sock_from_file(file, &err);
+ if (sock) {
+ sock_update_netprioidx(&sock->sk->sk_cgrp_data);
+ sock_update_classid(&sock->sk->sk_cgrp_data);
+ }
+
+ return 0;
+}
+
static int ksys_dup3(unsigned int oldfd, unsigned int newfd, int flags)
{
int err = -EBADF;
diff --git a/include/linux/file.h b/include/linux/file.h
index 142d102f285e..7b56dc23e560 100644
--- a/include/linux/file.h
+++ b/include/linux/file.h
@@ -94,4 +94,5 @@ extern void fd_install(unsigned int fd, struct file *file);
extern void flush_delayed_fput(void);
extern void __fput_sync(struct file *);
+extern int file_receive(int fd, struct file *file);
#endif /* __LINUX_FILE_H */
diff --git a/net/compat.c b/net/compat.c
index 4bed96e84d9a..8ac0e7e09208 100644
--- a/net/compat.c
+++ b/net/compat.c
@@ -293,9 +293,6 @@ void scm_detach_fds_compat(struct msghdr *kmsg, struct scm_cookie *scm)
for (i = 0, cmfptr = (int __user *) CMSG_COMPAT_DATA(cm); i < fdmax; i++, cmfptr++) {
int new_fd;
- err = security_file_receive(fp[i]);
- if (err)
- break;
err = get_unused_fd_flags(MSG_CMSG_CLOEXEC & kmsg->msg_flags
? O_CLOEXEC : 0);
if (err < 0)
@@ -306,8 +303,11 @@ void scm_detach_fds_compat(struct msghdr *kmsg, struct scm_cookie *scm)
put_unused_fd(new_fd);
break;
}
- /* Bump the usage count and install the file. */
- fd_install(new_fd, get_file(fp[i]));
+ err = file_receive(new_fd, fp[i]);
+ if (err) {
+ put_unused_fd(new_fd);
+ break;
+ }
}
if (i > 0) {
diff --git a/net/core/scm.c b/net/core/scm.c
index dc6fed1f221c..ba93abf2881b 100644
--- a/net/core/scm.c
+++ b/net/core/scm.c
@@ -303,11 +303,7 @@ void scm_detach_fds(struct msghdr *msg, struct scm_cookie *scm)
for (i=0, cmfptr=(__force int __user *)CMSG_DATA(cm); i<fdmax;
i++, cmfptr++)
{
- struct socket *sock;
int new_fd;
- err = security_file_receive(fp[i]);
- if (err)
- break;
err = get_unused_fd_flags(MSG_CMSG_CLOEXEC & msg->msg_flags
? O_CLOEXEC : 0);
if (err < 0)
@@ -318,13 +314,11 @@ void scm_detach_fds(struct msghdr *msg, struct scm_cookie *scm)
put_unused_fd(new_fd);
break;
}
- /* Bump the usage count and install the file. */
- sock = sock_from_file(fp[i], &err);
- if (sock) {
- sock_update_netprioidx(&sock->sk->sk_cgrp_data);
- sock_update_classid(&sock->sk->sk_cgrp_data);
+ err = file_receive(new_fd, fp[i]);
+ if (err) {
+ put_unused_fd(new_fd);
+ break;
}
- fd_install(new_fd, get_file(fp[i]));
}
if (i > 0)
--
2.25.1
Thanks for working on this, Sargun! I'll briefly interrupt the code
review to explain why this is an important enhancement for our
application. I’m posting this message on behalf of the Chromium project,
which powers Google Chrome and several other open-source browsers (Edge,
Brave, Yandex Browser, etc.).
Chromium uses Seccomp-BPF to reduce the kernel attack surface available
to our child processes. These processes directly parse and manipulate
untrustworthy data downloaded from the Internet, an inherently risky
activity. We rely on Seccomp-BPF (along with other sandboxing
technologies) to help limit the potential damage to a system in the
event that the process is compromised.
One major blocking issue for us is filesystem-related syscalls. Our
default sandbox policy does not permit any filesystem access. Certain
child processes are permitted to access files on an explicit allow-list,
which we enforce by using SECCOMP_RET_TRAP on openat() (and other
syscalls that take a pathname), the signal handler for which forwards
the request to an unsandboxed broker process, and then passes the
resulting FD back via an SCM_RIGHTS message. However, this doesn’t work
if signals are disabled when the child process performs the system call.
Significantly, glibc is beginning to disable signals around entire
library functions. So, we are investigating the SECCOMP_RET_USER_NOTIF
facilities to replace our current signals-based mechanism.
The issue is that there is currently no good way to pass an FD from the
tracer back to the tracee when using SECCOMP_RET_USER_NOTIF, even though
this is a normal task for a system call implementation.
So, we need to switch to SECCOMP_RET_USER_NOTIF before glibc
incompatibilities invisibly break our Linux users. We are very eager to
see the addfd ioctl merged, since it would allow us to implement our file
brokering sandbox.
Regards,
Robert Sesek
Chromium Platform Security Team
On Tue, Jun 2, 2020 at 9:13 PM Sargun Dhillon <[email protected]> wrote:
>
> This adds the capability for seccomp notifier listeners to add file
> descriptors in response to a seccomp notification. This is useful for
> syscalls in which the previous capabilities were not sufficient. The
> current mechanism works well for syscalls that either have side effects
> that are system / namespace wide (mount), or that operate on a specific
> set of registers (reboot, mknod), and don't require dereferencing pointers.
> The problem with derefencing pointers in a supervisor is that it leaves
> us vulnerable to TOC-TOU [1] style attacks. For syscalls that had a direct
> effect on file descriptors pidfd_getfd was added, allowing for those file
> descriptors to be directly operated upon by the supervisor [2].
>
> Unfortunately, this leaves system calls which return file descriptors
> out of the picture. These are fairly common syscalls, such as openat,
> socket, and perf_event_open that return file descriptors, and have
> arguments that are pointers. These require that the supervisor is able to
> verify the arguments, make the call on behalf of the process on hand,
> and pass back the resulting file descriptor. This is where addfd comes
> into play.
>
> There is an additional flag that allows you to "set" an FD, rather than
> add it with an arbitrary number. This has dup2 style semantics, and
> installs the new file at that file descriptor, and atomically closes
> the old one if it existed. This is useful for a particular use case
> that we have, in which we want to swap out AF_INET sockets for AF_UNIX,
> AF_INET6, and sockets in another namespace when doing "upconversion".
>
> My specific usecase at Netflix is to enable our IPv4-IPv6 transition
> mechanism, in which we our namespaces have no real IPv4 reachability,
> and when it comes time to do a connect(2), we get a socket from a
> namespace with global IPv4 reachability.
>
> In addition, we intend to use it for our servicemesh, and where our
> service mesh needs to intercept traffic ingress traffic, the addfd
> capability will act as a mechanism to do socket activation.
>
> Addfd is not implemented as a separate syscall, a la pidfd_getfd, as
> VFS makes some optimizations in regards to the fdtable, and assumes
> that they are not modified by external processes. Although a mechanism
> that scheduled something in the context of the task could work, it is
> somewhat simpler to do it in the context of the ioctl as we control
> the task while in kernel. In addition there are not obvious needs
> for this beyond seccomp notifier.
>
> This mechanism leaves a potential issue that if the manager is
> interrupted while injecting FDs, the child process will be left with
> leaked / dangling FDs. This may lead to undefined behaviour. A
> mechanism to work around this is to extend the structure and add a
> "rollback" mechanism for FDs to be closed if things fail.
>
> This introduces a new helper -- file_receive, which is responsible
> for moving fds across processes. The helper replaces code in
> SCM_RIGHTS. In SCM_RIGHTS compat codepath there was a bug that
> resulted in this not being set all. This fixes that bug, and should
> be cherry-picked into long-term. The file_receive change should
> probably go into stable. The file_receive code also replaced the
> receive fd logic in pidfd_getfd. This is somewhat contrary to my
> original view[5], but I think it is best for the principal of
> least surprise to adopt it. This should be cherry-picked into stable.
>
> I tested this on amd64 with the x86-64 and x32 ABIs.
>
> Given there is no testing infrastructure for cgroup v1, I opted to
> forgo adding new tests there as it is considered deprecated.
>
> Changes since v2:
> * Introducion of the file_receive helper which hoists out logic to
> manipulate file descriptors outside of seccomp.c to file.c
> * Small fix that manipulated the socket's cgroup even when the
> receive failed
> * seccomp struct layout
> Changes since v1:
> * find_notification has been cleaned up slightly, and it replaces a use
> case in send as well.
> * Fixes ref counting rules to get / release references in the ioctl side,
> rather than the seccomp notifier side [3].
> * Removes the optional move flag, and opts into SCM_RIGHTS
> * Rearranges the seccomp_notif_addfd datastructure for greater user
> clarity [4]. In order to avoid unnamed padding it makes size u64,
> which is a little bit of a waste of space.
> * Changes error codes to return ESRCH upon the process going away on
> notification, and EINPROGRESS is the notification is in an unexpected
> state (and added tests for this behaviour)
>
> [1]: https://lore.kernel.org/lkml/[email protected]/
> [2]: https://lore.kernel.org/lkml/[email protected]/
> [3]: https://lore.kernel.org/lkml/[email protected]/
> [4]: https://lore.kernel.org/lkml/20200525135036.vp2nmmx42y7dfznf@wittgenstein/
> [5]: https://lore.kernel.org/lkml/[email protected]/
>
> Sargun Dhillon (4):
> fs, net: Standardize on file_receive helper to move fds across
> processes
> pid: Use file_receive helper to copy FDs
> seccomp: Introduce addfd ioctl to seccomp user notifier
> selftests/seccomp: Test SECCOMP_IOCTL_NOTIF_ADDFD
>
> fs/file.c | 56 ++++++
> include/linux/file.h | 2 +
> include/uapi/linux/seccomp.h | 25 +++
> kernel/pid.c | 20 +-
> kernel/seccomp.c | 184 +++++++++++++++++-
> net/compat.c | 10 +-
> net/core/scm.c | 14 +-
> tools/testing/selftests/seccomp/seccomp_bpf.c | 183 +++++++++++++++++
> 8 files changed, 467 insertions(+), 27 deletions(-)
>
> --
> 2.25.1
>
On Wed, Jun 3, 2020 at 4:42 PM Kees Cook <[email protected]> wrote:
>
> On Tue, Jun 02, 2020 at 06:10:40PM -0700, Sargun Dhillon wrote:
> > Sargun Dhillon (4):
> > fs, net: Standardize on file_receive helper to move fds across
> > processes
> > pid: Use file_receive helper to copy FDs
>
> The fixes (that should add open-coded cgroups stuff) should be separate
> patches so they can be backported.
Patch 1/4, and 2/4 are separated so they can be backported. Patch 1 should
go into long term, and patch 2 should land in stable.
Do you see anything in 1/4, and 2/4 that shouldn't be there?
>
> The helper doesn't take the __user pointer I thought we'd agreed it
> should to avoid changing any SCM_RIGHTS behaviors?
>
It doesn't change the SCM_RIGHTS behaviour because it continues
to have the logic which allocates the file descriptor outside of the
helper.
1. Allocate FD (this happens in scm.c)
2. Copy FD # to userspace (this happens in scm.c)
3. Receive FD (this happens in the new helper)
> > seccomp: Introduce addfd ioctl to seccomp user notifier
> > selftests/seccomp: Test SECCOMP_IOCTL_NOTIF_ADDFD
>
> Otherwise, yeah, this should be good.
>
> --
> Kees Cook
On Tue, Jun 02, 2020 at 06:10:40PM -0700, Sargun Dhillon wrote:
> Sargun Dhillon (4):
> fs, net: Standardize on file_receive helper to move fds across
> processes
> pid: Use file_receive helper to copy FDs
The fixes (that should add open-coded cgroups stuff) should be separate
patches so they can be backported.
The helper doesn't take the __user pointer I thought we'd agreed it
should to avoid changing any SCM_RIGHTS behaviors?
> seccomp: Introduce addfd ioctl to seccomp user notifier
> selftests/seccomp: Test SECCOMP_IOCTL_NOTIF_ADDFD
Otherwise, yeah, this should be good.
--
Kees Cook
On Tue, Jun 02, 2020 at 06:10:41PM -0700, Sargun Dhillon wrote:
> Previously there were two chunks of code where the logic to receive file
> descriptors was duplicated in net. The compat version of copying
> file descriptors via SCM_RIGHTS did not have logic to update cgroups.
> Logic to change the cgroup data was added in:
> commit 48a87cc26c13 ("net: netprio: fd passed in SCM_RIGHTS datagram not set correctly")
> commit d84295067fc7 ("net: net_cls: fd passed in SCM_RIGHTS datagram not set correctly")
>
> This was not copied to the compat path. This commit fixes that, and thus
> should be cherry-picked into stable.
>
> This introduces a helper (file_receive) which encapsulates the logic for
> handling calling security hooks as well as manipulating cgroup information.
> This helper can then be used other places in the kernel where file
> descriptors are copied between processes
>
> I tested cgroup classid setting on both the compat (x32) path, and the
> native path to ensure that when moving the file descriptor the classid
> is set.
>
> Signed-off-by: Sargun Dhillon <[email protected]>
> Suggested-by: Kees Cook <[email protected]>
> Cc: Al Viro <[email protected]>
> Cc: Christian Brauner <[email protected]>
> Cc: Daniel Wagner <[email protected]>
> Cc: David S. Miller <[email protected]>
> Cc: Jann Horn <[email protected]>,
> Cc: John Fastabend <[email protected]>
> Cc: Tejun Heo <[email protected]>
> Cc: Tycho Andersen <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> ---
> fs/file.c | 35 +++++++++++++++++++++++++++++++++++
> include/linux/file.h | 1 +
> net/compat.c | 10 +++++-----
> net/core/scm.c | 14 ++++----------
> 4 files changed, 45 insertions(+), 15 deletions(-)
>
> diff --git a/fs/file.c b/fs/file.c
> index abb8b7081d7a..5afd76fca8c2 100644
> --- a/fs/file.c
> +++ b/fs/file.c
> @@ -18,6 +18,9 @@
> #include <linux/bitops.h>
> #include <linux/spinlock.h>
> #include <linux/rcupdate.h>
> +#include <net/sock.h>
> +#include <net/netprio_cgroup.h>
> +#include <net/cls_cgroup.h>
>
> unsigned int sysctl_nr_open __read_mostly = 1024*1024;
> unsigned int sysctl_nr_open_min = BITS_PER_LONG;
> @@ -931,6 +934,38 @@ int replace_fd(unsigned fd, struct file *file, unsigned flags)
> return err;
> }
>
> +/*
> + * File Receive - Receive a file from another process
> + *
> + * This function is designed to receive files from other tasks. It encapsulates
> + * logic around security and cgroups. The file descriptor provided must be a
> + * freshly allocated (unused) file descriptor.
> + *
> + * This helper does not consume a reference to the file, so the caller must put
> + * their reference.
> + *
> + * Returns 0 upon success.
> + */
> +int file_receive(int fd, struct file *file)
This is all just a remote version of fd_install(), yet it deviates from
fd_install()'s semantics and naming. That's not great imho. What about
naming this something like:
fd_install_received()
and move the get_file() out of there so it has the same semantics as
fd_install(). It seems rather dangerous to have a function like
fd_install() that consumes a reference once it returned and another
version of this that is basically the same thing but doesn't consume a
reference because it takes its own. Seems an invitation for confusion.
Does that make sense?
> +{
> + struct socket *sock;
> + int err;
> +
> + err = security_file_receive(file);
> + if (err)
> + return err;
> +
> + fd_install(fd, get_file(file));
> +
> + sock = sock_from_file(file, &err);
> + if (sock) {
> + sock_update_netprioidx(&sock->sk->sk_cgrp_data);
> + sock_update_classid(&sock->sk->sk_cgrp_data);
> + }
> +
> + return 0;
> +}
> +
> static int ksys_dup3(unsigned int oldfd, unsigned int newfd, int flags)
> {
> int err = -EBADF;
> diff --git a/include/linux/file.h b/include/linux/file.h
> index 142d102f285e..7b56dc23e560 100644
> --- a/include/linux/file.h
> +++ b/include/linux/file.h
> @@ -94,4 +94,5 @@ extern void fd_install(unsigned int fd, struct file *file);
> extern void flush_delayed_fput(void);
> extern void __fput_sync(struct file *);
>
> +extern int file_receive(int fd, struct file *file);
> #endif /* __LINUX_FILE_H */
> diff --git a/net/compat.c b/net/compat.c
> index 4bed96e84d9a..8ac0e7e09208 100644
> --- a/net/compat.c
> +++ b/net/compat.c
> @@ -293,9 +293,6 @@ void scm_detach_fds_compat(struct msghdr *kmsg, struct scm_cookie *scm)
>
> for (i = 0, cmfptr = (int __user *) CMSG_COMPAT_DATA(cm); i < fdmax; i++, cmfptr++) {
> int new_fd;
> - err = security_file_receive(fp[i]);
> - if (err)
> - break;
> err = get_unused_fd_flags(MSG_CMSG_CLOEXEC & kmsg->msg_flags
> ? O_CLOEXEC : 0);
> if (err < 0)
> @@ -306,8 +303,11 @@ void scm_detach_fds_compat(struct msghdr *kmsg, struct scm_cookie *scm)
> put_unused_fd(new_fd);
> break;
> }
> - /* Bump the usage count and install the file. */
> - fd_install(new_fd, get_file(fp[i]));
> + err = file_receive(new_fd, fp[i]);
> + if (err) {
> + put_unused_fd(new_fd);
> + break;
> + }
> }
>
> if (i > 0) {
> diff --git a/net/core/scm.c b/net/core/scm.c
> index dc6fed1f221c..ba93abf2881b 100644
> --- a/net/core/scm.c
> +++ b/net/core/scm.c
> @@ -303,11 +303,7 @@ void scm_detach_fds(struct msghdr *msg, struct scm_cookie *scm)
> for (i=0, cmfptr=(__force int __user *)CMSG_DATA(cm); i<fdmax;
> i++, cmfptr++)
> {
> - struct socket *sock;
> int new_fd;
> - err = security_file_receive(fp[i]);
> - if (err)
> - break;
> err = get_unused_fd_flags(MSG_CMSG_CLOEXEC & msg->msg_flags
> ? O_CLOEXEC : 0);
> if (err < 0)
> @@ -318,13 +314,11 @@ void scm_detach_fds(struct msghdr *msg, struct scm_cookie *scm)
> put_unused_fd(new_fd);
> break;
> }
> - /* Bump the usage count and install the file. */
> - sock = sock_from_file(fp[i], &err);
> - if (sock) {
> - sock_update_netprioidx(&sock->sk->sk_cgrp_data);
> - sock_update_classid(&sock->sk->sk_cgrp_data);
> + err = file_receive(new_fd, fp[i]);
> + if (err) {
> + put_unused_fd(new_fd);
> + break;
> }
> - fd_install(new_fd, get_file(fp[i]));
> }
>
> if (i > 0)
> --
> 2.25.1
>
On Thu, Jun 04, 2020 at 03:24:52AM +0200, Christian Brauner wrote:
> On Tue, Jun 02, 2020 at 06:10:41PM -0700, Sargun Dhillon wrote:
> > Previously there were two chunks of code where the logic to receive file
> > descriptors was duplicated in net. The compat version of copying
> > file descriptors via SCM_RIGHTS did not have logic to update cgroups.
> > Logic to change the cgroup data was added in:
> > commit 48a87cc26c13 ("net: netprio: fd passed in SCM_RIGHTS datagram not set correctly")
> > commit d84295067fc7 ("net: net_cls: fd passed in SCM_RIGHTS datagram not set correctly")
> >
> > This was not copied to the compat path. This commit fixes that, and thus
> > should be cherry-picked into stable.
> >
> > This introduces a helper (file_receive) which encapsulates the logic for
> > handling calling security hooks as well as manipulating cgroup information.
> > This helper can then be used other places in the kernel where file
> > descriptors are copied between processes
> >
> > I tested cgroup classid setting on both the compat (x32) path, and the
> > native path to ensure that when moving the file descriptor the classid
> > is set.
> >
> > Signed-off-by: Sargun Dhillon <[email protected]>
> > Suggested-by: Kees Cook <[email protected]>
> > Cc: Al Viro <[email protected]>
> > Cc: Christian Brauner <[email protected]>
> > Cc: Daniel Wagner <[email protected]>
> > Cc: David S. Miller <[email protected]>
> > Cc: Jann Horn <[email protected]>,
> > Cc: John Fastabend <[email protected]>
> > Cc: Tejun Heo <[email protected]>
> > Cc: Tycho Andersen <[email protected]>
> > Cc: [email protected]
> > Cc: [email protected]
> > Cc: [email protected]
> > Cc: [email protected]
> > ---
> > fs/file.c | 35 +++++++++++++++++++++++++++++++++++
> > include/linux/file.h | 1 +
> > net/compat.c | 10 +++++-----
> > net/core/scm.c | 14 ++++----------
> > 4 files changed, 45 insertions(+), 15 deletions(-)
> >
> > diff --git a/fs/file.c b/fs/file.c
> > index abb8b7081d7a..5afd76fca8c2 100644
> > --- a/fs/file.c
> > +++ b/fs/file.c
> > @@ -18,6 +18,9 @@
> > #include <linux/bitops.h>
> > #include <linux/spinlock.h>
> > #include <linux/rcupdate.h>
> > +#include <net/sock.h>
> > +#include <net/netprio_cgroup.h>
> > +#include <net/cls_cgroup.h>
> >
> > unsigned int sysctl_nr_open __read_mostly = 1024*1024;
> > unsigned int sysctl_nr_open_min = BITS_PER_LONG;
> > @@ -931,6 +934,38 @@ int replace_fd(unsigned fd, struct file *file, unsigned flags)
> > return err;
> > }
> >
> > +/*
> > + * File Receive - Receive a file from another process
> > + *
> > + * This function is designed to receive files from other tasks. It encapsulates
> > + * logic around security and cgroups. The file descriptor provided must be a
> > + * freshly allocated (unused) file descriptor.
> > + *
> > + * This helper does not consume a reference to the file, so the caller must put
> > + * their reference.
> > + *
> > + * Returns 0 upon success.
> > + */
> > +int file_receive(int fd, struct file *file)
>
> This is all just a remote version of fd_install(), yet it deviates from
> fd_install()'s semantics and naming. That's not great imho. What about
> naming this something like:
>
> fd_install_received()
>
> and move the get_file() out of there so it has the same semantics as
> fd_install(). It seems rather dangerous to have a function like
> fd_install() that consumes a reference once it returned and another
> version of this that is basically the same thing but doesn't consume a
> reference because it takes its own. Seems an invitation for confusion.
> Does that make sense?
We have some competing opinions on this, I guess. What I really don't
like is the copy/pasting of the get_unused_fd_flags() and
put_unused_fd() needed by (nearly) all the callers. If it's a helper, it
should help. Specifically, I'd like to see this:
int file_receive(int fd, unsigned long flags, struct file *file,
int __user *fdptr)
{
struct socket *sock;
int err;
err = security_file_receive(file);
if (err)
return err;
if (fd < 0) {
/* Install new fd. */
int new_fd;
err = get_unused_fd_flags(flags);
if (err < 0)
return err;
new_fd = err;
/* Copy fd to any waiting user memory. */
if (fdptr) {
err = put_user(new_fd, fdptr);
if (err < 0) {
put_unused_fd(new_fd);
return err;
}
}
fd_install(new_fd, get_file(file));
fd = new_fd;
} else {
/* Replace existing fd. */
err = replace_fd(fd, file, flags);
if (err)
return err;
}
/* Bump the cgroup usage counts. */
sock = sock_from_file(fd, &err);
if (sock) {
sock_update_netprioidx(&sock->sk->sk_cgrp_data);
sock_update_classid(&sock->sk->sk_cgrp_data);
}
return fd;
}
If everyone else *really* prefers keeping the get_unused_fd_flags() /
put_unused_fd() stuff outside the helper, then I guess I'll give up,
but I think it is MUCH cleaner this way -- all 4 users trim down lots
of code duplication.
--
Kees Cook
On Wed, Jun 03, 2020 at 04:56:59PM -0700, Sargun Dhillon wrote:
> On Wed, Jun 3, 2020 at 4:42 PM Kees Cook <[email protected]> wrote:
> >
> > On Tue, Jun 02, 2020 at 06:10:40PM -0700, Sargun Dhillon wrote:
> > > Sargun Dhillon (4):
> > > fs, net: Standardize on file_receive helper to move fds across
> > > processes
> > > pid: Use file_receive helper to copy FDs
> >
> > The fixes (that should add open-coded cgroups stuff) should be separate
> > patches so they can be backported.
> Patch 1/4, and 2/4 are separated so they can be backported. Patch 1 should
> go into long term, and patch 2 should land in stable.
>
> Do you see anything in 1/4, and 2/4 that shouldn't be there?
So, my thinking was to open code the fixes to minimize the code churn
in the -stable trees and isolate logical changes. However, in looking
at the commits (3.6, 3.8) and the age of the rest of the nearby code in
SCM_RIGHTS (3.7), and the actual oldest supported kernel release (3.16),
I guess it would be better to split it like you've done.
> > The helper doesn't take the __user pointer I thought we'd agreed it
> > should to avoid changing any SCM_RIGHTS behaviors?
> >
> It doesn't change the SCM_RIGHTS behaviour because it continues
> to have the logic which allocates the file descriptor outside of the
> helper.
> 1. Allocate FD (this happens in scm.c)
> 2. Copy FD # to userspace (this happens in scm.c)
> 3. Receive FD (this happens in the new helper)
Sorry, I was not writing very clearly: I was meaning to have said:
I was expecting the helper to take the __user pointer (like I thought
we agreed[1]) so we could both avoid changing SCM_RIGHTS behavior and
avoid copy/pasting of the get_unused/put_unused code in 3 places. (I
get into this more in the other thread[2]).
So, let's finalize this decision in the thread at [2]. Again, sorry I
wasted your time due to my confusion!
-Kees
[1] Apologies, I misread your "1" in [3] to be "suggestion 1" from the
quoted text from me in that email, rather than the "[1]" it was,
which was a link to your counter-proposal. And then I wasted your
time by saying "agreed".
[2] https://lore.kernel.org/lkml/202006031845.F587F85A@keescook/
[3] https://lore.kernel.org/lkml/[email protected]/
--
Kees Cook
On Thu, Jun 04, 2020 at 03:24:52AM +0200, Christian Brauner wrote:
> On Tue, Jun 02, 2020 at 06:10:41PM -0700, Sargun Dhillon wrote:
> > Previously there were two chunks of code where the logic to receive file
> > descriptors was duplicated in net. The compat version of copying
> > file descriptors via SCM_RIGHTS did not have logic to update cgroups.
> > Logic to change the cgroup data was added in:
> > commit 48a87cc26c13 ("net: netprio: fd passed in SCM_RIGHTS datagram not set correctly")
> > commit d84295067fc7 ("net: net_cls: fd passed in SCM_RIGHTS datagram not set correctly")
> >
> > This was not copied to the compat path. This commit fixes that, and thus
> > should be cherry-picked into stable.
> >
> > This introduces a helper (file_receive) which encapsulates the logic for
> > handling calling security hooks as well as manipulating cgroup information.
> > This helper can then be used other places in the kernel where file
> > descriptors are copied between processes
> >
> > I tested cgroup classid setting on both the compat (x32) path, and the
> > native path to ensure that when moving the file descriptor the classid
> > is set.
> >
> > Signed-off-by: Sargun Dhillon <[email protected]>
> > Suggested-by: Kees Cook <[email protected]>
> > Cc: Al Viro <[email protected]>
> > Cc: Christian Brauner <[email protected]>
> > Cc: Daniel Wagner <[email protected]>
> > Cc: David S. Miller <[email protected]>
> > Cc: Jann Horn <[email protected]>,
> > Cc: John Fastabend <[email protected]>
> > Cc: Tejun Heo <[email protected]>
> > Cc: Tycho Andersen <[email protected]>
> > Cc: [email protected]
> > Cc: [email protected]
> > Cc: [email protected]
> > Cc: [email protected]
> > ---
> > fs/file.c | 35 +++++++++++++++++++++++++++++++++++
> > include/linux/file.h | 1 +
> > net/compat.c | 10 +++++-----
> > net/core/scm.c | 14 ++++----------
> > 4 files changed, 45 insertions(+), 15 deletions(-)
> >
> > diff --git a/fs/file.c b/fs/file.c
> > index abb8b7081d7a..5afd76fca8c2 100644
> > --- a/fs/file.c
> > +++ b/fs/file.c
> > @@ -18,6 +18,9 @@
> > #include <linux/bitops.h>
> > #include <linux/spinlock.h>
> > #include <linux/rcupdate.h>
> > +#include <net/sock.h>
> > +#include <net/netprio_cgroup.h>
> > +#include <net/cls_cgroup.h>
> >
> > unsigned int sysctl_nr_open __read_mostly = 1024*1024;
> > unsigned int sysctl_nr_open_min = BITS_PER_LONG;
> > @@ -931,6 +934,38 @@ int replace_fd(unsigned fd, struct file *file, unsigned flags)
> > return err;
> > }
> >
> > +/*
> > + * File Receive - Receive a file from another process
> > + *
> > + * This function is designed to receive files from other tasks. It encapsulates
> > + * logic around security and cgroups. The file descriptor provided must be a
> > + * freshly allocated (unused) file descriptor.
> > + *
> > + * This helper does not consume a reference to the file, so the caller must put
> > + * their reference.
> > + *
> > + * Returns 0 upon success.
> > + */
> > +int file_receive(int fd, struct file *file)
>
> This is all just a remote version of fd_install(), yet it deviates from
> fd_install()'s semantics and naming. That's not great imho. What about
> naming this something like:
>
> fd_install_received()
>
> and move the get_file() out of there so it has the same semantics as
> fd_install(). It seems rather dangerous to have a function like
> fd_install() that consumes a reference once it returned and another
> version of this that is basically the same thing but doesn't consume a
> reference because it takes its own. Seems an invitation for confusion.
> Does that make sense?
>
You're right. The reason for the difference in my mind is that fd_install
always succeeds, whereas file_receive can fail. It's easier to do something
like:
fd_install(fd, get_file(f))
vs.
if (file_receive(fd, get_file(f))
fput(f);
Alternatively, if the reference was always consumed, it is somewhat
easier.
I'm fine either way, but just explaining my reasoning for the difference
in behaviour.
On Wed, Jun 03, 2020 at 07:22:57PM -0700, Kees Cook wrote:
> On Thu, Jun 04, 2020 at 03:24:52AM +0200, Christian Brauner wrote:
> > On Tue, Jun 02, 2020 at 06:10:41PM -0700, Sargun Dhillon wrote:
> > > Previously there were two chunks of code where the logic to receive file
> > > descriptors was duplicated in net. The compat version of copying
> > > file descriptors via SCM_RIGHTS did not have logic to update cgroups.
> > > Logic to change the cgroup data was added in:
> > > commit 48a87cc26c13 ("net: netprio: fd passed in SCM_RIGHTS datagram not set correctly")
> > > commit d84295067fc7 ("net: net_cls: fd passed in SCM_RIGHTS datagram not set correctly")
> > >
> > > This was not copied to the compat path. This commit fixes that, and thus
> > > should be cherry-picked into stable.
> > >
> > > This introduces a helper (file_receive) which encapsulates the logic for
> > > handling calling security hooks as well as manipulating cgroup information.
> > > This helper can then be used other places in the kernel where file
> > > descriptors are copied between processes
> > >
> > > I tested cgroup classid setting on both the compat (x32) path, and the
> > > native path to ensure that when moving the file descriptor the classid
> > > is set.
> > >
> > > Signed-off-by: Sargun Dhillon <[email protected]>
> > > Suggested-by: Kees Cook <[email protected]>
> > > Cc: Al Viro <[email protected]>
> > > Cc: Christian Brauner <[email protected]>
> > > Cc: Daniel Wagner <[email protected]>
> > > Cc: David S. Miller <[email protected]>
> > > Cc: Jann Horn <[email protected]>,
> > > Cc: John Fastabend <[email protected]>
> > > Cc: Tejun Heo <[email protected]>
> > > Cc: Tycho Andersen <[email protected]>
> > > Cc: [email protected]
> > > Cc: [email protected]
> > > Cc: [email protected]
> > > Cc: [email protected]
> > > ---
> > > fs/file.c | 35 +++++++++++++++++++++++++++++++++++
> > > include/linux/file.h | 1 +
> > > net/compat.c | 10 +++++-----
> > > net/core/scm.c | 14 ++++----------
> > > 4 files changed, 45 insertions(+), 15 deletions(-)
> > >
> > > diff --git a/fs/file.c b/fs/file.c
> > > index abb8b7081d7a..5afd76fca8c2 100644
> > > --- a/fs/file.c
> > > +++ b/fs/file.c
> > > @@ -18,6 +18,9 @@
> > > #include <linux/bitops.h>
> > > #include <linux/spinlock.h>
> > > #include <linux/rcupdate.h>
> > > +#include <net/sock.h>
> > > +#include <net/netprio_cgroup.h>
> > > +#include <net/cls_cgroup.h>
> > >
> > > unsigned int sysctl_nr_open __read_mostly = 1024*1024;
> > > unsigned int sysctl_nr_open_min = BITS_PER_LONG;
> > > @@ -931,6 +934,38 @@ int replace_fd(unsigned fd, struct file *file, unsigned flags)
> > > return err;
> > > }
> > >
> > > +/*
> > > + * File Receive - Receive a file from another process
> > > + *
> > > + * This function is designed to receive files from other tasks. It encapsulates
> > > + * logic around security and cgroups. The file descriptor provided must be a
> > > + * freshly allocated (unused) file descriptor.
> > > + *
> > > + * This helper does not consume a reference to the file, so the caller must put
> > > + * their reference.
> > > + *
> > > + * Returns 0 upon success.
> > > + */
> > > +int file_receive(int fd, struct file *file)
> >
> > This is all just a remote version of fd_install(), yet it deviates from
> > fd_install()'s semantics and naming. That's not great imho. What about
> > naming this something like:
> >
> > fd_install_received()
> >
> > and move the get_file() out of there so it has the same semantics as
> > fd_install(). It seems rather dangerous to have a function like
> > fd_install() that consumes a reference once it returned and another
> > version of this that is basically the same thing but doesn't consume a
> > reference because it takes its own. Seems an invitation for confusion.
> > Does that make sense?
>
> We have some competing opinions on this, I guess. What I really don't
> like is the copy/pasting of the get_unused_fd_flags() and
> put_unused_fd() needed by (nearly) all the callers. If it's a helper, it
> should help. Specifically, I'd like to see this:
>
> int file_receive(int fd, unsigned long flags, struct file *file,
> int __user *fdptr)
> {
> struct socket *sock;
> int err;
>
> err = security_file_receive(file);
> if (err)
> return err;
>
> if (fd < 0) {
> /* Install new fd. */
> int new_fd;
>
> err = get_unused_fd_flags(flags);
> if (err < 0)
> return err;
> new_fd = err;
>
> /* Copy fd to any waiting user memory. */
> if (fdptr) {
> err = put_user(new_fd, fdptr);
> if (err < 0) {
> put_unused_fd(new_fd);
> return err;
> }
> }
> fd_install(new_fd, get_file(file));
> fd = new_fd;
> } else {
> /* Replace existing fd. */
> err = replace_fd(fd, file, flags);
> if (err)
> return err;
> }
>
> /* Bump the cgroup usage counts. */
> sock = sock_from_file(fd, &err);
> if (sock) {
> sock_update_netprioidx(&sock->sk->sk_cgrp_data);
> sock_update_classid(&sock->sk->sk_cgrp_data);
> }
>
> return fd;
> }
>
> If everyone else *really* prefers keeping the get_unused_fd_flags() /
> put_unused_fd() stuff outside the helper, then I guess I'll give up,
> but I think it is MUCH cleaner this way -- all 4 users trim down lots
> of code duplication.
>
> --
> Kees Cook
This seems weird that the function has two different return mechanisms
depending on the value of fdptr, especially given that behaviour is
only invoked by SCM, whereas the other callers (addfd, and pidfd_getfd)
just want the FD value returned.
Won't this produce a "bad" result, if the user does:
struct msghdr msg = {};
struct cmsghdr *cmsg;
struct iovec io = {
.iov_base = &c,
.iov_len = 1,
};
msg.msg_iov = &io;
msg.msg_iovlen = 1;
msg.msg_control = NULL;
msg.msg_controllen = sizeof(buf);
recvmsg(sock, &msg, 0);
----
This will end up installing the FD, but it will efault, when
scm_detach_fds tries to fill out the rest of the info.
I mean, we can easily solve this with a null pointer check
in scm_detach_fds, but my fear is that user n will forget
to do this, and make a mistake.
Maybe it would be nice to have:
/* Receives file descriptor and installs it in userspace at uptr. */
static inline intfile_receive_user(struct file *file, unsigned long flags,
int __user *fdptr)
{
if (fdptr == NULL)
return -EFAULT;
return __file_receive(-1, flags, file, uptr);
}
And then just let pidfd_getfd, and seccomp_addfd call __file_receive
directly, or offer a different helper like:
static inline file_receive(long fd, struct *file, unsigned long flags)
{
return __file_receive(fd, flags, file, NULL);
}
On Wed, Jun 03, 2020 at 07:22:57PM -0700, Kees Cook wrote:
> On Thu, Jun 04, 2020 at 03:24:52AM +0200, Christian Brauner wrote:
> > On Tue, Jun 02, 2020 at 06:10:41PM -0700, Sargun Dhillon wrote:
> > > Previously there were two chunks of code where the logic to receive file
> > > descriptors was duplicated in net. The compat version of copying
> > > file descriptors via SCM_RIGHTS did not have logic to update cgroups.
> > > Logic to change the cgroup data was added in:
> > > commit 48a87cc26c13 ("net: netprio: fd passed in SCM_RIGHTS datagram not set correctly")
> > > commit d84295067fc7 ("net: net_cls: fd passed in SCM_RIGHTS datagram not set correctly")
> > >
> > > This was not copied to the compat path. This commit fixes that, and thus
> > > should be cherry-picked into stable.
> > >
> > > This introduces a helper (file_receive) which encapsulates the logic for
> > > handling calling security hooks as well as manipulating cgroup information.
> > > This helper can then be used other places in the kernel where file
> > > descriptors are copied between processes
> > >
> > > I tested cgroup classid setting on both the compat (x32) path, and the
> > > native path to ensure that when moving the file descriptor the classid
> > > is set.
> > >
> > > Signed-off-by: Sargun Dhillon <[email protected]>
> > > Suggested-by: Kees Cook <[email protected]>
> > > Cc: Al Viro <[email protected]>
> > > Cc: Christian Brauner <[email protected]>
> > > Cc: Daniel Wagner <[email protected]>
> > > Cc: David S. Miller <[email protected]>
> > > Cc: Jann Horn <[email protected]>,
> > > Cc: John Fastabend <[email protected]>
> > > Cc: Tejun Heo <[email protected]>
> > > Cc: Tycho Andersen <[email protected]>
> > > Cc: [email protected]
> > > Cc: [email protected]
> > > Cc: [email protected]
> > > Cc: [email protected]
> > > ---
> > > fs/file.c | 35 +++++++++++++++++++++++++++++++++++
> > > include/linux/file.h | 1 +
> > > net/compat.c | 10 +++++-----
> > > net/core/scm.c | 14 ++++----------
> > > 4 files changed, 45 insertions(+), 15 deletions(-)
> > >
> > > diff --git a/fs/file.c b/fs/file.c
> > > index abb8b7081d7a..5afd76fca8c2 100644
> > > --- a/fs/file.c
> > > +++ b/fs/file.c
> > > @@ -18,6 +18,9 @@
> > > #include <linux/bitops.h>
> > > #include <linux/spinlock.h>
> > > #include <linux/rcupdate.h>
> > > +#include <net/sock.h>
> > > +#include <net/netprio_cgroup.h>
> > > +#include <net/cls_cgroup.h>
> > >
> > > unsigned int sysctl_nr_open __read_mostly = 1024*1024;
> > > unsigned int sysctl_nr_open_min = BITS_PER_LONG;
> > > @@ -931,6 +934,38 @@ int replace_fd(unsigned fd, struct file *file, unsigned flags)
> > > return err;
> > > }
> > >
> > > +/*
> > > + * File Receive - Receive a file from another process
> > > + *
> > > + * This function is designed to receive files from other tasks. It encapsulates
> > > + * logic around security and cgroups. The file descriptor provided must be a
> > > + * freshly allocated (unused) file descriptor.
> > > + *
> > > + * This helper does not consume a reference to the file, so the caller must put
> > > + * their reference.
> > > + *
> > > + * Returns 0 upon success.
> > > + */
> > > +int file_receive(int fd, struct file *file)
> >
> > This is all just a remote version of fd_install(), yet it deviates from
> > fd_install()'s semantics and naming. That's not great imho. What about
> > naming this something like:
> >
> > fd_install_received()
> >
> > and move the get_file() out of there so it has the same semantics as
> > fd_install(). It seems rather dangerous to have a function like
> > fd_install() that consumes a reference once it returned and another
> > version of this that is basically the same thing but doesn't consume a
> > reference because it takes its own. Seems an invitation for confusion.
> > Does that make sense?
>
> We have some competing opinions on this, I guess. What I really don't
> like is the copy/pasting of the get_unused_fd_flags() and
> put_unused_fd() needed by (nearly) all the callers. If it's a helper, it
> should help. Specifically, I'd like to see this:
>
> int file_receive(int fd, unsigned long flags, struct file *file,
> int __user *fdptr)
I still fail to see what this whole put_user() handling buys us at all
and why this function needs to be anymore complicated then simply:
fd_install_received(int fd, struct file *file)
{
security_file_receive(file);
sock = sock_from_file(fd, &err);
if (sock) {
sock_update_netprioidx(&sock->sk->sk_cgrp_data);
sock_update_classid(&sock->sk->sk_cgrp_data);
}
fd_install();
return;
}
exactly like fd_install() but for received files.
For scm you can fail somewhere in the middle of putting any number of
file descriptors so you're left in a state with only a subset of
requested file descriptors installed so it's not really useful there.
And if you manage to install an fd but then fail to put_user() it
userspace can simply check it's fds via proc and has to anyway on any
scm message error. If you fail an scm message userspace better check
their fds.
For seccomp maybe but even there I doubt it and I still maintain that
userspace screwing this up is on them which is how we do this most of
the time. And for pidfd_getfd() this whole put_user() thing doesn't
matter at all.
It's much easier and clearer if we simply have a fd_install() -
fd_install_received() parallelism where we follow an established
convention. _But_ if that blocks you from making this generic enough
then at least the replace_fd() vs fd_install() logic seems it shouldn't
be in there.
And the function name really needs to drive home the point that it
installs an fd into the tasks fdtable no matter what version you go
with. file_receive() is really not accurate enough for this at all.
> {
> struct socket *sock;
> int err;
>
> err = security_file_receive(file);
> if (err)
> return err;
>
> if (fd < 0) {
> /* Install new fd. */
> int new_fd;
>
> err = get_unused_fd_flags(flags);
> if (err < 0)
> return err;
> new_fd = err;
>
> /* Copy fd to any waiting user memory. */
> if (fdptr) {
> err = put_user(new_fd, fdptr);
> if (err < 0) {
> put_unused_fd(new_fd);
> return err;
> }
> }
> fd_install(new_fd, get_file(file));
> fd = new_fd;
> } else {
> /* Replace existing fd. */
> err = replace_fd(fd, file, flags);
> if (err)
> return err;
> }
>
> /* Bump the cgroup usage counts. */
> sock = sock_from_file(fd, &err);
> if (sock) {
> sock_update_netprioidx(&sock->sk->sk_cgrp_data);
> sock_update_classid(&sock->sk->sk_cgrp_data);
> }
>
> return fd;
> }
>
> If everyone else *really* prefers keeping the get_unused_fd_flags() /
> put_unused_fd() stuff outside the helper, then I guess I'll give up,
> but I think it is MUCH cleaner this way -- all 4 users trim down lots
> of code duplication.
>
> --
> Kees Cook
From: Christian Brauner
> Sent: 04 June 2020 13:52
..
> For scm you can fail somewhere in the middle of putting any number of
> file descriptors so you're left in a state with only a subset of
> requested file descriptors installed so it's not really useful there.
> And if you manage to install an fd but then fail to put_user() it
> userspace can simply check it's fds via proc and has to anyway on any
> scm message error. If you fail an scm message userspace better check
> their fds.
There is a similar error path in the sctp 'peeloff' code.
If the put_user() fails it currently closes the fd before
returning -EFAULT.
I'm not at all sure this is helpful.
The application can't tell whether the SIGSEGV happened on the
copyin of the parameters or the copyout of the result.
ISTM that if the application passes an address that cannot
be written to it deserves what it gets - typically an fd it
doesn't know the number of.
What is important is that the kernel data is consistent.
So when the process exits the fd is closed and all the resources
are released.
David
-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
On Thu, Jun 04, 2020 at 02:52:26PM +0200, Christian Brauner wrote:
> On Wed, Jun 03, 2020 at 07:22:57PM -0700, Kees Cook wrote:
> > On Thu, Jun 04, 2020 at 03:24:52AM +0200, Christian Brauner wrote:
> > > On Tue, Jun 02, 2020 at 06:10:41PM -0700, Sargun Dhillon wrote:
> > > > Previously there were two chunks of code where the logic to receive file
> > > > descriptors was duplicated in net. The compat version of copying
> > > > file descriptors via SCM_RIGHTS did not have logic to update cgroups.
> > > > Logic to change the cgroup data was added in:
> > > > commit 48a87cc26c13 ("net: netprio: fd passed in SCM_RIGHTS datagram not set correctly")
> > > > commit d84295067fc7 ("net: net_cls: fd passed in SCM_RIGHTS datagram not set correctly")
> > > >
> > > > This was not copied to the compat path. This commit fixes that, and thus
> > > > should be cherry-picked into stable.
> > > >
> > > > This introduces a helper (file_receive) which encapsulates the logic for
> > > > handling calling security hooks as well as manipulating cgroup information.
> > > > This helper can then be used other places in the kernel where file
> > > > descriptors are copied between processes
> > > >
> > > > I tested cgroup classid setting on both the compat (x32) path, and the
> > > > native path to ensure that when moving the file descriptor the classid
> > > > is set.
> > > >
> > > > Signed-off-by: Sargun Dhillon <[email protected]>
> > > > Suggested-by: Kees Cook <[email protected]>
> > > > Cc: Al Viro <[email protected]>
> > > > Cc: Christian Brauner <[email protected]>
> > > > Cc: Daniel Wagner <[email protected]>
> > > > Cc: David S. Miller <[email protected]>
> > > > Cc: Jann Horn <[email protected]>,
> > > > Cc: John Fastabend <[email protected]>
> > > > Cc: Tejun Heo <[email protected]>
> > > > Cc: Tycho Andersen <[email protected]>
> > > > Cc: [email protected]
> > > > Cc: [email protected]
> > > > Cc: [email protected]
> > > > Cc: [email protected]
> > > > ---
> > > > fs/file.c | 35 +++++++++++++++++++++++++++++++++++
> > > > include/linux/file.h | 1 +
> > > > net/compat.c | 10 +++++-----
> > > > net/core/scm.c | 14 ++++----------
> > > > 4 files changed, 45 insertions(+), 15 deletions(-)
> > > >
> > >
> > > This is all just a remote version of fd_install(), yet it deviates from
> > > fd_install()'s semantics and naming. That's not great imho. What about
> > > naming this something like:
> > >
> > > fd_install_received()
> > >
> > > and move the get_file() out of there so it has the same semantics as
> > > fd_install(). It seems rather dangerous to have a function like
> > > fd_install() that consumes a reference once it returned and another
> > > version of this that is basically the same thing but doesn't consume a
> > > reference because it takes its own. Seems an invitation for confusion.
> > > Does that make sense?
> >
> > We have some competing opinions on this, I guess. What I really don't
> > like is the copy/pasting of the get_unused_fd_flags() and
> > put_unused_fd() needed by (nearly) all the callers. If it's a helper, it
> > should help. Specifically, I'd like to see this:
> >
> > int file_receive(int fd, unsigned long flags, struct file *file,
> > int __user *fdptr)
>
> I still fail to see what this whole put_user() handling buys us at all
> and why this function needs to be anymore complicated then simply:
>
> fd_install_received(int fd, struct file *file)
> {
> security_file_receive(file);
>
> sock = sock_from_file(fd, &err);
> if (sock) {
> sock_update_netprioidx(&sock->sk->sk_cgrp_data);
> sock_update_classid(&sock->sk->sk_cgrp_data);
> }
>
> fd_install();
> return;
> }
>
> exactly like fd_install() but for received files.
>
> For scm you can fail somewhere in the middle of putting any number of
> file descriptors so you're left in a state with only a subset of
> requested file descriptors installed so it's not really useful there.
> And if you manage to install an fd but then fail to put_user() it
> userspace can simply check it's fds via proc and has to anyway on any
> scm message error. If you fail an scm message userspace better check
> their fds.
> For seccomp maybe but even there I doubt it and I still maintain that
> userspace screwing this up is on them which is how we do this most of
> the time. And for pidfd_getfd() this whole put_user() thing doesn't
> matter at all.
>
> It's much easier and clearer if we simply have a fd_install() -
> fd_install_received() parallelism where we follow an established
> convention. _But_ if that blocks you from making this generic enough
> then at least the replace_fd() vs fd_install() logic seems it shouldn't
> be in there.
>
> And the function name really needs to drive home the point that it
> installs an fd into the tasks fdtable no matter what version you go
> with. file_receive() is really not accurate enough for this at all.
>
> > {
> > struct socket *sock;
> > int err;
> >
> > err = security_file_receive(file);
> > if (err)
> > return err;
> >
> > if (fd < 0) {
> > /* Install new fd. */
> > int new_fd;
> >
> > err = get_unused_fd_flags(flags);
> > if (err < 0)
> > return err;
> > new_fd = err;
> >
> > /* Copy fd to any waiting user memory. */
> > if (fdptr) {
> > err = put_user(new_fd, fdptr);
> > if (err < 0) {
> > put_unused_fd(new_fd);
> > return err;
> > }
> > }
> > fd_install(new_fd, get_file(file));
> > fd = new_fd;
> > } else {
> > /* Replace existing fd. */
> > err = replace_fd(fd, file, flags);
> > if (err)
> > return err;
> > }
> >
> > /* Bump the cgroup usage counts. */
> > sock = sock_from_file(fd, &err);
> > if (sock) {
> > sock_update_netprioidx(&sock->sk->sk_cgrp_data);
> > sock_update_classid(&sock->sk->sk_cgrp_data);
> > }
> >
> > return fd;
> > }
> >
> > If everyone else *really* prefers keeping the get_unused_fd_flags() /
> > put_unused_fd() stuff outside the helper, then I guess I'll give up,
> > but I think it is MUCH cleaner this way -- all 4 users trim down lots
> > of code duplication.
> >
> > --
> > Kees Cook
How about this:
static int do_dup2(struct files_struct *files,
struct file *file, unsigned fd, unsigned flags)
__releases(&files->file_lock)
{
struct file *tofree;
struct fdtable *fdt;
...
/*
* New bit, allowing the file to be null. Doesn't have the same
* "sanity check" bits from __alloc_fd
*/
if (likely(file))
get_file(file);
rcu_assign_pointer(fdt->fd[fd], file);
__set_open_fd(fd, fdt);
...
}
/*
* File Receive - Receive a file from another process
*
* Encapsulates the logic to handle receiving a file from another task. It
* does not install the file descriptor. That is delegated to the user. If
* an error occurs that results in the file descriptor not being installed,
* they must put_unused_fd.
*
* fd should be >= 0 if you intend on replacing a file descriptor, or
* alternatively -1 if you want file_receive to allocate an FD for you
*
* Returns the fd number on success.
* Returns negative error code on failure.
*
*/
int file_receive(int fd, unsigned int flags, struct file *file)
{
int err;
struct socket *sock;
struct files_struct *files = current->files;
err = security_file_receive(file);
if (err)
return err;
if (fd >= 0) {
if (fd >= rlimit(RLIMIT_NOFILE))
return -EBADF;
spin_lock(&files->file_lock);
err = expand_files(files, fd);
if (err < 0) {
goto out_unlock;
}
err = do_dup2(files, NULL, fd, flags);
if (err)
return err;
} else {
fd = get_unused_fd_flags(flags);
if (fd < 0)
return fd;
}
sock = sock_from_file(file, &err);
if (sock) {
sock_update_netprioidx(&sock->sk->sk_cgrp_data);
sock_update_classid(&sock->sk->sk_cgrp_data);
}
return fd;
out_unlock:
spin_unlock(&files->file_lock);
return err;
}
---
then the code in scm.c:
err = file_receive(-1, flags, fp[i]);
if (err < 0)
break;
new_fd = err;
err = put_user(new_fd, cmfptr);
if (err) {
put_unused_fd(new_fd);
break;
}
/* Bump the usage count and install the file. */
fd_install(new_fd, get_file(fp[i]));
And addfd:
ret = file_receive(addfd->fd, addfd->flags, addfd->file);
if (ret >= 0)
fd_install(ret, get_file(addfd->file));
addfd->ret = ret;
----
This way there is:
1. No "put_user" logic in file_receive
2. Minimal (single) branching logic, unless there's something in between
the file_receive and installing the FD, such as put_user.
3. Doesn't implement fd_install, so there's no ambiguity about it being
file_install_received vs. just the receive logic.
On Fri, Jun 05, 2020 at 07:54:36AM +0000, Sargun Dhillon wrote:
> On Thu, Jun 04, 2020 at 02:52:26PM +0200, Christian Brauner wrote:
> > On Wed, Jun 03, 2020 at 07:22:57PM -0700, Kees Cook wrote:
> > > On Thu, Jun 04, 2020 at 03:24:52AM +0200, Christian Brauner wrote:
> > > > On Tue, Jun 02, 2020 at 06:10:41PM -0700, Sargun Dhillon wrote:
> > > > > Previously there were two chunks of code where the logic to receive file
> > > > > descriptors was duplicated in net. The compat version of copying
> > > > > file descriptors via SCM_RIGHTS did not have logic to update cgroups.
> > > > > Logic to change the cgroup data was added in:
> > > > > commit 48a87cc26c13 ("net: netprio: fd passed in SCM_RIGHTS datagram not set correctly")
> > > > > commit d84295067fc7 ("net: net_cls: fd passed in SCM_RIGHTS datagram not set correctly")
> > > > >
> > > > > This was not copied to the compat path. This commit fixes that, and thus
> > > > > should be cherry-picked into stable.
> > > > >
> > > > > This introduces a helper (file_receive) which encapsulates the logic for
> > > > > handling calling security hooks as well as manipulating cgroup information.
> > > > > This helper can then be used other places in the kernel where file
> > > > > descriptors are copied between processes
> > > > >
> > > > > I tested cgroup classid setting on both the compat (x32) path, and the
> > > > > native path to ensure that when moving the file descriptor the classid
> > > > > is set.
> > > > >
> > > > > Signed-off-by: Sargun Dhillon <[email protected]>
> > > > > Suggested-by: Kees Cook <[email protected]>
> > > > > Cc: Al Viro <[email protected]>
> > > > > Cc: Christian Brauner <[email protected]>
> > > > > Cc: Daniel Wagner <[email protected]>
> > > > > Cc: David S. Miller <[email protected]>
> > > > > Cc: Jann Horn <[email protected]>,
> > > > > Cc: John Fastabend <[email protected]>
> > > > > Cc: Tejun Heo <[email protected]>
> > > > > Cc: Tycho Andersen <[email protected]>
> > > > > Cc: [email protected]
> > > > > Cc: [email protected]
> > > > > Cc: [email protected]
> > > > > Cc: [email protected]
> > > > > ---
> > > > > fs/file.c | 35 +++++++++++++++++++++++++++++++++++
> > > > > include/linux/file.h | 1 +
> > > > > net/compat.c | 10 +++++-----
> > > > > net/core/scm.c | 14 ++++----------
> > > > > 4 files changed, 45 insertions(+), 15 deletions(-)
> > > > >
> > > >
> > > > This is all just a remote version of fd_install(), yet it deviates from
> > > > fd_install()'s semantics and naming. That's not great imho. What about
> > > > naming this something like:
> > > >
> > > > fd_install_received()
> > > >
> > > > and move the get_file() out of there so it has the same semantics as
> > > > fd_install(). It seems rather dangerous to have a function like
> > > > fd_install() that consumes a reference once it returned and another
> > > > version of this that is basically the same thing but doesn't consume a
> > > > reference because it takes its own. Seems an invitation for confusion.
> > > > Does that make sense?
> > >
> > > We have some competing opinions on this, I guess. What I really don't
> > > like is the copy/pasting of the get_unused_fd_flags() and
> > > put_unused_fd() needed by (nearly) all the callers. If it's a helper, it
> > > should help. Specifically, I'd like to see this:
> > >
> > > int file_receive(int fd, unsigned long flags, struct file *file,
> > > int __user *fdptr)
> >
> > I still fail to see what this whole put_user() handling buys us at all
> > and why this function needs to be anymore complicated then simply:
> >
> > fd_install_received(int fd, struct file *file)
> > {
> > security_file_receive(file);
> >
> > sock = sock_from_file(fd, &err);
> > if (sock) {
> > sock_update_netprioidx(&sock->sk->sk_cgrp_data);
> > sock_update_classid(&sock->sk->sk_cgrp_data);
> > }
> >
> > fd_install();
> > return;
> > }
> >
> > exactly like fd_install() but for received files.
> >
> > For scm you can fail somewhere in the middle of putting any number of
> > file descriptors so you're left in a state with only a subset of
> > requested file descriptors installed so it's not really useful there.
> > And if you manage to install an fd but then fail to put_user() it
> > userspace can simply check it's fds via proc and has to anyway on any
> > scm message error. If you fail an scm message userspace better check
> > their fds.
> > For seccomp maybe but even there I doubt it and I still maintain that
> > userspace screwing this up is on them which is how we do this most of
> > the time. And for pidfd_getfd() this whole put_user() thing doesn't
> > matter at all.
> >
> > It's much easier and clearer if we simply have a fd_install() -
> > fd_install_received() parallelism where we follow an established
> > convention. _But_ if that blocks you from making this generic enough
> > then at least the replace_fd() vs fd_install() logic seems it shouldn't
> > be in there.
> >
> > And the function name really needs to drive home the point that it
> > installs an fd into the tasks fdtable no matter what version you go
> > with. file_receive() is really not accurate enough for this at all.
> >
> > > {
> > > struct socket *sock;
> > > int err;
> > >
> > > err = security_file_receive(file);
> > > if (err)
> > > return err;
> > >
> > > if (fd < 0) {
> > > /* Install new fd. */
> > > int new_fd;
> > >
> > > err = get_unused_fd_flags(flags);
> > > if (err < 0)
> > > return err;
> > > new_fd = err;
> > >
> > > /* Copy fd to any waiting user memory. */
> > > if (fdptr) {
> > > err = put_user(new_fd, fdptr);
> > > if (err < 0) {
> > > put_unused_fd(new_fd);
> > > return err;
> > > }
> > > }
> > > fd_install(new_fd, get_file(file));
> > > fd = new_fd;
> > > } else {
> > > /* Replace existing fd. */
> > > err = replace_fd(fd, file, flags);
> > > if (err)
> > > return err;
> > > }
> > >
> > > /* Bump the cgroup usage counts. */
> > > sock = sock_from_file(fd, &err);
> > > if (sock) {
> > > sock_update_netprioidx(&sock->sk->sk_cgrp_data);
> > > sock_update_classid(&sock->sk->sk_cgrp_data);
> > > }
> > >
> > > return fd;
> > > }
> > >
> > > If everyone else *really* prefers keeping the get_unused_fd_flags() /
> > > put_unused_fd() stuff outside the helper, then I guess I'll give up,
> > > but I think it is MUCH cleaner this way -- all 4 users trim down lots
> > > of code duplication.
> > >
> > > --
> > > Kees Cook
> How about this:
>
>
> static int do_dup2(struct files_struct *files,
> struct file *file, unsigned fd, unsigned flags)
> __releases(&files->file_lock)
> {
> struct file *tofree;
> struct fdtable *fdt;
>
> ...
>
> /*
> * New bit, allowing the file to be null. Doesn't have the same
> * "sanity check" bits from __alloc_fd
> */
> if (likely(file))
> get_file(file);
> rcu_assign_pointer(fdt->fd[fd], file);
>
> __set_open_fd(fd, fdt);
IIUC, this means we can get the fdt into a state of an open fd with a
NULL file... is that okay? That feels like something Al might rebel at.
:)
>
> ...
> }
>
> /*
> * File Receive - Receive a file from another process
> *
> * Encapsulates the logic to handle receiving a file from another task. It
> * does not install the file descriptor. That is delegated to the user. If
> * an error occurs that results in the file descriptor not being installed,
> * they must put_unused_fd.
> *
> * fd should be >= 0 if you intend on replacing a file descriptor, or
> * alternatively -1 if you want file_receive to allocate an FD for you
> *
> * Returns the fd number on success.
> * Returns negative error code on failure.
> *
> */
> int file_receive(int fd, unsigned int flags, struct file *file)
> {
> int err;
> struct socket *sock;
> struct files_struct *files = current->files;
>
> err = security_file_receive(file);
> if (err)
> return err;
>
> if (fd >= 0) {
> if (fd >= rlimit(RLIMIT_NOFILE))
> return -EBADF;
>
> spin_lock(&files->file_lock);
> err = expand_files(files, fd);
> if (err < 0) {
> goto out_unlock;
> }
>
> err = do_dup2(files, NULL, fd, flags);
> if (err)
> return err;
This seems like we're duplicating some checks and missing others -- I
really think we need to do this using the existing primitives. But I'd
really like some review or commentary from Al. We can do this a bunch of
ways, and I'd like to know which way looks best to him. :(
> This way there is:
> 1. No "put_user" logic in file_receive
> 2. Minimal (single) branching logic, unless there's something in between
> the file_receive and installing the FD, such as put_user.
> 3. Doesn't implement fd_install, so there's no ambiguity about it being
> file_install_received vs. just the receive logic.
I still wonder if we should refactor SCM_RIGHTS to just delay put_user
failures, which would simplify a bunch. It's a behavior change, but it
seems from Al and Jann that it just shouldn't matter. (And if it does,
we'll hear about it.)
--
Kees Cook
On Tue, Jun 09, 2020 at 12:43:48PM -0700, Kees Cook wrote:
> On Fri, Jun 05, 2020 at 07:54:36AM +0000, Sargun Dhillon wrote:
> > On Thu, Jun 04, 2020 at 02:52:26PM +0200, Christian Brauner wrote:
> > > On Wed, Jun 03, 2020 at 07:22:57PM -0700, Kees Cook wrote:
> > > > On Thu, Jun 04, 2020 at 03:24:52AM +0200, Christian Brauner wrote:
> > > > > On Tue, Jun 02, 2020 at 06:10:41PM -0700, Sargun Dhillon wrote:
> > > > > > Previously there were two chunks of code where the logic to receive file
> > > > > > descriptors was duplicated in net. The compat version of copying
> > > > > > file descriptors via SCM_RIGHTS did not have logic to update cgroups.
> > > > > > Logic to change the cgroup data was added in:
> > > > > > commit 48a87cc26c13 ("net: netprio: fd passed in SCM_RIGHTS datagram not set correctly")
> > > > > > commit d84295067fc7 ("net: net_cls: fd passed in SCM_RIGHTS datagram not set correctly")
> > > > > >
> > > > > > This was not copied to the compat path. This commit fixes that, and thus
> > > > > > should be cherry-picked into stable.
> > > > > >
> > > > > > This introduces a helper (file_receive) which encapsulates the logic for
> > > > > > handling calling security hooks as well as manipulating cgroup information.
> > > > > > This helper can then be used other places in the kernel where file
> > > > > > descriptors are copied between processes
> > > > > >
> > > > > > I tested cgroup classid setting on both the compat (x32) path, and the
> > > > > > native path to ensure that when moving the file descriptor the classid
> > > > > > is set.
> > > > > >
> > > > > > Signed-off-by: Sargun Dhillon <[email protected]>
> > > > > > Suggested-by: Kees Cook <[email protected]>
> > > > > > Cc: Al Viro <[email protected]>
> > > > > > Cc: Christian Brauner <[email protected]>
> > > > > > Cc: Daniel Wagner <[email protected]>
> > > > > > Cc: David S. Miller <[email protected]>
> > > > > > Cc: Jann Horn <[email protected]>,
> > > > > > Cc: John Fastabend <[email protected]>
> > > > > > Cc: Tejun Heo <[email protected]>
> > > > > > Cc: Tycho Andersen <[email protected]>
> > > > > > Cc: [email protected]
> > > > > > Cc: [email protected]
> > > > > > Cc: [email protected]
> > > > > > Cc: [email protected]
> > > > > > ---
> > > > > > fs/file.c | 35 +++++++++++++++++++++++++++++++++++
> > > > > > include/linux/file.h | 1 +
> > > > > > net/compat.c | 10 +++++-----
> > > > > > net/core/scm.c | 14 ++++----------
> > > > > > 4 files changed, 45 insertions(+), 15 deletions(-)
> > > > > >
> > > > >
> > > > > This is all just a remote version of fd_install(), yet it deviates from
> > > > > fd_install()'s semantics and naming. That's not great imho. What about
> > > > > naming this something like:
> > > > >
> > > > > fd_install_received()
> > > > >
> > > > > and move the get_file() out of there so it has the same semantics as
> > > > > fd_install(). It seems rather dangerous to have a function like
> > > > > fd_install() that consumes a reference once it returned and another
> > > > > version of this that is basically the same thing but doesn't consume a
> > > > > reference because it takes its own. Seems an invitation for confusion.
> > > > > Does that make sense?
> > > >
> > > > We have some competing opinions on this, I guess. What I really don't
> > > > like is the copy/pasting of the get_unused_fd_flags() and
> > > > put_unused_fd() needed by (nearly) all the callers. If it's a helper, it
> > > > should help. Specifically, I'd like to see this:
> > > >
> > > > int file_receive(int fd, unsigned long flags, struct file *file,
> > > > int __user *fdptr)
> > >
> > > I still fail to see what this whole put_user() handling buys us at all
> > > and why this function needs to be anymore complicated then simply:
> > >
> > > fd_install_received(int fd, struct file *file)
> > > {
> > > security_file_receive(file);
> > >
> > > sock = sock_from_file(fd, &err);
> > > if (sock) {
> > > sock_update_netprioidx(&sock->sk->sk_cgrp_data);
> > > sock_update_classid(&sock->sk->sk_cgrp_data);
> > > }
> > >
> > > fd_install();
> > > return;
> > > }
> > >
> > > exactly like fd_install() but for received files.
> > >
> > > For scm you can fail somewhere in the middle of putting any number of
> > > file descriptors so you're left in a state with only a subset of
> > > requested file descriptors installed so it's not really useful there.
> > > And if you manage to install an fd but then fail to put_user() it
> > > userspace can simply check it's fds via proc and has to anyway on any
> > > scm message error. If you fail an scm message userspace better check
> > > their fds.
> > > For seccomp maybe but even there I doubt it and I still maintain that
> > > userspace screwing this up is on them which is how we do this most of
> > > the time. And for pidfd_getfd() this whole put_user() thing doesn't
> > > matter at all.
> > >
> > > It's much easier and clearer if we simply have a fd_install() -
> > > fd_install_received() parallelism where we follow an established
> > > convention. _But_ if that blocks you from making this generic enough
> > > then at least the replace_fd() vs fd_install() logic seems it shouldn't
> > > be in there.
> > >
> > > And the function name really needs to drive home the point that it
> > > installs an fd into the tasks fdtable no matter what version you go
> > > with. file_receive() is really not accurate enough for this at all.
> > >
> > > > {
> > > > struct socket *sock;
> > > > int err;
> > > >
> > > > err = security_file_receive(file);
> > > > if (err)
> > > > return err;
> > > >
> > > > if (fd < 0) {
> > > > /* Install new fd. */
> > > > int new_fd;
> > > >
> > > > err = get_unused_fd_flags(flags);
> > > > if (err < 0)
> > > > return err;
> > > > new_fd = err;
> > > >
> > > > /* Copy fd to any waiting user memory. */
> > > > if (fdptr) {
> > > > err = put_user(new_fd, fdptr);
> > > > if (err < 0) {
> > > > put_unused_fd(new_fd);
> > > > return err;
> > > > }
> > > > }
> > > > fd_install(new_fd, get_file(file));
> > > > fd = new_fd;
> > > > } else {
> > > > /* Replace existing fd. */
> > > > err = replace_fd(fd, file, flags);
> > > > if (err)
> > > > return err;
> > > > }
> > > >
> > > > /* Bump the cgroup usage counts. */
> > > > sock = sock_from_file(fd, &err);
> > > > if (sock) {
> > > > sock_update_netprioidx(&sock->sk->sk_cgrp_data);
> > > > sock_update_classid(&sock->sk->sk_cgrp_data);
> > > > }
> > > >
> > > > return fd;
> > > > }
> > > >
> > > > If everyone else *really* prefers keeping the get_unused_fd_flags() /
> > > > put_unused_fd() stuff outside the helper, then I guess I'll give up,
> > > > but I think it is MUCH cleaner this way -- all 4 users trim down lots
> > > > of code duplication.
> > > >
> > > > --
> > > > Kees Cook
> > How about this:
> >
> >
> > static int do_dup2(struct files_struct *files,
> > struct file *file, unsigned fd, unsigned flags)
> > __releases(&files->file_lock)
> > {
> > struct file *tofree;
> > struct fdtable *fdt;
> >
> > ...
> >
> > /*
> > * New bit, allowing the file to be null. Doesn't have the same
> > * "sanity check" bits from __alloc_fd
> > */
> > if (likely(file))
> > get_file(file);
> > rcu_assign_pointer(fdt->fd[fd], file);
> >
> > __set_open_fd(fd, fdt);
>
> IIUC, this means we can get the fdt into a state of an open fd with a
> NULL file... is that okay? That feels like something Al might rebel at.
> :)
>
> >
> > ...
> > }
> >
> > /*
> > * File Receive - Receive a file from another process
> > *
> > * Encapsulates the logic to handle receiving a file from another task. It
> > * does not install the file descriptor. That is delegated to the user. If
> > * an error occurs that results in the file descriptor not being installed,
> > * they must put_unused_fd.
> > *
> > * fd should be >= 0 if you intend on replacing a file descriptor, or
> > * alternatively -1 if you want file_receive to allocate an FD for you
> > *
> > * Returns the fd number on success.
> > * Returns negative error code on failure.
> > *
> > */
> > int file_receive(int fd, unsigned int flags, struct file *file)
> > {
> > int err;
> > struct socket *sock;
> > struct files_struct *files = current->files;
> >
> > err = security_file_receive(file);
> > if (err)
> > return err;
> >
> > if (fd >= 0) {
> > if (fd >= rlimit(RLIMIT_NOFILE))
> > return -EBADF;
> >
> > spin_lock(&files->file_lock);
> > err = expand_files(files, fd);
> > if (err < 0) {
> > goto out_unlock;
> > }
> >
> > err = do_dup2(files, NULL, fd, flags);
> > if (err)
> > return err;
>
> This seems like we're duplicating some checks and missing others -- I
> really think we need to do this using the existing primitives. But I'd
> really like some review or commentary from Al. We can do this a bunch of
> ways, and I'd like to know which way looks best to him. :(
>
> > This way there is:
> > 1. No "put_user" logic in file_receive
> > 2. Minimal (single) branching logic, unless there's something in between
> > the file_receive and installing the FD, such as put_user.
> > 3. Doesn't implement fd_install, so there's no ambiguity about it being
> > file_install_received vs. just the receive logic.
>
> I still wonder if we should refactor SCM_RIGHTS to just delay put_user
> failures, which would simplify a bunch. It's a behavior change, but it
I'm looking at __scm_install_fd() and I wonder what specifically you
mean by that? The put_user() seems to be placed such that the install
occurrs only if it succeeded. Sure, it only handles a single fd but
whatever. Userspace knows that already. Just look at systemd when a msg
fails:
void cmsg_close_all(struct msghdr *mh) {
struct cmsghdr *cmsg;
assert(mh);
CMSG_FOREACH(cmsg, mh)
if (cmsg->cmsg_level == SOL_SOCKET && cmsg->cmsg_type == SCM_RIGHTS)
close_many((int*) CMSG_DATA(cmsg), (cmsg->cmsg_len - CMSG_LEN(0)) / sizeof(int));
}
The only reasonable scenario for this whole mess I can think of is sm like (pseudo code):
fd_install_received(int fd, struct file *file)
{
sock = sock_from_file(fd, &err);
if (sock) {
sock_update_netprioidx(&sock->sk->sk_cgrp_data);
sock_update_classid(&sock->sk->sk_cgrp_data);
}
fd_install();
}
error = 0;
fdarray = malloc(fdmax);
for (i = 0; i < fdmax; i++) {
fdarray[i] = get_unused_fd_flags(o_flags);
if (fdarray[i] < 0) {
error = -EBADF;
break;
}
error = security_file_receive(file);
if (error)
break;
error = put_user(fd_array[i], ufd);
if (error)
break;
}
for (i = 0; i < fdmax; i++) {
if (error) {
/* ignore errors */
put_user(-EBADF, ufd); /* If this put_user() fails and the first one succeeded userspace might now close an fd it didn't intend to. */
put_unused_fd(fdarray[i]);
} else {
fd_install_received(fdarray[i], file);
}
}
Christian
On Tue, Jun 09, 2020 at 10:03:46PM +0200, Christian Brauner wrote:
> I'm looking at __scm_install_fd() and I wonder what specifically you
> mean by that? The put_user() seems to be placed such that the install
> occurrs only if it succeeded. Sure, it only handles a single fd but
> whatever. Userspace knows that already. Just look at systemd when a msg
> fails:
>
> void cmsg_close_all(struct msghdr *mh) {
> struct cmsghdr *cmsg;
>
> assert(mh);
>
> CMSG_FOREACH(cmsg, mh)
> if (cmsg->cmsg_level == SOL_SOCKET && cmsg->cmsg_type == SCM_RIGHTS)
> close_many((int*) CMSG_DATA(cmsg), (cmsg->cmsg_len - CMSG_LEN(0)) / sizeof(int));
> }
>
> The only reasonable scenario for this whole mess I can think of is sm like (pseudo code):
>
> fd_install_received(int fd, struct file *file)
> {
> sock = sock_from_file(fd, &err);
> if (sock) {
> sock_update_netprioidx(&sock->sk->sk_cgrp_data);
> sock_update_classid(&sock->sk->sk_cgrp_data);
> }
>
> fd_install();
> }
>
> error = 0;
> fdarray = malloc(fdmax);
> for (i = 0; i < fdmax; i++) {
> fdarray[i] = get_unused_fd_flags(o_flags);
> if (fdarray[i] < 0) {
> error = -EBADF;
> break;
> }
>
> error = security_file_receive(file);
> if (error)
> break;
>
> error = put_user(fd_array[i], ufd);
> if (error)
> break;
> }
>
> for (i = 0; i < fdmax; i++) {
> if (error) {
> /* ignore errors */
> put_user(-EBADF, ufd); /* If this put_user() fails and the first one succeeded userspace might now close an fd it didn't intend to. */
> put_unused_fd(fdarray[i]);
> } else {
> fd_install_received(fdarray[i], file);
> }
> }
I see 4 cases of the same code pattern (get_unused_fd_flags(),
sock_update_*(), fd_install()), one of them has this difficult put_user()
in the middle, and one of them has a potential replace_fd() instead of
the get_used/fd_install. So, to me, it makes sense to have a helper that
encapsulates the common work that each of those call sites has to do,
which I keep cringing at all these suggestions that leave portions of it
outside the helper.
If it's too ugly to keep the put_user() in the helper, then we can try
what was suggested earlier, and just totally rework the failure path for
SCM_RIGHTS.
LOL. And while we were debating this, hch just went and cleaned stuff
up:
2618d530dd8b ("net/scm: cleanup scm_detach_fds")
So, um, yeah, now my proposal is actually even closer to what we already
have there. We just add the replace_fd() logic to __scm_install_fd() and
we're done with it.
--
Kees Cook
On June 9, 2020 10:55:42 PM GMT+02:00, Kees Cook <[email protected]> wrote:
>On Tue, Jun 09, 2020 at 10:03:46PM +0200, Christian Brauner wrote:
>> I'm looking at __scm_install_fd() and I wonder what specifically you
>> mean by that? The put_user() seems to be placed such that the install
>> occurrs only if it succeeded. Sure, it only handles a single fd but
>> whatever. Userspace knows that already. Just look at systemd when a
>msg
>> fails:
>>
>> void cmsg_close_all(struct msghdr *mh) {
>> struct cmsghdr *cmsg;
>>
>> assert(mh);
>>
>> CMSG_FOREACH(cmsg, mh)
>> if (cmsg->cmsg_level == SOL_SOCKET && cmsg->cmsg_type
>== SCM_RIGHTS)
>> close_many((int*) CMSG_DATA(cmsg),
>(cmsg->cmsg_len - CMSG_LEN(0)) / sizeof(int));
>> }
>>
>> The only reasonable scenario for this whole mess I can think of is sm
>like (pseudo code):
>>
>> fd_install_received(int fd, struct file *file)
>> {
>> sock = sock_from_file(fd, &err);
>> if (sock) {
>> sock_update_netprioidx(&sock->sk->sk_cgrp_data);
>> sock_update_classid(&sock->sk->sk_cgrp_data);
>> }
>>
>> fd_install();
>> }
>>
>> error = 0;
>> fdarray = malloc(fdmax);
>> for (i = 0; i < fdmax; i++) {
>> fdarray[i] = get_unused_fd_flags(o_flags);
>> if (fdarray[i] < 0) {
>> error = -EBADF;
>> break;
>> }
>>
>> error = security_file_receive(file);
>> if (error)
>> break;
>>
>> error = put_user(fd_array[i], ufd);
>> if (error)
>> break;
>> }
>>
>> for (i = 0; i < fdmax; i++) {
>> if (error) {
>> /* ignore errors */
>> put_user(-EBADF, ufd); /* If this put_user() fails and the first
>one succeeded userspace might now close an fd it didn't intend to. */
>> put_unused_fd(fdarray[i]);
>> } else {
>> fd_install_received(fdarray[i], file);
>> }
>> }
>
>I see 4 cases of the same code pattern (get_unused_fd_flags(),
>sock_update_*(), fd_install()), one of them has this difficult
>put_user()
>in the middle, and one of them has a potential replace_fd() instead of
>the get_used/fd_install. So, to me, it makes sense to have a helper
>that
>encapsulates the common work that each of those call sites has to do,
>which I keep cringing at all these suggestions that leave portions of
>it
>outside the helper.
>
>If it's too ugly to keep the put_user() in the helper, then we can try
>what was suggested earlier, and just totally rework the failure path
>for
>SCM_RIGHTS.
>
>LOL. And while we were debating this, hch just went and cleaned stuff
>up:
>
>2618d530dd8b ("net/scm: cleanup scm_detach_fds")
>
>So, um, yeah, now my proposal is actually even closer to what we
>already
>have there. We just add the replace_fd() logic to __scm_install_fd()
>and
>we're done with it.
Cool, you have a link? :)
Christian
On Tue, Jun 09, 2020 at 11:27:30PM +0200, Christian Brauner wrote:
> On June 9, 2020 10:55:42 PM GMT+02:00, Kees Cook <[email protected]> wrote:
> >LOL. And while we were debating this, hch just went and cleaned stuff up:
> >
> >2618d530dd8b ("net/scm: cleanup scm_detach_fds")
> >
> >So, um, yeah, now my proposal is actually even closer to what we already
> >have there. We just add the replace_fd() logic to __scm_install_fd() and
> >we're done with it.
>
> Cool, you have a link? :)
How about this:
https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git/commit/?h=devel/seccomp/addfd/v3.1&id=bb94586b9e7cc88e915536c2e9fb991a97b62416
--
Kees Cook
On Tue, Jun 09, 2020 at 10:27:54PM -0700, Kees Cook wrote:
> On Tue, Jun 09, 2020 at 11:27:30PM +0200, Christian Brauner wrote:
> > On June 9, 2020 10:55:42 PM GMT+02:00, Kees Cook <[email protected]> wrote:
> > >LOL. And while we were debating this, hch just went and cleaned stuff up:
> > >
> > >2618d530dd8b ("net/scm: cleanup scm_detach_fds")
> > >
> > >So, um, yeah, now my proposal is actually even closer to what we already
> > >have there. We just add the replace_fd() logic to __scm_install_fd() and
> > >we're done with it.
> >
> > Cool, you have a link? :)
>
> How about this:
>
Thank you.
> https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git/commit/?h=devel/seccomp/addfd/v3.1&id=bb94586b9e7cc88e915536c2e9fb991a97b62416
>
> --
> Kees Cook
+ if (ufd) {
+ error = put_user(new_fd, ufd);
+ if (error) {
+ put_unused_fd(new_fd);
+ return error;
+ }
+ }
I'm fairly sure this introduces a bug[1] if the user does:
struct msghdr msg = {};
struct cmsghdr *cmsg;
struct iovec io = {
.iov_base = &c,
.iov_len = 1,
};
msg.msg_iov = &io;
msg.msg_iovlen = 1;
msg.msg_control = NULL;
msg.msg_controllen = sizeof(buf);
recvmsg(sock, &msg, 0);
They will have the FD installed, no error message, but FD number wont be written
to memory AFAICT. If two FDs are passed, you will get an efault. They will both
be installed, but memory wont be written to. Maybe instead of 0, make it a
poison pointer, or -1 instead?
-----
As an aside, all of this junk should be dropped:
+ ret = get_user(size, &uaddfd->size);
+ if (ret)
+ return ret;
+
+ ret = copy_struct_from_user(&addfd, sizeof(addfd), uaddfd, size);
+ if (ret)
+ return ret;
and the size member of the seccomp_notif_addfd struct. I brought this up
off-list with Tycho that ioctls have the size of the struct embedded in them. We
should just use that. The ioctl definition is based on this[2]:
#define _IOC(dir,type,nr,size) \
(((dir) << _IOC_DIRSHIFT) | \
((type) << _IOC_TYPESHIFT) | \
((nr) << _IOC_NRSHIFT) | \
((size) << _IOC_SIZESHIFT))
We should just use copy_from_user for now. In the future, we can either
introduce new ioctl names for new structs, or extract the size dynamically from
the ioctl (and mask it out on the switch statement in seccomp_notify_ioctl.
----
+#define SECCOMP_IOCTL_NOTIF_ADDFD SECCOMP_IOR(3, \
+ struct seccomp_notif_addfd)
Lastly, what I believe to be a small mistake, it should be SECCOMP_IOW, based on
the documentation in ioctl.h -- "_IOW means userland is writing and kernel is
reading."
[1]: https://lore.kernel.org/lkml/[email protected]/
[2]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/uapi/asm-generic/ioctl.h?id=v5.7#n69
On Tue, Jun 09, 2020 at 11:27:30PM +0200, Christian Brauner wrote:
> On June 9, 2020 10:55:42 PM GMT+02:00, Kees Cook <[email protected]> wrote:
> >On Tue, Jun 09, 2020 at 10:03:46PM +0200, Christian Brauner wrote:
> >> I'm looking at __scm_install_fd() and I wonder what specifically you
> >> mean by that? The put_user() seems to be placed such that the install
> >> occurrs only if it succeeded. Sure, it only handles a single fd but
> >> whatever. Userspace knows that already. Just look at systemd when a
> >msg
> >> fails:
> >>
> >> void cmsg_close_all(struct msghdr *mh) {
> >> struct cmsghdr *cmsg;
> >>
> >> assert(mh);
> >>
> >> CMSG_FOREACH(cmsg, mh)
> >> if (cmsg->cmsg_level == SOL_SOCKET && cmsg->cmsg_type
> >== SCM_RIGHTS)
> >> close_many((int*) CMSG_DATA(cmsg),
> >(cmsg->cmsg_len - CMSG_LEN(0)) / sizeof(int));
> >> }
> >>
> >> The only reasonable scenario for this whole mess I can think of is sm
> >like (pseudo code):
> >>
> >> fd_install_received(int fd, struct file *file)
> >> {
> >> sock = sock_from_file(fd, &err);
> >> if (sock) {
> >> sock_update_netprioidx(&sock->sk->sk_cgrp_data);
> >> sock_update_classid(&sock->sk->sk_cgrp_data);
> >> }
> >>
> >> fd_install();
> >> }
> >>
> >> error = 0;
> >> fdarray = malloc(fdmax);
> >> for (i = 0; i < fdmax; i++) {
> >> fdarray[i] = get_unused_fd_flags(o_flags);
> >> if (fdarray[i] < 0) {
> >> error = -EBADF;
> >> break;
> >> }
> >>
> >> error = security_file_receive(file);
> >> if (error)
> >> break;
> >>
> >> error = put_user(fd_array[i], ufd);
> >> if (error)
> >> break;
> >> }
> >>
> >> for (i = 0; i < fdmax; i++) {
> >> if (error) {
> >> /* ignore errors */
> >> put_user(-EBADF, ufd); /* If this put_user() fails and the first
> >one succeeded userspace might now close an fd it didn't intend to. */
> >> put_unused_fd(fdarray[i]);
> >> } else {
> >> fd_install_received(fdarray[i], file);
> >> }
> >> }
> >
> >I see 4 cases of the same code pattern (get_unused_fd_flags(),
> >sock_update_*(), fd_install()), one of them has this difficult
> >put_user()
> >in the middle, and one of them has a potential replace_fd() instead of
> >the get_used/fd_install. So, to me, it makes sense to have a helper
> >that
> >encapsulates the common work that each of those call sites has to do,
> >which I keep cringing at all these suggestions that leave portions of
> >it
> >outside the helper.
> >
> >If it's too ugly to keep the put_user() in the helper, then we can try
> >what was suggested earlier, and just totally rework the failure path
> >for
> >SCM_RIGHTS.
> >
> >LOL. And while we were debating this, hch just went and cleaned stuff
> >up:
> >
> >2618d530dd8b ("net/scm: cleanup scm_detach_fds")
> >
> >So, um, yeah, now my proposal is actually even closer to what we
> >already
> >have there. We just add the replace_fd() logic to __scm_install_fd()
> >and
> >we're done with it.
>
> Cool, you have a link? :)
For the record, as I didn't see this yesterday since I was already
looking at a kernel with Christoph's changes. His changes just move the
logic that was already there before into a separate helper.
So effectively nothing has changed semantically in the scm code at all.
This is why I was asking yesterday what you meant by reworking the scm
code's put_user() logic as it seems obviously correct as it is done now.
Christian
From: Sargun Dhillon
> Sent: 10 June 2020 09:13
>
> On Tue, Jun 09, 2020 at 10:27:54PM -0700, Kees Cook wrote:
> > On Tue, Jun 09, 2020 at 11:27:30PM +0200, Christian Brauner wrote:
> > > On June 9, 2020 10:55:42 PM GMT+02:00, Kees Cook <[email protected]> wrote:
> > > >LOL. And while we were debating this, hch just went and cleaned stuff up:
> > > >
> > > >2618d530dd8b ("net/scm: cleanup scm_detach_fds")
> > > >
> > > >So, um, yeah, now my proposal is actually even closer to what we already
> > > >have there. We just add the replace_fd() logic to __scm_install_fd() and
> > > >we're done with it.
> > >
> > > Cool, you have a link? :)
> >
> > How about this:
> >
> Thank you.
> >
> https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git/commit/?h=devel/seccomp/addfd/v3.1&id=b
> b94586b9e7cc88e915536c2e9fb991a97b62416
> >
> > --
> > Kees Cook
>
> + if (ufd) {
> + error = put_user(new_fd, ufd);
> + if (error) {
> + put_unused_fd(new_fd);
> + return error;
> + }
> + }
> I'm fairly sure this introduces a bug[1] if the user does:
>
> struct msghdr msg = {};
> struct cmsghdr *cmsg;
> struct iovec io = {
> .iov_base = &c,
> .iov_len = 1,
> };
>
> msg.msg_iov = &io;
> msg.msg_iovlen = 1;
> msg.msg_control = NULL;
> msg.msg_controllen = sizeof(buf);
>
> recvmsg(sock, &msg, 0);
>
> They will have the FD installed, no error message, but FD number wont be written
> to memory AFAICT. If two FDs are passed, you will get an efault. They will both
> be installed, but memory wont be written to. Maybe instead of 0, make it a
> poison pointer, or -1 instead?
IMHO if the buffer isn't big enough the nothing should happen.
(or maybe a few of the fds be returned and the others left for later.)
OTOH if the user passed an invalid address then installing the fd
and returning EFAULT (and hence SIGSEGV) seems reasonable.
Properly written apps just don't do that.
In essence the 'copy_to_user' is done by the wrapper code.
The code filling in the CMSG buffer can be considered to be
writing a kernel buffer.
IIRC other kernels (eg NetBSD) do the copies for ioctl() requests
in the ioctl syscall wrapper.
The IOW/IOR/IOWR flags have to be right.
David
-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
On Wed, Jun 10, 2020 at 08:12:38AM +0000, Sargun Dhillon wrote:
> On Tue, Jun 09, 2020 at 10:27:54PM -0700, Kees Cook wrote:
> > On Tue, Jun 09, 2020 at 11:27:30PM +0200, Christian Brauner wrote:
> > > On June 9, 2020 10:55:42 PM GMT+02:00, Kees Cook <[email protected]> wrote:
> > > >LOL. And while we were debating this, hch just went and cleaned stuff up:
> > > >
> > > >2618d530dd8b ("net/scm: cleanup scm_detach_fds")
> > > >
> > > >So, um, yeah, now my proposal is actually even closer to what we already
> > > >have there. We just add the replace_fd() logic to __scm_install_fd() and
> > > >we're done with it.
> > >
> > > Cool, you have a link? :)
> >
> > How about this:
> >
> Thank you.
> > https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git/commit/?h=devel/seccomp/addfd/v3.1&id=bb94586b9e7cc88e915536c2e9fb991a97b62416
> >
> > --
> > Kees Cook
>
> + if (ufd) {
> + error = put_user(new_fd, ufd);
> + if (error) {
> + put_unused_fd(new_fd);
> + return error;
> + }
> + }
> I'm fairly sure this introduces a bug[1] if the user does:
Ah, sorry, I missed this before I posted my "v3.2" tree link.
>
> struct msghdr msg = {};
> struct cmsghdr *cmsg;
> struct iovec io = {
> .iov_base = &c,
> .iov_len = 1,
> };
>
> msg.msg_iov = &io;
> msg.msg_iovlen = 1;
> msg.msg_control = NULL;
> msg.msg_controllen = sizeof(buf);
>
> recvmsg(sock, &msg, 0);
>
> They will have the FD installed, no error message, but FD number wont be written
> to memory AFAICT. If two FDs are passed, you will get an efault. They will both
> be installed, but memory wont be written to. Maybe instead of 0, make it a
> poison pointer, or -1 instead?
Hmmm. I see what you mean -- SCM_RIGHTS effectively _requires_ a valid
__user pointer, so we can't use NULL to indicate "we don't want this".
I'm not sure I can pass this through directly at all, though.
> -----
> As an aside, all of this junk should be dropped:
> + ret = get_user(size, &uaddfd->size);
> + if (ret)
> + return ret;
> +
> + ret = copy_struct_from_user(&addfd, sizeof(addfd), uaddfd, size);
> + if (ret)
> + return ret;
>
> and the size member of the seccomp_notif_addfd struct. I brought this up
> off-list with Tycho that ioctls have the size of the struct embedded in them. We
> should just use that. The ioctl definition is based on this[2]:
> #define _IOC(dir,type,nr,size) \
> (((dir) << _IOC_DIRSHIFT) | \
> ((type) << _IOC_TYPESHIFT) | \
> ((nr) << _IOC_NRSHIFT) | \
> ((size) << _IOC_SIZESHIFT))
>
>
> We should just use copy_from_user for now. In the future, we can either
> introduce new ioctl names for new structs, or extract the size dynamically from
> the ioctl (and mask it out on the switch statement in seccomp_notify_ioctl.
Okay, sounds good.
> ----
> +#define SECCOMP_IOCTL_NOTIF_ADDFD SECCOMP_IOR(3, \
> + struct seccomp_notif_addfd)
>
> Lastly, what I believe to be a small mistake, it should be SECCOMP_IOW, based on
> the documentation in ioctl.h -- "_IOW means userland is writing and kernel is
> reading."
Okay, let me tweak things and get a "v3.3". ;)
--
Kees Cook
On Wed, Jun 10, 2020 at 08:12:38AM +0000, Sargun Dhillon wrote:
> As an aside, all of this junk should be dropped:
> + ret = get_user(size, &uaddfd->size);
> + if (ret)
> + return ret;
> +
> + ret = copy_struct_from_user(&addfd, sizeof(addfd), uaddfd, size);
> + if (ret)
> + return ret;
>
> and the size member of the seccomp_notif_addfd struct. I brought this up
> off-list with Tycho that ioctls have the size of the struct embedded in them. We
> should just use that. The ioctl definition is based on this[2]:
> #define _IOC(dir,type,nr,size) \
> (((dir) << _IOC_DIRSHIFT) | \
> ((type) << _IOC_TYPESHIFT) | \
> ((nr) << _IOC_NRSHIFT) | \
> ((size) << _IOC_SIZESHIFT))
>
>
> We should just use copy_from_user for now. In the future, we can either
> introduce new ioctl names for new structs, or extract the size dynamically from
> the ioctl (and mask it out on the switch statement in seccomp_notify_ioctl.
Yeah, that seems reasonable. Here's the diff for that part:
diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h
index 7b6028b399d8..98bf19b4e086 100644
--- a/include/uapi/linux/seccomp.h
+++ b/include/uapi/linux/seccomp.h
@@ -118,7 +118,6 @@ struct seccomp_notif_resp {
/**
* struct seccomp_notif_addfd
- * @size: The size of the seccomp_notif_addfd datastructure
* @id: The ID of the seccomp notification
* @flags: SECCOMP_ADDFD_FLAG_*
* @srcfd: The local fd number
@@ -126,7 +125,6 @@ struct seccomp_notif_resp {
* @newfd_flags: The O_* flags the remote FD should have applied
*/
struct seccomp_notif_addfd {
- __u64 size;
__u64 id;
__u32 flags;
__u32 srcfd;
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index 3c913f3b8451..00cbdad6c480 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -1297,14 +1297,9 @@ static long seccomp_notify_addfd(struct seccomp_filter *filter,
struct seccomp_notif_addfd addfd;
struct seccomp_knotif *knotif;
struct seccomp_kaddfd kaddfd;
- u64 size;
int ret;
- ret = get_user(size, &uaddfd->size);
- if (ret)
- return ret;
-
- ret = copy_struct_from_user(&addfd, sizeof(addfd), uaddfd, size);
+ ret = copy_from_user(&addfd, uaddfd, sizeof(addfd));
if (ret)
return ret;
>
> ----
> +#define SECCOMP_IOCTL_NOTIF_ADDFD SECCOMP_IOR(3, \
> + struct seccomp_notif_addfd)
>
> Lastly, what I believe to be a small mistake, it should be SECCOMP_IOW, based on
> the documentation in ioctl.h -- "_IOW means userland is writing and kernel is
> reading."
Oooooh. Yeah; good catch. Uhm, that means SECCOMP_IOCTL_NOTIF_ID_VALID
is wrong too, yes? Tycho, Christian, how disruptive would this be to
fix? (Perhaps support both and deprecate the IOR version at some point
in the future?)
Diff for just addfd's change:
diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h
index 7b6028b399d8..98bf19b4e086 100644
--- a/include/uapi/linux/seccomp.h
+++ b/include/uapi/linux/seccomp.h
@@ -146,7 +144,7 @@ struct seccomp_notif_addfd {
struct seccomp_notif_resp)
#define SECCOMP_IOCTL_NOTIF_ID_VALID SECCOMP_IOR(2, __u64)
/* On success, the return value is the remote process's added fd number */
-#define SECCOMP_IOCTL_NOTIF_ADDFD SECCOMP_IOR(3, \
+#define SECCOMP_IOCTL_NOTIF_ADDFD SECCOMP_IOW(3, \
struct seccomp_notif_addfd)
#endif /* _UAPI_LINUX_SECCOMP_H */
--
Kees Cook
On Wed, Jun 10, 2020 at 08:48:45AM +0000, David Laight wrote:
> From: Sargun Dhillon
> > Sent: 10 June 2020 09:13
> In essence the 'copy_to_user' is done by the wrapper code.
> The code filling in the CMSG buffer can be considered to be
> writing a kernel buffer.
>
> IIRC other kernels (eg NetBSD) do the copies for ioctl() requests
> in the ioctl syscall wrapper.
> The IOW/IOR/IOWR flags have to be right.
Yeah, this seems like it'd make a lot more sense (and would have easily
caught the IOR/IOW issue pointed out later in the thread). I wonder how
insane it would be to try to fix that globally in the kernel...
--
Kees Cook
On Wed, Jun 10, 2020 at 07:59:55PM -0700, Kees Cook wrote:
>
> Yeah, that seems reasonable. Here's the diff for that part:
>
> diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h
> index 7b6028b399d8..98bf19b4e086 100644
> --- a/include/uapi/linux/seccomp.h
> +++ b/include/uapi/linux/seccomp.h
> @@ -118,7 +118,6 @@ struct seccomp_notif_resp {
>
> /**
> * struct seccomp_notif_addfd
> - * @size: The size of the seccomp_notif_addfd datastructure
> * @id: The ID of the seccomp notification
> * @flags: SECCOMP_ADDFD_FLAG_*
> * @srcfd: The local fd number
> @@ -126,7 +125,6 @@ struct seccomp_notif_resp {
> * @newfd_flags: The O_* flags the remote FD should have applied
> */
> struct seccomp_notif_addfd {
> - __u64 size;
> __u64 id;
> __u32 flags;
> __u32 srcfd;
> diff --git a/kernel/seccomp.c b/kernel/seccomp.c
> index 3c913f3b8451..00cbdad6c480 100644
> --- a/kernel/seccomp.c
> +++ b/kernel/seccomp.c
> @@ -1297,14 +1297,9 @@ static long seccomp_notify_addfd(struct seccomp_filter *filter,
> struct seccomp_notif_addfd addfd;
> struct seccomp_knotif *knotif;
> struct seccomp_kaddfd kaddfd;
> - u64 size;
> int ret;
>
> - ret = get_user(size, &uaddfd->size);
> - if (ret)
> - return ret;
> -
> - ret = copy_struct_from_user(&addfd, sizeof(addfd), uaddfd, size);
> + ret = copy_from_user(&addfd, uaddfd, sizeof(addfd));
> if (ret)
> return ret;
>
>
Looks good to me. If we ever change the size of this struct, we can do the work
then to copy_struct_from_user.
> >
> > ----
> > +#define SECCOMP_IOCTL_NOTIF_ADDFD SECCOMP_IOR(3, \
> > + struct seccomp_notif_addfd)
> >
> > Lastly, what I believe to be a small mistake, it should be SECCOMP_IOW, based on
> > the documentation in ioctl.h -- "_IOW means userland is writing and kernel is
> > reading."
>
> Oooooh. Yeah; good catch. Uhm, that means SECCOMP_IOCTL_NOTIF_ID_VALID
> is wrong too, yes? Tycho, Christian, how disruptive would this be to
> fix? (Perhaps support both and deprecate the IOR version at some point
> in the future?)
I think at a minimum we should change the uapi, and accept both (for now). Maybe
a pr_warn_once telling people not to use the old one.
I can do the patch, if you want.
>
> Diff for just addfd's change:
>
> diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h
> index 7b6028b399d8..98bf19b4e086 100644
> --- a/include/uapi/linux/seccomp.h
> +++ b/include/uapi/linux/seccomp.h
> @@ -146,7 +144,7 @@ struct seccomp_notif_addfd {
> struct seccomp_notif_resp)
> #define SECCOMP_IOCTL_NOTIF_ID_VALID SECCOMP_IOR(2, __u64)
> /* On success, the return value is the remote process's added fd number */
> -#define SECCOMP_IOCTL_NOTIF_ADDFD SECCOMP_IOR(3, \
> +#define SECCOMP_IOCTL_NOTIF_ADDFD SECCOMP_IOW(3, \
> struct seccomp_notif_addfd)
>
> #endif /* _UAPI_LINUX_SECCOMP_H */
>
> --
> Kees Cook
Looks good. Thank you.
From: Kees Cook
> Sent: 11 June 2020 04:03
...
> > IIRC other kernels (eg NetBSD) do the copies for ioctl() requests
> > in the ioctl syscall wrapper.
> > The IOW/IOR/IOWR flags have to be right.
>
> Yeah, this seems like it'd make a lot more sense (and would have easily
> caught the IOR/IOW issue pointed out later in the thread). I wonder how
> insane it would be to try to fix that globally in the kernel...
Seems like a good idea to me.
(Even though I'll need to fix our 'out of tree' modules.)
Unlike [sg]etsockopt() at least the buffer is bounded to 1k.
But you'd really need to add new kernel_ioctl() entry points
before deprecating the existing ones a release or two later.
With a bit of luck there aren't any drivers ported from SYSV that
just treat the ioctl command as a 32bit transparent value and
the argument as an integer.
I actually suspect that BSD added IOW (etc) in the 16bit to 32bit port.
The kernel copies being moved to the syscall stub at the same time.
Since Linux has only ever been 32bit and uses IOW is it actually odd
that Linus didn't do the copies in the stub.
David
-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
On Wed, Jun 10, 2020 at 07:59:55PM -0700, Kees Cook wrote:
> On Wed, Jun 10, 2020 at 08:12:38AM +0000, Sargun Dhillon wrote:
> > As an aside, all of this junk should be dropped:
> > + ret = get_user(size, &uaddfd->size);
> > + if (ret)
> > + return ret;
> > +
> > + ret = copy_struct_from_user(&addfd, sizeof(addfd), uaddfd, size);
> > + if (ret)
> > + return ret;
> >
> > and the size member of the seccomp_notif_addfd struct. I brought this up
> > off-list with Tycho that ioctls have the size of the struct embedded in them. We
> > should just use that. The ioctl definition is based on this[2]:
> > #define _IOC(dir,type,nr,size) \
> > (((dir) << _IOC_DIRSHIFT) | \
> > ((type) << _IOC_TYPESHIFT) | \
> > ((nr) << _IOC_NRSHIFT) | \
> > ((size) << _IOC_SIZESHIFT))
> >
> >
> > We should just use copy_from_user for now. In the future, we can either
> > introduce new ioctl names for new structs, or extract the size dynamically from
> > the ioctl (and mask it out on the switch statement in seccomp_notify_ioctl.
>
> Yeah, that seems reasonable. Here's the diff for that part:
>
> diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h
> index 7b6028b399d8..98bf19b4e086 100644
> --- a/include/uapi/linux/seccomp.h
> +++ b/include/uapi/linux/seccomp.h
> @@ -118,7 +118,6 @@ struct seccomp_notif_resp {
>
> /**
> * struct seccomp_notif_addfd
> - * @size: The size of the seccomp_notif_addfd datastructure
> * @id: The ID of the seccomp notification
> * @flags: SECCOMP_ADDFD_FLAG_*
> * @srcfd: The local fd number
> @@ -126,7 +125,6 @@ struct seccomp_notif_resp {
> * @newfd_flags: The O_* flags the remote FD should have applied
> */
> struct seccomp_notif_addfd {
> - __u64 size;
> __u64 id;
> __u32 flags;
> __u32 srcfd;
> diff --git a/kernel/seccomp.c b/kernel/seccomp.c
> index 3c913f3b8451..00cbdad6c480 100644
> --- a/kernel/seccomp.c
> +++ b/kernel/seccomp.c
> @@ -1297,14 +1297,9 @@ static long seccomp_notify_addfd(struct seccomp_filter *filter,
> struct seccomp_notif_addfd addfd;
> struct seccomp_knotif *knotif;
> struct seccomp_kaddfd kaddfd;
> - u64 size;
> int ret;
>
> - ret = get_user(size, &uaddfd->size);
> - if (ret)
> - return ret;
> -
> - ret = copy_struct_from_user(&addfd, sizeof(addfd), uaddfd, size);
> + ret = copy_from_user(&addfd, uaddfd, sizeof(addfd));
> if (ret)
> return ret;
>
>
> >
> > ----
> > +#define SECCOMP_IOCTL_NOTIF_ADDFD SECCOMP_IOR(3, \
> > + struct seccomp_notif_addfd)
> >
> > Lastly, what I believe to be a small mistake, it should be SECCOMP_IOW, based on
> > the documentation in ioctl.h -- "_IOW means userland is writing and kernel is
> > reading."
>
> Oooooh. Yeah; good catch. Uhm, that means SECCOMP_IOCTL_NOTIF_ID_VALID
> is wrong too, yes? Tycho, Christian, how disruptive would this be to
> fix? (Perhaps support both and deprecate the IOR version at some point
> in the future?)
We have custom defines in our source code, i.e.
#define SECCOMP_IOCTL_NOTIF_ID_VALID SECCOMP_IOR(2, __u64)
so ideally we'd have a SECCOMP_IOCTL_NOTIF_ID_VALID_V2
Does that sound ok?
Christian
On Wed, Jun 10, 2020 at 07:59:55PM -0700, Kees Cook wrote:
> On Wed, Jun 10, 2020 at 08:12:38AM +0000, Sargun Dhillon wrote:
> > As an aside, all of this junk should be dropped:
> > + ret = get_user(size, &uaddfd->size);
> > + if (ret)
> > + return ret;
> > +
> > + ret = copy_struct_from_user(&addfd, sizeof(addfd), uaddfd, size);
> > + if (ret)
> > + return ret;
> >
> > and the size member of the seccomp_notif_addfd struct. I brought this up
> > off-list with Tycho that ioctls have the size of the struct embedded in them. We
> > should just use that. The ioctl definition is based on this[2]:
> > #define _IOC(dir,type,nr,size) \
> > (((dir) << _IOC_DIRSHIFT) | \
> > ((type) << _IOC_TYPESHIFT) | \
> > ((nr) << _IOC_NRSHIFT) | \
> > ((size) << _IOC_SIZESHIFT))
> >
> >
> > We should just use copy_from_user for now. In the future, we can either
> > introduce new ioctl names for new structs, or extract the size dynamically from
> > the ioctl (and mask it out on the switch statement in seccomp_notify_ioctl.
>
> Yeah, that seems reasonable. Here's the diff for that part:
Why does it matter that the ioctl() has the size of the struct embedded
within? Afaik, the kernel itself doesn't do anything with that size. It
merely checks that the size is not pathological and it does so at
compile time.
#ifdef __CHECKER__
#define _IOC_TYPECHECK(t) (sizeof(t))
#else
/* provoke compile error for invalid uses of size argument */
extern unsigned int __invalid_size_argument_for_IOC;
#define _IOC_TYPECHECK(t) \
((sizeof(t) == sizeof(t[1]) && \
sizeof(t) < (1 << _IOC_SIZEBITS)) ? \
sizeof(t) : __invalid_size_argument_for_IOC)
#endif
The size itself is not verified at runtime. copy_struct_from_user()
still makes sense at least if we're going to allow expanding the struct
in the future.
Leaving that aside, the proposed direction here seems to mean that any
change to the struct itself will immediately mean a new ioctl() but
afaict, that also means a new struct. Since when you simply extend the
struct for the sake of the new ioctl you also change the size for the
ioctl.
Sure, you can simply treat the struct coming through the old ioctl as
being "capped" by e.g. hiding the size as suggested but then the gain
by having two separate ioctls is 0 compared to simply versioning the
struct with an explicit size member since the size encoded in the ioctl
and the actual size of the struct don't line up anymore which is the
only plus I can see for relying on _IOC_SIZE(). All this manages to do
then is to make it more annoying for userspace since they now need to
maintain multiple ioctls(). And if you have - however unlikely - say
three different ioctls all to be used with a different struct size of
the same struct I now need to know which ioctl() goes with which size of
the struct (I guess you could append the size to the ioctl name?
*shudder*). If you have the size in the struct itself you don't need to
care about any of that.
Maybe I'm not making sense or I misunderstand what's going on though.
Christian
>
> diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h
> index 7b6028b399d8..98bf19b4e086 100644
> --- a/include/uapi/linux/seccomp.h
> +++ b/include/uapi/linux/seccomp.h
> @@ -118,7 +118,6 @@ struct seccomp_notif_resp {
>
> /**
> * struct seccomp_notif_addfd
> - * @size: The size of the seccomp_notif_addfd datastructure
> * @id: The ID of the seccomp notification
> * @flags: SECCOMP_ADDFD_FLAG_*
> * @srcfd: The local fd number
> @@ -126,7 +125,6 @@ struct seccomp_notif_resp {
> * @newfd_flags: The O_* flags the remote FD should have applied
> */
> struct seccomp_notif_addfd {
> - __u64 size;
> __u64 id;
> __u32 flags;
> __u32 srcfd;
> diff --git a/kernel/seccomp.c b/kernel/seccomp.c
> index 3c913f3b8451..00cbdad6c480 100644
> --- a/kernel/seccomp.c
> +++ b/kernel/seccomp.c
> @@ -1297,14 +1297,9 @@ static long seccomp_notify_addfd(struct seccomp_filter *filter,
> struct seccomp_notif_addfd addfd;
> struct seccomp_knotif *knotif;
> struct seccomp_kaddfd kaddfd;
> - u64 size;
> int ret;
>
> - ret = get_user(size, &uaddfd->size);
> - if (ret)
> - return ret;
> -
> - ret = copy_struct_from_user(&addfd, sizeof(addfd), uaddfd, size);
> + ret = copy_from_user(&addfd, uaddfd, sizeof(addfd));
> if (ret)
> return ret;
>
>
> >
> > ----
> > +#define SECCOMP_IOCTL_NOTIF_ADDFD SECCOMP_IOR(3, \
> > + struct seccomp_notif_addfd)
> >
> > Lastly, what I believe to be a small mistake, it should be SECCOMP_IOW, based on
> > the documentation in ioctl.h -- "_IOW means userland is writing and kernel is
> > reading."
>
> Oooooh. Yeah; good catch. Uhm, that means SECCOMP_IOCTL_NOTIF_ID_VALID
> is wrong too, yes? Tycho, Christian, how disruptive would this be to
> fix? (Perhaps support both and deprecate the IOR version at some point
> in the future?)
>
> Diff for just addfd's change:
>
> diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h
> index 7b6028b399d8..98bf19b4e086 100644
> --- a/include/uapi/linux/seccomp.h
> +++ b/include/uapi/linux/seccomp.h
> @@ -146,7 +144,7 @@ struct seccomp_notif_addfd {
> struct seccomp_notif_resp)
> #define SECCOMP_IOCTL_NOTIF_ID_VALID SECCOMP_IOR(2, __u64)
> /* On success, the return value is the remote process's added fd number */
> -#define SECCOMP_IOCTL_NOTIF_ADDFD SECCOMP_IOR(3, \
> +#define SECCOMP_IOCTL_NOTIF_ADDFD SECCOMP_IOW(3, \
> struct seccomp_notif_addfd)
>
> #endif /* _UAPI_LINUX_SECCOMP_H */
>
> --
> Kees Cook
On Thu, Jun 11, 2020 at 11:19:42AM +0200, Christian Brauner wrote:
> On Wed, Jun 10, 2020 at 07:59:55PM -0700, Kees Cook wrote:
> > On Wed, Jun 10, 2020 at 08:12:38AM +0000, Sargun Dhillon wrote:
> > > As an aside, all of this junk should be dropped:
> > > + ret = get_user(size, &uaddfd->size);
> > > + if (ret)
> > > + return ret;
> > > +
> > > + ret = copy_struct_from_user(&addfd, sizeof(addfd), uaddfd, size);
> > > + if (ret)
> > > + return ret;
> > >
> > > and the size member of the seccomp_notif_addfd struct. I brought this up
> > > off-list with Tycho that ioctls have the size of the struct embedded in them. We
> > > should just use that. The ioctl definition is based on this[2]:
> > > #define _IOC(dir,type,nr,size) \
> > > (((dir) << _IOC_DIRSHIFT) | \
> > > ((type) << _IOC_TYPESHIFT) | \
> > > ((nr) << _IOC_NRSHIFT) | \
> > > ((size) << _IOC_SIZESHIFT))
> > >
> > >
> > > We should just use copy_from_user for now. In the future, we can either
> > > introduce new ioctl names for new structs, or extract the size dynamically from
> > > the ioctl (and mask it out on the switch statement in seccomp_notify_ioctl.
> >
> > Yeah, that seems reasonable. Here's the diff for that part:
> >
> > diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h
> > index 7b6028b399d8..98bf19b4e086 100644
> > --- a/include/uapi/linux/seccomp.h
> > +++ b/include/uapi/linux/seccomp.h
> > @@ -118,7 +118,6 @@ struct seccomp_notif_resp {
> >
> > /**
> > * struct seccomp_notif_addfd
> > - * @size: The size of the seccomp_notif_addfd datastructure
> > * @id: The ID of the seccomp notification
> > * @flags: SECCOMP_ADDFD_FLAG_*
> > * @srcfd: The local fd number
> > @@ -126,7 +125,6 @@ struct seccomp_notif_resp {
> > * @newfd_flags: The O_* flags the remote FD should have applied
> > */
> > struct seccomp_notif_addfd {
> > - __u64 size;
> > __u64 id;
> > __u32 flags;
> > __u32 srcfd;
> > diff --git a/kernel/seccomp.c b/kernel/seccomp.c
> > index 3c913f3b8451..00cbdad6c480 100644
> > --- a/kernel/seccomp.c
> > +++ b/kernel/seccomp.c
> > @@ -1297,14 +1297,9 @@ static long seccomp_notify_addfd(struct seccomp_filter *filter,
> > struct seccomp_notif_addfd addfd;
> > struct seccomp_knotif *knotif;
> > struct seccomp_kaddfd kaddfd;
> > - u64 size;
> > int ret;
> >
> > - ret = get_user(size, &uaddfd->size);
> > - if (ret)
> > - return ret;
> > -
> > - ret = copy_struct_from_user(&addfd, sizeof(addfd), uaddfd, size);
> > + ret = copy_from_user(&addfd, uaddfd, sizeof(addfd));
> > if (ret)
> > return ret;
> >
> >
> > >
> > > ----
> > > +#define SECCOMP_IOCTL_NOTIF_ADDFD SECCOMP_IOR(3, \
> > > + struct seccomp_notif_addfd)
> > >
> > > Lastly, what I believe to be a small mistake, it should be SECCOMP_IOW, based on
> > > the documentation in ioctl.h -- "_IOW means userland is writing and kernel is
> > > reading."
> >
> > Oooooh. Yeah; good catch. Uhm, that means SECCOMP_IOCTL_NOTIF_ID_VALID
> > is wrong too, yes? Tycho, Christian, how disruptive would this be to
> > fix? (Perhaps support both and deprecate the IOR version at some point
> > in the future?)
>
> We have custom defines in our source code, i.e.
> #define SECCOMP_IOCTL_NOTIF_ID_VALID SECCOMP_IOR(2, __u64)
> so ideally we'd have a SECCOMP_IOCTL_NOTIF_ID_VALID_V2
>
> Does that sound ok?
>
> Christian
Why not change the public API in seccomp.h to:
#define SECCOMP_IOCTL_NOTIF_ID_VALID SECCOMP_IOW(2, __u64)
And then in seccomp.c:
#define SECCOMP_IOCTL_NOTIF_ID_VALID_OLD SECCOMP_IOR(2, __u64)
static long seccomp_notify_ioctl(struct file *file, unsigned int cmd,
unsigned long arg)
{
struct seccomp_filter *filter = file->private_data;
void __user *buf = (void __user *)arg;
switch (cmd) {
case SECCOMP_IOCTL_NOTIF_RECV:
return seccomp_notify_recv(filter, buf);
case SECCOMP_IOCTL_NOTIF_SEND:
return seccomp_notify_send(filter, buf);
case SECCOMP_IOCTL_NOTIF_ID_VALID_OLD:
pr_warn_once("Detected usage of legacy (incorrect) version of seccomp notifier notif_id_valid ioctl\n");
case SECCOMP_IOCTL_NOTIF_ID_VALID:
return seccomp_notify_id_valid(filter, buf);
default:
return -EINVAL;
}
}
----
So, both will work fine, and whenevery anyone recompiles, or picks up new
headers, they will start calling the "right" one without a code change, and
we wont break any userspace.
On Thu, Jun 11, 2020 at 12:01:14PM +0200, Christian Brauner wrote:
> On Wed, Jun 10, 2020 at 07:59:55PM -0700, Kees Cook wrote:
> > On Wed, Jun 10, 2020 at 08:12:38AM +0000, Sargun Dhillon wrote:
> > > As an aside, all of this junk should be dropped:
> > > + ret = get_user(size, &uaddfd->size);
> > > + if (ret)
> > > + return ret;
> > > +
> > > + ret = copy_struct_from_user(&addfd, sizeof(addfd), uaddfd, size);
> > > + if (ret)
> > > + return ret;
> > >
> > > and the size member of the seccomp_notif_addfd struct. I brought this up
> > > off-list with Tycho that ioctls have the size of the struct embedded in them. We
> > > should just use that. The ioctl definition is based on this[2]:
> > > #define _IOC(dir,type,nr,size) \
> > > (((dir) << _IOC_DIRSHIFT) | \
> > > ((type) << _IOC_TYPESHIFT) | \
> > > ((nr) << _IOC_NRSHIFT) | \
> > > ((size) << _IOC_SIZESHIFT))
> > >
> > >
> > > We should just use copy_from_user for now. In the future, we can either
> > > introduce new ioctl names for new structs, or extract the size dynamically from
> > > the ioctl (and mask it out on the switch statement in seccomp_notify_ioctl.
> >
> > Yeah, that seems reasonable. Here's the diff for that part:
>
> Why does it matter that the ioctl() has the size of the struct embedded
> within? Afaik, the kernel itself doesn't do anything with that size. It
> merely checks that the size is not pathological and it does so at
> compile time.
>
> #ifdef __CHECKER__
> #define _IOC_TYPECHECK(t) (sizeof(t))
> #else
> /* provoke compile error for invalid uses of size argument */
> extern unsigned int __invalid_size_argument_for_IOC;
> #define _IOC_TYPECHECK(t) \
> ((sizeof(t) == sizeof(t[1]) && \
> sizeof(t) < (1 << _IOC_SIZEBITS)) ? \
> sizeof(t) : __invalid_size_argument_for_IOC)
> #endif
>
> The size itself is not verified at runtime. copy_struct_from_user()
> still makes sense at least if we're going to allow expanding the struct
> in the future.
Right, but if we simply change our headers and extend the struct, it will break
all existing programs compiled against those headers. In order to avoid that, if
we intend on extending this struct by appending to it, we need to have a
backwards compatibility mechanism. Just having copy_struct_from_user isn't
enough. The data structure either must be fixed size, or we need a way to handle
multiple ioctl numbers derived from headers with different sized struct arguments
The two approaches I see are:
1. use more indirection. This has previous art in drm[1]. That's look
something like this:
struct seccomp_notif_addfd_ptr {
__u64 size;
__u64 addr;
}
... And then it'd be up to us to dereference the addr and copy struct from user.
2. Expose one ioctl to the user, many internally
e.g., public api:
struct seccomp_notif {
__u64 id;
__u64 pid;
struct seccomp_data;
__u64 fancy_new_field;
}
#define SECCOMP_IOCTL_NOTIF_RECV SECCOMP_IOWR(0, struct seccomp_notif)
internally:
struct seccomp_notif_v1 {
__u64 id;
__u64 pid;
struct seccomp_data;
}
struct seccomp_notif_v2 {
__u64 id;
__u64 pid;
struct seccomp_data;
__u64 fancy_new_field;
}
and we can switch like this:
switch (cmd) {
/* for example. We actually have to do this for any struct we intend to
* extend to get proper backwards compatibility
*/
case SECCOMP_IOWR(0, struct seccomp_notif_v1)
return seccomp_notify_recv(filter, buf, sizeof(struct seccomp_notif_v1));
case SECCOMP_IOWR(0, struct seccomp_notif_v2)
return seccomp_notify_recv(filter, buf, sizeof(struct seccomp_notif_v3));
...
case SECCOMP_IOCTL_NOTIF_SEND:
return seccomp_notify_send(filter, buf);
case SECCOMP_IOCTL_NOTIF_ID_VALID:
return seccomp_notify_id_valid(filter, buf);
default:
return -EINVAL;
}
This has the downside that programs compiled against more modern kernel headers
will break on older kernels.
3. We can take the approach you suggested.
#define UNSIZED(cmd) (cmd & ~(_IOC_SIZEMASK << _IOC_SIZESHIFT)
static long seccomp_notify_ioctl(struct file *file, unsigned int cmd,
unsigned long arg)
{
struct seccomp_filter *filter = file->private_data;
void __user *buf = (void __user *)arg;
int size = _IOC_SIZE(cmd);
cmd = UNSIZED(cmd);
switch (cmd) {
/* for example. We actually have to do this for any struct we intend to
* extend to get proper backwards compatibility
*/
case UNSIZED(SECCOMP_IOCTL_NOTIF_RECV):
return seccomp_notify_recv(filter, buf, size);
...
case SECCOMP_IOCTL_NOTIF_SEND:
return seccomp_notify_send(filter, buf);
case SECCOMP_IOCTL_NOTIF_ID_VALID:
return seccomp_notify_id_valid(filter, buf);
default:
return -EINVAL;
}
}
>
> Leaving that aside, the proposed direction here seems to mean that any
> change to the struct itself will immediately mean a new ioctl() but
> afaict, that also means a new struct. Since when you simply extend the
> struct for the sake of the new ioctl you also change the size for the
> ioctl.
>
> Sure, you can simply treat the struct coming through the old ioctl as
> being "capped" by e.g. hiding the size as suggested but then the gain
> by having two separate ioctls is 0 compared to simply versioning the
> struct with an explicit size member since the size encoded in the ioctl
> and the actual size of the struct don't line up anymore which is the
> only plus I can see for relying on _IOC_SIZE(). All this manages to do
> then is to make it more annoying for userspace since they now need to
> maintain multiple ioctls(). And if you have - however unlikely - say
> three different ioctls all to be used with a different struct size of
> the same struct I now need to know which ioctl() goes with which size of
> the struct (I guess you could append the size to the ioctl name?
> *shudder*). If you have the size in the struct itself you don't need to
> care about any of that.
> Maybe I'm not making sense or I misunderstand what's going on though.
>
> Christian
>
I don't understand why userspace has to have any knowledge of this. As soon as
we add the code above, and we use copy_struct_from_user based on _that_ size,
userspace will get free upgrades. If they are compiling against an older header
than the kernel, size will return a smaller number, and thus we will zero
out our trailing bits, and if their number is bigger, we just check their
bits are appropriately zeroed.
This approach would be forwards-and-backwards compatible.
There's a little bit of prior art here as well [2]. The approach is that
we effectively do the thing we had earlier with passing a size with
copy_struct_from_user, but instead of the size being embedded in the struct,
it's embedded in the ioctl command itself.
[1]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/uapi/drm/radeon_drm.h?h=v5.7#n831
[2]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/firewire/core-cdev.c?id=v5.7#n1621
On Thu, Jun 11, 2020 at 11:06:31AM +0000, Sargun Dhillon wrote:
> On Thu, Jun 11, 2020 at 12:01:14PM +0200, Christian Brauner wrote:
> > On Wed, Jun 10, 2020 at 07:59:55PM -0700, Kees Cook wrote:
> > > On Wed, Jun 10, 2020 at 08:12:38AM +0000, Sargun Dhillon wrote:
> > > > As an aside, all of this junk should be dropped:
> > > > + ret = get_user(size, &uaddfd->size);
> > > > + if (ret)
> > > > + return ret;
> > > > +
> > > > + ret = copy_struct_from_user(&addfd, sizeof(addfd), uaddfd, size);
> > > > + if (ret)
> > > > + return ret;
> > > >
> > > > and the size member of the seccomp_notif_addfd struct. I brought this up
> > > > off-list with Tycho that ioctls have the size of the struct embedded in them. We
> > > > should just use that. The ioctl definition is based on this[2]:
> > > > #define _IOC(dir,type,nr,size) \
> > > > (((dir) << _IOC_DIRSHIFT) | \
> > > > ((type) << _IOC_TYPESHIFT) | \
> > > > ((nr) << _IOC_NRSHIFT) | \
> > > > ((size) << _IOC_SIZESHIFT))
> > > >
> > > >
> > > > We should just use copy_from_user for now. In the future, we can either
> > > > introduce new ioctl names for new structs, or extract the size dynamically from
> > > > the ioctl (and mask it out on the switch statement in seccomp_notify_ioctl.
> > >
> > > Yeah, that seems reasonable. Here's the diff for that part:
> >
> > Why does it matter that the ioctl() has the size of the struct embedded
> > within? Afaik, the kernel itself doesn't do anything with that size. It
> > merely checks that the size is not pathological and it does so at
> > compile time.
> >
> > #ifdef __CHECKER__
> > #define _IOC_TYPECHECK(t) (sizeof(t))
> > #else
> > /* provoke compile error for invalid uses of size argument */
> > extern unsigned int __invalid_size_argument_for_IOC;
> > #define _IOC_TYPECHECK(t) \
> > ((sizeof(t) == sizeof(t[1]) && \
> > sizeof(t) < (1 << _IOC_SIZEBITS)) ? \
> > sizeof(t) : __invalid_size_argument_for_IOC)
> > #endif
> >
> > The size itself is not verified at runtime. copy_struct_from_user()
> > still makes sense at least if we're going to allow expanding the struct
> > in the future.
> Right, but if we simply change our headers and extend the struct, it will break
> all existing programs compiled against those headers. In order to avoid that, if
> we intend on extending this struct by appending to it, we need to have a
> backwards compatibility mechanism. Just having copy_struct_from_user isn't
> enough. The data structure either must be fixed size, or we need a way to handle
> multiple ioctl numbers derived from headers with different sized struct arguments
>
> The two approaches I see are:
> 1. use more indirection. This has previous art in drm[1]. That's look
> something like this:
>
> struct seccomp_notif_addfd_ptr {
> __u64 size;
> __u64 addr;
> }
>
> ... And then it'd be up to us to dereference the addr and copy struct from user.
Which isn't great but could do.
>
> 2. Expose one ioctl to the user, many internally
>
> e.g., public api:
>
> struct seccomp_notif {
> __u64 id;
> __u64 pid;
> struct seccomp_data;
> __u64 fancy_new_field;
> }
>
> #define SECCOMP_IOCTL_NOTIF_RECV SECCOMP_IOWR(0, struct seccomp_notif)
>
> internally:
> struct seccomp_notif_v1 {
> __u64 id;
> __u64 pid;
> struct seccomp_data;
> }
>
> struct seccomp_notif_v2 {
> __u64 id;
> __u64 pid;
> struct seccomp_data;
> __u64 fancy_new_field;
> }
>
> and we can switch like this:
> switch (cmd) {
> /* for example. We actually have to do this for any struct we intend to
> * extend to get proper backwards compatibility
> */
> case SECCOMP_IOWR(0, struct seccomp_notif_v1)
> return seccomp_notify_recv(filter, buf, sizeof(struct seccomp_notif_v1));
> case SECCOMP_IOWR(0, struct seccomp_notif_v2)
> return seccomp_notify_recv(filter, buf, sizeof(struct seccomp_notif_v3));
> ...
> case SECCOMP_IOCTL_NOTIF_SEND:
> return seccomp_notify_send(filter, buf);
> case SECCOMP_IOCTL_NOTIF_ID_VALID:
> return seccomp_notify_id_valid(filter, buf);
> default:
> return -EINVAL;
> }
>
> This has the downside that programs compiled against more modern kernel headers
> will break on older kernels.
>
> 3. We can take the approach you suggested.
>
> #define UNSIZED(cmd) (cmd & ~(_IOC_SIZEMASK << _IOC_SIZESHIFT)
> static long seccomp_notify_ioctl(struct file *file, unsigned int cmd,
> unsigned long arg)
> {
> struct seccomp_filter *filter = file->private_data;
> void __user *buf = (void __user *)arg;
> int size = _IOC_SIZE(cmd);
> cmd = UNSIZED(cmd);
>
> switch (cmd) {
> /* for example. We actually have to do this for any struct we intend to
> * extend to get proper backwards compatibility
> */
> case UNSIZED(SECCOMP_IOCTL_NOTIF_RECV):
> return seccomp_notify_recv(filter, buf, size);
> ...
> case SECCOMP_IOCTL_NOTIF_SEND:
> return seccomp_notify_send(filter, buf);
> case SECCOMP_IOCTL_NOTIF_ID_VALID:
> return seccomp_notify_id_valid(filter, buf);
> default:
> return -EINVAL;
> }
> }
>
> >
> > Leaving that aside, the proposed direction here seems to mean that any
> > change to the struct itself will immediately mean a new ioctl() but
> > afaict, that also means a new struct. Since when you simply extend the
> > struct for the sake of the new ioctl you also change the size for the
> > ioctl.
> >
> > Sure, you can simply treat the struct coming through the old ioctl as
> > being "capped" by e.g. hiding the size as suggested but then the gain
> > by having two separate ioctls is 0 compared to simply versioning the
> > struct with an explicit size member since the size encoded in the ioctl
> > and the actual size of the struct don't line up anymore which is the
> > only plus I can see for relying on _IOC_SIZE(). All this manages to do
> > then is to make it more annoying for userspace since they now need to
> > maintain multiple ioctls(). And if you have - however unlikely - say
> > three different ioctls all to be used with a different struct size of
> > the same struct I now need to know which ioctl() goes with which size of
> > the struct (I guess you could append the size to the ioctl name?
> > *shudder*). If you have the size in the struct itself you don't need to
> > care about any of that.
> > Maybe I'm not making sense or I misunderstand what's going on though.
> >
> > Christian
> >
> I don't understand why userspace has to have any knowledge of this. As soon as
> we add the code above, and we use copy_struct_from_user based on _that_ size,
Hm, which code exactly?
In the previous mail the only thing proposed was to switch to a simple
copy_from_user() which effectively bars us from extending the
seccomp_addfd struct which this is about, right? At that point, the only
option then becomes to either introduce a new ioctl() and a new struct
or to go for the hack in e.g. 3. (Afaiu, 2. is not working anymore
because we break userspace as soon as we append "fancy_new_field" to the
struct because it changes the ioctl() unless I'm missing something.)
Let me maybe rephrase: I'd prefer we merge something for addfd that is
extensible with minimal burden on userspace.
But if we are fine with saying "we don't care, let's just use
copy_from_user() for addfd and if we extend we add a new struct + ioctl"
then ok, sure. But I would prefer to keep dealing with new structs +
ioctls (Look at the end of btrfs.h [1] unlikely to be a problem for us,
but still.) as little as possible because that will be more churn in
userspace code than I'd prefer.
So either [1] or - since none of the generic extensibility seems to be
particularly nice - we bite the bullet and just add a:
__u64 reserved[4]
field and hope that this will carry us for a long time (probably will
for quite a long time) and defer the new ioctl() problem.
[1]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/uapi/linux/btrfs.h
> userspace will get free upgrades. If they are compiling against an older header
> than the kernel, size will return a smaller number, and thus we will zero
> out our trailing bits, and if their number is bigger, we just check their
> bits are appropriately zeroed.
>
> This approach would be forwards-and-backwards compatible.
>
> There's a little bit of prior art here as well [2]. The approach is that
> we effectively do the thing we had earlier with passing a size with
> copy_struct_from_user, but instead of the size being embedded in the struct,
> it's embedded in the ioctl command itself.
That looks super sketchy. :D
Christian
From: Sargun Dhillon
> Sent: 11 June 2020 12:07
> Subject: Re: [PATCH v3 1/4] fs, net: Standardize on file_receive helper to move fds across processes
>
> On Thu, Jun 11, 2020 at 12:01:14PM +0200, Christian Brauner wrote:
> > On Wed, Jun 10, 2020 at 07:59:55PM -0700, Kees Cook wrote:
> > > On Wed, Jun 10, 2020 at 08:12:38AM +0000, Sargun Dhillon wrote:
> > > > As an aside, all of this junk should be dropped:
> > > > + ret = get_user(size, &uaddfd->size);
> > > > + if (ret)
> > > > + return ret;
> > > > +
> > > > + ret = copy_struct_from_user(&addfd, sizeof(addfd), uaddfd, size);
> > > > + if (ret)
> > > > + return ret;
> > > >
> > > > and the size member of the seccomp_notif_addfd struct. I brought this up
> > > > off-list with Tycho that ioctls have the size of the struct embedded in them. We
> > > > should just use that. The ioctl definition is based on this[2]:
> > > > #define _IOC(dir,type,nr,size) \
> > > > (((dir) << _IOC_DIRSHIFT) | \
> > > > ((type) << _IOC_TYPESHIFT) | \
> > > > ((nr) << _IOC_NRSHIFT) | \
> > > > ((size) << _IOC_SIZESHIFT))
> > > >
> > > >
> > > > We should just use copy_from_user for now. In the future, we can either
> > > > introduce new ioctl names for new structs, or extract the size dynamically from
> > > > the ioctl (and mask it out on the switch statement in seccomp_notify_ioctl.
> > >
> > > Yeah, that seems reasonable. Here's the diff for that part:
> >
> > Why does it matter that the ioctl() has the size of the struct embedded
> > within? Afaik, the kernel itself doesn't do anything with that size. It
> > merely checks that the size is not pathological and it does so at
> > compile time.
> >
> > #ifdef __CHECKER__
> > #define _IOC_TYPECHECK(t) (sizeof(t))
> > #else
> > /* provoke compile error for invalid uses of size argument */
> > extern unsigned int __invalid_size_argument_for_IOC;
> > #define _IOC_TYPECHECK(t) \
> > ((sizeof(t) == sizeof(t[1]) && \
> > sizeof(t) < (1 << _IOC_SIZEBITS)) ? \
> > sizeof(t) : __invalid_size_argument_for_IOC)
> > #endif
> >
> > The size itself is not verified at runtime. copy_struct_from_user()
> > still makes sense at least if we're going to allow expanding the struct
> > in the future.
> Right, but if we simply change our headers and extend the struct, it will break
> all existing programs compiled against those headers. In order to avoid that, if
> we intend on extending this struct by appending to it, we need to have a
> backwards compatibility mechanism. Just having copy_struct_from_user isn't
> enough. The data structure either must be fixed size, or we need a way to handle
> multiple ioctl numbers derived from headers with different sized struct arguments
>
> The two approaches I see are:
> 1. use more indirection. This has previous art in drm[1]. That's look
> something like this:
>
> struct seccomp_notif_addfd_ptr {
> __u64 size;
> __u64 addr;
> }
>
> ... And then it'd be up to us to dereference the addr and copy struct from user.
Do not go down that route. It isn't worth the pain.
You should also assume that userspace might have a compile-time check
on the buffer length (I've written one - not hard) and that the kernel
might (in the future - or on a BSD kernel) be doing the user copies
for you.
Also, if you change the structure you almost certainly need to
change the name of the ioctl cmd as well as its value.
Otherwise a recompiled program will pass the new cmd value (and
hopefully the right sized buffer) but it won't have initialised
the buffer properly.
This is likely to lead to unexpected behaviour.
David
-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
On Thu, Jun 11, 2020 at 10:39:23AM +0000, Sargun Dhillon wrote:
> On Thu, Jun 11, 2020 at 11:19:42AM +0200, Christian Brauner wrote:
> > On Wed, Jun 10, 2020 at 07:59:55PM -0700, Kees Cook wrote:
> > > On Wed, Jun 10, 2020 at 08:12:38AM +0000, Sargun Dhillon wrote:
> > > > As an aside, all of this junk should be dropped:
> > > > + ret = get_user(size, &uaddfd->size);
> > > > + if (ret)
> > > > + return ret;
> > > > +
> > > > + ret = copy_struct_from_user(&addfd, sizeof(addfd), uaddfd, size);
> > > > + if (ret)
> > > > + return ret;
> > > >
> > > > and the size member of the seccomp_notif_addfd struct. I brought this up
> > > > off-list with Tycho that ioctls have the size of the struct embedded in them. We
> > > > should just use that. The ioctl definition is based on this[2]:
> > > > #define _IOC(dir,type,nr,size) \
> > > > (((dir) << _IOC_DIRSHIFT) | \
> > > > ((type) << _IOC_TYPESHIFT) | \
> > > > ((nr) << _IOC_NRSHIFT) | \
> > > > ((size) << _IOC_SIZESHIFT))
> > > >
> > > >
> > > > We should just use copy_from_user for now. In the future, we can either
> > > > introduce new ioctl names for new structs, or extract the size dynamically from
> > > > the ioctl (and mask it out on the switch statement in seccomp_notify_ioctl.
> > >
> > > Yeah, that seems reasonable. Here's the diff for that part:
> > >
> > > diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h
> > > index 7b6028b399d8..98bf19b4e086 100644
> > > --- a/include/uapi/linux/seccomp.h
> > > +++ b/include/uapi/linux/seccomp.h
> > > @@ -118,7 +118,6 @@ struct seccomp_notif_resp {
> > >
> > > /**
> > > * struct seccomp_notif_addfd
> > > - * @size: The size of the seccomp_notif_addfd datastructure
> > > * @id: The ID of the seccomp notification
> > > * @flags: SECCOMP_ADDFD_FLAG_*
> > > * @srcfd: The local fd number
> > > @@ -126,7 +125,6 @@ struct seccomp_notif_resp {
> > > * @newfd_flags: The O_* flags the remote FD should have applied
> > > */
> > > struct seccomp_notif_addfd {
> > > - __u64 size;
> > > __u64 id;
> > > __u32 flags;
> > > __u32 srcfd;
> > > diff --git a/kernel/seccomp.c b/kernel/seccomp.c
> > > index 3c913f3b8451..00cbdad6c480 100644
> > > --- a/kernel/seccomp.c
> > > +++ b/kernel/seccomp.c
> > > @@ -1297,14 +1297,9 @@ static long seccomp_notify_addfd(struct seccomp_filter *filter,
> > > struct seccomp_notif_addfd addfd;
> > > struct seccomp_knotif *knotif;
> > > struct seccomp_kaddfd kaddfd;
> > > - u64 size;
> > > int ret;
> > >
> > > - ret = get_user(size, &uaddfd->size);
> > > - if (ret)
> > > - return ret;
> > > -
> > > - ret = copy_struct_from_user(&addfd, sizeof(addfd), uaddfd, size);
> > > + ret = copy_from_user(&addfd, uaddfd, sizeof(addfd));
> > > if (ret)
> > > return ret;
> > >
> > >
> > > >
> > > > ----
> > > > +#define SECCOMP_IOCTL_NOTIF_ADDFD SECCOMP_IOR(3, \
> > > > + struct seccomp_notif_addfd)
> > > >
> > > > Lastly, what I believe to be a small mistake, it should be SECCOMP_IOW, based on
> > > > the documentation in ioctl.h -- "_IOW means userland is writing and kernel is
> > > > reading."
> > >
> > > Oooooh. Yeah; good catch. Uhm, that means SECCOMP_IOCTL_NOTIF_ID_VALID
> > > is wrong too, yes? Tycho, Christian, how disruptive would this be to
> > > fix? (Perhaps support both and deprecate the IOR version at some point
> > > in the future?)
> >
> > We have custom defines in our source code, i.e.
> > #define SECCOMP_IOCTL_NOTIF_ID_VALID SECCOMP_IOR(2, __u64)
> > so ideally we'd have a SECCOMP_IOCTL_NOTIF_ID_VALID_V2
> >
> > Does that sound ok?
> >
> > Christian
> Why not change the public API in seccomp.h to:
> #define SECCOMP_IOCTL_NOTIF_ID_VALID SECCOMP_IOW(2, __u64)
>
> And then in seccomp.c:
> #define SECCOMP_IOCTL_NOTIF_ID_VALID_OLD SECCOMP_IOR(2, __u64)
> static long seccomp_notify_ioctl(struct file *file, unsigned int cmd,
> unsigned long arg)
> {
> struct seccomp_filter *filter = file->private_data;
> void __user *buf = (void __user *)arg;
>
> switch (cmd) {
> case SECCOMP_IOCTL_NOTIF_RECV:
> return seccomp_notify_recv(filter, buf);
> case SECCOMP_IOCTL_NOTIF_SEND:
> return seccomp_notify_send(filter, buf);
> case SECCOMP_IOCTL_NOTIF_ID_VALID_OLD:
> pr_warn_once("Detected usage of legacy (incorrect) version of seccomp notifier notif_id_valid ioctl\n");
> case SECCOMP_IOCTL_NOTIF_ID_VALID:
> return seccomp_notify_id_valid(filter, buf);
> default:
> return -EINVAL;
> }
> }
> ----
>
> So, both will work fine, and whenevery anyone recompiles, or picks up new
> headers, they will start calling the "right" one without a code change, and
> we wont break any userspace.
Yeah, that's what I'd prefer here.
--
Kees Cook
On Thu, Jun 11, 2020 at 02:56:22PM +0000, David Laight wrote:
> From: Sargun Dhillon
> > Sent: 11 June 2020 12:07
> > Subject: Re: [PATCH v3 1/4] fs, net: Standardize on file_receive helper to move fds across processes
> >
> > On Thu, Jun 11, 2020 at 12:01:14PM +0200, Christian Brauner wrote:
> > > On Wed, Jun 10, 2020 at 07:59:55PM -0700, Kees Cook wrote:
> > > > On Wed, Jun 10, 2020 at 08:12:38AM +0000, Sargun Dhillon wrote:
> > > > > As an aside, all of this junk should be dropped:
> > > > > + ret = get_user(size, &uaddfd->size);
> > > > > + if (ret)
> > > > > + return ret;
> > > > > +
> > > > > + ret = copy_struct_from_user(&addfd, sizeof(addfd), uaddfd, size);
> > > > > + if (ret)
> > > > > + return ret;
> > > > >
> > > > > and the size member of the seccomp_notif_addfd struct. I brought this up
> > > > > off-list with Tycho that ioctls have the size of the struct embedded in them. We
> > > > > should just use that. The ioctl definition is based on this[2]:
> > > > > #define _IOC(dir,type,nr,size) \
> > > > > (((dir) << _IOC_DIRSHIFT) | \
> > > > > ((type) << _IOC_TYPESHIFT) | \
> > > > > ((nr) << _IOC_NRSHIFT) | \
> > > > > ((size) << _IOC_SIZESHIFT))
> > > > >
> > > > >
> > > > > We should just use copy_from_user for now. In the future, we can either
> > > > > introduce new ioctl names for new structs, or extract the size dynamically from
> > > > > the ioctl (and mask it out on the switch statement in seccomp_notify_ioctl.
> > > >
> > > > Yeah, that seems reasonable. Here's the diff for that part:
> > >
> > > Why does it matter that the ioctl() has the size of the struct embedded
> > > within? Afaik, the kernel itself doesn't do anything with that size. It
> > > merely checks that the size is not pathological and it does so at
> > > compile time.
> > >
> > > #ifdef __CHECKER__
> > > #define _IOC_TYPECHECK(t) (sizeof(t))
> > > #else
> > > /* provoke compile error for invalid uses of size argument */
> > > extern unsigned int __invalid_size_argument_for_IOC;
> > > #define _IOC_TYPECHECK(t) \
> > > ((sizeof(t) == sizeof(t[1]) && \
> > > sizeof(t) < (1 << _IOC_SIZEBITS)) ? \
> > > sizeof(t) : __invalid_size_argument_for_IOC)
> > > #endif
> > >
> > > The size itself is not verified at runtime. copy_struct_from_user()
> > > still makes sense at least if we're going to allow expanding the struct
> > > in the future.
> > Right, but if we simply change our headers and extend the struct, it will break
> > all existing programs compiled against those headers. In order to avoid that, if
> > we intend on extending this struct by appending to it, we need to have a
> > backwards compatibility mechanism. Just having copy_struct_from_user isn't
> > enough. The data structure either must be fixed size, or we need a way to handle
> > multiple ioctl numbers derived from headers with different sized struct arguments
> >
> > The two approaches I see are:
> > 1. use more indirection. This has previous art in drm[1]. That's look
> > something like this:
> >
> > struct seccomp_notif_addfd_ptr {
> > __u64 size;
> > __u64 addr;
> > }
> >
> > ... And then it'd be up to us to dereference the addr and copy struct from user.
>
> Do not go down that route. It isn't worth the pain.
>
> You should also assume that userspace might have a compile-time check
> on the buffer length (I've written one - not hard) and that the kernel
> might (in the future - or on a BSD kernel) be doing the user copies
> for you.
>
> Also, if you change the structure you almost certainly need to
> change the name of the ioctl cmd as well as its value.
> Otherwise a recompiled program will pass the new cmd value (and
> hopefully the right sized buffer) but it won't have initialised
> the buffer properly.
> This is likely to lead to unexpected behaviour.
Hmmm.
So, while initially I thought Sargun's observation about ioctl's fixed
struct size was right, I think I've been swayed to Christian's view
(which is supported by the long tail of struct size pain we've seen in
other APIs).
Doing a separate ioctl for each structure version seems like the "old
solution" now that we've got EA syscalls. So, I'd like to keep the size
and copy_struct_from_user().
Which leaves us with the question of how to deal with the ioctl
numbering. As we've seen, there is no actual enforcement of direction
nor size, so to that end, while we could provide the hints about both, I
guess we just don't need to. To that end, perhaps _IO() is best:
#define SECCOMP_IOCTL_NOTIF_ADDFD SECCOMP_IO(3)
Alternatively, we could use a size of either 0, 8(u64), or -1, and then
use IORW() so we _also_ won't paint ourselves into a corner if we ever
want to write something back to userspace in the structure:
#define SECCOMP_IOCTL_EA(nr) _IOC(_IOC_READ|_IOC_WRITE,SECCOMP_IOC_MAGIC,(nr),0)
#define SECCOMP_IOCTL_NOTIF_ADDFD SECCOMP_IOCTL_EA(3)
or
#define SECCOMP_IOCTL_EA(nr) _IOC(_IOC_READ|_IOC_WRITE,SECCOMP_IOC_MAGIC,(nr),8)
#define SECCOMP_IOCTL_NOTIF_ADDFD SECCOMP_IOCTL_EA(3)
or
#define SECCOMP_IOCTL_EA(nr) _IOC(_IOC_READ|_IOC_WRITE,SECCOMP_IOC_MAGIC,(nr),_IOC_SIZEMASK)
#define SECCOMP_IOCTL_NOTIF_ADDFD SECCOMP_IOCTL_EA(3)
I think I prefer the last one.
--
Kees Cook
On Thu, Jun 11, 2020 at 04:49:37PM -0700, Kees Cook wrote:
> I think I prefer the last one.
Here's where I am with things:
https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git/log/?h=devel/seccomp/addfd/v3.3
If we can agree on the ioctl numbering solution, I can actually send the
series for email review...
--
Kees Cook
From: Kees Cook
> Sent: 12 June 2020 00:50
> > From: Sargun Dhillon
> > > Sent: 11 June 2020 12:07
> > > Subject: Re: [PATCH v3 1/4] fs, net: Standardize on file_receive helper to move fds across
> processes
> > >
> > > On Thu, Jun 11, 2020 at 12:01:14PM +0200, Christian Brauner wrote:
> > > > On Wed, Jun 10, 2020 at 07:59:55PM -0700, Kees Cook wrote:
> > > > > On Wed, Jun 10, 2020 at 08:12:38AM +0000, Sargun Dhillon wrote:
> > > > > > As an aside, all of this junk should be dropped:
> > > > > > + ret = get_user(size, &uaddfd->size);
> > > > > > + if (ret)
> > > > > > + return ret;
> > > > > > +
> > > > > > + ret = copy_struct_from_user(&addfd, sizeof(addfd), uaddfd, size);
> > > > > > + if (ret)
> > > > > > + return ret;
> > > > > >
> > > > > > and the size member of the seccomp_notif_addfd struct. I brought this up
> > > > > > off-list with Tycho that ioctls have the size of the struct embedded in them. We
> > > > > > should just use that. The ioctl definition is based on this[2]:
> > > > > > #define _IOC(dir,type,nr,size) \
> > > > > > (((dir) << _IOC_DIRSHIFT) | \
> > > > > > ((type) << _IOC_TYPESHIFT) | \
> > > > > > ((nr) << _IOC_NRSHIFT) | \
> > > > > > ((size) << _IOC_SIZESHIFT))
> > > > > >
> > > > > >
> > > > > > We should just use copy_from_user for now. In the future, we can either
> > > > > > introduce new ioctl names for new structs, or extract the size dynamically from
> > > > > > the ioctl (and mask it out on the switch statement in seccomp_notify_ioctl.
> > > > >
> > > > > Yeah, that seems reasonable. Here's the diff for that part:
> > > >
> > > > Why does it matter that the ioctl() has the size of the struct embedded
> > > > within? Afaik, the kernel itself doesn't do anything with that size. It
> > > > merely checks that the size is not pathological and it does so at
> > > > compile time.
> > > >
> > > > #ifdef __CHECKER__
> > > > #define _IOC_TYPECHECK(t) (sizeof(t))
> > > > #else
> > > > /* provoke compile error for invalid uses of size argument */
> > > > extern unsigned int __invalid_size_argument_for_IOC;
> > > > #define _IOC_TYPECHECK(t) \
> > > > ((sizeof(t) == sizeof(t[1]) && \
> > > > sizeof(t) < (1 << _IOC_SIZEBITS)) ? \
> > > > sizeof(t) : __invalid_size_argument_for_IOC)
> > > > #endif
> > > >
> > > > The size itself is not verified at runtime. copy_struct_from_user()
> > > > still makes sense at least if we're going to allow expanding the struct
> > > > in the future.
> > > Right, but if we simply change our headers and extend the struct, it will break
> > > all existing programs compiled against those headers. In order to avoid that, if
> > > we intend on extending this struct by appending to it, we need to have a
> > > backwards compatibility mechanism. Just having copy_struct_from_user isn't
> > > enough. The data structure either must be fixed size, or we need a way to handle
> > > multiple ioctl numbers derived from headers with different sized struct arguments
> > >
> > > The two approaches I see are:
> > > 1. use more indirection. This has previous art in drm[1]. That's look
> > > something like this:
> > >
> > > struct seccomp_notif_addfd_ptr {
> > > __u64 size;
> > > __u64 addr;
> > > }
> > >
> > > ... And then it'd be up to us to dereference the addr and copy struct from user.
> >
> > Do not go down that route. It isn't worth the pain.
> >
> > You should also assume that userspace might have a compile-time check
> > on the buffer length (I've written one - not hard) and that the kernel
> > might (in the future - or on a BSD kernel) be doing the user copies
> > for you.
> >
> > Also, if you change the structure you almost certainly need to
> > change the name of the ioctl cmd as well as its value.
> > Otherwise a recompiled program will pass the new cmd value (and
> > hopefully the right sized buffer) but it won't have initialised
> > the buffer properly.
> > This is likely to lead to unexpected behaviour.
>
> Hmmm.
>
> So, while initially I thought Sargun's observation about ioctl's fixed
> struct size was right, I think I've been swayed to Christian's view
> (which is supported by the long tail of struct size pain we've seen in
> other APIs).
>
> Doing a separate ioctl for each structure version seems like the "old
> solution" now that we've got EA syscalls. So, I'd like to keep the size
> and copy_struct_from_user().
If the size is variable then why not get the application to fill
in the size of the structure it is sending at the time of the ioctl.
So you'd have:
#define xxx_IOCTL_17(param) _IOCW('X', 17, sizeof *(param))
The application code would then do:
ioctl(fd, xxx_IOCTL_17(arg), arg);
The kernel code can either choose to have specific 'case'
for each size, or mask off the length bits and do the
length check later.
David
-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
On Fri, Jun 12, 2020 at 08:36:03AM +0000, David Laight wrote:
> From: Kees Cook
> > Sent: 12 June 2020 00:50
> > > From: Sargun Dhillon
> > > > Sent: 11 June 2020 12:07
> > > > Subject: Re: [PATCH v3 1/4] fs, net: Standardize on file_receive helper to move fds across
> > processes
> > > >
> > > > On Thu, Jun 11, 2020 at 12:01:14PM +0200, Christian Brauner wrote:
> > > > > On Wed, Jun 10, 2020 at 07:59:55PM -0700, Kees Cook wrote:
> > > > > > On Wed, Jun 10, 2020 at 08:12:38AM +0000, Sargun Dhillon wrote:
> > > > > > > As an aside, all of this junk should be dropped:
> > > > > > > + ret = get_user(size, &uaddfd->size);
> > > > > > > + if (ret)
> > > > > > > + return ret;
> > > > > > > +
> > > > > > > + ret = copy_struct_from_user(&addfd, sizeof(addfd), uaddfd, size);
> > > > > > > + if (ret)
> > > > > > > + return ret;
> > > > > > >
> > > > > > > and the size member of the seccomp_notif_addfd struct. I brought this up
> > > > > > > off-list with Tycho that ioctls have the size of the struct embedded in them. We
> > > > > > > should just use that. The ioctl definition is based on this[2]:
> > > > > > > #define _IOC(dir,type,nr,size) \
> > > > > > > (((dir) << _IOC_DIRSHIFT) | \
> > > > > > > ((type) << _IOC_TYPESHIFT) | \
> > > > > > > ((nr) << _IOC_NRSHIFT) | \
> > > > > > > ((size) << _IOC_SIZESHIFT))
> > > > > > >
> > > > > > >
> > > > > > > We should just use copy_from_user for now. In the future, we can either
> > > > > > > introduce new ioctl names for new structs, or extract the size dynamically from
> > > > > > > the ioctl (and mask it out on the switch statement in seccomp_notify_ioctl.
> > > > > >
> > > > > > Yeah, that seems reasonable. Here's the diff for that part:
> > > > >
> > > > > Why does it matter that the ioctl() has the size of the struct embedded
> > > > > within? Afaik, the kernel itself doesn't do anything with that size. It
> > > > > merely checks that the size is not pathological and it does so at
> > > > > compile time.
> > > > >
> > > > > #ifdef __CHECKER__
> > > > > #define _IOC_TYPECHECK(t) (sizeof(t))
> > > > > #else
> > > > > /* provoke compile error for invalid uses of size argument */
> > > > > extern unsigned int __invalid_size_argument_for_IOC;
> > > > > #define _IOC_TYPECHECK(t) \
> > > > > ((sizeof(t) == sizeof(t[1]) && \
> > > > > sizeof(t) < (1 << _IOC_SIZEBITS)) ? \
> > > > > sizeof(t) : __invalid_size_argument_for_IOC)
> > > > > #endif
> > > > >
> > > > > The size itself is not verified at runtime. copy_struct_from_user()
> > > > > still makes sense at least if we're going to allow expanding the struct
> > > > > in the future.
> > > > Right, but if we simply change our headers and extend the struct, it will break
> > > > all existing programs compiled against those headers. In order to avoid that, if
> > > > we intend on extending this struct by appending to it, we need to have a
> > > > backwards compatibility mechanism. Just having copy_struct_from_user isn't
> > > > enough. The data structure either must be fixed size, or we need a way to handle
> > > > multiple ioctl numbers derived from headers with different sized struct arguments
> > > >
> > > > The two approaches I see are:
> > > > 1. use more indirection. This has previous art in drm[1]. That's look
> > > > something like this:
> > > >
> > > > struct seccomp_notif_addfd_ptr {
> > > > __u64 size;
> > > > __u64 addr;
> > > > }
> > > >
> > > > ... And then it'd be up to us to dereference the addr and copy struct from user.
> > >
> > > Do not go down that route. It isn't worth the pain.
> > >
> > > You should also assume that userspace might have a compile-time check
> > > on the buffer length (I've written one - not hard) and that the kernel
> > > might (in the future - or on a BSD kernel) be doing the user copies
> > > for you.
> > >
> > > Also, if you change the structure you almost certainly need to
> > > change the name of the ioctl cmd as well as its value.
> > > Otherwise a recompiled program will pass the new cmd value (and
> > > hopefully the right sized buffer) but it won't have initialised
> > > the buffer properly.
> > > This is likely to lead to unexpected behaviour.
Why do you say this? Assuming people are just pulling in <linux/seccomp.h>
they will get both the ioctl number, and the struct. The one case where
I can see things going wrong is languages which implement their own struct
packing / ioctls and wouldn't get the updated # because it's hard coded.
> >
> > Hmmm.
> >
> > So, while initially I thought Sargun's observation about ioctl's fixed
> > struct size was right, I think I've been swayed to Christian's view
> > (which is supported by the long tail of struct size pain we've seen in
> > other APIs).
> >
> > Doing a separate ioctl for each structure version seems like the "old
> > solution" now that we've got EA syscalls. So, I'd like to keep the size
> > and copy_struct_from_user().
>
> If the size is variable then why not get the application to fill
> in the size of the structure it is sending at the time of the ioctl.
>
> So you'd have:
> #define xxx_IOCTL_17(param) _IOCW('X', 17, sizeof *(param))
>
> The application code would then do:
> ioctl(fd, xxx_IOCTL_17(arg), arg);
>
> The kernel code can either choose to have specific 'case'
> for each size, or mask off the length bits and do the
> length check later.
>
> David
>
>
My suggest, written out (no idea if this code actually works), is as follows:
ioctl.h:
/* This needs to be added */
#define IOCDIR_MASK (_IOC_DIRMASK << _IOC_DIRSHIFT)
seccomp.h:
struct struct seccomp_notif_addfd {
__u64 fd;
...
}
/* or IOW? */
#define SECCOMP_IOCTL_NOTIF_ADDFD SECCOMP_IOWR(3, struct seccomp_notif_addfd)
seccomp.c:
static long seccomp_notify_addfd(struct seccomp_filter *filter,
struct seccomp_notif_addfd __user *uaddfd int size)
{
struct seccomp_notif_addfd addfd;
int ret;
if (size < 32)
return -EINVAL;
if (size > PAGE_SIZE)
return -E2BIG;
ret = copy_struct_from_user(&addfd, sizeof(addfd), uaddfd, size);
if (ret)
return ret;
...
}
/* Mask out size */
#define SIZE_MASK(cmd) (~IOCSIZE_MASK & cmd)
/* Mask out direction */
#define DIR_MASK(cmd) (~IOCDIR_MASK & cmd)
static long seccomp_notify_ioctl(struct file *file, unsigned int cmd,
unsigned long arg)
{
struct seccomp_filter *filter = file->private_data;
void __user *buf = (void __user *)arg;
/* Fixed size ioctls. Can be converted later on? */
switch (cmd) {
case SECCOMP_IOCTL_NOTIF_RECV:
return seccomp_notify_recv(filter, buf);
case SECCOMP_IOCTL_NOTIF_SEND:
return seccomp_notify_send(filter, buf);
case SECCOMP_IOCTL_NOTIF_ID_VALID:
return seccomp_notify_id_valid(filter, buf);
}
/* Probably should make some nicer macros here */
switch (SIZE_MASK(DIR_MASK(cmd))) {
case SIZE_MASK(DIR_MASK(SECCOMP_IOCTL_NOTIF_ADDFD)):
return seccomp_notify_addfd(filter, buf, _IOC_SIZE(cmd));
default:
return -EINVAL;
}
}
--------
What boxes does this tick?
* Forwards (and backwards) compatibility
* Applies to existing commands
* Command can be extended without requiring new ioctl to be defined
* It well accomodates the future where we want to have a kernel
helper copy the structures from userspace
The fact that the size of the argument struct, and the ioctl are defined in the
same header gives us the ability to "cheat", and for the argument size to be
included / embedded for free in the command passed to ioctl. In turn, this
gives us two benefits. First, it means we don't have to copy from user twice,
and can just do it all in one shot since the size is passed with the syscall
arguments. Second, it means that the user does not have to do the following:
seccomp_notif_addfd addfd = {};
addfd.size = sizeof(struct seccomp_notif_addfd)
Because sizeof(struct seccomp_notif_addfd) is embedded in
SECCOMP_IOCTL_NOTIF_ADDFD based on the same headers they plucked the struct out of.
On Fri, Jun 12, 2020 at 10:46:30AM +0000, Sargun Dhillon wrote:
> My suggest, written out (no idea if this code actually works), is as follows:
>
> ioctl.h:
> /* This needs to be added */
> #define IOCDIR_MASK (_IOC_DIRMASK << _IOC_DIRSHIFT)
This exists already:
#define _IOC_DIRMASK ((1 << _IOC_DIRBITS)-1)
>
>
> seccomp.h:
>
> struct struct seccomp_notif_addfd {
> __u64 fd;
> ...
> }
>
> /* or IOW? */
> #define SECCOMP_IOCTL_NOTIF_ADDFD SECCOMP_IOWR(3, struct seccomp_notif_addfd)
>
> seccomp.c:
> static long seccomp_notify_addfd(struct seccomp_filter *filter,
> struct seccomp_notif_addfd __user *uaddfd int size)
> {
> struct seccomp_notif_addfd addfd;
> int ret;
>
> if (size < 32)
> return -EINVAL;
> if (size > PAGE_SIZE)
> return -E2BIG;
(Tanget: what was the reason for copy_struct_from_user() not including
the min/max check? I have a memory of Al objecting to having an
"internal" limit?)
>
> ret = copy_struct_from_user(&addfd, sizeof(addfd), uaddfd, size);
> if (ret)
> return ret;
>
> ...
> }
>
> /* Mask out size */
> #define SIZE_MASK(cmd) (~IOCSIZE_MASK & cmd)
>
> /* Mask out direction */
> #define DIR_MASK(cmd) (~IOCDIR_MASK & cmd)
>
> static long seccomp_notify_ioctl(struct file *file, unsigned int cmd,
> unsigned long arg)
> {
> struct seccomp_filter *filter = file->private_data;
> void __user *buf = (void __user *)arg;
>
> /* Fixed size ioctls. Can be converted later on? */
> switch (cmd) {
> case SECCOMP_IOCTL_NOTIF_RECV:
> return seccomp_notify_recv(filter, buf);
> case SECCOMP_IOCTL_NOTIF_SEND:
> return seccomp_notify_send(filter, buf);
> case SECCOMP_IOCTL_NOTIF_ID_VALID:
> return seccomp_notify_id_valid(filter, buf);
> }
>
> /* Probably should make some nicer macros here */
> switch (SIZE_MASK(DIR_MASK(cmd))) {
> case SIZE_MASK(DIR_MASK(SECCOMP_IOCTL_NOTIF_ADDFD)):
Ah yeah, I like this because of what you mention below: it's forward
compat too. (I'd just use the ioctl masks directly...)
switch (cmd & ~(_IOC_SIZEMASK | _IOC_DIRMASK))
> return seccomp_notify_addfd(filter, buf, _IOC_SIZE(cmd));
I really like that this ends up having the same construction as a
standard EA syscall: the size is part of the syscall arguments.
> default:
> return -EINVAL;
> }
> }
>
> --------
>
> What boxes does this tick?
> * Forwards (and backwards) compatibility
> * Applies to existing commands
> * Command can be extended without requiring new ioctl to be defined
(Technically, a new one is always redefined, but it's automatic in that
the kernel needs to do nothing.)
> * It well accomodates the future where we want to have a kernel
> helper copy the structures from userspace
Yeah, this is a good solution.
> The fact that the size of the argument struct, and the ioctl are defined in the
> same header gives us the ability to "cheat", and for the argument size to be
> included / embedded for free in the command passed to ioctl. In turn, this
> gives us two benefits. First, it means we don't have to copy from user twice,
> and can just do it all in one shot since the size is passed with the syscall
> arguments. Second, it means that the user does not have to do the following:
>
> seccomp_notif_addfd addfd = {};
> addfd.size = sizeof(struct seccomp_notif_addfd)
>
> Because sizeof(struct seccomp_notif_addfd) is embedded in
> SECCOMP_IOCTL_NOTIF_ADDFD based on the same headers they plucked the struct out of.
Cool. I will do more patch reworking! ;)
--
Kees Cook
From: Kees Cook
> Sent: 12 June 2020 16:13
...
> > /* Fixed size ioctls. Can be converted later on? */
> > switch (cmd) {
> > case SECCOMP_IOCTL_NOTIF_RECV:
> > return seccomp_notify_recv(filter, buf);
> > case SECCOMP_IOCTL_NOTIF_SEND:
> > return seccomp_notify_send(filter, buf);
> > case SECCOMP_IOCTL_NOTIF_ID_VALID:
> > return seccomp_notify_id_valid(filter, buf);
> > }
> >
> > /* Probably should make some nicer macros here */
> > switch (SIZE_MASK(DIR_MASK(cmd))) {
> > case SIZE_MASK(DIR_MASK(SECCOMP_IOCTL_NOTIF_ADDFD)):
>
> Ah yeah, I like this because of what you mention below: it's forward
> compat too. (I'd just use the ioctl masks directly...)
>
> switch (cmd & ~(_IOC_SIZEMASK | _IOC_DIRMASK))
Since you need the same mask on the case labels I think
I'd define a helper just across the switch statement:
#define M(cmd) ((cmd & ~(_IOC_SIZEMASK | _IOC_DIRMASK))
switch (M(cmd)) {
case M(SECCOMP_IOCTL_NOTIF_RECV):
...
}
#undef M
It is probably wrong to mask off DIRMASK.
But you might need to add extra case labels for
the broken one(s).
Prior to worries about indirect jumps you could
get a dense set of case label and faster code.
David
-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
On Fri, Jun 12, 2020 at 08:13:25AM -0700, Kees Cook wrote:
> On Fri, Jun 12, 2020 at 10:46:30AM +0000, Sargun Dhillon wrote:
> > My suggest, written out (no idea if this code actually works), is as follows:
> >
> > ioctl.h:
> > /* This needs to be added */
> > #define IOCDIR_MASK (_IOC_DIRMASK << _IOC_DIRSHIFT)
>
> This exists already:
>
> #define _IOC_DIRMASK ((1 << _IOC_DIRBITS)-1)
>
> >
> >
> > seccomp.h:
> >
> > struct struct seccomp_notif_addfd {
> > __u64 fd;
> > ...
> > }
> >
> > /* or IOW? */
> > #define SECCOMP_IOCTL_NOTIF_ADDFD SECCOMP_IOWR(3, struct seccomp_notif_addfd)
> >
> > seccomp.c:
> > static long seccomp_notify_addfd(struct seccomp_filter *filter,
> > struct seccomp_notif_addfd __user *uaddfd int size)
> > {
> > struct seccomp_notif_addfd addfd;
> > int ret;
> >
> > if (size < 32)
> > return -EINVAL;
> > if (size > PAGE_SIZE)
> > return -E2BIG;
>
> (Tanget: what was the reason for copy_struct_from_user() not including
> the min/max check? I have a memory of Al objecting to having an
> "internal" limit?)
Al didn't want the PAGE_SIZE limit in there because there's nothing
inherently wrong with copying insane amounts of memory.
(Another tangent. I've asked this on Twitter not too long ago: do we
have stats how long copy_from_user()/copy_struct_from_user() takes with
growing struct/memory size? I'd be really interested in this. I have a
feeling that clone3()'s and - having had a chat with David Howells -
openat2()'s structs will continue to grow for a while... and I'd really
like to have some numbers on when copy_struct_from_user() becomes
costly or how costly it becomes.)
>
> >
> > ret = copy_struct_from_user(&addfd, sizeof(addfd), uaddfd, size);
> > if (ret)
> > return ret;
> >
> > ...
> > }
> >
> > /* Mask out size */
> > #define SIZE_MASK(cmd) (~IOCSIZE_MASK & cmd)
> >
> > /* Mask out direction */
> > #define DIR_MASK(cmd) (~IOCDIR_MASK & cmd)
> >
> > static long seccomp_notify_ioctl(struct file *file, unsigned int cmd,
> > unsigned long arg)
> > {
> > struct seccomp_filter *filter = file->private_data;
> > void __user *buf = (void __user *)arg;
> >
> > /* Fixed size ioctls. Can be converted later on? */
> > switch (cmd) {
> > case SECCOMP_IOCTL_NOTIF_RECV:
> > return seccomp_notify_recv(filter, buf);
> > case SECCOMP_IOCTL_NOTIF_SEND:
> > return seccomp_notify_send(filter, buf);
> > case SECCOMP_IOCTL_NOTIF_ID_VALID:
> > return seccomp_notify_id_valid(filter, buf);
> > }
> >
> > /* Probably should make some nicer macros here */
> > switch (SIZE_MASK(DIR_MASK(cmd))) {
> > case SIZE_MASK(DIR_MASK(SECCOMP_IOCTL_NOTIF_ADDFD)):
>
> Ah yeah, I like this because of what you mention below: it's forward
> compat too. (I'd just use the ioctl masks directly...)
>
> switch (cmd & ~(_IOC_SIZEMASK | _IOC_DIRMASK))
>
> > return seccomp_notify_addfd(filter, buf, _IOC_SIZE(cmd));
>
> I really like that this ends up having the same construction as a
> standard EA syscall: the size is part of the syscall arguments.
This is basically what I had proposed in my previous mail, right?
Christian
On Fri, Jun 12, 2020 at 08:28:16PM +0200, Christian Brauner wrote:
> Al didn't want the PAGE_SIZE limit in there because there's nothing
> inherently wrong with copying insane amounts of memory.
Right, ok.
> (Another tangent. I've asked this on Twitter not too long ago: do we
> have stats how long copy_from_user()/copy_struct_from_user() takes with
> growing struct/memory size? I'd be really interested in this. I have a
> feeling that clone3()'s and - having had a chat with David Howells -
> openat2()'s structs will continue to grow for a while... and I'd really
> like to have some numbers on when copy_struct_from_user() becomes
> costly or how costly it becomes.)
How long it takes? It should be basically the same, the costs should be
mostly in switching memory protections, etc. I wouldn't imagine how many
bytes being copied would matter much here, given the sub-page sizes.
> > Ah yeah, I like this because of what you mention below: it's forward
> > compat too. (I'd just use the ioctl masks directly...)
> >
> > switch (cmd & ~(_IOC_SIZEMASK | _IOC_DIRMASK))
> >
> > > return seccomp_notify_addfd(filter, buf, _IOC_SIZE(cmd));
> >
> > I really like that this ends up having the same construction as a
> > standard EA syscall: the size is part of the syscall arguments.
>
> This is basically what I had proposed in my previous mail, right?
I guess I missed it! Well, then I think we're all in agreement? :)
--
Kees Cook
On Fri, Jun 12, 2020 at 11:38:33AM -0700, Kees Cook wrote:
> On Fri, Jun 12, 2020 at 08:28:16PM +0200, Christian Brauner wrote:
> > Al didn't want the PAGE_SIZE limit in there because there's nothing
> > inherently wrong with copying insane amounts of memory.
>
> Right, ok.
>
> > (Another tangent. I've asked this on Twitter not too long ago: do we
> > have stats how long copy_from_user()/copy_struct_from_user() takes with
> > growing struct/memory size? I'd be really interested in this. I have a
> > feeling that clone3()'s and - having had a chat with David Howells -
> > openat2()'s structs will continue to grow for a while... and I'd really
> > like to have some numbers on when copy_struct_from_user() becomes
> > costly or how costly it becomes.)
>
> How long it takes? It should be basically the same, the costs should be
> mostly in switching memory protections, etc. I wouldn't imagine how many
> bytes being copied would matter much here, given the sub-page sizes.
This makes me _very_ happy.
Christian
From: Christian Brauner
> Sent: 12 June 2020 19:28
...
> > > if (size < 32)
> > > return -EINVAL;
> > > if (size > PAGE_SIZE)
> > > return -E2BIG;
> >
> > (Tanget: what was the reason for copy_struct_from_user() not including
> > the min/max check? I have a memory of Al objecting to having an
> > "internal" limit?)
>
> Al didn't want the PAGE_SIZE limit in there because there's nothing
> inherently wrong with copying insane amounts of memory.
The problem is really allowing a user process to allocate
unbounded blocks of memory, not the copy itself.
The limit for IOW() etc is 16k - not a problem.
If a 32bit size is set to just under 4GB so you really want
to allocate 4GB of memory then find the request is garbage.
Seems like a nice DoS attack.
A 64bit size can be worse.
Potentially the limit should be in memdup_user() itself.
And possibly an extra parameter giving a per-call lower? limit.
David
-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)