Received: by 2002:a05:6a10:a841:0:0:0:0 with SMTP id d1csp3847688pxy; Mon, 26 Apr 2021 11:09:02 -0700 (PDT) X-Google-Smtp-Source: ABdhPJw6LSdZTzLejwHuJUyXtvAtkB9hWxkEQEt+J7iG48xwWN3lwjWgaB6m3nmkLyu3d/J//1up X-Received: by 2002:a17:906:60d6:: with SMTP id f22mr19679117ejk.177.1619460542508; Mon, 26 Apr 2021 11:09:02 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1619460542; cv=none; d=google.com; s=arc-20160816; b=OgshKT340dQmAxiUwd66E9HtJD/SWsS/zCr2sVar4ftySbKJ+IhaZSE71mHPup2u2g R938CetOEbfmqS/7520t+QtXMwRzq3lZOPaC0r1AaCX3i6KSgptrggNleOm3tXb67Wki lLyw0nmmZkQKZWnMaaYhMRPcydr7dPy863u4ASokdN/9k/YeblTGscf5+qN5GMK+B9wC XO1dSpBfv86zFdXTsFAGL6lYW0oKdUOe19uL850R4NdpnHBvcp+Q4aV872odRQK+wr/a /h/ueZnXqJGKqBv/WexwRyh1aX5ng1xNUmmF3As2mfgAWmH/6M9BikDjUDIr48qOtNQP /N6Q== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=ZdAmFzeYixQBordU9cisViQcou7Sfp9o0F4pOFrucHM=; b=o3gjt3xnR+UoLOH2NmbKXwEjEbODSng5d4hhr+SNqvAHaRQfPc2ItiWl4TC6TAUORc ifbu5Q9e3TS5lj8cB6CgX2cpqibLZH7ENI0z22pF9t13+c/CvZQs7kU1ymEQBlB1Xjcx PQDzP/7pkJPP3NPauUsUQC2qWYFKcZGTkiOT/nvh/q4PpTLufiK+3HQqWpE64M9sAg2X I15h6q9UhCOKilgy9hW+ISGUAcgr7cWnYWjzv8GNkatFpyKkCmjvKIx0Ed8lS74T6zOu qO34fr5f2kzyL05SF8To32zQGNELkH01Q6y7n+N4E8emkWl7Iv2+NGaOA1UGnfe60tsR Myzw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@sargun.me header.s=google header.b="LIYn/gUQ"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id s25si14074777eju.111.2021.04.26.11.08.39; Mon, 26 Apr 2021 11:09:02 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@sargun.me header.s=google header.b="LIYn/gUQ"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234366AbhDZSHP (ORCPT + 99 others); Mon, 26 Apr 2021 14:07:15 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39370 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234288AbhDZSHG (ORCPT ); Mon, 26 Apr 2021 14:07:06 -0400 Received: from mail-pl1-x630.google.com (mail-pl1-x630.google.com [IPv6:2607:f8b0:4864:20::630]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 06178C061760 for ; Mon, 26 Apr 2021 11:06:23 -0700 (PDT) Received: by mail-pl1-x630.google.com with SMTP id 20so25424280pll.7 for ; Mon, 26 Apr 2021 11:06:23 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sargun.me; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=ZdAmFzeYixQBordU9cisViQcou7Sfp9o0F4pOFrucHM=; b=LIYn/gUQZwh6fpKc40HiE/rn5RRtwuX5B4sJx9khOnxpA1qdsgTiCS2ZeIe6pWDLrt 01xYJYV4KaZBWG10HOd5kmUt2M0VyP89RFhLzOn8ZeGgE/GFo4RIknRk/L4dGiTZZJLW IAPIqrNqEudf87szMqg4ZSphIAE3DAp5n87jo= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=ZdAmFzeYixQBordU9cisViQcou7Sfp9o0F4pOFrucHM=; b=bmII9nOvCbtdMD5OZpruwWovbi6FKGhPRezzS0JCatsEEYfYITcpXMkGkXGbDO3z7p 4IuHVQl+KcYl9wkdoLOXbmeWXJiv14RbARq7GU0FMdi0nRzRI72ehcvPacgGvGTtpaCv Y+uInNC/Gf7s2OdnjaVK7Z0E39VP13+BNRZu6UDneYjePFdUUlXnrRVTRj5WwHJqgld0 6OnliYyGODYbWzN/hyuPkLIRLjOsOOILxEtZQWhiWkJfw38LWRvDmcfqnJnEXFUKYDfg B3f7R58l+RC186/oPZxaOJbTMCYZ4BYG1OQDjCKuLJI/40ACx9b8u4UYIDOeFzxlZSDL VuUg== X-Gm-Message-State: AOAM532KrTzMb6OAX2GqT2nLUXzr2gzVx+jpyMrfMAKm3fSmsCP9OiYT 0yqOJYGZPefttG8o617MdSjb7g== X-Received: by 2002:a17:90a:8548:: with SMTP id a8mr22645181pjw.48.1619460383088; Mon, 26 Apr 2021 11:06:23 -0700 (PDT) Received: from ubuntu.netflix.com (136-25-20-203.cab.webpass.net. [136.25.20.203]) by smtp.gmail.com with ESMTPSA id j7sm326835pfd.129.2021.04.26.11.06.21 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 26 Apr 2021 11:06:22 -0700 (PDT) From: Sargun Dhillon To: Kees Cook , LKML , Linux Containers , Rodrigo Campos , Christian Brauner Cc: =?UTF-8?q?Mauricio=20V=C3=A1squez=20Bernal?= , Tycho Andersen , Giuseppe Scrivano , Andy Lutomirski , Will Drewry , Alban Crequy , Sargun Dhillon Subject: [PATCH RESEND 4/5] seccomp: Support atomic "addfd + send reply" Date: Mon, 26 Apr 2021 11:06:09 -0700 Message-Id: <20210426180610.2363-5-sargun@sargun.me> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20210426180610.2363-1-sargun@sargun.me> References: <20210426180610.2363-1-sargun@sargun.me> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: Rodrigo Campos Alban Crequy reported a race condition userspace faces when we want to add some fds and make the syscall return them[1] using seccomp notify. The problem is that currently two different ioctl() calls are needed by the process handling the syscalls (agent) for another userspace process (target): SECCOMP_IOCTL_NOTIF_ADDFD to allocate the fd and SECCOMP_IOCTL_NOTIF_SEND to return that value. Therefore, it is possible for the agent to do the first ioctl to add a file descriptor but the target is interrupted (EINTR) before the agent does the second ioctl() call. Other patches in this series add a way to block signals when a syscall is put to wait by seccomp. However, that might be a big hammer for some cases, as the golang runtime uses SIGURG to interrupt threads for GC collection. Sometimes we just don't want to interfere with the GC, for example, and just either add the fd and return it or fail the syscall. With no leaking fds added inadvertly to the target process. This patch adds a flag to the ADDFD ioctl() so it adds the fd and returns that value atomically to the target program, as suggested by Kees Cook[2]. This is done by simply allowing seccomp_do_user_notification() to add the fd and return it in this case. Therefore, in this case the target wakes up from the wait in seccomp_do_user_notification() either to interrupt the syscall or to add the fd and return it. This "allocate an fd and return" functionality is useful for syscalls that return a file descriptor only, like connect(2). Other syscalls that return a file descriptor but not as return value (or return more than one fd), like socketpair(), pipe(), recvmsg with SCM_RIGHTs, will not work with this flag. The way to go to emulate those in cases where a signal might interrupt is to use the functionality to block signals. The struct seccomp_notif_resp, used when doing SECCOMP_IOCTL_NOTIF_SEND ioctl() to send a response to the target, has three more fields that we don't allow to set when doing the addfd ioctl() to also return. The reasons to disallow each field are: * val: This will be set to the new allocated fd. No point taking it from userspace in this case. * error: If this is non-zero, the value is ignored. Therefore, it is pointless in this case as we want to return the value. * flags: The only flag is to let userspace continue to execute the syscall. This seems pointless, as we want the syscall to return the allocated fd. This is why those fields are not possible to set when using this new flag. [1]: https://lore.kernel.org/lkml/CADZs7q4sw71iNHmV8EOOXhUKJMORPzF7thraxZYddTZsxta-KQ@mail.gmail.com/ [2]: https://lore.kernel.org/lkml/202012011322.26DCBC64F2@keescook/ Signed-off-by: Rodrigo Campos Signed-off-by: Sargun Dhillon --- include/uapi/linux/seccomp.h | 1 + kernel/seccomp.c | 49 +++++++++++++++++++++++++++++++++--- 2 files changed, 46 insertions(+), 4 deletions(-) diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h index bc7fc8b04749..95dd9bab73c6 100644 --- a/include/uapi/linux/seccomp.h +++ b/include/uapi/linux/seccomp.h @@ -118,6 +118,7 @@ struct seccomp_notif_resp { /* valid flags for seccomp_notif_addfd */ #define SECCOMP_ADDFD_FLAG_SETFD (1UL << 0) /* Specify remote fd */ +#define SECCOMP_ADDFD_FLAG_SEND (1UL << 1) /* Addfd and return it, atomically */ /** * struct seccomp_notif_addfd diff --git a/kernel/seccomp.c b/kernel/seccomp.c index b852e8617004..b9a56f33ce1a 100644 --- a/kernel/seccomp.c +++ b/kernel/seccomp.c @@ -109,6 +109,7 @@ struct seccomp_knotif { * installing process should allocate the fd as normal. * @flags: The flags for the new file descriptor. At the moment, only O_CLOEXEC * is allowed. + * @ioctl_flags: The flags used for the seccomp_addfd ioctl. * @ret: The return value of the installing process. It is set to the fd num * upon success (>= 0). * @completion: Indicates that the installing process has completed fd @@ -120,6 +121,7 @@ struct seccomp_kaddfd { struct file *file; int fd; unsigned int flags; + __u32 ioctl_flags; /* To only be set on reply */ int ret; @@ -1064,14 +1066,35 @@ static u64 seccomp_next_notify_id(struct seccomp_filter *filter) return filter->notif->next_id++; } -static void seccomp_handle_addfd(struct seccomp_kaddfd *addfd) +static void seccomp_handle_addfd(struct seccomp_kaddfd *addfd, struct seccomp_knotif *n) { + int fd; + /* * Remove the notification, and reset the list pointers, indicating * that it has been handled. */ list_del_init(&addfd->list); - addfd->ret = receive_fd_replace(addfd->fd, addfd->file, addfd->flags); + fd = receive_fd_replace(addfd->fd, addfd->file, addfd->flags); + + addfd->ret = fd; + + if (addfd->ioctl_flags & SECCOMP_ADDFD_FLAG_SEND) { + /* If we fail reset and return an error to the notifier */ + if (fd < 0) { + n->state = SECCOMP_NOTIFY_SENT; + } else { + /* Return the FD we just added */ + n->flags = 0; + n->error = 0; + n->val = fd; + } + } + + /* + * Mark the notification as completed. From this point, addfd mem + * might be invalidated and we can't safely read it anymore. + */ complete(&addfd->completion); } @@ -1141,7 +1164,7 @@ static int seccomp_do_user_notification(int this_syscall, struct seccomp_kaddfd, list); /* Check if we were woken up by a addfd message */ if (addfd) - seccomp_handle_addfd(addfd); + seccomp_handle_addfd(addfd, &n); } while (n.state != SECCOMP_NOTIFY_REPLIED); @@ -1614,7 +1637,7 @@ static long seccomp_notify_addfd(struct seccomp_filter *filter, if (addfd.newfd_flags & ~O_CLOEXEC) return -EINVAL; - if (addfd.flags & ~SECCOMP_ADDFD_FLAG_SETFD) + if (addfd.flags & ~(SECCOMP_ADDFD_FLAG_SETFD | SECCOMP_ADDFD_FLAG_SEND)) return -EINVAL; if (addfd.newfd && !(addfd.flags & SECCOMP_ADDFD_FLAG_SETFD)) @@ -1624,6 +1647,7 @@ static long seccomp_notify_addfd(struct seccomp_filter *filter, if (!kaddfd.file) return -EBADF; + kaddfd.ioctl_flags = addfd.flags; kaddfd.flags = addfd.newfd_flags; kaddfd.fd = (addfd.flags & SECCOMP_ADDFD_FLAG_SETFD) ? addfd.newfd : -1; @@ -1649,6 +1673,23 @@ static long seccomp_notify_addfd(struct seccomp_filter *filter, goto out_unlock; } + if (addfd.flags & SECCOMP_ADDFD_FLAG_SEND) { + /* + * Disallow queuing an atomic addfd + send reply while there are + * some addfd requests still to process. + * + * There is no clear reason to support it and allows us to keep + * the loop on the other side straight-forward. + */ + if (!list_empty(&knotif->addfd)) { + ret = -EBUSY; + goto out_unlock; + } + + /* Allow exactly only one reply */ + knotif->state = SECCOMP_NOTIFY_REPLIED; + } + list_add(&kaddfd.list, &knotif->addfd); complete(&knotif->ready); mutex_unlock(&filter->notify_lock); -- 2.25.1