Received: by 2002:a05:6a10:f3d0:0:0:0:0 with SMTP id a16csp3345492pxv; Sun, 4 Jul 2021 16:09:14 -0700 (PDT) X-Google-Smtp-Source: ABdhPJyRZyaIYSRcMxv+BLJt8kS6aYMuv7xlr/CNR6gPUNWJY54TCk2MeOAMl4oxVWqO83LI11ms X-Received: by 2002:a17:906:919:: with SMTP id i25mr10152319ejd.171.1625440154472; Sun, 04 Jul 2021 16:09:14 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1625440154; cv=none; d=google.com; s=arc-20160816; b=rvu3aaz+EjKjW+64zbfzXw1PZedOFhy7BEBZbpwei2Qj9/QF1H/8+83MdEGSHbwGai m/1Ogh+bPXsCYc9yUHtQhTYROJmo05FvbyRA3gRxTdwTFUzOgLXA1TfSECbIUzru7Tlz Zybd4zsbVBpKJqf4s5uCG6vuDOJBDv+6x2HaZ6QnEyBthSrjRxHK11OjviKuN2vnBcrL ecNwtYVI1qDKxIbbIJLLo3nbOZHo8wGwF/CcK30D1W9AOY7TzmV4KqpkRuuZFxB2tQYF gabwTRtxACt6C9OXWvdHz4fNxi7FxqNCTV9OUDSH+PVYDGZ2gcjpwmobzkoxpIWRkj/T hRjA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=vXtfJalpLPi2LFatowZrXNvEasmiR+VzTNVXq+MmUv4=; b=FlTihMN0Z28h30OWZ2FWu3lHSriolrDlZRR9P1ScOWeBbjf2mD0TrsE3oXg9gliAlX d+lMZdRWqx3ENwsrvCZpba0OP7eFFFF35MGi16MTQ59WxLLlb2ts58QC4meTG2FHwuJg smfTZhd2YXlYIax+/C/K/piM2EAcKXEGAwP4A8M2f57tbt9VIjQ00C6NZq89dKhVrs8C daCrOS4bwVpoExKLAWlGliwCExif738EdtcbjuHysOltsER40AKTr/2NzxLtlA+TmQ0y kzXDEAZeE+XfxP/0wZfOVIq9yS2/7wN3yiSmPVYOybyazajt3/jBrfUDJbk6flCer4R4 t+AA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=gHVuM86a; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id co15si8749087edb.285.2021.07.04.16.08.50; Sun, 04 Jul 2021 16:09:14 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=gHVuM86a; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229770AbhGDXKA (ORCPT + 99 others); Sun, 4 Jul 2021 19:10:00 -0400 Received: from mail.kernel.org ([198.145.29.99]:46062 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231519AbhGDXIy (ORCPT ); Sun, 4 Jul 2021 19:08:54 -0400 Received: by mail.kernel.org (Postfix) with ESMTPSA id EFC9E6193E; Sun, 4 Jul 2021 23:06:13 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1625439974; bh=Vpgq6X5cPfItKhrQdkf+YbXlVg8r9CqVZm+PqciBKyI=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=gHVuM86aw0nY2XBLfliHAE6+vHWQuRS1VhQtfYNxCk8R4wrVMNxTmo/F+xflIL1Lo 52TlSC7LgqbFOB0AmVmH7oRITFP7adLZDs8IzJHbD3lxklrTabiPBHviAuNjh7efZe l6EP50TUgkNu2nX/zMhtNtZ269RHAdHqTywPl+nja02QB/Q+Bp2iLvmTEXgYH+VPZa uney6/NnwLmDddk4KwrNaDFhfeTH4vYLLuU42k3JM9kX0+67STHGxvNkKdq4o3ypdh epZGwyjXQbvk6deddhbBWXY/uFQ3yPS6ZeOI3OvFV8bN0nqD4hWYxpTBFU/sqWNd1/ zE41IdEzOTcIQ== From: Sasha Levin To: linux-kernel@vger.kernel.org, stable@vger.kernel.org Cc: Rodrigo Campos , Sargun Dhillon , Tycho Andersen , Christian Brauner , Kees Cook , Sasha Levin , linux-doc@vger.kernel.org Subject: [PATCH AUTOSEL 5.13 85/85] seccomp: Support atomic "addfd + send reply" Date: Sun, 4 Jul 2021 19:04:20 -0400 Message-Id: <20210704230420.1488358-85-sashal@kernel.org> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20210704230420.1488358-1-sashal@kernel.org> References: <20210704230420.1488358-1-sashal@kernel.org> MIME-Version: 1.0 X-stable: review X-Patchwork-Hint: Ignore Content-Transfer-Encoding: 8bit Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: Rodrigo Campos [ Upstream commit 0ae71c7720e3ae3aabd2e8a072d27f7bd173d25c ] Alban Crequy reported a race condition userspace faces when we want to add some fds and make the syscall return them[1] using seccomp notify. The problem is that currently two different ioctl() calls are needed by the process handling the syscalls (agent) for another userspace process (target): SECCOMP_IOCTL_NOTIF_ADDFD to allocate the fd and SECCOMP_IOCTL_NOTIF_SEND to return that value. Therefore, it is possible for the agent to do the first ioctl to add a file descriptor but the target is interrupted (EINTR) before the agent does the second ioctl() call. This patch adds a flag to the ADDFD ioctl() so it adds the fd and returns that value atomically to the target program, as suggested by Kees Cook[2]. This is done by simply allowing seccomp_do_user_notification() to add the fd and return it in this case. Therefore, in this case the target wakes up from the wait in seccomp_do_user_notification() either to interrupt the syscall or to add the fd and return it. This "allocate an fd and return" functionality is useful for syscalls that return a file descriptor only, like connect(2). Other syscalls that return a file descriptor but not as return value (or return more than one fd), like socketpair(), pipe(), recvmsg with SCM_RIGHTs, will not work with this flag. This effectively combines SECCOMP_IOCTL_NOTIF_ADDFD and SECCOMP_IOCTL_NOTIF_SEND into an atomic opteration. The notification's return value, nor error can be set by the user. Upon successful invocation of the SECCOMP_IOCTL_NOTIF_ADDFD ioctl with the SECCOMP_ADDFD_FLAG_SEND flag, the notifying process's errno will be 0, and the return value will be the file descriptor number that was installed. [1]: https://lore.kernel.org/lkml/CADZs7q4sw71iNHmV8EOOXhUKJMORPzF7thraxZYddTZsxta-KQ@mail.gmail.com/ [2]: https://lore.kernel.org/lkml/202012011322.26DCBC64F2@keescook/ Signed-off-by: Rodrigo Campos Signed-off-by: Sargun Dhillon Acked-by: Tycho Andersen Acked-by: Christian Brauner Signed-off-by: Kees Cook Link: https://lore.kernel.org/r/20210517193908.3113-4-sargun@sargun.me Signed-off-by: Sasha Levin --- .../userspace-api/seccomp_filter.rst | 12 +++++ include/uapi/linux/seccomp.h | 1 + kernel/seccomp.c | 51 ++++++++++++++++--- 3 files changed, 58 insertions(+), 6 deletions(-) diff --git a/Documentation/userspace-api/seccomp_filter.rst b/Documentation/userspace-api/seccomp_filter.rst index 6efb41cc8072..d61219889e49 100644 --- a/Documentation/userspace-api/seccomp_filter.rst +++ b/Documentation/userspace-api/seccomp_filter.rst @@ -259,6 +259,18 @@ and ``ioctl(SECCOMP_IOCTL_NOTIF_SEND)`` a response, indicating what should be returned to userspace. The ``id`` member of ``struct seccomp_notif_resp`` should be the same ``id`` as in ``struct seccomp_notif``. +Userspace can also add file descriptors to the notifying process via +``ioctl(SECCOMP_IOCTL_NOTIF_ADDFD)``. The ``id`` member of +``struct seccomp_notif_addfd`` should be the same ``id`` as in +``struct seccomp_notif``. The ``newfd_flags`` flag may be used to set flags +like O_EXEC on the file descriptor in the notifying process. If the supervisor +wants to inject the file descriptor with a specific number, the +``SECCOMP_ADDFD_FLAG_SETFD`` flag can be used, and set the ``newfd`` member to +the specific number to use. If that file descriptor is already open in the +notifying process it will be replaced. The supervisor can also add an FD, and +respond atomically by using the ``SECCOMP_ADDFD_FLAG_SEND`` flag and the return +value will be the injected file descriptor number. + It is worth noting that ``struct seccomp_data`` contains the values of register arguments to the syscall, but does not contain pointers to memory. The task's memory is accessible to suitably privileged traces via ``ptrace()`` or diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h index 6ba18b82a02e..78074254ab98 100644 --- a/include/uapi/linux/seccomp.h +++ b/include/uapi/linux/seccomp.h @@ -115,6 +115,7 @@ struct seccomp_notif_resp { /* valid flags for seccomp_notif_addfd */ #define SECCOMP_ADDFD_FLAG_SETFD (1UL << 0) /* Specify remote fd */ +#define SECCOMP_ADDFD_FLAG_SEND (1UL << 1) /* Addfd and return it, atomically */ /** * struct seccomp_notif_addfd diff --git a/kernel/seccomp.c b/kernel/seccomp.c index 9f58049ac16d..057e17f3215d 100644 --- a/kernel/seccomp.c +++ b/kernel/seccomp.c @@ -107,6 +107,7 @@ struct seccomp_knotif { * installing process should allocate the fd as normal. * @flags: The flags for the new file descriptor. At the moment, only O_CLOEXEC * is allowed. + * @ioctl_flags: The flags used for the seccomp_addfd ioctl. * @ret: The return value of the installing process. It is set to the fd num * upon success (>= 0). * @completion: Indicates that the installing process has completed fd @@ -118,6 +119,7 @@ struct seccomp_kaddfd { struct file *file; int fd; unsigned int flags; + __u32 ioctl_flags; union { bool setfd; @@ -1065,18 +1067,37 @@ static u64 seccomp_next_notify_id(struct seccomp_filter *filter) return filter->notif->next_id++; } -static void seccomp_handle_addfd(struct seccomp_kaddfd *addfd) +static void seccomp_handle_addfd(struct seccomp_kaddfd *addfd, struct seccomp_knotif *n) { + int fd; + /* * Remove the notification, and reset the list pointers, indicating * that it has been handled. */ list_del_init(&addfd->list); if (!addfd->setfd) - addfd->ret = receive_fd(addfd->file, addfd->flags); + fd = receive_fd(addfd->file, addfd->flags); else - addfd->ret = receive_fd_replace(addfd->fd, addfd->file, - addfd->flags); + fd = receive_fd_replace(addfd->fd, addfd->file, addfd->flags); + addfd->ret = fd; + + if (addfd->ioctl_flags & SECCOMP_ADDFD_FLAG_SEND) { + /* If we fail reset and return an error to the notifier */ + if (fd < 0) { + n->state = SECCOMP_NOTIFY_SENT; + } else { + /* Return the FD we just added */ + n->flags = 0; + n->error = 0; + n->val = fd; + } + } + + /* + * Mark the notification as completed. From this point, addfd mem + * might be invalidated and we can't safely read it anymore. + */ complete(&addfd->completion); } @@ -1120,7 +1141,7 @@ static int seccomp_do_user_notification(int this_syscall, struct seccomp_kaddfd, list); /* Check if we were woken up by a addfd message */ if (addfd) - seccomp_handle_addfd(addfd); + seccomp_handle_addfd(addfd, &n); } while (n.state != SECCOMP_NOTIFY_REPLIED); @@ -1581,7 +1602,7 @@ static long seccomp_notify_addfd(struct seccomp_filter *filter, if (addfd.newfd_flags & ~O_CLOEXEC) return -EINVAL; - if (addfd.flags & ~SECCOMP_ADDFD_FLAG_SETFD) + if (addfd.flags & ~(SECCOMP_ADDFD_FLAG_SETFD | SECCOMP_ADDFD_FLAG_SEND)) return -EINVAL; if (addfd.newfd && !(addfd.flags & SECCOMP_ADDFD_FLAG_SETFD)) @@ -1591,6 +1612,7 @@ static long seccomp_notify_addfd(struct seccomp_filter *filter, if (!kaddfd.file) return -EBADF; + kaddfd.ioctl_flags = addfd.flags; kaddfd.flags = addfd.newfd_flags; kaddfd.setfd = addfd.flags & SECCOMP_ADDFD_FLAG_SETFD; kaddfd.fd = addfd.newfd; @@ -1616,6 +1638,23 @@ static long seccomp_notify_addfd(struct seccomp_filter *filter, goto out_unlock; } + if (addfd.flags & SECCOMP_ADDFD_FLAG_SEND) { + /* + * Disallow queuing an atomic addfd + send reply while there are + * some addfd requests still to process. + * + * There is no clear reason to support it and allows us to keep + * the loop on the other side straight-forward. + */ + if (!list_empty(&knotif->addfd)) { + ret = -EBUSY; + goto out_unlock; + } + + /* Allow exactly only one reply */ + knotif->state = SECCOMP_NOTIFY_REPLIED; + } + list_add(&kaddfd.list, &knotif->addfd); complete(&knotif->ready); mutex_unlock(&filter->notify_lock); -- 2.30.2