Received: by 2002:a25:683:0:0:0:0:0 with SMTP id 125csp116478ybg; Tue, 2 Jun 2020 18:16:19 -0700 (PDT) X-Google-Smtp-Source: ABdhPJxP1a3aTNKfvMv7gXtVUrolfcU/fHvEQyRFTzVrTI+kX0kO0cXb2bl1UWd1EbjF9kbD0ZYx X-Received: by 2002:a17:906:470c:: with SMTP id y12mr26218960ejq.336.1591146979369; Tue, 02 Jun 2020 18:16:19 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1591146979; cv=none; d=google.com; s=arc-20160816; b=hSrbjMzCRFcFBLZSkMA+Yf3D/+6+Q4J67GabB9ldcl2UVgUauaeNN6bw7/j5+5mdNJ ckBu/Uie78F3k4881SygvbA+Ta2FBFROwSukBDtflzgME3nVVid8GS9i5zN7g4vry918 3TURQmq9aqOcVzJ5vzaES05SDRQ4c0bYAmvSeqA3ELDHbHlzJMyo0F+MQm6oWx+Ttjta 6iuBdhxSbiuhug1y+O5mCkSbC49E+FYJ7+HbTHz0uI/W1zNkdRVNhsNSSexChICbsiJl veRKmrrwCCB91dgQXSnSgqBRNJzRRUz77EkU/uQy+rlpzrGcF3Ygk8bvoGJ04+aZiRgg XCxw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding:mime-version :message-id:date:subject:cc:to:from:dkim-signature; bh=yEU/E/zVcsMNA7MOTATWLrC20i2Z/QyfV6gOSYm084g=; b=xRms8GLAAD+atvPow2JWz1goZpJ9LDpY8+C6MFyWIaMAg82wFXqquFDW1O90pVhHMC gVbOy6lA+mhd1Sdw2oPU4DCnYpKILXQu5BnTOIb2BBrq3sbylG5S38nF3SYZSdwprzT6 mE1OjrL59iCgUSczX87FfGBy/i6dYKFzrgm/7vvAJQcqdMhBV2gTxHInMtKOKZzFjZ8Y UF8dPPQV0H6E6X05LLOvEcMYEdHEpRQU9Y3CS22Sob3Tp+EcJzairUfyvhJg3V/Lsxpy UyOW+GFB6RWNuqX387OgnKCZigesDxo0IVCFtXOEBKtSUGtbxD4ysO/OWvRwExc0lJJG qiEA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@sargun.me header.s=google header.b=AQ0PVm5s; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id c14si304677ejx.415.2020.06.02.18.15.55; Tue, 02 Jun 2020 18:16:19 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@sargun.me header.s=google header.b=AQ0PVm5s; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728174AbgFCBNh (ORCPT + 99 others); Tue, 2 Jun 2020 21:13:37 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45998 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726112AbgFCBNg (ORCPT ); Tue, 2 Jun 2020 21:13:36 -0400 Received: from mail-pl1-x641.google.com (mail-pl1-x641.google.com [IPv6:2607:f8b0:4864:20::641]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6A301C08C5C0 for ; Tue, 2 Jun 2020 18:13:36 -0700 (PDT) Received: by mail-pl1-x641.google.com with SMTP id n2so186902pld.13 for ; Tue, 02 Jun 2020 18:13:36 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sargun.me; s=google; h=from:to:cc:subject:date:message-id:mime-version :content-transfer-encoding; bh=yEU/E/zVcsMNA7MOTATWLrC20i2Z/QyfV6gOSYm084g=; b=AQ0PVm5sNefKPxKryme0FM/PUjnc1S6udB9emTmbswOXmgDeWj5o1gEXAunUxOUHNi Tlsu/ldzZeEZ6eeo3W5anDbSXlhhBGLhfCf5yNSYbN42c9tQYst+sGZqdHbQUG9KWCYg 221p8uegLM9t8IBh4T/KjsmbfpzstwiGTYu2M= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:mime-version :content-transfer-encoding; bh=yEU/E/zVcsMNA7MOTATWLrC20i2Z/QyfV6gOSYm084g=; b=mVhIWXd53oaKck2DkW2O2p8cTJ8L3pksTviuk9U4fjnwPgla2Aig0J1fECUi0JhgQJ mHY+NV8IHchKt2i/hemN0M6P3ZXMng6kDf1gJuNXnnAitoCjHp8KivzVpajL7EmjNe3E hk5pF6GsXK1iCLNoPKbMKwMfaJOMeUIfUPbHLQ/+apqZIvbXK9qd/Xg4MrTpm7E0PmxA Gvi/6y3lp7OHDF2bzc+vKHHBHYFMrL9S+3OaAXQ3zznhf3patCsQhJwpzDWleAGUXBFY aYPXgl+jKH6/4XxKLAT2bm4Tym2kE1vDfsC6elP003ttDNaB3SBQsxKnaP3SeWMMlWWw NRhQ== X-Gm-Message-State: AOAM533ubgVBdLp27O+TKdxQ+oxGEgJLwloL/W9QrvEcXF4zx6BvMi7u D89d8wrT0n3HFpigCuaOBjp1kJV6g1FCOQlb X-Received: by 2002:a17:902:7b89:: with SMTP id w9mr25191514pll.252.1591146815609; Tue, 02 Jun 2020 18:13:35 -0700 (PDT) Received: from ubuntu.netflix.com (203.20.25.136.in-addr.arpa. [136.25.20.203]) by smtp.gmail.com with ESMTPSA id a12sm263222pjw.35.2020.06.02.18.13.33 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 02 Jun 2020 18:13:34 -0700 (PDT) From: Sargun Dhillon To: Kees Cook , linux-kernel@vger.kernel.org Cc: Sargun Dhillon , Tycho Andersen , Matt Denton , Jann Horn , Chris Palmer , Aleksa Sarai , Robert Sesek , Christian Brauner , containers@lists.linux-foundation.org, Giuseppe Scrivano , Greg Kroah-Hartman Subject: [PATCH v3 0/4] Add seccomp notifier ioctl that enables adding fds Date: Tue, 2 Jun 2020 18:10:40 -0700 Message-Id: <20200603011044.7972-1-sargun@sargun.me> X-Mailer: git-send-email 2.25.1 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org This adds the capability for seccomp notifier listeners to add file descriptors in response to a seccomp notification. This is useful for syscalls in which the previous capabilities were not sufficient. The current mechanism works well for syscalls that either have side effects that are system / namespace wide (mount), or that operate on a specific set of registers (reboot, mknod), and don't require dereferencing pointers. The problem with derefencing pointers in a supervisor is that it leaves us vulnerable to TOC-TOU [1] style attacks. For syscalls that had a direct effect on file descriptors pidfd_getfd was added, allowing for those file descriptors to be directly operated upon by the supervisor [2]. Unfortunately, this leaves system calls which return file descriptors out of the picture. These are fairly common syscalls, such as openat, socket, and perf_event_open that return file descriptors, and have arguments that are pointers. These require that the supervisor is able to verify the arguments, make the call on behalf of the process on hand, and pass back the resulting file descriptor. This is where addfd comes into play. There is an additional flag that allows you to "set" an FD, rather than add it with an arbitrary number. This has dup2 style semantics, and installs the new file at that file descriptor, and atomically closes the old one if it existed. This is useful for a particular use case that we have, in which we want to swap out AF_INET sockets for AF_UNIX, AF_INET6, and sockets in another namespace when doing "upconversion". My specific usecase at Netflix is to enable our IPv4-IPv6 transition mechanism, in which we our namespaces have no real IPv4 reachability, and when it comes time to do a connect(2), we get a socket from a namespace with global IPv4 reachability. In addition, we intend to use it for our servicemesh, and where our service mesh needs to intercept traffic ingress traffic, the addfd capability will act as a mechanism to do socket activation. Addfd is not implemented as a separate syscall, a la pidfd_getfd, as VFS makes some optimizations in regards to the fdtable, and assumes that they are not modified by external processes. Although a mechanism that scheduled something in the context of the task could work, it is somewhat simpler to do it in the context of the ioctl as we control the task while in kernel. In addition there are not obvious needs for this beyond seccomp notifier. This mechanism leaves a potential issue that if the manager is interrupted while injecting FDs, the child process will be left with leaked / dangling FDs. This may lead to undefined behaviour. A mechanism to work around this is to extend the structure and add a "rollback" mechanism for FDs to be closed if things fail. This introduces a new helper -- file_receive, which is responsible for moving fds across processes. The helper replaces code in SCM_RIGHTS. In SCM_RIGHTS compat codepath there was a bug that resulted in this not being set all. This fixes that bug, and should be cherry-picked into long-term. The file_receive change should probably go into stable. The file_receive code also replaced the receive fd logic in pidfd_getfd. This is somewhat contrary to my original view[5], but I think it is best for the principal of least surprise to adopt it. This should be cherry-picked into stable. I tested this on amd64 with the x86-64 and x32 ABIs. Given there is no testing infrastructure for cgroup v1, I opted to forgo adding new tests there as it is considered deprecated. Changes since v2: * Introducion of the file_receive helper which hoists out logic to manipulate file descriptors outside of seccomp.c to file.c * Small fix that manipulated the socket's cgroup even when the receive failed * seccomp struct layout Changes since v1: * find_notification has been cleaned up slightly, and it replaces a use case in send as well. * Fixes ref counting rules to get / release references in the ioctl side, rather than the seccomp notifier side [3]. * Removes the optional move flag, and opts into SCM_RIGHTS * Rearranges the seccomp_notif_addfd datastructure for greater user clarity [4]. In order to avoid unnamed padding it makes size u64, which is a little bit of a waste of space. * Changes error codes to return ESRCH upon the process going away on notification, and EINPROGRESS is the notification is in an unexpected state (and added tests for this behaviour) [1]: https://lore.kernel.org/lkml/20190918084833.9369-2-christian.brauner@ubuntu.com/ [2]: https://lore.kernel.org/lkml/20200107175927.4558-1-sargun@sargun.me/ [3]: https://lore.kernel.org/lkml/20200525000537.GB23230@ZenIV.linux.org.uk/ [4]: https://lore.kernel.org/lkml/20200525135036.vp2nmmx42y7dfznf@wittgenstein/ [5]: https://lore.kernel.org/lkml/20200107175927.4558-1-sargun@sargun.me/ Sargun Dhillon (4): fs, net: Standardize on file_receive helper to move fds across processes pid: Use file_receive helper to copy FDs seccomp: Introduce addfd ioctl to seccomp user notifier selftests/seccomp: Test SECCOMP_IOCTL_NOTIF_ADDFD fs/file.c | 56 ++++++ include/linux/file.h | 2 + include/uapi/linux/seccomp.h | 25 +++ kernel/pid.c | 20 +- kernel/seccomp.c | 184 +++++++++++++++++- net/compat.c | 10 +- net/core/scm.c | 14 +- tools/testing/selftests/seccomp/seccomp_bpf.c | 183 +++++++++++++++++ 8 files changed, 467 insertions(+), 27 deletions(-) -- 2.25.1