From: Sargun Dhillon
To: linux-kernel@vger.kernel.org, containers@lists.linux-foundation.org,
    linux-api@vger.kernel.org
Cc: Sargun Dhillon, christian.brauner@ubuntu.com, tycho@tycho.ws,
    keescook@chromium.org, cyphar@cyphar.com, Jeffrey Vander Stoep,
    jannh@google.com, rsesek@google.com, palmer@google.com
Subject: [PATCH 0/5] Add seccomp notifier ioctl that enables adding fds
Date: Sun, 24 May 2020 16:39:37 -0700
Message-Id: <20200524233942.8702-1-sargun@sargun.me>
X-Mailing-List: linux-kernel@vger.kernel.org

This adds the capability for seccomp notifier listeners to add file
descriptors in response to a seccomp notification. This is useful for
syscalls for which the previous capabilities were not sufficient. The
current mechanism works well for syscalls that either have side effects
that are system / namespace wide (mount), or that operate on a specific
set of registers (reboot, mknod), and don't require dereferencing
pointers. The problem with dereferencing pointers in a supervisor is that
it leaves us vulnerable to TOCTOU (time-of-check to time-of-use) [1] style
attacks.
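
To make the TOCTOU concern concrete, here is a minimal single-process
sketch. It is not code from this series: path_allowed, attacker_rewrite,
racy_supervisor and copying_supervisor are hypothetical names, the
"/tmp/" policy is an assumed example, and the racing thread of the
supervised task is simulated by a plain function call placed between the
check and the use.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define PATH_MAX_DEMO 64

/* Hypothetical policy: only paths under /tmp/ are allowed. */
bool path_allowed(const char *path)
{
	return strncmp(path, "/tmp/", 5) == 0;
}

/* Stand-in for a racing thread in the supervised task: it rewrites the
 * buffer the syscall argument points at. */
void attacker_rewrite(char *shared)
{
	strcpy(shared, "/etc/shadow");
}

/* Racy: validates the shared buffer, then reads it again for the "use". */
const char *racy_supervisor(char *shared, char *used)
{
	if (!path_allowed(shared))
		return NULL;
	attacker_rewrite(shared);	/* window between check and use */
	snprintf(used, PATH_MAX_DEMO, "%s", shared);
	return used;
}

/* Safer: snapshot once, then check and use only the private copy. */
const char *copying_supervisor(char *shared, char *used)
{
	snprintf(used, PATH_MAX_DEMO, "%s", shared);
	if (!path_allowed(used))
		return NULL;
	attacker_rewrite(shared);	/* too late: the copy is already taken */
	return used;
}
```

Note that the private copy only helps if the supervisor then acts on that
copy itself. If the task's original syscall is simply allowed to proceed,
the kernel reads the task's memory again and the race is back.
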
For syscalls that had a direct effect on file descriptors, pidfd_getfd
was added, allowing those file descriptors to be operated upon directly
by the supervisor [2].

Unfortunately, this leaves system calls which return file descriptors out
of the picture. These are fairly common syscalls, such as openat, socket,
and perf_event_open, that return file descriptors and have arguments that
are pointers. They require that the supervisor be able to verify the
arguments, make the call on behalf of the process at hand, and pass back
the resulting file descriptor. This is where addfd comes into play.

There is an additional flag that allows you to "set" an FD, rather than
add it with an arbitrary number. This has dup2-style semantics: it
installs the new file at that file descriptor, and atomically closes the
old one if it existed. This is useful for a particular use case that we
have, in which we want to swap out AF_INET sockets for AF_UNIX, AF_INET6,
and sockets in another namespace when doing "upconversion".

My specific use case at Netflix is to enable our IPv4-IPv6 transition
mechanism, in which our namespaces have no real IPv4 reachability, and
when it comes time to do a connect(2), we get a socket from a namespace
with global IPv4 reachability. In addition, we intend to use it for our
service mesh: where the service mesh needs to intercept ingress traffic,
the addfd capability will act as a mechanism to do socket activation.

Addfd is not implemented as a separate syscall, a la pidfd_getfd, as VFS
makes some optimizations with regard to the fdtable, and assumes that it
is not modified by external processes. Although a mechanism that
scheduled something in the context of the task could work, it is somewhat
simpler to do it in the context of the ioctl, as we control the task
while in kernel.
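
The dup2-style semantics of the "set" flag can be illustrated with plain
dup2(2) inside one process. The sketch below is only a userspace analogue
and does not use the addfd ioctl; replace_fd and demo_swap are
hypothetical helper names, and two pipes stand in for the sockets being
swapped. The point it shows is that the descriptor number seen by the
caller stays the same while the file behind it is replaced.

```c
#include <unistd.h>

/* Hypothetical helper: install `src` at descriptor number `target`.
 * dup2(2) closes whatever was open at `target` and installs the new file
 * there in one atomic step. Returns 0 on success, -1 on error. */
int replace_fd(int src, int target)
{
	return dup2(src, target) == target ? 0 : -1;
}

/* Demo: swap the write end of pipe A for the write end of pipe B, then
 * write through the unchanged descriptor number. Returns the number of
 * bytes that arrived on pipe B's read end, or -1 on error. */
ssize_t demo_swap(char *buf, size_t len)
{
	int a[2], b[2];
	ssize_t n = -1;

	if (pipe(a) < 0)
		return -1;
	if (pipe(b) < 0) {
		close(a[0]);
		close(a[1]);
		return -1;
	}

	/* a[1] keeps its number, but now refers to pipe B. */
	if (replace_fd(b[1], a[1]) == 0 && write(a[1], "hello", 5) == 5)
		n = read(b[0], buf, len);

	close(a[0]);
	close(a[1]);
	close(b[0]);
	close(b[1]);
	return n;
}
```

In the use case above, the descriptor being replaced would be the task's
AF_INET socket and the new file a socket obtained by the supervisor. That
mapping is an interpretation of the text, not something this sketch
exercises.
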
There is an additional flag (move) that was added to support cgroup v1
controllers (netprio, classid) when moving sockets, as a socket can only
be associated with one cgroup at a time.

[1]: https://lore.kernel.org/lkml/20190918084833.9369-2-christian.brauner@ubuntu.com/
[2]: https://lore.kernel.org/lkml/20200107175927.4558-1-sargun@sargun.me/

Sargun Dhillon (5):
  seccomp: Add find_notification helper
  seccomp: Introduce addfd ioctl to seccomp user notifier
  selftests/seccomp: Test SECCOMP_IOCTL_NOTIF_ADDFD
  seccomp: Add SECCOMP_ADDFD_FLAG_MOVE flag to add fd ioctl
  selftests/seccomp: Add test for addfd move semantics

 include/uapi/linux/seccomp.h                  |  33 +++
 kernel/seccomp.c                              | 228 +++++++++++++++--
 tools/testing/selftests/seccomp/seccomp_bpf.c | 235 ++++++++++++++++++
 3 files changed, 479 insertions(+), 17 deletions(-)

--
2.25.1