From: Daniel Colascione
Date: Wed, 10 Apr 2019 17:12:00 -0700
Subject: Re: [RFC PATCH] fork: add CLONE_PIDFD
To: Christian Brauner
Cc: Linus Torvalds, Al Viro, Jann Horn, David Howells, Linux API, linux-kernel, "Serge E. Hallyn", Andy Lutomirski, Arnd Bergmann, "Eric W. Biederman", Kees Cook, Alexey Dobriyan, Thomas Gleixner, Michael Kerrisk-manpages, Jonathan Kowalski, "Dmitry V. Levin", Andrew Morton, Oleg Nesterov, Aleksa Sarai, Joel Fernandes, Daniel Colascione

Thanks for trying it both ways.

On Wed, Apr 10, 2019 at 4:43 PM Christian Brauner wrote:
>
> Hey Linus,
>
> This is an RFC for adding a new CLONE_PIDFD flag to clone() as
> previously discussed.
> While implementing this Jann and I ran into additional complexity that
> prompted us to send out an initial RFC patchset to make sure we still
> think going forward with the current implementation is a good idea and
> also provide an alternative approach:
>
> RFC-1:
> This is an RFC for the implementation of pidfds as /proc/ file
> descriptors.
> The tricky part here is that we need to retrieve a file descriptor for
> /proc/ before clone's point of no return.
> Otherwise, we need to fail the creation of a process that has already
> passed all barriers and is visible in userspace. Getting that file
> descriptor then becomes a rather intricate dance including allocating a
> detached dentry that we need to commit once attach_pid() has been called.
> Note that this RFC only includes the logic we think is needed to return
> /proc/ file descriptors from clone. It does *not* yet include the even
> more complex logic needed to restrict procfs itself, nor the additional
> logic needed, on top of the procfs restriction, to prevent attacks such
> as openat(pidfd, "..", ...) and access to /proc//net/.

Why would filtering proc be all that complicated? Wouldn't it just be
adding a "sensitive" flag to struct pid_entry and skipping entries
with that flag when constructing proc entries?

> There are a couple of reasons why we stopped short of this and decided to
> send out an RFC first:
> - Even the initial part of getting file descriptors from /proc/ out
>   of clone() required rather complex code that struck us as very
>   inelegant and heavy (which, granted, might partially be caused by not
>   seeing a cleaner way to implement this). Thus, it felt like we needed
>   to see whether this is even remotely considered acceptable.
> - While discussing further aspects of this approach with Al, we received
>   rather substantiated opposition to exposing even more codepaths to
>   procfs.
> - Restricting access to procfs properly requires a lot of invasive work,
>   even touching core vfs functions such as
>   follow_dotdot()/follow_dotdot_rcu(), which also led to the second
>   point above.

Wasn't an internal bind mount supposed to take care of the parent
traversal problem?
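The "sensitive flag" filtering suggested above can be modeled in a few lines of userspace C: each entry carries a flag, and the code that builds the listing skips flagged entries when access is restricted (for example, when it comes through a pidfd). This is a sketch under stated assumptions only: `struct pid_entry_model`, its `sensitive` field, the entry table, and `visible_entries()` are hypothetical stand-ins, not the kernel's actual `struct pid_entry`, and the choice of which entries count as sensitive is an assumption. It does not address the ".." traversal problem.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the kernel's struct pid_entry.
 * The real structure has no "sensitive" field; this only models the idea. */
struct pid_entry_model {
    const char *name;
    bool sensitive; /* hypothetical: hide when reached via a restricted path */
};

/* Illustrative entry table; which entries are sensitive is an assumption. */
static const struct pid_entry_model tgid_entries[] = {
    { "status",  false },
    { "cmdline", false },
    { "net",     true  },
    { "environ", true  },
};

/* Write the names of entries that may be shown into out[], in table order.
 * When restricted is true, entries flagged sensitive are skipped.
 * out must have room for n pointers. Returns the number of names written. */
static size_t visible_entries(const struct pid_entry_model *ents, size_t n,
                              bool restricted, const char **out)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++) {
        if (restricted && ents[i].sensitive)
            continue; /* filtered out for restricted access */
        out[count++] = ents[i].name;
    }
    return count;
}
```

With `restricted` set to true, calling `visible_entries()` on this four-entry table yields only `status` and `cmdline`; with it set to false, all four names come back in table order. A real implementation would presumably need the same check on lookup as well, not only when listing entries.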