From: Jonathan Kowalski
Date: Mon, 1 Apr 2019 17:27:39 +0100
Subject: Re: [PATCH v2 0/5] pid: add pidfd_open()
To: Linus Torvalds
Cc: Daniel Colascione, Aleksa Sarai, Andy Lutomirski, Christian Brauner,
    Jann Horn, Andrew Lutomirski, David Howells, "Serge E. Hallyn",
    Linux API, Linux List Kernel Mailing, Arnd Bergmann,
    "Eric W. Biederman", Konstantin Khlebnikov, Kees Cook,
    Alexey Dobriyan, Thomas Gleixner, Michael Kerrisk-manpages,
    "Dmitry V. Levin", Andrew Morton, Oleg Nesterov,
    Nagarathnam Muthusamy, Al Viro, Joel Fernandes
References: <20190330171215.3yrfxwodstmgzmxy@brauner.io>
    <132107F4-F56B-4D6E-9E00-A6F7C092E6BD@amacapital.net>
    <20190331211041.vht7dnqg4e4bilr2@brauner.io>
    <18C7FCB9-2CBA-4237-94BB-9C4395A2106B@amacapital.net>
    <20190401114059.7gdsvcqyoz2o5bbz@yavin>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Apr 1, 2019 at 5:15 PM Linus Torvalds wrote:
>
> On Mon, Apr 1, 2019 at 9:07 AM Jonathan Kowalski wrote:
> >
> > With the POLLHUP model on a simple pidfd, you'd know when the process
> > you were referring to is dead (and one can map POLLPRI to dead and
> > POLLHUP to zombie, etc).
>
> Adding ->poll() to the pidfd should be easy. Again, it would be
> trivially be made to work for the directory fd you get from
> /proc/ too.
>
> Yeah, yeah, pollable directories are odd, but the vfs layer doesn't
> care about things like "is this a directory or not". It will just call
> the f_op->poll() method.

I know, Andy even sent a patch for that long back.

The question is, this sure solves the immediate use case, but it
inhibits some very powerful (and natural) things from being realised in
the future, and makes some choices harder.

Currently, pidfd_send_signal doesn't work across PID namespaces. It
would be possible to make it work, but some things need to be taken
care of, precisely, that one allows a task to open pidfds only for
tasks *it* can see. Why? Because you essentially isolate the PID
namespace, so your open() in this namespace doesn't suddenly start
opening things it cannot see through some other namespace (i.e. /proc),
precisely how you cannot open sockets in network namespaces from the
outside, though if you can setns, you should be able to (same with
pidfds).

This makes for a nice delegation model: I can essentially put a task in
a namespace with no other tasks, keep pushing pidfds into the damn
thing, and, subject to kernel permissions and capabilities it can yield
in the owner userns, it can signal the said task. You can extend this
to ptrace and other things, by making them accept a pidfd. This means
userspace has to explicitly pass such descriptors around to make this
work, like it does today (and how I can use an open socket and accept
connections whilst living in a totally isolated network namespace).

Besides that, /proc comes with too much stuff. It should be possible to
go from pidfd to /proc/ and do whatever you wish to, but at least two
things that require varying levels of capabilities of inspection, the
latter of which can be isolated by mount namespaces even if the process
would usually be allowed to peek into it and read the entire thing, do
not end up being munged together.
I can choose to pass both, but if /proc dir fds *are* pidfds, you need
the entire complexity of masking and whatnot (which would be usable on
its own, no doubt), making directory descriptors pollable and readable,
etc etc.

>
> Linus
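
To make the "explicitly pass such descriptors around" part concrete,
here is a rough sketch of how any open descriptor already travels
between processes today, using SCM_RIGHTS over a UNIX socket. The
helper names send_fd(), recv_fd() and fd_passing_demo() are made up for
illustration, and a pipe end stands in for a pidfd, on the assumption
that a pidfd would be passed the same way as any other descriptor.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: send one open descriptor over a connected UNIX socket. */
static int send_fd(int sock, int fd)
{
    char byte = 'F';    /* one byte of ordinary data travels with the control message */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {             /* union keeps the control buffer aligned for struct cmsghdr */
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } ctrl;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    memset(&ctrl, 0, sizeof(ctrl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: receive one descriptor; returns it, or -1 on error. */
static int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } ctrl;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd = -1;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    if (recvmsg(sock, &msg, 0) != 1)
        return -1;
    cmsg = CMSG_FIRSTHDR(&msg);
    if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}

/* Returns 0 if the child could use a descriptor it only got via the socket. */
int fd_passing_demo(void)
{
    int sv[2], pfd[2], status;
    char got = 0;
    pid_t child;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0 || pipe(pfd) < 0)
        return -1;
    child = fork();
    if (child < 0)
        return -1;
    if (child == 0) {
        /* Drop the inherited copies; only the passed descriptor reaches the pipe. */
        close(sv[0]);
        close(pfd[0]);
        close(pfd[1]);
        int fd = recv_fd(sv[1]);
        _exit(fd >= 0 && write(fd, "x", 1) == 1 ? 0 : 1);
    }
    close(sv[1]);
    if (send_fd(sv[0], pfd[1]) < 0)
        return -1;
    close(pfd[1]);      /* the copy in flight keeps the write end open */
    if (waitpid(child, &status, 0) != child)
        return -1;
    if (read(pfd[0], &got, 1) != 1)
        return -1;
    close(pfd[0]);
    close(sv[0]);
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0 && got == 'x') ? 0 : -1;
}
```

The receiver ends up holding a handle it could never have opened by
name, which is the property the delegation model above relies on.
Whether a pidfd received this way may actually be used to signal its
target would still be up to the kernel's permission checks.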