From: Jonathan Kowalski
Date: Sun, 31 Mar 2019 16:33:48 +0100
Subject: Re: [PATCH v2 0/5] pid: add pidfd_open()
To: Linus Torvalds
Cc: Jann Horn, Joel Fernandes, Daniel Colascione, Christian Brauner,
    Andrew Lutomirski, David Howells, "Serge E. Hallyn", Linux API,
    Linux List Kernel Mailing, Arnd Bergmann, "Eric W. Biederman",
    Konstantin Khlebnikov, Kees Cook, Alexey Dobriyan, Thomas Gleixner,
    Michael Kerrisk-manpages, "Dmitry V. Levin", Andrew Morton,
    Oleg Nesterov, Nagarathnam Muthusamy, Aleksa Sarai, Al Viro
References: <20190329155425.26059-1-christian@brauner.io>
    <20190331010716.GA189578@google.com>
    <20190331040810.GB189578@google.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Sun, Mar 31, 2019 at 3:59 PM Linus Torvalds wrote:
>
> On Sat, Mar 30, 2019 at 9:47 PM Jann Horn wrote:
> >
> > Sure, given a pidfd_clone() syscall, as long as the parent of the
> > process is giving you a pidfd for it and you don't have to deal with
> > grandchildren created by fork() calls outside your control, that
> > works.
>
> Don't do pidfd_clone() and pidfd_wait().
>
> Both of those existing system calls already get a "flags" argument.
> Just make a WPIDFD (for waitid) and CLONE_PIDFD (for clone) bit, and
> make the existing system calls just take/return a pidfd.

clone is out of flags, so there will have to be a new system call. I am
not sure about the waitid bit. Are you suggesting it takes a pidfd and
waits using it?

I was wondering whether we could make the pidfd itself pollable and
readable for exit status. At pidfd_open time, you pass a flag, and only
if you're the parent do you get a readable instance; otherwise you get a
pollable one, as everyone can (e.g. a reaper watching an indirect
child), and it fails for threads. Then the pidfd that clone2 returns can
also be polled and read from.

The main pain point is that currently, when I ptrace a process from a
thread, I need to use waitpid (waitid throws away ptrace-critical
information), and since ptrace works on a thread-by-thread basis, only
the thread that attached can do the waitpid. This means I cannot do
anything else from that thread concurrently.

waitfd was supposed to solve this (back in 2009) but it never made it
in, and clone4 from Josh Triplett did something similar (it returned the
exit status over the clonefd).

FreeBSD's process descriptors are also pollable (which is what
originally inspired all this work), and it would help with adoption if
the semantics were similar.

Besides that, it would let libraries host their own set of children
without affecting the entire process's waiting logic or mucking with
the SIGCHLD handler (you wouldn't need signals).

>
> Side note: we could (should?) also make the default maxpid just be
> larger. It needs to fit in an 'int', but MAXINT instead of 65535 would
> likely already make a lot of these attacks harder.
>
> There was some really old legacy reason why we actually limited it to
> 65535 originally. It was old and crufty even back when..
>
>                Linus
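
To make the "pollable pidfd" idea above concrete, here is a rough sketch,
not the interface proposed in this thread. It assumes a kernel and Python
that expose pidfd_open() (Linux 5.3+ and Python 3.9+ via os.pidfd_open);
the helper name wait_for_exit is made up for illustration. It only shows
the pollable half: the exit status is still collected with an ordinary
wait, not read from the fd, and where pidfd_open is unavailable it falls
back to a plain blocking wait.

```python
import os
import select
import subprocess
import sys


def wait_for_exit(proc, timeout=5.0):
    """Wait for `proc` to exit by polling a pidfd; no SIGCHLD handler needed.

    Hypothetical helper for illustration only. Returns the exit code.
    """
    try:
        # Assumption: os.pidfd_open exists (Python 3.9+) and the kernel
        # supports the syscall (Linux 5.3+).
        pidfd = os.pidfd_open(proc.pid)
    except (AttributeError, OSError):
        # No pidfd support here: fall back to a plain blocking wait.
        return proc.wait(timeout=timeout)

    try:
        poller = select.poll()
        # A pidfd becomes readable once the process has terminated.
        poller.register(pidfd, select.POLLIN)
        if not poller.poll(int(timeout * 1000)):
            raise subprocess.TimeoutExpired(proc.args, timeout)
        # The child has exited, so this reaps it without blocking.
        return proc.wait()
    finally:
        os.close(pidfd)


if __name__ == "__main__":
    child = subprocess.Popen([sys.executable, "-c", "import sys; sys.exit(7)"])
    print("exit code:", wait_for_exit(child))
```

Because the fd can sit in any poll/epoll set next to sockets and timers, a
library could watch only its own children this way without touching the
process-wide SIGCHLD handler.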