From: Jonathan Kowalski
Date: Mon, 15 Apr 2019 18:15:28 +0100
Subject: Re: [PATCH 2/4] clone: add CLONE_PIDFD
To: Oleg Nesterov
Cc: Christian Brauner, Linus Torvalds, Al Viro, Jann Horn, David Howells,
    Linux API, linux-kernel, "Serge E. Hallyn", Andy Lutomirski,
    Arnd Bergmann, "Eric W. Biederman", Kees Cook, Thomas Gleixner,
    Michael Kerrisk-manpages, Andrew Morton, Aleksa Sarai, Joel Fernandes,
    Daniel Colascione
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Apr 15, 2019 at 2:25 PM Oleg Nesterov wrote:
>
> On 04/15, Christian Brauner wrote:
> >
> > > CLONE_PARENT_SETTID doesn't look very useful, so what if we add
> > >
> > >       if ((clone_flags & (CLONE_PIDFD|CLONE_PARENT_SETTID)) ==
> > >                          (CLONE_PIDFD|CLONE_PARENT_SETTID))
> > >               return ERR_PTR(-EINVAL);
> > >
> > > at the start of copy_process() ?
> > >
> > > Then it can do
> > >
> > >       if (clone_flags & CLONE_PIDFD) {
> > >               retval = pidfd_create(pid, &pidfdf);
> > >               if (retval < 0)
> > >                       goto bad_fork_free_pid;
> > >               retval = put_user(retval, parent_tidptr);
> > >               if (retval < 0)
> > >                       goto bad_fork_free_pid;
> > >       }
> >
> > Uhhh Oleg, that is nifty. I have to say I like that a lot. This would
> > let us return the pid and the pidfd in one go and we can also start
> > pidfd numbering at 0.
>
> Christian, sorry if it was already discussed, but I can't force myself to
> read all the previous discussions ;)
>
> If we forget about CONFIG_PROC_FS, why do we really want to create a file?
>
>
> Suppose we add a global u64 counter incremented by copy_process and reported
> in /proc/$pid/status. Suppose that clone(CLONE_PIDFD) writes this counter to
> *parent_tidptr. Let's denote this counter as UNIQ_PID.
>
> Now, if you want to (say) safely kill a task and you have its UNIQ_PID, you
> can do
>
>       kill_by_pid_uniq(int pid, u64 uniq_pid)
>       {
>               pidfd = open("/proc/$pid", O_DIRECTORY);
>
>               status = openat(pidfd, "status");
>               u64 this_uniq_pid = ... read UNIQ_PID from status ...;
>
>               if (uniq_pid != this_uniq_pid)
>                       return;
>
>               pidfd_send_signal(pidfd);
>       }
>
> Why else do we want pidfd?

Apart from what others have already pointed out, there are two other
things I am looking forward to:

* Currently, when ptracing from a thread, I have to either block in
  waitpid or loop over it with pauses to receive ptrace-related results,
  because ptrace is thread-directed and the related waitpid responses
  only arrive at the attached thread. That makes it hard to also poll
  other event sources (e.g. to receive further commands over a pipe or
  message-passing fd). The waitfd patchset was rejected on the grounds
  that one could use a separate thread to do the waitpid while polling
  from the attached thread or a new thread, but because of ptrace this
  does not work.
  pidfds would allow this to work (this does mean we'd also need to be
  able to return one at ATTACH/SEIZE time, though). Note that waitid and
  other variants throw away a lot of needed information.

* Descriptors mean you can optionally bind your privileges to the file
  descriptor and then pass it around to others. They do not work this
  way now, but the option of such an extension has been kept open. One
  example is binding one's CAP_KILL capability to a pidfd and then
  passing it to another process, so that it can freely signal the said
  process (and only that one). Another is optionally poking holes in
  the restrictions imposed by PID namespaces (possibly in the future),
  etc.

>
> Oleg.
>
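
For the first point, here is a rough sketch of the kind of event loop a
pollable pidfd would allow. Nothing in it comes from this patchset: it
assumes a pidfd that becomes readable when the process exits and a
pidfd_open()-style syscall to obtain one (both only appeared later, in
Linux 5.3). The fallback number 434 is an assumption that holds on most
architectures, and pidfd_open_raw() and poll_child_and_pipe() are made-up
names. It waits for a plain child exit rather than for ptrace stops,
which is the simpler case.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Assumption: 434 is the pidfd_open() number on most architectures;
 * only used when the installed headers predate the syscall. */
#ifndef SYS_pidfd_open
#define SYS_pidfd_open 434
#endif

/* Hypothetical raw wrapper, for libcs that do not provide pidfd_open(). */
static int pidfd_open_raw(pid_t pid, unsigned int flags)
{
	return (int)syscall(SYS_pidfd_open, pid, flags);
}

/*
 * Fork a child that exits with `code`, then wait for it from a single
 * poll() loop that also watches a command pipe. Returns the child's exit
 * code, or -errno on failure (-ENOSYS if the kernel lacks pidfd_open()).
 */
int poll_child_and_pipe(int code)
{
	struct pollfd fds[2];
	int cmd[2], pidfd, status, result;
	pid_t child;

	assert(code >= 0 && code <= 255);	/* valid exit codes only */

	if (pipe(cmd) < 0)
		return -errno;

	child = fork();
	if (child < 0) {
		result = -errno;
		goto out_pipe;
	}
	if (child == 0)
		_exit(code);

	pidfd = pidfd_open_raw(child, 0);
	if (pidfd < 0) {
		result = -errno;
		waitpid(child, NULL, 0);	/* reap the child anyway */
		goto out_pipe;
	}

	fds[0].fd = pidfd;	/* readable once the child has exited */
	fds[0].events = POLLIN;
	fds[1].fd = cmd[0];	/* the "other event source" */
	fds[1].events = POLLIN;

	for (;;) {
		if (poll(fds, 2, -1) < 0) {
			if (errno == EINTR)
				continue;
			result = -errno;
			break;
		}
		if (fds[1].revents & POLLIN) {
			char buf[64];

			if (read(cmd[0], buf, sizeof(buf)) > 0) {
				/* a real tracer would dispatch the command here */
			}
		}
		if (fds[0].revents & POLLIN) {
			/* the child has already exited, so this does not block */
			if (waitpid(child, &status, 0) < 0)
				result = -errno;
			else if (WIFEXITED(status))
				result = WEXITSTATUS(status);
			else
				result = -EIO;	/* terminated by a signal */
			break;
		}
	}
	close(pidfd);
out_pipe:
	close(cmd[0]);
	close(cmd[1]);
	return result;
}
```

Calling poll_child_and_pipe(7) should return 7 on a kernel that has
pidfd_open(), and -ENOSYS on one that does not. The only point is that
the exit notification and the command pipe sit in the same poll() set,
so no helper thread is needed.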