Received: by 2002:ad5:474a:0:0:0:0:0 with SMTP id i10csp1854966imu; Sun, 18 Nov 2018 10:10:31 -0800 (PST) X-Google-Smtp-Source: AJdET5eRXDGGIIOlY34gXQlysuGwhHdzRLmRWcdFw8TGfIATC0V9VfrE3XPHnA2t6H2nVOvRy72l X-Received: by 2002:a63:df13:: with SMTP id u19mr17408108pgg.294.1542564631502; Sun, 18 Nov 2018 10:10:31 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1542564631; cv=none; d=google.com; s=arc-20160816; b=dE2PMmDnAntPlhGDBeD0prRg3RTy9PIXk1lcGrhhoh+uWIFRYBVjyShJ25C5fzswzp uaLUgjGXnYucntOsqvMw1aWNnLYhPsdc1y5ValuoTy0nI8xgQvQw6uU6gnDQ2VATk1D/ M/6qXPxcrADPp1+Pnex0/LroSDuW9JC+sVb/iCUfnhhIYPG+QfOzqpMpyDC88Y2Od5+o 7my8FN3ywl7QAiPZ9Q26SpESIpXKuWK59p18JVqvIIQoHVCet52UJlCzXz4KzUSmq/1R 7abS56epX+f8fCzTMt1uTAyF+Har5ItU0lmz4uFOL4VBvtRxOTFjTpfXFSgWgb7eeSDW KWgA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:cc:to:subject:message-id:date:from :references:in-reply-to:mime-version:dkim-signature; bh=OkxUC9gqkiM0F1BvhyKuD3GceDfHx+4+mBXytetxRCM=; b=T7jIAXIRYfnezdZZxfMJqkDJVV+0r0BBv4qGh2xNmrYwyIkix1AgBuBK/3yMSPuw1P /3bDxuHGzq8g6/7kPg74TbbnZwVu5Fw1thirFFyECuUcBIJlOe2P18ICs0ZGdFjAk4xG Vk579kpQREJ6RrBmjyIYaNe5nWI/iyMpc2dnclwj//jTPbwEsreRNSHfDXzfeqkkjRMW c9cHMpf8uJlBcg9s5EDFKZ+l2nE21At2TuaBjJJRQNAlSmrKPKhlIrVARloP81ZNuEo8 vEkDhiA0L4nx5h2HNRfrICz/9BtZCTE0kWMqAjzQz5Inf5ARloDbTcyJwcyrLDAUAAJR BfSQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b="MXbj5/lU"; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id 2-v6si12519883pfq.129.2018.11.18.10.10.03; Sun, 18 Nov 2018 10:10:31 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b="MXbj5/lU"; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727089AbeKSE23 (ORCPT + 99 others); Sun, 18 Nov 2018 23:28:29 -0500 Received: from mail-vs1-f67.google.com ([209.85.217.67]:34340 "EHLO mail-vs1-f67.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726822AbeKSE23 (ORCPT ); Sun, 18 Nov 2018 23:28:29 -0500 Received: by mail-vs1-f67.google.com with SMTP id y27so16540881vsi.1 for ; Sun, 18 Nov 2018 10:07:33 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc; bh=OkxUC9gqkiM0F1BvhyKuD3GceDfHx+4+mBXytetxRCM=; b=MXbj5/lU3jX5TN39a1zwyfJfifY9YwmAAgwahR9yMBFAaQUCxlq7vVJf9f6cGEYMKW oX4zYQYB6XNdhr07Ye5svr+JPqy6Xz8C51DpxZQVrFr1fgUyNQIUBCa3byUFqVXVqDSF c2xTJ3LuSpzhG3p+dJJrKzSTBZqpORwGUlStZ1D4AvGMl+nSYGwVeuWAYiJSR9dS+/lL MTfQRK1ArleEwr3Up8nRw3QluIexoj+WG/bhGSXhVGbDpE8iv9xdopNH/jJNsjTcetQF Q4/xfmuugQAqPi60LC5JAmoZMagN9euQ1/VcVWRIoZM+PbCNS12LD58xJLgU1oKf6pgn sxmQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc; bh=OkxUC9gqkiM0F1BvhyKuD3GceDfHx+4+mBXytetxRCM=; b=Kn2lbiwhD32CnTjYDldO/yHib2QMQoJYryVmeSH+STkKC8agcaXoTcsfqfi5f4/OIw u0Wtmcda0g/uHJlSusN/LTVycGCyzlkqfZg6o5DpTFlRgWT1o3EpvygUSq3KiiebTccL dwT9iDPAl9aRcIqJlyJR0GsWNrwsAJ5YhE/s37xjyyikkgkT53GM0Gvsa2+dBRs5PkcN 7xCltsxdzZ3XqGNOz0GwhYeoSPVnwGmkSaNQqaYRNN8WFHW1gEYlYf0+YwvleOVd6Udn FY/x1YvQ7DfIGiP/W/zhAYGxA2fA+mgC1BIPp8Z4TBZzZaRVfINQIHgc0yyNf2p/+WmJ QouA== X-Gm-Message-State: AGRZ1gKD+aB2HeKvKE9pO2DjyndqB90JsAiMOJgZNbi6c8hyXogT5mQE zvg+zRvUhdYOGBgvfzH7+WZfNtX41g21d2Iv1l5xMg== X-Received: by 2002:a67:b44:: with SMTP id 65mr7809440vsl.77.1542564452790; Sun, 18 Nov 2018 10:07:32 -0800 (PST) MIME-Version: 1.0 Received: by 2002:a67:f48d:0:0:0:0:0 with HTTP; Sun, 18 Nov 2018 10:07:31 -0800 (PST) In-Reply-To: <20181118174148.nvkc4ox2uorfatbm@brauner.io> References: <20181118111751.6142-1-christian@brauner.io> <20181118174148.nvkc4ox2uorfatbm@brauner.io> From: Daniel Colascione Date: Sun, 18 Nov 2018 10:07:31 -0800 Message-ID: Subject: Re: [PATCH] proc: allow killing processes via file descriptors To: Christian Brauner Cc: Andy Lutomirski , "Eric W. Biederman" , LKML , "Serge E. Hallyn" , Jann Horn , Andrew Morton , Oleg Nesterov , Aleksa Sarai , Al Viro , Linux FS Devel , Linux API , Tim Murray , Kees Cook , David Howells Content-Type: text/plain; charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Sun, Nov 18, 2018 at 9:41 AM, Christian Brauner wrote: > On Sun, Nov 18, 2018 at 07:38:09AM -0800, Andy Lutomirski wrote: >> On Sun, Nov 18, 2018 at 5:59 AM Daniel Colascione wrote: >> > >> > I had been led to believe that the proposal would be a comprehensive >> > process API, not an ioctl basically equivalent to my previous patch. >> > If you had a more comprehensive proposal, please just share it on LKML >> > instead of limiting the discussion to those able to attend these >> > various conferences. If there's some determined opposition to a >> > general new process API, this opposition needs a fair and full airing, >> > as not everyone can attend these conferences. >> > >> > On Sun, Nov 18, 2018 at 3:17 AM, Christian Brauner wrote: >> > > With this patch an open() call on /proc/ will give userspace a handle >> > > to struct pid of the process associated with /proc/. This allows to >> > > maintain a stable handle on a process. >> > > I have been discussing various approaches extensively during technical >> > > conferences this year culminating in a long argument with Eric at Linux >> > > Plumbers. The general consensus was that having a handle on a process >> > > will be something that is very simple and easy to maintain >> > >> > ioctls are the opposite of "easy to maintain". Their >> > file-descriptor-specific behavior makes it difficult to use the things >> > safely. If you want to take this approach, please make a new system >> > call. An ioctl is just a system call with a very strange spelling and >> > unfortunate collision semantics. >> > >> > > with the >> > > option of being extensible via a more advanced api if the need arises. >> > >> > The need *has* arisen; see my exithand patch. >> > >> > > I >> > > believe that this patch is the most simple, dumb, and therefore >> > > maintainable solution. >> > > >> > > The need for this has arisen in order to reliably kill a process without >> > > running into issues of the pid being recycled as has been described in the >> > > rejected patch [1]. >> > >> > That patch was not "rejected". It was tabled pending the more >> > comprehensive process API proposal that was supposed to have emerged. >> > This patch is just another variant of the sort of approach we >> > discussed on that patch's thread here. As I mentioned on that thread, >> > the right approach option is a new system call, not an ioctl. >> > >> > To fulfill the need described in that patchset a new >> > > ioctl() PROC_FD_SIGNAL is added. It can be used to send signals to a >> > > process via a file descriptor: >> > > >> > > int fd = open("/proc/1234", O_DIRECTORY | O_CLOEXEC); >> > > ioctl(fd, PROC_FD_SIGNAL, SIGKILL); >> > > close(fd); >> > > >> > > Note, the stable handle will allow us to carefully extend this feature in >> > > the future. >> > >> > We still need the ability to synchronously wait on a process's death, >> > as in my patch set. I will be refreshing that patch set. >> >> I fully agree that a more comprehensive, less expensive API for >> managing processes would be nice. But I also think that this patch >> (using the directory fd and ioctl) is better from a security >> perspective than using a new file in /proc. >> >> I have an old patch to make proc directory fds pollable: >> >> https://lore.kernel.org/patchwork/patch/345098/ >> >> That patch plus the one in this thread might make a nice addition to >> the kernel even if we expect something much better to come along >> later. > > I agree. Eric's point was to make the first implementation of this as > simple as possible that's why this patch is intentionally almost > trivial. And I like it for its simplicity. > > I had a more comprehensive API proposal of which open(/proc/) was a > part. I didn't send out alongside this patch as Eric clearly prefered to > only have the /proc/ part. Here is the full proposal as I intended > to originally send it out: Thanks. > The gist is to have file descriptors for processes which is obviously not a new > idea. This has been done before in other OSes and it has been tried before in > Linux [2], [3] (Thanks to Kees for pointing out these patches.). So I want to > make it very clear that I'm not laying claim to this being my or even a novel > idea in any way. However, I want to diverge from previous approaches with my > suggestion. (Though I can't be sure that there's not something similar in other > OSes already.) Windows works basically as you describe. You can create a process is suspended state, configure it however you want, then let it run. CreateProcess (and even moreso, NtCreateProcess) also provide a rich (and *extensible*) interface for pre-creation process configuration. >> One of the main motivations for having procfds is to have a race-free way of > configuring, starting, polling, and killing a process. Basically, a process > lifecycle api if you want to think about it that way. The api should also be > easily extendable in the future to avoid running into the limitations we > currently see with the clone*() syscall(s) again. > > One of the crucial points of the api is to *separate the configuration > of a process through a procfd from actually creating the process*. > This is a crucial property expressed in the open*() system calls. First, get a > stable handle on an object then allow for ways to configure it. As such the > procfd api shares the same insight with Al's and David's new mount api. > (Fwiw, Andy also pointed out similarities with posix_spawn().) > What I envisioned was to have the following syscalls (multiple name suggestions): > > 1. int process_open / proc_open / procopen > 2. int process_config / proc_config / procconfig or ioctl()-based > 3. int process_info / proc_info / procinfo or ioctl()-based > 4. int process_manage / proc_manage / procmanage or ioctl()-based The API you've proposed seems fine to me, although I'd either 1) consolidate operations further into one system call, or 2) separate the different management operations into more and different system calls that can be audited independently. The grouping you've proposed seems to have the worst aspects of API splitting and API multiplexing. But I have no objection to it in spirit. That said, while I do want to fix process configuration and startup generally, I want to fix specific holes in the existing API surface first. The two patches I've already sent do that, and this work shouldn't wait on an ideal larger process-API overhaul that may or may not arrive. Based on previous history, I suspect that an API of the scope you're proposing would take years to overcome all LKML objections and land. I don't want to wait that long when we can make smaller fixes that would not conflict with the general architecture. The original patch on this thread is half of the right fix. While I think we should use a system call instead of an ioctl, and while I have some specific implementation critiques (which I described in a different message), it's the right general sort of thing. We should merge it. Next, I want to merge my exithand proposal, or something like it. It's likewise a simple change that, in a minimal way, addresses a longstanding API deficiency. I'm very strongly against the POLLERR-on-directory variant of the idea.