References: <20181118111751.6142-1-christian@brauner.io> <20181118174148.nvkc4ox2uorfatbm@brauner.io>
In-Reply-To: <20181118174148.nvkc4ox2uorfatbm@brauner.io>
From: Andy Lutomirski
Date: Sun, 18 Nov 2018 09:44:15 -0800
Subject: Re: [PATCH] proc: allow killing processes via file descriptors
To: Christian Brauner
Cc: Andrew Lutomirski, Daniel Colascione, "Eric W. Biederman", LKML,
 "Serge E. Hallyn", Jann Horn, Andrew Morton, Oleg Nesterov, Aleksa Sarai,
 Al Viro, Linux FS Devel, Linux API, Tim Murray, Kees Cook, David Howells
X-Mailing-List: linux-kernel@vger.kernel.org

On Sun, Nov 18, 2018 at 9:42 AM Christian Brauner wrote:
>
> On Sun, Nov 18, 2018 at 07:38:09AM -0800, Andy Lutomirski wrote:
> > On Sun, Nov 18, 2018 at 5:59 AM Daniel Colascione wrote:
> > >
> > > I had been led to believe that the proposal would be a comprehensive
> > > process API, not an ioctl basically equivalent to my previous patch.
> > > If you had a more comprehensive proposal, please just share it on LKML
> > > instead of limiting the discussion to those able to attend these
> > > various conferences. If there's some determined opposition to a
> > > general new process API, this opposition needs a fair and full airing,
> > > as not everyone can attend these conferences.
> > >
> > > On Sun, Nov 18, 2018 at 3:17 AM, Christian Brauner wrote:
> > > > With this patch an open() call on /proc/ will give userspace a handle
> > > > to struct pid of the process associated with /proc/. This allows to
> > > > maintain a stable handle on a process.
> > > > I have been discussing various approaches extensively during technical
> > > > conferences this year culminating in a long argument with Eric at Linux
> > > > Plumbers.
> > > > The general consensus was that having a handle on a process
> > > > will be something that is very simple and easy to maintain
> > >
> > > ioctls are the opposite of "easy to maintain". Their
> > > file-descriptor-specific behavior makes it difficult to use the things
> > > safely. If you want to take this approach, please make a new system
> > > call. An ioctl is just a system call with a very strange spelling and
> > > unfortunate collision semantics.
> > >
> > > > with the
> > > > option of being extensible via a more advanced api if the need arises.
> > >
> > > The need *has* arisen; see my exithand patch.
> > >
> > > > I
> > > > believe that this patch is the most simple, dumb, and therefore
> > > > maintainable solution.
> > > >
> > > > The need for this has arisen in order to reliably kill a process without
> > > > running into issues of the pid being recycled as has been described in the
> > > > rejected patch [1].
> > >
> > > That patch was not "rejected". It was tabled pending the more
> > > comprehensive process API proposal that was supposed to have emerged.
> > > This patch is just another variant of the sort of approach we
> > > discussed on that patch's thread here. As I mentioned on that thread,
> > > the right approach option is a new system call, not an ioctl.
> > >
> > > > To fulfill the need described in that patchset a new
> > > > ioctl() PROC_FD_SIGNAL is added. It can be used to send signals to a
> > > > process via a file descriptor:
> > > >
> > > > int fd = open("/proc/1234", O_DIRECTORY | O_CLOEXEC);
> > > > ioctl(fd, PROC_FD_SIGNAL, SIGKILL);
> > > > close(fd);
> > > >
> > > > Note, the stable handle will allow us to carefully extend this feature in
> > > > the future.
> > >
> > > We still need the ability to synchronously wait on a process's death,
> > > as in my patch set. I will be refreshing that patch set.
> >
> > I fully agree that a more comprehensive, less expensive API for
> > managing processes would be nice.
> > But I also think that this patch
> > (using the directory fd and ioctl) is better from a security
> > perspective than using a new file in /proc.
> >
> > I have an old patch to make proc directory fds pollable:
> >
> > https://lore.kernel.org/patchwork/patch/345098/
> >
> > That patch plus the one in this thread might make a nice addition to
> > the kernel even if we expect something much better to come along
> > later.
>
> I agree. Eric's point was to make the first implementation of this as
> simple as possible that's why this patch is intentionally almost
> trivial. And I like it for its simplicity.
>
> I had a more comprehensive API proposal of which open(/proc/) was a
> part. I didn't send out alongside this patch as Eric clearly preferred to
> only have the /proc/ part. Here is the full proposal as I intended
> to originally send it out:
>
> The gist is to have file descriptors for processes which is obviously not a new
> idea. This has been done before in other OSes and it has been tried before in
> Linux [2], [3] (Thanks to Kees for pointing out these patches.). So I want to
> make it very clear that I'm not laying claim to this being my or even a novel
> idea in any way. However, I want to diverge from previous approaches with my
> suggestion. (Though I can't be sure that there's not something similar in other
> OSes already.)
>
> One of the main motivations for having procfds is to have a race-free way of
> configuring, starting, polling, and killing a process. Basically, a process
> lifecycle api if you want to think about it that way. The api should also be
> easily extendable in the future to avoid running into the limitations we
> currently see with the clone*() syscall(s) again.
>
> One of the crucial points of the api is to *separate the configuration
> of a process through a procfd from actually creating the process*.
> This is a crucial property expressed in the open*() system calls.
> First, get a stable handle on an object then allow for ways to configure
> it. As such the procfd api shares the same insight with Al's and David's
> new mount api.
> (Fwiw, Andy also pointed out similarities with posix_spawn().)
> What I envisioned was to have the following syscalls (multiple name suggestions):
>
> 1. int process_open / proc_open / procopen
> 2. int process_config / proc_config / procconfig or ioctl()-based
> 3. int process_info / proc_info / procinfo or ioctl()-based
> 4. int process_manage / proc_manage / procmanage or ioctl()-based

Emails crossed :(

For process management, I generally like this, although we might do
better if we make execve() effectively invalidate the handle. Then we
avoid a bunch of nasty permission issues.

For process *creation*, we have the problem that libc authors feel
that they can't safely use fds at all. There was a proposal for "high
fds" a long time back to solve that. We might finally need to do
something like that.
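
For illustration only, the directory-fd handle quoted earlier in this
thread could be wrapped roughly as follows. This is a sketch under
assumptions, not the patch's code: proc_handle_open() and
proc_handle_signal() are hypothetical helper names, and PROC_FD_SIGNAL
is the ioctl proposed in the patch, so it may well not be defined by the
headers on a given system. The ioctl call is therefore only compiled
when the macro exists; otherwise the helper fails with ENOSYS.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: open /proc/<pid> as a directory fd, using the same
 * flags as the example in the patch description.
 * Returns the fd, or -1 with errno set.
 */
static int proc_handle_open(pid_t pid)
{
	char path[32];
	int n = snprintf(path, sizeof(path), "/proc/%ld", (long)pid);

	if (n < 0 || (size_t)n >= sizeof(path)) {
		errno = ENAMETOOLONG;
		return -1;
	}
	return open(path, O_DIRECTORY | O_CLOEXEC);
}

/*
 * Hypothetical helper: send a signal through the handle.
 * PROC_FD_SIGNAL comes from the proposed patch (assumption: it is only
 * available when building against headers that carry that patch).
 */
static int proc_handle_signal(int fd, int sig)
{
#ifdef PROC_FD_SIGNAL
	return ioctl(fd, PROC_FD_SIGNAL, sig);
#else
	(void)fd;
	(void)sig;
	errno = ENOSYS;	/* the proposed ioctl is not available here */
	return -1;
#endif
}
```

A caller would take the handle once with proc_handle_open(pid) and later
call proc_handle_signal(fd, SIGKILL) followed by close(fd). Per the patch
description, the point of holding the fd is that it refers to the struct
pid of that one process, so a recycled pid number should not be signaled
by mistake.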