From: Daniel Colascione
Date: Tue, 30 Oct 2018 11:12:21 +0000
Subject: Re: [RFC PATCH] Implement /proc/pid/kill
To: Christian Brauner
Cc: Joel Fernandes, Linux Kernel Mailing List, Tim Murray, Suren Baghdasaryan

On Tue, Oct 30, 2018 at 11:04 AM, Christian Brauner wrote:
> On Tue, Oct 30, 2018 at 11:48 AM Daniel Colascione wrote:
>>
>> On Tue, Oct 30, 2018 at 10:40 AM, Christian Brauner wrote:
>> > On Tue, Oct 30, 2018 at 11:39:11AM +0100, Christian Brauner wrote:
>> >> On Tue, Oct 30, 2018 at 08:50:22AM +0000, Daniel Colascione wrote:
>> >> > On Tue, Oct 30, 2018 at 3:21 AM, Joel Fernandes wrote:
>> >> > > On Mon, Oct 29, 2018 at 3:11 PM Daniel Colascione wrote:
>> >> > >>
>> >> > >> Add a simple proc-based kill interface. To use /proc/pid/kill, just
>> >> > >> write the signal number in base-10 ASCII to the kill file of the
>> >> > >> process to be killed: for example, 'echo 9 > /proc/$$/kill'.
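Below is a minimal Python sketch of the interface described above. It assumes a kernel that carries this RFC patch. Mainline kernels have no /proc/<pid>/kill, so on them the final open fails with FileNotFoundError. The sketch opens /proc/<pid> once as a directory file descriptor, which is the "process handle" the patch description mentions. It then reaches `comm` and `kill` relative to that handle through Python's `dir_fd` argument, which uses openat(2). The helper names are hypothetical.

```python
import os


def open_process_handle(pid: int) -> int:
    """Open /proc/<pid> as a directory fd to serve as a process handle."""
    return os.open(f"/proc/{pid}", os.O_RDONLY | os.O_DIRECTORY)


def read_comm(handle: int) -> str:
    """Read the command name relative to the handle (openat semantics)."""
    fd = os.open("comm", os.O_RDONLY, dir_fd=handle)
    try:
        return os.read(fd, 64).decode().rstrip("\n")
    finally:
        os.close(fd)


def write_kill(handle: int, sig: int) -> None:
    """Write sig in base-10 ASCII to the RFC's proposed kill file.

    Assumption: /proc/<pid>/kill exists only on kernels carrying the RFC
    patch; elsewhere os.open raises FileNotFoundError.
    """
    fd = os.open("kill", os.O_WRONLY, dir_fd=handle)
    try:
        os.write(fd, str(sig).encode("ascii"))
    finally:
        os.close(fd)
```

A caller would first compare `read_comm(h)` against its selection criteria. Only then would it call `write_kill(h, signal.SIGTERM)`, closing `h` afterwards. The patch's `echo 9 > /proc/$$/kill` writes the same ASCII digits plus a trailing newline.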
>> >> > >>
>> >> > >> Semantically, /proc/pid/kill works like kill(2), except that the
>> >> > >> process ID comes from the proc filesystem context instead of from an
>> >> > >> explicit system call parameter. This way, it's possible to avoid races
>> >> > >> between inspecting some aspect of a process and that process's PID
>> >> > >> being reused for some other process.
>> >> > >>
>> >> > >> With /proc/pid/kill, it's possible to write a proper race-free and
>> >> > >> safe pkill(1). An approximation follows. A real program might use
>> >> > >> openat(2), having opened a process's /proc/pid directory explicitly,
>> >> > >> with the directory file descriptor serving as a sort of "process
>> >> > >> handle".
>> >> > >
>> >> > > How long does the 'inspection' procedure take? If it's a short
>> >> > > duration, then is PID reuse really an issue, I mean the PIDs are not
>> >> > > reused until wrap around and the only reason this can be a problem is
>> >> > > if you have the wrap around while the 'inspecting some aspect'
>> >> > > procedure takes really long.
>> >> >
>> >> > It's a race. Would you make similar statements about a similar fix for
>> >> > a race condition involving a mutex and a double-free just because the
>> >> > race didn't crash most of the time? The issue I'm trying to fix here
>> >> > is the same problem, one level higher up in the abstraction hierarchy.
>> >> >
>> >> > > Also the proc fs is typically not the right place for this. Some
>> >> > > entries in proc are writeable, but those are for changing values of
>> >> > > kernel data structures. The title of man proc(5) is "proc - process
>> >> > > information pseudo-filesystem". So it's "information", right?
>> >> >
>> >> > Why should userspace care whether a particular operation is "changing
>> >> > [a] value[] of [a] kernel data structure" or something else? That
>> >> > something in /proc is a struct field is an implementation detail. It's
>> >> > the interface semantics that matters, and whether a particular
>> >> > operation is achieved by changing a struct field or by making a
>> >> > function call is irrelevant to userspace. Proc is a filesystem about
>> >> > processes. Why shouldn't you be able to send a signal to a process via
>> >> > proc? It's an operation involving processes.
>> >> >
>> >> > It's already possible to do things *to* processes via proc, e.g.,
>> >> > adjust OOM killer scores. Proc filesystem file descriptors are
>> >> > userspace references to kernel-side struct pid instances, and as such,
>> >> > make good process handles. There are already "verb" files in procfs,
>> >> > such as /proc/sys/vm/drop_caches and /proc/sysrq-trigger. Why not add
>> >> > a kill "verb", especially if it closes a race that can't be closed
>> >> > some other way?
>> >> >
>> >> > You could implement this interface as a system call that took a procfs
>> >> > directory file descriptor, but relative to this proposal, it would be
>> >> > all downside. Such a thing would act just the same way as
>> >> > /proc/pid/kill, and wouldn't be usable from the shell or from programs
>> >> > that didn't want to use syscall(2). (Since glibc isn't adding new
>> >> > system call wrappers.) AFAIK, the only downside of having a "kill"
>> >> > file is the need for a string-to-integer conversion, but compared to
>> >> > process killing, integer parsing is insignificant.
>> >> >
>> >> > > IMO without a really good reason for this, it could really be a hard
>> >> > > sell but the RFC was worth it anyway to discuss it ;-)
>> >> >
>> >> > The traditional unix process API is down there at level -10 of Rusty
>> >> > Russell's old bad API scale: "It's impossible to get right". The races
>> >> > in the current API are unavoidable. That most programs don't hit these
>> >> > races most of the time doesn't mean that the race isn't present.
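The following minimal Python sketch makes the race concrete. It follows the conventional pkill pattern: inspect /proc/<pid>/comm by path, then signal by numeric PID with kill(2). Nothing ties the PID that was inspected to the PID that gets signaled. If the process exits and its PID is reused between the two steps, the signal reaches an unrelated process. The function name is hypothetical.

```python
import os


def racy_pkill(name: str, sig: int) -> list:
    """Hypothetical helper: signal processes whose comm equals name, by PID.

    This is the racy pattern: kill(2) re-resolves the numeric PID after the
    inspection, so by then it may name a different process.
    """
    want = name.encode()
    signaled = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        pid = int(entry)
        try:
            with open(f"/proc/{pid}/comm", "rb") as f:
                comm = f.read().rstrip(b"\n")
        except OSError:
            continue  # the process exited while we were scanning
        if comm != want:
            continue
        # Race window: the process can exit here and its PID can be handed
        # to an unrelated process before kill() runs.
        try:
            os.kill(pid, sig)
        except (ProcessLookupError, PermissionError):
            continue  # gone, or owned by another user
        signaled.append(pid)
    return signaled
```

Shortening the inspection narrows the window but cannot close it, which is the point argued above.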
>> >> >
>> >> > We've moved to a model where we identify other system resources, like
>> >> > DRM fences, locks, sockets, and everything else via file descriptors.
>> >> > This change is a step toward using procfs file descriptors to work
>> >> > with processes, which makes the system more regular and easier to
>> >> > reason about. A clean API that's possible to use correctly is a
>> >> > worthwhile project.
>> >>
>> >> So I have been discussing a new process API with David Howells, Kees
>> >> Cook and a few others and I am working on an RFC/proposal for this. It
>> >> is partially inspired by the new mount API. So I would like to block
>> >> this patch until then. I would like to get this right very much and
>>
>> It's good to hear that others are thinking about this problem.
>>
>> >> I don't think this is the way to go.
>
> Because we want this to be generic and things like getting handles on
> processes via /proc is just a part of that

The word "generic" is like the word "secure": it's hard to tell what it
means in isolation. :-) Over what domain do we need to be generic?
Procfs file descriptors already work on processes generally, and they
allow for race-free access to anything that's reachable via a procfs
pid directory. In what way would an alternate approach be even more
generic?

>> Why not?
>>
>> Does your proposed API allow for a race-free pkill, with arbitrary
>> selection criteria? This capability is a good litmus test for fixing
>> the long-standing Unix process API issues.
>
> You'd have a handle on the process with an fd so yes, it would be.

Thanks. That's good to hear. Any idea on the timetable for this
proposal? I'm open to lots of alternative technical approaches, but I
don't want this capability to languish for a long time.
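For comparison, here is a minimal Python sketch of a race-free pkill with arbitrary selection criteria. It does not use the RFC's kill file. Instead it assumes pidfd_send_signal(2), a system call added later, in Linux 5.1 (after this thread). That call accepts a /proc/<pid> directory file descriptor and is exposed as `signal.pidfd_send_signal` in Python 3.9+. The sketch opens each candidate once as a directory fd. The predicate reads only through that fd, and the signal goes through the same fd. The helper names and the predicate signature are hypothetical.

```python
import os
import signal


def _read_rel(handle: int, name: str) -> bytes:
    """Read a file relative to a /proc/<pid> directory fd (openat semantics)."""
    fd = os.open(name, os.O_RDONLY, dir_fd=handle)
    try:
        return os.read(fd, 4096)
    finally:
        os.close(fd)


def comm_is(name: str):
    """Hypothetical predicate factory: match on the command name."""
    want = name.encode()
    return lambda handle: _read_rel(handle, "comm").rstrip(b"\n") == want


def handle_pkill(predicate, sig: int) -> list:
    """Signal every process for which predicate(handle) returns True.

    The predicate inspects the process only through the open /proc/<pid>
    directory fd, and the signal is sent through that same fd with
    pidfd_send_signal(2) (Linux >= 5.1, Python >= 3.9). Once the process
    has been reaped, reads and the signal fail instead of reaching a
    recycled PID.
    """
    signaled = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            handle = os.open(f"/proc/{entry}", os.O_RDONLY | os.O_DIRECTORY)
        except OSError:
            continue  # exited before we could take a handle
        try:
            if predicate(handle):
                signal.pidfd_send_signal(handle, sig)
                signaled.append(int(entry))
        except (ProcessLookupError, PermissionError, FileNotFoundError):
            pass  # exited after we took the handle, or not ours to signal
        finally:
            os.close(handle)
    return signaled
```

For example, `handle_pkill(comm_is("sleep"), signal.SIGTERM)` signals every `sleep` process the caller is permitted to signal. Any other predicate over files in the pid directory, such as `cmdline` or `status`, slots in the same way.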