From: Christian Brauner
Date: Tue, 30 Oct 2018 12:04:50 +0100
Subject: Re: [RFC PATCH] Implement /proc/pid/kill
To: Daniel Colascione
Cc: Joel Fernandes, Linux Kernel Mailing List, Tim Murray, Suren Baghdasaryan
References: <20181029221037.87724-1-dancol@google.com> <20181030103910.mnzot3zcoh6j7did@gmail.com> <20181030104037.73t5uz3piywxwmye@gmail.com>

On Tue, Oct 30, 2018 at 11:48 AM Daniel Colascione wrote:
>
> On Tue, Oct 30, 2018 at 10:40 AM, Christian Brauner wrote:
> > On Tue, Oct 30, 2018 at 11:39:11AM +0100, Christian Brauner wrote:
> >> On Tue, Oct 30, 2018 at 08:50:22AM +0000, Daniel Colascione wrote:
> >> > On Tue, Oct 30, 2018 at 3:21 AM, Joel Fernandes wrote:
> >> > > On Mon, Oct 29, 2018 at 3:11 PM Daniel Colascione wrote:
> >> > >>
> >> > >> Add a simple proc-based kill interface. To use /proc/pid/kill, just
> >> > >> write the signal number in base-10 ASCII to the kill file of the
> >> > >> process to be killed: for example, 'echo 9 > /proc/$$/kill'.
> >> > >>
> >> > >> Semantically, /proc/pid/kill works like kill(2), except that the
> >> > >> process ID comes from the proc filesystem context instead of from an
> >> > >> explicit system call parameter. This way, it's possible to avoid races
> >> > >> between inspecting some aspect of a process and that process's PID
> >> > >> being reused for some other process.
> >> > >>
> >> > >> With /proc/pid/kill, it's possible to write a proper race-free and
> >> > >> safe pkill(1). An approximation follows. A real program might use
> >> > >> openat(2), having opened a process's /proc/pid directory explicitly,
> >> > >> with the directory file descriptor serving as a sort of "process
> >> > >> handle".
> >> > >
> >> > > How long does the 'inspection' procedure take? If it's a short
> >> > > duration, then is PID reuse really an issue, I mean the PIDs are not
> >> > > reused until wrap around and the only reason this can be a problem is
> >> > > if you have the wrap around while the 'inspecting some aspect'
> >> > > procedure takes really long.
> >> >
> >> > It's a race. Would you make similar statements about a similar fix for
> >> > a race condition involving a mutex and a double-free just because the
> >> > race didn't crash most of the time? The issue I'm trying to fix here
> >> > is the same problem, one level higher up in the abstraction hierarchy.
> >> >
> >> > > Also the proc fs is typically not the right place for this. Some
> >> > > entries in proc are writeable, but those are for changing values of
> >> > > kernel data structures. The title of man proc(5) is "proc - process
> >> > > information pseudo-filesystem". So it's "information", right?
> >> >
> >> > Why should userspace care whether a particular operation is "changing
> >> > [a] value[] of [a] kernel data structure" or something else? That
> >> > something in /proc is a struct field is an implementation detail. It's
> >> > the interface semantics that matters, and whether a particular
> >> > operation is achieved by changing a struct field or by making a
> >> > function call is irrelevant to userspace. Proc is a filesystem about
> >> > processes. Why shouldn't you be able to send a signal to a process via
> >> > proc? It's an operation involving processes.
> >> >
> >> > It's already possible to do things *to* processes via proc, e.g.,
> >> > adjust OOM killer scores.
> >> > Proc filesystem file descriptors are
> >> > userspace references to kernel-side struct pid instances, and as such,
> >> > make good process handles. There are already "verb" files in procfs,
> >> > such as /proc/sys/vm/drop_caches and /proc/sysrq-trigger. Why not add
> >> > a kill "verb", especially if it closes a race that can't be closed
> >> > some other way?
> >> >
> >> > You could implement this interface as a system call that took a procfs
> >> > directory file descriptor, but relative to this proposal, it would be
> >> > all downside. Such a thing would act just the same way as
> >> > /proc/pid/kill, and wouldn't be usable from the shell or from programs
> >> > that didn't want to use syscall(2). (Since glibc isn't adding new
> >> > system call wrappers.) AFAIK, the only downside of having a "kill"
> >> > file is the need for a string-to-integer conversion, but compared to
> >> > process killing, integer parsing is insignificant.
> >> >
> >> > > IMO without a really good reason for this, it could really be a hard
> >> > > sell but the RFC was worth it anyway to discuss it ;-)
> >> >
> >> > The traditional unix process API is down there at level -10 of Rusty
> >> > Russell's old bad API scale: "It's impossible to get right". The races
> >> > in the current API are unavoidable. That most programs don't hit these
> >> > races most of the time doesn't mean that the race isn't present.
> >> >
> >> > We've moved to a model where we identify other system resources, like
> >> > DRM fences, locks, sockets, and everything else via file descriptors.
> >> > This change is a step toward using procfs file descriptors to work
> >> > with processes, which makes the system more regular and easier to
> >> > reason about. A clean API that's possible to use correctly is a
> >> > worthwhile project.
> >>
> >> So I have been discussing a new process API with David Howells, Kees
> >> Cook and a few others and I am working on an RFC/proposal for this.
> >> It is partially inspired by the new mount API. So I would like to block
> >> this patch until then. I would like to get this right very much and
>
> It's good to hear that others are thinking about this problem.
>
> >> I
> >> don't think this is the way to go.

Because we want this to be generic, and things like getting handles on
processes via /proc are just a part of that.

>
> Why not?
>
> Does your proposed API allow for a race-free pkill, with arbitrary
> selection criteria? This capability is a good litmus test for fixing
> the long-standing Unix process API issues.

You'd have a handle on the process with an fd so yes, it would be.
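
For illustration only, and not taken from the patch or from either proposal: a minimal Python sketch of the "directory fd as process handle" pattern described in the quoted patch text. It assumes the `kill` file from this RFC exists under /proc/pid; that file is the proposed interface, not something a stock kernel is known to provide. The helper names (`open_process_dir`, `read_cmdline`, `send_signal`, `kill_if_matches`) and the `proc_root` parameter are hypothetical and exist only for this example.

```python
import os
import signal


def open_process_dir(pid, proc_root="/proc"):
    """Open <proc_root>/<pid> as a directory fd; this fd is the "process handle"."""
    path = os.path.join(proc_root, str(pid))
    return os.open(path, os.O_RDONLY | os.O_DIRECTORY)


def read_cmdline(proc_dir_fd):
    """Inspect the process through the handle (openat(2) relative to the dir fd)."""
    fd = os.open("cmdline", os.O_RDONLY, dir_fd=proc_dir_fd)
    try:
        # cmdline holds NUL-separated arguments.
        return os.read(fd, 4096).split(b"\0")
    finally:
        os.close(fd)


def send_signal(proc_dir_fd, signo):
    """Write the signal number in base-10 ASCII to the proposed 'kill' file.

    ASSUMPTION: the 'kill' file is the interface proposed in this RFC.
    """
    fd = os.open("kill", os.O_WRONLY, dir_fd=proc_dir_fd)
    try:
        os.write(fd, str(int(signo)).encode("ascii"))
    finally:
        os.close(fd)


def kill_if_matches(pid, name, signo=signal.SIGTERM, proc_root="/proc"):
    """Signal pid only if the basename of argv[0] equals name.

    The inspection and the signal both go through the same directory fd,
    so the PID is looked up by number only once.
    """
    dfd = open_process_dir(pid, proc_root)
    try:
        argv = read_cmdline(dfd)
        if argv and os.path.basename(argv[0]) == name.encode():
            send_signal(dfd, signo)
            return True
        return False
    finally:
        os.close(dfd)
```

The selection test here is a plain name comparison, but any inspection done through the same directory fd would fit the same shape. Where the `kill` file is absent, the open in `send_signal` is expected to fail with FileNotFoundError rather than signal anything.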