From: Daniel Colascione
Date: Tue, 30 Oct 2018 08:50:22 +0000
Subject: Re: [RFC PATCH] Implement /proc/pid/kill
To: Joel Fernandes
Cc: LKML, Tim Murray, Suren Baghdasaryan
References: <20181029221037.87724-1-dancol@google.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Oct 30, 2018 at 3:21 AM, Joel Fernandes wrote:
> On Mon, Oct 29, 2018 at 3:11 PM Daniel Colascione wrote:
>>
>> Add a simple proc-based kill interface. To use /proc/pid/kill, just
>> write the signal number in base-10 ASCII to the kill file of the
>> process to be killed: for example, 'echo 9 > /proc/$$/kill'.
>>
>> Semantically, /proc/pid/kill works like kill(2), except that the
>> process ID comes from the proc filesystem context instead of from an
>> explicit system call parameter. This way, it's possible to avoid races
>> between inspecting some aspect of a process and that process's PID
>> being reused for some other process.
>>
>> With /proc/pid/kill, it's possible to write a proper race-free and
>> safe pkill(1). An approximation follows. A real program might use
>> openat(2), having opened a process's /proc/pid directory explicitly,
>> with the directory file descriptor serving as a sort of "process
>> handle".
>
> How long does the 'inspection' procedure take? If its a short
> duration, then is PID reuse really an issue, I mean the PIDs are not
> reused until wrap around and the only reason this can be a problem is
> if you have the wrap around while the 'inspecting some aspect'
> procedure takes really long.

It's a race. Would you make similar statements about a similar fix for
a race condition involving a mutex and a double-free just because the
race didn't crash most of the time? The issue I'm trying to fix here
is the same problem, one level higher up in the abstraction hierarchy.

> Also the proc fs is typically not the right place for this. Some
> entries in proc are writeable, but those are for changing values of
> kernel data structures. The title of man proc(5) is "proc - process
> information pseudo-filesystem". So its "information" right?

Why should userspace care whether a particular operation is "changing
[a] value[] of [a] kernel data structure" or something else? That
something in /proc is a struct field is an implementation detail. It's
the interface semantics that matters, and whether a particular
operation is achieved by changing a struct field or by making a
function call is irrelevant to userspace.

Proc is a filesystem about processes. Why shouldn't you be able to
send a signal to a process via proc? It's an operation involving
processes. It's already possible to do things *to* processes via proc,
e.g., adjust OOM killer scores. Proc filesystem file descriptors are
userspace references to kernel-side struct pid instances, and as such,
make good process handles.

There are already "verb" files in procfs, such as
/proc/sys/vm/drop_caches and /proc/sysrq-trigger. Why not add a kill
"verb", especially if it closes a race that can't be closed some other
way?

You could implement this interface as a system call that took a procfs
directory file descriptor, but relative to this proposal, it would be
all downside.
Such a thing would act just the same way as /proc/pid/kill, and
wouldn't be usable from the shell or from programs that didn't want to
use syscall(2). (Since glibc isn't adding new system call wrappers.)

AFAIK, the only downside of having a "kill" file is the need for a
string-to-integer conversion, but compared to process killing, integer
parsing is insignificant.

> IMO without a really good reason for this, it could really be a hard
> sell but the RFC was worth it anyway to discuss it ;-)

The traditional unix process API is down there at level -10 of Rusty
Russell's old bad API scale: "It's impossible to get right". The races
in the current API are unavoidable. That most programs don't hit these
races most of the time doesn't mean that the race isn't present.

We've moved to a model where we identify other system resources, like
DRM fences, locks, sockets, and everything else via file descriptors.
This change is a step toward using procfs file descriptors to work
with processes, which makes the system more regular and easier to
reason about. A clean API that's possible to use correctly is a
worthwhile project.
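
For illustration, here is a rough C sketch of the "directory fd as
process handle" pattern from the patch description. It assumes a kernel
carrying this RFC, so that a "kill" file exists under /proc/pid; on a
kernel without it, the openat(2) on "kill" should simply fail. The
function names (proc_handle_open, proc_handle_comm_is,
proc_handle_kill) are made up for this example and are not part of the
patch.

```c
/* Sketch only: the "kill" file is the interface proposed in this RFC. */
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Open /proc/<pid> as a directory; the returned fd is the "process handle".
 * Returns the fd, or -1 with errno set. */
int proc_handle_open(pid_t pid)
{
    char path[64];
    int len = snprintf(path, sizeof(path), "/proc/%ld", (long)pid);

    assert(len > 0 && (size_t)len < sizeof(path));
    return open(path, O_RDONLY | O_DIRECTORY | O_CLOEXEC);
}

/* Inspection step: compare the name in <handle>/comm against `want`.
 * Returns 1 on match, 0 on mismatch, -1 on error. */
int proc_handle_comm_is(int dirfd, const char *want)
{
    char buf[64];
    int fd = openat(dirfd, "comm", O_RDONLY | O_CLOEXEC);
    ssize_t n;

    if (fd < 0)
        return -1;
    n = read(fd, buf, sizeof(buf) - 1);
    close(fd);
    if (n < 0)
        return -1;
    buf[n] = '\0';
    buf[strcspn(buf, "\n")] = '\0';   /* drop the trailing newline */
    return strcmp(buf, want) == 0;
}

/* Kill step: write the signal number in base-10 ASCII to <handle>/kill
 * (hypothetical file from this RFC). Returns 0 on success, -1 on error. */
int proc_handle_kill(int dirfd, int signo)
{
    char buf[16];
    int len = snprintf(buf, sizeof(buf), "%d", signo);
    int fd = openat(dirfd, "kill", O_WRONLY | O_CLOEXEC);
    ssize_t n;
    int saved;

    if (fd < 0)
        return -1;
    n = write(fd, buf, (size_t)len);
    saved = errno;                    /* keep write()'s errno across close() */
    close(fd);
    errno = saved;
    return n == len ? 0 : -1;
}
```

A caller would open the handle once, run proc_handle_comm_is() against
it, and only then call proc_handle_kill(). Both steps go through the
same directory fd, so the signal is addressed to whatever process that
fd refers to rather than to whichever process currently holds the
numeric PID.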