References: <20181118111751.6142-1-christian@brauner.io>
From: Andy Lutomirski
Date: Sun, 18 Nov 2018 09:42:35 -0800
Subject: Re: [PATCH] proc: allow killing processes via file descriptors
To: Daniel Colascione
Cc: Andrew Lutomirski, Randy Dunlap, Christian Brauner, "Eric W. Biederman",
 LKML, "Serge E. Hallyn", Jann Horn, Andrew Morton, Oleg Nesterov,
 Aleksa Sarai, Al Viro, Linux FS Devel, Linux API, Tim Murray, Kees Cook,
 Jan Engelhardt
X-Mailing-List: linux-kernel@vger.kernel.org

On Sun, Nov 18, 2018 at 9:24 AM Daniel Colascione wrote:
> Assuming we don't broaden exit status readability (which would make a
> lot of things simpler), the exit notification mechanism must work like
> this: if you can see a process in /proc, you should be able to wait on
> it. If you learn that process's exit status through some other means
> --- e.g., you're the process's parent, you can ptrace the process, you
> have CAP_WHATEVER_IT_IS_ --- then you should be able to learn the fate
> of the process. Otherwise you just be able to learn that the process
> exited.

Sounds reasonable to me. Except for the obvious turd that, if you
open /proc/PID/whatever, and the process calls execve(), then the
resulting semantics are awkward at best.

>
> > > Windows has an easy time of it because
>
> Windows has an easier time of it because it doesn't use an ad-hoc
> ambient authority permission model. In Windows, if you can open a
> handle to do something, that handle lets you do the thing. Period.
> There's none of this "well, I opened this process FD, but since I
> opened it, the process called setuid, so now I can't get its exit
> status" nonsense. Privilege elevation is always accomplished via a
> separate call to CreateProcessWithToken, which creates a *new* process
> with the elevated privileges.
> An existing process can't suddenly and
> magically become this special thing that you can't inspect, but that
> has the same PID and identity as this other process that you used to
> be able to inspect. The model is just better, because permission is
> baked into the HANDLE. Now, that ship has sailed. We're stuck with
> setreuid and exec. But let's be clear about what's causing the
> complexity.

I'm not entirely sure that ship has sailed. In the kernel, we already
have a bit of a distinction between a pid (and tid, etc. -- I'm
referring to struct pid) and a task. If we make a new
process-management API, we could put a distinction like this into the
API. As a straw-man proposal (highly incomplete and probably wrong,
but maybe it gets the idea across):

Have a way to get an fd that refers to a "running program". (I'm
calling it that to distinguish it from "task" and "pid", both of which
already mean something.) You'd be able to open such an fd given a
pid, and your permissions would be checked at that time. R access
means you can read the running program's memory and otherwise
introspect it. W means you can modify its memory and otherwise mess
with it. X means you can send it signals. We might need more bits to
really do this right.

Now here's the kicker: if the "running program" calls execve(), it
goes away. The fd gets some sort of notification that this happened
and there's an API to get a handle to the new running program *if the
caller has the appropriate permissions*. setresuid() has no effect
here -- if you have W access to the process and the process calls
setresuid(), you still have W access.

To make this fully useful, we'd probably want to elaborate it with a
race-free way to track all descendants and, if needed, kill them all,
subject to permissions.

This API ought to be extensible to replace ptrace() eventually.

Does this seem like a reasonable direction to go in?
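
For illustration only, the straw-man semantics can be modeled in a few
lines of userspace Python. This is a toy, not kernel code and not a
proposed interface: the names RunningProgram, Task, ProgramFd,
open_program and reopen_after_exec, and the "same uid or root" policy,
are all assumptions invented for this sketch. It only tries to show the
three properties described above: permissions are checked once at open
time, setresuid() does not revoke an fd, and execve() makes the old
running program go away.

```python
from dataclasses import dataclass, field
from enum import Flag, auto
from typing import List, Optional


class Access(Flag):
    R = auto()  # read memory and otherwise introspect
    W = auto()  # modify memory and otherwise mess with it
    X = auto()  # send signals


@dataclass
class RunningProgram:
    """Hypothetical object: one program image, replaced on execve()."""
    uid: int
    alive: bool = True
    successor: Optional["RunningProgram"] = None
    signals: List[int] = field(default_factory=list)


class Task:
    """Hypothetical stand-in for the pid: its identity survives execve()."""

    def __init__(self, uid: int) -> None:
        self.program = RunningProgram(uid)

    def setresuid(self, uid: int) -> None:
        # Credentials change, but it is still the same running program.
        self.program.uid = uid

    def execve(self) -> None:
        # The old running program goes away; a new one takes its place.
        old = self.program
        old.alive = False
        old.successor = RunningProgram(old.uid)
        self.program = old.successor


def may_access(caller_uid: int, program: RunningProgram) -> bool:
    # Assumed toy policy: root or a matching uid may open the program.
    return caller_uid == 0 or caller_uid == program.uid


class ProgramFd:
    """Hypothetical handle; the access bits are fixed when it is opened."""

    def __init__(self, program: RunningProgram, access: Access) -> None:
        self._program = program
        self.access = access

    def exec_happened(self) -> bool:
        return not self._program.alive

    def kill(self, sig: int) -> None:
        if Access.X not in self.access:
            raise PermissionError("fd was not opened with X access")
        if not self._program.alive:
            raise ProcessLookupError("running program went away in execve()")
        self._program.signals.append(sig)

    def reopen_after_exec(self, caller_uid: int, want: Access) -> "ProgramFd":
        # Getting a handle to the new program is a fresh permission check.
        if self._program.successor is None:
            raise ValueError("no execve() has happened")
        return open_program(caller_uid, self._program.successor, want)


def open_program(caller_uid: int, program: RunningProgram,
                 want: Access) -> ProgramFd:
    # Permissions are checked here, once, and never re-checked on use.
    if not may_access(caller_uid, program):
        raise PermissionError("caller may not open this running program")
    return ProgramFd(program, want)
```

In this model, an fd opened by uid 1000 keeps working after the target
calls setresuid(0), stops working once the target calls execve(), and
uid 1000 then fails the fresh check in reopen_after_exec() because the
new running program belongs to uid 0.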