Received: by 2002:ad5:474a:0:0:0:0:0 with SMTP id i10csp1960017imu; Sun, 18 Nov 2018 12:30:53 -0800 (PST) X-Google-Smtp-Source: AJdET5djl5bwM/2byD2FOeMhWQOuJTCEXlEoe8xSoK+hQIu6M7SnDuRR0eRVkl4u57ukXumlw81E X-Received: by 2002:a62:ed09:: with SMTP id u9-v6mr19701614pfh.188.1542573053093; Sun, 18 Nov 2018 12:30:53 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1542573053; cv=none; d=google.com; s=arc-20160816; b=E3bMuzqh0aKK/5ctPrPcosvuFE8ww8qDI8LAvQVUrrSPoZB4UZ1j0YWi71dbQSv6LH sRuWGvrB9WDNswyLl3WO2mJ5iF0TStCP5dNFi6bv8D7yxBiCvcwwdFEpQbObZ6Zz2HO0 VE7oomHDk17g0jjGhVxgyJiUlC7gk1oqE66Atb3JwK4IQxUq8Bs5z7QcixkDAPdlklv7 J2Cv1KIdytpoU+9uv6QTTgN5Ik7IMOyZ54cfCEGcS60i1MF7U/77B1nxLxuuPfbOKz8k dlx4C1WuO9hJY6IcB25OddPjRBPVXAtPtJkAbznjsThaJQoshO1c3ItsXBpHLYT0jzYa 7PbA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:message-id:subject:cc :to:from:date:dkim-signature; bh=09TMh7ltv9Knkf6XQBcc1ye3e4wBO5cn7QHqPt6W2Ww=; b=Hxtr/8i/jz157L6Dd6FbE4My24JMz9wXpjWoiBcLk50TrfO0d/ZF0bbIeW6koY4mkB ZOjn8Y3NpuycRbjqeMFmj4XzKEm+q1SfV876rIFbHCMu3f/AeaiJ/2XWyg0IsOSX7ANb BxZGePXfuPO2qc+VGX3iThBxy1Pa1XqDcTQHPFIXvVmtngxue7TSVpshPD4nUNBOxfDz dG1BotDSnNSsEjRBgq1Sv0Ka6PJCKTzlp4nuBwQlG2hW0hvHMax54+8r5wR/n0pflySt 579cMthl8ip9jRGK5aTQpW66HZ48r0TZSfqmkqrTSDt7mcW4UOd/byOs4xJfzku4szUN kWuw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@brauner.io header.s=google header.b="dZGLagY/"; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id 30-v6si38324712plb.342.2018.11.18.12.30.36; Sun, 18 Nov 2018 12:30:53 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@brauner.io header.s=google header.b="dZGLagY/"; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726975AbeKSGgl (ORCPT + 99 others); Mon, 19 Nov 2018 01:36:41 -0500 Received: from mail-pg1-f194.google.com ([209.85.215.194]:38987 "EHLO mail-pg1-f194.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725805AbeKSGgk (ORCPT ); Mon, 19 Nov 2018 01:36:40 -0500 Received: by mail-pg1-f194.google.com with SMTP id w6so584028pgl.6 for ; Sun, 18 Nov 2018 12:15:26 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=brauner.io; s=google; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to:user-agent; bh=09TMh7ltv9Knkf6XQBcc1ye3e4wBO5cn7QHqPt6W2Ww=; b=dZGLagY/1faClkZX1qWovtSTj82b60UaWO8IUfEdXk3o8fIRNjtU/mrcrl/Ah1XI1Y rR7zkzg3JPNmMk9UPRsnueqQUM7cj+iS0TsGx6Z71AlzG2mhpGRYo5noYA1wx+ENnL2f 73gNqyUXSPn/IQ2MWsOTJfty8YcDaGxZ/dLAOf1dAhEzLl2iwb+KyolxguG5VTiEevQH B8olS+MqvTDoL2HTYkUj8snDpJLjngIIU0h+XezC+1alRTr25NidkJ3mt9ZyQiCPDQIt UPQppMBy7Si3KXod8veEXnVdvSt8eAy7TJ68cXnJKcXO7M2mg5CdaKczP9oD5KtqHtZT 81wg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to:user-agent; bh=09TMh7ltv9Knkf6XQBcc1ye3e4wBO5cn7QHqPt6W2Ww=; b=ZEdd1oQPxPnp2iZK7gyre3bRmkeK+tWSj/sM+f2EagdjDnWkzvXC6ABnOfBeJ4xzES QkwY+uyu3v02StRCGuiRzCcozwJpyxHKCAS+/qw7mOmlWLIVJ/1JQqAWVEcdszTuCPN5 Vmm7K/vTDWgG42JUJNbdGbr5sM0RZJehxnpLKgVUExTD0yxZMFLvA3oUptJyUjJqJQcU i9BXgu8C5Zf7th1ycuOdELf4j0bEFzngC1Ljt4Q+Wt0hXYX2JB6TYvi+mHYMlo2oGzBM Pd+h56eTb8h5mSmkB9ZTYvdxml3z6kUA9H7+/8qsWcXNoMu81rxUJsOKqHo3M2mjSLjP BQzA== X-Gm-Message-State: AGRZ1gJ/uPQxt1uOd5K5TtHTFMTqeZ3ZjDwhTNIDTWEYyyoQl1EzoCPg RVgyMvepF0SI5aefEEtjQgOS0A== X-Received: by 2002:a63:7c13:: with SMTP id x19mr17395594pgc.45.1542572125921; Sun, 18 Nov 2018 12:15:25 -0800 (PST) Received: from brauner.io ([2404:4404:133a:4500:9d11:de0b:446c:8485]) by smtp.gmail.com with ESMTPSA id q199sm26626979pfc.97.2018.11.18.12.15.19 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Sun, 18 Nov 2018 12:15:25 -0800 (PST) Date: Sun, 18 Nov 2018 21:15:16 +0100 From: Christian Brauner To: Daniel Colascione Cc: Aleksa Sarai , Andy Lutomirski , Randy Dunlap , "Eric W. Biederman" , LKML , "Serge E. Hallyn" , Jann Horn , Andrew Morton , Oleg Nesterov , Al Viro , Linux FS Devel , Linux API , Tim Murray , Kees Cook , Jan Engelhardt Subject: Re: [PATCH] proc: allow killing processes via file descriptors Message-ID: <20181118201514.c5ujdjav6ccodyff@brauner.io> References: <20181118190504.ixglsqbn6mxkcdzu@yavin> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: User-Agent: NeoMutt/20180716 Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Sun, Nov 18, 2018 at 11:44:19AM -0800, Daniel Colascione wrote: > On Sun, Nov 18, 2018 at 11:05 AM, Aleksa Sarai wrote: > > On 2018-11-18, Daniel Colascione wrote: > >> > Here's my point: if we're really going to make a new API to manipulate > >> > processes by their fd, I think we should have at least a decent idea > >> > of how that API will get extended in the future. Right now, we have > >> > an extremely awkward situation where opening an fd in /proc requires > >> > certain capabilities or uids, and using those fds often also checks > >> > current's capabilities, and the target process may have changed its > >> > own security context, including gaining privilege via SUID, SGID, or > >> > LSM transition rules in the mean time. This has been a huge source of > >> > security bugs. It would be nice to have a model for future APIs that > >> > avoids these problems. > >> > > >> > And I didn't say in my proposal that a process's identity should > >> > fundamentally change when it calls execve(). I'm suggesting that > >> > certain operations that could cause a process to gain privilege or > >> > otherwise require greater permission to introspect (mainly execve) > >> > could be handled by invalidating the new process management fds. > >> > Sure, if init re-execs itself, it's still PID 1, but that doesn't > >> > necessarily mean that: > >> > > >> > fd = process_open_management_fd(1); > >> > [init reexecs] > >> > process_do_something(fd); > >> > > >> > needs to work. > >> > >> PID 1 is a bad example here, because it doesn't get recycled. Other > >> PIDs do. The snippet you gave *does* need to work, in general, because > >> if exec invalidates the handle, and you need to reopen by PID to > >> re-establish your right to do something with the process, that process > >> may in fact have died between the invalidation and your reopen, and > >> your reopened FD may refer to some other random process. > > > > I imagine the error would be -EPERM rather than -ESRCH in this case, > > which would be incredibly trivial for userspace to differentiate > > between. > > Why would userspace necessarily see EPERM? The PID might get recycled > into a different random process that the caller has the ability to > affect. > > > If you wish to re-open the path that is also trivial by > > re-opening through /proc/self/fd/$fd -- which will re-do any permission > > checks and will guarantee that you are re-opening the same 'struct file' > > and thus the same 'struct pid'. > > When you reopen via /proc/self/fd, you get a new struct file > referencing the existing inode, not the same struct file. A new > reference to the old struct file would just be dup. > > Anyway: what other API requires, for correct operation, occasional > reopening through /proc/self/fd? It's cumbersome, and it doesn't add > anything. If we invalidate process handles on execve, and processes > are legally allowed to re-exec themselves for arbitrary reasons at any > time, that's tantamount to saying that handles might become invalid at > any time and that all callers must be prepared to go through the > reopen-and-retry path before any operation. > > Why are we making them do that? So that a process can have an open FD > that represents a process-operation capability? Which capability does > the open FD represent? > > I think when you and Andy must be talking about is an API that looks like this: > > int open_process_operation_handle(int procfs_dirfd, int capability_bitmask) > > capability_bitmask would have bits like > > PROCESS_CAPABILITY_KILL --- send a signal > PROCESS_CAPABILITY_PTRACE --- attach to a process > PROCESS_CAPABILITY_READ_EXIT_STATUS --- what it says on the tin > PROCESS_CAPABILITY_READ_CMDLINE --- etc. > > Then you'd have system calls like > > int process_kill(int process_capability_fd, int signo, const union sigval data) > int process_ptrace_attach(int process_capability_fd) > int process_wait_for_exit(int process_capability_fd, siginfo_t* exit_info) > > that worked on these capability bits. If a process execs or does > something else to change its security capabilities, operations on > these capability FDs would fail with ESTALE or something and callers > would have to re-acquire their capabilities. > > This approach works fine. It has some nice theoretical properties, and > could allow for things like nicer privilege separation for debuggers. > I wouldn't mind something like this getting into the kernel. > > I just don't think this model is necessary right now. I want a small > change from what we have today, one likely to actually make it into > the tree. And bypassing the capability FDs and just allowing callers > to operate directly on process *identity* FDs, using privileges in > effect at the time of all, is at least no worse than what we have now. > > That is, I'm proposing an API that looks like this: > > int process_kill(int procfs_dfd, int signo, const union sigval value) I've started a second tree with process_signal(int procpid_dfd, int sig) instead of an ioctl(). Why do you want sigval too? > > If, later, process_kill were to *also* accept process-capability FDs, > nothing would break. > > >> The only way around this problem is to have two separate FDs --- one > >> to represent process identity, which *must* be continuous across > >> execve, and the other to represent some specific capability, some > >> ability to do something to that process. It's reasonable to invalidate > >> capability after execve, but it's not reasonable to invalidate > >> identity. In concrete terms, I don't see a big advantage to this > >> separation, and I think a single identity FD combined with > >> per-operation capability checks is sufficient. And much simpler. > > > > I think that the error separation above would trivially allow user-space > > to know whether the identity or capability of a process being monitored > > has changed. > > > > Currently, all operations on a '/proc/$pid' which you've previously > > opened and has died will give you -ESRCH. > > Not the case. Zombies have died, but profs operations work fine on zombies. > > >> > Similarly, it seems like > >> > it's probably safe to be able to open an fd that lets you watch the > >> > exit status of a process, have the process call setresuid(), and still > >> > see the exit status. > >> > >> Is it? That's an open question. > > > > Well, if we consider wait4(2) it seems that this is already the case. > > If you fork+exec a setuid binary you can definitely see its exit code. > > Only if you're the parent. Otherwise, you can't see the process exit > status unless you pass a ptrace access check and consult > /proc/pid/stat after the process dies, but before the zombie > disappears. Random unrelated and unprivileged processes can't see exit > statuses from distant parts of the system. > > >> > My POLLERR hack, aside from being ugly, > >> > avoids this particular issue because it merely lets you wait for > >> > something you already could have observed using readdir(). > >> > >> Yes. I mentioned this same issue-punting as the motivation behind > >> exithand, initially, just reading EOF on exit. > > > > One question I have about EOF-on-exit is that if we wish to extend it to > > allow providing the exit status (which is something we discussed in the > > original thread), how will multiple-readers be handled in such a > > scenario? > > Would we be storing the exit status or siginfo in the > > equivalent of a locked memfd? > > Yes, that's what I have in mind. A siginfo_t is small enough that we > could just store it as a blob allocated off the procfs inode or > something like that without bothering with a shmfs file. You'd be able > to read(2) the exit status as many times as you wanted.