From: Andy Lutomirski
Date: Mon, 15 Apr 2019 13:29:23 -0700
Subject: Re: RFC: on adding new CLONE_* flags [WAS Re: [PATCH 0/4] clone: add CLONE_PIDFD]
To: Aleksa Sarai
Cc: "Enrico Weigelt, metux IT consult", Christian Brauner, Linus Torvalds,
    Al Viro, Jann Horn, David Howells, Linux API, LKML, "Serge E. Hallyn",
    Andrew Lutomirski, Arnd Bergmann, "Eric W. Biederman", Kees Cook,
    Thomas Gleixner, Michael Kerrisk, Andrew Morton, Oleg Nesterov,
    Joel Fernandes, Daniel Colascione
References: <20190414201436.19502-1-christian@brauner.io> <20190415195911.z7b7miwsj67ha54y@yavin>
In-Reply-To: <20190415195911.z7b7miwsj67ha54y@yavin>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Apr 15, 2019 at 12:59 PM Aleksa Sarai wrote:
>
> On 2019-04-15, Enrico Weigelt, metux IT consult wrote:
> > > This patchset makes it possible to retrieve pid file descriptors at
> > > process creation time by introducing the new flag CLONE_PIDFD to the
> > > clone() system call as previously discussed.
> >
> > Sorry, for highjacking this thread, but I'm curious on what things to
> > consider when introducing new CLONE_* flags.
> >
> > The reason I'm asking is:
> >
> > I'm working on implementing plan9-like fs namespaces, where unprivileged
> > processes can change their own namespace at will. For that, certain
> > traditional unix'ish things have to be disabled, most notably suid.
> > As forbidding suid can be helpful in other scenarios, too, I thought
> > about making this its own feature. Doing that switch on clone() seems
> > a nice place for that, IMHO.
>
> Just spit-balling -- is no_new_privs not sufficient for this usecase?
> Not granting privileges such as setuid during execve(2) is the main
> point of that flag.
>

I would personally *love* it if distros started setting no_new_privs
for basically all processes. And pidfd actually gets us part of the
way toward a straightforward way to make sudo and su still work in a
no_new_privs world: su could call into a daemon that would spawn the
privileged task, and su would get a (read-only!) pidfd back and then
wait for the fd and exit. I suppose that, done naively, this might
cause some odd effects with respect to tty handling, but I bet it's
solvable. I suppose it would be nifty if there were a way for a
process, by mutual agreement, to reparent itself to an unrelated
process.

Anyway, clone(2) is an enormous mess. Surely the right solution here
is to have a whole new process creation API that takes a big,
extensible struct as an argument, and supports *at least* the full
abilities of posix_spawn() and ideally covers all the use cases for
fork() + do stuff + exec(). It would be nifty if this API also had a
way to say "add no_new_privs and therefore enable extra functionality
that doesn't work without no_new_privs". This functionality would
include things like returning a future extra-privileged pidfd that
gives ptrace-like access.

As basic examples, the improved process creation API should take a
list of dup2() operations to perform, fds to remove the O_CLOEXEC flag
from, fds to close (or, maybe even better, a list of fds to *not*
close), a list of rlimit changes to make, a list of signal changes to
make, the ability to set sid, pgrp, uid, gid (as in
setresuid/setresgid), the ability to do capset() operations, etc. The
posix_spawn() API, for all that it's rather complicated, covers a
bunch of the basics pretty well.

Sharing the parent's VM, signal set, fd table, etc, should all be
options, but they should default to *off*.

(Many other operating systems allow one to create a process and gain a
capability to do all kinds of things to that process. It's a
generally good idea.)

--Andy
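
For reference, here is a minimal sketch of the posix_spawn() basics the
message refers to: a list of dup2() and close() operations applied in the
child before exec. It assumes a POSIX system; spawn_capture() is a
hypothetical helper name chosen for illustration and does not come from
the thread or from any proposed kernel API.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <spawn.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/*
 * Hypothetical helper (not from the thread): run argv[0], looked up via
 * PATH, with its stdout redirected into a pipe. Copies what the child
 * prints into buf (NUL-terminated) and returns the child's exit status,
 * or -1 on failure.
 */
int spawn_capture(char *const argv[], char *buf, size_t buflen)
{
    int pipefd[2];
    pid_t pid;
    posix_spawn_file_actions_t fa;
    size_t used = 0;
    ssize_t n;
    int err, status;

    assert(argv != NULL && argv[0] != NULL && buflen > 0);

    if (pipe(pipefd) != 0)
        return -1;

    /* The list of fd operations the child performs before exec. */
    posix_spawn_file_actions_init(&fa);
    posix_spawn_file_actions_adddup2(&fa, pipefd[1], STDOUT_FILENO);
    posix_spawn_file_actions_addclose(&fa, pipefd[0]);
    posix_spawn_file_actions_addclose(&fa, pipefd[1]);

    err = posix_spawnp(&pid, argv[0], &fa, NULL, argv, environ);
    posix_spawn_file_actions_destroy(&fa);
    close(pipefd[1]);   /* the parent keeps only the read end */

    if (err != 0) {
        /* posix_spawnp() returns an error number instead of setting errno. */
        fprintf(stderr, "posix_spawnp: %s\n", strerror(err));
        close(pipefd[0]);
        return -1;
    }

    /* Read until EOF or until buf is full, leaving room for the NUL. */
    while (used < buflen - 1 &&
           (n = read(pipefd[0], buf + used, buflen - 1 - used)) > 0)
        used += (size_t)n;
    buf[used] = '\0';
    close(pipefd[0]);

    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

A caller would pass something like { "echo", "hello", NULL } as argv and
a small buffer. Note that the fd operations are an ordered list handed
to one call, which is the shape the message asks a new process creation
API to generalize; rlimits, capset() and no_new_privs have no
posix_spawn() equivalent and are not covered by this sketch.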