From: Andy Lutomirski
Date: Mon, 15 Apr 2019 16:58:52 -0700
Subject: Re: RFC: on adding new CLONE_* flags [WAS Re: [PATCH 0/4] clone: add CLONE_PIDFD]
To: Jonathan Kowalski
Cc: Andy Lutomirski, Aleksa Sarai, "Enrico Weigelt, metux IT consult",
    Christian Brauner, Linus Torvalds, Al Viro, Jann Horn, David Howells,
    Linux API, LKML, "Serge E. Hallyn", Arnd Bergmann, "Eric W. Biederman",
    Kees Cook, Thomas Gleixner, Michael Kerrisk, Andrew Morton,
    Oleg Nesterov, Joel Fernandes, Daniel Colascione
References: <20190414201436.19502-1-christian@brauner.io> <20190415195911.z7b7miwsj67ha54y@yavin>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Apr 15, 2019 at 2:26 PM Jonathan Kowalski wrote:
>
> On Mon, Apr 15, 2019 at 9:34 PM Andy Lutomirski wrote:
> > I would personally *love* it if distros started setting no_new_privs
> > for basically all processes. And pidfd actually gets us part of the
> > way toward a straightforward way to make sudo and su still work in a
> > no_new_privs world: su could call into a daemon that would spawn the
> > privileged task, and su would get a (read-only!) pidfd back and then
> > wait for the fd and exit. I suppose that, done naively, this might
> > cause some odd effects with respect to tty handling, but I bet it's
> > solveable. I suppose it would be nifty if there were a way for a
>
> Hmm, isn't what you're describing roughly what systemd-run -t does? It
> will serialize the argument list, ask PID 1 to create a transient unit
> (go through the polkit stuff), and then set the stdout/stderr and
> stdin of the service to your tty, make it the controlling terminal of
> the process and reset it. So I guess it should work with sudo/su just
> fine too.
>
> There is also s6-sudod (and a s6-sudoc client to it) that works in a
> similar fashion, though it's a lot less fancy.

Cute. Now we just need distros to work out the kinks and to ship these
as sudo and su :)

>
> > process, by mutual agreement, to reparent itself to an unrelated
> > process.
> >
> > Anyway, clone(2) is an enormous mess. Surely the right solution here
> > is to have a whole new process creation API that takes a big,
> > extensible struct as an argument, and supports *at least* the full
> > abilities of posix_spawn() and ideally covers all the use cases for
> > fork() + do stuff + exec(). It would be nifty if this API also had a
> > way to say "add no_new_privs and therefore enable extra functionality
> > that doesn't work without no_new_privs". This functionality would
> > include things like returning a future extra-privileged pidfd that
> > gives ptrace-like access.
>
> My idea was that this intent could be supplied at clone time, you
> could attach ptrace access modes to a pidfd (we could make those a bit
> granular, perhaps) and any API that takes PIDs and checks against the
> caller's ptrace access mode could instead derive so from the pidfd.
> Since killing is a bit convoluted due to setuid binaries, that should
> work if one is CAP_KILL capable in the owning userns of the task, and
> if not that, has permissions to kill and the target has NNP set.

This CAP_KILL trick makes me nervous. This particular permission is
really quite powerful, and it would need some analysis to conclude
that it's not *more* powerful than CAP_KILL.

> This would allow you to bind kill privileges in a way that is
> compatible with both worlds, the upshot being NNP allows for the
> functionality to be available to a lot more of userspace. Ofcourse,
> this would require a new clone version, possibly with taking a clone2
> struct which sets a few parameters for the process and the flags for
> the pidfd.
>
> Another point is that you have a pidfd_open (or something else) that
> can create multiple pidfds from a pidfd obtained at clone time and
> create pidfds with varying level of rights. It can also work by taking
> a TID to open a pidfd for an external task (and then for all the
> rights you wish to acquire on it, check against your ambient
> authority).

Indeed.

>
> (Actually, in general, having FMODE_* style bits spanning all methods
> a file descriptor can take (through system calls), with the type of
> object as key (class containing a set), and be able to enable/disable
> them and seal them would be a useful addition, this all happening at
> the struct file level instead of inode level sealing in memfds).

At the risk of saying a dirty word, the Windows API works quite a bit
like this :)
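
To make the no_new_privs side of this thread a bit more concrete, here
is a minimal Python sketch, assuming Linux and a C library whose
prctl() is reachable through ctypes. The helper names
(set_no_new_privs, get_no_new_privs, child_sets_no_new_privs) are made
up for illustration; the two numeric constants are PR_SET_NO_NEW_PRIVS
and PR_GET_NO_NEW_PRIVS from <linux/prctl.h>. It only shows how a
process opts in to the flag. The daemon that would hand su a pidfd is
not sketched here.

```python
import ctypes
import os

# Option numbers from <linux/prctl.h>.
PR_SET_NO_NEW_PRIVS = 38
PR_GET_NO_NEW_PRIVS = 39

# Handle to the C library already loaded into this process (Linux assumed).
_libc = ctypes.CDLL(None, use_errno=True)


def _prctl(option: int, arg2: int = 0) -> int:
    """Call prctl(option, arg2, 0, 0, 0) and raise OSError on failure."""
    res = _libc.prctl(
        ctypes.c_int(option),
        ctypes.c_ulong(arg2),
        ctypes.c_ulong(0),
        ctypes.c_ulong(0),
        ctypes.c_ulong(0),
    )
    if res < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return res


def set_no_new_privs() -> None:
    """Set no_new_privs on the calling thread; it cannot be cleared later."""
    _prctl(PR_SET_NO_NEW_PRIVS, 1)


def get_no_new_privs() -> int:
    """Return 1 if no_new_privs is set on the calling thread, else 0."""
    return _prctl(PR_GET_NO_NEW_PRIVS)


def child_sets_no_new_privs() -> bool:
    """Fork, set the flag in the child only, and report whether it stuck."""
    pid = os.fork()
    if pid == 0:
        # Child: exit status 0 means the flag reads back as set.
        try:
            set_no_new_privs()
            os._exit(0 if get_no_new_privs() == 1 else 1)
        except BaseException:
            os._exit(2)
    _, status = os.waitpid(pid, 0)
    return os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0
```

child_sets_no_new_privs() forks first so that the irreversible flag
lands on a throwaway child rather than on the caller.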