Received: by 2002:a25:4158:0:0:0:0:0 with SMTP id o85csp3807381yba; Mon, 29 Apr 2019 08:51:14 -0700 (PDT) X-Google-Smtp-Source: APXvYqzqv3B714m2EltQ3u7mp7S+hlbwdE0Qm6q0vVrU3taRLCtx/kfAYELmeEHZsnC3Bc/pn2to X-Received: by 2002:a63:5d46:: with SMTP id o6mr24741311pgm.217.1556553074174; Mon, 29 Apr 2019 08:51:14 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1556553074; cv=none; d=google.com; s=arc-20160816; b=HDbjaYLRFnn8wpwoLS5Rrbr2yEDLm7wN5BY2dTSG8sP5GxIs2Ravwia5egVjoa3j9d gbuU5fi5hIlJBJxM1IJUW3W7ZNcf1A2ajLbwGEpy8UukYbjAhzUi4yYgBvttFGD8y1la AVETh0EoddRXQc5l+xJp12ngAGtnAWWHj9SBvvkUKZVYTs8Qx+3n21Be3OhN6PBY5HBn MaJ2rQxUjM78qEf8Er+45STGxlVaMrW/sT9k3+cxyN3IV9TYstOYQ9Y69ivHuKGigMho GLtc7H+UvAAjzMe6smGPhgvC6DeQChmtb7nCzqnhYxPYqrY8roiiMLrNpLIRlLSVynY0 O4DQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:message-id:subject:cc :to:from:date; bh=UagJkd6wgwN4qR6daGNwOz1V3huNKQRtx+zVSASlPlg=; b=bvRO6Yngd0WJ+fDs1fo7xYGUlDvYHowwWDkkuemNTce3osjjqtg6kXzWS2R3PUfRV1 tJSdppAB+ZxH/eYQhIyieJGBabCR4tfEKhocbttunLrcCB1yCbe6hq/I4GPUiWRgE8+G 6u0VOD9NgDPxD8S2XJWKKChrP1//BhJ35HoKTo3K4yj4jP2ph5aftLHWPKzpeat7endz pRwG/Q8sbpwhtAExj8V1dDuFzz4mQWiVMT3G4+BzTe7V+yrlqniQzOzjhUSf9lOp8Cub d5+4kFtCRGJgIbiaDZgKKRwAVUSWcw1S8PiFORtRjvRrDcshwcHLlBJj9zEYXYx0HTvX DCqA== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id 31si34958396plk.42.2019.04.29.08.50.57; Mon, 29 Apr 2019 08:51:14 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728629AbfD2Ptx (ORCPT + 99 others); Mon, 29 Apr 2019 11:49:53 -0400 Received: from mail.hallyn.com ([178.63.66.53]:39526 "EHLO mail.hallyn.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728555AbfD2Ptx (ORCPT ); Mon, 29 Apr 2019 11:49:53 -0400 Received: by mail.hallyn.com (Postfix, from userid 1001) id D18296C6; Mon, 29 Apr 2019 10:49:49 -0500 (CDT) Date: Mon, 29 Apr 2019 10:49:49 -0500 From: "Serge E. Hallyn" To: "Enrico Weigelt, metux IT consult" Cc: "Serge E. Hallyn" , Christian Brauner , torvalds@linux-foundation.org, viro@zeniv.linux.org.uk, jannh@google.com, dhowells@redhat.com, linux-api@vger.kernel.org, linux-kernel@vger.kernel.org, luto@kernel.org, arnd@arndb.de, ebiederm@xmission.com, keescook@chromium.org, tglx@linutronix.de, mtk.manpages@gmail.com, akpm@linux-foundation.org, oleg@redhat.com, cyphar@cyphar.com, joel@joelfernandes.org, dancol@google.com Subject: Re: RFC: on adding new CLONE_* flags [WAS Re: [PATCH 0/4] clone: add CLONE_PIDFD] Message-ID: <20190429154949.GA23456@mail.hallyn.com> References: <20190414201436.19502-1-christian@brauner.io> <20190415155034.GA25351@mail.hallyn.com> <000a64d6-1e22-21bf-f232-15f141092e44@metux.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <000a64d6-1e22-21bf-f232-15f141092e44@metux.net> User-Agent: Mutt/1.9.4 (2018-02-28) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, Apr 16, 2019 at 08:32:50PM +0200, Enrico Weigelt, metux IT consult wrote: (Sorry for the late reply, I had missed this one) > On 15.04.19 17:50, Serge E. Hallyn wrote: > > Hi, > > >> I'm working on implementing plan9-like fs namespaces, where unprivileged>> processes can change their own namespace at will. For that, certain> > > Is there any place where we can see previous discussion about this? > Yes, lkml and constainers list. > It's stalled since few month, as I'm too busy w/ other things. > > > If you have to disable suid anyway, then is there any reason why the> existing ability to do this in a private user namespace, with only> > your own uid mapped (which you can do without any privilege) does> not > suffice? That was actually one of the main design goals of user> > namespaces, to be able to clone(CLONE_NEWUSER), map your current uid,> > then clone(CLONE_NEWNS) and bind mount at will. > Well, it's not that easy ... maybe I should explain a bit more about how > Plan9 works, and how I intent to map it into Linux: > > * on plan9, anybody can alter his own fs namespace (bind and mount), as > well as spawning new ones > * basically anything is coming from some fileserver - even devices > (eg. there is no such thing like device nodes) > * access control is done by the individual fileservers, based on the > initial authentication (on connecting to the server, before mounting) yes, so far I'm aware of this, > * all users are equal - no root at all. the only exception is the > initial process, which gets the kernel devices mounted into his > namespace. This does not match my understanding, but I'm most likely wrong. (I thought there was an actual 'host owner' uid, which mostly is only used for initial process, but is basically root with a different name, and used far less. No uid transitions without factotem so that it *looked* like no root user). > What I'd like to achieve on Linux: > > * unprivileged users can have their own mount namespace, where they > can mount at will (maybe just 9P). No problem, you can do that now. > * but they still appear as the same normal users to the rest of the > system No problem, you can do that now. > * 9p programs (compiled for Linux ABI) can run parallel to traditional > linux programs within the same user and sessions (eg. from a terminal, > i can call both the same way) > * namespace modifications affect both equally (eg. I could run ff in > an own ns) affect both of what equally? > * these namespaces exist as long as there's one process alive in here That's sort of how it is now, except you can also pin the namespaces with their fds. > * creating a new ns can be done by unprivileged user That's true now. > One of the things to make this work (w/o introducing a massive security > hole) is disable suid for those processes (actually, one day i'd like to > get rid of it completely, but that's another story). That's exactly what user namespaces are for. You can create a new user namespace, using no privilege at all, with your current uid (i.e. 1000) mapped to whatever uid you like; if you pick 0, then you can unshare all the namespaces you like. Once you unshare mnt_ns, you can mount to your heart's content. To other processes on the host, your process is uid 1000. Host uid 0 is not mapped into your ns, so you cannot exploit suid to host root. Regarding factotem, I agree that with the pidfd work going on etc, it's getting more and more tempting to attempt a switch to that. Looking back at my folder, I see you posted a kernel patch for it. I had done the same long ago. Happy to work with you again on that, and put a simple daemon into shadow package, if util-linux isn't deemed the far better place. -serge