From: Mahesh Bandewar (महेश बंडेवार)
Date: Tue, 9 Jan 2018 18:08:58 -0800
Subject: Re: [PATCHv3 0/2] capability controlled user-namespaces
To: "Serge E. Hallyn"
Cc: James Morris, LKML, Netdev, Kernel-hardening, Linux API, Kees Cook,
    "Eric W. Biederman", Eric Dumazet, David Miller, Mahesh Bandewar

On Tue, Jan 9, 2018 at 2:28 PM, Serge E. Hallyn wrote:
> Quoting Mahesh Bandewar (महेश बंडेवार) (maheshb@google.com):
>> On Mon, Jan 8, 2018 at 10:36 AM, Serge E. Hallyn wrote:
>> > Quoting Mahesh Bandewar (महेश बंडेवार) (maheshb@google.com):
>> >> On Mon, Jan 8, 2018 at 10:11 AM, Serge E. Hallyn wrote:
>> >> > Quoting Mahesh Bandewar (महेश बंडेवार) (maheshb@google.com):
>> >> >> On Mon, Jan 8, 2018 at 7:47 AM, Serge E. Hallyn wrote:
>> >> >> > Quoting James Morris (james.l.morris@oracle.com):
>> >> >> >> On Mon, 8 Jan 2018, Serge E. Hallyn wrote:
>> >> >> >> I meant in terms of "marking" a user ns as "controlled" type -- it's
>> >> >> >> unnecessary jargon from an end user point of view.
>> >> >> >
>> >> >> > Ah, yes, that was my point in
>> >> >> >
>> >> >> > http://lkml.iu.edu/hypermail/linux/kernel/1711.1/01845.html
>> >> >> > and
>> >> >> > http://lkml.iu.edu/hypermail/linux/kernel/1711.1/02276.html
>> >> >> >
>> >> >> >> This may happen internally but don't make it a special case with a
>> >> >> >> different name and don't bother users with internal concepts: simply
>> >> >> >> implement capability whitelists with the default having equivalent
>> >> >
>> >> > So the challenge is to have unprivileged users be contained, while
>> >> > allowing trusted workloads in containers created by a root user to
>> >> > bypass the restriction.
>> >> >
>> >> > Now, the current proposal actually doesn't support a root user starting
>> >> > an application that it doesn't quite trust in such a way that it *is*
>> >> > subject to the whitelist.
>> >>
>> >> Well, this is not hard since root process can spawn another process
>> >> and loose privileges before creating user-ns to be controlled by the
>> >> whitelist.
>> >
>> > It would have to drop cap_sys_admin for the container to be marked as
>> > "controlled", which may prevent the container runtime from properly starting
>> > the container.
>> >
>> Yes, but that's a conflict of trusted operations (that requires
>> SYS_ADMIN) and untrusted processes it may spawn.
>
> Not sure I understand what you're saying, but
>
> I guess that in any case the task which is doing unshare(CLONE_NEWNS)
> can drop cap_sys_admin first.  Though that is harder if using clone,
> and it is awkward because it's not the container manager, but the user,
> who will judge whether the container workload should be restricted.
> So the container driver will add a flag like "run-controlled", and
> the driver will convert that to dropping a capability; which again
> is weird.  It would seem nicer to introduce a userns flag, 'caps-controlled'
> For an unprivileged userns, it is always set to 1, and root cannot
> change it.  For a root-created userns, it stays 0, but root can set it
> to 1 (using /proc file?).  In this way a either container runtime or just an
> admin script can say "no wait I want this container to still be controlled".
>
> Or we could instead add a second sysctl to decide whether all or only
> 'controlled' user namespaces should be controlled.  That's not pretty though.
>
Yes, I like your idea of a flag to clone() which will force the user-ns
to be controlled. It will have an effect only for the root user; for any
other user, specifying it is effectively a NOP, since their user-ns will
be controlled with or without the flag. But this is still an enhancement
to the current patch-set, and I don't mind doing it as a follow-up after
this patch-series.

At this moment James has asked for Eric's input, which I believe has not
been given yet.
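
For reference, the 'caps-controlled' semantics discussed above can be
written down as a small userspace model. This is only a sketch and not
kernel code: struct userns_model and both helpers are hypothetical names
invented for illustration, and the one-way behaviour (root can set the
flag to 1 but cannot clear it again) is an assumption, since the thread
only says that root can set it to 1. How the flag would be exposed (a
clone() flag or a /proc file) is left open here.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a user namespace; NOT the kernel's struct user_namespace. */
struct userns_model {
	bool created_by_root;	/* creator was privileged in the parent ns */
	bool caps_controlled;	/* the proposed 'caps-controlled' flag */
};

/*
 * An unprivileged creator always gets a controlled namespace (flag = 1).
 * A root-created namespace starts uncontrolled (flag = 0).
 */
static struct userns_model userns_model_create(bool creator_is_root)
{
	struct userns_model ns = {
		.created_by_root = creator_is_root,
		.caps_controlled = !creator_is_root,
	};
	return ns;
}

/*
 * Models a request to change the flag. Returns 0 on success or a
 * negative errno. Assumptions: only root may write, and the flag can
 * only go from 0 to 1. That also covers the unprivileged case, where
 * the flag is already 1 and so can never be cleared, even by root.
 */
static int userns_model_set_controlled(struct userns_model *ns,
				       bool writer_is_root, bool value)
{
	assert(ns != NULL);

	if (!writer_is_root)
		return -EPERM;
	if (!value)
		return ns->caps_controlled ? -EPERM : 0;

	ns->caps_controlled = true;
	return 0;
}
```

With this model, a root-created namespace stays outside the whitelist
until root explicitly opts it in, while a namespace created by an
unprivileged user is controlled from the start and stays that way.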