From: ebiederm@xmission.com (Eric W. Biederman)
To: Andy Lutomirski
Cc: "linux-kernel@vger.kernel.org", Linux Containers, Alexander Larsson, Colin Walters, Serge Hallyn, Stephane Graber, Kees Cook, Seth Forshee
Date: Tue, 08 Mar 2016 10:31:09 -0600
In-Reply-To: (Andy Lutomirski's message of "Mon, 7 Mar 2016 21:15:25 -0800")
Message-ID: <874mch2ape.fsf@x220.int.ebiederm.org>
Subject: Re: Thoughts on tightening up user namespace creation
X-Mailing-List: linux-kernel@vger.kernel.org

Andy Lutomirski writes:

> Hi all-

[Snip strange things distros do]

Distros do strange things from other people's perspectives.  Sometimes
we can help with that, sometimes we can't.  In general, producing kernel
code that is reliable and well maintained is what we can do.  Distro
folks can decide what they are comfortable with beyond that.

Frankly I find it heartening that not all distros enable everything all
of the time, and are showing some modicum of restraint and judgement.
If folks don't think a feature like user namespaces is ready and they
don't need that feature, I am quite happy for them not to enable that
feature in their kernel.

> Since I doubt we'll ever fully address the attack surface issue at
> least, would it make sense to try to come up with an upstreamable way
> to limit who can create new user namespaces and/or do various
> dangerous things with them?

Even without user namespaces the kernel has attack surface issues.  The
kernel is big and bugs happen.  That surface is only bigger when you are
root in a user namespace, so the probability of finding an exploitable
bug goes up.

> I'll divide the rest of the email into the "what" and the "who".
>
> +++ What does the privilege of creating a user namespace entail? +++
>
> This could be an all-or-nothing thing.  It would certainly be possible
> for appropriately privileged tasks to be able to unshare namespaces
> and use their facilities exactly like any task can in a current
> user-ns-enabled kernel and for other tasks to be unable to unshare
> anything.
>
> Finer gradations are, in principle, possible.  For example, it could
> be possible for a given task to unshare its userns but to have limited
> caps inside or to be unable to unshare certain other namespaces.  For
> example, maybe a task could unshare userns and mount ns but not net
> ns.  I don't think this would be particularly useful.

I am actually inclined to think just the opposite.  There was a period
where we would have been much less susceptible to problems if just
unprivileged creation of the mount namespace could have been
implemented.

When I look at this from a resource consumption point of view I
definitely see arguments for limiting things by resource type, as it
can be very easy to know that I need no more than X of some specific
resource type without knowing how much memory that will take.

> It might be more interesting to allow a task to unshare all
> namespaces, hold all capabilities in them, but to still be unable to
> use certain privileged facilities.
> For example, maybe denying
> administrative control over iptables, creation of exotic network
> interface types, or similar would make sense.  I don't know how we'd
> specify this type of constraint.

That does seem to start approaching LSM territory.  And there is a
funny balance between reducing attack surface and adding attack surface
to reduce attack surface.

> +++ Who can create user namespaces (possibly with restrictions)? +++
>
> I can think of a few formulations.
>
> A simpler approach would be to add a per-namespace setting listing
> users and/or groups that can unshare their userns.  A userns starts
> out allowing everyone to unshare userns, and anyone with CAP_SYS_ADMIN
> can change the setting.
>
> A fancier approach would be to have an fd that represents the right to
> unshare your userns.  Some privilege broker could give out those fds
> to apps that need them and meet whatever criteria are set.  If you try
> to unshare your userns without the fd, it falls back to some simpler
> policy.
>
> I think I prefer the simpler one.  It's simple, and I haven't come up
> with a concrete problem with it yet.

Agreed.

Your simple scheme is roughly what I was proposing earlier: a per-user
limit on the number of user namespaces each user can create.  I am a
little partial to having it be a resource limit, as that covers more
use cases with less code.

That said, the really important case to cover is the case where some
subset of applications is denied access to resources (for sandboxing)
and another subset is allowed.

Eric