From: Mahesh Bandewar (महेश बंडेवार)
Date: Tue, 2 Jan 2018 17:30:34 -0800
Subject: Re: [PATCHv3 0/2] capability controlled user-namespaces
To: James Morris
Cc: LKML, Netdev, Kernel-hardening, Linux API, Kees Cook, Serge Hallyn, "Eric W . Biederman", Eric Dumazet, David Miller, Mahesh Bandewar
X-Mailing-List: linux-kernel@vger.kernel.org

On Sat, Dec 30, 2017 at 12:31 AM, James Morris wrote:
> On Wed, 27 Dec 2017, Mahesh Bandewar (महेश बंडेवार) wrote:
>
>> Hello James,
>>
>> Seems like I missed adding your name to the review of this
>> patch series. Would you be willing to pull this into the security
>> tree? Serge Hallyn has already ACKed it.
>
> Sure!
>
Thank you, James.

>
>>
>> Thanks,
>> --mahesh..
>>
>> On Tue, Dec 5, 2017 at 2:30 PM, Mahesh Bandewar wrote:
>> > From: Mahesh Bandewar
>> >
>> > TL;DR version
>> > -------------
>> > Creating a sandbox environment with namespaces is challenging
>> > considering what these sandboxed processes can engage in, e.g.
>> > CVE-2017-6074, CVE-2017-7184, CVE-2017-7308, just to name a few.
>> > The current form of user-namespaces, however, if changed a bit, can
>> > allow us to create a sandbox environment without locking down
>> > user-namespaces.
>> >
>> > Detailed version
>> > ----------------
>> >
>> > Problem
>> > -------
>> > User-namespaces in their current form have increased the attack surface,
>> > as any process can acquire capabilities which are not available to it (by
>> > default) by performing a combination of clone()/unshare()/setns() syscalls.
>> >
>> > #define _GNU_SOURCE
>> > #include <sched.h>        /* unshare(), CLONE_NEWUSER, CLONE_NEWNET */
>> > #include <stdio.h>        /* printf(), perror() */
>> > #include <sys/socket.h>   /* socket(), AF_INET6, SOCK_RAW */
>> > #include <netinet/in.h>   /* IPPROTO_RAW */
>> > #include <unistd.h>       /* close() */
>> >
>> > int main(void)
>> > {
>> >         int sock = -1;
>> >
>> >         /* As an unprivileged user this is expected to fail (no CAP_NET_RAW). */
>> >         printf("Attempting to open RAW socket before unshare()...\n");
>> >         sock = socket(AF_INET6, SOCK_RAW, IPPROTO_RAW);
>> >         if (sock < 0) {
>> >                 perror("socket() SOCK_RAW failed");
>> >         } else {
>> >                 printf("Successfully opened RAW-Sock before unshare().\n");
>> >                 close(sock);
>> >                 sock = -1;
>> >         }
>> >
>> >         /* New user-ns + new net-ns: the caller gets a full capability
>> >          * set in the new user-ns, which owns the new net-ns. */
>> >         if (unshare(CLONE_NEWUSER | CLONE_NEWNET) < 0) {
>> >                 perror("unshare() failed");
>> >                 return 1;
>> >         }
>> >
>> >         /* Now the same call succeeds: CAP_NET_RAW is held in the new user-ns. */
>> >         printf("Attempting to open RAW socket after unshare()...\n");
>> >         sock = socket(AF_INET6, SOCK_RAW, IPPROTO_RAW);
>> >         if (sock < 0) {
>> >                 perror("socket() SOCK_RAW failed");
>> >         } else {
>> >                 printf("Successfully opened RAW-Sock after unshare().\n");
>> >                 close(sock);
>> >                 sock = -1;
>> >         }
>> >
>> >         return 0;
>> > }
>> >
>> > The above example shows how easy it is to acquire the NET_RAW capability;
>> > once acquired, it lets these processes exploit the above-mentioned or
>> > similar issues, discovered or undiscovered, with malicious intent. Note
>> > that this is just an example and the problem/solution is not limited
>> > to the NET_RAW capability *only*.
>> >
>> > The easiest fix one can apply here is to lock down user-namespaces, which
>> > many of the distros do (i.e. don't allow users to create user namespaces),
>> > but unfortunately that prevents everyone from using them.
>> >
>> > Approach
>> > --------
>> > Introduce a notion of 'controlled' user-namespaces.
>> > Every process on the host is allowed to create user-namespaces (subject
>> > to the limit imposed by the per-ns sysctl); however, mark user-namespaces
>> > created by sandboxed processes as 'controlled'. Use this mark at
>> > capability-check time in conjunction with a global capability whitelist:
>> > if a capability is not whitelisted, processes that belong to controlled
>> > user-namespaces are not allowed to use it.
>> >
>> > Once a user-ns is marked as 'controlled', all its child user-namespaces
>> > are marked as 'controlled' too.
>> >
>> > The global whitelist is a list of capabilities governed by a sysctl,
>> > which a (privileged) user in the init-ns can modify and which applies
>> > to all controlled user-namespaces on the host.
>> >
>> > Marking user-namespaces controlled without modifying the whitelist is
>> > equivalent to the current behavior: the default value of the whitelist
>> > includes all capabilities, so compatibility is maintained. At the same
>> > time, the whitelist gives admins fine-grained, system-wide control over
>> > individual capabilities without locking down user-namespaces.
>> >
>> > Please see the individual patches in this series.
>> >
>> > Mahesh Bandewar (2):
>> >   capability: introduce sysctl for controlled user-ns capability whitelist
>> >   userns: control capabilities of some user namespaces
>> >
>> >  Documentation/sysctl/kernel.txt | 21 +++++++++++++++++
>> >  include/linux/capability.h      |  7 ++++++
>> >  include/linux/user_namespace.h  | 25 ++++++++++++++++++++
>> >  kernel/capability.c             | 52 +++++++++++++++++++++++++++++++++++++++++
>> >  kernel/sysctl.c                 |  5 ++++
>> >  kernel/user_namespace.c         |  4 ++++
>> >  security/commoncap.c            |  8 +++++++
>> >  7 files changed, 122 insertions(+)
>> >
>> > --
>> > 2.15.0.531.g2ccb3012c9-goog
>> >
>> >
>
> --
> James Morris
>
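For readers who want the 'controlled' user-namespace semantics from the Approach section in executable form, here is a minimal user-space model in C. It is a sketch under stated assumptions, not the patch's kernel code. Every name in it (`model_userns`, `model_create_userns`, `model_whitelist_permits`, `model_whitelist_drop`, `controlled_caps_whitelist`) is hypothetical. The "sandboxed creator" decision is passed in as a plain boolean, since the cover letter does not say how a sandboxed process is identified. The whitelist is assumed to be a 64-bit mask with one bit per capability number. The model covers only the extra whitelist veto; the normal capability checks are left out.

```c
/*
 * Hypothetical user-space model of 'controlled' user-namespaces.
 * Not kernel code; all names are illustrative assumptions.
 */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Capability numbers as defined in <linux/capability.h>. */
enum { MODEL_CAP_NET_ADMIN = 12, MODEL_CAP_NET_RAW = 13, MODEL_CAP_SYS_ADMIN = 21 };

/* Models the global whitelist sysctl, one bit per capability (64-bit mask is
 * an assumption). Default: all bits set, i.e. today's behavior. */
static uint64_t controlled_caps_whitelist = ~UINT64_C(0);

struct model_userns {
    const struct model_userns *parent; /* NULL for the init user-ns */
    bool controlled;                    /* fixed at creation time */
};

/* A new user-ns is controlled if a sandboxed process created it, or if its
 * parent is already controlled (the mark is inherited by all descendants). */
static struct model_userns *model_create_userns(const struct model_userns *parent,
                                                bool creator_sandboxed)
{
    struct model_userns *ns = malloc(sizeof(*ns));
    if (ns == NULL)
        return NULL;
    ns->parent = parent;
    ns->controlled = creator_sandboxed || (parent != NULL && parent->controlled);
    return ns;
}

/* Extra check layered on top of the normal capability check: returns false
 * only when the process's user-ns is controlled and 'cap' is not whitelisted. */
static bool model_whitelist_permits(const struct model_userns *ns, unsigned int cap)
{
    assert(cap < 64);
    if (!ns->controlled)
        return true;
    return (controlled_caps_whitelist >> cap) & 1;
}

/* What a privileged admin in the init-ns would do via the sysctl:
 * remove one capability from the whitelist. */
static void model_whitelist_drop(unsigned int cap)
{
    assert(cap < 64);
    controlled_caps_whitelist &= ~(UINT64_C(1) << cap);
}
```

With the default whitelist, a namespace nested under a sandboxed one still passes `model_whitelist_permits(ns, MODEL_CAP_NET_RAW)`. After `model_whitelist_drop(MODEL_CAP_NET_RAW)`, that call fails for every controlled namespace but still succeeds for uncontrolled ones. In this model, the `controlled` flag is computed once at creation instead of being recomputed by walking the `parent` chain on each check. That keeps each check constant-time, and it gives the same answers because the mark never changes after creation.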