From: Mahesh Bandewar (महेश बंडेवार)
Date: Thu, 9 Nov 2017 16:18:08 +0900
Subject: Re: [kernel-hardening] Re: [PATCH resend 2/2] userns: control capabilities of some user namespaces
To: "Serge E. Hallyn"
Cc: Christian Brauner, Boris Lukashev, Daniel Micay, Mahesh Bandewar, LKML, Netdev, Kernel-hardening, Linux API, Kees Cook, "Eric W . Biederman", Eric Dumazet, David Miller
In-Reply-To: <20171109032134.GA15666@mail.hallyn.com>
References: <20171106150302.GA26634@mail.hallyn.com> <1510003994.736.0.camel@gmail.com> <20171106221418.GA32543@mail.hallyn.com> <20171106233913.GA1518@mail.hallyn.com> <20171107032802.GA6669@mail.hallyn.com> <20171108190223.vdkyepcaegmub6le@gmail.com> <20171109032134.GA15666@mail.hallyn.com>
X-Mailing-List: linux-kernel@vger.kernel.org

[resend response as earlier one failed because of formatting issues]

On Thu, Nov 9, 2017 at 12:21 PM, Serge E.
Hallyn wrote:
>
> On Thu, Nov 09, 2017 at 09:55:41AM +0900, Mahesh Bandewar (महेश बंडेवार) wrote:
> > On Thu, Nov 9, 2017 at 4:02 AM, Christian Brauner
> > wrote:
> > > On Wed, Nov 08, 2017 at 03:09:59AM -0800, Mahesh Bandewar (महेश बंडेवार) wrote:
> > >> Sorry folks I was traveling and seems like a lot happened on this thread. :p
> > >>
> > >> I will try to respond to a few of these comments selectively -
> > >>
> > >> > The thing that makes me hesitate with this set is that it is a
> > >> > permanent new feature to address what (I hope) is a temporary
> > >> > problem.
> > >> I agree this is a permanent new feature but it's not solving a temporary
> > >> problem. It's impossible to assess what new vulnerability could show
> > >> up, and when. I think Daniel summed it up appropriately in his
> > >> response
> > >>
> > >> > Seems like there are two naive ways to do it, the first being to just
> > >> > look at all code under ns_capable() plus code called from there. It
> > >> > seems like looking at the result of that could be fruitful.
> > >> This is really hard. The main issue is that there were features designed
> > >> and developed before user-ns days with an assumption that unprivileged
> > >> users will never get certain capabilities which only root user gets.
> > >> Now that is not true anymore with user-ns creation with mapping root
> > >> for any process. Also at the same time blocking user-ns creation for
> > >> everyone is a big-hammer which is not needed too. So it's not that easy
> > >> to just perform a code-walk-through and correct those decisions now.
> > >>
> > >> > It seems to me that the existing control in
> > >> > /proc/sys/kernel/unprivileged_userns_clone might be the better duct tape
> > >> > in that case.
> > >> This solution is essentially blocking unprivileged users from using
> > >> the user-namespaces entirely. This is not really a solution that can
> > >> work. The solution that this patch-set adds allows unprivileged users
> > >> to create user-namespaces. Actually the proposed solution is more
> > >> fine-grained approach than the unprivileged_userns_clone solution
> > >> since you can selectively block capabilities rather than completely
> > >> blocking the functionality.
> > >
> > > I've been talking to Stéphane today about this and we should also keep in mind
> > > that we have:
> > >
> > > chb@conventiont|~
> > >> ls -al /proc/sys/user/
> > > total 0
> > > dr-xr-xr-x 1 root root 0 Nov 6 23:32 .
> > > dr-xr-xr-x 1 root root 0 Nov 2 22:13 ..
> > > -rw-r--r-- 1 root root 0 Nov 8 19:48 max_cgroup_namespaces
> > > -rw-r--r-- 1 root root 0 Nov 8 19:48 max_inotify_instances
> > > -rw-r--r-- 1 root root 0 Nov 8 19:48 max_inotify_watches
> > > -rw-r--r-- 1 root root 0 Nov 8 19:48 max_ipc_namespaces
> > > -rw-r--r-- 1 root root 0 Nov 8 19:48 max_mnt_namespaces
> > > -rw-r--r-- 1 root root 0 Nov 8 19:48 max_net_namespaces
> > > -rw-r--r-- 1 root root 0 Nov 8 19:48 max_pid_namespaces
> > > -rw-r--r-- 1 root root 0 Nov 8 19:48 max_user_namespaces
> > > -rw-r--r-- 1 root root 0 Nov 8 19:48 max_uts_namespaces
> > >
> > > These files allow you to limit the number of namespaces that can be created
> > > *per namespace* type. So let's say your system runs a bunch of user namespaces
> > > you can do:
> > >
> > > chb@conventiont|~
> > >> echo 0 > /proc/sys/user/max_user_namespaces
> > >
> > > So that the next time you try to create a user namespaces you'd see:
> > >
> > > chb@conventiont|~
> > >> unshare -U
> > > unshare: unshare failed: No space left on device
> > >
> > > So there's not even a need to upstream a new sysctl since we have ways of
> > > blocking this.
> > >
> > I'm not sure how it's solving the problem that my patch-set is addressing?
> > I agree though that the need for unprivileged_userns_clone sysctl goes
> > away as this is equivalent to setting that sysctl to 0 as you have
> > described above.
>
> oh right that was the reasoning iirc for not needing the other sysctl.
>
> > However as I mentioned earlier, blocking processes from creating
> > user-namespaces is not the solution. Processes should be able to
> > create namespaces as they are designed but at the same time we need to
> > have controls to 'contain' them if a need arises. Setting max_no to 0
> > is not the solution that I'm looking for since it doesn't solve the
> > problem.
>
> well yesterday we were told that was explicitly not the goal, but that was
> not by you ... i just mention it to explain why we seem to be walking in
> circles a bit.
>
> anyway the bounding set doesn't actually make sense so forget that. the
> question then is just whether it makes sense to allow things to continue
> at all in this situation. would you mind indulging me by giving one or two
> concrete examples in the previous known cves of what capabilities you would
> have dropped to allow the rest to continue to be safely used?
>

Of course. Let's take the CVE that I mentioned in my cover-letter as an
example - CVE-2017-7308
(https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2017-7308). It's
well documented and even has an exploit
(https://github.com/xairy/kernel-exploits/tree/master/CVE-2017-7308),
a C program that demonstrates how it can be used against an unpatched
kernel. There is a very nice blog post
(https://googleprojectzero.blogspot.kr/2017/05/exploiting-linux-kernel-via-packet.html)
about this vulnerability by Andrey Konovalov. It concerns the AF_PACKET
socket interface, which is protected behind the NET_RAW capability. This
capability is not normally available to an unprivileged user.
However, any unprivileged user can obtain the NET_RAW capability (as
demonstrated in the cover-letter code attached to this patch series),
so it is effectively available to every unprivileged user on the host
if the kernel has user-namespaces available. With this patch-set
applied, all that is needed is to flip a bit with the sysctl
(kernel.controlled_userns_caps_whitelist) as demonstrated below -

root@lphh6:~# uname -a
Linux lphh6 4.14.0-smp-DEV #97 SMP @1510203579 x86_64 GNU/Linux
root@lphh6:~# sysctl -q kernel.controlled_userns_caps_whitelist
kernel.controlled_userns_caps_whitelist = 1f,ffffffff

Now when I run the program (demo from the cover-letter) as a normal
unprivileged user, I can't create a RAW socket in init-ns but I can in
the child-ns.

dumbo@lphh6:~$ /tmp/acquire_raw
Attempting to open RAW socket before unshare()...
socket() SOCK_RAW failed: : Operation not permitted
Attempting to open RAW socket after unshare()...
Successfully opened RAW-Sock after unshare().
dumbo@lphh6:~$

Now, as the root user, take off CAP_NET_RAW:

root@lphh6:~# sysctl -w kernel.controlled_userns_caps_whitelist=1f,ffffdfff
kernel.controlled_userns_caps_whitelist = 1f,ffffdfff
root@lphh6:~#

Now run the same program as an unprivileged user -

dumbo@lphh6:~$ /tmp/acquire_raw
Attempting to open RAW socket before unshare()...
socket() SOCK_RAW failed: : Operation not permitted
Attempting to open RAW socket after unshare()...
socket() SOCK_RAW failed: : Operation not permitted
dumbo@lphh6:~$

Notice that it now fails to create a raw socket in both the init and
the child namespace. This does not block creation of user-namespaces;
it lets the admin turn individual capability bits on and off. This is a
very simplistic example that only demonstrates how turning capability
bits on and off works. So let's assume a sandboxed environment,
identified as susceptible, where we don't know what the binary we are
about to run will do.
By turning off the NET_RAW bit, the admin gets assurance that the
system is safe. If the binary fails because it doesn't get this
capability, that is an unfortunate consequence, but one that does not
compromise host integrity. If it doesn't use the NET_RAW capability but
some other combination of the remaining 36 capabilities, it gets
whatever it needs. This means we can safely allow processes to create
user-namespaces by taking off the capabilities in question for a
temporary or extended period, until a proper fix is applied, without
compromising system integrity. The impact will vary based on which
capability is taken off, and the admin would / should be aware of it
for the environment that he/she is dealing with.

thanks,
--mahesh..

> thanks,
> serge
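
An illustrative aside on the whitelist values in the transcripts above: 1f,ffffffff and 1f,ffffdfff differ only in bit 13, which is CAP_NET_RAW in <linux/capability.h>. The Python sketch below reproduces that arithmetic. It assumes the sysctl value is a comma-separated list of 32-bit hex words, most significant word first, as the output above suggests; the helper names (parse_mask, format_mask, drop_cap) are hypothetical and are not part of the patch-set.

```python
CAP_NET_RAW = 13  # capability bit number from <linux/capability.h>


def parse_mask(text):
    """Parse a mask such as '1f,ffffffff' into an integer.

    Assumption: comma-separated 32-bit hex words, most significant first.
    """
    value = 0
    for word in text.strip().split(","):
        value = (value << 32) | int(word, 16)
    return value


def format_mask(value):
    """Format an integer back into comma-separated 32-bit hex words."""
    words = []
    while True:
        words.append(value & 0xFFFFFFFF)
        value >>= 32
        if value == 0:
            break
    words.reverse()
    # The leading word is printed without padding, the rest as 8 hex digits.
    head = format(words[0], "x")
    tail = [format(w, "08x") for w in words[1:]]
    return ",".join([head] + tail)


def drop_cap(mask_text, cap):
    """Return the mask text with the bit for capability number `cap` cleared."""
    return format_mask(parse_mask(mask_text) & ~(1 << cap))


if __name__ == "__main__":
    print(drop_cap("1f,ffffffff", CAP_NET_RAW))
```

With the values from the demonstration, drop_cap("1f,ffffffff", CAP_NET_RAW) yields 1f,ffffdfff, the same value that was written with sysctl -w. The starting mask has 37 bits set, which is consistent with the "remaining 36 capabilities" mentioned once NET_RAW is taken off.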