Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755584AbdLVBDk (ORCPT ); Thu, 21 Dec 2017 20:03:40 -0500 Received: from mail-yb0-f196.google.com ([209.85.213.196]:37045 "EHLO mail-yb0-f196.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753134AbdLVBDh (ORCPT ); Thu, 21 Dec 2017 20:03:37 -0500 X-Google-Smtp-Source: ACJfBou41+01yp8nmY1e/1XwM5xuZQb8qc3mOWkqv2mzU1qeQ/POt0CgggWc1x1pNVY1FDBWDNEvQpuSd0fXevYPomI= MIME-Version: 1.0 In-Reply-To: <87wp1foiwa.fsf@xmission.com> References: <20171221210605.181720-1-zenczykowski@gmail.com> <87wp1foiwa.fsf@xmission.com> From: =?UTF-8?Q?Maciej_=C5=BBenczykowski?= Date: Fri, 22 Dec 2017 02:03:35 +0100 Message-ID: Subject: Re: [PATCH] userns: honour no_new_privs for cap_bset during user ns creation/switch To: "Eric W. Biederman" Cc: linux-security-module@vger.kernel.org, Linux Kernel Mailing List , Mahesh Bandewar , Willem de Bruijn , Linux Containers Content-Type: text/plain; charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3532 Lines: 105 On Thu, Dec 21, 2017 at 10:44 PM, Eric W. Biederman wrote: > No. This makes no logical sense. > > A task that enters a user namespace loses all capabilities to everything > outside of the user namespace. Capabilities inside a user namespace are > only valid for objects created inside that user namespace. > > So limiting capabilities inside a user namespace when the capability > bounding set is already fully honored by not giving the processes any of > those capabilities makes no logical sense. > > If the concern is kernel attack surface versus logical permissions we > can look at ways to reduce the attack surface but that needs to be fully > discussed in the change log. Here's an example of using user namespaces to read a file you shouldn't be able to. lpk19:~# uname -r 4.15.0-smp-d1ce8ceb8ba8 (we start as true global root) lpk19:~# id uid=0(root) gid=0(root) groups=0(root) (cleanup after previous run) lpk19:~# cd /; chattr -i /immu; rm -f /immu/log; rmdir /immu (now we create an append only logfile owned by target user:group) lpk19:~# cd /; mkdir /immu; touch /immu/log; chown produser:prod /immu/log; chmod a-rwx,u+w /immu/log; chattr +a /immu/log (let's show what things look like) lpk19:~# chattr +i /immu; ls -ld / /immu /immu/log; lsattr -d / /immu /immu/log drwxr-xr-x 22 root root 4096 Dec 21 16:33 / drwxr-xr-x 2 root root 4096 Dec 21 16:23 /immu --w------- 1 produser prod 0 Dec 21 16:23 /immu/log -----------I--e---- / ----i---------e---- /immu -----a--------e---- /immu/log (the immutable bit prevents us from changing permissions on the file) lpk19:/# chmod a+rwx /immu/log chmod: changing permissions of '/immu/log': Operation not permitted (the append only bit prevents us from simply overwriting the file) lpk19:/# echo log1 > /immu/log -bash: /immu/log: Operation not permitted (but we can append to it) lpk19:/# echo log1 >> /immu/log (we're global root with CAP_DAC_OVERRIDE, so we can *still* read it) lpk19:/# cat /immu/log log1 (let's transition to target user) lpk19:/# su - produser produser@lpk19:~$ id uid=2080(produser) gid=620(prod) groups=620(prod) (we can't overwrite it) produser@lpk19:~$ echo log2 > /immu/log -su: /immu/log: Operation not permitted (but we can log to it: as intended) produser@lpk19:~$ echo log2 >> /immu/log (we can't change its permissions, cause it's in an immutable directory) produser@lpk19:~$ chmod u+r /immu/log chmod: changing permissions of '/immu/log': Operation not permitted (we can't dump the file, cause we don't have CAP_DAC_OVERRIDE) produser@lpk19:~$ cat /immu/log cat: /immu/log: Permission denied (or can we?) produser@lpk19:~$ unshare -U -r cat /immu/log log1 log2 ---- Now, of course, the above patch doesn't actually fix this on it's own, since 'su' doesn't (yet?) know to restrict bset or to set no_new_privs. But: it allows the sandbox equivalent of su to drop CAP_DAC_OVERRIDE from it's inh/eff/perm/ambient/bset, and set no_new_privs. Now the unshare won't gain CAP_DAC_OVERRIDE and won't be able to cat the non-readable append-only log file. IMHO the point of having a capability bounding set and/or no_new_privs is to never be able to regain capabilities. Note also that 'no_new_privs' isn't cleared across a unshare(CLONE_NEWUSER) [presumably also applies to setns()]. We can of course argue the implementation details (for example instead of using the existing no_new_privs flag, add a new keep_bset_across_userns_transitions securebits flag)... but *something* has to be done. - Maciej