Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755930AbdLVBSf convert rfc822-to-8bit (ORCPT ); Thu, 21 Dec 2017 20:18:35 -0500 Received: from out03.mta.xmission.com ([166.70.13.233]:60652 "EHLO out03.mta.xmission.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755450AbdLVBSa (ORCPT ); Thu, 21 Dec 2017 20:18:30 -0500 From: ebiederm@xmission.com (Eric W. Biederman) To: Maciej =?utf-8?Q?=C5=BBenczykowski?= Cc: linux-security-module@vger.kernel.org, Linux Kernel Mailing List , Mahesh Bandewar , Willem de Bruijn , Linux Containers References: <20171221210605.181720-1-zenczykowski@gmail.com> <87wp1foiwa.fsf@xmission.com> Date: Thu, 21 Dec 2017 19:18:02 -0600 In-Reply-To: ("Maciej \=\?utf-8\?Q\?\=C5\=BBenczykowski\=22's\?\= message of "Fri, 22 Dec 2017 02:03:35 +0100") Message-ID: <87fu83lfw5.fsf@xmission.com> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.1 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8BIT X-XM-SPF: eid=1eSByn-0008A6-0Q;;;mid=<87fu83lfw5.fsf@xmission.com>;;;hst=in02.mta.xmission.com;;;ip=67.3.133.177;;;frm=ebiederm@xmission.com;;;spf=neutral X-XM-AID: U2FsdGVkX1/SW6UOB+iLlEzaHClAxSdKW2OQVs/gzxA= X-SA-Exim-Connect-IP: 67.3.133.177 X-SA-Exim-Mail-From: ebiederm@xmission.com X-Spam-Report: * -1.0 ALL_TRUSTED Passed through trusted hosts only via SMTP * 0.0 TVD_RCVD_IP Message was received from an IP address * 2.5 XMWhlSbjSex Whole Obfuscated Subjects * 0.7 XMSubLong Long Subject * 0.0 T_TM2_M_HEADER_IN_MSG BODY: No description available. * 0.8 BAYES_50 BODY: Bayes spam probability is 40 to 60% * [score: 0.5000] * -0.0 DCC_CHECK_NEGATIVE Not listed in DCC * [sa04 1397; Body=1 Fuz1=1 Fuz2=1] * 0.0 T_TooManySym_01 4+ unique symbols in subject X-Spam-DCC: XMission; sa04 1397; Body=1 Fuz1=1 Fuz2=1 X-Spam-Combo: =?ISO-8859-1?Q?***;Maciej =c5=bbenczykowski ?= X-Spam-Relay-Country: X-Spam-Timing: total 618 ms - load_scoreonly_sql: 0.05 (0.0%), signal_user_changed: 3.7 (0.6%), b_tie_ro: 2.4 (0.4%), parse: 1.63 (0.3%), extract_message_metadata: 19 (3.1%), get_uri_detail_list: 5 (0.8%), tests_pri_-1000: 8 (1.3%), tests_pri_-950: 1.83 (0.3%), tests_pri_-900: 1.34 (0.2%), tests_pri_-400: 34 (5.6%), check_bayes: 33 (5.3%), b_tokenize: 15 (2.4%), b_tok_get_all: 10 (1.5%), b_comp_prob: 3.8 (0.6%), b_tok_touch_all: 2.7 (0.4%), b_finish: 0.67 (0.1%), tests_pri_0: 539 (87.2%), check_dkim_signature: 0.65 (0.1%), check_dkim_adsp: 4.3 (0.7%), tests_pri_500: 4.3 (0.7%), rewrite_mail: 0.00 (0.0%) Subject: Re: [PATCH] userns: honour no_new_privs for cap_bset during user ns creation/switch X-Spam-Flag: No X-SA-Exim-Version: 4.2.1 (built Thu, 05 May 2016 13:38:54 -0600) X-SA-Exim-Scanned: Yes (on in02.mta.xmission.com) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 4831 Lines: 129 Maciej Żenczykowski writes: > On Thu, Dec 21, 2017 at 10:44 PM, Eric W. Biederman > wrote: >> No. This makes no logical sense. >> >> A task that enters a user namespace loses all capabilities to everything >> outside of the user namespace. Capabilities inside a user namespace are >> only valid for objects created inside that user namespace. >> >> So limiting capabilities inside a user namespace when the capability >> bounding set is already fully honored by not giving the processes any of >> those capabilities makes no logical sense. >> >> If the concern is kernel attack surface versus logical permissions we >> can look at ways to reduce the attack surface but that needs to be fully >> discussed in the change log. > > Here's an example of using user namespaces to read a file you > shouldn't be able to. > > lpk19:~# uname -r > 4.15.0-smp-d1ce8ceb8ba8 > > (we start as true global root) > lpk19:~# id > uid=0(root) gid=0(root) groups=0(root) > > (cleanup after previous run) > lpk19:~# cd /; chattr -i /immu; rm -f /immu/log; rmdir /immu > > (now we create an append only logfile owned by target user:group) > lpk19:~# cd /; mkdir /immu; touch /immu/log; chown produser:prod > /immu/log; chmod a-rwx,u+w /immu/log; chattr +a /immu/log > > (let's show what things look like) > lpk19:~# chattr +i /immu; ls -ld / /immu /immu/log; lsattr -d / /immu /immu/log > drwxr-xr-x 22 root root 4096 Dec 21 16:33 / > drwxr-xr-x 2 root root 4096 Dec 21 16:23 /immu > --w------- 1 produser prod 0 Dec 21 16:23 /immu/log > -----------I--e---- / > ----i---------e---- /immu > -----a--------e---- /immu/log > > (the immutable bit prevents us from changing permissions on the file) > lpk19:/# chmod a+rwx /immu/log > chmod: changing permissions of '/immu/log': Operation not permitted > > (the append only bit prevents us from simply overwriting the file) > lpk19:/# echo log1 > /immu/log > -bash: /immu/log: Operation not permitted > > (but we can append to it) > lpk19:/# echo log1 >> /immu/log > > (we're global root with CAP_DAC_OVERRIDE, so we can *still* read it) > lpk19:/# cat /immu/log > log1 > > (let's transition to target user) > lpk19:/# su - produser > > produser@lpk19:~$ id > uid=2080(produser) gid=620(prod) groups=620(prod) > > (we can't overwrite it) > produser@lpk19:~$ echo log2 > /immu/log > -su: /immu/log: Operation not permitted > > (but we can log to it: as intended) > produser@lpk19:~$ echo log2 >> /immu/log > > (we can't change its permissions, cause it's in an immutable directory) > produser@lpk19:~$ chmod u+r /immu/log > chmod: changing permissions of '/immu/log': Operation not permitted > > (we can't dump the file, cause we don't have CAP_DAC_OVERRIDE) > produser@lpk19:~$ cat /immu/log > cat: /immu/log: Permission denied > > (or can we?) > produser@lpk19:~$ unshare -U -r cat /immu/log > log1 > log2 > > ---- > > Now, of course, the above patch doesn't actually fix this on it's own, > since 'su' doesn't (yet?) know to restrict bset or to set > no_new_privs. > > But: it allows the sandbox equivalent of su to drop CAP_DAC_OVERRIDE > from it's inh/eff/perm/ambient/bset, and set no_new_privs. > Now the unshare won't gain CAP_DAC_OVERRIDE and won't be able to cat > the non-readable append-only log file. > > IMHO the point of having a capability bounding set and/or no_new_privs > is to never be able to regain capabilities. > Note also that 'no_new_privs' isn't cleared across a > unshare(CLONE_NEWUSER) [presumably also applies to setns()]. > > We can of course argue the implementation details (for example instead > of using the existing no_new_privs flag, add a new > keep_bset_across_userns_transitions securebits flag)... but > *something* has to be done. Good point about CAP_DAC_OVERRIDE on files you own. I think there is an argument that you are playing dangerous games with the permission system there, as it isn't effectively a file you own if you can't read it, and you can't change it's permissions. Given little things like that I can completely see no_new_privs meaning you can't create a user namespace. That seems consistent with the meaning and philosophy of no_new_privs. So simple it is hard to get wrong. We could do more clever things like plug this whole in user namespaces, and that would not hurt my feelings. However unless that is our only choice to avoid badly breaking userspace I would have to have to depend on user namespaces being perfect for no_new_privs to be a proper jail. As a general rule user namespaces are where we tackle the subtle scary things that should work, and no_new_privs is where we implement a simple hard to get wrong jail. Most of the time the effect is the same to an outside observer (bounded permissions), but there is a real difference in difficulty of implementation. Eric