From: ebiederm@xmission.com (Eric W. Biederman)
To: "Serge E. Hallyn"
Cc: Stefan Berger, Masami Ichikawa, containers@lists.linux-foundation.org, lkp@01.org, xiaolong.ye@intel.com, LKML, Mimi Zohar
Date: Mon, 19 Jun 2017 16:34:22 -0500
In-Reply-To: <20170618221418.GA364@mail.hallyn.com> (Serge E. Hallyn's message of "Sun, 18 Jun 2017 17:14:18 -0500")
Message-ID: <87tw3boe5d.fsf@xmission.com>
Subject: Re: [PATCH v4] Introduce v3 namespaced file capabilities
X-Mailing-List: linux-kernel@vger.kernel.org

"Serge E. Hallyn" writes:

> Quoting Stefan Berger (stefanb@linux.vnet.ibm.com):
>> On 06/14/2017 11:05 PM, Serge E. Hallyn wrote:
>> >On Wed, Jun 14, 2017 at 08:27:40AM -0400, Stefan Berger wrote:
>> >>On 06/13/2017 07:55 PM, Serge E. Hallyn wrote:
>> >>>Quoting Stefan Berger (stefanb@linux.vnet.ibm.com):
>> >>>> If all extended
>> >>>>attributes were to support this model, maybe the 'uid' could be
>> >>>>associated with the 'name' of the xattr rather than its 'value' (not
>> >>>>sure whether that's possible).
>> >>>Right, I missed that in your original email when I saw it this morning.
>> >>>It's not what my patch does, but it's an interesting idea. Do you have
>> >>>a patch to that effect? We might even be able to generalize that to
>> >>No, I don't have a patch. It may not be possible to implement it.
>> >>The xattr_handler's take the name of the xattr as input to get().
>> >That may be ok though.
>> >Assume the host created a container with
>> >100000 as the uid for root, which created a container with 130000 as
>> >uid for root. If root in the nested container tries to read the
>> >xattr, the kernel can check for security.foo[130000] first, then
>> >security.foo[100000], then security.foo. Or, it can do a listxattr
>> >and look for those. Am I overlooking one?
>> >
>> >>So one could try to encode the mapped uid in the name. However, that
>> >I thought that's exactly what you were suggesting in your original
>> >email? "security.capability[uid=2000]"
>> >
>> >>could lead to problems with stale xattrs in a shared filesystem over
>> >>time unless one could limit the number of xattrs with the same
>> >>prefix, e.g., security.capability*. So I doubt that it would work.
>> >Hm. Yeah. But really how many setups are there like that? I.e. if
>> >you launch a regular docker or lxd container, the image doesn't do a
>> >bind mount of a shared image, it layers something above it or does a
>> >copy. What setups do you know of where multiple containers in different
>> >user namespaces mount the same filesystem shared and writeable?
>>
>> I think I have something now that accommodates userns access to
>> security.capability:
>>
>> https://github.com/stefanberger/linux/commits/xattr_for_userns
>
> Thanks!
>
>> Encoding of uid is in the attribute name now as follows:
>> security.foo@uid=
>>
>> 1) The 'plain' security.capability is only r/w accessible from the
>> host (init_user_ns).
>> 2) When userns reads/writes 'security.capability' it will read/write
>> security.capability@uid= instead, with uid being the uid of
>> root, e.g. 1000.
>> 3) When listing xattrs for userns the host's security.capability is
>> filtered out to avoid read failures of 'security.capability' if
>> security.capability@uid= is read but not there.
>> (see 1) and 2))
>> 4) security.capability* may all be read from anywhere
>> 5) security.capability@uid= may be read or written directly
>> from a userns if matches the uid of root (current_uid())
>
> This looks very close to what we want. One exception - we do want
> to support root in a user namespace being able to write
> security.capability@uid= where is a valid uid mapped in its
> namespace. In that case the name should be rewritten to be
> security.capability@uid= where y is the unmapped kuid.val.
>
> Eric,
>
> so far my patch hasn't yet hit Linus' tree. Given that, would you
> mind taking a look and seeing what you think of this approach? If
> we may decide to go this route, we probably should stop my patch
> from hitting Linus' tree before we have to continue supporting it.

Agreed. I will take a look.

I also want to see how all of this works in the context of stackable
filesystems, as that is the one case that looked like it could be a
problem case in your current patchset.

Eric
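A rough userspace model of the naming scheme discussed in this thread may help when reviewing it: Stefan's rules 1), 2) and 3), with rule 5) widened as Serge asks so that any uid mapped in the caller's namespace is rewritten to its host kuid. This is a hypothetical Python sketch, not kernel code and not what Stefan's branch actually does. The names UserNs, to_host, translate_name and visible_names, the (inside, parent, count) extent tuples, and the decimal "security.capability@uid=<kuid>" spelling are all assumptions made for illustration. Rule 4) and the permission checks on writes are not modelled.

```python
# Hypothetical userspace model of the proposed xattr naming; not kernel code.
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

CAP = "security.capability"
PREFIX = CAP + "@uid="  # assumed spelling: decimal host kuid follows "@uid="


@dataclass
class UserNs:
    """Hypothetical user namespace: a parent plus uid_map-style extents."""
    parent: Optional["UserNs"] = None
    # Each extent: (first uid inside this ns, first uid in the parent ns, count).
    extents: List[Tuple[int, int, int]] = field(default_factory=list)

    def to_host(self, uid: int) -> Optional[int]:
        """Map a uid in this namespace to the init_user_ns uid, or None if unmapped."""
        ns = self
        while ns.parent is not None:
            for inner, outer, count in ns.extents:
                if inner <= uid < inner + count:
                    uid = outer + (uid - inner)
                    break
            else:
                return None  # no extent covers this uid
            ns = ns.parent
        return uid


def translate_name(ns: UserNs, name: str) -> str:
    """Return the on-disk xattr name used when `ns` reads or writes `name`."""
    if ns.parent is None:
        return name  # init_user_ns sees every name unchanged (rule 1)
    if name == CAP:
        # Rule 2: the plain name is redirected to the host kuid of this ns's root.
        root = ns.to_host(0)
        if root is None:
            raise PermissionError("root of this namespace is unmapped")
        return f"{PREFIX}{root}"
    if name.startswith(PREFIX):
        # Serge's amendment: a uid mapped in this ns is rewritten to its host kuid.
        uid = int(name[len(PREFIX):])
        kuid = ns.to_host(uid)
        if kuid is None:
            raise PermissionError(f"uid {uid} is not mapped in this namespace")
        return f"{PREFIX}{kuid}"
    return name


def visible_names(ns: UserNs, on_disk: List[str]) -> List[str]:
    """Rule 3: hide the host's plain security.capability from a user namespace."""
    if ns.parent is None:
        return list(on_disk)
    return [n for n in on_disk if n != CAP]


if __name__ == "__main__":
    init = UserNs()                              # init_user_ns
    c1 = UserNs(init, [(0, 100000, 65536)])      # container: root -> host 100000
    c2 = UserNs(c1, [(0, 30000, 65536)])         # nested: root -> 30000 in c1 -> host 130000
    print(translate_name(c2, CAP))               # security.capability@uid=130000
    print(translate_name(c1, PREFIX + "1000"))   # security.capability@uid=101000
```

The nested mapping is chosen (as an assumption) so that the inner container's root lands at host uid 130000, matching Serge's example; a read of security.capability from that container is then redirected to security.capability@uid=130000, while a uid outside the caller's map is refused rather than rewritten.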