From: ebiederm@xmission.com (Eric W. Biederman)
To: Seth Forshee
Cc: Andy Lutomirski, "Serge E. Hallyn", Alexander Viro, Serge Hallyn,
    James Morris, Linux FS Devel, LSM List, SELinux-NSA,
    "linux-kernel@vger.kernel.org"
Subject: Re: [PATCH 3/7] fs: Ignore file caps in mounts from other user namespaces
Date: Thu, 16 Jul 2015 19:43:24 -0500
Message-ID: <87vbdjmx5f.fsf@x220.int.ebiederm.org>
In-Reply-To: <20150716131308.GB77715@ubuntu-hedt> (Seth Forshee's message of
    "Thu, 16 Jul 2015 08:13:08 -0500")
References: <1436989569-69582-4-git-send-email-seth.forshee@canonical.com>
    <20150715214848.GA24204@mail.hallyn.com>
    <87wpy1camr.fsf@x220.int.ebiederm.org>
    <87io9kzq5g.fsf@x220.int.ebiederm.org>
    <87wpy0u1zo.fsf@x220.int.ebiederm.org>
    <87vbdkslke.fsf@x220.int.ebiederm.org>
    <20150716131308.GB77715@ubuntu-hedt>
X-Mailing-List: linux-kernel@vger.kernel.org

Seth Forshee writes:

> On Thu, Jul 16, 2015 at 12:44:49AM -0500, Eric W. Biederman wrote:
>> Andy Lutomirski writes:
>>
>> > On Wed, Jul 15, 2015 at 10:04 PM, Eric W. Biederman wrote:
>> >> Andy Lutomirski writes:
>> >>
>> >>>
>> >>> So here's the semantic question:
>> >>>
>> >>> Suppose an unprivileged user (uid 1000) creates a user namespace and a
>> >>> mount namespace. They stick a file (owned by uid 1000 as seen by
>> >>> init_user_ns) in there and mark it setuid root and give it fcaps.
>> >>
>> >> To make this make sense I have to ask, is this file on a filesystem
>> >> where uid 1000 as seen by the init_user_ns stored as uid 1000 on
>> >> the filesystem? Or is this uid 0 as seen by the filesystem?
>> >>
>> >> I assume this is uid 0 on the filesystem in question or else your
>> >> unprivileged user would not have sufficient privileges over the
>> >> filesystem to set up fcaps.
>> >
>> > I was thinking uid 0 as seen by the filesystem. But even if it were
>> > uid 1000, the unprivileged user can still set whatever mode and xattrs
>> > they want -- they control the backing store.
>>
>> Yes. And that is what I was really asking. Are we talking about a
>> filesystem where the user controls the backing store?
>>
>> >>> Then global root gets an fd to this filesystem. If they execve the
>> >>> file directly, then, with my patch 4, it won't act as setuid 1000 and
>> >>> the fcaps will be ignored. Even with my patch 4, though, if they bind
>> >>> mount the fs and execve the file from their bind mount, it will act as
>> >>> setuid 1000. Maybe this is odd. However, with Seth's patch 3, the
>> >>> fcaps will (correctly) not be honored.
>> >>
>> >> With patch 3 you can also think of it as fcaps being honored and you
>> >> get all the caps in the appropriate user namespace, but since you are
>> >> not in that user namespace and so don't have a place to store them
>> >> in struct cred you don't get the file caps.
>> >>
>> >> From the philosophy of interpreting the file as defined by the
>> >> filesystem in principle we could extend struct cred so you actually
>> >> get the creds just in uid 1000's user namespace, but that is very
>> >> unlikely to be worth it.
>> >
>> > I agree.
>> >
>> >>
>> >>> I tend to think that, if we're not honoring the fcaps, we shouldn't be
>> >>> honoring the setuid bit either. After all, it's really not a trusted
>> >>> file, even though the only user who could have messed with it really
>> >>> is the apparent owner.
>> >>
>> >> For the file caps we can't honor them because you don't have the bits
>> >> in struct cred.
>> >>
>> >> For setuid we can honor it, and setuid is something that the user
>> >> namespace allows.
>> >>
>> >
>> > We certainly *can* honor it. But why should we? I'd be more
>> > comfortable with this if the contents of an untrusted filesystem were
>> > really treated as just data.
>>
>> In these weird bleed-through situations I don't know that we should.
>> But extending nosuid protections in this way is a bit like yama, a bit
>> of gratuitous stomping on don't-care cases in the semantics to make
>> bugs harder to exploit.
>>
>> >>> And, if we're going to say we don't trust the file and shouldn't honor
>> >>> setuid or fcaps, then merging all the functionality into mnt_may_suid
>> >>> could make sense. Yes, these two things do different things, but they
>> >>> could hook in to the same place.
>> >>
>> >> There are really two separate questions:
>> >> - Do we trust this filesystem?
>> >> - Do you have the bits to implement this concept?
>> >>
>> >> Even if in this specific context the two questions wind up looking
>> >> exactly the same, I think it makes a lot of sense to ask the two
>> >> questions separately, as future maintenance changes may cause the
>> >> implementation of the questions to diverge.
>> >>
>> >
>> > Agreed.
>> >
>> > Unless someone thinks of an argument to the contrary, I'd say "no, we
>> > don't trust this filesystem". I could be convinced otherwise.
>>
>> But this is context dependent. From the perspective of the container
>> we really do want to trust the filesystem, as the container root set it
>> up and, if he isn't being hostile, likely has a use for setfcaps files
>> and setuid files and all of the rest.
>>
>> Perhaps I should phrase it as:
>> - In this context do we trust the code? AKA mnt_may_suid?
>> - What do these bits mean in this context? (Usually something more
>>   complicated).
>>
>> Which says to me we want both patches 3 and 4 (even if 4 uses s_user_ns)
>> because 3 is different than 4.
>
> So what I'll do is:
>
>  - Add an s_user_ns check to mnt_may_suid
>  - Keep the (now redundant) s_user_ns check in get_file_caps
>
> I'm on the fence about having both the mnt and user ns checks in
> mnt_may_suid - it might be overkill, but it still adds the protection
> against clearing MNT_NOSUID in a bind mount. So I guess I'll keep the
> mnt ns check.

That sounds like a plan.

Eric
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
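
The plan above (an s_user_ns check in mnt_may_suid plus the now redundant
one in get_file_caps) can be sketched as a small userspace model in C. This
is only an illustration under stated assumptions, not the kernel
implementation: the struct types, task_in_userns() and file_caps_apply() are
hypothetical stand-ins (file_caps_apply() plays the role of the check in
get_file_caps), and the rule that a task counts as "in" s_user_ns when its
user namespace is s_user_ns or a descendant of it is assumed here.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for kernel objects; not the real kernel structs. */
struct user_ns { const struct user_ns *parent; };    /* NULL parent: initial ns */
struct mnt_ns  { int id; };
struct super   { const struct user_ns *s_user_ns; }; /* ns that mounted the fs */
struct mount   { const struct super *sb; const struct mnt_ns *mnt_ns; bool nosuid; };
struct task    { const struct user_ns *user_ns; const struct mnt_ns *mnt_ns; };

/* Assumed rule: true if the task's user ns is `target` or a descendant of it. */
bool task_in_userns(const struct task *t, const struct user_ns *target)
{
    for (const struct user_ns *ns = t->user_ns; ns != NULL; ns = ns->parent)
        if (ns == target)
            return true;
    return false;
}

/* Question 1: in this context, do we trust code from this mount? */
bool mnt_may_suid(const struct task *t, const struct mount *m)
{
    return !m->nosuid
        && m->mnt_ns == t->mnt_ns                /* mnt ns check */
        && task_in_userns(t, m->sb->s_user_ns);  /* added s_user_ns check */
}

/* Question 2: do file caps on this filesystem mean anything to this task?
 * The s_user_ns test is redundant with mnt_may_suid() but is kept separate,
 * since the two questions may diverge later. */
bool file_caps_apply(const struct task *t, const struct mount *m)
{
    return mnt_may_suid(t, m) && task_in_userns(t, m->sb->s_user_ns);
}
```

In this model, a task in the child user namespace and its own mount
namespace passes both checks for a filesystem mounted from that user
namespace. A task in the initial user namespace that executes the file from
its own bind mount passes the mnt ns check but fails the s_user_ns check, so
neither setuid nor fcaps would be honored.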