From: ebiederm@xmission.com (Eric W. Biederman)
To: Andy Lutomirski
Cc: "Serge E. Hallyn", Seth Forshee, Alexander Viro, Serge Hallyn,
    James Morris, Linux FS Devel, LSM List, SELinux-NSA,
    "linux-kernel@vger.kernel.org"
Subject: Re: [PATCH 3/7] fs: Ignore file caps in mounts from other user namespaces
Date: Thu, 16 Jul 2015 00:44:49 -0500
Message-ID: <87vbdkslke.fsf@x220.int.ebiederm.org>
In-Reply-To: (Andy Lutomirski's message of "Wed, 15 Jul 2015 22:15:30 -0700")
References: <1436989569-69582-1-git-send-email-seth.forshee@canonical.com>
    <1436989569-69582-4-git-send-email-seth.forshee@canonical.com>
    <20150715214848.GA24204@mail.hallyn.com>
    <87wpy1camr.fsf@x220.int.ebiederm.org>
    <87io9kzq5g.fsf@x220.int.ebiederm.org>
    <87wpy0u1zo.fsf@x220.int.ebiederm.org>

Andy Lutomirski writes:

> On Wed, Jul 15, 2015 at 10:04 PM, Eric W. Biederman wrote:
>> Andy Lutomirski writes:
>>
>>>
>>> So here's the semantic question:
>>>
>>> Suppose an unprivileged user (uid 1000) creates a user namespace and a
>>> mount namespace.  They stick a file (owned by uid 1000 as seen by
>>> init_user_ns) in there and mark it setuid root and give it fcaps.
>>
>> To make this make sense I have to ask, is this file on a filesystem
>> where uid 1000 as seen by the init_user_ns stored as uid 1000 on
>> the filesystem?  Or is this uid 0 as seen by the filesystem?
>>
>> I assume this is uid 0 on the filesystem in question or else your
>> unprivileged user would not have sufficient privileges over the
>> filesystem to setup fcaps.
>
> I was thinking uid 0 as seen by the filesystem.
> But even if it were uid 1000, the unprivileged user can still set
> whatever mode and xattrs they want -- they control the backing store.

Yes.  And that is what I was really asking.  Are we talking about a
filesystem where the user controls the backing store?

>>> Then global root gets an fd to this filesystem.  If they execve the
>>> file directly, then, with my patch 4, it won't act as setuid 1000 and
>>> the fcaps will be ignored.  Even with my patch 4, though, if they bind
>>> mount the fs and execve the file from their bind mount, it will act as
>>> setuid 1000.  Maybe this is odd.  However, with Seth's patch 3, the
>>> fcaps will (correctly) not be honored.
>>
>> With patch 3 you can also think of it as fcaps being honored and you
>> get all the caps in the appropriate user namespace, but since you are
>> not in that user namespace and so don't have a place to store them
>> in struct cred you don't get the file caps.
>>
>> From the philosophy of interpreting the file as defined by the
>> filesystem in principle we could extend struct cred so you actually
>> get the creds just in uid 1000's user namespace, but that is very
>> unlikely to be worth it.
>
> I agree.
>
>>
>>> I tend to think that, if we're not honoring the fcaps, we shouldn't be
>>> honoring the setuid bit either.  After all, it's really not a trusted
>>> file, even though the only user who could have messed with it really
>>> is the apparent owner.
>>
>> For the file caps we can't honor them because you don't have the bits
>> in struct cred.
>>
>> For setuid we can honor it, and setuid is something that the user
>> namespace allows.
>>
>
> We certainly *can* honor it.  But why should we?  I'd be more
> comfortable with this if the contents of an untrusted filesystem were
> really treated as just data.

In these weird bleed-through situations I don't know that we should.
But extending nosuid protections in this way is a bit like Yama: a bit
of gratuitous stomping on don't-care cases in the semantics to make
bugs harder to exploit.

>>> And, if we're going to say we don't trust the file and shouldn't honor
>>> setuid or fcaps, then merging all the functionality into mnt_may_suid
>>> could make sense.  Yes, these two things do different things, but they
>>> could hook in to the same place.
>>
>> There are really two separate questions:
>> - Do we trust this filesystem?
>> - Do you have the bits to implement this concept?
>>
>> Even if in this specific context the two questions wind up looking
>> exactly the same.  I think it makes a lot of sense to ask the two
>> questions separately.  As future maintenance changes may cause the
>> implementation of the questions to diverge.
>>
>
> Agreed.
>
> Unless someone thinks of an argument to the contrary, I'd say "no, we
> don't trust this filesystem".  I could be convinced otherwise.

But this is context dependent.  From the perspective of the container
we really do want to trust the filesystem, as the container root set it
up and, if he isn't being hostile, likely has a use for setfcaps files
and setuid files and all of the rest.

Perhaps I should phrase it as:
- In this context do we trust the code?  AKA mnt_may_suid?
- What do these bits mean in this context?  (Usually something more
  complicated).

Which says to me we want both patches 3 and 4 (even if 4 uses
s_user_ns) because 3 is different than 4.

And now I better context switch back to fixing bind mounts.

Eric
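
The two questions above can be sketched as a toy model.  This is only an
illustration of why the checks are kept separate, not kernel code and not
what either patch implements.  The names UserNS, Mount, owner_ns (standing
in for s_user_ns), trust_code, fcaps_have_meaning, honor_setuid and
honor_fcaps are all hypothetical, and treating "trust" as "not nosuid and
the task is in the mount's owning user namespace or a descendant" is an
assumption made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class UserNS:
    """Toy user namespace; parent is None for the initial namespace."""
    name: str
    parent: Optional["UserNS"] = None


@dataclass(frozen=True)
class Mount:
    """Toy mount; owner_ns is a hypothetical stand-in for s_user_ns."""
    owner_ns: UserNS
    nosuid: bool = False


def in_userns(target: UserNS, ns: Optional[UserNS]) -> bool:
    """Return True if ns is target itself or a descendant of target."""
    while ns is not None:
        if ns is target:
            return True
        ns = ns.parent
    return False


def trust_code(mnt: Mount, task_ns: UserNS) -> bool:
    """Question 1 (assumed rule): in this context, do we trust the code?"""
    return not mnt.nosuid and in_userns(mnt.owner_ns, task_ns)


def fcaps_have_meaning(mnt: Mount, task_ns: UserNS) -> bool:
    """Question 2 (assumed rule): do the fcaps bits mean anything here?

    File caps are relative to the filesystem's user namespace; a task
    outside that namespace has nowhere to store them.
    """
    return in_userns(mnt.owner_ns, task_ns)


def honor_setuid(mnt: Mount, task_ns: UserNS) -> bool:
    # Setuid only needs the trust question answered.
    return trust_code(mnt, task_ns)


def honor_fcaps(mnt: Mount, task_ns: UserNS) -> bool:
    # File caps need both questions answered "yes".
    return trust_code(mnt, task_ns) and fcaps_have_meaning(mnt, task_ns)


# The scenario from the thread: a filesystem set up inside a container's
# user namespace, seen from inside the container and by global root.
init_ns = UserNS("init_user_ns")
container_ns = UserNS("container", parent=init_ns)
container_mnt = Mount(owner_ns=container_ns)
```

In this model the two predicates give the same answer for the scenario in
the thread, but they diverge as soon as something like nosuid enters the
picture, which is the reason for asking them separately.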