Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753232AbbGWAWD (ORCPT ); Wed, 22 Jul 2015 20:22:03 -0400 Received: from out03.mta.xmission.com ([166.70.13.233]:37973 "EHLO out03.mta.xmission.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750779AbbGWAV5 (ORCPT ); Wed, 22 Jul 2015 20:21:57 -0400 From: ebiederm@xmission.com (Eric W. Biederman) To: Casey Schaufler Cc: Seth Forshee , Andy Lutomirski , Alexander Viro , Linux FS Devel , LSM List , SELinux-NSA , Serge Hallyn , "linux-kernel\@vger.kernel.org" References: <55A8253E.3000407@schaufler-ca.com> <55A8398A.3000802@schaufler-ca.com> <55A85041.2070301@schaufler-ca.com> <20150721203550.GA80838@ubuntu-hedt> <55AEF75F.9010703@schaufler-ca.com> <20150722155634.GB124342@ubuntu-hedt> <55AFDCA6.10201@schaufler-ca.com> <20150722193223.GD124342@ubuntu-hedt> <55B02FBD.4040606@schaufler-ca.com> Date: Wed, 22 Jul 2015 19:15:19 -0500 In-Reply-To: <55B02FBD.4040606@schaufler-ca.com> (Casey Schaufler's message of "Wed, 22 Jul 2015 17:05:17 -0700") Message-ID: <87vbdbd90o.fsf@x220.int.ebiederm.org> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.3 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-XM-AID: U2FsdGVkX1+Fu+0k/IykYmFBFJpMW165szLYKvfH+gk= X-SA-Exim-Connect-IP: 67.3.205.90 X-SA-Exim-Mail-From: ebiederm@xmission.com X-Spam-Report: * -1.0 ALL_TRUSTED Passed through trusted hosts only via SMTP * 0.0 TVD_RCVD_IP Message was received from an IP address * 1.5 XMNoVowels Alpha-numberic number with no vowels * 0.7 XMSubLong Long Subject * 0.0 T_TM2_M_HEADER_IN_MSG BODY: No description available. * 0.8 BAYES_50 BODY: Bayes spam probability is 40 to 60% * [score: 0.5000] * -0.0 DCC_CHECK_NEGATIVE Not listed in DCC * [sa06 1397; Body=1 Fuz1=1 Fuz2=1] * 0.2 T_XMDrugObfuBody_14 obfuscated drug references X-Spam-DCC: XMission; sa06 1397; Body=1 Fuz1=1 Fuz2=1 X-Spam-Combo: **;Casey Schaufler X-Spam-Relay-Country: X-Spam-Timing: total 932 ms - load_scoreonly_sql: 0.05 (0.0%), signal_user_changed: 4.6 (0.5%), b_tie_ro: 3.4 (0.4%), parse: 3.5 (0.4%), extract_message_metadata: 24 (2.6%), get_uri_detail_list: 8 (0.9%), tests_pri_-1000: 7 (0.8%), tests_pri_-950: 1.62 (0.2%), tests_pri_-900: 1.34 (0.1%), tests_pri_-400: 54 (5.8%), check_bayes: 52 (5.6%), b_tokenize: 20 (2.1%), b_tok_get_all: 18 (1.9%), b_comp_prob: 6 (0.6%), b_tok_touch_all: 5 (0.5%), b_finish: 0.96 (0.1%), tests_pri_0: 821 (88.1%), tests_pri_500: 7 (0.8%), rewrite_mail: 0.00 (0.0%) Subject: Re: [PATCH 0/7] Initial support for user namespace owned mounts X-Spam-Flag: No X-SA-Exim-Version: 4.2.1 (built Wed, 24 Sep 2014 11:00:52 -0600) X-SA-Exim-Scanned: Yes (on in02.mta.xmission.com) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 12755 Lines: 286 Casey Schaufler writes: > On 7/22/2015 12:32 PM, Seth Forshee wrote: >> On Wed, Jul 22, 2015 at 11:10:46AM -0700, Casey Schaufler wrote: >>> On 7/22/2015 8:56 AM, Seth Forshee wrote: >>>> On Tue, Jul 21, 2015 at 06:52:31PM -0700, Casey Schaufler wrote: >>>>> On 7/21/2015 1:35 PM, Seth Forshee wrote: >>>>>> On Thu, Jul 16, 2015 at 05:59:22PM -0700, Andy Lutomirski wrote: >>>>>>> On Thu, Jul 16, 2015 at 5:45 PM, Casey Schaufler wrote: >>>>>>>> On 7/16/2015 4:29 PM, Andy Lutomirski wrote: >>>>>>>>> I really don't see the benefit of making up extra rules that apply to >>>>>>>>> users outside a userns who try to access specifically a filesystem >>>>>>>>> with backing store. They wouldn't make sense for filesystems without >>>>>>>>> backing store. >>>>>>>> Sure it would. For Smack, it would be the label a file would be >>>>>>>> created with, which would be the label of the process creating >>>>>>>> the memory based filesystem. For SELinux the rules are more a >>>>>>>> touch more sophisticated, but I'm sure that Paul or Stephen could >>>>>>>> come up with how to determine it. >>>>>>>> >>>>>>>> The point, looping all the way back to the beginning, where we >>>>>>>> were talking about just ignoring the labels on the filesystem, >>>>>>>> is that if you use the same Smack label on the files in the >>>>>>>> filesystem as the backing store file has, we'll all be happy. >>>>>>>> If that label isn't something user can write to, he won't be >>>>>>>> able to write to the mounted objects, either. If there is no >>>>>>>> backing store then use the label of the process creating the >>>>>>>> filesystem, which will be the user, which will mean everything >>>>>>>> will work hunky dory. >>>>>>>> >>>>>>>> Yes, there's work involved, but I doubt there's a lot. Getting >>>>>>>> the label from the backing store or the creating process is >>>>>>>> simple enough. >>>>>>>> >>>>>> So something like the diff below (untested)? >>>>> I think that this is close, and quite good for someone >>>>> who isn't very familiar with Smack. It's definitely headed >>>>> in the right direction. >>>>> >>>>>> All I'm really doing is setting smk_default as you describe above and >>>>>> then using it instead of smk_of_current() in >>>>>> smack_inode_alloc_security() and instead of the label from the disk in >>>>>> smack_d_instantiate(). >>>>> Let's say your backing store is a file labeled Rubble. >>>>> >>>>> mount -o smackfsroot=Rubble,smackfsdef=Rubble ... >>>>> >>>>> It is completely reasonable for a process labeled Flintstone to >>>>> have rwxa access to a file labeled Rubble. >>>>> >>>>> Smack rule: Flintstone Rubble rwxa >>>>> >>>>> In the case of writing to an existing Rubble file, what you >>>>> have looks fine. What's not so great is that if the Flintstone >>>>> process creates a file, it should be labeled Flintstone. Your >>>>> use of the smk_default, which is going to violate the principle >>>>> of least astonishment, and break the Smack policy as well. >>>>> >>>>> Let's make a minor change. Instead of using smackfsroot let's >>>>> use smackfstransmute and a slightly different access rule: >>>>> >>>>> mount -o smackfstransmute=Rubble,smackfsdef=Rubble ... >>>>> >>>>> Smack rule: Flintstone Rubble rwxat >>>>> >>>>> Now the only change we have to make to the Smack code is >>>>> that we don't want to create any files unless either the >>>>> process is labeled Rubble or the rule allowing the creation >>>>> has the "t" for transmute access. That should ensure that >>>>> everything is labeled Rubble. If it isn't, someone has mucked >>>>> with the metadata in a detectable way. >>>> All right, that kind of makes sense, but I'm still missing some pieces. >>>> Questions follow. >>>> >>>>>> diff --git a/include/linux/fs.h b/include/linux/fs.h >>>>>> index 32f598db0b0d..4597420ab933 100644 >>>>>> --- a/include/linux/fs.h >>>>>> +++ b/include/linux/fs.h >>>>>> @@ -1486,6 +1486,10 @@ static inline void sb_start_intwrite(struct super_block *sb) >>>>>> __sb_start_write(sb, SB_FREEZE_FS, true); >>>>>> } >>>>>> >>>>>> +static inline bool sb_in_userns(struct super_block *sb) >>>>>> +{ >>>>>> + return sb->s_user_ns != &init_user_ns; >>>>>> +} >>>>>> >>>>>> extern bool inode_owner_or_capable(const struct inode *inode); >>>>>> >>>>>> diff --git a/security/smack/smack_lsm.c b/security/smack/smack_lsm.c >>>>>> index a143328f75eb..591fd19294e7 100644 >>>>>> --- a/security/smack/smack_lsm.c >>>>>> +++ b/security/smack/smack_lsm.c >>>>>> @@ -255,6 +255,10 @@ static struct smack_known *smk_fetch(const char *name, struct inode *ip, >>>>>> char *buffer; >>>>>> struct smack_known *skp = NULL; >>>>>> >>>>>> + /* Should never fetch xattrs from untrusted mounts */ >>>>>> + if (WARN_ON(sb_in_userns(ip->i_sb))) >>>>>> + return ERR_PTR(-EPERM); >>>>>> + >>>>> Go ahead and fetch it, we'll check to make sure it's viable later. >>>>> >>>>>> if (ip->i_op->getxattr == NULL) >>>>>> return ERR_PTR(-EOPNOTSUPP); >>>>>> >>>>>> @@ -656,10 +660,14 @@ static int smack_sb_kern_mount(struct super_block *sb, int flags, void *data) >>>>>> */ >>>>>> if (specified) >>>>>> return -EPERM; >>>>>> + >>>>>> /* >>>>>> - * Unprivileged mounts get root and default from the caller. >>>>>> + * User namespace mounts get root and default from the backing >>>>>> + * store, if there is one. Other unprivileged mounts get them >>>>>> + * from the caller. >>>>>> */ >>>>>> - skp = smk_of_current(); >>>>>> + skp = (sb_in_userns(sb) && sb->s_bdev) ? >>>>>> + smk_of_inode(sb->s_bdev->bd_inode) : smk_of_current(); >>>>>> sp->smk_root = skp; >>>>>> sp->smk_default = skp; >>>>> sp->smk_flags |= SMK_INODE_TRANSMUTE; >>>> I assume that you meant skp and not sp here. >>> Actually, neither is correct. You want to set SMK_INODE_TRANSMUTE >>> in the smk_flags field of the root inode. That's easy: >>> >>> transmute = 1; >>> >>> and the code after "Initialize the root inode" will take care of it. >> Yeah, that's what I've actually done. >> >>>>>> } >>>>>> @@ -792,7 +800,12 @@ static int smack_bprm_secureexec(struct linux_binprm *bprm) >>>>>> */ >>>>>> static int smack_inode_alloc_security(struct inode *inode) >>>>>> { >>>>>> - struct smack_known *skp = smk_of_current(); >>>>>> + struct smack_known *skp; >>>>>> + >>>>>> + if (sb_in_userns(inode->i_sb)) >>>>>> + skp = ((struct superblock_smack *)(inode->i_sb->s_security))->smk_default; >>>>>> + else >>>>>> + skp = smk_of_current(); >>>>> This should be left alone. >>>>> smack_inode_init_security is where you could disallow access that doesn't >>>>> legitimately result in a Rubble label on the file. It's something like >>>>> >>>>> ... after the call may = smk_access_entry(...) >>>>> if (sb_in_userns(inode->i_sb)) >>>>> if (skp != dsp && (may & MAY_TRANSMUTE) == 0) >>>>> return -EACCES; >>>> I'm not getting how this covers all cases. >>>> >>>> So we've set the transmute flag on the root inode. Files and directories >>>> created in the root directory get the same label, and directories also >>>> get the transmute attribute. That's all fine. >>>> >>>> What about an existing directory in the filesystem that already has a >>>> Slate label? I'm not getting what happens with this directory, or for >>>> new files created in this directory, which also relates to my other >>>> questions below. >>>> >>>> Also an aside - smk_access_entry looks weird. may is initialized to >>>> -ENOENT, and then rule_list is searched for a rule which matches the >>>> object and subject labels. Presumably it's possible that no rule could >>>> be found, otherwise the prior initialization of may is pointless. If >>>> this happens the following code treats it as though it always contains >>>> access flags even though it might contain -ENOENT. Nothing bad actually >>>> happens with a two's compliement representation of -ENOENT since it will >>>> just set a bit that's already set, but it still seems like it should >>>> have a may > 0 condition, for clarity if for no other reason. >>> My suggested code is just wrong. I wasn't looking at the whole code, >>> only the patch, and got myself confused. Apologies. >>> >>> If we want to go straight for the jugular how about this? I'm assuming >>> that inode->i_sb->s_bdev->bd_inode is the inode of the backing store. >> Yes. >> >>> static int smack_inode_permission(struct inode *inode, int mask) >>> { >>> struct smk_audit_info ad; >>> int no_block = mask & MAY_NOT_BLOCK; >>> int rc; >>> >>> mask &= (MAY_READ|MAY_WRITE|MAY_EXEC|MAY_APPEND); >>> /* >>> * No permission to check. Existence test. Yup, it's there. >>> */ >>> if (mask == 0) >>> return 0; >>> >>> + if (sb_in_userns(inode->i_sb)) && >>> + smk_of_inode(inode) != smk_of_inode(inode->i_sb->s_bdev->bd_inode)) >>> + return -EACCES; >>> + >>> /* May be droppable after audit */ >>> if (no_block) >>> return -ECHILD; >>> smk_ad_init(&ad, __func__, LSM_AUDIT_DATA_INODE); >>> smk_ad_setfield_u_fs_inode(&ad, inode); >>> rc = smk_curacc(smk_of_inode(inode), mask, &ad); >>> rc = smk_bu_inode(inode, mask, rc); >>> return rc; >>> } >> Hmm, okay. I think I've been a little confused all this time about how >> you want to handle these unprivileged mounts. > > Not your problem. I'm not the most consistent of reviewers. > >> Originally I thought you wanted all objects in the filesystem to get the >> same label as the backing store. That's what I tried to implement >> originally, i.e. smk_root=smk_default=smk_of_inode(...->bd_inode), then >> assign every object (new and existing) smk_default and completely ignore >> the labels on disk. > > I want everything to have the label of the backing store, but > I don't want to ignore it if it somehow got something else. Because > the only legitimate label for this example is Rubble, I want to > reject anything else that appears. If someone builds a filesystem > by hand with Slate labels I want it treated "safely". > >> This is what I currently think you want for user ns mounts: >> >> 1. smk_root and smk_default are assigned the label of the backing >> device. >> 2. s_root is assigned the transmute property. >> 3. For existing files: >> a. Files with the same label as the backing device are accessible. >> b. Files with any other label are not accessible. > > That's right. Accept correct data, reject anything that's not right. > >> If this is right, there are a couple lingering questions in my mind. >> >> First, what happens with files created in directories with the same >> label as the backing device but without the transmute property set? The >> inode for the new file will initially be labeled with smk_of_current(), >> but then during d_instantiate it will get smk_default and thus end up >> with the label we want. So that seems okay. > > Yes. > >> The second is whether files with the SMACK64EXEC attribute is still a >> problem. It seems it is, for files with the same label as the backing >> store at least. I think we can simply skip the code that reads out this >> xattr and sets smk_task for user ns mounts, or else skip assigning the >> label to the new task in bprm_set_creds. The latter seems more >> consistent with the approach you've suggested for dealing with labels >> from disk. > > Yes, I think that skipping the smk_fetch(XATTR_NAME_SMACKEXEC, ...) in > smack_d_instantiate for unprivileged mounts would do the trick. > >> So I guess all of that seems okay, though perhaps a bit restrictive >> given that the user who mounted the filesystem already has full access >> to the backing store. > > In truth, there is no reason to expect that the "user" who did the > mount will ever have a Smack label that differs from the label of > the backing store. If what we've got here seems restrictive, it's > because you've got access from someone other than the "user". > >> Please let me know whether or not this matches up with what you are >> thinking, then I can procede with the implementation. > > My current mindset is that, if you're going to allow unprivileged > mounts of user defined backing stores, this is as safe as we can > make it. That actually sounds very reasonable to me. It is essentially what we do with uid and gids already. I presume the smack namespace support would when integrated with all of this would allow a set of labels to be set. Have I missed a part of the conversation you talk about fileystems that don't have support for storing labels? Filesystems like vfat, isofs, etc. Eric -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/