Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754636AbbG3Rkj (ORCPT ); Thu, 30 Jul 2015 13:40:39 -0400 Received: from out02.mta.xmission.com ([166.70.13.232]:39963 "EHLO out02.mta.xmission.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750822AbbG3Rkg (ORCPT ); Thu, 30 Jul 2015 13:40:36 -0400 From: ebiederm@xmission.com (Eric W. Biederman) To: Seth Forshee Cc: Casey Schaufler , Stephen Smalley , Andy Lutomirski , Alexander Viro , Linux FS Devel , LSM List , SELinux-NSA , Serge Hallyn , "linux-kernel\@vger.kernel.org" References: <20150721203550.GA80838@ubuntu-hedt> <55AEF75F.9010703@schaufler-ca.com> <20150722155634.GB124342@ubuntu-hedt> <55AFDCA6.10201@schaufler-ca.com> <20150722193223.GD124342@ubuntu-hedt> <55B02FBD.4040606@schaufler-ca.com> <20150728204009.GF83521@ubuntu-hedt> <55BA4E48.50109@schaufler-ca.com> <878u9xlgo8.fsf@x220.int.ebiederm.org> <20150730172517.GB131344@ubuntu-hedt> Date: Thu, 30 Jul 2015 12:33:57 -0500 In-Reply-To: <20150730172517.GB131344@ubuntu-hedt> (Seth Forshee's message of "Thu, 30 Jul 2015 12:25:17 -0500") Message-ID: <87pp39k0sa.fsf@x220.int.ebiederm.org> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.3 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-XM-AID: U2FsdGVkX1+0iBIVdbdrLwpd7e/PTUhDPg0bqvSm7UE= X-SA-Exim-Connect-IP: 97.119.22.40 X-SA-Exim-Mail-From: ebiederm@xmission.com X-Spam-Report: * -1.0 ALL_TRUSTED Passed through trusted hosts only via SMTP * 0.7 XMSubLong Long Subject * 1.5 XMNoVowels Alpha-numberic number with no vowels * 0.0 TVD_RCVD_IP Message was received from an IP address * 0.0 T_TM2_M_HEADER_IN_MSG BODY: No description available. * 0.8 BAYES_50 BODY: Bayes spam probability is 40 to 60% * [score: 0.5000] * -0.0 DCC_CHECK_NEGATIVE Not listed in DCC * [sa07 1397; Body=1 Fuz1=1 Fuz2=1] * 0.1 XMSolicitRefs_0 Weightloss drug X-Spam-DCC: XMission; sa07 1397; Body=1 Fuz1=1 Fuz2=1 X-Spam-Combo: **;Seth Forshee X-Spam-Relay-Country: X-Spam-Timing: total 446 ms - load_scoreonly_sql: 0.06 (0.0%), signal_user_changed: 3.6 (0.8%), b_tie_ro: 2.6 (0.6%), parse: 1.16 (0.3%), extract_message_metadata: 21 (4.7%), get_uri_detail_list: 4.9 (1.1%), tests_pri_-1000: 6 (1.4%), tests_pri_-950: 1.24 (0.3%), tests_pri_-900: 1.10 (0.2%), tests_pri_-400: 39 (8.8%), check_bayes: 38 (8.5%), b_tokenize: 14 (3.1%), b_tok_get_all: 12 (2.8%), b_comp_prob: 4.6 (1.0%), b_tok_touch_all: 3.5 (0.8%), b_finish: 0.67 (0.2%), tests_pri_0: 364 (81.6%), tests_pri_500: 4.7 (1.1%), rewrite_mail: 0.00 (0.0%) Subject: Re: [PATCH 0/7] Initial support for user namespace owned mounts X-Spam-Flag: No X-SA-Exim-Version: 4.2.1 (built Wed, 24 Sep 2014 11:00:52 -0600) X-SA-Exim-Scanned: Yes (on in02.mta.xmission.com) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 5123 Lines: 115 Seth Forshee writes: > On Thu, Jul 30, 2015 at 12:05:27PM -0500, Eric W. Biederman wrote: >> Casey Schaufler writes: >> >> > On 7/28/2015 1:40 PM, Seth Forshee wrote: >> >> On Wed, Jul 22, 2015 at 05:05:17PM -0700, Casey Schaufler wrote: >> >>>> This is what I currently think you want for user ns mounts: >> >>>> >> >>>> 1. smk_root and smk_default are assigned the label of the backing >> >>>> device. >> >>>> 2. s_root is assigned the transmute property. >> >>>> 3. For existing files: >> >>>> a. Files with the same label as the backing device are accessible. >> >>>> b. Files with any other label are not accessible. >> >>> That's right. Accept correct data, reject anything that's not right. >> >>> >> >>>> If this is right, there are a couple lingering questions in my mind. >> >>>> >> >>>> First, what happens with files created in directories with the same >> >>>> label as the backing device but without the transmute property set? The >> >>>> inode for the new file will initially be labeled with smk_of_current(), >> >>>> but then during d_instantiate it will get smk_default and thus end up >> >>>> with the label we want. So that seems okay. >> >>> Yes. >> >>> >> >>>> The second is whether files with the SMACK64EXEC attribute is still a >> >>>> problem. It seems it is, for files with the same label as the backing >> >>>> store at least. I think we can simply skip the code that reads out this >> >>>> xattr and sets smk_task for user ns mounts, or else skip assigning the >> >>>> label to the new task in bprm_set_creds. The latter seems more >> >>>> consistent with the approach you've suggested for dealing with labels >> >>>> from disk. >> >>> Yes, I think that skipping the smk_fetch(XATTR_NAME_SMACKEXEC, ...) in >> >>> smack_d_instantiate for unprivileged mounts would do the trick. >> >>> >> >>>> So I guess all of that seems okay, though perhaps a bit restrictive >> >>>> given that the user who mounted the filesystem already has full access >> >>>> to the backing store. >> >>> In truth, there is no reason to expect that the "user" who did the >> >>> mount will ever have a Smack label that differs from the label of >> >>> the backing store. If what we've got here seems restrictive, it's >> >>> because you've got access from someone other than the "user". >> >>> >> >>>> Please let me know whether or not this matches up with what you are >> >>>> thinking, then I can procede with the implementation. >> >>> My current mindset is that, if you're going to allow unprivileged >> >>> mounts of user defined backing stores, this is as safe as we can >> >>> make it. >> >> All right, I've got a patch which I think does this, and I've managed to >> >> do some testing to confirm that it behaves like I expect. How does this >> >> look? >> >> >> >> What's missing is getting the label from the block device inode; as >> >> Stephen discovered the inode that I thought we could get the label from >> >> turned out to be the wrong one. Afaict we would need a new hook in order >> >> to do that, so for now I'm using the label of the proccess calling >> >> mount. >> > >> > That will be OK if the mount processing checks for write access to >> > the backing store. I haven't looked to see if it does. If it doesn't >> > the problems should be pretty obvious. >> >> >> do_new_mount >> vfs_kern_mount >> mount_fs >> ... >> mount_bdev >> blkdev_get_by_path(...,FMODE_READ| FMODE_WRITE | FMODE_EXCL,...) >> lookup_bdev >> kern_path >> filename_lookup >> path_lookupat >> lookup_last >> walk_component >> blkdev_get(...,mode,...) >> __blkdev_get(...,mode,...) >> devcgroup_inode_permission(bdev->bd_inode, perm) >> >> *scratches my head* >> >> It looks like we don't actually check the permissions on the block >> device. Tomoyo has a hack for it. nfsd does something. There is >> devcgroup silliness. >> >> But overall it looks like we depend on capable(CAP_SYS_ADMIN). >> >> Seth I do believe we have found another area of the vfs we will need to >> short up before allowing unprivileged mounts of block device based >> filesystems. >> >> It looks like there are enough hacks someone with a clue coming through >> and making the code make more sense seems like a good idea anyway. > > Yep, I just came to the same conclusion myself, and I also verified the > behavior emperically. That's definitely a problem. I'll get to work on > fixing that. At a quick glance it looks like lookup_bdev, and most of it's callers need to be modified to do potentially do the additional permission checking. I expect we could move the devcgroup checks into whatever new checks we wind up adding. Fun, fun fun. Eric -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/