Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755222AbbG3RMK (ORCPT ); Thu, 30 Jul 2015 13:12:10 -0400 Received: from out01.mta.xmission.com ([166.70.13.231]:51221 "EHLO out01.mta.xmission.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753472AbbG3RMF (ORCPT ); Thu, 30 Jul 2015 13:12:05 -0400 From: ebiederm@xmission.com (Eric W. Biederman) To: Casey Schaufler Cc: Seth Forshee , Stephen Smalley , Andy Lutomirski , Alexander Viro , Linux FS Devel , LSM List , SELinux-NSA , Serge Hallyn , "linux-kernel\@vger.kernel.org" References: <55A8398A.3000802@schaufler-ca.com> <55A85041.2070301@schaufler-ca.com> <20150721203550.GA80838@ubuntu-hedt> <55AEF75F.9010703@schaufler-ca.com> <20150722155634.GB124342@ubuntu-hedt> <55AFDCA6.10201@schaufler-ca.com> <20150722193223.GD124342@ubuntu-hedt> <55B02FBD.4040606@schaufler-ca.com> <20150728204009.GF83521@ubuntu-hedt> <55BA4E48.50109@schaufler-ca.com> Date: Thu, 30 Jul 2015 12:05:27 -0500 In-Reply-To: <55BA4E48.50109@schaufler-ca.com> (Casey Schaufler's message of "Thu, 30 Jul 2015 09:18:16 -0700") Message-ID: <878u9xlgo8.fsf@x220.int.ebiederm.org> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.3 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-XM-AID: U2FsdGVkX18EJmeCCPLX+Z2TakQAHXbJy1Aq5e1qPII= X-SA-Exim-Connect-IP: 97.119.22.40 X-SA-Exim-Mail-From: ebiederm@xmission.com X-Spam-Report: * -1.0 ALL_TRUSTED Passed through trusted hosts only via SMTP * 0.7 XMSubLong Long Subject * 1.5 XMNoVowels Alpha-numberic number with no vowels * 0.0 TVD_RCVD_IP Message was received from an IP address * 0.0 T_TM2_M_HEADER_IN_MSG BODY: No description available. * 0.8 BAYES_50 BODY: Bayes spam probability is 40 to 60% * [score: 0.5000] * -0.0 DCC_CHECK_NEGATIVE Not listed in DCC * [sa06 1397; Body=1 Fuz1=1 Fuz2=1] * 0.1 XMSolicitRefs_0 Weightloss drug X-Spam-DCC: XMission; sa06 1397; Body=1 Fuz1=1 Fuz2=1 X-Spam-Combo: **;Casey Schaufler X-Spam-Relay-Country: X-Spam-Timing: total 404 ms - load_scoreonly_sql: 0.07 (0.0%), signal_user_changed: 10 (2.4%), b_tie_ro: 7 (1.6%), parse: 1.67 (0.4%), extract_message_metadata: 15 (3.7%), get_uri_detail_list: 2.8 (0.7%), tests_pri_-1000: 6 (1.5%), tests_pri_-950: 1.54 (0.4%), tests_pri_-900: 1.23 (0.3%), tests_pri_-400: 30 (7.4%), check_bayes: 29 (7.1%), b_tokenize: 9 (2.3%), b_tok_get_all: 11 (2.7%), b_comp_prob: 2.9 (0.7%), b_tok_touch_all: 3.2 (0.8%), b_finish: 0.73 (0.2%), tests_pri_0: 328 (81.3%), tests_pri_500: 6 (1.5%), rewrite_mail: 0.00 (0.0%) Subject: Re: [PATCH 0/7] Initial support for user namespace owned mounts X-Spam-Flag: No X-SA-Exim-Version: 4.2.1 (built Wed, 24 Sep 2014 11:00:52 -0600) X-SA-Exim-Scanned: Yes (on in02.mta.xmission.com) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 4313 Lines: 98 Casey Schaufler writes: > On 7/28/2015 1:40 PM, Seth Forshee wrote: >> On Wed, Jul 22, 2015 at 05:05:17PM -0700, Casey Schaufler wrote: >>>> This is what I currently think you want for user ns mounts: >>>> >>>> 1. smk_root and smk_default are assigned the label of the backing >>>> device. >>>> 2. s_root is assigned the transmute property. >>>> 3. For existing files: >>>> a. Files with the same label as the backing device are accessible. >>>> b. Files with any other label are not accessible. >>> That's right. Accept correct data, reject anything that's not right. >>> >>>> If this is right, there are a couple lingering questions in my mind. >>>> >>>> First, what happens with files created in directories with the same >>>> label as the backing device but without the transmute property set? The >>>> inode for the new file will initially be labeled with smk_of_current(), >>>> but then during d_instantiate it will get smk_default and thus end up >>>> with the label we want. So that seems okay. >>> Yes. >>> >>>> The second is whether files with the SMACK64EXEC attribute is still a >>>> problem. It seems it is, for files with the same label as the backing >>>> store at least. I think we can simply skip the code that reads out this >>>> xattr and sets smk_task for user ns mounts, or else skip assigning the >>>> label to the new task in bprm_set_creds. The latter seems more >>>> consistent with the approach you've suggested for dealing with labels >>>> from disk. >>> Yes, I think that skipping the smk_fetch(XATTR_NAME_SMACKEXEC, ...) in >>> smack_d_instantiate for unprivileged mounts would do the trick. >>> >>>> So I guess all of that seems okay, though perhaps a bit restrictive >>>> given that the user who mounted the filesystem already has full access >>>> to the backing store. >>> In truth, there is no reason to expect that the "user" who did the >>> mount will ever have a Smack label that differs from the label of >>> the backing store. If what we've got here seems restrictive, it's >>> because you've got access from someone other than the "user". >>> >>>> Please let me know whether or not this matches up with what you are >>>> thinking, then I can procede with the implementation. >>> My current mindset is that, if you're going to allow unprivileged >>> mounts of user defined backing stores, this is as safe as we can >>> make it. >> All right, I've got a patch which I think does this, and I've managed to >> do some testing to confirm that it behaves like I expect. How does this >> look? >> >> What's missing is getting the label from the block device inode; as >> Stephen discovered the inode that I thought we could get the label from >> turned out to be the wrong one. Afaict we would need a new hook in order >> to do that, so for now I'm using the label of the proccess calling >> mount. > > That will be OK if the mount processing checks for write access to > the backing store. I haven't looked to see if it does. If it doesn't > the problems should be pretty obvious. do_new_mount vfs_kern_mount mount_fs ... mount_bdev blkdev_get_by_path(...,FMODE_READ| FMODE_WRITE | FMODE_EXCL,...) lookup_bdev kern_path filename_lookup path_lookupat lookup_last walk_component blkdev_get(...,mode,...) __blkdev_get(...,mode,...) devcgroup_inode_permission(bdev->bd_inode, perm) *scratches my head* It looks like we don't actually check the permissions on the block device. Tomoyo has a hack for it. nfsd does something. There is devcgroup silliness. But overall it looks like we depend on capable(CAP_SYS_ADMIN). Seth I do believe we have found another area of the vfs we will need to short up before allowing unprivileged mounts of block device based filesystems. It looks like there are enough hacks someone with a clue coming through and making the code make more sense seems like a good idea anyway. Eric -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/