From: ebiederm@xmission.com (Eric W. Biederman)
To: Dongsu Park
Cc: linux-kernel@vger.kernel.org, containers@lists.linux-foundation.org,
    Alban Crequy, Miklos Szeredi, Seth Forshee, Sargun Dhillon
Date: Mon, 25 Dec 2017 01:05:58 -0600
In-Reply-To: (Dongsu Park's message of "Fri, 22 Dec 2017 15:32:24 +0100")
Message-ID: <877etbcmnd.fsf@xmission.com>
Subject: Re: [PATCH v5 00/11] FUSE mounts from non-init user namespaces

Dongsu Park writes:

> This patchset v5 is based on work by Seth Forshee and Eric Biederman.
> The latest patchset was v4:
> https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1132206.html
>
> At the moment, filesystems backed by physical medium can only be mounted
> by real root in the initial user namespace. This restriction exists
> because if it's allowed for root user in non-init user namespaces to
> mount the filesystem, then it effectively allows the user to control the
> underlying source of the filesystem. In case of FUSE, the source would
> mean any underlying device.
>
> However, in many use cases such as containers, it's necessary to allow
> filesystems to be mounted from non-init user namespaces.
> The goal of this patchset is to allow FUSE filesystems to be mounted from
> non-init user namespaces. Support for other filesystems like ext4 is not
> in the scope of this patchset.
>
> Let me describe how to test mounting from non-init user namespaces. It's
> assumed that tests are done via sshfs, a userspace filesystem based on
> FUSE with ssh as backend. Testing system is Fedora 27.

In general I am for this work, and more bodies and more eyes on it are
generally better.

I will review this after the New Year; I am out for the holidays right
now.

Eric

>
> ====
> $ sudo dnf install -y sshfs
> $ sudo mkdir -p /mnt/userns
>
> ### workaround to get the sshfs permission checks
> $ sudo chown -R $UID:$UID /etc/ssh/ssh_config.d /usr/share/crypto-policies
>
> $ unshare -U -r -m
> # sshfs root@localhost: /mnt/userns
>
> ### You can see sshfs being mounted from a non-init user namespace
> # mount | grep sshfs
> root@localhost: on /mnt/userns type fuse.sshfs
> (rw,nosuid,nodev,relatime,user_id=0,group_id=0)
>
> # touch /mnt/userns/test
> # ls -l /mnt/userns/test
> -rw-r--r-- 1 root root 0 Dec 11 19:01 /mnt/userns/test
> ====
>
> Open another terminal, check the mountpoint from outside the namespace.
>
> ====
> $ grep userns /proc/$(pidof sshfs)/mountinfo
> 131 102 0:35 / /mnt/userns rw,nosuid,nodev,relatime - fuse.sshfs
> root@localhost: rw,user_id=0,group_id=0
> ====
>
> After all tests are done, you can unmount the filesystem
> inside the namespace.
>
> ====
> # fusermount -u /mnt/userns
> ====
>
> Changes since v4:
> * Remove other parts like ext4 to keep the patchset minimal for FUSE
> * Add and change commit messages
> * Describe how to test non-init user namespaces
>
> TODO:
> * Think through potential security implications. There are 2 patches
>   being prepared for security issues.
>   One is "ima: define a new policy option named force" by Mimi Zohar,
>   which adds an option to specify that the results should not be cached:
>   https://marc.info/?l=linux-integrity&m=151275680115856&w=2
>   The other one is to basically prevent FUSE results from being cached,
>   which is still in progress.
>
> * Test IMA/LSMs. Details are written in
>   https://github.com/kinvolk/fuse-userns-patches/blob/master/tests/TESTING_INTEGRITY.md
>
> Patches 1-2 deal with an additional flag of lookup_bdev() to check for
> additional inode permission.
>
> Patches 3-7 allow the superblock owner to change ownership of inodes, and
> deal with additional capability checks w.r.t user namespaces.
>
> Patches 8-10 allow FUSE filesystems to be mounted outside of the init
> user namespace.
>
> Patch 11 handles a corner case of non-root users in EVM.
>
> The patchset is also available in our github repo:
> https://github.com/kinvolk/linux/tree/dongsu/fuse-userns-v5-1
>
>
> Eric W. Biederman (1):
>   fs: Allow superblock owner to change ownership of inodes
>
> Seth Forshee (10):
>   block_dev: Support checking inode permissions in lookup_bdev()
>   mtd: Check permissions towards mtd block device inode when mounting
>   fs: Don't remove suid for CAP_FSETID for userns root
>   fs: Allow superblock owner to access do_remount_sb()
>   capabilities: Allow privileged user in s_user_ns to set security.*
>     xattrs
>   fs: Allow CAP_SYS_ADMIN in s_user_ns to freeze and thaw filesystems
>   fuse: Support fuse filesystems outside of init_user_ns
>   fuse: Restrict allow_other to the superblock's namespace or a
>     descendant
>   fuse: Allow user namespace mounts
>   evm: Don't update hmacs in user ns mounts
>
>  drivers/md/bcache/super.c           |  2 +-
>  drivers/md/dm-table.c               |  2 +-
>  drivers/mtd/mtdsuper.c              |  6 +++++-
>  fs/attr.c                           | 34 ++++++++++++++++++++++++++--------
>  fs/block_dev.c                      | 13 ++++++++++---
>  fs/fuse/cuse.c                      |  3 ++-
>  fs/fuse/dev.c                       | 11 ++++++++---
>  fs/fuse/dir.c                       | 16 ++++++++--------
>  fs/fuse/fuse_i.h                    |  6 +++++-
>  fs/fuse/inode.c                     | 35 +++++++++++++++++++++--------------
>  fs/inode.c                          |  6 ++++--
>  fs/ioctl.c                          |  4 ++--
>  fs/namespace.c                      |  4 ++--
>  fs/proc/base.c                      |  7 +++++++
>  fs/proc/generic.c                   |  7 +++++++
>  fs/proc/proc_sysctl.c               |  7 +++++++
>  fs/quota/quota.c                    |  2 +-
>  include/linux/fs.h                  |  2 +-
>  kernel/user_namespace.c             |  1 +
>  security/commoncap.c                |  8 ++++++--
>  security/integrity/evm/evm_crypto.c |  3 ++-
>  21 files changed, 127 insertions(+), 52 deletions(-)
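
For anyone repeating the "check the mountpoint from outside the namespace"
step quoted above, here is a minimal sketch that pulls the filesystem type
and source out of a mountinfo line instead of eyeballing the grep output.
The helper name mountinfo_fstype is hypothetical and not part of the
patchset; it only assumes the mountinfo layout documented in proc(5),
where the optional fields end at a lone "-" separator.

```shell
#!/bin/sh
# Hypothetical helper (not from the patchset): read mountinfo-formatted
# lines on stdin and print "<fstype> <source>" for the given mount point.
mountinfo_fstype() {
    awk -v mp="$1" '
        # Field 5 is the mount point as seen by the process.
        $5 == mp {
            # Optional fields start at field 7 and end at a lone "-";
            # the filesystem type and mount source follow the separator.
            for (i = 7; i <= NF; i++) {
                if ($i == "-") { print $(i + 1), $(i + 2); exit }
            }
        }'
}
```

With the sshfs mount from the quoted test still active, it could be run
as: mountinfo_fstype /mnt/userns < /proc/$(pidof sshfs)/mountinfo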