Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1758550AbeAIPFq (ORCPT + 1 other); Tue, 9 Jan 2018 10:05:46 -0500 Received: from mail-yb0-f180.google.com ([209.85.213.180]:35108 "EHLO mail-yb0-f180.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755066AbeAIPFo (ORCPT ); Tue, 9 Jan 2018 10:05:44 -0500 X-Google-Smtp-Source: ACJfBoutAy0rd/BBPqc3eNqdK7e32bn479Rm+WRZuxg1SDs6ngdPzzj0KJhUGx8SuJdI/mGQ45IIzipi0KSwGkMT758= MIME-Version: 1.0 In-Reply-To: <877etbcmnd.fsf@xmission.com> References: <877etbcmnd.fsf@xmission.com> From: Dongsu Park Date: Tue, 9 Jan 2018 16:05:42 +0100 Message-ID: Subject: Re: [PATCH v5 00/11] FUSE mounts from non-init user namespaces To: "Eric W. Biederman" Cc: LKML , Linux Containers , Alban Crequy , Miklos Szeredi , Seth Forshee , Sargun Dhillon Content-Type: text/plain; charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Return-Path: Hi, On Mon, Dec 25, 2017 at 8:05 AM, Eric W. Biederman wrote: > Dongsu Park writes: > >> This patchset v5 is based on work by Seth Forshee and Eric Biederman. >> The latest patchset was v4: >> https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1132206.html >> >> At the moment, filesystems backed by physical medium can only be mounted >> by real root in the initial user namespace. This restriction exists >> because if it's allowed for root user in non-init user namespaces to >> mount the filesystem, then it effectively allows the user to control the >> underlying source of the filesystem. In case of FUSE, the source would >> mean any underlying device. >> >> However, in many use cases such as containers, it's necessary to allow >> filesystems to be mounted from non-init user namespaces. Goal of this >> patchset is to allow FUSE filesystems to be mounted from non-init user >> namespaces. Support for other filesystems like ext4 are not in the >> scope of this patchset. >> >> Let me describe how to test mounting from non-init user namespaces. It's >> assumed that tests are done via sshfs, a userspace filesystem based on >> FUSE with ssh as backend. Testing system is Fedora 27. > > In general I am for this work, and more bodies and more eyes on it is > generally better. > > I will review this after the New Year, I am out for the holidays right > now. Thanks. I'll wait for your review. Dongsu > Eric > > >> >> ==== >> $ sudo dnf install -y sshfs >> $ sudo mkdir -p /mnt/userns >> >> ### workaround to get the sshfs permission checks >> $ sudo chown -R $UID:$UID /etc/ssh/ssh_config.d /usr/share/crypto-policies >> >> $ unshare -U -r -m >> # sshfs root@localhost: /mnt/userns >> >> ### You can see sshfs being mounted from a non-init user namespace >> # mount | grep sshfs >> root@localhost: on /mnt/userns type fuse.sshfs >> (rw,nosuid,nodev,relatime,user_id=0,group_id=0) >> >> # touch /mnt/userns/test >> # ls -l /mnt/userns/test >> -rw-r--r-- 1 root root 0 Dec 11 19:01 /mnt/userns/test >> ==== >> >> Open another terminal, check the mountpoint from outside the namespace. >> >> ==== >> $ grep userns /proc/$(pidof sshfs)/mountinfo >> 131 102 0:35 / /mnt/userns rw,nosuid,nodev,relatime - fuse.sshfs >> root@localhost: rw,user_id=0,group_id=0 >> ==== >> >> After all tests are done, you can unmount the filesystem >> inside the namespace. >> >> ==== >> # fusermount -u /mnt/userns >> ==== >> >> Changes since v4: >> * Remove other parts like ext4 to keep the patchset minimal for FUSE >> * Add and change commit messages >> * Describe how to test non-init user namespaces >> >> TODO: >> * Think through potential security implications. There are 2 patches >> being prepared for security issues. One is "ima: define a new policy >> option named force" by Mimi Zohar, which adds an option to specify >> that the results should not be cached: >> https://marc.info/?l=linux-integrity&m=151275680115856&w=2 >> The other one is to basically prevent FUSE results from being cached, >> which is still in progress. >> >> * Test IMA/LSMs. Details are written in >> https://github.com/kinvolk/fuse-userns-patches/blob/master/tests/TESTING_INTEGRITY.md >> >> Patches 1-2 deal with an additional flag of lookup_bdev() to check for >> additional inode permission. >> >> Patches 3-7 allow the superblock owner to change ownership of inodes, and >> deal with additional capability checks w.r.t user namespaces. >> >> Patches 8-10 allow FUSE filesystems to be mounted outside of the init >> user namespace. >> >> Patch 11 handles a corner case of non-root users in EVM. >> >> The patchset is also available in our github repo: >> https://github.com/kinvolk/linux/tree/dongsu/fuse-userns-v5-1 >> >> >> Eric W. Biederman (1): >> fs: Allow superblock owner to change ownership of inodes >> >> Seth Forshee (10): >> block_dev: Support checking inode permissions in lookup_bdev() >> mtd: Check permissions towards mtd block device inode when mounting >> fs: Don't remove suid for CAP_FSETID for userns root >> fs: Allow superblock owner to access do_remount_sb() >> capabilities: Allow privileged user in s_user_ns to set security.* >> xattrs >> fs: Allow CAP_SYS_ADMIN in s_user_ns to freeze and thaw filesystems >> fuse: Support fuse filesystems outside of init_user_ns >> fuse: Restrict allow_other to the superblock's namespace or a >> descendant >> fuse: Allow user namespace mounts >> evm: Don't update hmacs in user ns mounts >> >> drivers/md/bcache/super.c | 2 +- >> drivers/md/dm-table.c | 2 +- >> drivers/mtd/mtdsuper.c | 6 +++++- >> fs/attr.c | 34 ++++++++++++++++++++++++++-------- >> fs/block_dev.c | 13 ++++++++++--- >> fs/fuse/cuse.c | 3 ++- >> fs/fuse/dev.c | 11 ++++++++--- >> fs/fuse/dir.c | 16 ++++++++-------- >> fs/fuse/fuse_i.h | 6 +++++- >> fs/fuse/inode.c | 35 +++++++++++++++++++++-------------- >> fs/inode.c | 6 ++++-- >> fs/ioctl.c | 4 ++-- >> fs/namespace.c | 4 ++-- >> fs/proc/base.c | 7 +++++++ >> fs/proc/generic.c | 7 +++++++ >> fs/proc/proc_sysctl.c | 7 +++++++ >> fs/quota/quota.c | 2 +- >> include/linux/fs.h | 2 +- >> kernel/user_namespace.c | 1 + >> security/commoncap.c | 8 ++++++-- >> security/integrity/evm/evm_crypto.c | 3 ++- >> 21 files changed, 127 insertions(+), 52 deletions(-)