From: Dongsu Park <dongsu@kinvolk.io>
To: linux-kernel@vger.kernel.org
Cc: containers@lists.linux-foundation.org, Alban Crequy <alban@kinvolk.io>,
        "Eric W . Biederman" <ebiederm@xmission.com>,
        Miklos Szeredi <mszeredi@redhat.com>,
        Seth Forshee <seth.forshee@canonical.com>,
        Sargun Dhillon <sargun@sargun.me>, Dongsu Park <dongsu@kinvolk.io>
Subject: [PATCH v5 00/11] FUSE mounts from non-init user namespaces
Date: Fri, 22 Dec 2017 15:32:24 +0100
Message-Id: <cover.1512741134.git.dongsu@kinvolk.io>
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 5044
Lines: 130

This patchset v5 is based on work by Seth Forshee and Eric Biederman.
The latest patchset was v4:
https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1132206.html

At the moment, filesystems backed by physical medium can only be mounted
by real root in the initial user namespace. This restriction exists
because if it's allowed for root user in non-init user namespaces to
mount the filesystem, then it effectively allows the user to control the
underlying source of the filesystem. In case of FUSE, the source would
mean any underlying device.

However, in many use cases such as containers, it's necessary to allow
filesystems to be mounted from non-init user namespaces. Goal of this
patchset is to allow FUSE filesystems to be mounted from non-init user
namespaces. Support for other filesystems like ext4 are not in the
scope of this patchset.

Let me describe how to test mounting from non-init user namespaces. It's
assumed that tests are done via sshfs, a userspace filesystem based on
FUSE with ssh as backend. Testing system is Fedora 27.

====
$ sudo dnf install -y sshfs
$ sudo mkdir -p /mnt/userns

### workaround to get the sshfs permission checks
$ sudo chown -R $UID:$UID /etc/ssh/ssh_config.d /usr/share/crypto-policies

$ unshare -U -r -m
# sshfs root@localhost: /mnt/userns

### You can see sshfs being mounted from a non-init user namespace
# mount | grep sshfs
root@localhost: on /mnt/userns type fuse.sshfs
(rw,nosuid,nodev,relatime,user_id=0,group_id=0)

# touch /mnt/userns/test
# ls -l /mnt/userns/test
-rw-r--r-- 1 root root 0 Dec 11 19:01 /mnt/userns/test
====

Open another terminal, check the mountpoint from outside the namespace.

====
$ grep userns /proc/$(pidof sshfs)/mountinfo
131 102 0:35 / /mnt/userns rw,nosuid,nodev,relatime - fuse.sshfs
root@localhost: rw,user_id=0,group_id=0
====

After all tests are done, you can unmount the filesystem
inside the namespace.

====
# fusermount -u /mnt/userns
====

Changes since v4:
 * Remove other parts like ext4 to keep the patchset minimal for FUSE
 * Add and change commit messages
 * Describe how to test non-init user namespaces

TODO:
 * Think through potential security implications. There are 2 patches
   being prepared for security issues. One is "ima: define a new policy
   option named force" by Mimi Zohar, which adds an option to specify
   that the results should not be cached:
   https://marc.info/?l=linux-integrity&m=151275680115856&w=2
   The other one is to basically prevent FUSE results from being cached,
   which is still in progress.

 * Test IMA/LSMs. Details are written in
   https://github.com/kinvolk/fuse-userns-patches/blob/master/tests/TESTING_INTEGRITY.md

Patches 1-2 deal with an additional flag of lookup_bdev() to check for
additional inode permission.

Patches 3-7 allow the superblock owner to change ownership of inodes, and
deal with additional capability checks w.r.t user namespaces.

Patches 8-10 allow FUSE filesystems to be mounted outside of the init
user namespace.

Patch 11 handles a corner case of non-root users in EVM.

The patchset is also available in our github repo:
  https://github.com/kinvolk/linux/tree/dongsu/fuse-userns-v5-1


Eric W. Biederman (1):
  fs: Allow superblock owner to change ownership of inodes

Seth Forshee (10):
  block_dev: Support checking inode permissions in lookup_bdev()
  mtd: Check permissions towards mtd block device inode when mounting
  fs: Don't remove suid for CAP_FSETID for userns root
  fs: Allow superblock owner to access do_remount_sb()
  capabilities: Allow privileged user in s_user_ns to set security.*
    xattrs
  fs: Allow CAP_SYS_ADMIN in s_user_ns to freeze and thaw filesystems
  fuse: Support fuse filesystems outside of init_user_ns
  fuse: Restrict allow_other to the superblock's namespace or a
    descendant
  fuse: Allow user namespace mounts
  evm: Don't update hmacs in user ns mounts

 drivers/md/bcache/super.c           |  2 +-
 drivers/md/dm-table.c               |  2 +-
 drivers/mtd/mtdsuper.c              |  6 +++++-
 fs/attr.c                           | 34 ++++++++++++++++++++++++++--------
 fs/block_dev.c                      | 13 ++++++++++---
 fs/fuse/cuse.c                      |  3 ++-
 fs/fuse/dev.c                       | 11 ++++++++---
 fs/fuse/dir.c                       | 16 ++++++++--------
 fs/fuse/fuse_i.h                    |  6 +++++-
 fs/fuse/inode.c                     | 35 +++++++++++++++++++++--------------
 fs/inode.c                          |  6 ++++--
 fs/ioctl.c                          |  4 ++--
 fs/namespace.c                      |  4 ++--
 fs/proc/base.c                      |  7 +++++++
 fs/proc/generic.c                   |  7 +++++++
 fs/proc/proc_sysctl.c               |  7 +++++++
 fs/quota/quota.c                    |  2 +-
 include/linux/fs.h                  |  2 +-
 kernel/user_namespace.c             |  1 +
 security/commoncap.c                |  8 ++++++--
 security/integrity/evm/evm_crypto.c |  3 ++-
 21 files changed, 127 insertions(+), 52 deletions(-)

-- 
2.13.6