Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756174AbYANVqi (ORCPT ); Mon, 14 Jan 2008 16:46:38 -0500 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1753419AbYANVq1 (ORCPT ); Mon, 14 Jan 2008 16:46:27 -0500 Received: from e3.ny.us.ibm.com ([32.97.182.143]:35744 "EHLO e3.ny.us.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751672AbYANVqY (ORCPT ); Mon, 14 Jan 2008 16:46:24 -0500 Date: Mon, 14 Jan 2008 15:46:24 -0600 From: "Serge E. Hallyn" To: Miklos Szeredi Cc: akpm@linux-foundation.org, hch@infradead.org, serue@us.ibm.com, viro@ftp.linux.org.uk, ebiederm@xmission.com, kzak@redhat.com, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, containers@lists.osdl.org, util-linux-ng@vger.kernel.org Subject: Re: [patch 1/9] unprivileged mounts: add user mounts to the kernel Message-ID: <20080114214624.GA6704@sergelap.austin.ibm.com> References: <20080108113502.184459371@szeredi.hu> <20080108113619.213519920@szeredi.hu> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20080108113619.213519920@szeredi.hu> User-Agent: Mutt/1.5.16 (2007-06-09) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 9392 Lines: 251 Quoting Miklos Szeredi (miklos@szeredi.hu): > From: Miklos Szeredi > > This patchset adds support for keeping mount ownership information in the > kernel, and allow unprivileged mount(2) and umount(2) in certain cases. > > The mount owner has the following privileges: > > - unmount the owned mount > - create a submount under the owned mount > > The sysadmin can set the owner explicitly on mount and remount. When an > unprivileged user creates a mount, then the owner is automatically set to the > user. > > The following use cases are envisioned: > > 1) Private namespace, with selected mounts owned by user. E.g. > /home/$USER is a good candidate for allowing unpriv mounts and unmounts > within. > > 2) Private namespace, with all mounts owned by user and having the "nosuid" > flag. User can mount and umount anywhere within the namespace, but suid > programs will not work. > > 3) Global namespace, with a designated directory, which is a mount owned by > the user. E.g. /mnt/users/$USER is set up so that it is bind mounted onto > itself, and set to be owned by $USER. The user can add/remove mounts only > under this directory. > > The following extra security measures are taken for unprivileged mounts: > > - usermounts are limited by a sysctl tunable > - force "nosuid,nodev" mount options on the created mount > > For testing unprivileged mounts (and for other purposes) simple > mount/umount utilities are available from: > > http://www.kernel.org/pub/linux/kernel/people/mszeredi/mmount/ > > After this series I'll be posting a preliminary patch for util-linux-ng, > to add the same functionality to mount(8) and umount(8). > > This patch: > > A new mount flag, MS_SETUSER is used to make a mount owned by a user. If this > flag is specified, then the owner will be set to the current fsuid and the > mount will be marked with the MNT_USER flag. On remount don't preserve > previous owner, and treat MS_SETUSER as for a new mount. The MS_SETUSER flag > is ignored on mount move. > > The MNT_USER flag is not copied on any kind of mount cloning: namespace > creation, binding or propagation. For bind mounts the cloned mount(s) are set > to MNT_USER depending on the MS_SETUSER mount flag. In all the other cases > MNT_USER is always cleared. > > For MNT_USER mounts a "user=UID" option is added to /proc/PID/mounts. This is > compatible with how mount ownership is stored in /etc/mtab. > > The rationale for using MS_SETUSER and MNT_USER, to distinguish "user" > mounts from "non-user" or "legacy" mounts are follows: > > a) Mount(2) and umount(2) on legacy mounts always need CAP_SYS_ADMIN > capability. As opposed to user mounts, which will only require, > that the mount owner matches the current fsuid. So a process > with fsuid=0 should not be able to mount/umount legacy mounts > without the CAP_SYS_ADMIN capability. > > b) Legacy userspace programs may set fsuid to nonzero before calling > mount(2). In such an unlikely case, this patchset would cause > an unintended side effect of making the mount owned by the fsuid. > > c) For legacy mounts, no "user=UID" option should be shown in > /proc/mounts for backwards compatibility. > > Signed-off-by: Miklos Szeredi This looks good to me. Acked-by: Serge Hallyn thanks, -serge > --- > > Index: linux/fs/namespace.c > =================================================================== > --- linux.orig/fs/namespace.c 2008-01-03 22:10:10.000000000 +0100 > +++ linux/fs/namespace.c 2008-01-04 13:46:33.000000000 +0100 > @@ -477,6 +477,13 @@ static struct vfsmount *skip_mnt_tree(st > return p; > } > > +static void set_mnt_user(struct vfsmount *mnt) > +{ > + BUG_ON(mnt->mnt_flags & MNT_USER); > + mnt->mnt_uid = current->fsuid; > + mnt->mnt_flags |= MNT_USER; > +} > + > static struct vfsmount *clone_mnt(struct vfsmount *old, struct dentry *root, > int flag) > { > @@ -491,6 +498,11 @@ static struct vfsmount *clone_mnt(struct > mnt->mnt_mountpoint = mnt->mnt_root; > mnt->mnt_parent = mnt; > > + /* don't copy the MNT_USER flag */ > + mnt->mnt_flags &= ~MNT_USER; > + if (flag & CL_SETUSER) > + set_mnt_user(mnt); > + > if (flag & CL_SLAVE) { > list_add(&mnt->mnt_slave, &old->mnt_slave_list); > mnt->mnt_master = old; > @@ -644,6 +656,8 @@ static int show_vfsmnt(struct seq_file * > if (mnt->mnt_flags & fs_infop->flag) > seq_puts(m, fs_infop->str); > } > + if (mnt->mnt_flags & MNT_USER) > + seq_printf(m, ",user=%i", mnt->mnt_uid); > if (mnt->mnt_sb->s_op->show_options) > err = mnt->mnt_sb->s_op->show_options(m, mnt); > seq_puts(m, " 0 0\n"); > @@ -1181,8 +1195,9 @@ static int do_change_type(struct nameida > /* > * do loopback mount. > */ > -static int do_loopback(struct nameidata *nd, char *old_name, int recurse) > +static int do_loopback(struct nameidata *nd, char *old_name, int flags) > { > + int clone_fl; > struct nameidata old_nd; > struct vfsmount *mnt = NULL; > int err = mount_is_safe(nd); > @@ -1202,11 +1217,12 @@ static int do_loopback(struct nameidata > if (!check_mnt(nd->path.mnt) || !check_mnt(old_nd.path.mnt)) > goto out; > > + clone_fl = (flags & MS_SETUSER) ? CL_SETUSER : 0; > err = -ENOMEM; > - if (recurse) > - mnt = copy_tree(old_nd.path.mnt, old_nd.path.dentry, 0); > + if (flags & MS_REC) > + mnt = copy_tree(old_nd.path.mnt, old_nd.path.dentry, clone_fl); > else > - mnt = clone_mnt(old_nd.path.mnt, old_nd.path.dentry, 0); > + mnt = clone_mnt(old_nd.path.mnt, old_nd.path.dentry, clone_fl); > > if (!mnt) > goto out; > @@ -1268,8 +1284,11 @@ static int do_remount(struct nameidata * > err = change_mount_flags(nd->path.mnt, flags); > else > err = do_remount_sb(sb, flags, data, 0); > - if (!err) > + if (!err) { > nd->path.mnt->mnt_flags = mnt_flags; > + if (flags & MS_SETUSER) > + set_mnt_user(nd->path.mnt); > + } > up_write(&sb->s_umount); > if (!err) > security_sb_post_remount(nd->path.mnt, flags, data); > @@ -1378,10 +1397,13 @@ static int do_new_mount(struct nameidata > if (!capable(CAP_SYS_ADMIN)) > return -EPERM; > > - mnt = do_kern_mount(type, flags, name, data); > + mnt = do_kern_mount(type, flags & ~MS_SETUSER, name, data); > if (IS_ERR(mnt)) > return PTR_ERR(mnt); > > + if (flags & MS_SETUSER) > + set_mnt_user(mnt); > + > return do_add_mount(mnt, nd, mnt_flags, NULL); > } > > @@ -1413,7 +1435,8 @@ int do_add_mount(struct vfsmount *newmnt > if (S_ISLNK(newmnt->mnt_root->d_inode->i_mode)) > goto unlock; > > - newmnt->mnt_flags = mnt_flags; > + /* MNT_USER was set earlier */ > + newmnt->mnt_flags |= mnt_flags; > if ((err = graft_tree(newmnt, nd))) > goto unlock; > > @@ -1735,7 +1758,7 @@ long do_mount(char *dev_name, char *dir_ > retval = do_remount(&nd, flags & ~MS_REMOUNT, mnt_flags, > data_page); > else if (flags & MS_BIND) > - retval = do_loopback(&nd, dev_name, flags & MS_REC); > + retval = do_loopback(&nd, dev_name, flags); > else if (flags & (MS_SHARED | MS_PRIVATE | MS_SLAVE | MS_UNBINDABLE)) > retval = do_change_type(&nd, flags); > else if (flags & MS_MOVE) > Index: linux/fs/pnode.h > =================================================================== > --- linux.orig/fs/pnode.h 2008-01-03 22:10:10.000000000 +0100 > +++ linux/fs/pnode.h 2008-01-04 13:45:45.000000000 +0100 > @@ -23,6 +23,7 @@ > #define CL_MAKE_SHARED 0x08 > #define CL_PROPAGATION 0x10 > #define CL_PRIVATE 0x20 > +#define CL_SETUSER 0x40 > > static inline void set_mnt_shared(struct vfsmount *mnt) > { > Index: linux/include/linux/fs.h > =================================================================== > --- linux.orig/include/linux/fs.h 2008-01-03 22:10:10.000000000 +0100 > +++ linux/include/linux/fs.h 2008-01-04 13:45:46.000000000 +0100 > @@ -125,6 +125,7 @@ extern int dir_notify_enable; > #define MS_RELATIME (1<<21) /* Update atime relative to mtime/ctime. */ > #define MS_KERNMOUNT (1<<22) /* this is a kern_mount call */ > #define MS_I_VERSION (1<<23) /* Update inode I_version field */ > +#define MS_SETUSER (1<<24) /* set mnt_uid to current user */ > #define MS_ACTIVE (1<<30) > #define MS_NOUSER (1<<31) > > Index: linux/include/linux/mount.h > =================================================================== > --- linux.orig/include/linux/mount.h 2008-01-03 22:10:10.000000000 +0100 > +++ linux/include/linux/mount.h 2008-01-04 13:45:45.000000000 +0100 > @@ -33,6 +33,7 @@ struct mnt_namespace; > > #define MNT_SHRINKABLE 0x100 > #define MNT_IMBALANCED_WRITE_COUNT 0x200 /* just for debugging */ > +#define MNT_USER 0x400 > > #define MNT_SHARED 0x1000 /* if the vfsmount is a shared mount */ > #define MNT_UNBINDABLE 0x2000 /* if the vfsmount is a unbindable mount */ > @@ -69,6 +70,8 @@ struct vfsmount { > * are held, and all mnt_writer[]s on this mount have 0 as their ->count > */ > atomic_t __mnt_writers; > + > + uid_t mnt_uid; /* owner of the mount */ > }; > > static inline struct vfsmount *mntget(struct vfsmount *mnt) > > -- -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/