When creating a less privileged mount namespace or propogating mounts
from a more privileged to a less privileged mount namespace lock the
submounts so they may not be unmounted individually in the child mount
namespace revealing what is under them.
This enforces the reasonable expectation that it is not possible to
see under a mount point. Most of the time mounts are on empty
directories and revealing that does not matter, however I have seen an
occassionaly sloppy configuration where there were interesting things
concealed under a mount point that probably should not be revealed.
Expirable submounts are not locked because they will eventually
unmount automatically so whatever is under them already needs
to be safe for unprivileged users to access.
>From a practical standpoint these restrictions do not appear to be
significant for unprivileged users of the mount namespace. Recursive
bind mounts and pivot_root continues to work, and mounts that are
created in a mount namespace may be unmounted there. All of which
means that the common idiom of keeping a directory of interesting
files and using pivot_root to throw everything else away continues to
work just fine.
Signed-off-by: "Eric W. Biederman" <[email protected]>
---
fs/namespace.c | 34 ++++++++++++++++++++++++++++++++++
include/linux/mount.h | 1 +
2 files changed, 35 insertions(+), 0 deletions(-)
diff --git a/fs/namespace.c b/fs/namespace.c
index 7b1ca9b..7e16a73 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -831,6 +831,10 @@ static struct mount *clone_mnt(struct mount *old, struct dentry *root,
if ((flag & CL_UNPRIVILEGED) && (mnt->mnt.mnt_flags & MNT_READONLY))
mnt->mnt.mnt_flags |= MNT_LOCK_READONLY;
+ /* Don't allow unprivileged users to reveal what is under a mount */
+ if ((flag & CL_UNPRIVILEGED) && list_empty(&old->mnt_expire))
+ mnt->mnt.mnt_flags |= MNT_LOCKED;
+
atomic_inc(&sb->s_active);
mnt->mnt.mnt_sb = sb;
mnt->mnt.mnt_root = dget(root);
@@ -1327,6 +1331,8 @@ SYSCALL_DEFINE2(umount, char __user *, name, int, flags)
goto dput_and_out;
if (!check_mnt(mnt))
goto dput_and_out;
+ if (mnt->mnt.mnt_flags & MNT_LOCKED)
+ goto dput_and_out;
retval = do_umount(mnt, flags);
dput_and_out:
@@ -1381,6 +1387,7 @@ struct mount *copy_tree(struct mount *mnt, struct dentry *dentry,
if (IS_ERR(q))
return q;
+ q->mnt.mnt_flags &= ~MNT_LOCKED;
q->mnt_mountpoint = mnt->mnt_mountpoint;
p = mnt;
@@ -1696,6 +1703,19 @@ static int do_change_type(struct path *path, int flag)
return err;
}
+static bool has_locked_children(struct mount *mnt, struct dentry *dentry)
+{
+ struct mount *child;
+ list_for_each_entry(child, &mnt->mnt_mounts, mnt_child) {
+ if (!is_subdir(child->mnt_mountpoint, dentry))
+ continue;
+
+ if (child->mnt.mnt_flags & MNT_LOCKED)
+ return true;
+ }
+ return false;
+}
+
/*
* do loopback mount.
*/
@@ -1731,6 +1751,9 @@ static int do_loopback(struct path *path, const char *old_name,
if (!check_mnt(parent) || !check_mnt(old))
goto out2;
+ if (!recurse && has_locked_children(old, old_path.dentry))
+ goto out2;
+
if (recurse)
mnt = copy_tree(old, old_path.dentry, 0);
else
@@ -1741,6 +1764,8 @@ static int do_loopback(struct path *path, const char *old_name,
goto out2;
}
+ mnt->mnt.mnt_flags &= ~MNT_LOCKED;
+
err = graft_tree(mnt, parent, mp);
if (err) {
br_write_lock(&vfsmount_lock);
@@ -1853,6 +1878,9 @@ static int do_move_mount(struct path *path, const char *old_name)
if (!check_mnt(p) || !check_mnt(old))
goto out1;
+ if (old->mnt.mnt_flags & MNT_LOCKED)
+ goto out1;
+
err = -EINVAL;
if (old_path.dentry != old_path.mnt->mnt_root)
goto out1;
@@ -2630,6 +2658,8 @@ SYSCALL_DEFINE2(pivot_root, const char __user *, new_root,
goto out4;
if (!check_mnt(root_mnt) || !check_mnt(new_mnt))
goto out4;
+ if (new_mnt->mnt.mnt_flags & MNT_LOCKED)
+ goto out4;
error = -ENOENT;
if (d_unlinked(new.dentry))
goto out4;
@@ -2653,6 +2683,10 @@ SYSCALL_DEFINE2(pivot_root, const char __user *, new_root,
br_write_lock(&vfsmount_lock);
detach_mnt(new_mnt, &parent_path);
detach_mnt(root_mnt, &root_parent);
+ if (root_mnt->mnt.mnt_flags & MNT_LOCKED) {
+ new_mnt->mnt.mnt_flags |= MNT_LOCKED;
+ root_mnt->mnt.mnt_flags &= ~MNT_LOCKED;
+ }
/* mount old root on put_old */
attach_mnt(root_mnt, old_mnt, old_mp);
/* mount new_root on / */
diff --git a/include/linux/mount.h b/include/linux/mount.h
index 73005f9..38cd98f 100644
--- a/include/linux/mount.h
+++ b/include/linux/mount.h
@@ -48,6 +48,7 @@ struct mnt_namespace;
#define MNT_INTERNAL 0x4000
#define MNT_LOCK_READONLY 0x400000
+#define MNT_LOCKED 0x800000
struct vfsmount {
struct dentry *mnt_root; /* root of the mounted tree */
--
1.7.5.4
On Tue, Jul 23, 2013 at 11:30 AM, Eric W. Biederman
<[email protected]> wrote:
>
> When creating a less privileged mount namespace or propogating mounts
> from a more privileged to a less privileged mount namespace lock the
> submounts so they may not be unmounted individually in the child mount
> namespace revealing what is under them.
I would propose a different rule: if vfsmount b is mounted on vfsmount
a, then to unmount b, you must be ns_capable(CAP_SYS_MOUNT) on either
a's namespace or b's namespace. The idea is that you should be able
to see under a mount if you own the parent (because it's yours) or if
you own the child (because you, or someone no more privileged than
you, put it there). This may result in a simpler patch and should do
much the same thing.
>
> This enforces the reasonable expectation that it is not possible to
> see under a mount point. Most of the time mounts are on empty
> directories and revealing that does not matter, however I have seen an
> occassionaly sloppy configuration where there were interesting things
> concealed under a mount point that probably should not be revealed.
>
> Expirable submounts are not locked because they will eventually
> unmount automatically so whatever is under them already needs
> to be safe for unprivileged users to access.
>
> From a practical standpoint these restrictions do not appear to be
> significant for unprivileged users of the mount namespace. Recursive
> bind mounts and pivot_root continues to work, and mounts that are
> created in a mount namespace may be unmounted there. All of which
> means that the common idiom of keeping a directory of interesting
> files and using pivot_root to throw everything else away continues to
> work just fine.
Is there some kind of recursive unmount that will get rid of the
pivot_root result and everything under it?
In any case, I think that something like this patch is probably
-stable material: I suspect that things like seunshare and systemd's
instance directories are currently insecure.
--Andy
Serge does this patch break lxc? I think all should be well but I want
to make certain there is not some hidden case where this fundamentaly
breaks some functionality.
Andy Lutomirski <[email protected]> writes:
> On Tue, Jul 23, 2013 at 11:30 AM, Eric W. Biederman
> <[email protected]> wrote:
>>
>> When creating a less privileged mount namespace or propogating mounts
>> from a more privileged to a less privileged mount namespace lock the
>> submounts so they may not be unmounted individually in the child mount
>> namespace revealing what is under them.
>
> I would propose a different rule: if vfsmount b is mounted on vfsmount
> a, then to unmount b, you must be ns_capable(CAP_SYS_MOUNT) on either
> a's namespace or b's namespace. The idea is that you should be able
> to see under a mount if you own the parent (because it's yours) or if
> you own the child (because you, or someone no more privileged than
> you, put it there). This may result in a simpler patch and should do
> much the same thing.
It definitely won't result in a simpler patch as the information you are
basing the decision on is not available.
Effectively my patch implements the rule you proposed.
If someone with no more privilege than you put a mount in place (aka the
mount comes from your current user namespace or from a child user
namespace) MNT_LOCKED is not set.
In general mounts happen one at a time and propogate one at a time. In
which case MNT_LOCKED does not get set on any mount. I believe the only
time where multiple mounts propogate at once besides the original
unshare of a mount namespace is a mount --rbind. In the case of a
mount --rbind this patch makes it so that the submounts can not be
unmounted. Which is again in line with your rule because neither the
top mount nor the lower mount are owned by you.
>> This enforces the reasonable expectation that it is not possible to
>> see under a mount point. Most of the time mounts are on empty
>> directories and revealing that does not matter, however I have seen an
>> occassionaly sloppy configuration where there were interesting things
>> concealed under a mount point that probably should not be revealed.
>>
>> Expirable submounts are not locked because they will eventually
>> unmount automatically so whatever is under them already needs
>> to be safe for unprivileged users to access.
>>
>> From a practical standpoint these restrictions do not appear to be
>> significant for unprivileged users of the mount namespace. Recursive
>> bind mounts and pivot_root continues to work, and mounts that are
>> created in a mount namespace may be unmounted there. All of which
>> means that the common idiom of keeping a directory of interesting
>> files and using pivot_root to throw everything else away continues to
>> work just fine.
>
> Is there some kind of recursive unmount that will get rid of the
> pivot_root result and everything under it?
cd /my/fancy/new/root
pivot_root . /mnt
Will mount the old root on /mnt
umount -l /mnt unmount everything on /mnt.
And that is safe because the mount of /mnt was made in your mount namespace.
> In any case, I think that something like this patch is probably
> -stable material: I suspect that things like seunshare and systemd's
> instance directories are currently insecure.
Given that right now user namespaces are not yet deployed in distro
kernels and even with a deployment it is uncertain if there is anything
exploitable this doesn't feel like stable fodder to me. However I won't
object if someone else chooses to backport the code.
Eric
Quoting Eric W. Biederman ([email protected]):
>
> Serge does this patch break lxc? I think all should be well but I want
> to make certain there is not some hidden case where this fundamentaly
> breaks some functionality.
I haven't yet tried. I'll build and test a kernel today. I'm pretty
sure all the child's mounts are done after clone, so I *think* the worst
case will be that the unmounting of put_old after pivot_root() will
be noisy. Will let you know.
-serge
Quoting Serge E. Hallyn ([email protected]):
> Quoting Eric W. Biederman ([email protected]):
> >
> > Serge does this patch break lxc? I think all should be well but I want
> > to make certain there is not some hidden case where this fundamentaly
> > breaks some functionality.
>
> I haven't yet tried. I'll build and test a kernel today. I'm pretty
> sure all the child's mounts are done after clone, so I *think* the worst
> case will be that the unmounting of put_old after pivot_root() will
> be noisy. Will let you know.
>
> -serge
Just tested it - works fine. Warns about all of the failed umounts.
Acked-by: Serge Hallyn <[email protected]>
( Mind you I'm not approving of the idea of hiding mounts as a security
mechanisms, but I know that neither are you :)
thanks,
-serge
On Tue, Jul 23, 2013 at 11:50 PM, Eric W. Biederman
<[email protected]> wrote:
> Andy Lutomirski <[email protected]> writes:
>> On Tue, Jul 23, 2013 at 11:30 AM, Eric W. Biederman
>> <[email protected]> wrote:
>>>
>>> When creating a less privileged mount namespace or propogating mounts
>>> from a more privileged to a less privileged mount namespace lock the
>>> submounts so they may not be unmounted individually in the child mount
>>> namespace revealing what is under them.
>>
>> I would propose a different rule: if vfsmount b is mounted on vfsmount
>> a, then to unmount b, you must be ns_capable(CAP_SYS_MOUNT) on either
>> a's namespace or b's namespace. The idea is that you should be able
>> to see under a mount if you own the parent (because it's yours) or if
>> you own the child (because you, or someone no more privileged than
>> you, put it there). This may result in a simpler patch and should do
>> much the same thing.
>
> It definitely won't result in a simpler patch as the information you are
> basing the decision on is not available.
>
Ah, I see -- mnt_ns gets changed when the namespace is duplicated.
Acked-by: Andy Lutomirski <[email protected]>
--Andy
"Serge E. Hallyn" <[email protected]> writes:
> Quoting Serge E. Hallyn ([email protected]):
>> Quoting Eric W. Biederman ([email protected]):
>> >
>> > Serge does this patch break lxc? I think all should be well but I want
>> > to make certain there is not some hidden case where this fundamentaly
>> > breaks some functionality.
>>
>> I haven't yet tried. I'll build and test a kernel today. I'm pretty
>> sure all the child's mounts are done after clone, so I *think* the worst
>> case will be that the unmounting of put_old after pivot_root() will
>> be noisy. Will let you know.
>>
>> -serge
>
> Just tested it - works fine. Warns about all of the failed umounts.
Just to confirm. Can you do a lazy umount of put_old and get rid of
them?
> Acked-by: Serge Hallyn <[email protected]>
>
> ( Mind you I'm not approving of the idea of hiding mounts as a security
> mechanisms, but I know that neither are you :)
As a security mechanism, not really. This is more about closing a
theoretical hole in case someone was sloppy, and doing it before user
namespaces are too widely deployed so we avoid massive user space
breakage. It let's me sleep more soundly at night if I know you can't
more access more with user namespaces that you can without user
namespaces.
Eric
Quoting Eric W. Biederman ([email protected]):
> "Serge E. Hallyn" <[email protected]> writes:
>
> > Quoting Serge E. Hallyn ([email protected]):
> >> Quoting Eric W. Biederman ([email protected]):
> >> >
> >> > Serge does this patch break lxc? I think all should be well but I want
> >> > to make certain there is not some hidden case where this fundamentaly
> >> > breaks some functionality.
> >>
> >> I haven't yet tried. I'll build and test a kernel today. I'm pretty
> >> sure all the child's mounts are done after clone, so I *think* the worst
> >> case will be that the unmounting of put_old after pivot_root() will
> >> be noisy. Will let you know.
> >>
> >> -serge
> >
> > Just tested it - works fine. Warns about all of the failed umounts.
>
> Just to confirm. Can you do a lazy umount of put_old and get rid of
> them?
Yes, it does that and it works.
> > Acked-by: Serge Hallyn <[email protected]>
> >
> > ( Mind you I'm not approving of the idea of hiding mounts as a security
> > mechanisms, but I know that neither are you :)
>
> As a security mechanism, not really. This is more about closing a
> theoretical hole in case someone was sloppy, and doing it before user
> namespaces are too widely deployed so we avoid massive user space
> breakage. It let's me sleep more soundly at night if I know you can't
> more access more with user namespaces that you can without user
> namespaces.
Yup.
thanks,
-serge